HTML - Charset

This chapter covers HTML character sets. A character set is a set of abstract characters together with a corresponding set of integer references to those characters. The following concepts are covered.

  • Charset Encoding

  • Document Character Set

  • Charset Entities

American Standard Code for Information Interchange (ASCII) is one of the earliest character sets used on the internet. It defines 128 characters (codes 0 to 127), including the digits 0-9, the letters A-Z and a-z, and special characters such as !, @, $, <, >, ( and ). ANSI, named after the American National Standards Institute, is the common name for the original Windows character set (Windows-1252), which supports 256 character codes. All of these characters are also covered by UTF-8.
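The relationship between these ranges can be checked with a short Python sketch. It assumes that "ANSI" means Windows-1252, which Python exposes as the `cp1252` codec; the variable names are illustrative only.

```python
# ASCII defines 128 code points, numbered 0 through 127.
ascii_chars = [chr(code) for code in range(128)]

# Every ASCII character encodes to exactly one byte in UTF-8,
# and that byte has the same value as the ASCII code.
ascii_is_utf8_subset = all(
    ch.encode("utf-8") == bytes([ord(ch)]) for ch in ascii_chars
)

# Assumption: "ANSI" means Windows-1252 (Python codec "cp1252").
# Byte 0x80 in that code page is the euro sign, U+20AC.
euro = bytes([0x80]).decode("cp1252")

# The same character exists in Unicode, but UTF-8 needs several bytes for it.
euro_utf8 = euro.encode("utf-8")

print(len(ascii_chars), ascii_is_utf8_subset, len(euro_utf8))
```

The point of the sketch is that UTF-8 keeps ASCII unchanged, while characters from the upper half of a 256-code page remain available at the cost of extra bytes.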

The character set of an HTML document is declared in a <meta> tag. The snippet below declares it through the charset parameter of the content attribute:


<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
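To see how a program might read this declaration, here is a minimal Python sketch built on the standard library's `html.parser`. The class `CharsetFinder` and the function `declared_charset` are hypothetical names chosen for illustration, and the sketch also accepts the shorter HTML5 form `<meta charset="utf-8">`. Real browsers follow a more involved detection algorithm than this.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Hypothetical helper: records the first charset declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        if "charset" in attributes:
            # HTML5 short form: <meta charset="utf-8">
            self.charset = attributes["charset"]
        elif (attributes.get("http-equiv") or "").lower() == "content-type":
            # Older form: content="text/html;charset=ISO-8859-1"
            content = attributes.get("content") or ""
            for part in content.split(";"):
                name, _, value = part.strip().partition("=")
                if name.lower() == "charset":
                    self.charset = value.strip()


def declared_charset(markup):
    """Return the charset named in the markup, or None if none is declared."""
    finder = CharsetFinder()
    finder.feed(markup)
    return finder.charset


old_style = '<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">'
print(declared_charset(old_style))
```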

Description − The document character set is the set of abstract characters and the corresponding set of integer references to those characters. In Standard Generalized Markup Language (SGML), a document is treated as a sequence of characters from this set.

The HTML document character set is the Universal Character Set (UCS) defined in ISO 10646.

The current specifications refer to ISO/IEC 10646. A character encoding may represent only a subset of the document character set: encodings such as ISO-8859-1, ISO-8859-5, SHIFT_JIS, and euc-jp save bandwidth by representing only part of the document character set.
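The idea that each encoding covers only part of the document character set can be illustrated with Python's built-in codecs. This is a sketch under stated assumptions: `can_encode` is a hypothetical helper, and the sample characters were picked for illustration.

```python
def can_encode(text, encoding):
    """Hypothetical helper: True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


a_ring = "\u00e5"       # Latin small letter a with ring above
cyrillic_i = "\u0418"   # Cyrillic capital letter I
kanji_day = "\u65e5"    # a common Japanese kanji

# ISO-8859-1 covers Western European letters but no Cyrillic.
print(can_encode(a_ring, "iso-8859-1"), can_encode(cyrillic_i, "iso-8859-1"))

# ISO-8859-5 covers Cyrillic but not the Latin letter with a ring.
print(can_encode(cyrillic_i, "iso-8859-5"), can_encode(a_ring, "iso-8859-5"))

# A partial encoding can be more compact than UTF-8 for the text it covers.
sizes = {
    encoding: len(kanji_day.encode(encoding))
    for encoding in ("shift_jis", "euc_jp", "utf-8")
}
print(sizes)
```

The size comparison at the end shows the bandwidth argument in miniature: the Japanese encodings use fewer bytes for this character than UTF-8 does, in exchange for covering far fewer characters.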

Description − Character references, inherited from SGML, give software and hardware a simple, encoding-independent mechanism for specifying any character from the document character set. The two forms are listed below.

  • Numeric character references

  • Named character references

Numeric character references

A numeric character reference specifies the integer reference (code position) of a Unicode character. The syntax of a decimal reference is shown below.

&#D;

The above line refers to the decimal character number D.
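Numeric references can be produced and resolved mechanically. The Python sketch below uses `html.unescape` from the standard library to resolve them; `to_decimal_reference` and `to_hex_reference` are hypothetical helpers written for this example. The hexadecimal form `&#xH;` is an alternative spelling of the same reference.

```python
import html


def to_decimal_reference(ch):
    """Hypothetical helper: write one character as a decimal reference."""
    return f"&#{ord(ch)};"


def to_hex_reference(ch):
    """Hypothetical helper: write one character as a hexadecimal reference."""
    return f"&#x{ord(ch):X};"


a_ring = "\u00e5"  # Latin small letter a with ring above, code position 229

decimal_ref = to_decimal_reference(a_ring)
hex_ref = to_hex_reference(a_ring)
print(decimal_ref, hex_ref)

# Both spellings resolve back to the same character.
print(html.unescape(decimal_ref) == html.unescape(hex_ref) == a_ring)
```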

Named character references

HTML offers a set of named character entities that replace the integer references with symbolic names. The named entity &aring; refers to the same Unicode character as &#229;. HTML 4 defines no named entity for the Cyrillic capital letter "I", so that character is written with a numeric reference instead.
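The link between names and numbers can be inspected with Python's `html.entities` module, whose `name2codepoint` table holds the HTML 4 entity names. The sketch below is illustrative only, and the variable names are assumptions made for this example.

```python
import html
from html.entities import name2codepoint

# The HTML 4 entity table maps each name to a Unicode code position.
aring_codepoint = name2codepoint["aring"]

# The named and the numeric spelling resolve to the same character.
from_name = html.unescape("&aring;")
from_number = html.unescape(f"&#{aring_codepoint};")
print(aring_codepoint, from_name == from_number)

# The HTML 4 table has no name for the Cyrillic capital letter I (U+0418),
# so the numeric reference is used instead.
cyrillic_i_has_html4_name = 0x0418 in name2codepoint.values()
cyrillic_i = html.unescape("&#1048;")
print(cyrillic_i_has_html4_name)
```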