Unicode Features





Today, Unicode is a universally accepted character-encoding standard because:

  • It provides a consistent way of encoding multilingual plain text. This enables data transfer across different systems without the risk of corruption.

  • It defines codes for the characters used for written communication in all major languages of the world. This enables a single software product to target multiple platforms, languages, and countries without re-engineering.

  • It also defines codes for special characters (such as various types of punctuation marks), mathematical symbols, technical symbols, and diacritics. Diacritics are modifying character marks, such as the tilde (~), that are used in conjunction with base characters to represent accented letters (indicating a different sound), for example, ñ.

  • It has the capacity to encode more than a million characters (1,114,112 code points, U+0000 through U+10FFFF). This is large enough to encode all known characters, including all historic scripts of the world as well as common notational systems.

  • It assigns each character a unique numeric value (its code point) and a name, keeping character coding simple and efficient.

  • It reserves a part of the code space for private use to enable users to assign codes for their own characters and symbols.

  • It retains the simplicity and consistency of ASCII. Unicode characters that correspond to the familiar ASCII character set have the same code values as in ASCII, and in the UTF-8 encoding they have the same byte values as well. This enables Unicode to be used in a convenient and backward-compatible manner in environments designed entirely around ASCII, such as UNIX. Hence, Unicode is usable with existing software without extensive rewrites.

  • It specifies an algorithm for the presentation of text with bi-directional behavior. For example, it can deal with text containing a mixture of English (which uses a left-to-right script) and Arabic (which uses a right-to-left script). For this, it includes special characters that specify changes in direction when scripts of different directions are mixed. For all scripts, Unicode stores text in logical order in memory, which corresponds to the order in which it is typed on the keyboard.
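
Several of the features listed above can be observed directly from a program. The following is a minimal sketch in Python, using only the standard unicodedata module. The sample characters and variable names are illustrative choices, not part of the Unicode standard, and the sketch only inspects character properties; it does not implement the bi-directional algorithm itself.

```python
import unicodedata

# Unique numeric value (code point) and name for each character.
# "\u00e9" is e with acute accent, "\u221a" is the square root sign.
samples = ("A", "\u00e9", "\u221a")
names = [unicodedata.name(ch) for ch in samples]
for ch, name in zip(samples, names):
    print(f"U+{ord(ch):04X}  {name}")

# ASCII compatibility: in UTF-8, ASCII characters keep their ASCII byte values.
ascii_text = "Hello, UNIX"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Diacritics: a base character followed by a combining mark
# can be composed into a single accented letter (NFC normalization).
decomposed = "n\u0303"  # 'n' followed by COMBINING TILDE
composed = unicodedata.normalize("NFC", decomposed)

# Private use: U+E000 lies in a Private Use Area (general category "Co").
private_category = unicodedata.category("\ue000")

# Bi-directional behavior: every character carries a direction class.
# The string is stored in logical (typing) order: Latin first, then Arabic.
mixed = "abc \u0627\u0628"  # 'abc', a space, ARABIC LETTER ALEF, ARABIC LETTER BEH
bidi_classes = [unicodedata.bidirectional(ch) for ch in mixed]

print(same_bytes, repr(composed), private_category, bidi_classes)
```

Here "L" marks left-to-right letters, "AL" marks Arabic letters, and "WS" marks whitespace. A rendering engine uses these classes to decide the visual order, while the string in memory stays in logical order.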

As mentioned earlier, Unicode has a lot of room to accommodate new characters. Moreover, its growth process is strictly additive, in the sense that new characters can be added easily but existing characters cannot be removed. This ensures that data once encoded in the Unicode standard will be interpreted in the same way by all future implementations that conform to the original or later versions of the standard.
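
The amount of room still available can be estimated with a short script. This is a rough sketch, assuming that the general category "Cn" in Python's bundled Unicode database is an acceptable stand-in for "not yet assigned" (it also includes a small number of permanently reserved noncharacters). The exact count depends on the Unicode version shipped with the interpreter, so no figure is quoted here.

```python
import sys
import unicodedata

FIRST, LAST = 0x0000, 0x10FFFF          # the full Unicode code space
total_code_points = LAST - FIRST + 1    # 1,114,112 code points

# Count code points whose general category is "Cn" (unassigned or noncharacter)
# in the Unicode database bundled with this Python interpreter.
unassigned = sum(
    1
    for cp in range(FIRST, LAST + 1)
    if unicodedata.category(chr(cp)) == "Cn"
)

print("Unicode database version:", unicodedata.unidata_version)
print("Total code points:       ", total_code_points)
print("Unassigned (Cn):         ", unassigned)
```

Running the same script under interpreters with newer Unicode databases should show the unassigned count shrinking over time, which is consistent with the additive growth described above.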