Using codepoints javascript
12/8/2022

The Unicode character set contains somewhat over one million code points, from 0 to hex 10ffff. Unicode started out with only 16-bit characters, or about 65000 code points. At some point it was decided that this wasn't enough, and version 2.0, released in 1996, switched to the larger character set. Unfortunately JavaScript (invented in 1995) never got that message. 15 years later it still doesn't properly support the full Unicode character set.

Most of the JavaScript documentation I could find on the web ignores this issue, so I dug a bit deeper. This is what the ECMA-262 language specification (5th ed), which is the basis for JavaScript, has to say about this: “4.3.16 String value: primitive value that is a finite ordered sequence of zero or more 16-bit unsigned integer. NOTE: A String value is a member of the String type. Each integer value in the sequence usually represents a single 16-bit unit of UTF-16 text.” Note that it says “16-bit unit”, not “16-bit character”, and note the “usually”, which is unusually vague for a spec. It goes on with: “However, ECMAScript does not place any restrictions or requirements on the values except that they must be 16-bit unsigned integers.”

Later the spec describes the charCodeAt(position) method of the String object as returning an integer between 0 and 2^16-1 (chapter 15.5.4.5). So charCodeAt does not work with characters that need more than 16 bits: for such a character it returns only one 16-bit unit, not the code point. The same problem exists for the Unicode escape sequence, described as “\u plus four hexadecimal digits” (chapter 6), which is enough for a 16-bit character but not for the full Unicode range.

So everything is fine as long as you stick to the Basic Multilingual Plane (BMP), i.e. the code points up to hex ffff, which can be expressed in 16 bits. If you want full Unicode support, it is more than a bit messy. Mozilla has some suggestions on how to work around this problem, which surfaces especially in regular expressions.
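
To make the 16-bit limitation concrete, here is a minimal sketch in plain ES5-style JavaScript. It assumes the standard UTF-16 surrogate pair arithmetic from the Unicode standard; the helper names codePointAtIndex, stringFromCodePoint and countCodePoints are made up for this illustration and are not part of the spec or of any library.

```javascript
// U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP. With only
// "\u plus four hex digits" available, it has to be written as two
// escapes: a high surrogate followed by a low surrogate.
var clef = "\uD834\uDD1E";

// The string reports two 16-bit units, and charCodeAt only ever
// returns one of them, never the code point itself.
var unitCount = clef.length;          // 2
var firstUnit = clef.charCodeAt(0);   // 0xD834
var secondUnit = clef.charCodeAt(1);  // 0xDD1E

// Without further help, "." in a regular expression matches a single
// 16-bit unit, so it cannot match the whole character.
var dotMatchesWholeClef = /^.$/.test(clef); // false

// Hypothetical helper: returns the code point that starts at index i,
// combining a high/low surrogate pair when one is present.
function codePointAtIndex(str, i) {
  var hi = str.charCodeAt(i);
  if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < str.length) {
    var lo = str.charCodeAt(i + 1);
    if (lo >= 0xDC00 && lo <= 0xDFFF) {
      return (hi - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000;
    }
  }
  return hi; // BMP character or unpaired surrogate
}

// Hypothetical helper: the inverse, building a string from a code point.
function stringFromCodePoint(cp) {
  if (cp <= 0xFFFF) {
    return String.fromCharCode(cp);
  }
  var offset = cp - 0x10000;
  return String.fromCharCode(0xD800 + (offset >> 10),
                             0xDC00 + (offset & 0x3FF));
}

// Hypothetical helper: counts code points instead of 16-bit units.
function countCodePoints(str) {
  var count = 0;
  for (var i = 0; i < str.length; i++) {
    if (codePointAtIndex(str, i) > 0xFFFF) {
      i++; // skip the low surrogate of the pair just counted
    }
    count++;
  }
  return count;
}
```

For example, codePointAtIndex(clef, 0) gives hex 1d11e and countCodePoints("a" + clef) gives 2, while ("a" + clef).length is 3. Every loop over a string that may contain characters outside the BMP needs this kind of extra bookkeeping, which is what makes full Unicode support messy.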