villaib.blogg.se - Emoji codepoints


"These should be decomposed into sequences of two or three characters, each three bytes long, and then you need a special algorithm to combine them into a square block." -> This pretty much means the software must be developed with Korean users in mind (or someone must heroically go through every part of the code dealing with displaying text), otherwise we might as well assume that it's English-only. Each three-byte sequence (assuming UTF-8) corresponds to a square-shaped character." -> Easy for everyone to understand, and less chance of screwup (as long as the software supports any Unicode at all). "These are characters from a country you've never been to. I believe that was a necessary compromise to use Hangul on any software not authored by Koreans.

Please stop spreading this bit of misinformation. Yes, we already encode all possible modern Hangul syllable blocks in NFC form as well, but this ignores characters with double choseongs or double jungseongs that can be found in older text. You sometimes see these in modern text too, actually. All Indic scripts (well, all scripts derived from Brahmi, so this includes many Southeast Asian scripts such as Thai) would have trouble doing the NFC thing.

I am annoyed that the Unicode spec introduced more complexity into its algorithms to support emoji, but only because they could have achieved mostly the same result without emoji-specific complexity, by reusing features that existing scripts already have and that software already accounts for.
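
A small Python sketch of the point about NFC, again with the standard `unicodedata` module. The sample strings and the helper name `nfc_len` are assumptions picked for illustration. The old-Hangul sample uses a single archaic choseong rather than a doubled one, but it shows the same effect: NFC has nothing to compose it into.

```python
import unicodedata

def nfc_len(text):
    """Number of code points left after NFC normalization."""
    return len(unicodedata.normalize("NFC", text))

modern_hangul = "\u1112\u1161\u11ab"  # jamo for 한; NFC composes them to U+D55C
old_hangul = "\u1140\u1161"           # archaic choseong PANSIOS + jungseong A
devanagari = "\u0915\u093f"           # KA + vowel sign I, read as one unit

print(nfc_len(modern_hangul))  # 1
print(nfc_len(old_hangul))     # 2
print(nfc_len(devanagari))     # 2
```

Only the modern Hangul sequence collapses to one code point. The other two are each perceived as a single written unit but stay as multiple code points, so code that wants user-perceived characters needs grapheme cluster segmentation rather than normalization.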

The thing that frustrates me the most about Unicode emoji is the astounding number of combining characters. For combining characters in written languages, you can do an NFC normalization and, with moderate success, get a 1 codepoint = 1 grapheme mapping, but "Emoji 2.0" introduced some ridiculous emoji compositions with the ZWJ character.
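
As a rough illustration of that difference, the Python sketch below normalizes an accented letter and a ZWJ emoji sequence. The two sample strings and the helper name `nfc` are assumptions chosen for the example; whether the emoji sequence renders as a single glyph depends on the font and platform.

```python
import unicodedata

def nfc(text):
    """Return the NFC-normalized form of text."""
    return unicodedata.normalize("NFC", text)

accented = "e\u0301"  # e + COMBINING ACUTE ACCENT
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # man, ZWJ, woman, ZWJ, girl

print(len(accented), len(nfc(accented)))  # 2 1
print(len(family), len(nfc(family)))      # 5 5
print(len(family.encode("utf-8")))        # 18
```

NFC folds the accented letter into one code point, but the ZWJ sequence stays at five code points (18 bytes in UTF-8) even though it is meant to display as one emoji.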