tools/uni2characterwidth/overrides.txt · cfff2326f98e1077c519bbb8d18c80262d1e2d1b · Utilities / Konsole

Make non-initial Korean Hangul Jamo width 0 · cfff2326
Luis Javier Merino authored Dec 22, 2021 and
Tomaz Canabrava committed Dec 28, 2021


Korean Hangul can be represented in Unicode either as precomposed Hangul
syllables, or as sequences of alphabetic components called Jamo.

A syllable should occupy 2 cells (halfwidth variants exist at
U+FFA0..U+FFDF).  A fully decomposed syllable consists of an initial
jamo (choseong, the leading consonant, which may be the filler U+115F),
a medial jamo (jungseong, the vowel, which may be the filler U+1160),
and an optional final jamo (jongseong, the trailing consonant).  Old
Korean can have more than one of each.  To make the total width 2 in
every case, we assign width 2 to choseong and width 0 to jungseong and
jongseong.  Absent a context-aware wcswidth, this still breaks for Old
Korean syllables with more than one leading-consonant jamo, since each
choseong is counted as 2 cells.
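
The width rule above can be sketched in a few lines of Python.  This is
an illustration only, not Konsole's actual lookup code: the names
char_width and string_width are hypothetical, the zero-width ranges are
the ones listed in the glibc changes quoted below, and the choseong
range U+1100..U+115F is assumed from the Hangul Jamo block layout.

```python
import unicodedata

# Width-2 ranges: choseong (assumed U+1100..U+115F, ending at the
# choseong filler) and precomposed Hangul syllables.
WIDE = [(0x1100, 0x115F), (0xAC00, 0xD7A3)]
# Width-0 ranges: jungseong and jongseong, per the glibc changes.
ZERO = [(0x1160, 0x11FF), (0xD7B0, 0xD7C6), (0xD7CB, 0xD7FB)]


def char_width(ch):
    """Context-free width of one code point (hypothetical helper)."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in ZERO):
        return 0
    if any(lo <= cp <= hi for lo, hi in WIDE):
        return 2
    return 1  # simplification: everything else is treated as narrow


def string_width(s):
    """Sum of per-character widths, with no awareness of context."""
    return sum(char_width(c) for c in s)


precomposed = "\uD55C"                                   # one syllable
decomposed = unicodedata.normalize("NFD", precomposed)   # three jamo

print(string_width(precomposed), string_width(decomposed))
# Two choseong followed by one jungseong: each choseong counts as 2.
print(string_width("\u1107\u1109\u1161"))
```

The precomposed and decomposed forms both come out at 2 cells, while
the sequence with two leading consonants comes out at 4, which is the
Old Korean limitation described above.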

This aligns with glibc:

commit 7a79e321c6f85b204036c33d85f6b2aa794e7c76
Author: Thorsten Glaser <tg@mirbsd.de>
Date:   Fri Jul 14 14:02:50 2017 +0200

    Refresh generated charmap data and ChangeLog

            [BZ #21750]
            * charmaps/UTF-8: Refresh.

diff --git a/localedata/ChangeLog b/localedata/ChangeLog
index 04ef5ad071..9e05b4a652 100644
--- a/localedata/ChangeLog
+++ b/localedata/ChangeLog
@@ -1,3 +1,17 @@
+2017-07-14  Thorsten Glaser  <tg@mirbsd.de>
+
+       [BZ #21750]
+       * charmaps/UTF-8: Refresh.
+       * unicode-gen/utf8_gen.py (U+00AD): Set width to 1.
+       * unicode-gen/utf8_gen.py (U+1160..U+11FF): Set width to 0.
+       * unicode-gen/utf8_gen.py (U+3248..U+324F): Set width to 2.
+       * unicode-gen/utf8_gen.py (U+4DC0..U+4DFF): Likewise.
+       * unicode-gen/utf8_gen.py: Treat category Me and Mn as combining.
+       [BZ #19852]
+       * unicode-gen/utf8_gen.py: Process EastAsianWidth lines before
+       UnicodeData lines so the latter have precedence; remove hack
+       to group output by EastAsianWidth ranges.
+

[ ... snip ...]

commit 6e540caa21616d5ec5511fafb22819204525138e
Author: Mike FABIAN <mfabian@redhat.com>
Date:   Tue Jun 16 08:29:40 2020 +0200

    Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to UD7FB to 0 [BZ #26120]

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

diff --git a/localedata/charmaps/UTF-8 b/localedata/charmaps/UTF-8
index 14c5d4fa33..8cce47cd97 100644
--- a/localedata/charmaps/UTF-8
+++ b/localedata/charmaps/UTF-8
@@ -48920,6 +48920,8 @@ WIDTH
 <UABE8>        0
 <UABED>        0
 <UAC00>...<UD7A3>      2
+<UD7B0>...<UD7C6>      0
+<UD7CB>...<UD7FB>      0
 <UF900>...<UFA6D>      2
 <UFA70>...<UFAD9>      2
 <UFB1E>        0