    Make non-initial Korean Hangul Jamo width 0 · cfff2326
    Luis Javier Merino authored and Tomaz Canabrava committed
    
    
    Korean Hangul can be represented in Unicode either as precomposed Hangul
    syllables, or as sequences of alphabetic components called Jamo.
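    The two representations can be related with a minimal Python sketch,
    assuming the Hangul syllable decomposition arithmetic from the Unicode
    Standard ("Conjoining Jamo Behavior").  The function name
    `decompose_syllable` is hypothetical and not part of this commit; the
    check against `unicodedata.normalize("NFD", ...)` shows that the
    arithmetic yields the same jamo sequence as canonical decomposition.

```python
import unicodedata

# Constants of the Unicode Standard's algorithmic Hangul decomposition.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables (U+AC00..U+D7A3)


def decompose_syllable(ch: str) -> str:
    """Split one precomposed syllable into choseong + jungseong (+ jongseong).

    Hypothetical helper; characters outside U+AC00..U+D7A3 are returned as-is.
    """
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch
    choseong = L_BASE + s_index // N_COUNT
    jungseong = V_BASE + (s_index % N_COUNT) // T_COUNT
    t_index = s_index % T_COUNT
    jamo = chr(choseong) + chr(jungseong)
    if t_index:  # T index 0 means the syllable has no final consonant
        jamo += chr(T_BASE + t_index)
    return jamo


if __name__ == "__main__":
    for syllable in "한국":
        jamo = decompose_syllable(syllable)
        print(syllable, [f"U+{ord(c):04X}" for c in jamo])
        # The arithmetic agrees with Python's own NFD normalization.
        assert jamo == unicodedata.normalize("NFD", syllable)
```

    For example, 한 (U+D55C) splits into U+1112 U+1161 U+11AB: a leading
    consonant, a vowel and a trailing consonant.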
    
    Syllables should occupy 2 cells (halfwidth variants of the Hangul
    letters exist at U+FFA0..U+FFDF).  A fully decomposed syllable consists
    of an initial jamo (choseong, a leading consonant, which may be the
    filler U+115F), a medial jamo (jungseong, a vowel, which may be the
    filler U+1160), and an optional final jamo (jongseong, a trailing
    consonant).  Old Korean can have more than one of each.  To make the
    total width 2, we assign width 2 to choseong and 0 to jungseong and
    jongseong.  Without a context-aware wcswidth(), this still miscounts
    Old Korean syllables that have more than one leading-consonant jamo.
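    A minimal Python sketch of this rule, assuming only the jamo ranges
    named in this commit message (U+1100..U+11FF, plus the Extended-B
    ranges from the glibc diff quoted further down).  `jamo_width` and
    `naive_wcswidth` are hypothetical names, not this project's width
    table; the second demo string shows why a context-free per-character
    sum overcounts when a syllable has two leading consonants.

```python
from typing import Optional


def jamo_width(cp: int) -> Optional[int]:
    """Cell width of a conjoining Hangul jamo, or None if cp is not one.

    Hypothetical helper covering only the ranges named in this commit.
    """
    if 0x1100 <= cp <= 0x115F:   # choseong, including the filler U+115F
        return 2
    if 0x1160 <= cp <= 0x11FF:   # jungseong (filler U+1160) and jongseong
        return 0
    if 0xD7B0 <= cp <= 0xD7C6 or 0xD7CB <= cp <= 0xD7FB:
        return 0                 # Jamo Extended-B jungseong and jongseong
    return None


def naive_wcswidth(text: str) -> int:
    """Context-free sum of per-character widths, like wcswidth().

    For simplicity, anything that is not a conjoining jamo counts as 1.
    """
    total = 0
    for ch in text:
        w = jamo_width(ord(ch))
        total += 1 if w is None else w
    return total


if __name__ == "__main__":
    modern = "\u1112\u1161\u11ab"   # 한 spelled as choseong + jungseong + jongseong
    print(naive_wcswidth(modern))   # 2 + 0 + 0 = 2
    doubled = "\u1100\u1100\u1161"  # two leading-consonant jamo, then a vowel
    print(naive_wcswidth(doubled))  # 2 + 2 + 0 = 4, though it forms one syllable block
```

    Fixing the second case would require looking at neighbouring
    characters, which a per-character width table cannot do.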
    
    This width assignment aligns with glibc:
    
    commit 7a79e321c6f85b204036c33d85f6b2aa794e7c76
    Author: Thorsten Glaser <tg@mirbsd.de>
    Date:   Fri Jul 14 14:02:50 2017 +0200
    
        Refresh generated charmap data and ChangeLog
    
                [BZ #21750]
                * charmaps/UTF-8: Refresh.
    
    diff --git a/localedata/ChangeLog b/localedata/ChangeLog
    index 04ef5ad071..9e05b4a652 100644
    --- a/localedata/ChangeLog
    +++ b/localedata/ChangeLog
    @@ -1,3 +1,17 @@
    +2017-07-14  Thorsten Glaser  <tg@mirbsd.de>
    +
    +       [BZ #21750]
    +       * charmaps/UTF-8: Refresh.
    +       * unicode-gen/utf8_gen.py (U+00AD): Set width to 1.
    +       * unicode-gen/utf8_gen.py (U+1160..U+11FF): Set width to 0.
    +       * unicode-gen/utf8_gen.py (U+3248..U+324F): Set width to 2.
    +       * unicode-gen/utf8_gen.py (U+4DC0..U+4DFF): Likewise.
    +       * unicode-gen/utf8_gen.py: Treat category Me and Mn as combining.
    +       [BZ #19852]
    +       * unicode-gen/utf8_gen.py: Process EastAsianWidth lines before
    +       UnicodeData lines so the latter have precedence; remove hack
    +       to group output by EastAsianWidth ranges.
    +
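    The generator changes in this ChangeLog entry can be approximated as
    two passes: a default width from East Asian Width, followed by rules
    derived from UnicodeData and explicit overrides that take precedence
    (the BZ #19852 item).  This is a simplified Python sketch, not glibc's
    `utf8_gen.py`; the names `charmap_width` and `WIDTH_OVERRIDES`, and
    counting every non-W/F character as width 1, are assumptions.  Results
    for unlisted characters also depend on the Unicode version bundled with
    Python.

```python
import unicodedata

# Explicit overrides named in the ChangeLog entry above (hypothetical table).
WIDTH_OVERRIDES = [
    ((0x00AD, 0x00AD), 1),  # SOFT HYPHEN
    ((0x1160, 0x11FF), 0),  # Hangul jungseong and jongseong
    ((0x3248, 0x324F), 2),
    ((0x4DC0, 0x4DFF), 2),
]


def charmap_width(ch: str) -> int:
    """Approximate charmap width: later passes overwrite earlier ones."""
    cp = ord(ch)
    # Pass 1: default from East Asian Width; Wide and Fullwidth take 2 cells.
    width = 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    # Pass 2: UnicodeData-derived rules win over East Asian Width.
    if unicodedata.category(ch) in ("Mn", "Me"):  # treated as combining
        width = 0
    for (lo, hi), w in WIDTH_OVERRIDES:
        if lo <= cp <= hi:
            width = w
    return width


if __name__ == "__main__":
    for ch in ["a", "\u4e00", "\u0301", "\u1100", "\u1161", "\u3248", "\ud7b0"]:
        print(f"U+{ord(ch):04X}", charmap_width(ch))
```

    With only these 2017 rules, Jamo Extended-B characters such as U+D7B0
    still come out as width 1; the 2020 commit below sets them to 0.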
    
    [ ... snip ...]
    
    commit 6e540caa21616d5ec5511fafb22819204525138e
    Author: Mike FABIAN <mfabian@redhat.com>
    Date:   Tue Jun 16 08:29:40 2020 +0200
    
        Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to UD7FB to 0 [BZ #26120]
    
        Reviewed-by: Carlos O'Donell <carlos@redhat.com>
    
    diff --git a/localedata/charmaps/UTF-8 b/localedata/charmaps/UTF-8
    index 14c5d4fa33..8cce47cd97 100644
    --- a/localedata/charmaps/UTF-8
    +++ b/localedata/charmaps/UTF-8
    @@ -48920,6 +48920,8 @@ WIDTH
     <UABE8>        0
     <UABED>        0
     <UAC00>...<UD7A3>      2
    +<UD7B0>...<UD7C6>      0
    +<UD7CB>...<UD7FB>      0
     <UF900>...<UFA6D>      2
     <UFA70>...<UFAD9>      2
     <UFB1E>        0
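    The WIDTH section shown in this diff lists either a single code point
    (`<UXXXX> w`) or an inclusive range (`<UXXXX>...<UYYYY> w`).  A minimal
    Python sketch for reading such lines and looking up a code point,
    assuming raw charmap lines (without the diff's leading `+` or space);
    `parse_width_lines` and `lookup_width` are hypothetical names, and how
    a consumer should treat code points the section does not list is left
    to the caller here.

```python
import re
from typing import List, Optional, Tuple

# One WIDTH entry: "<UXXXX> w" or "<UXXXX>...<UYYYY> w".
WIDTH_LINE = re.compile(r"<U([0-9A-F]+)>(?:\.\.\.<U([0-9A-F]+)>)?\s+(\d+)")


def parse_width_lines(lines) -> List[Tuple[int, int, int]]:
    """Return (first, last, width) tuples; non-entry lines are skipped."""
    entries = []
    for line in lines:
        m = WIDTH_LINE.match(line.strip())
        if m:
            first = int(m.group(1), 16)
            last = int(m.group(2), 16) if m.group(2) else first
            entries.append((first, last, int(m.group(3))))
    return entries


def lookup_width(entries, cp: int) -> Optional[int]:
    """Width from the first matching entry, or None if cp is not listed."""
    for first, last, width in entries:
        if first <= cp <= last:
            return width
    return None


SAMPLE = """\
<UAC00>...<UD7A3>      2
<UD7B0>...<UD7C6>      0
<UD7CB>...<UD7FB>      0
<UF900>...<UFA6D>      2
"""

if __name__ == "__main__":
    entries = parse_width_lines(SAMPLE.splitlines())
    print(lookup_width(entries, 0xD55C))  # 2: precomposed syllable 한
    print(lookup_width(entries, 0xD7B0))  # 0: Jamo Extended-B jungseong
    print(lookup_width(entries, 0xD7C8))  # None: unassigned gap between the ranges
```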