    Don't strip 0-width Other_Format characters · c26693af
    Luis Javier Merino authored and Tomaz Canabrava committed
    These include ZWJ (Zero Width Joiner), ZWNJ (Zero Width Non-Joiner) and
    Zero Width Space, which can be used to change the rendering of text,
    e.g. forcing or preventing the formation of conjunct forms in Indic
    scripts.
    
    Treat them as combining characters, so they end up in an extended
    character in the previous character cell.
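
The grouping described above can be illustrated with a minimal Python sketch. It is not Konsole's code: `split_into_cells` and `ATTACHING_CATEGORIES` are hypothetical names, and the sketch assumes, as a simplification, that every Other_Format (Cf) character is zero-width, which does not hold for all of them.

```python
import unicodedata

# Assumption for this sketch: characters in these general categories take no
# cell of their own. Treating all of Cf as zero-width is a simplification.
ATTACHING_CATEGORIES = {"Mn", "Me", "Cf"}


def split_into_cells(text):
    """Group code points into cells; attaching characters join the previous cell."""
    cells = []
    for ch in text:
        if cells and unicodedata.category(ch) in ATTACHING_CATEGORIES:
            # Keep the character by appending it to the preceding cell
            # instead of dropping it.
            cells[-1] += ch
        else:
            cells.append(ch)
    return cells


if __name__ == "__main__":
    # KA, VIRAMA, ZWJ, SSA inside brackets, as in the test string.
    for cell in split_into_cells("[\u0915\u094d\u200d\u0937]"):
        print(" ".join(f"U+{ord(c):04X}" for c in cell))
```

In this sketch the ZWJ ends up in the same cell as KA and the virama, so it is still available when that cell is rendered.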
    
    To test, the output of:
    
    printf "[\u915\u94d\u937]  [\u915\u94d\u200c\u937]  [\u915\u94d\u200d\u937]\n"
    
    can be compared against the examples in Figures 12.4 and 12.5 of the
    Unicode standard, from the "Explicit Virama (Halant)" and "Explicit
    Half-Consonants" sub-sections of the Devanagari section in the "South
    and Central Asia I" chapter (page 465 in Unicode 14).