    Add a tool for generating character width tables · 5f32cb3c
    Mariusz Glebocki authored and Kurt Hindenburg committed
    Summary:
    The uni2characterwidth tool converts Unicode Character Database files
    into character width lookup tables. It uses a template file to place
    the tables in a source code file together with a function for finding
    the width of a specified character. It can also generate several forms
    of lists with width data for debugging and testing, or for future use
    as a replacement for the Unicode files.
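    
    To illustrate the kind of output described above, here is a minimal
    sketch of a range table with a lookup function. It is not the code the
    tool generates: `WidthRange`, `kWidthTable` and `characterWidth` are
    hypothetical names, the table is a tiny hand-picked excerpt, the
    fallback to width 1 is an assumption, and -1 is assumed to mark
    non-printable characters as in `wcwidth`.
    
    ```cpp
    #include <algorithm>
    #include <cassert>
    #include <cstdint>
    #include <iterator>
    
    // Hypothetical table entry: an inclusive code point range and its width.
    struct WidthRange {
        std::uint32_t first;
        std::uint32_t last;
        std::int8_t width;
    };
    
    // Illustrative excerpt only, sorted by `first` with no overlapping ranges.
    // Width values are assumptions chosen for the example.
    static const WidthRange kWidthTable[] = {
        {0x0001, 0x001F, -1}, // C0 control characters (assumed non-printable)
        {0x00A1, 0x00A1, -2}, // INVERTED EXCLAMATION MARK (East Asian Ambiguous)
        {0x0300, 0x036F, 0},  // combining diacritical marks
        {0x1100, 0x115F, 2},  // Hangul Jamo initial consonants (wide)
        {0x4E00, 0x9FFF, 2},  // CJK Unified Ideographs (wide)
    };
    
    // Binary search for the range containing `ucs4`.
    // Code points not covered by any range fall back to width 1 in this sketch.
    int characterWidth(std::uint32_t ucs4)
    {
        assert(ucs4 <= 0x10FFFF); // must be a valid Unicode code point
    
        const WidthRange *begin = std::begin(kWidthTable);
        const WidthRange *end = std::end(kWidthTable);
    
        // Find the first range that starts after `ucs4`;
        // the only candidate that can contain it is the range just before.
        const WidthRange *it = std::upper_bound(
            begin, end, ucs4,
            [](std::uint32_t cp, const WidthRange &r) { return cp < r.first; });
        if (it == begin) {
            return 1;
        }
        --it;
        return ucs4 <= it->last ? it->width : 1;
    }
    ```
    
    Keeping the table sorted and non-overlapping is what makes the binary
    search valid; a generator can guarantee that, while a hand-edited table
    cannot.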
    
    Set the `KONSOLE_BUILD_UNI2CHARACTERWIDTH` CMake flag to build the tool.
    Use the `--help` argument for more detailed usage.
    
    The tool can optionally emit a separate "width" value for Ambiguous
    characters. This can be used to make the width of those characters
    configurable in Konsole settings.
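    
    As a rough sketch of how a consumer might use such a separate value,
    the function below maps the table width to a display width at run
    time. `kAmbiguousWidth` and `resolveWidth` are hypothetical names and
    the `ambiguousAs` parameter stands in for a user setting; none of this
    is existing Konsole API.
    
    ```cpp
    #include <cassert>
    
    // Hypothetical marker for the non-standard width given to ambiguous characters.
    const int kAmbiguousWidth = -2;
    
    // Replace the ambiguous marker with the user-selected width (1 or 2);
    // every other table width is passed through unchanged.
    inline int resolveWidth(int tableWidth, int ambiguousAs)
    {
        assert(ambiguousAs == 1 || ambiguousAs == 2); // only narrow or wide make sense
        return tableWidth == kAmbiguousWidth ? ambiguousAs : tableWidth;
    }
    ```
    
    Resolving the marker at lookup time keeps a single generated table
    usable for both settings.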
    
    The `example.template` file contains all possible named tags, and some
    additional tags to show how to use them.
    
    CCBUG: 396435
    
    Depends on D15756
    
    Test Plan:
    Download files listed below from `11.0.0` and `emoji/11.0` directories
    on `https://unicode.org/Public/`. You can also directly use URLs to the
    files.
    
    * UnicodeData.txt
    * EastAsianWidth.txt
    * emoji-data.txt
    
    Generate any available list except compact-ranges (e.g. `details`):
    
    ```
    uni2characterwidth \
        -U UnicodeData.txt  -A EastAsianWidth.txt  -E emoji-data.txt \
        -g details  result.txt
    ```
    
    The list should contain ranges for all possible widths
    (-2, -1, 0, 1, 2). You can pick some characters whose width you know
    and check how they were classified. -2 is a special non-standard width
    for ambiguous characters, which can be overridden by adding the `-a 1`
    or `-a 2` parameter. With this flag, all ranges in the -2 group should
    disappear and be assigned to the selected width (1 or 2).
    
    Generate output using a template:
    
    ```
    uni2characterwidth \
        -U UnicodeData.txt  -A EastAsianWidth.txt  -E emoji-data.txt \
        -g code,./template.example  result.txt
    ```
    
    Reviewers: #konsole, hindenburg
    
    Reviewed By: #konsole, hindenburg
    
    Subscribers: hindenburg, konsole-devel
    
    Tags: #konsole
    
    Differential Revision: https://phabricator.kde.org/D15757