Skip to content

Unicode: fix the extended grapheme cluster algorithm

[PATCH 1/2] Unicode: fix the extended grapheme cluster algorithm

UAX #29 in Unicode 11 changed the EGC algorithm to its current form.
Although Qt has upgraded the Unicode tables all the way up to
Unicode 13, the algorithm has never been adapted; in other words,
it has been working by chance for years. Luckily, MOST
of the cases were dealt with correctly, but emoji handling
actually manages to break it.

This commit:

* Adds parsing of emoji-data.txt into the unicode table generator.
  That is necessary to extract the Extended_Pictographic property,
  which is used by the EGC algorithm.

* Regenerates the tables.

* Removes some obsoleted grapheme cluster break properties, and
  adds the ones added in the meanwhile.

* Rewrites the EGC algorithm according to Unicode 13. This is
  done by simplifying a lot the lookup table. Some rules (GB11,
  GB12, GB13) can't be done by the table alone so some hand-rolled
  code is necessary in that case.

* Thanks to these fixes, the complete upstream GraphemeBreakTest
  now passes. Remove the "edited" version that ignored some rows
  (because they were failing).

Change-Id: Iaa07cb2e6d0ab9deac28397f46d9af189d2edf8b
Pick-to: 6.1 6.0 5.15
Fixes: QTBUG-92822
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
(cherry picked from commit a794c5e287381bd056008b20ae55f9b1e0acf138)

* asturmlechner 2022-06-09: Fix conflict with dev branch commit
    0ae5b8af9c2af514f316211b71a03defc0f1b65a

[PATCH 2/2] Unicode: fix the grapheme clustering algorithm

An oversight in the code kept the algorithm in the GB11 state, even if
the codepoint that is being processed wouldn't allow for that (for
instance a sequence of ExtPic, Ext and Any).

Refactor the code of GB11/GB12/GB13 to deal with code points that break
the sequences (falling back to "normal" handling).

Add some manual tests; interestingly enough, the failing cases are not
covered by Unicode's tests, as we now pass the entire test suite.

Amends a794c5e287381bd056008b20ae55f9b1e0acf138.

Fixes: QTBUG-94951
Pick-to: 6.1 5.15
Change-Id: If987d5ccf7c6b13de36d049b1b3d88a3c4b6dd00
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
(cherry picked from commit d48058f1970a795afb4cedaae54dde7ca69cb252)

...with commit 4a51276f necessary to fix build after patch 1.

Note: This is not yet tested much at runtime.

Merge request reports