Instead of serializing/deserializing text columns row by row, convert the...

Description

Instead of serializing/deserializing text columns row by row, convert the text elements to a single byte array (separated by null characters) and store it base64-encoded in the XML.
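The idea can be illustrated with a minimal Python sketch. This is not the code from this merge request: the function names `encode_text_column` and `decode_text_column` are hypothetical, UTF-8 is an assumed encoding, and the sketch assumes that no cell contains a null character. It writes a null byte after every element (rather than only between elements) so that an empty column and a column holding one empty string stay distinguishable.

```python
import base64


def encode_text_column(values):
    """Pack all strings of a column into one base64 payload (hypothetical helper)."""
    for value in values:
        # Assumption: a null character never occurs inside a cell,
        # otherwise it could not be used as the separator.
        if "\0" in value:
            raise ValueError("text values must not contain null characters")
    # Append a null byte after every element and concatenate everything.
    raw = b"".join(value.encode("utf-8") + b"\0" for value in values)
    return base64.b64encode(raw).decode("ascii")


def decode_text_column(payload):
    """Restore the list of strings from a payload produced above."""
    raw = base64.b64decode(payload)
    if not raw:
        return []
    # The trailing null byte produces one empty chunk at the end; drop it.
    chunks = raw.split(b"\0")[:-1]
    return [chunk.decode("utf-8") for chunk in chunks]


column = ["English", "", "Français"]
payload = encode_text_column(column)
restored = decode_text_column(payload)
```

The whole column becomes a single text node in the XML, so the writer and the parser handle one element per column instead of one per row.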

The performance gain is 2-3x for many text columns, but it can be significantly larger when the text structure is simpler and the base64 encoding is therefore faster.

Performance benchmarks, measured on M3 Pro, NEW vs OLD:

Dataset "Movies database" with 5043 rows:

  • column "movie_imdb_link": save - 37ms vs 120ms, load - 1ms vs 4ms
  • column "plot_keywords": save - 76ms vs 96ms, load - 1ms vs 7ms
  • column "language": save - 4ms vs 75ms, load - 0ms vs 3ms
  • column "country": save - 5ms vs 64ms, load - 0ms vs 3ms

Dataset "Lightning strikes" with 3401012 rows:

  • column "center_point_geom": save - 33434ms vs 65624ms, load - 939ms vs 2852ms

Conformity

Edited by Alexander Semke
