pomtrans: multi-message batch separator problem with Apertium
(note: I encountered this issue when using pology with python2, although I think that this issue would also be replicable using python3)
The pomtrans script introduces an HTML br element followed by a single trailing full stop as a separator between each message when batching translations using Apertium.
The constructed separator string is later used to split the response from the engine into individual results.
This behaviour relies on the engine's response including the same HTML separator string verbatim, and it seems that this isn't always guaranteed.
In particular, when using apertium v3.7.1 on Debian (bullseye), single-quotes in the "input" separator sent to the engine are transformed into double-quotes in the "output" separator.
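The effect on a verbatim split can be sketched in a few lines of Python. This is only an illustration: the separator class name `example-sep` and the helper names are hypothetical stand-ins (not the actual values or functions in pomtrans), and the engine is simulated by a plain quote replacement rather than a real Apertium call.

```python
# Hypothetical separator; the real string built by pomtrans is not reproduced here.
SEP = "<br class='example-sep'>."


def batch(messages):
    """Join messages into one request, prefixing each with the separator."""
    return "".join(SEP + msg for msg in messages)


def split_verbatim(response):
    """Split a response on the exact separator string, dropping the empty head."""
    parts = response.split(SEP)
    return parts[1:] if parts and parts[0] == "" else parts


messages = ["first example", "second example"]
request = batch(messages)

# Simulated engine behaviour (an assumption for this sketch): the markup is
# passed through, but attribute quotes are rewritten from ' to ".
response = request.replace("'", '"')

print(len(split_verbatim(request)))   # separator intact: one part per message
print(len(split_verbatim(response)))  # separator altered: response is not split
```

With the quotes rewritten, the exact separator no longer occurs in the response, so the split returns the whole response as a single item.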
To demonstrate that it's the apertium engine that is performing this transformation, here's an example of using the engine standalone from the command-line:
$ echo "<br class='...'>.first example<br class='...'>.second example" | apertium -f html-noent en-es
<br class="...">.Primer ejemplo<br class="...">.Segundo ejemplo
Because the separator in the response has changed, the pomtrans script fails to split the response into individual messages. An example of the script failure is included below:
$ python3 scripts/pomtrans.py -p locales/es:locales/en --source en --target es apertium ~/Documents/reciperadar/frontend/i18n/locales/es/categories.po
pomtrans.py: [warning] Apertium reported wrong number of translations, 1 instead of 7.
pomtrans.py: [warning] Translation service failure on '/home/jka/Documents/reciperadar/frontend/i18n/locales/es/categories.po'
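One possible way to make the split tolerant of this quote change is to match the separator with a regular expression instead of a literal string. The following is a minimal sketch under stated assumptions, not necessarily the approach taken in the merge request: the class name `example-sep` and the function `split_tolerant` are hypothetical, and only the quote style and optional self-closing slash are treated as variable.

```python
import re

# Hypothetical class name standing in for whatever pomtrans actually uses.
SEP_CLASS = "example-sep"

# Accept either quote style around the class attribute, and an optional
# self-closing slash, followed by the trailing full stop.
SEP_RX = re.compile(
    r"<br\s+class=['\"]" + re.escape(SEP_CLASS) + r"['\"]\s*/?>\."
)


def split_tolerant(response):
    """Split an engine response on the separator, whichever quotes it uses."""
    parts = SEP_RX.split(response)
    # A response that starts with a separator yields an empty first piece.
    if parts and parts[0] == "":
        parts = parts[1:]
    return parts


single = "<br class='example-sep'>.Primer ejemplo<br class='example-sep'>.Segundo ejemplo"
double = single.replace("'", '"')

print(split_tolerant(single))
print(split_tolerant(double))
```

Both inputs split into the same two messages, so the count check that produced the warning above would pass regardless of which quote style the engine emits.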
If it'd be helpful I could try to offer a potential fix as a merge request here (opened as !5 (merged)).
Edit: correction for Python version, and update text re: potential fix