Improve translations extraction by using a custom extractor

Carl Schwan requested to merge work/translations into master

po4a is limited when it comes to extracting strings from Markdown files in Hugo websites. For example, it doesn't support front matter or Hugo shortcodes.

The new extractor tries to be clever in some cases and strips leading markers such as # and *. It also extracts the values from the front matter (the page's YAML metadata) and handles list elements better.
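To make the behaviour described above concrete, here is a minimal sketch of that kind of extraction. It is not the extractor from this merge request: the function names, the `TRANSLATABLE_KEYS` tuple, and the line-by-line handling are assumptions made for illustration, and it only understands flat `key: value` front matter (no nested YAML, no shortcodes).

```python
import re

# Hypothetical set of front matter keys whose values should be translated.
TRANSLATABLE_KEYS = ("title", "summary")


def split_front_matter(text):
    """Split a page into (front matter lines, body lines).

    Assumes YAML front matter delimited by '---' lines at the top of the file.
    """
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return lines[1:i], lines[i + 1:]
    return [], lines


def extract_strings(text):
    """Return translatable strings from flat front matter and from the body."""
    front, body = split_front_matter(text)
    strings = []
    for line in front:
        key, sep, value = line.partition(":")
        if sep and key.strip() in TRANSLATABLE_KEYS:
            # Keep only the value, without the key or surrounding quotes.
            value = value.strip().strip("\"'")
            if value:
                strings.append(value)
    for line in body:
        # Drop a leading heading ('#') or list ('*', '-', '+') marker.
        stripped = re.sub(r"^\s*(?:#+|[*+-])\s+", "", line).strip()
        if stripped:
            strings.append(stripped)
    return strings


sample = """---
title: "Release notes"
summary: New features
date: 2020-04-23
---
# Highlights

* Faster startup
"""
print(extract_strings(sample))
```

With this approach the translator sees only the text itself, so a change to the heading level or to the front matter key does not invalidate an existing translation.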

Most of the strings are unchanged. To reduce the work for the translators, I also wrote a script that updates the old po files. It probably won't fix every issue, but it should fix most of them. The script is:

import os
import re

import polib

# Directory containing one subdirectory per language.
directory = 'svn_clone'
# Location of the announcements catalog inside each language directory.
po_path = 'messages/websites-kde-org-announcements-releases/kde-org-announcements-releases.po'
# Captures the text of a caption="..." attribute.
caption_re = re.compile('caption="([^"]*)"')

for translation in os.listdir(directory):
    po_file = os.path.join(directory, translation, po_path)
    if not os.path.exists(po_file):
        continue
    po = polib.pofile(po_file)
    for entry in po:
        # Collapse runs of whitespace (including newlines) into single spaces.
        entry.msgid = ' '.join(entry.msgid.split())
        entry.msgstr = ' '.join(entry.msgstr.split())

        # The front matter key is no longer part of the extracted string.
        for prefix in ('title: ', 'summary: '):
            if entry.msgid.startswith(prefix):
                entry.msgid = entry.msgid[len(prefix):]
                # Only strip the translation if it kept the same prefix.
                if entry.msgstr.startswith(prefix):
                    entry.msgstr = entry.msgstr[len(prefix):]

        result = caption_re.findall(entry.msgid)
        if result:
            entry.msgid = result[0]
            # An untranslated entry has no caption in msgstr; leave it as is.
            result = caption_re.findall(entry.msgstr)
            if result:
                entry.msgstr = result[0]
    po.save(po_file)
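
To check what the script does to a single entry without touching any po file, the per-entry logic can be pulled out into a pure function. This is only a sketch: `normalize_entry` and the sample strings are made up for illustration, and unlike a plain slice it skips a translation that lacks the prefix or caption, which is an added assumption about the desired behaviour.

```python
import re

CAPTION_RE = re.compile(r'caption="([^"]*)"')


def normalize_entry(msgid, msgstr):
    """Apply the script's per-entry rewrites to one (msgid, msgstr) pair."""
    # Collapse runs of whitespace into single spaces.
    msgid = " ".join(msgid.split())
    msgstr = " ".join(msgstr.split())

    # Drop the front matter key from the source and, if present, the translation.
    for prefix in ("title: ", "summary: "):
        if msgid.startswith(prefix):
            msgid = msgid[len(prefix):]
            if msgstr.startswith(prefix):
                msgstr = msgstr[len(prefix):]

    # Reduce an entry containing a caption attribute to the caption text.
    captions = CAPTION_RE.findall(msgid)
    if captions:
        msgid = captions[0]
        translated = CAPTION_RE.findall(msgstr)
        # Assumption: keep msgstr unchanged when it contains no caption.
        if translated:
            msgstr = translated[0]
    return msgid, msgstr


# Hypothetical sample entries.
print(normalize_entry("title:  Release\n notes", "title: Versionshinweise"))
print(normalize_entry('{{< figure src="a.png" caption="Main window" >}}',
                      '{{< figure src="a.png" caption="Hauptfenster" >}}'))
```

Running a few known msgid and msgstr pairs through such a function before rewriting the catalogs makes it easier to spot entries the migration would mangle.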


Edited by Carl Schwan
