Commit 0d7259f8 authored by Alvin Wong

win: Set activeCodePage to UTF-8

This should enable impex plugins of all file formats to access files
containing Unicode characters in their paths on Windows.

Historically, Windows has used the ANSI code page (ACP) as the encoding
for char strings, which can only represent a limited range of
characters. Windows also provides wide versions of the APIs that take
`wchar_t` strings, whose 2-byte code units are encoded in UTF-16 (or
UCS-2 in some cases). For libraries to support Unicode filenames on
Windows, they have to go out of their way to implement it with the wide
API. They don't do it consistently either: some provide wide variants
of their API, while others interpret char* paths as UTF-8 (which leads
to confusion when the caller assumes the API takes the local 8-bit
encoding).

Setting activeCodePage to UTF-8 in the application manifest changes the
code page of our process to UTF-8. This effectively means that all -A
variants of WinAPI calls now accept UTF-8 strings instead of strings in
the system ACP. By extension, the non-wide C and C++ functions for
accessing files will now also accept UTF-8 file paths.
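
To make the limitation of a legacy code page concrete, here is a minimal
Python sketch. The file name is hypothetical and `cp1252` is only an
assumed stand-in for a Western European system ACP. Strict encoding is
used so that the loss shows up as a failure; the real Windows conversion
typically substitutes a replacement character instead of failing.

```python
# Hypothetical file name; cp1252 is an assumed example of a system ACP.
path = "図面/スケッチ.kra"


def encode_for_narrow_api(text, code_page):
    """Return the bytes a char*-based API would receive, or None if the
    text cannot be represented in the given code page."""
    try:
        return text.encode(code_page)
    except UnicodeEncodeError:
        return None


legacy = encode_for_narrow_api(path, "cp1252")  # not representable
utf8 = encode_for_narrow_api(path, "utf-8")     # always representable

print("cp1252:", legacy)
print("utf-8 :", utf8)
```

With a UTF-8 code page, the narrow encoding can carry any Unicode path,
which is what lets the -A APIs handle such file names.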

With regard to the impex plugins, this changes their behaviour around
file paths:

* If the external library already accepts `wchar_t *` there should be no
  change in behaviour.
* If the external library accepts `char *` and treats them as UTF-8:
  * If we correctly use `QString::toUtf8()`, there should be no change
    in behaviour.
  * If we use `QString::toLocal8Bit()` or `QFile::encodeName()` by
    mistake, having activeCodePage in UTF-8 will render it a non-issue.
* If the external library accepts `char *` and uses C or C++ library
  functions to open them directly:
  * If we correctly use `QString::toLocal8Bit()` or
    `QFile::encodeName()`, they would not have been able to open files
    with names containing Unicode chars outside of the system ACP in the
    past, but will now be able to do so.
  * If we use `QString::toUtf8()` by mistake, having activeCodePage in
    UTF-8 will render it a non-issue.

As illustrated above, the result is a net improvement.
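
The cases listed above can be modelled with a small Python sketch. It is
a simplification under stated assumptions: the file name is hypothetical,
`cp1252` stands in for a legacy system ACP, `toLocal8Bit()` and
`encodeName()` are both modelled as "encode in the current code page",
and the `wchar_t *` case is left out because it does not change.

```python
from itertools import product

NAME = "絵.kra"  # hypothetical file name outside cp1252


def caller_bytes(name, conversion, acp):
    # toUtf8() always yields UTF-8; toLocal8Bit()/encodeName() are
    # modelled here as following the process code page (assumption).
    codec = "utf-8" if conversion == "toUtf8" else acp
    return name.encode(codec, errors="replace")


def library_sees(data, library, acp):
    # "utf8": the library itself interprets char* paths as UTF-8.
    # "crt":  the library passes char* to C/C++ file functions, which
    #         interpret the bytes in the process code page.
    codec = "utf-8" if library == "utf8" else acp
    return data.decode(codec, errors="replace")


def opens_correct_file(conversion, library, acp):
    return library_sees(caller_bytes(NAME, conversion, acp), library, acp) == NAME


def results(acp):
    return {
        (library, conversion): opens_correct_file(conversion, library, acp)
        for library, conversion in product(("utf8", "crt"), ("toUtf8", "toLocal8Bit"))
    }


before = results("cp1252")  # legacy ACP
after = results("utf-8")    # activeCodePage set to UTF-8

for key in before:
    print(key, "before:", before[key], "after:", after[key])
```

In this model only the "UTF-8 library with `toUtf8()`" combination works
under the legacy code page, while every combination works once the code
page is UTF-8.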

Potential side effect: If a Python plugin expects to use the system ACP
to interact with an external process via IPC, the encodings on the two
sides can become mismatched.
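
A minimal sketch of that mismatch, assuming a hypothetical payload and
an external process that still decodes incoming bytes as `cp1252`:

```python
message = "Ünïcödé レイヤー"  # hypothetical IPC payload

# The plugin's process code page is now UTF-8, so it sends UTF-8 bytes.
sent = message.encode("utf-8")

# The external process still assumes the legacy ACP (cp1252 here is an
# assumption) and decodes the same bytes differently.
received = sent.decode("cp1252", errors="replace")

print(received == message)  # the text no longer round-trips
```

Agreeing on an explicit encoding on both ends of the channel, rather
than relying on the code page, avoids the problem.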

Note that this only works starting from Windows 10 Version 1903.
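
For reference, Windows 10 Version 1903 corresponds to build 18362. A
hedged sketch of a version check follows; the helper name is
hypothetical, and the version reported to a process can depend on how
that process is manifested.

```python
import sys

MIN_BUILD = 18362  # Windows 10 Version 1903


def supports_utf8_active_code_page(major, build):
    """Hypothetical helper: True if the OS version is new enough for the
    activeCodePage manifest setting to take effect."""
    return major > 10 or (major == 10 and build >= MIN_BUILD)


if sys.platform == "win32":
    version = sys.getwindowsversion()
    print(supports_utf8_active_code_page(version.major, version.build))
```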

Reference: https://docs.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page

CCMAIL: kimageshop@kde.org
parent 42d78a74
Pipeline #178806 passed with stage
in 37 minutes and 16 seconds