UTF8 everywhere

todoooo · January 23, 2020, 9:33am

I have been working on several merge requests in VTK, https://gitlab.kitware.com/vtk/vtk/merge_requests/6122, https://gitlab.kitware.com/vtk/vtk/merge_requests/6291 and https://gitlab.kitware.com/vtk/vtk/merge_requests/6301, which introduce the constraint that all file and path names (and eventually all string data) passed across the VTK API are UTF-8 encoded.
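For callers whose strings come from a legacy locale, meeting that constraint means transcoding to UTF-8 before calling VTK. The Python sketch below shows the idea; `to_utf8` is a hypothetical helper, not part of VTK, and the cp1252 file name is made up for illustration.

```python
import locale
from typing import Optional


def to_utf8(raw: bytes, source_encoding: Optional[str] = None) -> bytes:
    """Hypothetical helper: re-encode a locale-encoded byte string as UTF-8."""
    # Fall back to the current locale's encoding when none is given.
    enc = source_encoding or locale.getpreferredencoding(False)
    # Strict decoding raises UnicodeDecodeError instead of silently corrupting text.
    return raw.decode(enc).encode("utf-8")


# Bytes as a Western European (cp1252) Windows locale would produce them.
legacy = "Überblick.vtp".encode("cp1252")
fixed = to_utf8(legacy, "cp1252")
print(fixed)
```

Decoding strictly is a deliberate choice here: a name that is not valid in the assumed source encoding fails loudly rather than reaching VTK as a wrong path.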

It has come to my attention that some ParaView users rely on extended character sets with non-UTF-8 locales, so I’m wondering how many people would object to this new requirement?

mwestphal · January 23, 2020, 9:38am

which introduce the constraint that all file and path names (and eventually all string data) passed to/from VTK are utf-8 encoded.

Once again, it is not clear to me where in the code this requirement comes from. Could you clarify that?

so I’m wondering how many people would object to this new requirement?

Anyone without a UTF-8 locale will be impacted by the proposed change to ParaView.
On Linux, run locale in a terminal to see your current locale (locale -a lists all installed locales).
On Windows, this information is in the language settings.
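As a cross-platform alternative to the shell, Python's standard library can report the encoding implied by the current locale. This is a minimal sketch that only inspects the local machine; it says nothing about how VTK itself behaves.

```python
import codecs
import locale


def locale_encoding() -> str:
    # Encoding implied by the current locale, e.g. 'UTF-8' on most Linux
    # systems or an ANSI code page such as 'cp1252' or 'cp936' on Windows.
    # Note: Python's UTF-8 mode (python -X utf8) makes this report UTF-8.
    return locale.getpreferredencoding(False)


def is_utf8_locale() -> bool:
    # Normalize aliases ('UTF-8', 'utf8', ...) to the codec's canonical name.
    return codecs.lookup(locale_encoding()).name == "utf-8"


if __name__ == "__main__":
    print("locale encoding:", locale_encoding())
    print("UTF-8 locale:", is_utf8_locale())
```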

todoooo · January 23, 2020, 10:03am

Anyone using plain ASCII (128-character) file and path names will not be affected, regardless of their locale settings, since UTF-8 encodes those 128 characters byte-for-byte identically to ASCII.
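A quick check of that claim in Python: an ASCII-only file name encodes to the same bytes in UTF-8 and in ASCII-compatible code pages such as cp1252 (Western European) and cp936 (Simplified Chinese GBK). The file name is made up for illustration.

```python
# An ASCII-only path: every character is in the 0x00-0x7F range.
ascii_name = "C:/data/results_01.vtu"

encodings = ["ascii", "utf-8", "cp1252", "cp936", "latin-1"]
encoded = {enc: ascii_name.encode(enc) for enc in encodings}

# Every encoding yields exactly the same bytes...
assert len(set(encoded.values())) == 1

# ...so bytes produced under a GBK locale still decode correctly as UTF-8.
assert encoded["cp936"].decode("utf-8") == ascii_name
print("identical bytes:", encoded["utf-8"])
```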

mwestphal · January 23, 2020, 10:05am

Why do you say that all file and path names have to be UTF-8?

todoooo · January 23, 2020, 10:14am

The requirement comes from using vtksys to open files and streams, which was introduced by this change in MR 6122.
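To see why non-UTF-8 locales are affected, here is a Python model of the decoding step. It assumes, as this thread suggests, that once vtksys opens the file, a narrow path string is interpreted as UTF-8 (for example when converted to a wide string on Windows) rather than in the locale's code page. It does not call vtksys; `decode_path_as_utf8` is a hypothetical stand-in, and cp936 plays the role of a Simplified Chinese Windows locale.

```python
def decode_path_as_utf8(raw: bytes) -> str:
    """Hypothetical stand-in for a UTF-8-only narrow-to-wide path conversion."""
    # errors="replace" models a lossy conversion instead of raising.
    return raw.decode("utf-8", errors="replace")


name = "数据.vtk"  # a file name containing Chinese characters

utf8_bytes = name.encode("utf-8")  # what the new API contract expects
gbk_bytes = name.encode("cp936")   # what a GBK (non-UTF-8) locale produces

print(decode_path_as_utf8(utf8_bytes))  # 数据.vtk, the intended file
print(decode_path_as_utf8(gbk_bytes))   # garbled: a different path
```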

mwestphal · January 23, 2020, 10:23am

Much clearer now. Thanks.

There is indeed no way around it. However, it means we drop support for non-UTF-8 locales on Windows, so we must be clear about that beforehand.

@utkarsh.ayachit @cory.quammen : thoughts ?

cory.quammen · January 23, 2020, 6:56pm

Afraid I don’t know enough about character encodings to contribute. Are the limitations on non-UTF8 locales likely to affect many Windows users?

todoooo · January 23, 2020, 8:27pm

The purpose of the changes is actually to allow text in any language to be treated consistently and reliably by VTK regardless of the user locale.
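The locale dependence that UTF-8 removes is easy to demonstrate: outside the ASCII range, a byte has no fixed meaning in legacy code pages. This Python sketch is illustrative only; cp1252 and cp1251 are examples, not code pages VTK singles out.

```python
# The same byte means different characters under different legacy code pages,
# so a narrow string is only meaningful if you also know the sender's locale.
raw = b"\xe9"
print(raw.decode("cp1252"))  # é (Western European)
print(raw.decode("cp1251"))  # й (Cyrillic)

# UTF-8 bytes decode to the same text on every machine, whatever the locale.
text = "é й 数"
assert text.encode("utf-8").decode("utf-8") == text
```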

mwestphal · January 24, 2020, 4:14am

Indeed, all these changes will greatly improve support for UTF-8 encoding.

However, all Chinese users may be impacted by this, see here:

todoooo · January 24, 2020, 4:20am

I don’t see the problem. Every character from every language can be encoded in UTF-8.
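For a concrete picture of that claim, this Python snippet encodes sample characters from different ranges of Unicode: UTF-8 covers every code point, using 1 to 4 bytes, and each one round-trips. The sample characters are arbitrary.

```python
# UTF-8 encodes every Unicode code point, using 1 to 4 bytes.
samples = {
    "A": 1,   # ASCII
    "é": 2,   # Latin-1 Supplement
    "数": 3,  # CJK Unified Ideograph (Basic Multilingual Plane)
    "😀": 4,  # outside the Basic Multilingual Plane
}

for ch, expected in samples.items():
    encoded = ch.encode("utf-8")
    assert len(encoded) == expected
    assert encoded.decode("utf-8") == ch
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')}")
```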

mwestphal · January 24, 2020, 4:22am

Great! I missed this part. GB2312, which was historically used before GB 18030, did not support it.
There must still be users in China using GB2312, but it may be fine to tell them to upgrade.
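The difference can be checked with Python's built-in codecs: UTF-8 and GB 18030 both cover all of Unicode, while GB2312 is limited to a fixed character set and fails on characters outside it. The sample string is arbitrary.

```python
text = "数据 한국어 😀"  # Chinese, Korean and an emoji

# UTF-8 and GB 18030 can both represent all of Unicode, so they round-trip.
for enc in ("utf-8", "gb18030"):
    assert text.encode(enc).decode(enc) == text

# GB2312 covers only a fixed set of simplified Chinese characters and symbols.
try:
    text.encode("gb2312")
except UnicodeEncodeError as err:
    print("GB2312 cannot encode:", repr(err.object[err.start:err.end]))
```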