Internationalizing software

Ph Vouters · ‎01-23-2011

With companies being bought by Chinese investors or doing business in the Chinese market, it may become necessary to encode software messages or filenames in UTF-8, which appears to be the only codeset that Chinese computers handle correctly.

Two of my articles may be of interest:
1/ http://vouters.dyndns.org/tima/OpenVMS-Fortran-Internationalizing_messages.html
(for other languages see the REFERENCE section)
2/ http://vouters.dyndns.org/tima/OpenVMS-Linux-iconv-Converting_filenames_from_one_codeset_to_another.html

I hope this helps some of you.

Philippe

WW304289 · ‎01-24-2011

Hi Philippe,

The official character set in China is GB18030. Support for GB18030 was added to VMS in 2001, including GB18030 <-> Unicode converters for UCS-2, UCS-4 and UTF-8.
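For readers who want to try such a converter from a program, here is a minimal sketch in C using the POSIX iconv interface. The function name gb18030_to_utf8 is hypothetical, and the codeset names "GB18030" and "UTF-8" are assumptions: the names iconv_open() accepts differ between platforms, so check what your system provides.

```c
#include <iconv.h>
#include <stddef.h>

/* Convert a GB18030 byte string to UTF-8 (hypothetical helper).
 * Returns the number of bytes written to out, or -1 on failure
 * (converter not available, invalid or truncated input, or output
 * buffer too small). The codeset names are assumptions and may be
 * spelled differently on a given platform. */
long gb18030_to_utf8(const char *in, size_t in_len, char *out, size_t out_cap)
{
    iconv_t cd = iconv_open("UTF-8", "GB18030");   /* arguments are (to, from) */
    if (cd == (iconv_t)-1)
        return -1;

    char *src = (char *)in;      /* iconv() takes char **, it does not modify the input bytes */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_cap;

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;
    return (long)(out_cap - dst_left);
}
```

Called with the two GB18030 bytes D6 D0 (the character 中), this should produce the three UTF-8 bytes E4 B8 AD, provided the converter is installed.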

Thanks,
-Boris

Ph Vouters · ‎01-24-2011

Hi Boris,

To produce and test the DCL procedure that my Fortran article contains, I did the following on my Fedora 14 Linux computer in an xterm terminal:
[philippe@victor ~]$ LANG=zh_HK.utf8 vi utf8.msgx
[philippe@victor ~]$ uname -s
Linux
I then transferred utf8.msgx to the VMS system with sftp and logged into the VMS computer over SSH, still from an xterm.

Then, under the VMS C locale, I assembled the command procedure with TPU (which did not display the Chinese characters). I then chose UTF8-20 because, when I had earlier cut and pasted the Chinese characters from Thunderbird into an aterm terminal, I noticed \u sequences, each followed by 4 hex digits.
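A \u sequence followed by 4 hex digits is the usual way of writing a Unicode code point in the range U+0000 to U+FFFF. As an illustration of how such a code point maps to UTF-8 bytes, here is a minimal sketch in C. The function name utf8_encode_bmp is hypothetical and is not taken from the articles.

```c
#include <stddef.h>

/* Encode one code point in the range U+0000..U+FFFF (what a \uXXXX
 * escape can express) as UTF-8. Writes up to 3 bytes to out and
 * returns the byte count, or 0 if cp is out of range or a surrogate
 * (U+D800..U+DFFF), which is not a character on its own. */
size_t utf8_encode_bmp(unsigned int cp, unsigned char out[3])
{
    if (cp < 0x80) {                      /* 7-bit ASCII: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    /* three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

For example, \u4e2d (中) becomes the three bytes E4 B8 AD, and \u00e5 (å) becomes the two bytes C3 A5.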

I chose UTF8-20 as the locale because it appeared to be the only one able to display extended 8-bit ASCII (one of the Swedish characters), the Chinese characters, and the 7-bit ASCII characters (the string "Best regards").
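The reason a UTF-8 locale can show all three kinds of text at once is that UTF-8 encodes them with sequences of different lengths inside one byte stream: 1 byte for 7-bit ASCII, 2 bytes for Latin letters such as å, and 3 bytes for most Chinese characters. The following C sketch counts characters in such a mixed string. The function names are hypothetical, the choice of å as the Swedish character is an assumption, and the validation is deliberately light (it does not reject every malformed sequence).

```c
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence introduced by lead byte b,
 * or 0 if b cannot start a sequence (continuation or invalid byte). */
size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80)               return 1;   /* 7-bit ASCII */
    if (b >= 0xC2 && b <= 0xDF) return 2;   /* e.g. Latin letters with diacritics */
    if (b >= 0xE0 && b <= 0xEF) return 3;   /* e.g. most Chinese characters */
    if (b >= 0xF0 && b <= 0xF4) return 4;   /* characters beyond U+FFFF */
    return 0;
}

/* Count the characters in a NUL-terminated UTF-8 string.
 * Returns (size_t)-1 if a lead byte is invalid or a continuation
 * byte is missing. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    while (*s) {
        size_t len = utf8_seq_len((unsigned char)*s);
        if (len == 0)
            return (size_t)-1;
        for (size_t i = 1; i < len; i++)    /* stops at NUL before reading past the end */
            if (((unsigned char)s[i] & 0xC0) != 0x80)
                return (size_t)-1;
        s += len;
        n++;
    }
    return n;
}
```

With this sketch, the 19-byte string "Best regards \xC3\xA5 \xE4\xB8\xAD" counts as 16 characters.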

The VMS computer I tested on has the following locales installed (listed below). Perhaps I should have used the [SYS$I18N.LOCALES.SYSTEM]ZH_HK_UTF-8 locale to match my Linux choice? I did not test this, but it might also have given an excellent result for all three codesets (US-ASCII, extended ASCII, and the Chinese codeset).

$ locale show public
C (Built-in)
POSIX (Built-in)
[SYS$I18N.LOCALES.SYSTEM]CS_CZ_ISO8859-2
[SYS$I18N.LOCALES.SYSTEM]DA_DK_ISO8859-1-EURO
[SYS$I18N.LOCALES.SYSTEM]DA_DK_ISO8859-1
[SYS$I18N.LOCALES.SYSTEM]DE_CH_ISO8859-1-EURO
[SYS$I18N.LOCALES.SYSTEM]DE_CH_ISO8859-1
[SYS$I18N.LOCALES.SYSTEM]DE_DE_ISO8859-1-EURO
[SYS$I18N.LOCALES.SYSTEM]DE_DE_ISO8859-1
[SYS$I18N.LOCALES.SYSTEM]EL_GR_ISO8859-7
[SYS$I18N.LOCALES.SYSTEM]EN_GB_ISO8859-1-EURO
[SYS$I18N.LOCALES.SYSTEM]EN_GB_ISO8859-1
[SYS$I18N.LOCALES.SYSTEM]EN_US_ISO8859-1-EURO
[SYS$I18N.LOCALES.SYSTEM]EN_US_ISO8859-1
[SYS$I18N.LOCALES.SYSTEM]ES_ES_ISO8859-1-EURO
[SYS$I18N.LOCALES.SYSTEM]ES_ES_ISO8859-1
[SYS$I18N.LOCALES.SYSTEM]FI_FI_ISO8859-1-EURO
[SYS$I18N.LOCALES.SYSTEM]FI_FI_ISO8859-1
[SYS$I18N.LOCALES.SYSTEM]FR_BE_ISO8859-1-EURO
[SYS$I18N.LOCALES.SYSTEM]FR_BE_ISO8859-1
[SYS$I18N.LOCALES.SYSTEM]FR_CA_ISO8859-1-EURO
[SYS$I18N.LOCALES.SYSTEM]FR_CA_ISO8859-1
[SYS$I18N.LOCALES.SYSTEM]FR_CH_ISO8859-1-EURO
[SYS$I18N.LOCALES.SYSTEM]FR_CH_ISO8859-1
[SYS$I18N.LOCALES.SYSTEM]FR_FR_ISO8859-1-EURO
[SYS$I18N.LOCALES.SYSTEM]FR_FR_ISO8859-1
[SYS$I18N.LOCALES.SYSTEM]HU_HU_ISO8859-2
[SYS$I18N.LOCALES.SYSTEM]IS_IS_ISO8859-1-EURO
[SYS$I18N.LOCALES.SYSTEM]IS_IS_ISO8859-1
[SYS$I18N.LOCALES.SYSTEM]IT_IT_ISO8859-1-EURO
[SYS$I18N.LOCALES.SYSTEM]IT_IT_ISO8859-1
[SYS$I18N.LOCALES.SYSTEM]IW_IL_ISO8859-8
[SYS$I18N.LOCALES.SYSTEM]IW_IL_UTF-8
[SYS$I18N.LOCALES.SYSTEM]JA_JP_DECKANJI
[SYS$I18N.LOCALES.SYSTEM]JA_JP_DECKANJI2000
[SYS$I18N.LOCALES.SYSTEM]JA_JP_EUCJP
[SYS$I18N.LOCALES.SYSTEM]JA_JP_SDECKANJI
[SYS$I18N.LOCALES.SYSTEM]JA_JP_SJIS
[SYS$I18N.LOCALES.SYSTEM]JA_JP_UTF-8
[SYS$I18N.LOCALES.SYSTEM]KO_KR_DECKOREAN
[SYS$I18N.LOCALES.SYSTEM]KO_KR_UTF-8
[SYS$I18N.LOCALES.SYSTEM]NL_BE_ISO8859-1-EURO
[SYS$I18N.LOCALES.SYSTEM]NL_BE_ISO8859-1
[SYS$I18N.LOCALES.SYSTEM]NL_NL_ISO8859-1-EURO
[SYS$I18N.LOCALES.SYSTEM]NL_NL_ISO8859-1
[SYS$I18N.LOCALES.SYSTEM]NO_NO_ISO8859-1-EURO
[SYS$I18N.LOCALES.SYSTEM]NO_NO_ISO8859-1
[SYS$I18N.LOCALES.SYSTEM]PL_PL_ISO8859-2
[SYS$I18N.LOCALES.SYSTEM]PT_PT_ISO8859-1-EURO
[SYS$I18N.LOCALES.SYSTEM]PT_PT_ISO8859-1
[SYS$I18N.LOCALES.SYSTEM]RU_RU_ISO8859-5
[SYS$I18N.LOCALES.SYSTEM]SK_SK_ISO8859-2
[SYS$I18N.LOCALES.SYSTEM]SV_SE_ISO8859-1-EURO
[SYS$I18N.LOCALES.SYSTEM]SV_SE_ISO8859-1
[SYS$I18N.LOCALES.SYSTEM]TH_TH_TACTIS
[SYS$I18N.LOCALES.SYSTEM]TR_TR_ISO8859-9
[SYS$I18N.LOCALES.SYSTEM]UTF8-20
[SYS$I18N.LOCALES.SYSTEM]UTF8-30
[SYS$I18N.LOCALES.SYSTEM]ZH_CN_DECHANZI
[SYS$I18N.LOCALES.SYSTEM]ZH_CN_DECHANZI_PINYIN
[SYS$I18N.LOCALES.SYSTEM]ZH_CN_DECHANZI_RADICAL
[SYS$I18N.LOCALES.SYSTEM]ZH_CN_DECHANZI_STROKE
[SYS$I18N.LOCALES.SYSTEM]ZH_CN_GB18030
[SYS$I18N.LOCALES.SYSTEM]ZH_CN_UTF-8
[SYS$I18N.LOCALES.SYSTEM]ZH_HK_BIG5
[SYS$I18N.LOCALES.SYSTEM]ZH_HK_DECHANYU
[SYS$I18N.LOCALES.SYSTEM]ZH_HK_DECHANZI
[SYS$I18N.LOCALES.SYSTEM]ZH_HK_EUCTW
[SYS$I18N.LOCALES.SYSTEM]ZH_HK_UTF-8
[SYS$I18N.LOCALES.SYSTEM]ZH_TW_BIG5
[SYS$I18N.LOCALES.SYSTEM]ZH_TW_BIG5_CHUYIN
[SYS$I18N.LOCALES.SYSTEM]ZH_TW_BIG5_RADICAL
[SYS$I18N.LOCALES.SYSTEM]ZH_TW_BIG5_STROKE
[SYS$I18N.LOCALES.SYSTEM]ZH_TW_DECHANYU
[SYS$I18N.LOCALES.SYSTEM]ZH_TW_DECHANYU_CHUYIN
[SYS$I18N.LOCALES.SYSTEM]ZH_TW_DECHANYU_RADICAL
[SYS$I18N.LOCALES.SYSTEM]ZH_TW_DECHANYU_STROKE
[SYS$I18N.LOCALES.SYSTEM]ZH_TW_EUCTW
[SYS$I18N.LOCALES.SYSTEM]ZH_TW_EUCTW_CHUYIN
[SYS$I18N.LOCALES.SYSTEM]ZH_TW_EUCTW_RADICAL
[SYS$I18N.LOCALES.SYSTEM]ZH_TW_EUCTW_STROKE
[SYS$I18N.LOCALES.SYSTEM]ZH_TW_UTF-8

Yours truly,
Philippe

WW304289 · ‎01-24-2011

Hi Philippe,

Yes, the ZH_HK_UTF-8 locale on VMS is supposed to match the zh_HK.utf8 locale on Linux. Note, however, that the iconv facility does not depend on the program's current locale (I did not look at your code and don't know whether you assume otherwise).

There are .cmap (character map) files in the SYS$I18N_LOCALE directory that show the character encodings. You can look at a .cmap file and, in C or C++, construct a multibyte character using the hexadecimal-escape-sequence form of a character constant, taking the values directly from the .cmap file, e.g.

const char *s = "\x9a\xa1";
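One pitfall with this technique is worth a short sketch in C: a hexadecimal escape consumes every hex digit that follows it, so ordinary text placed right after the escape can be swallowed. The byte values below are the well-known encodings of 中 (U+4E2D) and are used only as an illustration; they are not taken from a VMS .cmap file, and the variable names are hypothetical.

```c
/* The character 中 (U+4E2D) as raw bytes in two codesets. */
const char zhong_gb18030[] = "\xD6\xD0";        /* 2 bytes in GB18030 */
const char zhong_utf8[]    = "\xE4\xB8\xAD";    /* 3 bytes in UTF-8 */

/* Writing "\xD6\xD0abc" in one literal would be wrong: the second
 * escape would extend over the hex digits a, b and c. Ending the
 * literal and starting a new one terminates the escape; the compiler
 * then concatenates the adjacent literals. */
const char zhong_then_abc[] = "\xD6\xD0" "abc";
```

Here zhong_then_abc holds the two bytes D6 D0 followed by the letters a, b, c and the terminating NUL.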

If the GB 18030 locale is not installed on your system, it is probably because you did not install the full VMS I18N kit. IIRC, installation of this locale is optional. Still, you should have codeset converters for the GB 18030 character set in the SYS$I18N_ICONV directory; just search for *GB18030*.ICONV. Again, IIRC, the codeset converters are installed unconditionally.
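Besides searching the directory, a program can also ask iconv_open() whether a given converter is usable. This is only a sketch: the function name have_converter is hypothetical, and the codeset names you pass are assumptions that must match what your platform's iconv accepts.

```c
#include <iconv.h>

/* Return 1 if iconv_open() can provide a converter from codeset
 * `from` to codeset `to`, and 0 otherwise (hypothetical helper). */
int have_converter(const char *to, const char *from)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return 0;           /* no such converter, or a resource error */
    iconv_close(cd);
    return 1;
}
```

For instance, have_converter("UTF-8", "GB18030") tells you whether a GB18030 to UTF-8 conversion can be opened under those names, without converting any data.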

Hope you find this useful.

Thanks,
-Boris

Ph Vouters · ‎01-24-2011

I have posted the solution (see my URL link). Thank you, Boris, for actively contributing to public knowledge.
