Operating System - OpenVMS

Re: unicode character set

 
SOLVED
Brian Duddy
Occasional Advisor

unicode character set

I want to process Unicode-format files, specifically Unicode 16, on an Alpha running VMS V7.2-1, but the current character set does not support this.
Is there a patch that would upgrade the character set on VMS V7.2-1, or is there another workaround?

Thanks

Brian
Bojan Nemec
Honored Contributor

Re: unicode character set

Brian,

What do you mean by "process unicode format files"?

If you mean file names in Unicode, you need an ODS-5 disk. The ODS-5 disk structure is supported from VMS 7.2 onward. Please see http://h71000.www7.hp.com/doc/72final/6536/6536pro.html

If you mean file contents, then C and C++ (and maybe other languages) have some support for so-called wide characters.

Can you post a more specific question?

Bojan
Antoniov.
Honored Contributor

Re: unicode character set

Brian,
Unicode support has been available since V7.2, but there is no utility or program to make this work.
I guess you will have to write a small C/C++ program that reads the Unicode file, passes each string to a Unicode conversion function, and then writes it out as ISO Latin-1.
As for file names, as Bojan posted, you need the new ODS-5 file system, but I guess you will still not be able to see any Unicode symbols.
The main trouble is that VMS is character-cell based, so you need DECwindows for full Unicode support.
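
To make the read, convert, write idea concrete, here is a minimal sketch. It is written in Python rather than C purely for brevity. It assumes the input is UTF-16 with a byte-order mark and that characters with no ISO Latin-1 equivalent may be replaced by "?". The function name utf16_to_latin1 is made up for this illustration and is not part of any VMS library.

```python
def utf16_to_latin1(data: bytes) -> bytes:
    """Decode UTF-16 bytes (BOM-aware) and re-encode them as ISO Latin-1.

    Characters that have no Latin-1 equivalent are replaced by b'?'.
    """
    text = data.decode("utf-16")  # the BOM selects the byte order
    return text.encode("latin-1", errors="replace")


if __name__ == "__main__":
    sample = "Zo\u00eb".encode("utf-16")  # "Zoë" stored as UTF-16 with a BOM
    print(utf16_to_latin1(sample))        # the ë fits into one Latin-1 byte
    # A Welsh w-circumflex (U+0175) does not exist in Latin-1, so it is lost:
    print(utf16_to_latin1("\u0175".encode("utf-16")))
```

Note that the second call loses information, which is the limitation of any conversion down to an 8-bit character set.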

Antonio Vigliotti
Antonio Maria Vigliotti
Craig A Berry
Honored Contributor

Re: unicode character set

As the other posters have said, you really need to say more about what you mean by "process". I've said pretty much everything I know about this subject here:

http://groups.google.com/groups?selm=6881069ab32a6ead6489d52afc32764a%40news.teranews.com&output=gplain

I don't know if character conversion per se is what you're interested in, but you can find out what conversions you already have by looking here:

$ directory sys$i18n_iconv:
Brian Duddy
Occasional Advisor

Re: unicode character set

Sorry, I didn't want to get into too much detail. I will be receiving text files with names and addresses via FTP. I have a test file now, sent by email, which I can look at in TextPad on the PC; as soon as I FTP the file to VMS, one of the characters becomes a backwards ?.
These Unicode/wide characters will be in the file body, not in the file name.
What I need to do is very simple: read this file on the Alpha using VAX BASIC and store it on file. At some point I will have to retrieve this information, again with VAX BASIC, and write it out to a text/CSV file to print letters or send back to the client. Obviously, with the characters disappearing as soon as I FTP to the Alpha, this is a problem.
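
The receive, store, re-export workflow described here can be sketched as below. This is an illustration only, in Python rather than VAX BASIC, and it assumes the incoming bytes are UTF-8 and reach the program unaltered. The function name roundtrip_csv and the sample row are hypothetical.

```python
import csv
import io


def roundtrip_csv(utf8_bytes: bytes) -> bytes:
    """Parse UTF-8 CSV bytes into rows, then write the rows back as UTF-8."""
    text = utf8_bytes.decode("utf-8")
    rows = list(csv.reader(io.StringIO(text, newline="")))
    out = io.StringIO(newline="")
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue().encode("utf-8")


if __name__ == "__main__":
    # Hypothetical record containing w-circumflex and i-diaeresis.
    incoming = "name,town\n\u0175\u00ef,Cardiff\n".encode("utf-8")
    # If the bytes are decoded and encoded as UTF-8 at both ends,
    # the export is byte-for-byte identical to the input.
    print(roundtrip_csv(incoming) == incoming)
```

The point of the sketch is that nothing is lost as long as no step in the chain reinterprets the bytes as an 8-bit character set.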

thanks

brian
Antoniov.
Honored Contributor

Re: unicode character set

Brian,
If your text files were written by Notepad, they are not in Unicode format!
Unicode is based on a 16-bit character set and is used mainly by Java applications.
Text files from a PC use an 8-bit character set that is divided into two pages: the first page (codes 0 to 127) is standard and is called ASCII; the second page (codes 128 to 255) is the national page. The pages commonly used on a PC are PC437 and PC850, while on VMS the common page is ISO Latin-1.
Before converting, you need to know which country is set on the PC.
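
A small sketch of why the code page matters: the lower half (0 to 127) is the same everywhere, but an accented letter gets a different byte value depending on the page. The codec names cp437, cp850 and latin-1 are the Python standard-library names for PC437, PC850 and ISO Latin-1. The helper encode_in_code_pages is invented for this illustration.

```python
def encode_in_code_pages(ch, codecs=("cp437", "cp850", "latin-1")):
    """Return the byte(s) that represent `ch` in each 8-bit code page."""
    return {name: ch.encode(name) for name in codecs}


if __name__ == "__main__":
    # Plain ASCII letters encode identically in every page.
    print(encode_in_code_pages("A"))
    # An accented letter does not: 0x82 on the PC pages, 0xE9 in ISO Latin-1.
    print(encode_in_code_pages("\u00e9"))
```

So a file written under PC437 or PC850 and read as ISO Latin-1 keeps its plain letters but shows the wrong symbols for anything above 127.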

HTH
Antonio Vigliotti
Antonio Maria Vigliotti
Brian Duddy
Occasional Advisor

Re: unicode character set

Antonio, thanks for your help.

In the small test file I have, VMS cannot handle the following characters: wï
where the w is a Welsh w with a circumflex (not shown here) and the second is an i with two dots above. Due to new European legislation, our client, and therefore I, must be able to process any European character, and possibly Japanese or Chinese characters.

Let me correct a previous error on my part: the files are in UTF-8 format. They are produced by a British client's system that is apparently UTF-8 compliant, and even when I open the file in TextPad I still do not see these two characters as described. They appear as
à µà ¯ and when I then transfer the file to VMS it doesn't like the last character, but I have only just noticed that these should have been wï as described above.

Hope this makes sense.
Brian Duddy
Occasional Advisor

Re: unicode character set

Thanks for your help so far.

Due to new European legislation, our client, and therefore I, must be able to process any European character, and possibly Japanese or Chinese characters.

The small test file is supposed to contain the following characters: wï
where the w is a Welsh w with a circumflex (not shown here) and the second is an i with two dots above.

Let me correct a previous error on my part: the files are in UTF-8 format. They are produced by my British client's system, which is apparently UTF-8 compliant. Also, even when I open the email attachment on my PC, I still do not see these two characters as described, although I have been told that is what they are. They appear as
à µà ¯ and when I then transfer the file to VMS it doesn't like the last character, but I have only just realised that these should have been wï as described above. I have tried to open the files in Word, TextPad and Internet Explorer but cannot see these characters as described.

Therefore my problem has got worse: I cannot look at these characters on my PC, and I know that VMS will not be able to handle them either.
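
One plausible explanation for seeing four odd symbols instead of two letters is that the viewer is reading the UTF-8 bytes as an 8-bit character set. The sketch below assumes ISO Latin-1 as that 8-bit set. It is an illustration in Python and does not claim to reproduce the exact symbols quoted above.

```python
original = "\u0175\u00ef"              # w-circumflex followed by i-diaeresis

utf8_bytes = original.encode("utf-8")  # each letter becomes two bytes
print(utf8_bytes.hex(" "))             # c5 b5 c3 af

# A viewer that assumes one byte per character shows four symbols.
misread = utf8_bytes.decode("latin-1")
print(len(original), len(misread))     # 2 4

# The data itself is intact: reversing the wrong decoding recovers the text.
recovered = misread.encode("latin-1").decode("utf-8")
print(recovered == original)           # True
```

If this is what is happening, the file is fine and only the display is wrong, so a viewer that can be told to read the file as UTF-8 should show the two letters.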

Hope this makes sense.
Antoniov.
Honored Contributor

Re: unicode character set

Brian,
Representing a character set is complex work :-(
The big trouble is the display device. If you use an old VT terminal, you can display only some codes (usually accented, Greek, and Cyrillic letters, and some others), and you cannot display them all together.
So if you need to view all characters (including Japanese and Chinese symbols), you MUST use a graphical workstation running DECwindows.
You met the same problem in this thread when you tried to display the symbol "the w is a welsh w character with circumflex".

Beyond that, you MUST use the Unicode (16-bit) representation instead of the classic 8-bit one.

The alternative is to change the character set of the display device for each specific requirement, but you will go crazy converting from one character set to another :-O
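
The difference between the 8-bit and the Unicode representations can be checked with a short sketch. It is in Python for illustration only, the helper name encoded_sizes is made up, and the three sample characters are chosen to match the cases discussed in this thread.

```python
def encoded_sizes(ch, codecs=("latin-1", "utf-8", "utf-16-be")):
    """Return the encoded length of `ch` per codec, or None if unencodable."""
    sizes = {}
    for codec in codecs:
        try:
            sizes[codec] = len(ch.encode(codec))
        except UnicodeEncodeError:
            sizes[codec] = None  # the character does not exist in this set
    return sizes


if __name__ == "__main__":
    for ch in ("\u00ef", "\u0175", "\u65e5"):  # i-diaeresis, w-circumflex, a CJK character
        print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

Only the first character exists in ISO Latin-1. The other two need a Unicode encoding, either 16-bit or UTF-8.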

Antonio Vigliotti
Antonio Maria Vigliotti