Operating System - OpenVMS

Character Conversion

DougMiller
Occasional Visitor

This problem appeared on an Alpha system after an upgrade from VMS 7.3-2 to 8.3 and a conversion from ODS-2 to ODS-5. Previously I'd always used ftp to transfer postcode data to our Alpha system, and accented European letters such as a, o and u with umlauts were fine. Now, however, each one seems to be replaced by a two-character sequence: an A with a tilde above it (ASCII 195), followed by a second character whose value depends on the letter. For a umlaut the second character is ASCII 164, for o umlaut it's ASCII 182, and for u umlaut it's ASCII 188. If I copy the file back to the Windows system it originated on, it looks fine again. How can I get the Alpha to see the ä, ö and ü?
5 REPLIES
Steven Schweda
Honored Contributor

Re: Character Conversion

TCPIP SHOW VERSION

I don't know what "postcode data" are.

Are you talking about data in the file, or
file names?

> [...] I'd always used ftp [...]

There are many ways to use FTP. How,
exactly, did you use it? (Who's the client?
Who's the server? What's the FTP software
on the Windows system? Used how, exactly?
...)

> Now however I seem to get them substituted
> [...]

How does this seem to happen? How are you
looking at the file contents (or names?)?

> If I copy this file back [...]

Again, how, exactly?

Are these 16-bit Unicode characters? Which
program on the VMS system knows what those are?

As usual, actual commands with their actual
output can be more informative than vague
descriptions.
Hoff
Honored Contributor

Re: Character Conversion

You're up in the range of the ISO Latin-1 or the DEC MCS or whatever (other) encoding you're using, if you're up past position 127. AFAIK, ASCII doesn't define characters above position %x7F; only up through DEL is in the standard. FWIW.

Character mapping can get interesting during a file transfer, as it is distinctly possible that the Microsoft Windows encoding is not getting transferred as expected. There are various ways to translate (and mistranslate) characters.

DUMP the OpenVMS file and see what's encoded in the file on OpenVMS; OpenVMS itself doesn't really know from Unicode or such, so this is usually determined by the application and (in the context of the transfer) the ftp or sftp daemon. On OpenVMS, you'll generally find DEC MCS in the files. (MCS is almost the same as ISO Latin-1, FWIW.)

If you have xxd or such tools available on your Windows box, dump the file over there, too. (If not, go locate a Windows hexadecimal file dump tool. The vim editor port will typically include a version of this tool.)
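If neither xxd nor another dump tool is at hand on the Windows side, a few lines of Python can stand in for one. This is only a rough sketch: the function name `hex_dump` and the sample address line are made up for illustration, and the output layout merely imitates xxd.

```python
def hex_dump(data, width=16):
    """Return an xxd-like dump: offset, hex byte values, printable ASCII."""
    pad = width * 3 - 1  # column width needed for a full row of hex bytes
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable 7-bit characters as-is, everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)


# Hypothetical sample: one line containing a u umlaut, encoded as UTF-8.
sample = "Z\u00fcrich 8001\n".encode("utf-8")
print(hex_dump(sample))

# For a real file, read it in binary mode so nothing gets translated:
#   with open("postcodes.txt", "rb") as f:   # hypothetical file name
#       print(hex_dump(f.read(64)))
```

A single byte e4, f6 or fc for the umlauts suggests ISO Latin-1 (or DEC MCS); a c3 byte followed by a4, b6 or bc suggests UTF-8.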

If this is a case of the display and not characters embedded in the DUMP (as your character references imply), then check the character set setting on your terminal or terminal emulator. Ensure you have eight-bit, and ensure the terminal or terminal emulator has the correct character set for the upper characters. (How to do that depends on what terminal or terminal emulator is in use here.)

If DUMP shows odd characters in the file, look to use Filezilla or such to perform the transfer as a test. (The Microsoft Windows command prompt ftp client can have some oddities, but does usually work.) I'm guessing the characters in the file are correct, but (short of confirmation using DUMP or such) character translations during the ftp or sftp could be a possibility.

If the above does not answer it, start with posting the DUMP of a small! file, and post up details on the exact symptoms, the terminal or terminal type, and the particular IP stack and version.
DougMiller
Occasional Visitor

Re: Character Conversion

Thank you for the replies. I think I understand the problem a little more now. Firstly, by postcode data I meant text files containing data which links address information to a postcode, or ZIP code in American English.
As for the transfer between systems, on one side is an Alpha running VMS 8.3, and on the other a PC with Windows XP Pro (although the situation is the same when using a Vista machine rather than XP). The Alpha is running the ftp server, and the client on the Windows machine can be any of the following: the ftp client that is part of Reflections for OpenVMS, Ipswitch WS_FTP, or the ftp client within Esker's Smartterm. They all behave the same in this case.

But the precise issue is that the data was formerly supplied as ISO-8859 and is now supplied as Unicode. It took me a while to get to this point, so I apologise for the incomplete/incorrect explanation previously. I used to copy the data to the Alpha using the defaults that any of the above three ftp clients install with. Then, in an editor such as TPU, I used to see for example ä, but now I see A tilde followed by ASCII 164, so that the single character is replaced by a pair of characters.
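The pairs of values described here are what one would expect if the new files are UTF-8 encoded and are being displayed as ISO Latin-1 (an assumption; "Unicode" could in principle mean another encoding such as UTF-16). A small Python sketch, with the hypothetical helper name `utf8_byte_values`, shows the arithmetic:

```python
def utf8_byte_values(text):
    """Return the decimal byte values of text when encoded as UTF-8."""
    return list(text.encode("utf-8"))


# a, o and u with umlauts, written as escapes so this source stays 7-bit.
for name, ch in (("a umlaut", "\u00e4"), ("o umlaut", "\u00f6"), ("u umlaut", "\u00fc")):
    utf8 = ch.encode("utf-8")
    # Reading the two UTF-8 bytes as Latin-1 yields two separate characters;
    # the first is always U+00C3, capital A with tilde (value 195).
    misread = utf8.decode("latin-1")
    print(name, utf8_byte_values(ch), ascii(misread))

# a umlaut [195, 164] '\xc3\xa4'
# o umlaut [195, 182] '\xc3\xb6'
# u umlaut [195, 188] '\xc3\xbc'
```

The bytes in the file are the same on both systems, which would explain why the file looks fine again once it is back on Windows.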



Is there a simple way of converting or switching it from one to the other?
DougMiller
Occasional Visitor

Re: Character Conversion

Whoops, was previewing when I clicked submit by mistake. But interestingly I had typed and was seeing an a with an umlaut which the browser displayed as the A tilde followed by another weird character.
DougMiller
Occasional Visitor

Re: Character Conversion

In case anyone else has this problem, the answer is to install the layered product vmsi18n and then run:

$iconv convert input_file /fromcode=UTF-8 output_file /tocode=ISO8859-1

Obviously the fromcode and tocode need to be what you want. You can see the choices by doing a dir of sys$i18n_iconv:
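If installing the layered product is not an option, the same conversion could in principle be done on the Windows side before the transfer. The following Python sketch is an alternative, not the iconv method above; the function name `utf8_to_latin1` and the file names in the comments are hypothetical, and it assumes the supplied data really is UTF-8.

```python
def utf8_to_latin1(data):
    """Transcode UTF-8 bytes to ISO-8859-1 bytes.

    'utf-8-sig' also drops a leading byte order mark if one is present.
    Characters with no ISO-8859-1 equivalent are replaced with '?'.
    """
    text = data.decode("utf-8-sig")
    return text.encode("iso-8859-1", errors="replace")


# The three two-byte UTF-8 sequences from this thread become one byte each.
converted = utf8_to_latin1(b"\xc3\xa4\xc3\xb6\xc3\xbc")
print(list(converted))  # [228, 246, 252]

# For whole files, read and write in binary mode (hypothetical names):
#   with open("postcodes_utf8.txt", "rb") as src:
#       data = src.read()
#   with open("postcodes_latin1.txt", "wb") as dst:
#       dst.write(utf8_to_latin1(data))
```

Replacing unmappable characters with a question mark is a design choice made for this sketch; passing `errors="strict"` instead would make the conversion fail loudly on any character outside ISO-8859-1.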