Operating System - HP-UX

Determine codepage for file in HP-UX

 

I am looking for a way to determine the original codepage of a file on HP-UX.

E.g., on Red Hat Linux: file -i

Any suggestions?

Best regards
Niclas
7 REPLIES
Steven E. Protter
Exalted Contributor

Re: Determine codepage for file in HP-UX

Shalom,

file -i is not supported.

See the man page for other options.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com

Re: Determine codepage for file in HP-UX

file won't give me the original codepage; it only returns what kind of file it is.
I am aware that file -i does not exist on HP-UX, but I am looking for some command on HP-UX that does the same thing as file -i does on Linux.

Best regards
Niclas
Dennis Handly
Acclaimed Contributor

Re: Determine codepage for file in HP-UX

The Linux man page I saw says -i produces MIME type output. What do you mean by "codepage"?
And on 11.31, -i tells file to do nothing more than check for a "regular file".
Matti_Kurkela
Honored Contributor

Re: Determine codepage for file in HP-UX

Note that even on Linux, "file -i" does not give exact information, just best guesses.

For example: it cannot tell iso-8859-1 and iso-8859-15 apart in plain text files. It will list both as iso-8859-1.
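To illustrate why the two cannot be told apart: every byte value is valid in both encodings, and only a handful of values map to different characters. A minimal sketch, assuming a Python 3 interpreter is available (Python is not part of the original discussion, and the function name is purely illustrative):

```python
# List every byte value that decodes differently under
# ISO-8859-1 and ISO-8859-15. All other bytes are identical.
def differing_bytes():
    diffs = []
    for value in range(256):
        raw = bytes([value])
        latin1 = raw.decode("iso-8859-1")
        latin9 = raw.decode("iso-8859-15")
        if latin1 != latin9:
            diffs.append((value, latin1, latin9))
    return diffs

for value, latin1, latin9 in differing_bytes():
    # !a prints non-ASCII characters as escapes, so the output
    # does not depend on the terminal's own encoding.
    print(f"0x{value:02X}: iso-8859-1={latin1!a} iso-8859-15={latin9!a}")
```

Because no byte is ever invalid in either encoding, a tool that only looks at the bytes has nothing to base a decision on.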

I believe the correct term would be "character encodings" instead of IBMese "codepages".

Do you need to identify any possible character encoding, or can you reasonably assume that the files will use one out of some known set of encodings? If the latter, what encodings can be expected?

Some encodings simply cannot be identified without some extra metadata or the ability to understand the meaning of the data.
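As a rough illustration of that point, the same three bytes decode without error under several 8-bit encodings, each time into different text. This is only a sketch, assuming Python 3; the candidate list and the helper name are chosen for illustration:

```python
# One byte sequence, several equally valid interpretations.
raw = bytes([0xE4, 0xF6, 0xE5])

# Candidate encodings, picked for illustration only.
candidates = ["iso-8859-1", "iso-8859-5", "koi8-r"]

def decode_all(data, encodings):
    """Return {encoding: decoded text} for every encoding that accepts the bytes."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            pass  # this encoding rejects the bytes outright
    return results

results = decode_all(raw, candidates)
for name, text in results.items():
    print(f"{name}: {text!a}")
```

All three decodings succeed, so only knowledge of the expected language or some external metadata can say which one was intended.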

MK

Re: Determine codepage for file in HP-UX

Sorry for the late response, but I've been home sick for a few days.

That's correct, Matti: I am after the character encoding.

On Linux, if I set LANG to en_US.UTF-8 and create a file, then file -i returns
hba.txt: text/plain; charset=us-ascii

But if I set LANG to sv_SE.iso885915 and create a file, file -i returns
test.txt: text/plain; charset=iso-8859-1

How do I accomplish this on HP-UX, since file -i doesn't exist there?

I realize that this might be misleading sometimes if, as you say, file -i "cannot tell iso-8859-1 and iso-8859-15 apart in plain text files", but I think I can live with that.

Best regards
Niclas
Matti_Kurkela
Honored Contributor

Re: Determine codepage for file in HP-UX

If your file contains only US-ASCII characters, then it will be identified as US-ASCII regardless of whether you use en_US.UTF-8 or sv_SE.iso885915: both these encodings map US-ASCII in a 100% US-ASCII compatible way.

If you use Scandinavian characters (åäöÅÄÖ), Euro signs or other non-US-ASCII characters, only then will the different encodings produce different results.
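A small sketch of this, assuming Python 3 is at hand (the variable names are illustrative): pure ASCII text yields identical bytes in both encodings, while Scandinavian letters do not.

```python
# ASCII-only text: the bytes are the same in UTF-8 and ISO-8859-15.
text = "plain ASCII text"
utf8_bytes = text.encode("utf-8")
latin9_bytes = text.encode("iso-8859-15")
print(utf8_bytes == latin9_bytes)  # True

# The letters å, ä, ö (written as escapes to keep this source ASCII-only).
swedish = "\u00e5\u00e4\u00f6"
# UTF-8 needs two bytes per letter, ISO-8859-15 needs one.
print(swedish.encode("utf-8") == swedish.encode("iso-8859-15"))  # False
```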

Fortunately, the difference between iso8859 encodings and UTF-8 is easy to detect by eye: if you see a single Scandinavian character expressed as _two_ characters, the data is UTF-8 and it's being mis-interpreted using iso8859-style encoding.

If all the Scandinavian characters are displayed as some generic "invalid symbol" characters (depends on the font, but may be a square standing on one corner with a question mark in it) or no character at all, you're most likely seeing an iso8859-style file mis-interpreted as UTF-8.
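Both visual checks come down to one question: are the bytes valid UTF-8? The sketch below reproduces the two mis-interpretations and then turns the rule into a rough classifier. It assumes Python 3; guess_encoding and its "unknown-8bit" label are hypothetical names for illustration, not an HP-UX command, and the result is a best guess only.

```python
text = "\u00e5\u00e4\u00f6"  # the letters å, ä, ö as escapes

# UTF-8 bytes read as ISO-8859-1: each letter becomes two characters.
utf8_as_latin1 = text.encode("utf-8").decode("iso-8859-1")
print(len(text), len(utf8_as_latin1))  # 3 6

# ISO-8859-1 bytes read as UTF-8: the invalid sequences are
# replaced by U+FFFD, the generic "invalid symbol" character.
latin1_as_utf8 = text.encode("iso-8859-1").decode("utf-8", errors="replace")
print(ascii(latin1_as_utf8))

def guess_encoding(data: bytes) -> str:
    """Best-guess classification: ASCII, UTF-8, or some unknown 8-bit encoding."""
    try:
        data.decode("ascii")
        return "us-ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: some 8-bit encoding, but the bytes
        # alone cannot say which one.
        return "unknown-8bit"
```

For a file, the bytes would be read in binary mode, for example guess_encoding(open(path, "rb").read()) with path being whatever file is under inspection. Short 8-bit text can occasionally happen to be valid UTF-8, so the answer remains a heuristic.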

Also remember that simply changing your LANG setting may not be enough: you should also check the settings of your terminal emulator.

MK

Re: Determine codepage for file in HP-UX

Thank you for the response, but still: if I have a file containing Swedish or Russian characters (or characters from any other foreign language), is there any command on HP-UX that will return the charset with which the file was created?