02-12-2008 01:31 PM
Strange characters in text file
This is the weird part. The file contains several lines where a dash appears with a space on either side. In just ONE of those occurrences in the entire file, the dash appears, still with spaces on both sides, as:
ΓÇô
If I put it through more, this what-should-be ONE dash character appears as:
M-bM-^@M-^S
Can anyone interpret that? And here's the kicker. If I ftp it to another HP-UX system, the resulting file has the same defect. But if I use ftp to copy it to a Windows PC (pulling the file from HP-UX), in either ascii or binary mode, the dash appears as a dash when viewed in Notepad.
As they say, WTF ?
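For readers who want to interpret that output themselves, here is a minimal Python sketch. It assumes more uses the same convention as cat -v, where M- means "high bit set" and ^X means a control code; the function name decode_visible_notation is made up for this illustration.

```python
def decode_visible_notation(text):
    """Turn cat -v style output (M-x, ^X) back into raw bytes.

    Assumes ASCII input and that every "^" or "M-" in the text is
    notation rather than a literal character, which is ambiguous
    in general.
    """
    out = bytearray()
    i = 0
    while i < len(text):
        high = 0
        if text.startswith("M-", i):
            high = 0x80          # M- prefix: the byte has its top bit set
            i += 2
            if i >= len(text):
                break            # dangling "M-" at the end, nothing to decode
        if text[i] == "^" and i + 1 < len(text):
            # ^@ is 0x00, ^S is 0x13, ^? is 0x7F: flip bit 6 of the letter
            low = ord(text[i + 1]) ^ 0x40
            i += 2
        else:
            low = ord(text[i])
            i += 1
        out.append(high | low)
    return bytes(out)


raw = decode_visible_notation("M-bM-^@M-^S")
print(raw.hex())  # e28093
```

Under that assumption, the three visible groups are three bytes, which is already a hint that one multi-byte character is hiding there rather than three separate ones.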
02-12-2008 02:08 PM
Re: Strange characters in text file
E2 80 93
which is UTF8 for dash.
New problem, if I try to convert the file using:
iconv -f utf8 -t iso81 filename>newfile
the E28093 and other UTF8 sequences are changed to 1A (^Z).
Is this the best iconv can do?
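A small Python sketch of why a converter has to substitute something here. It assumes that HP-UX's iso81 corresponds to Python's "iso-8859-1" codec, and the helper to_latin1_with_sub is hypothetical: it only mimics the observed output (unmappable characters become 0x1A) and says nothing about how iconv is implemented.

```python
import unicodedata

raw = bytes.fromhex("e28093")
char = raw.decode("utf-8")
print(f"U+{ord(char):04X}", unicodedata.name(char))  # U+2013 EN DASH

# ISO 8859-1 only covers code points 0x00-0xFF, so a strict encode fails.
try:
    char.encode("iso-8859-1")
    representable = True
except UnicodeEncodeError:
    representable = False
print("representable in ISO 8859-1:", representable)  # False

SUB = b"\x1a"  # ^Z, the byte the thread reports seeing in the output


def to_latin1_with_sub(text):
    """Encode to ISO 8859-1, writing SUB for anything outside 0x00-0xFF."""
    return b"".join(
        ch.encode("iso-8859-1") if ord(ch) < 0x100 else SUB for ch in text
    )


converted = to_latin1_with_sub("a \u2013 b")
print(converted)  # b'a \x1a b'
```

The point is only that the target character set has no slot for U+2013, so any conversion must either fail or pick a stand-in byte.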
02-12-2008 11:35 PM
Re: Strange characters in text file
>which is UTF8 for dash.
I don't see that. In /usr/lib/nls/loc/charmaps/utf8.cm I see
>the E28093 and other UTF8 sequences are changed to 1A (^Z).
This must be the "galley character".
02-13-2008 02:07 AM
Re: Strange characters in text file
Some Windows applications (MS Word in particular) will automatically change a hyphen to a long dash (or em-dash) for typographical reasons, and then save that in UTF-8. If that then goes into an Oracle database which is running in UTF-8, you could get similar problems to this. (I've actually seen this happen.)
The presence of ctrl-Z hints at Windows/DOS as well, since ctrl-Z was the EOF marker in DOS, and still crops up occasionally for historical reasons.
02-13-2008 02:39 AM
Re: Strange characters in text file
No, in this case it comes from iconv(1) as Carl said. Probably because ^Z (0x1A) is the ASCII SUB (substitute) character.
02-13-2008 02:44 AM
Re: Strange characters in text file
02-13-2008 07:50 AM
Re: Strange characters in text file
(en dash, e28093)
http://www.fileformat.info/info/unicode/char/2013/index.htm
(right double quote, e2809d)
http://www.fileformat.info/info/unicode/char/201d/index.htm
Andrew, I can believe the part about Windows client - probably copy/pasted from word or some such thing.
So now I guess I'm looking for a patch to make iconv handle these and possibly other missing utf8 characters. Only problem is figuring out which patch(es) address the problem. Any hints welcome. Searching on "utf8" in the Patch Database found matches but none specifically talk about adding missing utf8 characters. And searching on "utf8.cm" didn't produce anything.
BTW I found this:
http://www.docs.hp.com/en/5991-1194/5991-1194.pdf
which clearly talks about EN DASH at e28093, but makes no mention of e2809d. It also talks about patches but doesn't identify them in any meaningful way.
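Before hunting for patches, it may help to know exactly which non-ASCII characters a file contains. The following Python sketch lists each one with its UTF-8 bytes, code point, and Unicode name. The function name report_non_ascii and the sample text are invented for illustration; on a real file you would pass the bytes read in binary mode.

```python
import unicodedata
from collections import Counter


def report_non_ascii(data):
    """List every non-ASCII character found in UTF-8 data.

    Returns tuples of (utf8 hex, code point, Unicode name, count).
    Invalid byte sequences are decoded as U+FFFD, so for those rows
    the hex column shows the replacement character, not the bad bytes.
    """
    text = data.decode("utf-8", errors="replace")
    counts = Counter(ch for ch in text if ord(ch) > 0x7F)
    rows = []
    for ch, n in sorted(counts.items()):
        rows.append(
            (
                ch.encode("utf-8").hex(),
                f"U+{ord(ch):04X}",
                unicodedata.name(ch, "UNKNOWN"),
                n,
            )
        )
    return rows


# Made-up sample resembling text pasted from a word processor.
sample = "low \u2013 high, \u201cquoted\u201d text \u2013 end".encode("utf-8")
for row in report_non_ascii(sample):
    print(*row)
```

A report like this gives a concrete list of code points to check against utf8.cm, instead of discovering them one ^Z at a time.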
02-13-2008 07:51 AM
Re: Strange characters in text file
It looks like the file is still in Windows format (as usual), so you need to convert it to a Unix-readable format. Have you tried the dos2ux command?
dos2ux is useful when transferring files between different OSes, etc.
Syntax:
# dos2ux weirdfile > ux_formattedfile
After that, check ux_formattedfile; it may now be in a format that Unix recognizes.
Try it and let us know.
Regards,
Marco
02-13-2008 08:04 AM
Re: Strange characters in text file
02-13-2008 08:08 AM
Re: Strange characters in text file
All those ^Z, ^M, etc. are Windows "chars". When the file is transferred to a Windows OS, the system recognizes what it needs to and hides those markers throughout the document, such as the spaces, line breaks, tabs, etc.
dos2ux did help me with that, but if that's not the issue, let me review more.
Regards,
Marco
02-13-2008 08:16 AM
Re: Strange characters in text file
What I need now are the patch numbers that add these missing utf8 characters to HP-UX 11.11.
02-13-2008 06:17 PM
Re: Strange characters in text file
(Thanks for the links.)
Well, I'm not sure what good it would do. iconv(1) will only work if both charmaps are changed. Of course utf8.cm should have them for completeness. You should contact the Response Center and file an enhancement request.
(It hasn't changed for 11.31 either.)
>which clearly talks about EN DASH at e28093, but makes no mention of e2809d.
Yes.
>So now I guess I'm looking for a patch to make iconv handle these and possibly other missing utf8 characters. Only problem is figuring out which patch(es) address the problem.
I found PHCO_29903
11.11 iconv cumulative patch
But it may not help you.
>Searching on "utf8" in the Patch Database found matches but none specifically talk about adding missing utf8 characters. And searching on "utf8.cm" didn't produce anything.
Right. Though they could fix the shared libs but not the charmaps?
You do know that you can create your own maps?
See genxlt(1), dmpxlt(1) and iconv(3C)
$ dmpxlt /usr/lib/nls/iconv/tables/ucs2=iso81
shows:
#What: A.10.02 $ucs2 =) iso81
#Galley: 0X1a
Your iconv(1) command seems to open:
/usr/lib/nls/iconv/hpux32/tables.1/ucs2=iso81
I don't see any but the 8 bit identity translations, even in the raw binary file.
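To make the idea of a custom map concrete, here is a Python sketch of a table-driven conversion: identity for the 8-bit range, a few extra entries for typographic characters, and the galley character 0x1A for everything else. This is not the genxlt(1) source format or how iconv works internally; EXTRA_MAP, its entries, and convert_to_iso81 are all hypothetical names and choices made for this example.

```python
GALLEY = 0x1A  # fallback byte, matching the "#Galley: 0X1a" line above

# Hypothetical extra entries one might want in a custom table:
# typographic punctuation folded to the nearest ASCII character.
EXTRA_MAP = {
    0x2013: ord("-"),   # EN DASH
    0x2014: ord("-"),   # EM DASH
    0x2018: ord("'"),   # LEFT SINGLE QUOTATION MARK
    0x2019: ord("'"),   # RIGHT SINGLE QUOTATION MARK
    0x201C: ord('"'),   # LEFT DOUBLE QUOTATION MARK
    0x201D: ord('"'),   # RIGHT DOUBLE QUOTATION MARK
}


def convert_to_iso81(utf8_bytes, extra_map=EXTRA_MAP, galley=GALLEY):
    """Convert UTF-8 bytes to single-byte output using a lookup table."""
    out = bytearray()
    for ch in utf8_bytes.decode("utf-8"):
        cp = ord(ch)
        if cp <= 0xFF:
            out.append(cp)                        # 8-bit identity translation
        else:
            out.append(extra_map.get(cp, galley))  # mapped, or galley fallback
    return bytes(out)


result = convert_to_iso81("a \u2013 b \u201dx\u201d \u20ac".encode("utf-8"))
print(result)  # b'a - b "x" \x1a'
```

The euro sign in the sample is deliberately left out of the table to show the galley fallback still applies to anything the map does not list.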
02-13-2008 08:58 PM
Re: Strange characters in text file
Meanwhile I'm going to wait and see if the workaround (don't paste from Word) becomes too much for users to bear before going the next mile for a solution.
02-13-2008 09:11 PM
Re: Strange characters in text file
Instead of, or in addition to, iconv, how about a workaround in the form of a quick sed or perl filter for the file?
Untested!! Maybe something like:
perl -pe "s/\xe2\x80\x93/--/g;s/\xc2\xad/-/g" old > new
fwiw,
Hein.
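The same kind of byte-level filter can be sketched in Python, working on raw bytes so that no charmap is involved. It uses the two sequences identified earlier in the thread (e28093 and e2809d); the replacement choices and the name filter_bytes are assumptions for illustration, and like the one-liner above it has not been run on HP-UX.

```python
# Byte sequences to replace, taken from the thread; the ASCII stand-ins
# on the right are an assumption about what the users would accept.
REPLACEMENTS = [
    (b"\xe2\x80\x93", b"-"),   # U+2013 EN DASH
    (b"\xe2\x80\x9d", b'"'),   # U+201D RIGHT DOUBLE QUOTATION MARK
]


def filter_bytes(data, replacements=REPLACEMENTS):
    """Replace each listed UTF-8 byte sequence with its ASCII stand-in."""
    for old, new in replacements:
        data = data.replace(old, new)
    return data


before = b"rates 5 \xe2\x80\x93 10 as \xe2\x80\x9dquoted\xe2\x80\x9d"
after = filter_bytes(before)
print(after)  # b'rates 5 - 10 as "quoted"'
```

To apply it to a file, read the input with open(path, "rb"), pass the bytes through filter_bytes, and write the result with open(path, "wb"). Any sequence not in the list passes through unchanged, so it can be run before iconv to reduce the number of ^Z substitutions.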