Operating System - HP-UX

Weird Characters In My File...

 
SOLVED
Lee Harris_5
Valued Contributor


I am confused...

I have some scripts gathering information and then combining the smaller files into one larger file using cat. The original files have some strange characters in them. You don't see them if you do a more of the file, but when you vi or cat it, these little things show up in the middle of each line...

Φ

Trouble is, I can't recreate that character on the command line, nor can I copy and paste it into my terminal session. I was originally going to use sed to do a global find and replace to replace it with nothing, but I can't find it in the first place because it's a weird character, I assume.

Any ideas? Thanks in advance.
9 REPLIES
Peter Godron
Honored Contributor

Re: Weird Characters In My File...

Lee,
If you want to know what the weird characters are, you could dump the file into a blank file via xd (see man xd).
That would give you the codes for your characters.

You can use the Ctrl-V combination to enter control characters in vi:

To enter Ctrl-E, ESC:
Press 'i' for insert
Ctrl-V,E,Ctrl-V,ESC
ESC to finish
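
If typing the control character is awkward, tr also accepts octal escapes, so the character never has to be entered literally. A minimal sketch, assuming the offending byte turned out to be a backspace (octal 010); the file names are made up for illustration:

```shell
# Sample line with an embedded backspace (octal 010).
# ctrl_sample.txt and ctrl_clean.txt are hypothetical file names.
printf 'abc\010def\n' > ctrl_sample.txt

# Delete every backspace byte; all other bytes pass through unchanged.
tr -d '\010' < ctrl_sample.txt > ctrl_clean.txt

cat ctrl_clean.txt
```

Swap in whatever octal code your dump shows for the character you want gone.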
Lee Harris_5
Valued Contributor

Re: Weird Characters In My File...

Huh, I just noticed that my post contains a weird character that shows as a | (pipe); however, this isn't the character. I've never seen it before...

It's kind of like this <|> but more rounded, and one character on its own rather than the three I've used to represent it.



James R. Ferguson
Acclaimed Contributor

Re: Weird Characters In My File...

Hi Lee:

You noted that "The original files have some strange characters in them. You don't see them if you do a more of the file, but when you vi or cat it, these little things show up in the middle of each line...".

If the "strange" character includes a "^M" then you are seeing a carriage return. This suggests that you FTP'd the file from a Windows platform in binary mode. If this is the case, you can filter the file with 'dos2ux':

# dos2ux filein > fileout

Regards!

...JRF...
Bill Hassell
Honored Contributor

Re: Weird Characters In My File...

You need to use either cat -v or xd -xc to "see" the special characters. There are only 94 standard characters in ASCII that are visible (the 95 printable characters minus the space character, which isn't "visible"). There are also a myriad of language-specific characters with the 8th bit turned on, and a whole bunch of characters that are 'invisible' because your terminal does not have a way to display them. If you list your file with:

xd -xc my_file

you'll see the actual hex code displayed. Then look at the man page for ASCII:

man ascii
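
For anyone without xd, od gives much the same view. This is only a sketch, assuming your od accepts the POSIX -A and -t options; the sample bytes (a backspace and the two-byte sequence CE A6) are stand-ins, not the actual contents of Lee's file:

```shell
# Build a small sample file with a backspace (octal 010) and the
# two bytes CE A6 (octal 316 246) embedded in ordinary text.
# sample_weird.txt is a hypothetical file name.
printf 'abc\010def\316\246ghi\n' > sample_weird.txt

# od -c shows each byte as a character or an escape / octal value.
od -c sample_weird.txt

# od -An -tx1 shows the same bytes in hex without the offset column;
# tr squeezes the output into one continuous hex string.
hex_dump=$(od -An -tx1 sample_weird.txt | tr -d ' \n')
echo "$hex_dump"
```

Anything in the dump below hex 20 (other than 0a) or above 7e is one of the "invisible" characters.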

As mentioned, if there is just a single special character at the end of each line, it will likely show up as ^M, which is vi's way of displaying the carriage return character. This character is part of DOS/Windows PC files and must be removed using the proper option (for ASCII files) in ftp, namely the ascii command. Or you can remove the characters with the HP-UX command dos2ux:

dos2ux my_file > my_newfile
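
On a system without dos2ux, tr can do the same carriage-return removal. A rough sketch with made-up file names; note that it deletes every CR in the file, not only those at line ends:

```shell
# Sample file with DOS-style CRLF line endings.
# crlf_sample.txt and lf_sample.txt are hypothetical file names.
printf 'line one\r\nline two\r\n' > crlf_sample.txt

# Delete every carriage return (octal 015), leaving plain LF endings.
tr -d '\015' < crlf_sample.txt > lf_sample.txt

# Count the carriage returns that remain in the converted file.
cr_count=$(tr -cd '\015' < lf_sample.txt | wc -c | tr -d ' ')
echo "carriage returns left: $cr_count"
```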


Bill Hassell, sysadmin
Lee Harris_5
Valued Contributor

Re: Weird Characters In My File...

The files I have the problem with are being generated on HP-UX systems. A script runs to gather a bunch of system information. I scp these to a central machine, and cat all the separate files together into one larger file.

I take the large file, binary FTP it onto my laptop, then open it in Excel. The characters show up as Ã© and a little square in Excel.

In a vi of the files on HP-UX, they show as a little squiggly thing and a ^H respectively.

I tried running dos2ux just on the off chance, but this made no difference. I also tried xd -xc filename and the characters showed up, but I'm not sure what to do from that point on.

At the moment my easiest solution is to open the file up in notepad on my laptop and do a replace all finding the character and replacing it with nothing.

I'd like to be able to strip these characters out somehow on the HP-UX box though, cuz then I can script this to be automated.

I don't want to have to do it in Windows because I will feel like I have been defeated...and microsoft has won :-(
Enrico P.
Honored Contributor

Re: Weird Characters In My File...

Hi,
If your source system is Unix, you need to run the "ux2dos" command to convert the file to DOS format. I think you also need to FTP in "ascii" mode.

Enrico
Enrico P.
Honored Contributor

Re: Weird Characters In My File...

Hi,
Example:

ux2dos file1 file2 > file3

to convert file1 and file2 to DOS format and put them in file3.
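
Where ux2dos is not installed, awk can add the carriage returns instead. This is a sketch of the general LF to CRLF idea, not a claim about how ux2dos works internally, and the file names are made up:

```shell
# Sample Unix-style file with LF line endings.
# unix_sample.txt and dos_sample.txt are hypothetical file names.
printf 'a\nb\n' > unix_sample.txt

# Re-emit each line with a CR LF pair at the end.
awk '{ printf "%s\r\n", $0 }' unix_sample.txt > dos_sample.txt

# Check the result as hex: every 0a should now be preceded by 0d.
dos_hex=$(od -An -tx1 dos_sample.txt | tr -d ' \n')
echo "$dos_hex"
```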

Enrico
A. Clay Stephenson
Acclaimed Contributor
Solution

Re: Weird Characters In My File...

Your character descriptions are all but useless since they depend not only upon the characters themselves but also the character set used on the display device. In any event, since you seem to want text files w/o binary data, the requirements are rather simple.

tr -cd "[ -~\012]" < infile > outfile
and then FTP your file in ASCII mode to convert LF to CRLF pairs that the PC wants.

The tr command will strip anything that is not a space through tilde " -~" or a LF (octal 012).
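
A quick way to try the filter on throwaway data before pointing it at the real files. This is a sketch: the sample bytes and file names are invented, the brackets are left out of the set since they are not needed for the range, and LC_ALL=C is an added assumption so the range is read as plain byte values:

```shell
# Sample line with two high-bit bytes (octal 316 246) and a backspace
# (octal 010) mixed into ordinary text.
# tr_infile.txt and tr_outfile.txt are hypothetical file names.
printf 'host\316\246name\010x\n' > tr_infile.txt

# Keep only space through tilde plus LF (octal 012); delete the rest.
LC_ALL=C tr -cd ' -~\012' < tr_infile.txt > tr_outfile.txt

cat tr_outfile.txt
```

One thing to watch: tab (octal 011) is outside the space-to-tilde range, so tabs get stripped too. Add \011 to the set if the data is tab-delimited.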

If it ain't broke, I can fix that.
Lee Harris_5
Valued Contributor

Re: Weird Characters In My File...

Thanks Clay, that worked perfectly!