Operating System - HP-UX

strange characters in a file

 
Marc Ahrendt
Super Advisor

strange characters in a file

some lines in my text file have strange characters in them like "^?" and others

how can i remove any line that contains a strange character? ...basically each line should only contain alphanumerics and maybe a "-", "_", or some other reasonable character

attached is an example file with some of these strange characters
hola
16 REPLIES 16
Ross Zubritski
Trusted Contributor

Re: strange characters in a file

Hi

Run the dos2ux command on the file in question. Looks like this file was transferred from a PC to a UX box.

Regards,

RZ
S.K. Chan
Honored Contributor

Re: strange characters in a file

This is expected if you have files that were moved between the PC world and unix. See if the dos2ux utility helps:
# dos2ux badfile > goodfile
Man "dos2ux" for greater details.
Bill McNAMARA_1
Honored Contributor

Re: strange characters in a file

to quote the obvious,
ux2dos also exists!
It works for me (tm)
Frank Slootweg
Honored Contributor

Re: strange characters in a file

Your attachment is not very clear, but try if this works:

tr -d '[\200-\377]' < infile > outfile

This command deletes ("-d") all the non-ASCII characters (all characters from 200 through 377 octal).
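
A minimal sketch of what this kind of tr call does, assuming a POSIX tr run in the C locale so that it works byte by byte. The sample input and the function name strip_high_bytes are made up for illustration. Note that tr removes only the offending bytes and leaves the rest of the line in place, and that DEL (octal 177, which cat -v shows as ^?) sits just below the 200-377 range.

```shell
# Hypothetical helper: delete every byte in the octal range 200-377.
strip_high_bytes() {
    LC_ALL=C tr -d '\200-\377'
}

# Invented sample: line 1 has a high byte (octal 351), line 2 has a DEL (octal 177).
# cat -v makes any surviving non-printing bytes visible.
printf 'caf\351 ok\nbad\177line\n' | strip_high_bytes | cat -v
```

In this sample the high byte disappears but the DEL survives, so a file whose strange characters are all control characters or DEL would pass through such a filter unchanged.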
James R. Ferguson
Acclaimed Contributor

Re: strange characters in a file

Hi Marc:

If you want to analyze the contents of your file some more, you could use 'cat -v' to expose the non-printing characters. See the man pages for 'cat' and 'ascii(5)' to understand the interpretation.

You may also find 'xd' quite useful. Have a look at its man page.

Regards!

...JRF...
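
As a quick illustration of what 'cat -v' prints, here is a small sketch on invented input. It assumes the usual cat -v notation, where a control character appears as ^ followed by a letter or symbol and DEL (octal 177) appears as ^?.

```shell
# Invented sample: a DEL (octal 177), an ESC (octal 033) and a CR (octal 015).
printf 'ab\177c\033d\r\n' | cat -v
# prints: ab^?c^[d^M

# od -c gives a byte-by-byte dump of the same data for checking octal values.
printf 'ab\177c\033d\r\n' | od -c
```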
Marc Ahrendt
Super Advisor

Re: strange characters in a file

Ross, SK, Bill: sorry for not explaining that this has nothing to do with UNIX <-> PC ...only that i did ftp this file from my UNIX box to my PC to send it as an attachment to this question (this file was made on UNIX and the problem seen on UNIX)

Frank: i am aware of "tr -d ..." but thx for those octal codes ...yet they had no effect on this file (and also "tr" will not delete the line i think unless the new line character is referenced too?)

James: yes, "cat -v ..." shows the strange characters, and i can use "xd" and "od" to get octal values for these strange characters ...but i do not know what all the strange characters may be or could be later (like Frank's idea of a range ...or the inverse of it?)

i thought there might be an easy one liner using grep with reg. expr. or something else tricky
hola
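
One possible shape for such a grep one-liner is a whitelist: keep only the lines made up entirely of allowed characters. This is a sketch, not a tested HP-UX recipe. The allowed set (alphanumerics, space, "_" and "-") and the function name keep_clean_lines are assumptions to adjust, and LC_ALL=C is set so that only ASCII letters and digits count as alphanumeric.

```shell
# Hypothetical helper: print only lines that consist solely of allowed characters.
# A line containing anything else (DEL, ESC, other control bytes, ...) is dropped whole.
keep_clean_lines() {
    LC_ALL=C grep '^[[:alnum:] _-]*$' "$@"
}

# Invented sample: line 2 holds a DEL (octal 177), line 3 an ESC (octal 033).
printf 'good-line_1\nbad\177line\nesc\033here\nalso good\n' | keep_clean_lines
```

Because the pattern lists what is allowed rather than what is forbidden, strange characters that show up later are rejected without having to name them. Empty lines match the pattern and are kept.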
James R. Ferguson
Acclaimed Contributor

Re: strange characters in a file

Hi (again) Marc:

As I hinted, you will need to resort to consulting the ASCII table (man 'ascii(5)').

Some additional help is offered by 'xd -t a' which creates output as "named characters".

Regards!

...JRF...
Dave La Mar
Honored Contributor

Re: strange characters in a file

Marc -
Interesting. We have encountered ^? and many other strange characters when sending a file via ftp from our mainframe to unix.
It appears there is no unix translation for certain hex characters, in our case, and we end up with the "strange characters".

As a result, we have scripted the following to translate to readable characters without destroying the data. In many cases, these strange characters are simply field delimiters in our cases.
As noted in previous post, we too use the cat -v for the translation process.
Translate i.e.
cat -v filename | sed 's/^M/~/g' | sed 's/\^M/~/g' > new_filename

The above would translate a control M and a caret M to a tilde.

Note: You can pipe as many sed statements as necessary with other "strange" character translations prior to output to the new file.

Best of luck.

Regards,
dl
"I'm not dumb. I just have a command of thoroughly useless information."
Tom Danzig
Honored Contributor

Re: strange characters in a file

Try:

cat | col -b

A. Clay Stephenson
Acclaimed Contributor
Solution

Re: strange characters in a file

You want a one-liner: Okay,

perl -pe 's/[^\040-\176\012]//g' oldfile > newfile

You might want to add \014 after \012 to also allow ASCII FF's in addition to the LF's. Basically anything that ain't space through tilde (octal 40 thru 176) or a LF (octal 12) gets throwed away.
If it ain't broke, I can fix that.
john korterman
Honored Contributor

Re: strange characters in a file

Hi Marc,
I take it that you want to ignore lines with certain characters rather than do actual conversion. If correct, then try the attached script. In the CHAR_LIST variable you can define any character that should prevent an input line from being printed. If you want to use a character that has a special meaning in a regular expression, it must be preceded by a backslash ($ even by two).
Run the script like this:
# attached_script.sh inputfile


regards,
John K.
it would be nice if you always got a second chance
Dave La Mar
Honored Contributor

Re: strange characters in a file

Marc -
Tom D. has the 10 point answer.
I wish this thread had appeared a month ago.
Tom's solution would have saved me a lot of heart ache.
For our needs in the example I state, I will still need to do some form of substitution, but in other jobs, Tom's solution will be of great help.
Thanks Tom.

Best regards,

dl
"I'm not dumb. I just have a command of thoroughly useless information."

Re: strange characters in a file

A bit more cumbersome than using a handy HPUX command: within the file, i.e. in the vi editor, you can globally remove all the dos carriage returns with the following vi substitute command:
:1,$s/^v^m//

...note, in vi, to type the ascii carriage return, ^M, you must enter it as two keystrokes: Ctrl-V first, then Ctrl-M.

For a quick perl script that mimics the dos2ux command:
perl -pe -l.lfcr "s/[\012\015]//;" file
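
If carriage returns really are the only problem, a simpler sketch is to delete them with tr. This assumes every CR (octal 015) in the file is unwanted; the function name strip_cr and the sample input are hypothetical.

```shell
# Hypothetical helper: remove every carriage return (octal 015) from the input.
strip_cr() {
    tr -d '\015'
}

# Invented sample with DOS-style CR LF line endings; cat -v would show any
# leftover CR as ^M.
printf 'first\r\nsecond\r\n' | strip_cr | cat -v
```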
Dipu
Occasional Advisor

Re: strange characters in a file

Hi,
A simple solution for this would be

cat | col -b >

Regards!!
Dipu
Dave La Mar
Honored Contributor

Re: strange characters in a file

Dipu -
Nice confirmation of Tom's already suggested solution minus the OBVIOUS output to newfile!!!

Sorry forum, no coffee yet.

grumpy
"I'm not dumb. I just have a command of thoroughly useless information."
Marc Ahrendt
Super Advisor

Re: strange characters in a file

James: thx again! ...i'm not too bright and at times need to be told twice
man ascii //very helpful
cat | xd -t a //also very helpful

Dave: thx for the sed reference, but in the end found other commands given in this thread to be better suited

Tom: very cool command, yet does not do a good job with the "esc" character (\033) ...i would have given you a 10 but Clay took it away

Clay: i think when perl is used its like cheating...
this was a great one liner and gives me the flexibility to control the search and action (your one-liner got all the strange characters in the file i was dealing with)

John: i like your script idea, just was having some problems with using "grep" against certain entries in $CHAR_LIST (even the ones given in your script were making grep unhappy)

Brian: thx for the tips ...i had trouble with the perl options you gave (warning: i am not that familiar with perl) and when i got it working it was not as thorough as Clay's one-liner

Dipu: "col" is a nifty little command

...now they don't feel like strange characters anymore
hola