Strange Characters

 
Shannon Petry
Honored Contributor

HISTORY: I am writing a script to automate the translation of CATIA models to IGES, and vice versa. I have this working 99.9% but ran into a new glitch.
PROBLEM: CATIA has strange file names in UNIX. For example:
????F??I??N??I??S??H??E??D??-??07??22??02??-??04782638AA????KNUCKLE.model

I have written a few lines of script to remove the ?? characters from the name and replace them with spaces, strip the .model suffix, and convert any lowercase characters to uppercase.

The last problem is that the 23rd and 26th characters need to be converted to slashes. This character looks like a single quote but is not one.
I tried to do
ls ????F??I??N??I??S??H??E??D??-??07??22??02??-??04782638AA????KNUCKLE.model | tr "'" "/"
and it does not change the name.

How do I find what this character really is, and how do I change it to a '/' afterwards? Looks like it has the hex value of B1.

Here is the output of ls | od -xt

0000000 b1b146b1 49b14eb1 49b153b1 48b145b1
0000020 44b12db1 3037b432 32b43032 b12db130
0000040 34373832 36333841 41b1b14b 4e55434b
0000060 4c452e6d 6f64656c 0a000000

Thanks!
Shannon
Microsoft. When do you want a virus today?
11 REPLIES
A. Clay Stephenson
Acclaimed Contributor

Re: Strange Characters

Remember that tr can use \nnn to represent octal character values. B1h=\261 (octal)
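
A minimal sketch of that suggestion, assuming the filler byte really is 0xB1 (octal 261) as the dump in the question suggests. The sample string is made up for illustration and is built with printf so the byte value is unambiguous:

```shell
# Build a short sample containing the 0xB1 byte (octal 261).
sample=$(printf '\261\261F\261I\261N')

# tr accepts \nnn octal escapes; single quotes stop the shell from
# interpreting the backslash. LC_ALL=C keeps tr strictly byte-oriented.
cleaned=$(printf '%s' "$sample" | LC_ALL=C tr '\261' ' ')

printf '[%s]\n' "$cleaned"
```

Each 0xB1 byte becomes one space, so the brackets make the leading blanks visible.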
If it ain't broke, I can fix that.
Anonymous
Not applicable

Re: Strange Characters

For such conversions I prefer sed, e.g.:
f=$VAR/$(echo $i:$DEPOT | sed "s%/%=%g")

man ascii does not list b1, so you may want to give cut and paste a shot.
Or, as an example of how to translate ";" to a newline:
cat $s | tr ";" "\012"

so what about
cat $s | tr "\261" "\040"
?
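
Since tr only understands octal escapes, hex values read from od -x have to be converted first. A small sketch of that conversion using printf; hex_to_octal is a hypothetical helper name, not a standard command:

```shell
# Hypothetical helper: print the three-digit octal form of a hex byte value.
hex_to_octal() {
    printf '%03o\n' "0x$1"
}

hex_to_octal B1   # the filler byte from the dump
hex_to_octal B4   # the other high byte that appears in the dump
hex_to_octal 2F   # slash
hex_to_octal 20   # space
```

The results can be dropped straight into tr as \261, \264, \057 and \040.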
James Beamish-White
Trusted Contributor

Re: Strange Characters

Hiya,

You might want to try a script that does a simple test to find what character that is, e.g.:

if [ "\'" = "\'" ] ; then
echo This is it!
fi

...and cycle through a few possibilities (grave accent, single quote, Alt-0145, Alt-0146, Ctrl-Alt-').

And when you are doing the tr, you may have to escape the character if it is special, like the grave accent ( ` )
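
One way to cycle through candidates without typing the odd characters at all is to generate each one from its octal code. This is only a sketch: the candidate list and the sample string are assumptions chosen for illustration (047 is the single quote, 140 the grave accent, 221 and 222 are decimal 145 and 146, and 261 and 264 are the two high bytes visible in the dump).

```shell
# Compare byte by byte rather than by locale-dependent characters.
LC_ALL=C; export LC_ALL

# Sample standing in for the date part of the file name.
name=$(printf '07\26422\26402')

found=""
for oct in 047 140 221 222 261 264; do
    byte=$(printf "\\$oct")          # turn the octal code into the raw byte
    case $name in
        *"$byte"*) found="$found $oct" ;;
    esac
done

echo "octal codes present:$found"
```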

Cheers!
James
GARDENOFEDEN> create light
harry d brown jr
Honored Contributor

Re: Strange Characters

Shannon,

There is a new feature in our profiles that allows us to look at all of the previous questions we have posted, which is especially helpful when trying to resolve this:

This member has assigned points to 19 of 61 responses to his/her questions

live free or die
harry
Live Free or Die
Volker Borowski
Honored Contributor

Re: Strange Characters

Shannon,

Check your stty settings "istrip" and "-istrip".
It might be that the filename is stored with each char representing a byte, but when you do a ls on a terminal with "istrip" set, it might be that it is reduced to 7 bit on the screen.
Check "cs7 cs8" in addition.

It might be that what you see is not what is stored when it comes to special chars in terminal sessions (ever had to deal with umlauts in German environments?).
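
To illustrate what stripping to 7 bits does to the two high bytes in the dump, here is a sketch of the arithmetic only. What a given terminal actually shows still depends on its settings and character set.

```shell
# Clearing the top bit is the same as masking with 0x7F.
b1_stripped=$(printf '%o' $(( 0xB1 & 0x7F )))   # 0xB1 -> 0x31
b4_stripped=$(printf '%o' $(( 0xB4 & 0x7F )))   # 0xB4 -> 0x34

printf '0xB1 becomes octal %s, 0xB4 becomes octal %s\n' "$b1_stripped" "$b4_stripped"
```

Octal 61 and 64 are the ASCII digits 1 and 4, so a stripped name can look quite different from the stored one. On a real terminal, stty -a shows the current flags and stty -istrip cs8 asks for an 8-bit clean line.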

Hope this will help
Volker
Shannon Petry
Honored Contributor

Re: Strange Characters

First to address harry, the 19 of 61 is correct and what is relevant. It takes answers to questions to assign points, and most of the questions I have asked had 0 answers, with the exception of 2 comment posts.
I have been here 2 years longer than you, so I know the rules of the Forums.
Any questions that had answers to them have been awarded points, and magic answers where applicable.

So my comment back to you is check the facts before making statements and judgements!


Now back to the question:

Since this character is not a standard single quote I cannot use "tr" or "sed" to just swap the character.

Someone mentioned a script to find its octal value, how would I do this?

Thanks,
Shannon
Microsoft. When do you want a virus today?
Shannon Petry
Honored Contributor

Re: Strange Characters

Doing the ls with an istrip did change the character to a "C" with a hook on the bottom of it. Time to get the ASCII guide out ;/
Microsoft. When do you want a virus today?
A. Clay Stephenson
Acclaimed Contributor

Re: Strange Characters

You can use the ls -b filename command to force printing of non-ASCII characters in \nnn octal format. You can then set up a filter to look for \nnn and use tr -d "\xxx\yyy\zzz", where xxx, yyy, and zzz are octal values, to remove the spurious characters.
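
A minimal sketch of the tr -d half of this, assuming \261 is the only spurious byte in the sample. The sample is a made-up fragment built with printf, not a real listing:

```shell
# Fragment of a name with 0xB1 (octal 261) between the letters.
name=$(printf '\261\261F\261I\261N\261I\261S\261H\261E\261D')

# tr -d deletes every byte listed; more octal escapes can be appended
# inside the same quoted set if other spurious bytes turn up.
stripped=$(printf '%s' "$name" | LC_ALL=C tr -d '\261')

printf '%s\n' "$stripped"
```

Note that this deletes the bytes outright, whereas the original script wanted them turned into spaces.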
If it ain't broke, I can fix that.
Shannon Petry
Honored Contributor

Re: Strange Characters

I can use the ls -b, but it does not work on all of my systems. Solaris prints the character, as well as AIX. My thought is that they understand the full ASCII table like DOS, where HP-UX does not support the extended ASCII tables.

Any other way to get the ASCII value instead of the character value in these cases?
Microsoft. When do you want a virus today?
Frank Slootweg
Honored Contributor

Re: Strange Characters

It is unclear to me what your remaining problem is.

People mentioned to use tr(1) with its octal notation ('\OOO').

You seem to have problems determining the right octal value of the 23rd and 26th character. If so, you can just use "od" (octal dump) instead of "od -x", i.e.

ls | od

I.e. in short: Use "od" (without "-x") to determine the octal code(s) and then use tr(1) to translate that/those octal code(s) to slashes.
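
A sketch of those two steps on a made-up sample. It uses od -b (one octal value per byte) rather than plain od, which groups bytes into 16-bit words and is harder to read. Reading the hex dump in the original question byte by byte, positions 23 and 26 appear to hold 0xB4 (octal 264) rather than 0xB1, which looks like the filler byte, so the sample assumes 264 for the quote-like character.

```shell
# Sample standing in for the date part of the name.
sample=$(printf '07\26422\26402')

# Step 1: dump each byte in octal to identify the odd character.
octal_dump=$(printf '%s' "$sample" | od -b)
printf '%s\n' "$octal_dump"

# Step 2: hand that octal value to tr to turn the byte into a slash.
dated=$(printf '%s' "$sample" | LC_ALL=C tr '\264' '/')
printf '%s\n' "$dated"
```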

I hope this helps.

As to points assignment: In your own interest, you should assign points to all responses. If a response is of no value, you can assign 0 points. (There used to be a "N/A" choice as well, but it might be gone after the latest update.)
Shannon Petry
Honored Contributor

Re: Strange Characters

Frank,

You're correct that the two answers Clay provided fix the problem. I can't go back and change points though to give the magic answer.
The solution is a bit obscure, as the filenames the script reads will change constantly, and it is difficult to test each name and character. The 23rd and 26th characters will obviously differ in each filename, so the strange characters will move around.
Making it a bit more confusing is that HP-UX, Solaris, AIX, and Irix all support different character tables. Thus I can strip out the "'" looking character in Solaris, but it shows up differently in the other Unices.
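
Because tr matches by byte value rather than by position, the moving characters need not be tested one by one. A sketch of the whole cleanup under stated assumptions: normalize_name is a hypothetical function name, the byte values 261 and 264 are read from the dump in the original question, and LC_ALL=C is used so every system treats the name as plain bytes.

```shell
# Hypothetical helper: strip .model, map 0xB1 to space and 0xB4 to slash,
# then uppercase whatever is left.
normalize_name() {
    printf '%s' "${1%.model}" |
        LC_ALL=C tr '\261\264' ' /' |
        LC_ALL=C tr 'a-z' 'A-Z'
}

# File name rebuilt from the bytes shown in the od output.
raw=$(printf '\261\261F\261I\261N\261I\261S\261H\261E\261D\261-\26107\26422\26402\261-\26104782638AA\261\261KNUCKLE.model')

normalize_name "$raw"
echo
```

The result contains slashes, so it is suitable as a model name inside the script but not as a UNIX file name.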

So, if Clay responds back, I can assign the magic answer to him.

Regards,
Shannon
Microsoft. When do you want a virus today?