Operating System - HP-UX

Remove Control Characters ( ¾^ , Control-L, Control-C )

 
lnair
Occasional Contributor


The data within the file looks odd, with control characters throughout ( ¾^ , ^C, ^V, ^L, etc. ).


Could you please assist:

1) How do I get rid of all these control characters?
2) What format are these control characters in? I couldn't make sense of them.
3) How do these control characters get created?
curt larson_1
Honored Contributor

Re: Remove Control Characters ( ¾^ , Control-L, Control-C )

1) How to get rid of all these control characters.

You could use the commands strings, vis, or od/xd.

3) How do these control characters get created?

You're usually looking at binary data.
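Before removing anything, it can help to see exactly which bytes are in the file. The following is a minimal sketch using od -c on a made-up sample (the make_sample helper and its contents are hypothetical, not taken from the original file). The exact column spacing of od output differs between systems.

```shell
# Hypothetical sample data: "AB", Control-C (octal 003), Control-L (octal 014), newline.
make_sample() {
    printf 'AB\003\014\n'
}

# od -c prints one column per byte; bytes with no printable form
# appear as a three-digit octal value or a C-style escape such as \f or \n.
make_sample | od -c
```

On a real file the equivalent would be od -c infile | more.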
Leif Halvarsson_2
Honored Contributor

Re: Remove Control Characters ( ¾^ , Control-L, Control-C )

Hi,
With tr it is possible to translate (or remove) character classes (for example "cntrl"). Have a look at the man page for tr.
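As a rough sketch of the character-class approach: note that the cntrl class includes newline and tab, so deleting the whole class joins every line into one. The second function below is one possible way to keep line structure by listing octal ranges instead. The function names are made up for illustration, and older tr implementations may want a different range syntax, so check the man page first.

```shell
# Delete every control character, newlines and tabs included.
strip_all_cntrl() {
    tr -d '[:cntrl:]'
}

# Delete control characters but keep tab (octal 011) and newline (octal 012).
# Ranges removed: 000-010, 013-037, and DEL (177).
strip_cntrl_keep_lines() {
    tr -d '\000-\010\013-\037\177'
}

# Sample input containing Control-C (003) and Control-L (014).
printf 'one\003\ntwo\014\n' | strip_cntrl_keep_lines
```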
A. Clay Stephenson
Acclaimed Contributor

Re: Remove Control Characters ( ¾^ , Control-L, Control-C )

This is one approach:

tr -cd "[ -~\012]" < infile > outfile

That will strip anything not ' ' thru '~' or a LF.

If you don't know where these characters are coming from, how can we? These could easily be terminal escape sequences to position characters on the screen.
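If the characters do turn out to be terminal escape sequences, stripping single bytes leaves fragments such as "[1;31m" behind. Below is a minimal sketch that removes whole sequences instead. It assumes ANSI/VT100-style sequences of the form ESC [ parameters letter, which is only a guess about what the file contains; strip_ansi is a hypothetical helper name.

```shell
# Remove ANSI/VT100-style escape sequences (ESC [ ... letter) from stdin.
strip_ansi() {
    # Build a literal ESC character portably; not every sed understands \033 or \x1b.
    esc=$(printf '\033')
    sed "s/${esc}\[[0-9;]*[A-Za-z]//g"
}

# Sample input: "red" wrapped in a colour-on and a reset sequence.
printf '\033[1;31mred\033[0m text\n' | strip_ansi
```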
If it ain't broke, I can fix that.
Elmar P. Kolkman
Honored Contributor

Re: Remove Control Characters ( ¾^ , Control-L, Control-C )

1) You could also try to use:
cat -tv infile | sed 's/\^.//g' > outfile

Mind that this might remove more than you want, such as tabs (which cat -t displays as ^I). Regular carets will be removed too, along with the character that follows each one.

You could replace the . with [@CLM] or things like that to remove a specific range of CTRL strings.
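A sketch of that narrower variant, removing only Control-C and Control-L: it uses cat -v without -t so tabs pass through untouched. The function name is made up for illustration. A literal "^C" or "^L" already present as text in the file would still be removed, and cat -v also rewrites bytes above 127 into M- notation, so this only suits files that are otherwise plain ASCII.

```shell
# Show control characters in caret notation, then delete only ^C and ^L.
strip_ctrl_c_and_l() {
    cat -v | sed 's/\^[CL]//g'
}

# Sample input containing a form feed (014) and a Control-C (003).
printf 'page1\014page2\003\n' | strip_ctrl_c_and_l
```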

2) ^L == Form feed -> advances to the next page when printing to a printer
^C == ETX (end of text), ASCII 3. Normally no special meaning in a file, but could be part of some printer language.
The other one is not readable to us, but that could be caused by it being a byte outside the printable ASCII range of 32-126. ASCII values 0-31 (and 127) are the control characters.

3) In most cases the file is a data file containing binary data. For instance, when a program writes the number 3 to a file in binary, it will show as ^C and not as the human-readable '3', which is in fact ASCII code 51. For small numbers the two forms are about equally compact, but a larger number such as 10000 needs 5 bytes in human-readable form, while it still fits in 2 bytes in binary form. Binary is also faster to read, because a program doesn't have to bother with translation from and to human-readable form.
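The size difference can be checked from the shell. This is only an illustration: it assumes a 16-bit big-endian integer for the binary form, where 10000 is 0x2710, i.e. the two bytes octal 047 and 020. The variable names are made up.

```shell
# The value 3 stored as a single raw byte is displayed by cat -v as ^C.
printf '\003' | cat -v
echo

# 10000 written as text takes one byte per digit.
text_bytes=$(printf '10000' | wc -c | tr -d ' ')

# 10000 as a 16-bit big-endian integer: 0x27 0x10 (octal 047 020).
binary_bytes=$(printf '\047\020' | wc -c | tr -d ' ')

echo "text: $text_bytes bytes, binary: $binary_bytes bytes"
```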

So how to make these files completely readable depends entirely on what generated them. In most cases you will need a good translation program that was written to understand the structure of the file.
Every problem has at least one solution. Only some solutions are harder to find.
Mark Grant
Honored Contributor

Re: Remove Control Characters ( ¾^ , Control-L, Control-C )

Bear in mind also that those "control characters" might be needed in the file. Remove them from a data file and that file might not work anymore.

If you KNOW it is supposed to be just a plain old ASCII file then fine, strip them out as mentioned above; otherwise, I'd leave it as it is.
Never precede any demonstration with anything more predictive than "watch this"