Removing multiple bad characters from files

SOLVED
hamlite42
Occasional Advisor

Removing multiple bad characters from files

I have multiple files that I need to process in Unix that have bad characters (parentheses, spaces, dashes) in the file name. How could I remove the bad characters, but keep the basic file name?

Example BAD:
Testfile(1000171)20090112_154012.xml

Monthly Data_1000171_12-07-2008_2009-01-20_11.13.27.xml

Example Good:
Testfile_1000171_20090112.xml
Monthly_Data_1000171_20090120.xml
5 REPLIES
Steven Schweda
Honored Contributor
Solution

Re: Removing multiple bad characters from files

"man sed".

bash$ echo 'Testfile(1000171)20090112_154012.xml' | \
sed -e 'y/() /___/'
Testfile_1000171_20090112_154012.xml

What's bad about hyphens? You can certainly
remove or translate them, but they're not
shell-special. Why not leave them?
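
To apply the same translation to real files, the sed call can be wrapped in a loop. This is only a minimal sketch, assuming a POSIX shell; the function names clean_name and rename_bad are made up for illustration, and the loop only prints the mv commands until the "echo" is removed.

```shell
#!/bin/sh
# clean_name (hypothetical helper): translate '(', ')' and space to '_'.
clean_name() {
    printf '%s\n' "$1" | sed -e 'y/() /___/'
}

# rename_bad (hypothetical helper): dry run over the file names given as arguments.
rename_bad() {
    for old in "$@"; do
        new=$(clean_name "$old")
        # Skip names that need no change, and never overwrite an existing file.
        if [ "$old" != "$new" ] && [ ! -e "$new" ]; then
            echo mv -- "$old" "$new"   # drop "echo" to rename for real
        fi
    done
}
```

Calling `rename_bad *.xml` in the directory holding the files lists the renames that would be performed.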
James R. Ferguson
Acclaimed Contributor

Re: Removing multiple bad characters from files

Hi:

Note that Steven used sed's transform command ('y'), which is the equivalent of 'tr' in the shell. This is much faster than an equivalent substitution operation -- a nice touch here.

Regards!

...JRF...
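
For comparison, here is the same character-for-character mapping written with tr, next to a substitution-based version that gives the same result. A small sketch only; the function names with_tr and with_sub are illustrative, and both tr sets are written out at the same length because behavior with a shorter second set varies between implementations.

```shell
#!/bin/sh
# with_tr (hypothetical helper): map each of '(', ')' and space to '_' using tr.
with_tr() {
    printf '%s\n' "$1" | tr '() ' '___'
}

# with_sub (hypothetical helper): the same mapping done with a sed substitution.
with_sub() {
    printf '%s\n' "$1" | sed -e 's/[() ]/_/g'
}
```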
Hein van den Heuvel
Honored Contributor

Re: Removing multiple bad characters from files


Here is a Perl alternative:

First a dry run:

$ touch "a(b)c.tmp"
$ touch "xa((b)c.tmp"
$ perl -e 'for (@ARGV) { $old=$_; if (s/[()-]+/_/g) { print qq(rename $old, $_ \n)} }' *.tmp

rename a(b)c.tmp, a_b_c.tmp
rename xa((b)c.tmp, xa_b_c.tmp


Remove the print qq( ... \n) wrapper, so that rename is actually called, to make it real:

$ perl -e 'for (@ARGV) { $old=$_; if (s/[()-]+/_/g) { rename $old, $_ } }' *.tmp

$ ls *.tmp
a_b_c.tmp xa_b_c.tmp

fwiw,
Hein.


Hein van den Heuvel
Honored Contributor

Re: Removing multiple bad characters from files


I just noticed how the substitution is not only supposed to pick out 'odd' characters, but is also expected to do full transformations of a date format.

Much more fun, but much the same...

$ touch "Monthly Data_1000171_12-07-2008_2009-01-20_11.13.27.tmp"
$ touch "Testfile(1000171)20090112_154012.tmp"

$ perl -e 'for (@ARGV) { $old=$_; if (s/[()]+/_/g + s/(\d{4})-(\d\d)-(\d\d)_\d\d\.\d\d\.\d\d/$1$2$3/g) { print qq(rename $old, $_ \n)} }' *.tmp
rename Monthly Data_1000171_12-07-2008_2009-01-20_11.13.27.tmp, Monthly Data_1000171_12-07-2008_20090120.tmp
rename Testfile(1000171)20090112_154012.tmp, Testfile_1000171_20090112_154012.tmp

So here we ADD the number of substitutions done by
1: s/[()]+/_/g
with those done by
2: s/(\d{4})-(\d\d)-(\d\d)_\d\d\.\d\d\.\d\d/$1$2$3/g

If the sum is true (non-zero), at least one substitution was made and the name changed. We call 'print' at first, and change it to rename when confident.

Hein.
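
None of the commands above produce exactly the "Good" names from the question, so here is a sed-only sketch that does, for those two samples. The rules are guesses inferred from the two examples (drop a leading MM-DD-YYYY date, collapse YYYY-MM-DD_HH.MM.SS to YYYYMMDD, drop a trailing _HHMMSS), and the function name good_name is hypothetical; other naming patterns would need their own expressions.

```shell
#!/bin/sh
# good_name (hypothetical helper): print the cleaned-up form of one file name.
# The four sed expressions, in order (all assumptions based on the samples):
#   1. translate '(', ')' and space to '_'
#   2. drop an embedded MM-DD-YYYY date together with one underscore
#   3. collapse YYYY-MM-DD_HH.MM.SS to YYYYMMDD
#   4. drop a _HHMMSS time that follows an 8-digit date, just before the dot
good_name() {
    printf '%s\n' "$1" | sed \
        -e 'y/() /___/' \
        -e 's/_[0-9][0-9]-[0-9][0-9]-[0-9]\{4\}_/_/' \
        -e 's/\([0-9]\{4\}\)-\([0-9][0-9]\)-\([0-9][0-9]\)_[0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]/\1\2\3/' \
        -e 's/\([0-9]\{8\}\)_[0-9]\{6\}\./\1./'
}
```

As with the Perl version, a dry run comes first: `for f in *.xml; do echo mv -- "$f" "$(good_name "$f")"; done` prints the renames, and removing the "echo" performs them.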


hamlite42
Occasional Advisor

Re: Removing multiple bad characters from files

Thanks for all the great answers!