Operating System - Linux

simplest way to strip a character from a filename

 
Daavid Turnbull
Frequent Advisor

I have a gazillion tiny files, each with multiple ':' characters in its name. I need to strip these out.

A sample file name is: x_0041257026_102110535974_2006-01-26T20:33:31.272Z.xml.28_00_05.Z

I wrote a clumsy perl script to do it, which works, but it is soooo slooooow.

Is there a mean, easy and efficient way to do this?
Behold the turtle for he makes not progress unless he pokes his head out.
James R. Ferguson
Acclaimed Contributor

Hi David:

# perl -ple 's/\.//g;s/Z$/\.Z/'

As for example:

# printf '12.3\nabc_def.xyz\ndavid.Z\n' | perl -ple 's/\.//g;s/Z$/\.Z/'

(or):

# perl -ple 's/\.//g;s/Z$/\.Z/' fileofnames

This (crudely) preserves the ".Z" extension if present.

Regards!

...JRF...


Geoff Wild
Honored Contributor

I don't have a solution for the renaming, but as far as generating the files goes: did you fix whatever creates them so that it no longer puts the ':' in the name?

Looks like a time stamp...

Could do something like:

date '+%FT%H%M%S'
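A minimal sketch of Geoff's suggestion, assuming the generating script can build each name from a prefix and a date(1) timestamp. PREFIX and the .xml suffix are hypothetical placeholders, and the fractional seconds in the original names (such as .272) are dropped here.

```shell
#!/bin/sh
# Sketch only: PREFIX is a hypothetical stand-in for however the real
# script builds the start of each name.
PREFIX="x_0041257026"

# -u gives UTC, matching the trailing "Z" in the sample names.
# %Y-%m-%d is spelled out because some older date(1) versions lack %F.
# Hours, minutes and seconds are run together, so no ':' appears.
stamp=$(date -u '+%Y-%m-%dT%H%M%SZ')

newname="${PREFIX}_${stamp}.xml"
printf '%s\n' "$newname"
```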

Rgds...Geoff
Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.
Sandman!
Honored Contributor

Hi,

Try this awk construct, which lists the names with the colons removed:

# ls -1 | awk '{gsub(":",""); print $0}'

cheers!
Daavid Turnbull
Frequent Advisor

I am not sure that this is particularly efficient, because it invokes perl for each mv, but it is currently doing what I want:

for file in *
do
    mv "$file" "$(echo "$file" | perl -ple 's/\://g')"
done

Note that the character I wanted to strip was ':', not '.'.
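Another option, sketched under the assumption of a POSIX sh: let the shell strip the colons itself with parameter expansion, so no perl, tr or sed process is started per name. The function names strip_colons and rename_all are hypothetical; the *:* glob limits the loop to names that actually contain a colon, and mv is still run once per file.

```shell
#!/bin/sh
# Sketch: strip every ':' from file names using only POSIX parameter
# expansion, so no perl/tr/sed process is started for each name.

# Sets the global variable "stripped" instead of printing the result,
# so callers need no $(...) subshell per call.
strip_colons() {
    _rest=$1
    stripped=
    while :; do
        case $_rest in
            *:*)
                stripped=$stripped${_rest%%:*}   # text before the first ':'
                _rest=${_rest#*:}                # text after the first ':'
                ;;
            *)
                stripped=$stripped$_rest
                break
                ;;
        esac
    done
}

# Rename every entry in the current directory whose name contains ':'.
rename_all() {
    for file in *:*; do
        [ -e "$file" ] || continue        # glob matched nothing
        strip_colons "$file"
        if [ -z "$stripped" ]; then
            continue                      # name was all colons
        elif [ -e "$stripped" ]; then
            echo "$stripped exists; skipping $file" >&2
        else
            mv -- "$file" "$stripped"
        fi
    done
}

rename_all
```

Returning the result in a variable rather than through $(...) matters in shells such as bash and dash, which fork a subshell for each command substitution.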

A. Clay Stephenson
Acclaimed Contributor
Solution

Something like this should work leveraging the tr command:

ls | while IFS= read -r FNAME
do
    FNAME2=$(printf '%s\n' "${FNAME}" | tr -d ':')
    if [[ "${FNAME}" != "${FNAME2}" && -n "${FNAME2}" ]]
    then
        if [[ -e "${FNAME2}" ]]
        then
            echo "${FNAME2} exists; can't mv" >&2
        else
            mv "${FNAME}" "${FNAME2}"
        fi
    fi
done



The idea is that we use tr -d to strip the ':'s from the filename. Next we check that the old and new filenames are different and that the new filename still has a non-zero length. Then we make sure that the new filename does not already exist; if it does, that file is skipped. Finally, after all the tests have passed, we mv the old name to the new one.

Note the double quotes around each filename: whitespace is perfectly legal (if dumb) in UNIX pathnames.

This should be a robust solution.
If it ain't broke, I can fix that.
Muthukumar_5
Honored Contributor

for file in *
do
mv $file `echo $file | perl -ple 's/\://g'`
done
==

More simply:

for file in *
do
    mv "${file}" "$(echo "${file}" | perl -pe 's/://g')"
done

--
Muthu
Easy to suggest when don't know about the problem!
Muthukumar_5
Honored Contributor

There is no need to escape the ':' character with \. Escaping is needed for regex metacharacters such as * ? . and \, and for / when / is the s/// delimiter. Simply using s/://g is enough.
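A quick way to see the difference, sketched with sed rather than perl. The point about ':' versus '.' is the same in both, although the full set of metacharacters differs slightly (for example, '?' is literal in a sed basic regular expression).

```shell
#!/bin/sh
# ':' is an ordinary character in a regular expression, so it needs no
# backslash; '.' is a metacharacter that matches any single character.
name='a:b.c'

printf '%s\n' "$name" | sed 's/:/_/g'    # a_b.c : only the colon is replaced
printf '%s\n' "$name" | sed 's/./_/g'    # _____ : unescaped '.' matches every character
printf '%s\n' "$name" | sed 's/\./_/g'   # a:b_c : escaped '.' matches only a literal dot
```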

--
Muthu
Daavid Turnbull
Frequent Advisor

Because it needs to be done a gazillion times (well, somewhere between 100,000 and 1,000,000), am I correct in assuming that the overhead of starting perl for each file, as opposed to tr, would be a factor in the load it places on the machine?
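Starting perl (or tr) for every name costs a fork and exec per file on top of the mv itself. A minimal sketch of one way to cut that part out, assuming no file name contains a newline: list the directory once, let a single sed print each colon-bearing name followed by its stripped form, and read the pairs back in the shell. The function name rename_colon_files is hypothetical, and mv is still started once per file.

```shell
#!/bin/sh
# Sketch: start sed once for the whole directory instead of once per file.
# For each name containing ':', sed prints the original line ('p') and then
# the line with the colons deleted, so the loop reads old/new names in pairs.
# Assumes no file name contains a newline.
rename_colon_files() {
    ls | sed -n '/:/{p;s/://g;p;}' |
    while IFS= read -r old && IFS= read -r new; do
        if [ -n "$new" ] && [ ! -e "$new" ]; then
            mv -- "$old" "$new"
        else
            echo "skipping $old" >&2
        fi
    done
}

rename_colon_files
```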
Arunvijai_4
Honored Contributor

Hi David,

For processing a large number of files, perl is the best way to go. It has a proven track record when it comes to large numbers of files.

-Arun
"A ship in the harbor is safe, but that is not what ships are built for"
Daavid Turnbull
Frequent Advisor

Dear Arun, if the whole script were Perl I know you would be right, but in this instance most of the file handling is done by the shell, which invokes perl for every file name. Hence my question about the overhead of invoking perl rather than tr. I had assumed that tr would have less overhead because it is smaller, but size is not the only factor.

In the script that was processing these files I have changed the line that moves the file to read:

mv $file $(echo "$dest/$file.$dayTimeStr" | tr ":" "_")

(Looking at the script as a whole these days I would have written the whole script in perl which would no doubt have solved a lot of problems and made it a lot easier to maintain.)
Muthukumar_5
Honored Contributor

Perl uses more CPU. It is also fine to go with the standard utilities such as sed, tr or awk.

mv $file $(echo "$dest/$file.$dayTimeStr" | tr ":" "_")


That is not always safe: unquoted variables break on names containing whitespace.

Use:

mv "${file}" "$(echo "${dest}/${file}.${dayTimeStr}" | tr ':' '_')"

That is safer.

PS: Do you want to remove the ':' or change it to '_'?
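For comparison, both options applied with tr to the sample name from the original question: tr -d deletes the colons, while tr ':' '_' replaces each one with an underscore, which keeps the hour, minute and second fields readable.

```shell
#!/bin/sh
# The two options, shown on the sample name from the original question.
name='x_0041257026_102110535974_2006-01-26T20:33:31.272Z.xml.28_00_05.Z'

removed=$(printf '%s\n' "$name" | tr -d ':')    # delete the colons
replaced=$(printf '%s\n' "$name" | tr ':' '_')  # turn each colon into '_'

printf '%s\n' "$removed"   # x_0041257026_102110535974_2006-01-26T203331.272Z.xml.28_00_05.Z
printf '%s\n' "$replaced"  # x_0041257026_102110535974_2006-01-26T20_33_31.272Z.xml.28_00_05.Z
```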

--
Muthu
Muthukumar_5
Honored Contributor

If you want to keep the load down, use tr or sed instead of perl:

mv "${file}" "$(echo "${file}" | sed -e 's/://g')"

If you want SPEED, always use perl:

mv "${file}" "$(echo "${file}" | perl -pe 's/://g')"

--
Muthu
Arunvijai_4
Honored Contributor

Hi David,

You said, If the whole script was Perl I know you would be right but in this instance the majority of the file handling is done by the shell which invokes perl for every file name. Hence my question about the overhead of "invoking" perl over tr. I had assumed that tr would have less overhead because it is smaller but this is not the only factor.

In the script that was processing these files I have changed the line that moves the file to read:

mv $file $(echo "$dest/$file.$dayTimeStr" | tr ":" "_")

When it comes to handling a lot of files, I don't think the default UNIX utilities make a big difference. I am also not sure whether "tr" and "mv" are multithreaded. Perl in combination with the UNIX shell utilities is a good way to go.

-Arun
Sandman!
Honored Contributor

Hi Daavid,

After careful deliberation, it looks like you need to replace the colon characters and rename the files.

Here's an awk construct that should help. It assumes that your current working directory is the one that holds the gazillion tiny files.

# ls -1 | awk '{x=$0; if (gsub(":","")) system("mv \"" x "\" \"" $0 "\"")}'

cheers!
A. Clay Stephenson
Acclaimed Contributor

You need to recognize that your real bottleneck in this case is not Perl, the shell, or external commands, but rather the large number of entries in the directory (I assume a gazillion is a large number). The overhead of safely rewriting each directory entry is quite high: UNIX has to make certain that no other process is altering the directory at the same time. Even in very tight C this would still be a time-consuming operation. On the other hand, this sounds like a one-time deal, so let it run all night or over a weekend and declare victory.
Daavid Turnbull
Frequent Advisor

Dear All,

Thanks for the responses.

I managed to replace the colons in about 90000 file names and fix the script that was responsible for creating them.

Clay, your last post greatly improved my understanding of why it was taking so long.

It did not help that the machine was out of disk space (hence my renaming requirement: I needed to archive the offending files to a Windows environment, where colons in file names are verboten). At one point I was getting errors on the actual rename because of the lack of disk space.

I would probably still be pulling my hair out if it were not for this forum.