Operating System - HP-UX

Getting rid of extra spaces and tabs???

 
Robert Fisher_1
Frequent Advisor

Hi Gang,

I have a large textfile (5,000 lines or so) that originally came from a well-known Windows word processor. After running it through a re-formatter the file is filled with extra spaces and tabs between words. Some lines have extra spaces and tabs at the beginning and end of the lines as well. For example:

[TAB] Test1 [TAB] [TAB] Test2 [TAB]
should be
Test1 Test2

I would like to remove all the extra spaces and tabs. All the spaces at the beginning and end of the line should be removed and multiple spaces and tabs inside each line should be replaced with a single space. Does anyone have an idea? Sed? Awk? Help!!!

TIA, Bob
8 REPLIES
A. Clay Stephenson
Acclaimed Contributor
Solution

Re: Getting rid of extra spaces and tabs???

Hi Bob:

While sed or awk could be used, my weapon of choice for this is Perl. Perl's substitution operator accepts '\s+', which matches one or more whitespace characters. You don't have to write separate code for tabs, spaces, or other whitespace. Perl's pattern matching is really regular expressions on steroids.

I already had a pet subroutine 'trim_ws' to remove lead/trailing whitespace so I simply added a trim_middle_whitespace. Less than 2 minutes of Perl.

cat oldfile | strip.pl > newfile

This should be very close, Clay
If it ain't broke, I can fix that.
James R. Ferguson
Acclaimed Contributor

Re: Getting rid of extra spaces and tabs???

Hi Robert:

You could use 'vi' or 'sed' and strip unnecessary spaces and tabs from the ends of lines with:

s/[ \t]*$//

...and from the beginning, with:

s/[ \t]*^//

You can press the tab key on your keyboard in lieu of typing the '\t'.

You could also use 'expand' to change tabs to spaces (see man 'expand').

Regards!

...JRF...
Charles McCary
Valued Contributor

Re: Getting rid of extra spaces and tabs???

Robert,

to replace spaces with nothing....
sed 's/ //g'

to replace tabs with nothing... (press the tab key where <TAB> appears)
sed 's/<TAB>//g'

tx,
c
SHABU KHAN
Trusted Contributor

Re: Getting rid of extra spaces and tabs???

Hi,

If you need only one space between fields then use the tr command

cat test | tr -s "\t" " "

will trim all the uneven spaces/tabs to one space ...

Thanks,
Shabu
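
A small sketch of what the tr command above does, for anyone checking it against the sample line: tabs are translated to spaces and runs of spaces are squeezed to one. It does not strip the ends of the line, so one leading and one trailing space can remain. The function name squeeze_ws is only illustrative.

```shell
# Translate tabs to spaces, then squeeze repeated spaces into one.
# Leading and trailing whitespace is reduced to a single space, not removed.
squeeze_ws() {
    tr -s '\t' ' '
}

# Brackets make the surviving edge spaces visible
printf '[%s]\n' "$(printf '\t Test1 \t \t Test2 \t\n' | squeeze_ws)"
```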
James R. Ferguson
Acclaimed Contributor

Re: Getting rid of extra spaces and tabs???

Hi (again) Robert:

A correction. To strip leading spaces and/or tabs, use:

# sed 's/^[ \t]*//'

...substituting the actual keyboard TAB for the \t

...and similarly for trailing spaces and/or tabs:

# sed 's/[ \t]*$//'

Regards!

...JRF...
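
For scripted use, where pressing the TAB key is not an option, one possible way to get a literal tab into the sed expressions is to capture it in a shell variable first. This is a sketch, not part of the post above; the names TAB and strip_ends are assumptions made for the example.

```shell
# Hold a literal tab character in a variable (printf expands \t portably)
TAB=$(printf '\t')

# Strip leading, then trailing, spaces and tabs from each line.
# Double quotes let the shell expand $TAB; \$ keeps the regex end anchor literal.
strip_ends() {
    sed -e "s/^[ $TAB]*//" -e "s/[ $TAB]*\$//"
}

printf '\t  Test1 \t\n' | strip_ends
```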

Sachin Patel
Honored Contributor

Re: Getting rid of extra spaces and tabs???

Hi Robert,
My choice is sed

# delete leading whitespace (spaces, tabs) from front of each line
# aligns all text flush left
cat file | sed 's/^[ \t]*//' >newfile
#mv newfile file

Then
# delete leading whitespace (spaces, tabs) from front of each line
# aligns all text flush left
cat file | sed 's/^[ \t]*//' > newfile
#mv newfile file

Then remove extra spaces from the middle (again substituting the actual TAB for \t)
cat file | sed 's/[ \t][ \t]*/ /g' > newfile
#mv newfile file

Sachin

Is photography a hobby or another way to spend $
Sachin Patel
Honored Contributor

Re: Getting rid of extra spaces and tabs???

Oops, I typed the same thing again in the second step.

# delete trailing whitespace (spaces, tabs) from end of each line
cat file | sed 's/[ \t]*$//' > newfile
#mv newfile file

Sachin
Is photography a hobby or another way to spend $
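
The three sed steps (leading, trailing, middle) can also be run in a single pass, which avoids the intermediate files. The sketch below assumes a sed that supports the POSIX [[:blank:]] class (space and tab); if yours does not, a bracket containing a space and a literal tab should behave the same way. The function name clean_ws is hypothetical.

```shell
# One sed invocation, three substitutions applied to every line:
#   1. delete leading blanks
#   2. delete trailing blanks
#   3. replace each remaining run of blanks with a single space
clean_ws() {
    sed -e 's/^[[:blank:]]*//' \
        -e 's/[[:blank:]]*$//' \
        -e 's/[[:blank:]][[:blank:]]*/ /g'
}

printf '\t Test1 \t \t Test2 \t\n' | clean_ws
```

Usage would be something like: clean_ws < file > newfile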
Robert Fisher_1
Frequent Advisor

Re: Getting rid of extra spaces and tabs???

Thanks everyone for your fast responses. I should have mentioned that I needed a script solution because I will be doing this on a weekly basis. Vi would not be a good answer. Because the re-formatter I am using is written in Perl, I saw how to add A. Clay's two subroutines to that script and now it's perfect.

Thanks again, Bob.