
script help

 
kholikt
Super Advisor


I am writing a script to parse a tab-delimited file. I am trying to replace every tab with a semicolon.

The input file looks like this.

abc_abcsgsap_PRD Aborted trans 12/16/03 05:00:00 1071522000 12/16/03 09:38:30 1071538710 0:004:38 0.00 0 1 4 0 0 1 0 1 0 0% root.sys@mwscm.esc 2003/12/16-12
abc_abcsapdev_DEV Completed trans 12/16/03 05:00:06 1071522006 12/16/03 05:02:02 1071522122 0:00 0:01 0.18 1 0 2 0 0 0 5 5 5 100% root.sys@mwscm.esc 2003/12/16-13
abc_abcgqa_QAS Completed trans 12/16/03 05:00:11 1071522011 12/16/03 05:03:19 1071522199 0:000:03 0.32 1 0 3 0 0 0 4 4 4 100% root.sys@mwscm.esc 2003/12/16-14
abc_abcsgsap In Progress full 12/16/03 13:19:29 1071551969 - 0 0:00 0:14 6.14 1 04 1 4 0 3 8 12822 38% SPOP\E-SERVICES@spop.esc 2003/12/16-27

After processing the output should be like this.
field1;field2;field3

I have attached part of my script. The current problem is that my script cannot tell a tab from a space: if the input file contains a field like "In Progress", it produces field1;In;Progress; instead of field1;In Progress.
abc
8 REPLIES
Siem Korteweg
Advisor
Solution

Re: script help

Use sed to replace all tabs:

echo "${LINE}" | sed -e 's/<TAB>/;/g' >>${DESTDIR}/${FILENAME}

Here <TAB> stands for a literal TAB character: be sure to type a real TAB between the first two slashes.

When you want only specific fields, use awk with a TAB as field-separator:

echo "${LINE}" | awk -F"<TAB>" '{ print ... }' >>${DESTDIR}/${FILENAME}

Again, <TAB> stands for a literal TAB character: be sure to use a real TAB as the argument of the -F option of awk.
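
For anyone who has trouble typing a literal TAB, here is a minimal sketch of the same sed replacement that builds the TAB with printf instead. The variable name TAB, the helper name tabs_to_semicolons and the sample record are made up for illustration; they are not taken from the original script.

```shell
#!/bin/sh
# printf '\t' emits a real TAB, so the script does not depend on
# how the editor or terminal handles the TAB key.
TAB=$(printf '\t')

# Hypothetical helper: replace every TAB on stdin with a semicolon.
# Spaces are left alone, so "In Progress" stays one field.
tabs_to_semicolons() {
    sed "s/${TAB}/;/g"
}

# Sample record (invented) with a space inside the second field.
printf 'abc_abcsgsap\tIn Progress\tfull\n' | tabs_to_semicolons
```

Run on the sample record, this should print abc_abcsgsap;In Progress;full.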
Mark Grant
Honored Contributor

Re: script help

You could try this

tr "\t" ";" < myfile
Never preceed any demonstration with anything more predictive than "watch this"
Elmar P. Kolkman
Honored Contributor

Re: script help

Some shells have problems with the 'TAB'. When typing it in on the command line, first enter a '\' and then the 'TAB' or first a 'CTRL-V' and then the 'TAB'. If it shows up as a 'CTRL-I' ('^I'), don't despair: a 'TAB' is the same as a 'CTRL-I'.

The replacement doesn't need the echo; sed can read the file directly. In the command below, ^I is a literal TAB, not the two characters ^ and I:
sed 's|^I|;|g' output.ssf

Every problem has at least one solution. Only some solutions are harder to find.
Graham Cameron_1
Honored Contributor

Re: script help

From your script I see you are using awk to extract just certain fields, not all of them.
Therefore your simplest way forward is an amalgam of the above suggestions: just change
awk '{printf $1 ...
to
awk -F"\t" '{printf $1 ...
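
As a rough sketch of that change, the function below splits on TAB only and joins the selected fields with semicolons through OFS. The choice of fields 1 to 3, the helper name pick_fields and the sample record are assumptions made for illustration, since the original field list is not shown in the thread.

```shell
#!/bin/sh
# Hypothetical helper: read tab-delimited records on stdin and print
# fields 1, 2 and 3 joined by semicolons.
# -F'\t' splits on TAB only, so a space inside a field is preserved.
# OFS=';' is the separator awk puts between the arguments of print.
pick_fields() {
    awk -F'\t' -v OFS=';' '{ print $1, $2, $3 }'
}

# Sample record (invented) with four tab-separated fields.
printf 'abc_abcsgsap\tIn Progress\tfull\t12/16/03\n' | pick_fields
```

With the sample record this should give abc_abcsgsap;In Progress;full.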

-- Graham
Computers make it easier to do a lot of things, but most of the things they make it easier to do don't need to be done.
kholikt
Super Advisor

Re: script help

Thanks for all the help.

Just one more question: is there an easy way in sed to remove some fields, using semicolon as the delimiter?

For example I have

field1;field2;field3;field4

I want to remove field2 and field4 from every line of the file.

Currently I use a loop that reads the file line by line and does a print $1 $3 to output just field1;field3.

I think this is very cumbersome.

abc
Graham Cameron_1
Honored Contributor

Re: script help

To answer your last question, (I am more familiar with awk than with sed)....

awk -F";" '{printf ("%s;%s\n", $1, $3)}' infile > outfile

-- Graham
Computers make it easier to do a lot of things, but most of the things they make it easier to do don't need to be done.
Mark Grant
Honored Contributor

Re: script help

You probably need to do something like

awk -F";" -v OFS=";" '{print $1,$3}' datafile

Although this looks like the same thing, if your data is already in the file datafile, or if you pipe the data into awk, you will not need to write the loop yourself and it will be much faster.
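
If the number of fields varies from line to line, one possible variation is to drop fields 2 and 4 rather than list the ones to keep. This is only a sketch: the helper name drop_fields_2_and_4 and the sample lines are made up, and the whole input is handled by a single awk process instead of a shell loop.

```shell
#!/bin/sh
# Hypothetical helper: remove fields 2 and 4 from semicolon-delimited
# records on stdin and keep every other field in its original order.
drop_fields_2_and_4() {
    awk -F';' '{
        out = ""; sep = ""
        for (i = 1; i <= NF; i++) {
            if (i == 2 || i == 4) continue   # skip the unwanted fields
            out = out sep $i
            sep = ";"                        # separator only between kept fields
        }
        print out
    }'
}

# Two invented sample lines, one with four fields and one with five.
printf 'field1;field2;field3;field4\nfield1;field2;field3;field4;field5\n' |
    drop_fields_2_and_4
```

The first sample line should come out as field1;field3 and the second as field1;field3;field5.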

Never preceed any demonstration with anything more predictive than "watch this"
Elmar P. Kolkman
Honored Contributor

Re: script help

You can do it with cut:
cut -d\; -f1,3,5-

Using sed, it becomes trickier...
It depends on how many fields you have in the input. If it is always more than 4, sed would look like this:
sed 's|^\([^;]*;\)[^;]*;\([^;]*;\)[^;]*;\(.*\)$|\1\2\3|'

I would go for cut or awk, not for sed in this case...
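
To illustrate why the field count matters, here is a small sketch that wraps the two commands above in functions and feeds them the same lines. The function names with_cut and with_sed and the sample input are invented for the comparison.

```shell
#!/bin/sh
# Hypothetical wrappers around the cut and sed commands from this post.
with_cut() {
    cut -d';' -f1,3,5-
}

with_sed() {
    sed 's|^\([^;]*;\)[^;]*;\([^;]*;\)[^;]*;\(.*\)$|\1\2\3|'
}

# Five fields: both commands remove fields 2 and 4.
printf 'a;b;c;d;e\n' | with_cut
printf 'a;b;c;d;e\n' | with_sed

# Four fields: cut still removes fields 2 and 4, but the sed pattern
# needs four semicolons to match, so the line passes through unchanged.
printf 'a;b;c;d\n' | with_cut
printf 'a;b;c;d\n' | with_sed
```

On the five-field line both should print a;c;e. On the four-field line cut should print a;c while sed leaves a;b;c;d untouched, which is the case the "more than 4" condition guards against.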
Every problem has at least one solution. Only some solutions are harder to find.