Operating System - HP-UX
Using sed or awk to validate a file content

 
Melvin Thong
Advisor

Hi Unix Gurus,

I am new to the sed and awk commands. I have a text file containing lines of records. The record fields are delimited by ";". A complete record has 26 fields separated by 25 semicolons.

eg:
A;B;C;D;E;F;G;H;I;J;K;L;M;N;O;P;Q;R;S;T;U;V;W;X;Y;Z

Our objective is a shell script that validates whether every record in the text file is complete in terms of number of columns. My idea is to count the number of delimiters in each record.

If a record has 25 delimiters, the complete record is written to another file; if it has fewer than 25, the record is written to a reject file.

Can the above objective be accomplished using the sed or awk command? I am stuck at this point using sed or awk to meet the objective.

Any suggestion is very much appreciated. Thank you in advance!

Regards,
Melvin
Hein van den Heuvel
Honored Contributor

Re: Using sed or awk to validate a file content

As awk reads a line, it splits it into fields and stores the field count in the variable NF.
Awk's default field separator is white space,
but you can make it anything, including a semicolon (which you'll need to escape for the shell).
The two notions combined solve your problem:

awk -F\; '(NF==26); (NF!=26) {print $0 >> "bad-file"}' < mixed-file > good-file
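
A minimal sketch of how NF behaves with a semicolon separator, using made-up sample data. The file names sample.txt, counts.txt, good.txt and bad.txt are assumptions chosen for illustration, not names from the original post:

```shell
# Two sample records: the first has 26 fields, the second only 3.
printf '%s\n' 'A;B;C;D;E;F;G;H;I;J;K;L;M;N;O;P;Q;R;S;T;U;V;W;X;Y;Z' 'A;B;C' > sample.txt

# Show the field count awk computes for each record (NF).
awk -F';' '{ print NF }' sample.txt > counts.txt

# Route complete records to good.txt and everything else to bad.txt.
# Inside awk, ">" truncates the file once per run and then appends.
awk -F';' 'NF == 26 { print > "good.txt"; next } { print > "bad.txt" }' sample.txt
```

Here counts.txt should contain 26 and 3, one per line. Quoting the separator as -F';' is an alternative to escaping it with a backslash.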


Myself, I'd spend a few more minutes and create a Perl script to open the good and bad files, loop over the input, print each record to one or the other, and close.

hth,
Hein.
Elmar P. Kolkman
Honored Contributor

Re: Using sed or awk to validate a file content

It cannot be done simply with sed alone, though it is easy to do by combining sed with wc:

cat files | while read line
do
if [ $(echo "$line" | sed 's|[^;]||g' | wc -c) -ne 26 ]
then
echo $line > rejectfile
else
echo $line > another_file
fi

done

This yields the number of ';' characters plus 1, which equals the number of columns. The plus 1 comes from the newline character at the end of the line, which wc -c also counts.
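
The counting step on its own can be sketched as follows, using the 26-field example record from the question. The variable names line and count are assumptions for illustration:

```shell
# A record with 25 semicolons (26 fields), as in the original question.
line='A;B;C;D;E;F;G;H;I;J;K;L;M;N;O;P;Q;R;S;T;U;V;W;X;Y;Z'

# Delete every character that is not a semicolon, then count bytes.
# echo appends a newline, so wc -c reports 25 semicolons + 1.
count=$(echo "$line" | sed 's/[^;]//g' | wc -c)

# Arithmetic expansion strips any padding some wc implementations emit.
echo $((count))
```

This prints 26 for a complete record, which is why the test in the loop compares against 26 rather than 25.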
Every problem has at least one solution. Only some solutions are harder to find.
john korterman
Honored Contributor

Re: Using sed or awk to validate a file content

Hi Melvin,
a more traditional approach:

#!/usr/bin/sh
# Check CORRECT num of delim in $1

typeset -i NUM_CHARS=0 AFTER_DEL=0 DIFF=0 CORRECT=25
typeset DELIM="\;"
OK_FILE=./okrecs
REJ_FILE=./rejrecs

while read line
do
NUM_CHARS=$(echo "$line"| wc -c)
AFTER_DEL=$(echo "$line" | tr -d $DELIM |wc -c)
DIFF=$(( $NUM_CHARS - $AFTER_DEL ))
if [ "$DIFF" = "$CORRECT" ]
then
echo "$line" >> $OK_FILE
else
echo "$line" >> $REJ_FILE

fi
done <$1

Set OK_FILE and REJ_FILE to something appropriate and run it with your infile as $1.

regards,
John K.
it would be nice if you always got a second chance
Elmar P. Kolkman
Honored Contributor

Re: Using sed or awk to validate a file content

Oops... I see my solution truncates the resulting files, returning only the last reject and good line. Should be >> instead of > in the echo's, but you had already seen that of course ;-)
Every problem has at least one solution. Only some solutions are harder to find.
Hein van den Heuvel
Honored Contributor

Re: Using sed or awk to validate a file content


I'm curious to know why, judging by the assigned points, the author appears to value a simple one-line solution using the requested tool less than a broken, complex solution which will perform like a dog due to multiple forks per record.
(And there was even a bonus solution explanation! :-)

The explanation could be that I simply read too much into the points assigned: the author may be assigning medium points to a first solution while waiting to see if something better shows up.

[no, I don't need more points, my boss might get worried where I found the time :^].

Just curious,
Hein.
Todd McDaniel_1
Honored Contributor

Re: Using sed or awk to validate a file content

A great book you need to get is:

The AWK Programming Language, by the original authors of the language: Aho, Weinberger, and Kernighan.

It is a great little book, and you will be a master of awk by the time you are done. The sed & awk book from O'Reilly is good as well; you should be able to find it at a used book store for $5.
Unix, the other white meat.
Graham Cameron_1
Honored Contributor
Solution

Re: Using sed or awk to validate a file content

I'm with Hein on this.
My one-liner would be marginally shorter still...

awk -F\; '(NF!=26) {print >> "bad-file";next} {print}' mixed-file > good-file

-- Graham
Computers make it easier to do a lot of things, but most of the things they make it easier to do don't need to be done.
Jean-Luc Oudart
Honored Contributor

Re: Using sed or awk to validate a file content

cat | awk '{
if(split($0,a,";")!=26) {print "bad"; exit}
}'

Rgds
Jean-Luc
fiat lux
H.Merijn Brand (procura
Honored Contributor

Re: Using sed or awk to validate a file content

Why does no-one suggest perl?

# perl -paF\; -e'select(@F==26?STDOUT:STDERR)' mixed-file > good-file 2>bad-file

Enjoy, Have FUN! H.Merijn
Elmar P. Kolkman
Honored Contributor

Re: Using sed or awk to validate a file content

Perhaps because the question was to do it in sed or awk? ;-)
Every problem has at least one solution. Only some solutions are harder to find.
Melvin Thong
Advisor

Re: Using sed or awk to validate a file content

I am sorry, Hein, if my point assignment left you wondering. I do value your input and explanation. It was indeed a good, simple one-line solution!

After looking through all the replies, I found that the solutions provided met the objective. Surprisingly, the performance (execution speed) of all these different approaches was almost equal when I tried them on a big file. Perhaps, as Hein commented, some ways of coding take more resources than others. Well, that's a good point to consider!

Lastly, I appreciate all your contributions here. Thank you very much!
Hein van den Heuvel
Honored Contributor

Re: Using sed or awk to validate a file content

And thank you for a quick closure message!
Hein.