Data formatting

 
SOLVED
Victor Pavon
Advisor

Data formatting

Hello:

I was given the task of formatting data blocks from our legacy system into a flat ASCII format. I created a simple script that worked great until the programmers started sending huge data records beyond the old system maximum of 756 characters. We agreed to insert an underscore (_) at position 756 to indicate that the next line is a continuation of the same record. So my idea was to search the data for an underscore followed by a line feed (_\012) and delete the pair, thereby concatenating the next line. This was not as simple as I had planned. See the model script below:
...
# Split data stream into 756-byte chunks and delete spaces that follow a ';'
fold -b -w756 $1 | sed 's/;[ ]*/;/g' > $1.foo
# replace '_' with 'u' and '\012' with 'l'
sed 's/_/u/g' $1.foo | tr '\012' 'l' > $1.mid
# add an end of record for sed to see bottom of file
echo '' >> $1.mid
# Delete 'ul' (concatenate), add linefeeds and put the '_' back where they should be
sed 's/ul//g' $1.mid | tr 'l' '\012' | tr 'u' '_' > $1.dat
...

The problem with this is that I am doing I/O like a madman. The script writes three times as many files as needed to make one usable output file. Also, the input data may contain l's and u's, causing unexpected results.
This is a question for all of you sed, awk and perl masters.
Is there a way to get a flat ASCII file with the fewest intermediate steps? Maybe a one-liner (or two)?

Appreciate any ideas. Included is a tiny sample input file.
3 REPLIES
curt larson_1
Honored Contributor

Re: Data formatting

Maybe this will work for you (ksh):

fold -b -w756 "$1" |
while IFS= read -r line
do
    # everything but the last char
    x=${line%?}
    lastChar=${line#"$x"}
    if [[ $lastChar = "_" ]]; then
        # print the line without the ending underscore
        # and without a newline
        print -nr -- "$x"
    else
        print -r -- "$line"
    fi
done > "$1.dat"
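
For shells that lack ksh's `print`, roughly the same loop can be sketched in plain POSIX sh with `printf`. This is only a sketch: the function name `join_continued` and the sample data are made up for illustration, and it assumes the lines arriving on stdin are already folded.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): join each line that ends
# in "_" with the line that follows it.
join_continued() {
    # IFS= and -r keep leading blanks and backslashes intact
    while IFS= read -r line; do
        case $line in
            *_) printf '%s' "${line%_}" ;;   # drop the marker, no newline
            *)  printf '%s\n' "$line" ;;     # ordinary line, keep newline
        esac
    done
}

# Made-up sample: the first record is split across two lines.
printf 'first half_\nsecond half\nwhole record\n' | join_continued
```

On a real file it could be run as `fold -b -w756 "$1" | join_continued > "$1.dat"`.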
curt larson_1
Honored Contributor
Solution

Re: Data formatting

Same thing using awk:

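fold -b -w756 "$1" |
awk '{
    x = length($0);                # line length
    lastChar = substr($0, x, 1);   # last character
    b = substr($0, 1, x - 1);      # everything but the last character
    if ( lastChar == "_" )
        printf("%s", b);           # continuation: no underscore, no newline
    else
        printf("%s\n", $0);
}'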
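
A quick way to check the awk approach on a small sample is sketched below. To keep it readable it assumes a width of 10 instead of 756. The names `join_records` and `WIDTH` are hypothetical, and the extra length test is an assumption on my part: it treats an underscore as a continuation marker only when it sits in the last column, so shorter lines that happen to end in `_` are left alone.

```shell
#!/bin/sh
# Assumed parameter: 756 in the thread, 10 here for the demo.
WIDTH=${WIDTH:-10}

# Hypothetical helper: join continuation lines read from stdin.
join_records() {
    awk -v w="$WIDTH" '{
        n = length($0)
        # Continuation only if the line is full width and ends in "_"
        if (n == w && substr($0, n, 1) == "_")
            printf("%s", substr($0, 1, n - 1))
        else
            printf("%s\n", $0)
    }'
}

# Made-up sample: the first record spans two lines; "short_" is not
# full width, so its underscore is kept as data.
printf 'ABCDEFGHI_\nJKL\nshort_\nXYZ\n' | join_records
```

With `WIDTH=756` the same function could sit after `fold -b -w756 "$1"` in place of the inline awk.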
Victor Pavon
Advisor

Re: Data formatting

Thank you Curt, both solutions are winners. My personal preference is for awk, but as the great master said: "There is more than one way to skin a cat."
Thanks again.
Victor