topic Re: Data formatting in Operating System - HP-UX

Data formatting

Victor Pavon — Mon, 16 Feb 2004 18:45:24 GMT

Hello:

I was given the task of converting data blocks from our legacy system to a flat ASCII format. I wrote a simple script that worked well until the programmers started sending huge data records that exceed the old system's maximum of 756 characters. We agreed to insert an underscore (_) at position 756 to indicate that the next line is a continuation of the same record. So my idea was to search the data for an underscore followed by a line feed (_\012) and delete the pair, thereby joining the next line to the current one. This was not as simple as I had planned. See the model script below:
...
# Split the data stream into 756-byte chunks and delete the spaces after each ';'
fold -b -w756 $1 | sed 's/;[ ]*/;/g' > $1.foo
# Replace '_' with 'u' and '\012' with 'l'
sed 's/_/u/g' $1.foo | tr '\012' 'l' > $1.mid
# Add a final newline so sed sees a terminated last line
echo '' >> $1.mid
# Delete 'ul' (concatenate), then turn 'l' back into a linefeed and 'u' back into '_'
sed 's/ul//g' $1.mid | tr 'l' '\012' | tr 'u' '_' > $1.dat
...

The problem with this is that I am doing I/O like a madman: the script writes three files to produce one usable output file. Also, the input data may itself contain l's and u's, which causes unexpected results.
This is a question for all of you sed, awk and perl masters.
Is there a way to get a flat ASCII file with the fewest intermediate steps? Maybe a one-liner (or two)?

Appreciate any ideas. Included is a tiny sample input file.
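
To make the l/u problem concrete, here is a minimal sketch that runs the same substitution chain through pipes instead of temporary files. The function name placeholder_join and the two sample records are made up for illustration; they are not part of the original script or data.

```shell
# Hypothetical helper: the u/l substitution chain from the model script,
# reading standard input and writing standard output (the fold step is omitted).
placeholder_join() {
    # The echo supplies the final newline that the script adds with echo '' >> $1.mid
    { sed 's/_/u/g' | tr '\012' 'l'; echo; } |
        sed 's/ul//g' | tr 'l' '\012' | tr 'u' '_'
}

# Clean data: the two chunks are joined into "abcdef" as intended.
printf 'abc_\ndef\n' | placeholder_join

# Data that already contains u and l: the intended result is "fullstop",
# but the first two output lines are "f" and "stop".
printf 'full_\nstop\n' | placeholder_join
```

Any approach that borrows ordinary characters as placeholders has this weakness, which is one reason to test the last character of each line directly instead.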

Re: Data formatting

curt larson_1 — Mon, 16 Feb 2004 19:41:45 GMT

maybe this will work for you

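fold -b -w756 "$1" |
while IFS= read -r line
do
    # everything but the last character
    x=${line%?}
    # the last character: strip the prefix $x, quoted so it is matched literally
    lastChar=${line#"$x"}
    if [[ $lastChar = "_" ]]; then
        # print the line without the trailing underscore
        # and without a newline, so the next chunk continues the record
        print -nr -- "$x"
    else
        print -r -- "$line"
    fi
done > "$1.dat"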
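
The loop above depends on two parameter expansions to isolate the last character of each line. A minimal sketch of just that step follows; the helper name last_char is hypothetical and exists only for illustration.

```shell
# Hypothetical helper: prints the last character of its first argument.
last_char() {
    line=$1
    # ${line%?} removes the shortest suffix matching "?", i.e. one character
    x=${line%?}
    # ${line#"$x"} removes the prefix $x; the inner quotes make the shell
    # treat $x as literal text rather than as a pattern
    printf '%s\n' "${line#"$x"}"
}

last_char 'abc_'    # prints _
last_char 'a*b'     # prints b
```

Without the inner quotes, characters such as * or ? in the data would be interpreted as pattern characters, so quoting is the safer choice for arbitrary records.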

Re: Data formatting

curt larson_1 — Mon, 16 Feb 2004 19:54:50 GMT

same thing using awk

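fold -b -w756 "$1" |
awk '{
    # length of the current line
    x = length($0);
    # its last character
    lastChar = substr($0, x, 1);
    # the line without its last character
    b = substr($0, 1, x - 1);
    # continuation lines are printed without the underscore and without a newline
    if (lastChar == "_")
        printf("%s", b);
    else
        printf("%s\n", $0);
}'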
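
One possible refinement, offered as a sketch rather than as part of the posted answer: the script above joins any line that ends in an underscore, including a short record whose data simply ends with "_". Since the agreed marker sits at position 756, the join can be limited to lines of exactly that length. The wrapper name join_records and its optional width argument are assumptions added for illustration.

```shell
# Hypothetical wrapper: joins a line to the next one only when it is exactly
# "width" characters long and ends in "_". The width defaults to 756.
join_records() {
    width=${1:-756}
    awk -v w="$width" '{
        n = length($0)
        if (n == w && substr($0, n, 1) == "_")
            printf("%s", substr($0, 1, n - 1))   # continuation: drop "_" and the newline
        else
            printf("%s\n", $0)                   # complete record or final chunk
    }'
}

# Small demonstration with a width of 5 instead of 756:
# "abcd_" is a continuation, while "ef_" is too short to be one.
printf 'abcd_\nef_\ngh\n' | join_records 5
```

In the original pipeline this would be used as fold -b -w756 "$1" | join_records > "$1.dat". A record that is exactly 756 characters long and legitimately ends in an underscore would still be treated as a continuation, so the marker convention itself stays ambiguous in that one case.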

Re: Data formatting

Victor Pavon — Wed, 18 Feb 2004 09:48:41 GMT

Thank you Curt, both solutions are winners. My personal preference is for awk, but as the great master said: "There is more than one way to skin a cat."
Thanks again.
Victor