SOLVED
Gary Glick
Frequent Advisor

scripting question

Hi all,

Here's what I need help with:
I receive a data file with several records inside and each record has several lines.

The file follows the following format:
H0001
D0001
D0001
D0001
T0001
H0002
D0002
D0002
T0002
H0003
...., etc.


The H indicates a Header Record, the D a Detail Record and T a Terminator Record.
The H, D or T is the first character of each line, and the number of Detail lines varies.

What I need to do is split the single file into multiple files so that each H-D-D-D-T section of the file is in a separate file with a different name. The file name doesn't matter much, perhaps a file.timestamp format. I'm not getting very far with it and was wondering if I could get some help or direction.

I've got perl on the system but I'm not very conversant with it, so a shell script would be preferred for longer term support. Or, of course, I could start learning perl ;-)

Thanks a Lot

Gary
10 REPLIES
curt larson_1
Honored Contributor

Re: scripting question

just a quicky so it is untested

file="f"
num=0

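# ksh syntax: print and ((...)) are ksh builtins
while read -r var
do
    case $var in
    H*) ((num = num + 1))
        # a header line starts a new output file
        print -r -- "$var" > "${file}${num}"
        ;;
    D*|T*) # detail and terminator lines are appended to the current file
        print -r -- "$var" >> "${file}${num}"
        ;;
    esac
done < yourFile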
Patrick Wallek
Honored Contributor

Re: scripting question

Hmmm.....

Here's something off the top of my head ---

#!/usr/bin/sh

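while read LINE
do
    # The first character of the line gives the record type.
    FIRST=$(echo "$LINE" | cut -c 1)
    if [ "${FIRST}" = "H" ] ; then
        # Header record: pick a new file name from the current date and time.
        FILE=H_$(date +%m%d%Y)_$(date +%H%M%S)
        echo "${LINE}" >> "${FILE}"
    else
        echo "${LINE}" >> "${FILE}"
    fi
done < datfile    # datfile is the input file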


This should give you files named like H_05192004_154715, each starting with an H line. The script writes to that file until it finds another line with H as the first character, then starts a new file. I haven't tested it, but I think it should work.
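One thing to watch with date-and-time names: a loop like this normally handles many records within the same second, so a timestamp alone may not give each record its own file. Below is a minimal sketch of one way around that, assuming a per-record counter is appended to a timestamp taken once at the start. The names split_demo, sample.dat and rec_ are made up for the example, and the sample data simply follows the layout in the question.

```shell
#!/bin/sh
# Sketch only: split_demo, sample.dat and the rec_ prefix are made-up names.
rm -rf split_demo && mkdir split_demo

# Small sample in the H/D/T layout from the question.
cat > split_demo/sample.dat <<'EOF'
H0001
D0001
D0001
T0001
H0002
D0002
T0002
H0003
D0003
T0003
EOF

stamp=$(date +%Y%m%d_%H%M%S)   # taken once, so it is identical for every record
count=0
out=

while IFS= read -r line
do
    case $line in
    H*) # A header starts a new record, so switch to a new output file.
        count=$((count + 1))
        out=$(printf 'split_demo/rec_%s_%03d' "$stamp" "$count")
        ;;
    esac
    # Lines seen before the first header have no file yet and are skipped.
    [ -n "$out" ] && printf '%s\n' "$line" >> "$out"
done < split_demo/sample.dat

ls split_demo
```

The zero-padded counter keeps the files in record order when listed, and the timestamp still tells you which run produced them.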
curt larson_1
Honored Contributor

Re: scripting question

awk might be a bit faster

awk '
BEGIN {name="f";num=0;}
/^H/ {
    # header: build the next file name and start the file
    num += 1;
    fname=sprintf("%s%d",name,num);
    print $0 > fname;
    next;
}
/^D/ {
    print $0 >> fname
    next;
}
/^T/ {
    print $0 >> fname
}' yourFile
John Poff
Honored Contributor

Re: scripting question

Hi,

Here is one way to do it in Perl:

#!/usr/bin/perl
while (<>)
{
if (/^H/){
close (OUTF);
$count++;
$outfile="FILE." . $count;
open(OUTF,">$outfile") or die "Can't open output file $outfile";
}
print OUTF $_;
}


JP
Dave La Mar
Honored Contributor

Re: scripting question

Gary -
We do just this thing on a similar data file.
Attached is a snip of the process with our naming convention edited.
The array lets you cycle through the records and use sed to print the lines you want to separate files.
Note that this snip assumes each new record starts with H and that H does not appear in the data portion.

Best of luck.

Regards,

dl
"I'm not dumb. I just have a command of thoroughly useless information."
Marvin Strong
Honored Contributor

Re: scripting question

perl -ne 'if(/H00(\d+)/.../T00($1)/){open O, ">>$1";print O;close O;}' inputfile

This will create files named after the digits that follow H00 on each header line (01, 02, 03, etc. for the sample data).

One of the other ways might be better.

Francisco J. Soler
Honored Contributor

Re: scripting question

Hi Gary,

My two-line awk script:

awk '
/^H/ {count++ ; filename="prefix_" count}
{ print >> filename }' filein

where "prefix_" is the prefix you want for the output file names.
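If the input holds a lot of records, it may be worth closing each output file before moving on to the next, since awk keeps a file open until close() is called or the program ends, and some awk implementations limit how many files can be open at once. A rough sketch of that variation, run against a small made-up sample (awk_demo and the sample contents are assumptions for the example, not part of the original script):

```shell
#!/bin/sh
# Sketch only: awk_demo and its contents are placeholder names and data.
rm -rf awk_demo && mkdir awk_demo
printf '%s\n' H0001 D0001 T0001 H0002 D0002 D0002 T0002 > awk_demo/filein

awk -v prefix="awk_demo/prefix_" '
/^H/ {
    # Done with the previous record, so release its file before switching.
    if (filename != "") close(filename)
    filename = prefix (++count)      # prefix_1, prefix_2, ...
}
# Lines before the first H line have no file yet and are skipped.
filename != "" { print > filename }
' awk_demo/filein

ls awk_demo
```

Because each name is opened only once, a plain > is enough here; the file is truncated when first opened and later prints to the same name keep adding to it.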

Frank.
Linux?. Yes, of course.
Patrick Wallek
Honored Contributor
Solution

Re: scripting question

Here's an updated version of my script. This one DOES work, as I just had a chance to do some quick testing and debugging of it.

#!/usr/bin/sh

COUNT=0
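while IFS= read -r LINE
do
    # The first character of the line gives the record type.
    FIRST=$(echo "$LINE" | cut -c 1)
    if [ "${FIRST}" = "H" ] ; then
        # Header record: start a new file; COUNT keeps names unique within the same second.
        FILE=H_${COUNT}_$(date +%m%d%Y)_$(date +%H%M%S)
        echo "${LINE}" >> "${FILE}"
        COUNT=$((COUNT + 1))
    else
        echo "${LINE}" >> "${FILE}"
    fi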
done < datfile

Michael Schulte zur Sur
Honored Contributor

Re: scripting question

Hi,

There must be an easy solution with csplit. I'm writing this from memory, so please don't hit me. ;-)

hth,

Michael

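# {99} is an arbitrary upper bound on the repeats; -k keeps the pieces already written when the pattern runs out
csplit -k -f spl filetosplit '/^H[0-9]*/' '{99}'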
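For anyone trying the csplit route on a system with GNU csplit, here is a small end-to-end sketch. It assumes the GNU-only '{*}' repeat and the -z option, neither of which may exist in other csplit versions, and csplit_demo plus the sample data are made up for the example.

```shell
#!/bin/sh
# Sketch only: assumes GNU csplit; csplit_demo and the sample data are placeholders.
rm -rf csplit_demo && mkdir csplit_demo
printf '%s\n' H0001 D0001 D0001 T0001 H0002 D0002 T0002 > csplit_demo/filetosplit

# -s     do not print the size of each piece
# -z     drop empty pieces (the file starts with an H line, so the
#        piece before the first match would otherwise be empty)
# '{*}'  repeat the pattern as many times as it matches (GNU extension)
csplit -s -z -f csplit_demo/spl csplit_demo/filetosplit '/^H[0-9]*/' '{*}'

ls csplit_demo
```

The pattern is quoted so the shell does not try to expand the brackets and asterisk as a file name pattern.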