script help

 
SOLVED
maliaka
Advisor

script help

I'm trying to break the loop and continue with the next line:

the file has 10 lines of the record (f1), I want to break when it reaches 5 and continue with the next line (f2) and so on.

what is the best way to do it?
10 points for the best answer.

f1
f1
f1
f1
f1
f1
f1
f1
f1
f1
fi
f2
f2
f3
f3
f3
f3
f3
f3
f3
f3
f3
f3
f3

19 REPLIES 19
Tim Nelson
Honored Contributor

Re: script help

for x in file1 \
file2
do
awk 'NR < 6 {print}' $x
done
A. Clay Stephenson
Acclaimed Contributor
Solution

Re: script help

Well, the "best" answer is the one you come up with and this is so simple you should really do it yourself. Here's one approach using uniq -c which will output the number of identical lines plus the line itself.

#!/usr/bin/sh

typeset -i N=0
typeset -i I=0
typeset -i MAX=5 # max repeats
typeset S=''
typeset INFILE="myinfile"

uniq -c "${INFILE}" | while read N S
do
    I=1
    if [[ ${N} -gt ${MAX} ]]
    then
        N=${MAX}
    fi
    while [[ ${I} -le ${N} ]]
    do
        echo "${S}"
        ((I += 1))
    done
done

If it ain't broke, I can fix that.
James R. Ferguson
Acclaimed Contributor

Re: script help

Hi:

With a pure shell script:

# #!/usr/bin/sh
while read LINE
do
[ -z "${SAVE}" ] && SAVE=${LINE}
if [ "${LINE}" = ${SAVE} ]; then
let i=i+1
[ ${i} -ge 5 ] && continue || echo ${LINE}
else
i=0
SAVE=${LINE}
echo ${LINE}
fi
done < file

Regards!

...JRF...
James R. Ferguson
Acclaimed Contributor

Re: script help

Hi (again):

Oops, that shebang line (line-1) should of course be:

#!/usr/bin/sh

...not:

# #!/usr/bin/sh

Regards!

...JRF...
maliaka
Advisor

Re: script help

James

Yours is good, but it continues to count the lines before it gets to the next string.
Is there a way to avoid counting the lines and just skip to the next string?
A. Clay Stephenson
Acclaimed Contributor

Re: script help

>> Is there a way to avoid counting the lines and just skip to the next string?

Think about what you are asking. Do you know of a "skip to the next (different) string" command? How would you write such a command without reading the intervening data?
If it ain't broke, I can fix that.
maliaka
Advisor

Re: script help

.
Dennis Handly
Acclaimed Contributor

Re: script help

>Clay: Do you know of a "skip to the next (different) string" command? How would you write such a command without reading the intervening data?

You just ask the indexed sequential file to skip to a record with a key greater than the current key. :-)
maliaka
Advisor

Re: script help

Dennis,

Would you mind elaborating?
Victor Fridyev
Honored Contributor

Re: script help

If I understand you correctly,
#!/usr/bin/sh
typeset -i I=0
typeset -i MAX=5
PREV=""
cat FILE | while read CURR; do
if [ $CURR != $PREV ]; then
I=1
echo $CURR
PREV=CURR
else
if [ $I -le $MAX ] ; then
echo $CURR
I=$I+1
fi
fi
done
Entities are not to be multiplied beyond necessity - RTFM
Dennis Handly
Acclaimed Contributor

Re: script help

In COBOL an Indexed file allows you to find records by a key. You can also find records by a key greater than the one you supply. Once you find the record, you can do sequential reads of the records in order.

Of course the file has to have some type of B tree to contain the keys and records and allow these quick searches.

So you would basically do:
1) READ and after 5 matches then:
2) START key > current key
3) repeat at 1)

Since you don't have COBOL, you would have to do what Clay said, skip matching records until you come to a difference.

So unless you have 1000s of records to skip, you should just read and compare.
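
As a minimal sketch of the plain "read and compare" approach, the function below prints at most MAX lines from each run of identical consecutive lines. It assumes the input is already grouped (identical lines are adjacent) and that awk is available; the name cap_repeats is hypothetical and does not come from this thread.

```shell
# cap_repeats MAX FILE (hypothetical name): print at most MAX lines from
# each run of identical consecutive lines in FILE.
cap_repeats() {
    awk -v max="$1" '
        # A new run starts on the first line or when the line changes.
        NR == 1 || ($0 "") != prev { prev = $0 ""; n = 0 }
        # Print the line only while the run is still within the cap.
        ++n <= max
    ' "$2"
}
```

Called as `cap_repeats 5 file`, it still reads every line, but it holds only one line and one counter in memory.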
Dennis Handly
Acclaimed Contributor

Re: script help

>JRF: With a pure shell script:

You forgot to initialize "i". And if you do, you only print 4 of the first group. So why did you have "[ -z "${SAVE}" ]"?
It seems you want SAVE to be empty so you go through the difference code like Victor.

>Victor: PREV=CURR

Typo, you forgot a "$" before CURR.
James R. Ferguson
Acclaimed Contributor

Re: script help

Hi (again):

>Dennis: JRF, You forgot to initialize "i". And if you do, you only print 4 of the first group.

Yes, you're correct - sloppy logic on my part and in fact running with 'sh -x' exposes that. The script should look like:

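#!/usr/bin/sh
typeset -i i=0
while read LINE
do
    # Seed SAVE from the first line so it counts as part of the first group.
    [ -z "${SAVE}" ] && SAVE=${LINE}
    if [ "${LINE}" = "${SAVE}" ]; then
        # Same string as before: count it and print only the first 5.
        let i=i+1
        [ ${i} -gt 5 ] && continue || echo "${LINE}"
    else
        # New string: restart the count at 1 and print it.
        i=1
        SAVE=${LINE}
        echo "${LINE}"
    fi
done < file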


ALSO:

Dennis>: In COBOL an Indexed file allows you to find records by a key.

Yes, that's true, but B-trees and hashes are more germane, assuming the file was built with searches like the one in this question in mind. ;-)

Regards!

...JRF...
maliaka
Advisor

Re: script help

You guys are awesome!

Dennis,

Yes, some lines are over 1000 lines and that is why I kept asking if there is a way to skip them. It'll take forever before I get the final result.
Sorry if my question sounds stupid, but I'd really appreciate any help.
If the shell cannot do it, can Perl do it then?
James R. Ferguson
Acclaimed Contributor

Re: script help

Hi (again):

> some lines are over 1000 lines and that is why I kept asking if there is a way to skip them. It'll take forever before I get the final result.

Are you saying that your file is static in its contents but that you repeatedly want to search it?

If that's true then you could build a hash (index) as a separate file. The index (file) would contain the offset of the first record of each "block" of similar data (akin to what your example shows). Using the index file, you find the key you want in the index; read the offset stored there associated with the key; and using that offset, seek() to the correct position in the data file. While a pure shell script can't do this, Perl can.
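
A rough sketch of that index idea, leaning on awk and tail instead of pure shell or Perl's seek(). Everything here is an assumption for illustration: the function names build_index and first_of_key are hypothetical, keys are whole lines containing no tabs or backslashes, the data file is static and grouped, and whether `tail -c` really seeks rather than reads up to the offset depends on the implementation.

```shell
# build_index DATA (hypothetical name): write "byte-offset<TAB>key" for the
# first line of each block of identical lines. Offsets count from 0.
build_index() {
    LC_ALL=C awk '
        NR == 1 || ($0 "") != prev { printf "%d\t%s\n", off, $0; prev = $0 "" }
        { off += length($0) + 1 }    # +1 for the newline that ends the line
    ' "$1"
}

# first_of_key INDEX DATA KEY MAX (hypothetical name): look KEY up in the
# index, jump to its offset in DATA, and print up to MAX lines of its block.
first_of_key() {
    off=$(awk -F '\t' -v k="$3" '($2 "") == (k "") { print $1; exit }' "$1")
    [ -n "$off" ] || return 1        # key not present in the index
    # tail -c +N starts output at byte N, counting from 1.
    tail -c +"$((off + 1))" "$2" |
        awk -v k="$3" -v max="$4" '($0 "") != (k "") || NR > max { exit } { print }'
}
```

Typical use would be `build_index file > file.idx` once, then `first_of_key file.idx file f3 5` for each lookup. The index has to be rebuilt whenever the data file changes.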

Regards!

...JRF...
Dennis Handly
Acclaimed Contributor

Re: script help

>It'll take forever before I get the final result.

How many total lines? And you want to visit only the first 5 of each set?

If your data is more dynamic, it would have to be sorted; at that point you could build the index.
(How does the file get sorted?)

Or you could just binary search forward in a C program to your guess where the next group starts.

Or in C++, create a multimap.
Hein van den Heuvel
Honored Contributor

Re: script help

1,000 - 100,000 dups is probably not worth your time nor the computer time to try and 'jump' over. Just read and compare.


For better help, please indicate
- an approximate total record count
- whether records are fixed length (allowing for binary search or 'jump aheads').
- do all bytes of each record contribute to uniqueness?
- what data (counters) do you want to retain as well (records, dups, selected,..?)

If the skip-ahead was really important then I would do something like:
After N dups, seek ahead another N dups.
Start with 4.
Repeat if still dup.
Binary search backwards when you have jumped too far.
So within a 10,000 dup series you might read: 1,2,3,4,8,16,32,64,128,256,512,1024,2048,4096,8192,16384,
12288,10240,9216,9728,9984,10112,10048,10016,10000

So that's a good 25 reads to count 10,000,
and only 2 more for every 2 times as many records.
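
The jump-and-back-up search can be sketched as below, under loud assumptions: the whole file is first loaded into an awk array, so this saves comparisons but no I/O (a real skip needs fixed-length records and a seek); identical lines must be adjacent; and the jump doubles from 1 instead of starting at 4. The name cap_repeats_gallop is hypothetical.

```shell
# cap_repeats_gallop MAX FILE (hypothetical name): print at most MAX lines
# of each run, finding the end of every run by doubling jumps and then a
# binary search between the last hit and the first miss.
cap_repeats_gallop() {
    awk -v max="$1" '
        # Return the index of the last line in the run that starts at "start".
        function run_end(start, total,    lo, hi, step, mid) {
            lo = start; step = 1
            # Gallop: keep doubling the jump while the probed line still matches.
            while (lo + step <= total && line[lo + step] == line[start]) {
                lo += step
                step *= 2
            }
            hi = lo + step                  # first probe that missed or ran off the end
            if (hi > total + 1) hi = total + 1
            # Binary search: line[lo] always matches, line[hi] never does.
            while (hi - lo > 1) {
                mid = int((lo + hi) / 2)
                if (line[mid] == line[start]) lo = mid; else hi = mid
            }
            return lo
        }
        { line[NR] = $0 "" }                # store every line as a string
        END {
            for (i = 1; i <= NR; i = last + 1) {
                last = run_end(i, NR)
                stop = (last - i + 1 > max) ? i + max - 1 : last
                for (j = i; j <= stop; j++) print line[j]
            }
        }
    ' "$2"
}
```

The number of probes per run grows with the logarithm of the run length, which is the point of the scheme; for runs of only a few thousand lines a straight read and compare is likely just as fast.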


I guess I'll also have to do the obligatory Perl alternatives! :-)

# perl -ne 'print if $test{$_}++ < 5'

The above does NOT require sorted input.

As written it uses the whole line to indicate uniqueness, but it is readily modified to just use a substring or field.

It will gobble up memory per unique line.
It will be fine for up to 100,000 lines, but might become problematic for millions (of unique records).

What problem are you really trying to solve?

It looks like the requested task will lose a lot of info but does not add much value.

Don't you want to know how many there were?
IF SORTED, no memory consumption:

$ perl -ne 'if ($last ne $_) { print "($n)\n" if $n > 5; $last = $_; $n = 1; print } else { print if $n++ < 5 }'

Don't you want at least an indication that there were more than 5?

$ perl -ne 'print if (($x=$test{$_}++) < 5); print ":\n" if 5==$x'

Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting
Dennis Handly
Acclaimed Contributor

Re: script help

>Hein: - whether records are fixed length (allowing for binary search or 'jump aheads')

Even if not fixed, you can do fuzzy skips by throwing away the partial record.