script help

maliaka · ‎08-31-2007

I'm trying to break the loop and continue with the next line:

The file has 10 lines of the record (f1). I want to break when the count reaches 5 and continue with the next record (f2), and so on.

What is the best way to do it?
10 points for the best answer.

f1
f1
f1
f1
f1
f1
f1
f1
f1
f1
fi
f2
f2
f3
f3
f3
f3
f3
f3
f3
f3
f3
f3
f3

Tim Nelson · ‎08-31-2007

for x in file1 \
         file2
do
    # print only the first 5 lines of each file
    awk 'NR < 6 {print}' "$x"
done

A. Clay Stephenson · ‎08-31-2007

Well, the "best" answer is the one you come up with and this is so simple you should really do it yourself. Here's one approach using uniq -c which will output the number of identical lines plus the line itself.

#!/usr/bin/sh

typeset -i N=0
typeset -i I=0
typeset -i MAX=5 # max repeats
typeset S=''
typeset INFILE="myinfile"

uniq -c ${INFILE} | while read N S
do
    I=1
    if [[ ${N} -gt ${MAX} ]]
    then
        N=${MAX}
    fi
    while [[ ${I} -le ${N} ]]
    do
        echo "${S}"
        ((I += 1))
    done
done

If it ain't broke, I can fix that.

James R. Ferguson · ‎08-31-2007

Hi:

With a pure shell script:

# #!/usr/bin/sh
while read LINE
do
    [ -z "${SAVE}" ] && SAVE=${LINE}
    if [ "${LINE}" = ${SAVE} ]; then
        let i=i+1
        [ ${i} -ge 5 ] && continue || echo ${LINE}
    else
        i=0
        SAVE=${LINE}
        echo ${LINE}
    fi
done < file

Regards!

...JRF...

James R. Ferguson · ‎08-31-2007

Hi (again):

Oops, that shebang line (line 1) should of course be:

#!/usr/bin/sh

...not:

# #!/usr/bin/sh

Regards!

...JRF...

maliaka · ‎08-31-2007

James

Yours is good, but it continues to count the lines before it gets to the next string.
Is there a way to avoid counting the lines and just skip to the next string?

A. Clay Stephenson · ‎08-31-2007

>> Is there a way to avoid counting the lines and just skip to the next string?

Think about what you are asking. Do you know of a "skip to the next (different) string" command? How would you write such a command without reading the intervening data?

If it ain't broke, I can fix that.

maliaka · ‎08-31-2007

.

Dennis Handly · ‎08-31-2007

>Clay: Do you know of a "skip to the next (different) string" command? How would you write such a command without reading the intervening data?

You just ask the indexed sequential file to skip to a record with a key greater than the current key. :-)

maliaka · ‎08-31-2007

Dennis,

Would you mind elaborating?

Victor Fridyev · ‎08-31-2007

If I understand you correctly,
#!/usr/bin/sh
typeset -i I=0
typeset -i MAX=5
PREV=""
cat FILE | while read CURR; do
    if [ $CURR != $PREV ]; then
        I=1
        echo $CURR
        PREV=CURR
    else
        if [ $I -le $MAX ] ; then
            echo $CURR
            I=$I+1
        fi
    fi
done

Entities are not to be multiplied beyond necessity - RTFM

Dennis Handly · ‎08-31-2007

In COBOL an Indexed file allows you to find records by a key. You can also find records by a key greater than the one you supply. Once you find the record, you can do sequential reads of the records in order.

Of course the file has to have some type of B tree to contain the keys and records and allow these quick searches.

So you would basically do:
1) READ and after 5 matches then:
2) START key > current key
3) repeat at 1)

Since you don't have COBOL, you would have to do what Clay said, skip matching records until you come to a difference.

So unless you have 1000s of records to skip, you should just read and compare.
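
A minimal sketch of that "just read and compare" approach, assuming the input is already grouped so that identical lines are adjacent (as in the sample data). The function name first_n_per_run is hypothetical and does not come from any post in this thread. It still reads every line, but it only prints the first MAX lines of each run:

```shell
#!/usr/bin/sh
# first_n_per_run MAX [FILE...]
# Print at most MAX lines from each run of identical consecutive lines.
# Hypothetical helper; assumes identical lines are adjacent in the input.
first_n_per_run() {
    max=$1
    shift
    awk -v max="$max" '
        # A new run starts on the first line or when the line changes.
        # Appending "" forces a string (not numeric) comparison.
        NR == 1 || ($0 "") != prev { n = 0; prev = $0 "" }
        # Pattern with no action: print while the run counter is within MAX.
        ++n <= max
    ' "$@"
}
```

Usage would look like `first_n_per_run 5 file`, or the function can read standard input when no file is given.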

Dennis Handly · ‎08-31-2007

>JRF: With a pure shell script:

You forgot to initialize "i". And if you do, you only print 4 of the first group. So why did you have "[ -z "${SAVE}" ]"?
It seems you want SAVE to be empty so you go through the difference code like Victor.

>Victor: PREV=CURR

Typo, you forgot a "$" before CURR.

James R. Ferguson · ‎09-01-2007

Hi (again):

>Dennis: JRF, You forgot to initialize "i". And if you do, you only print 4 of the first group.

Yes, you're correct - sloppy logic on my part and in fact running with 'sh -x' exposes that. The script should look like:

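#!/usr/bin/sh
typeset -i i=0
while read LINE
do
    # first line of the file: remember it as the current string
    [ -z "${SAVE}" ] && SAVE=${LINE}
    if [ "${LINE}" = "${SAVE}" ]; then
        # same string as before: print only the first 5
        let i=i+1
        [ ${i} -gt 5 ] && continue || echo "${LINE}"
    else
        # new string: restart the count and print it
        i=1
        SAVE=${LINE}
        echo "${LINE}"
    fi
done < file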

ALSO:

>Dennis: In COBOL an Indexed file allows you to find records by a key.

Yes, that's true, but B-trees and hashes are more germane, under the assumption that the file was built with searches like this one in mind. ;-)

Regards!

...JRF...

maliaka · ‎09-01-2007

You guys are awesome!

Dennis,

Yes, some lines are over 1000 lines and that is why I kept asking if there is a way to skip them. It'll take forever before I get the final result.
Sorry if my question sounds stupid but I'd really appreciate any help.
If the shell cannot do it, can Perl do it then?

James R. Ferguson · ‎09-01-2007

Hi (again):

> some lines are over 1000 lines and that is why I kept asking if there is a way to skip them. It'll take forever before I get the final result.

Are you saying that your file is static in its contents but that you repeatedly want to search it?

If that's true then you could build a hash (index) as a separate file. The index (file) would contain the offset of the first record of each "block" of similar data (akin to what your example shows). Using the index file, you find the key you want in the index; read the offset stored there associated with the key; and using that offset, seek() to the correct position in the data file. While a pure shell script can't do this, Perl can.
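
A rough sketch of the index idea using awk and tail from a shell script (so not "pure shell", in keeping with the remark above). The function names build_index and first_n are hypothetical. It assumes newline-terminated lines, identical lines grouped together, keys without backslashes, and a tail whose `-c +N` option seeks on regular files rather than reading up to the offset, which depends on the implementation:

```shell
#!/usr/bin/sh
# build_index DATAFILE
# Print "byte-offset key" for the first line of each block of identical lines.
# LC_ALL=C makes length() count bytes rather than characters.
build_index() {
    LC_ALL=C awk '
        BEGIN { off = 0 }
        NR == 1 || ($0 "") != prev { print off, $0; prev = $0 "" }
        { off += length($0) + 1 }    # +1 for the trailing newline
    ' "$1"
}

# first_n INDEXFILE DATAFILE KEY N
# Look KEY up in the index, jump to its offset, print up to N lines of its block.
first_n() {
    off=$(awk -v key="$3" '
        { k = $0; sub(/^[0-9]+ /, "", k) }   # strip the offset, keep the key
        k == key { print $1; exit }
    ' "$1")
    [ -n "$off" ] || return 1                # key not present in the index
    # tail -c +N counts from 1, so byte offset 0 is +1.
    tail -c +"$((off + 1))" "$2" |
        awk -v key="$3" -v n="$4" 'NR > n || ($0 "") != key { exit } { print }'
}
```

The index only pays off if the data file is static and searched repeatedly, since building it still requires one full pass over the data.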

Regards!

...JRF...

Dennis Handly · ‎09-01-2007

>It'll take forever before I get the final result.

How many total lines? And you want to visit only the first 5 of each set?

If your data is more dynamic, it would have to be sorted; then you could make that index.
(How does the file get sorted?)

Or you could just binary search forward in a C program to your guess where the next group starts.

Or in C++, create a multimap.

Hein van den Heuvel · ‎09-02-2007

1,000 - 100,000 dups is probably not worth your time nor the computer time to try and 'jump' over. Just read and compare.

For better help, please indicate
- an approximate total record count
- whether records are fixed length (allowing for binary search or jump-aheads)
- do all bytes of each record contribute to uniqueness?
- what data (counters) do you want to retain as well (records, dups, selected,..?)

If the skip-ahead was really important then I would do something like:
After N dups, seek ahead another N dups.
Start with 4.
Repeat if still dup.
Binary search backwards if you jumped too far.
So within a 10,000 dup series you might read: 1,2,3,4,8,16,32,64,128,256,512,1024,2048,4096,8192,16384,
12288,10240,9216,9728,9984,10112,10048,10016,10000

So that's a good 25 reads to count 10,000,
and only 2 more for every 2 times as many records.
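
A simplified sketch of that doubling-then-binary-search probe, not the exact sequence above: it starts doubling from record 2 instead of reading the first four sequentially. The function names record and run_length are hypothetical. It assumes fixed-length, non-empty records of RECLEN bytes (newline included), grouped input, and a dd that seeks past skipped blocks on regular files, which depends on the implementation:

```shell
#!/usr/bin/sh
# record FILE RECLEN INDEX
# Print the INDEX-th record (1-based); prints nothing past end of file.
record() {
    dd if="$1" bs="$2" skip="$(($3 - 1))" count=1 2>/dev/null
}

# run_length FILE RECLEN
# Print the length of the run of identical records that starts at record 1.
run_length() {
    key=$(record "$1" "$2" 1)
    lo=1    # highest index known to match the key
    hi=2    # next probe position
    # Gallop: double the probe until it lands past the run (or past EOF).
    while [ "$(record "$1" "$2" "$hi")" = "$key" ]; do
        lo=$hi
        hi=$((hi * 2))
    done
    # Binary search between lo (matches) and hi (does not match).
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(((lo + hi) / 2))
        if [ "$(record "$1" "$2" "$mid")" = "$key" ]; then
            lo=$mid
        else
            hi=$mid
        fi
    done
    echo "$lo"
}
```

Each probe here spawns a dd process, so this only illustrates the number of reads; a C or Perl program using seek would be the practical form.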

I guess I'll also have to do the obligatory Perl alternatives! :-)

# perl -ne 'print if $test{$_}++ < 5'

The above does NOT require sorted input.

As written it uses the whole line to indicate uniqueness, but it is readily modified to just use a substring or field.

It will gobble up memory per unique line.
It will be fine for up to 100,000 lines, but might become problematic for millions (of unique records).
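
For readers without Perl, the same hash-counting idea can be sketched with awk from a shell script. The function names first_n_any_order and first_n_by_field are hypothetical; the second shows the "use a field instead of the whole line" modification, assuming whitespace-separated fields. The memory caveat is the same: one array entry per unique key.

```shell
#!/usr/bin/sh
# first_n_any_order MAX [FILE...]
# Print each distinct line at most MAX times; input need not be sorted.
first_n_any_order() {
    max=$1
    shift
    awk -v max="$max" 'seen[$0]++ < max' "$@"
}

# first_n_by_field MAX FIELD [FILE...]
# Same, but uniqueness is decided by field number FIELD only.
first_n_by_field() {
    max=$1
    field=$2
    shift 2
    awk -v max="$max" -v f="$field" 'seen[$f]++ < max' "$@"
}
```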

What problem are you really trying to solve?

It looks like the requested task will lose a lot of info but does not add much value.

Don't you want to know how many there were?
IF SORTED, no memory consumption:

$ perl -ne 'if ($last ne $_) { print "($n)\n" if $n > 5; $last = $_; $n = 1; print } else { print if $n++ < 5 } END { print "($n)\n" if $n > 5 }'

Don't you want at least an indication there were more than 5?

$ perl -ne 'print if (($x=$test{$_}++) < 5); print ":\n" if 5==$x'

Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting

Dennis Handly · ‎09-02-2007

>Hein: - whether records are fixed length (allowing for binary search or jump-aheads)

Even if not fixed, you can do fuzzy skips by throwing away the partial record.
