Operating System - HP-UX

Random sampling from a file

 
cbres00
Frequent Advisor


I have a file over 650,000 lines long. I need to send a random sampling of these lines to someone for verification. How can I do this? Thanks in advance.
Life is too short not to have fun every single day
9 REPLIES
Jitendra_1
Trusted Contributor

Re: Random sampling from a file

The Korn shell has a built-in $RANDOM variable that you can use here. Its value is an integer between 0 and 32767. Since your file has 650,000 lines, you can use this number directly or do some arithmetic with it first (used directly, it can only count up to 32,767 lines). You can then use the result to drive an operation on the file, such as tail or head.
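As a rough sketch of the "arithmetic" idea: two $RANDOM draws can be combined into one larger number and then reduced to a line number anywhere in the file. The function name pick_line and the variable PICK are made up for this illustration, ksh or bash is assumed, and the result is only approximately uniform because the modulo step introduces a slight bias.

```shell
# Sets PICK to a random line number between 1 and $1 (inclusive).
# Two $RANDOM draws (each 0..32767) are combined so the result can
# exceed 32767; the combined value is at most 1073741823.
# The result is returned in a variable, so no command substitution is needed.
pick_line() {
    PICK=$(( (RANDOM * 32768 + RANDOM) % $1 + 1 ))
}

pick_line 650000
echo "random line number: $PICK"

# To print that one line from the data file:
#   sed -n "${PICK}p" /path/filename
```

Calling pick_line in a loop would give a list of line numbers to pull out, at the cost of one pass over the file per line.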

Hope this helps.
Learning is the Key!
cbres00
Frequent Advisor

Re: Random sampling from a file

Excellent! I'm pretty new to this scripting thing, so may I indulge your patience? How would I code the head statement? Thanks...
Life is too short not to have fun every single day
Jitendra_1
Trusted Contributor

Re: Random sampling from a file

OK, here it goes:

#!/usr/bin/ksh
file="/path/filename" # Your data file
Number=`echo "($RANDOM)" |bc` # This is the first random number
Othernumber=`echo "($RANDOM)" |bc` # This is the second random number

# Now test which number is bigger

if [ $Number -gt $Othernumber ];
then
cat $file | tail $Number | head $othername
else
cat $file | tail $othername | head $Number
fi


So for example (say your data file has 500 lines altogether), if the first number is 400 and the second is 300, the program will show you lines 101 through 400.
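The window arithmetic in that example can be checked on a throwaway file. This is a minimal sketch: the scratch path under /tmp is hypothetical, and it uses the `-n` form of the options so the counts are not mistaken for filenames.

```shell
# Build a 500-line scratch file: "line 1" .. "line 500".
demo=/tmp/sample_demo.$$
awk 'BEGIN { for (i = 1; i <= 500; i++) print "line " i }' > "$demo"

# tail keeps the last 400 lines (101..500); head then keeps the
# first 300 of those (101..400).
window=$(tail -n 400 "$demo" | head -n 300)

first=$(echo "$window" | head -n 1)
last=$(echo "$window" | tail -n 1)
count=$(( $(echo "$window" | wc -l) ))

echo "first: $first, last: $last, lines kept: $count"
# prints: first: line 101, last: line 400, lines kept: 300

rm -f "$demo"
```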

Hope this is clear.
Learning is the Key!
Jitendra_1
Trusted Contributor

Re: Random sampling from a file

Sorry, read that $othername as $Othernumber.

Thanks
Learning is the Key!
cbres00
Frequent Advisor

Re: Random sampling from a file

So it'll always grab some part of the beginning or the end of the file? I was hoping to grab some data from the middle, also. T.I.A...
Life is too short not to have fun every single day
Jitendra_1
Trusted Contributor
Solution

Re: Random sampling from a file

No, in fact this script grabs a block from the middle of the file. The tail -$Number | head -$Othernumber pipeline is exactly for this purpose.

For another example, say this is your file:
test line1
test line2
test line3
test line4
test line5

Then it will grab something like this:

test line2
test line3

and so on.
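
If one contiguous block is not enough and the sample should be scattered across the whole file, one alternative (a different technique from the tail/head script, not something proposed in this thread) is to let awk keep each line with a fixed probability. The function name sample_lines and its optional seed argument are hypothetical. This is only a sketch, and the number of lines returned varies from run to run.

```shell
# Print each line of file $2 with probability $1 (e.g. 0.01 for about 1%).
# $3 is an optional seed so that a run can be repeated exactly.
# Without a seed, awk seeds from the time of day, so two runs within
# the same second may produce the same sample.
sample_lines() {
    awk -v p="$1" -v seed="${3:-}" '
        BEGIN { if (seed != "") srand(seed); else srand() }
        rand() < p
    ' "$2"
}

# Usage: roughly 1% of a 650,000-line file is about 6,500 lines.
#   sample_lines 0.01 /path/filename > sample.txt
```

The file is read once from start to finish, and the sampled lines come out in their original order.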

Hope this helps
Learning is the Key!
cbres00
Frequent Advisor

Re: Random sampling from a file

Perfect! Thanks much!
Cathy
Life is too short not to have fun every single day
cbres00
Frequent Advisor

Re: Random sampling from a file

I think there needs to be a minus in front of the $Number and $Othernumber in the head/tail references. Otherwise it treats them as filenames.

I think. ;-)
Life is too short not to have fun every single day
Jitendra_1
Trusted Contributor

Re: Random sampling from a file

You are right, there has to be a minus sign there. Sorry for my typing.
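
Putting the corrections from this thread together (the minus signs, and $Othernumber in place of $othername), the script might look like the sketch below. Wrapping it in a function named random_window is an assumption made here so the file can be passed as an argument, and `-n N` is the POSIX spelling of `-N`. ksh or bash is assumed.

```shell
# Print a contiguous block of lines from file $1, sized by two
# $RANDOM draws. Leaves the draws in Number and Othernumber.
random_window() {
    Number=$RANDOM          # first random number  (0..32767)
    Othernumber=$RANDOM     # second random number (0..32767)

    # tail must get the bigger number and head the smaller one.
    if [ "$Number" -gt "$Othernumber" ]
    then
        tail -n "$Number" "$1" | head -n "$Othernumber"
    else
        tail -n "$Othernumber" "$1" | head -n "$Number"
    fi
}

# Usage:
#   random_window /path/filename > sample.txt
```

Because $RANDOM tops out at 32767, tail never reaches further back than the last 32,767 lines of a 650,000-line file. Reaching earlier lines would need the numbers scaled up first, which is the "arithmetic" mentioned at the start of the thread.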
Learning is the Key!