
John McDen
Regular Advisor

scripting question

How do I extract a particular pattern followed by a few characters? I have a very large data file. A few lines contain a typical pattern, and I just want to extract it.

The pattern is randomly placed in the file, but always starts with the same two chars, followed by 8 different chars.

eg.
INPUT DATA
This is a test data XYxxxxxxxx
Data XYmmmmm_mm could be any where in the file
Need to only extract XYaaaaaaaa

OUTPUT

XYxxxxxxxx
XYmmmmm_mm
XYaaaaaaaa
....

Is it possible to do this ?? Please help I am brain dead ... can't think of anything..really.

New to HP
11 REPLIES 11
Rodney Hills
Honored Contributor

Re: scripting question

A simple perl program can do it

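# The file name in the original post was lost; here it is taken from the command line.
open(INP, "<", $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
while (<INP>) {
    chomp;
    # /g keeps matching, so every XY... word on the line is printed
    while (/(XY[^\s]*)/g) {
        print $1,"\n";
    }
}
close(INP);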

-- Rod Hills
There be dragons...
John McDen
Regular Advisor

Re: scripting question

Rodney, thanks for replying so fast. It works, but I just need the XY and the next 8 chars. I wish I knew Perl so that I could fix it on my own.

New to HP
Rodney Hills
Honored Contributor
Solution

Re: scripting question

Just change

while (/(XY[^\s]*)/g) {

to

while (/(XY........)/g) {

In your sample, you had a value with _mm appended, which made it more than 8 characters, so I wasn't sure whether you needed the characters up to the next blank character or the end of the line.

Good Luck

-- Rod Hills
There be dragons...
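
For readers without Perl at hand, the same fixed-width match can be sketched with awk inside a shell function. This is only an illustration and not part of Rod's answer: the function name extract_xy is made up, and it assumes an awk that provides the standard match() function with RSTART and RLENGTH. Like the Perl loop, it picks up every XY followed by any eight characters, including more than one per line.

```shell
# extract_xy: hypothetical helper, not from the thread.
# Prints every occurrence of XY followed by any eight characters,
# one per output line. Reads the named files, or stdin if none are given.
extract_xy() {
    awk '{
        line = $0
        # match() sets RSTART and RLENGTH for the leftmost match
        while (match(line, /XY......../)) {
            print substr(line, RSTART, RLENGTH)
            # continue searching after the end of this match
            line = substr(line, RSTART + RLENGTH)
        }
    }' "$@"
}

# Usage with the sample data from the question
printf '%s\n' \
    'This is a test data XYxxxxxxxx' \
    'Data XYmmmmm_mm could be any where in the file' \
    'Need to only extract XYaaaaaaaa' | extract_xy
```

Note that a dot also matches a blank, so an XY with fewer than eight non-blank characters after it can still match if enough text follows on the same line.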
Rodney Hills
Honored Contributor

Re: scripting question

Oops, on recounting I see it was 8 characters. These variable-width fonts can be misleading at times.

-- Rod Hills
There be dragons...
John McDen
Regular Advisor

Re: scripting question

Great .. it worked... Thanks
New to HP
harry d brown jr
Honored Contributor

Re: scripting question

Or in a shell:

cat DATA | tr -cs "[:alnum:][:punct:]" "[\012*]" | grep XY

live free or die
harry
Live Free or Die
Ralph Grothe
Honored Contributor

Re: scripting question

perl -ne 'print "$1\n" if /\b(XY\w{8})/' /path/to/your/file
Madness, thy name is system administration
Rodney Hills
Honored Contributor

Re: scripting question

Ralph, I believe your version assumes only one match per line. The version I supplied allows for multiple entries on a line.

See the documentation on m//g. The "g" modifier lets the match be repeated through the same line.

-- Rod Hills
There be dragons...
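
The difference Rod describes shows up on a line that holds two matches. The sketch below is in shell with awk rather than Perl, uses made-up function names, and writes eight dots in place of \w{8}. It therefore illustrates only the one-per-line versus all-per-line behaviour, not the exact patterns posted above.

```shell
# Hypothetical helpers, for illustration only.

# Like an "if" test: report at most the first match on each line.
first_xy_per_line() {
    awk 'match($0, /XY......../) { print substr($0, RSTART, RLENGTH) }' "$@"
}

# Like a "while" loop with m//g: keep matching until the line is exhausted.
all_xy_per_line() {
    awk '{
        rest = $0
        while (match(rest, /XY......../)) {
            print substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)
        }
    }' "$@"
}

echo 'XYaaaaaaaa and XYbbbbbbbb' | first_xy_per_line   # one line of output
echo 'XYaaaaaaaa and XYbbbbbbbb' | all_xy_per_line     # two lines of output
```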
harry d brown jr
Honored Contributor

Re: scripting question

John,

This will hit only those "words" that start with XY and have exactly eight characters after it:

cat DATA | tr -cs "[:alnum:][:punct:]" "[\012*]" | grep "^XY\(.\)\{8\}$"

ignoring anything like XYabcdefghijk

live free or die
harry
Live Free or Die
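
To see what each stage of harry's pipeline contributes, here is a small sketch wrapped in a shell function (the name xy_words_exact is made up for this example). tr replaces every run of characters that are neither alphanumeric nor punctuation with a single newline, so each word ends up on its own line. grep then keeps only the lines that are exactly XY plus eight characters.

```shell
# xy_words_exact: hypothetical helper, reads text on stdin.
# Stage 1 (tr): squeeze everything that is not alphanumeric or punctuation
#               into single newlines, giving one "word" per line.
# Stage 2 (grep): keep only words that are XY followed by exactly 8 characters.
xy_words_exact() {
    tr -cs '[:alnum:][:punct:]' '[\012*]' | grep '^XY.\{8\}$'
}

# Usage: only the first value is exactly XY plus eight characters
echo 'Need XYaaaaaaaa and XYabcdefghijk, also (XYbbbbbbbb)' | xy_words_exact
```

Two things to keep in mind. Unlike the Perl versions, a longer value such as XYabcdefghijk is dropped rather than cut down to its first ten characters. And because punctuation stays part of the word, a value followed by a comma or wrapped in parentheses does not pass the anchored grep.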