Re: Help with perl

Eric Antunes · ‎09-14-2011

I'm new to Perl scripting and I need to limit the String in the following pattern to 60 characters in an XML file:

<CompanyName>String</CompanyName>

But perl doesn't seem to recognize the instr(big, little) function:

#!/usr/bin/perl
use strict;
use warnings;
my @a = ();
my @b = ();
my @c = ();
my @d = ();
while (<>) {
if (m{<CompanyName>}..m{</CompanyName>}) {
    if (m{</?CompanyName>}) {
      push @a, instr($_,">");
      push @b, instr($_,"</");
      push @c, substr(substr($_,@a+1,@b - @a-1),0,60);
      push @d, substr($_,0,@a)||@c||substr($_,@b,14);
      print @d;
      @a = ();
      @b = ();
      @c = ();
      @d = ();
      next;
    }
    print;
}
print;
}
1;
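
Perl has no instr() builtin. The core function that does the same job is index(STR, SUBSTR), which returns the zero-based position of the first match, or -1 if there is none. A minimal sketch of the index/substr approach, assuming the whole element sits on one line (truncate_company is a hypothetical helper name, not something from this thread):

```perl
use strict;
use warnings;

# Hypothetical helper: truncate the text inside <CompanyName>...</CompanyName>
# to at most $max characters. Assumes both tags are on the same line.
sub truncate_company {
    my ($line, $max) = @_;
    my $open  = '<CompanyName>';
    my $close = '</CompanyName>';

    my $start = index($line, $open);          # -1 if the tag is absent
    return $line if $start < 0;
    $start += length $open;                   # first character of the content
    my $end = index($line, $close, $start);   # search after the opening tag
    return $line if $end < 0;

    my $content = substr($line, $start, $end - $start);
    return $line if length($content) <= $max;

    # 4-argument substr replaces the content in place
    substr($line, $start, $end - $start, substr($content, 0, $max));
    return $line;
}

print truncate_company("<CompanyName>" . ("x" x 70) . "</CompanyName>\n", 60);
```

Lines without the element, or with content that is already short enough, come back unchanged.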

Eric

Each and every day is a good day to learn.

Eric Antunes · ‎09-14-2011

Now, it erases the entire pattern:

#!/usr/bin/perl
use strict;
use warnings;
my @a = ();
#my @b = ();
#my @c = ();
#my @d = ();
while (<>) {
if (m{<CompanyName>}..m{</CompanyName>}) {
    if (m{<CompanyName>}) {
      push @a, $_;
      next;
    }
    if (m{>}+1..m{</-1}) {
      push @a, substr($_,0,60);
      next;
    }
    if (m{</CompanyName>}) {
      push @a, $_;
      next;
    }
    print @a;
    @a = ();
    next;
}
print;
}
1;

Each and every day is a good day to learn.

Eric Antunes · ‎09-14-2011

Now, it is almost working but it still doesn't limit the string to 60 characters:

#!/usr/bin/perl
use strict;
use warnings;
#my @d = ();
while (<>) {
if (m{<CompanyName>}..m{</CompanyName>}) {
    if (m{<CompanyName>}) {
      print;
      next;
    }
    if (m{>}+1..m{</-1}) {
      print substr($_,0,60);
      next;
    }
    if (m{</CompanyName>}) {
      print;
      next;
    }
}
print;
}
1;

Each and every day is a good day to learn.

H.Merijn Brand (procura · ‎09-14-2011

Why jump through difficult hoops?

while (<>) {
    s{(<(CompanyName)>)(.{0,60}).*?</\1>}{<$1>$2</$1>};
    }

If the content for this tag is longer than 60 characters, truncate to 60

(/me still thinks you should use XML::Parser)

Enjoy, Have FUN! H.Merijn

Eric Antunes · ‎09-14-2011

Hi Merijn,

With your last script I get an empty file.

This is working but just for the first occurrence:

#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
if (m{<CompanyName>}..m{</CompanyName>}) {
    if (m{</?CompanyName>}) {
      if (length($_) gt 87) {
        print substr($_,0,73);
        print "</CompanyName>\n";
        next;
      }
      print;
      next;
    }
}
print;
}
1;
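
One possible contributor to the script above truncating only some lines: gt compares its operands as strings, so a length of 100 is compared to 87 character by character, and "100" sorts before "87". The numeric operator is >. A small demonstration (the lengths are made-up values for illustration):

```perl
use strict;
use warnings;

# gt is a string comparison, > is a numeric comparison.
for my $len (88, 100, 250) {
    my $as_string = ($len gt 87) ? "true" : "false";
    my $as_number = ($len >  87) ? "true" : "false";
    print "$len gt 87: $as_string, $len > 87: $as_number\n";
}
```

The fixed offsets 87 and 73 also depend on how far each line is indented, so they would only fit lines laid out exactly like the first one.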

Eric

Each and every day is a good day to learn.

James R. Ferguson · ‎09-14-2011

@Eric Antunes wrote:
Hi Merijn,

With your last script I get an empty file.

Hi Eric:

Try this:

#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
    s{(<(CompanyName)>)(.{0,60}).*?</\2>}{<$1>$3</$2>};
    print;
}
1;

Regards!

...JRF...

H.Merijn Brand (procura · ‎09-14-2011

*I* was just "missing" the print. *You* overcomplicate the regex and generate invalid XML :)

$1 already includes < and >, so you'll end up with

<<CompanyName>>Whatever</CompanyName>

Enjoy, Have FUN! H.Merijn

James R. Ferguson · ‎09-14-2011

@H.Merijn Brand (procura wrote:
*I* was just "missing" the print. *You* overcomplicate the regex and generate invalid XML :)
$1 already includes < and >, so you'll end up with

<<CompanyName>>Whatever</CompanyName>

Yes, my friend, I missed the doubled angle brackets :-( and needlessly complicated the regex :-((

Yes, too, the missing print was obvious.

BUT, your original version did not limit the string :-(

You had:

s{(<(CompanyName)>)(.{0,60}).*?</\1>}{<$1>$2</$1>};

whereas I should have used:

s{<(CompanyName)>(.{0,60}).*?</\1>}{<$1>$2</$1>};

Regards!

...JRF...

H.Merijn Brand (procura · ‎09-14-2011

That is what one gets if not testing code :/

I indeed obviously had one pair of parens too many.

For completeness' sake (we both made too many simple mistakes), here is the full version:

$ cat modify.pl
use strict;
use warnings;

while (<>) {
    s{<(CompanyName)>(.{0,60}).*?</\1>}{<$1>$2</$1>};
    # other modifications here
    print;
    }
$ perl -wc modify.pl
modify.pl syntax OK
$ perl modify.pl myfile.xml > modified.xml

Enjoy, Have FUN! H.Merijn
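
As posted, the substitution changes only the first <CompanyName> element on each input line. If a line can hold several elements, the /g modifier repeats it for every match. A sketch under that assumption (limit_names is a hypothetical helper and the sample data is invented):

```perl
use strict;
use warnings;

# Hypothetical helper: apply the thread's substitution to every
# <CompanyName> element on the line, not only the first one.
sub limit_names {
    my ($line) = @_;
    $line =~ s{<(CompanyName)>(.{0,60}).*?</\1>}{<$1>$2</$1>}g;
    return $line;
}

my $long = "N" x 65;   # invented 65-character name
print limit_names("<CompanyName>$long</CompanyName><CompanyName>$long</CompanyName>\n");
```

Note that . does not match a newline by default, so an element whose content spans several lines is left untouched either way.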

Eric Antunes · ‎09-15-2011

Exactly Merijn, you just posted the right script.

Although I didn't understand the s{} part, it worked wonderfully!

But I will try to understand it.

Thank you,

Eric

Each and every day is a good day to learn.

H.Merijn Brand (procura · ‎09-15-2011

lemme (try to) explain:

s{<(CompanyName)>(.{0,60}).*?</\1>}{<$1>$2</$1>};

make that more readable and still legal:

s{ <(CompanyName)>    # Search for the opening tag (keep tag name in $1)
   (.{0,60})  .*?     # Keep 0 to 60 characters in $2, ignore rest to
   </\1>              # The closing tag (\1 == $1 in the match part)
   }{<$1>$2</$1>}x;   # Replacement pattern

Everything between parens is "captured". The first capture goes to $1, the next to $2, etc. If captures are nested, the outermost capture gets the lowest index: the index of a capture is the position of its opening paren, counting opening parens from the left. (Unless you use (?|...) in newer perls, but we do not use that here.)
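
To see the numbering with nested parens, here is a small sketch that reuses the earlier pattern with the extra pair of parens from this thread (the sample line is invented):

```perl
use strict;
use warnings;

my $line = "<CompanyName>Acme Corp</CompanyName>";

# Groups are numbered by the position of their opening paren, left to right.
if ($line =~ m{(<(CompanyName)>)(.{0,60}).*?</\2>}) {
    print "1: $1\n";   # outer group, includes the angle brackets
    print "2: $2\n";   # nested group, tag name only
    print "3: $3\n";   # up to 60 characters of content
}
```

This is why the version with the extra parens needed \2 for the closing tag and produced doubled angle brackets when $1 was wrapped in < and > again.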

So after "<CompanyName>" matched, $1 now contains "CompanyName".

The next line captures .{0,60}, which means "any character, between 0 and 60 times". The pattern .*? is a non-greedy match on any number of characters: it stops as soon as the next part of the pattern can match. Without the ?, the .* would be greedy and run on to the last closing tag on the line.

As that is not in parens, it is just forgotten.
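
The difference between .*? and .* only shows when a line holds more than one closing tag. A minimal sketch with an invented sample line:

```perl
use strict;
use warnings;

# Sample line (invented): a 70-character name followed by a second element.
my $line = "<CompanyName>" . ("x" x 70) . "</CompanyName>"
         . "<CompanyName>Second</CompanyName>";

# Non-greedy: stops at the first closing tag.
(my $lazy   = $line) =~ s{<(CompanyName)>(.{0,60}).*?</\1>}{<$1>$2</$1>};
# Greedy: runs on to the last closing tag and swallows the second element.
(my $greedy = $line) =~ s{<(CompanyName)>(.{0,60}).*</\1>}{<$1>$2</$1>};

print "lazy:   $lazy\n";
print "greedy: $greedy\n";
```

With the non-greedy form the second element survives untouched. With the greedy form it disappears, because everything up to the last closing tag is matched and thrown away.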

The last part of the match is </\1>, where \1 is the content of $1. We cannot use $1 there, as we are still inside the matching part. </\1> in this case is essentially the same as matching on </CompanyName>, but spelling the tag out again is more typing and more error-prone.

After the closing } of the match, the substitution pattern puts it all together again. The x after the last } lets us split the matching pattern over several lines and add whitespace and comments.

Enjoy, Have FUN! H.Merijn
