
Help with perl

Honored Contributor

I'm new to Perl scripting and I need to limit the following string to 60 characters, in this pattern, in an XML file:

<CompanyName>String</CompanyName>

But Perl doesn't seem to recognize the instr(big, little) function:

#!/usr/bin/perl
use strict;
use warnings;
my @a = ();
my @b = ();
my @c = ();
my @d = ();
while (<>) {
  if (m{<CompanyName>}..m{</CompanyName>}) {
    if (m{</?CompanyName>}) {
      push @a, instr($_,">");
      push @b, instr($_,"</");
      push @c, substr(substr($_,@a+1,@b - @a-1),0,60);
      push @d, substr($_,0,@a)||@c||substr($_,@b,14);
      print @d;
      @a = ();
      @b = ();
      @c = ();
      @d = ();
      next;
    }
    print;
  }
  print;
}
1;
 

 

Eric

Each and every day is a good day to learn.
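Perl has no instr() built-in. The closest equivalent is index(STR, SUBSTR, POSITION), which returns the 0-based position of SUBSTR in STR (searching from POSITION, if given), or -1 when it is not found. Two other things trip up the script above: @a in numeric context is the number of elements in the array, not the value stored in it, and Perl joins strings with `.`, not `||` (which is logical OR). A minimal sketch of the position-based approach, assuming each <CompanyName> element sits on a single line; `truncate_company` is a hypothetical helper name, not something from the thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep at most $max characters of the text between
# <CompanyName> and the following '</', using index() where instr() was intended.
sub truncate_company {
    my ($line, $max) = @_;
    my $start = index($line, '<CompanyName>');
    return $line if $start < 0;                         # no opening tag on this line
    my $text_start = $start + length('<CompanyName>');  # first character of the content
    my $close      = index($line, '</', $text_start);   # start of the closing tag
    return $line if $close < 0;                         # closing tag not on this line
    my $text = substr($line, $text_start, $close - $text_start);
    return substr($line, 0, $text_start)                # indentation plus opening tag
         . substr($text, 0, $max)                       # at most $max characters of content
         . substr($line, $close);                       # closing tag and end of line
}

my $in  = '  <CompanyName>' . ('x' x 70) . "</CompanyName>\n";
my $out = truncate_company($in, 60);
print $out;
```

Lines without the tag, or with content already 60 characters or shorter, come back unchanged.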
Honored Contributor

Re: Help with perl

Now, it erases the entire pattern:

 

#!/usr/bin/perl
use strict;
use warnings;
my @a = ();
#my @b = ();
#my @c = ();
#my @d = ();
while (<>) {
  if (m{<CompanyName>}..m{</CompanyName>}) {
    if (m{<CompanyName>}) {
      push @a, $_;
      next;
    }
    if (m{>}+1..m{</-1}) {
      push @a, substr($_,0,60);
      next;
    }
    if (m{</CompanyName>}) {
      push @a, $_;
      next;
    }
    print @a;
    @a = ();
    next;
  }
  print;
}
1;

Each and every day is a good day to learn.
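In scalar (boolean) context, `..` is Perl's flip-flop operator, not a way to select character positions. It becomes true on the line where its left operand is true and stays true through the line where its right operand is true. In the script above, m{>}+1 is always true (a match result plus 1 is at least 1), and m{</-1} looks for the literal text </-1, which never occurs, so that test stays true for every line that reaches it. A small sketch of what a flip-flop actually selects, on made-up input lines:

```perl
use strict;
use warnings;

my @lines = (
    "<Root>\n",
    "<CompanyName>\n",
    "Acme\n",
    "</CompanyName>\n",
    "<Other>\n",
);

# Collect the lines for which the flip-flop is true: from the line matching
# the left pattern through the line matching the right pattern, inclusive.
my @selected;
for (@lines) {
    push @selected, $_ if m{<CompanyName>} .. m{</CompanyName>};
}
print @selected;
```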
Honored Contributor

Re: Help with perl

Now, it is almost working but it still doesn't limit the string to 60 characters:

 

#!/usr/bin/perl
use strict;
use warnings;
#my @d = ();
while (<>) {
  if (m{<CompanyName>}..m{</CompanyName>}) {
    if (m{<CompanyName>}) {
      print;
      next;
    }
    if (m{>}+1..m{</-1}) {
      print substr($_,0,60);
      next;
    }
    if (m{</CompanyName>}) {
      print;
      next;
    }
  }
  print;
}
1;

Each and every day is a good day to learn.
Honored Contributor

Re: Help with perl

Why jump through difficult hoops?

 

while (<>) {
s{(<(CompanyName)>)(.{0,60}).*?</\1>}{<$1>$2</$1>};
}

 

If the content of this tag is longer than 60 characters, truncate it to 60.

 

(/me still thinks you should use XML::Parser)

Enjoy, Have FUN! H.Merijn
Honored Contributor

Re: Help with perl

Hi Merijn,

 

With your last script I get an empty file.

 

This is working, but only for the first occurrence:

 

#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
  if (m{<CompanyName>}..m{</CompanyName>}) {
    if (m{</?CompanyName>}) {
      if (length($_) gt 87) {
        print substr($_,0,73);
        print "</CompanyName>\n";
        next;
      }
      print;
      next;
    }
  }
  print;
}
1;

 

Eric

Each and every day is a good day to learn.
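One thing worth checking in the script above: gt, lt and eq compare strings, while >, < and == compare numbers. So length($_) gt 87 compares the length alphabetically, and a length of 100 or more counts as "less than" 87 because "1" sorts before "8". Depending on the line lengths in the file, that alone can let some long lines through untruncated. A quick demonstration:

```perl
use strict;
use warnings;

my $len = 100;

# gt compares as strings: "100" vs "87" is decided by '1' vs '8', so it is false.
my $string_cmp  = ($len gt 87) ? 1 : 0;

# > compares as numbers: 100 > 87 is true.
my $numeric_cmp = ($len > 87) ? 1 : 0;

print "gt: $string_cmp, >: $numeric_cmp\n";   # prints "gt: 0, >: 1"
```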
Acclaimed Contributor

Re: Help with perl


Eric Antunes wrote:

Hi Merijn,

 

With your last script I get an empty file.

 


Hi Eric:

 

Try this:

 

#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
    s{(<(CompanyName)>)(.{0,60}).*?</\2>}{<$1>$3</$2>};
    print;
}
1;

Regards!

 

...JRF...

Honored Contributor

Re: Help with perl

*I* was just "missing" the print. *You* overcomplicate the regex and generate invalid XML :)

$1 already includes < and >, so you'll end up with

 

<<CompanyName>>Whatever</CompanyName> 

Enjoy, Have FUN! H.Merijn
Acclaimed Contributor

Re: Help with perl


H.Merijn Brand (procura wrote:

*I* was just "missing" the print. *You* overcomplicate the regex and generate invalid XML :)

$1 already includes < and >, so you'll end up with

 

<<CompanyName>>Whatever</CompanyName> 


Yes, my friend, I missed the doubled angle brackets :-( and needlessly complicated the regex :-((

 

And yes, the missing print was obvious.

 

BUT, your original version did not limit the string :-(

 

You had:

 

s{(<(CompanyName)>)(.{0,60}).*?</\1>}{<$1>$2</$1>};

whereas I should have used:

 

s{<(CompanyName)>(.{0,60}).*?</\1>}{<$1>$2</$1>};

Regards!

 

...JRF...
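The difference between the two substitutions is easy to check on a sample line (the 70-character test string is made up for illustration). In the first form, $1 holds <CompanyName> including the angle brackets, so </\1> would require the closing tag </<CompanyName>>, which never occurs, and the line is left untouched:

```perl
use strict;
use warnings;

my $line = '<CompanyName>' . ('A' x 70) . "</CompanyName>\n";

# First form: $1 (and therefore \1) is "<CompanyName>", so the pattern would
# need a closing tag "</<CompanyName>>", which never occurs: no substitution.
(my $original = $line) =~ s{(<(CompanyName)>)(.{0,60}).*?</\1>}{<$1>$2</$1>};

# Corrected form: $1 is just "CompanyName", so </\1> matches </CompanyName>.
(my $fixed = $line) =~ s{<(CompanyName)>(.{0,60}).*?</\1>}{<$1>$2</$1>};

print $original eq $line ? "original: unchanged\n" : "original: changed\n";
print $fixed;
```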

Honored Contributor

Re: Help with perl

That is what one gets for not testing code :/

I indeed obviously had one pair of parens too many.

 

For completeness' sake (we both made too many simple mistakes), here is the full version:

$ cat modify.pl
use strict;
use warnings;

while (<>) {
    s{<(CompanyName)>(.{0,60}).*?</\1>}{<$1>$2</$1>};
    # other modifications here
    print;
    }
$ perl -wc modify.pl
modify.pl syntax OK
$ perl modify.pl myfile.xml > modified.xml

 

 

Enjoy, Have FUN! H.Merijn
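Two assumptions are baked into this one-liner: each <CompanyName> element is on one line (`.` does not match a newline unless /s is used), and there is at most one element per line, since there is no /g. If a file ever packs several elements onto one line (the thread does not say it does), adding /g alone is not enough: .{0,60} can run past the first closing tag and swallow part of the next element. A sketch of a variant that uses [^<] instead of `.`, so a match cannot leave the element's text, compared with the original pattern on a made-up line:

```perl
use strict;
use warnings;

# Made-up input: a short element followed by a long one on the same line.
my $line = '<CompanyName>Acme</CompanyName>'
         . '<CompanyName>' . ('C' x 65) . "</CompanyName>\n";

# Original pattern: .{0,60} greedily crosses "</CompanyName><CompanyName>",
# so the first match spans both elements and the second keeps fewer than 60 chars.
(my $dot = $line) =~ s{<(CompanyName)>(.{0,60}).*?</\1>}{<$1>$2</$1>}g;

# Variant: [^<] cannot cross a tag, and /g handles every element on the line.
(my $safe = $line) =~ s{<(CompanyName)>([^<]{0,60})[^<]*</\1>}{<$1>$2</$1>}g;

print $dot, $safe;
```

The [^<] variant assumes the element contains plain text only (a literal < in XML text has to be escaped as &lt;, so that holds for well-formed input).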
Honored Contributor

Re: Help with perl

Exactly Merijn, you just posted the right script.

 

Although I didn't understand the s{} part, it worked wonderfully!

 

But I will try to understand it.

 

Thank you,

 

Eric

Each and every day is a good day to learn.
Honored Contributor

Re: Help with perl

lemme (try to) explain:

s{<(CompanyName)>(.{0,60}).*?</\1>}{<$1>$2</$1>};

Made more readable, and still legal:

s{ <(CompanyName)>    # Search for the opening tag (keep tag name in $1)
   (.{0,60})  .*?     # Keep 0 to 60 characters in $2, skip the rest up to
   </\1>              # The closing tag (\1 == $1 in the match part)
   }{<$1>$2</$1>}x;   # Replacement pattern
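A quick way to convince yourself that the /x layout changes nothing but readability is to run both forms on the same made-up line and compare the results:

```perl
use strict;
use warnings;

my $line = '<CompanyName>' . ('D' x 75) . "</CompanyName>\n";

(my $compact = $line) =~ s{<(CompanyName)>(.{0,60}).*?</\1>}{<$1>$2</$1>};

(my $spaced = $line) =~ s{ <(CompanyName)>    # opening tag, name captured in group 1
                           (.{0,60})  .*?     # keep up to 60 chars in group 2, skip the rest
                           </\1>              # matching closing tag
                         }{<$1>$2</$1>}x;

print $compact eq $spaced ? "same result\n" : "different result\n";
```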

Everything between parens is "captured". The first capture goes to $1, the next to $2, etc. If captures are nested, the outermost capture gets the lowest index: a capture's index is the number of opening parens counted from the left, up to and including its own. (Unless you use (?|...) in newer perls, but we do not use that here.)
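Capture numbering can be checked directly. The sketch below uses the earlier three-capture form of the pattern (with </\2>, so that it actually matches) on a made-up line; in list context a successful match returns the captured groups in order:

```perl
use strict;
use warnings;

my $line = '<CompanyName>Acme Corp</CompanyName>';

# Groups are numbered by the position of their opening paren: the outer
# (<(CompanyName)>) is group 1 and the nested (CompanyName) is group 2.
my @caps = $line =~ m{(<(CompanyName)>)(.*?)</\2>};

printf "\$%d = %s\n", $_ + 1, $caps[$_] for 0 .. $#caps;
```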

So after "<CompanyName>" matched, $1 now contains "CompanyName".

The next line captures .{0,60}, which means "any character, between 0 and 60 times". The pattern .*? is a non-greedy match on any number of characters, up to the next part of the pattern; without the ?, .* would be greedy and match as much as possible.
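The greedy/non-greedy difference shows up as soon as the closing tag occurs more than once on a line. A small sketch on a made-up line:

```perl
use strict;
use warnings;

my $text = '<CompanyName>A</CompanyName><CompanyName>B</CompanyName>';

# Greedy .* grabs as much as it can and backs off only to the LAST closing tag;
# non-greedy .*? grows one character at a time and stops at the FIRST one.
my ($greedy)     = $text =~ m{<CompanyName>(.*)</CompanyName>};
my ($non_greedy) = $text =~ m{<CompanyName>(.*?)</CompanyName>};

print "greedy:     $greedy\n";
print "non-greedy: $non_greedy\n";
```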

As that part is not in parens, it is not captured and is simply dropped.

The last part of the match is </\1>, where \1 is whatever the first capture matched. We cannot use $1 there, as we are still inside the matching part ($1 would interpolate the value left over from the previous successful match). In this case </\1> is essentially the same as </CompanyName>, which is more typing and more error-prone.
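A short sketch of the backreference at work, generalised (for illustration only) to any tag name with \w+: the pattern matches only when the closing tag repeats the name captured from the opening tag.

```perl
use strict;
use warnings;

# \1 inside the pattern refers to what group 1 matched in THIS match attempt,
# so the closing tag must carry the same name as the opening tag.
my $re = qr{<(\w+)>(.*?)</\1>};

my @results = map { $_ =~ $re ? "match" : "no match" }
    '<CompanyName>Acme</CompanyName>',
    '<CompanyName>Acme</Other>';

print "$_\n" for @results;
```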

After the closing } of the match, the replacement part puts it all together again. The x after the last } lets us split the matching pattern over several lines and add whitespace and comments.

 

Enjoy, Have FUN! H.Merijn