
Help with perl

Eric Antunes
Honored Contributor

Help with perl

I'm new to Perl scripting and I need to limit the following string to 60 characters wherever this pattern appears in an XML file:

 

<CompanyName>String</CompanyName>

 

But perl doesn't seem to recognize the instr(big, little) function:

 

#!/usr/bin/perl
use strict;
use warnings;
my @a = ();
my @b = ();
my @c = ();
my @d = ();
while (<>) {
  if (m{<CompanyName>}..m{</CompanyName>}) {
    if (m{</?CompanyName>}) {
      push @a, instr($_,">");
      push @b, instr($_,"</");
      push @c, substr(substr($_,@a+1,@b - @a-1),0,60);
      push @d, substr($_,0,@a)||@c||substr($_,@b,14);
      print @d;
      @a = ();
      @b = ();
      @c = ();
      @d = ();
      next;
    }
    print;
  }
  print;
}
1;
 

 

Eric

Each and every day is a good day to learn.
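A note on the script above: Perl has no instr() built-in. The core function for locating a substring is index(STR, SUBSTR, POSITION), which returns a zero-based offset, or -1 when nothing is found. Two other details bite here: an array used in arithmetic (@a+1) yields its element count rather than its contents, and || is a logical "or", not concatenation (that is the . operator). A minimal sketch of the same idea with scalars, assuming the opening and closing tags sit on one line; the helper name limit_company_name is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: truncate the text between <CompanyName> and
# </CompanyName> to $max characters, using only index() and substr().
sub limit_company_name {
    my ($line, $max) = @_;
    my $open  = '<CompanyName>';
    my $close = '</CompanyName>';

    my $start = index($line, $open);          # -1 if the tag is absent
    return $line if $start < 0;
    $start += length $open;                   # offset of the first content character

    my $end = index($line, $close, $start);   # look for the closing tag after the content start
    return $line if $end < 0;

    my $content = substr($line, $start, $end - $start);
    return $line if length($content) <= $max;

    # Rebuild the line: prefix and opening tag, truncated content, closing tag and rest.
    return substr($line, 0, $start)
         . substr($content, 0, $max)
         . substr($line, $end);
}

# Sample input invented for the demonstration: 70 characters of content.
print limit_company_name("<CompanyName>" . ("x" x 70) . "</CompanyName>\n", 60);
```

Lines without the tag, or with content of 60 characters or fewer, come back unchanged.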
10 REPLIES
Eric Antunes
Honored Contributor

Re: Help with perl

Now, it erases the entire pattern:

 

#!/usr/bin/perl
use strict;
use warnings;
my @a = ();
#my @b = ();
#my @c = ();
#my @d = ();
while (<>) {
  if (m{<CompanyName>}..m{</CompanyName>}) {
    if (m{<CompanyName>}) {
      push @a, $_;
      next;
    }
    if (m{>}+1..m{</-1}) {
      push @a, substr($_,0,60);
      next;
    }
    if (m{</CompanyName>}) {
      push @a, $_;
      next;
    }
    print @a;
    @a = ();
    next;
  }
  print;
}
1;

Each and every day is a good day to learn.
Eric Antunes
Honored Contributor

Re: Help with perl

Now, it is almost working but it still doesn't limit the string to 60 characters:

 

#!/usr/bin/perl
use strict;
use warnings;
#my @d = ();
while (<>) {
  if (m{<CompanyName>}..m{</CompanyName>}) {
    if (m{<CompanyName>}) {
      print;
      next;
    }
    if (m{>}+1..m{</-1}) {
      print substr($_,0,60);
      next;
    }
    if (m{</CompanyName>}) {
      print;
      next;
    }
  }
  print;
}
1;

Each and every day is a good day to learn.
H.Merijn Brand (procura
Honored Contributor

Re: Help with perl

Why jump through difficult hoops?

 

while (<>) {
s{(<(CompanyName)>)(.{0,60}).*?</\1>}{<$1>$2</$1>};
}

 

 If the content for this tag is longer than 60 characters, truncate to 60

 

(/me still thinks you should use XML::Parser)

Enjoy, Have FUN! H.Merijn
Eric Antunes
Honored Contributor

Re: Help with perl

Hi Merijn,

 

With your last script I get an empty file.

 

This is working but just for the first occurrence:

 

#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
  if (m{<CompanyName>}..m{</CompanyName>}) {
    if (m{</?CompanyName>}) {
      if (length($_) gt 87) {
        print substr($_,0,73);
        print "</CompanyName>\n";
        next;
      }
      print;
      next;
    }
  }
  print;
}
1;

 

Eric

Each and every day is a good day to learn.
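One thing that may be contributing to the odd behaviour of the script above: gt is Perl's string comparison operator, so length($_) gt 87 compares the two numbers as text. A length of 100 becomes "100", which sorts before "87", so the test fails for exactly the longest lines. The numeric operator is >. A small sketch with a few sample lengths chosen for illustration:

```perl
use strict;
use warnings;

my @lengths = (50, 88, 100, 250);
my (%by_string, %by_number);

for my $len (@lengths) {
    $by_string{$len} = ($len gt 87) ? 1 : 0;   # compares "100" with "87" character by character
    $by_number{$len} = ($len >  87) ? 1 : 0;   # compares 100 with 87 as numbers
    printf "%3d: gt says %s, > says %s\n",
        $len,
        $by_string{$len} ? 'longer' : 'not longer',
        $by_number{$len} ? 'longer' : 'not longer';
}
```

Only 88 counts as "longer" under gt, while > reports 88, 100 and 250.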
James R. Ferguson
Acclaimed Contributor

Re: Help with perl


Eric Antunes wrote:

Hi Merijn,

 

With your last script I get an empty file.

 


Hi Eric:

 

Try this:

 

#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
    s{(<(CompanyName)>)(.{0,60}).*?</\2>}{<$1>$3</$2>};
    print;
}
1;

Regards!

 

...JRF...

H.Merijn Brand (procura
Honored Contributor

Re: Help with perl

*I* was just "missing" the print. *You* overcomplicate the regex and generate invalid XML :)

$1 already includes < and >, so you'll end up with

 

<<CompanyName>>Whatever</CompanyName> 

Enjoy, Have FUN! H.Merijn
James R. Ferguson
Acclaimed Contributor

Re: Help with perl


H.Merijn Brand (procura wrote:

*I* was just "missing" the print. *You* overcomplicate the regex and generate invalid XML :)

$1 already includes < and >, so you'll end up with

 

<<CompanyName>>Whatever</CompanyName> 


Yes, my friend, I missed the doubled angle brackets :-( and needlessly complicated the regex :-((

 

Yes, too, the missing print was obvious.

 

BUT, your original version did not limit the string :-(

 

You had:

 

s{(<(CompanyName)>)(.{0,60}).*?</\1>}{<$1>$2</$1>};

whereas I should have used:

 

s{<(CompanyName)>(.{0,60}).*?</\1>}{<$1>$2</$1>};

Regards!

 

...JRF...
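To see the three variants from this exchange side by side, here is a small sketch. The 80-character sample content and the variable names are invented for the demonstration; the three substitutions are the ones quoted above.

```perl
use strict;
use warnings;

my $long = "<CompanyName>" . ("x" x 80) . "</CompanyName>";

# Two groups with \1: group 1 holds "<CompanyName>", so the pattern looks for
# the text "</<CompanyName>>", never finds it, and the line stays as it was.
(my $two_groups_bs1 = $long) =~ s{(<(CompanyName)>)(.{0,60}).*?</\1>}{<$1>$2</$1>};

# Two groups with \2: the match succeeds, but <$1> wraps a value that
# already carries its own angle brackets.
(my $two_groups_bs2 = $long) =~ s{(<(CompanyName)>)(.{0,60}).*?</\2>}{<$1>$3</$2>};

# One group around the tag name only: matches and rebuilds valid markup.
(my $one_group = $long) =~ s{<(CompanyName)>(.{0,60}).*?</\1>}{<$1>$2</$1>};

print "two groups, \\1: ", ($two_groups_bs1 eq $long ? "unchanged" : "changed"), "\n";
print "two groups, \\2: ", substr($two_groups_bs2, 0, 20), "...\n";
print "one group:      ", length($one_group), " characters in total\n";
```

The last line reports 87 characters: 13 for the opening tag, 60 of content and 14 for the closing tag.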

H.Merijn Brand (procura
Honored Contributor

Re: Help with perl

That is what one gets if not testing code :/

I indeed obviously had one pair of parens too many.

 

For completeness' sake (we both made too many simple mistakes), here is the full version:

$ cat modify.pl
use strict;
use warnings;

while (<>) {
    s{<(CompanyName)>(.{0,60}).*?</\1>}{<$1>$2</$1>};
    # other modifications here
    print;
    }
$ perl -wc modify.pl
modify.pl syntax OK
$ perl modify.pl myfile.xml > modified.xml

 

 

Enjoy, Have FUN! H.Merijn
Eric Antunes
Honored Contributor

Re: Help with perl

Exactly Merijn, you just posted the right script.

 

Although I didn't understand the s{} part, it worked wonderfully!

 

But I will try to understand it.

 

Thank you,

 

Eric

Each and every day is a good day to learn.
H.Merijn Brand (procura
Honored Contributor

Re: Help with perl

lemme (try to) explain:

s{<(CompanyName)>(.{0,60}).*?</\1>}{<$1>$2</$1>};

To make that more readable and still legal:

s{ <(CompanyName)>    # Search for the opening tag (keep tag name in $1)
   (.{0,60})  .*?     # Keep 0 to 60 characters in $2, ignore rest to
   </\1>              # The closing tag (\1 == $1 in the match part)
   }{<$1>$2</$1>}x;   # Replacement pattern

Everything between parens is "captured". The first capture goes to $1, the next to $2, etc. If captures are nested, the outermost capture gets the lowest index: the index of a capture is the position of its opening paren, counted from the left. (Unless you use (?|...) in newer perls, but we do not use that here.)
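A short sketch of that numbering rule, using a nested pattern like the earlier two-group attempt; the sample text "Acme" is invented for the demonstration:

```perl
use strict;
use warnings;

my $text = "<CompanyName>Acme</CompanyName>";

# Three opening parens, counted left to right:
#   1st "(" opens the outer group   -> "<CompanyName>"
#   2nd "(" opens the nested group  -> "CompanyName"
#   3rd "(" opens the content group -> "Acme"
# In list context a match returns the captures in that order.
my @groups = $text =~ m{(<(CompanyName)>)(.*?)</\2>};

for my $i (0 .. $#groups) {
    printf "\$%d = %s\n", $i + 1, $groups[$i];
}
```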

So after "<CompanyName>" matched, $1 now contains "CompanyName".

The next line captures .{0,60}, which means "any character (except a newline), between 0 and 60 times". The pattern .*? is a non-greedy match on any number of characters: it stops as soon as the next part of the pattern can match. Without the ?, the .* would be greedy and run on to the last closing tag on the line.

As the .*? is not in parens, whatever it matches is simply forgotten.
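The difference between greedy and non-greedy shows up as soon as a line holds two elements. A minimal sketch with a made-up tag <a> and sample content:

```perl
use strict;
use warnings;

my $line = "<a>one</a><a>two</a>";

# Greedy: .* grabs as much as it can, so the match ends at the LAST </a>.
my ($greedy) = $line =~ m{<a>(.*)</a>};

# Non-greedy: .*? grabs as little as it can, so the match ends at the FIRST </a>.
my ($lazy) = $line =~ m{<a>(.*?)</a>};

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

The greedy capture swallows the first closing tag and the second opening tag; the non-greedy one holds just "one".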

The last part of the match is matching </\1> where \1 is the content of $1. We cannot use $1 there, as we are still inside the matching part. </\1> in this case is essentially the same as matching on </CompanyName>, which is more typing and more error-prone.

After the closing } of the match, the replacement part puts it all together again. The x after the last } lets us split the matching pattern over several lines and add whitespace and comments.
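As a quick check that the spread-out /x form behaves like the one-liner, here is a sketch that runs both on the same sample line. The indented 75-character input is invented for the demonstration, and the comments inside the pattern deliberately avoid braces, since { } are the delimiters here.

```perl
use strict;
use warnings;

my $line = "  <CompanyName>" . ("A" x 75) . "</CompanyName>\n";

# Compact form, as posted.
(my $compact = $line) =~ s{<(CompanyName)>(.{0,60}).*?</\1>}{<$1>$2</$1>};

# Same pattern under /x: whitespace and #-comments inside the pattern are ignored.
(my $spread = $line) =~ s{
    <(CompanyName)>     # opening tag, tag name kept as group 1
    (.{0,60}) .*?       # group 2 holds at most 60 characters, the rest is dropped
    </\1>               # closing tag with the same name as group 1
}{<$1>$2</$1>}x;

print $spread;
print "identical\n" if $compact eq $spread;
```

The leading spaces and the trailing newline survive because they lie outside the matched text.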

 

Enjoy, Have FUN! H.Merijn