
Re: Help with perl

 
H.Merijn Brand (procura
Honored Contributor


lemme (try to) explain:

s{<(CompanyName)>(.{0,60}).*?</\1>}{<$1>$2</$1>};

To make that more readable and still legal:

s{ <(CompanyName)>    # Search for the opening tag (keep tag name in $1)
   (.{0,60})  .*?     # Keep 0 to 60 characters in $2, ignore rest to
   </\1>              # The closing tag (\1 == $1 in the match part)
   }{<$1>$2</$1>}x;   # Replacement pattern

Everything between parens is "captured". The first capture goes to $1, the next to $2, etc. Captures are numbered by the position of their opening paren, counted from the left, so if captures are nested, the outermost capture gets the lowest index. (Unless you use (?|...) in newer perls, but we do not use that here.)
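To see the numbering rule at work, here is a small sketch with nested parens. The date string and the variable names are made up for illustration; they are not part of the original question.

```perl
use strict;
use warnings;

# Hypothetical sample string, not from the original thread.
my $date = "2024-05-17";

# Captures are numbered by their opening paren, left to right:
#   $1 = whole date (outer parens), $2 = year, $3 = month, $4 = day
my @parts;
if ($date =~ /((\d{4})-(\d{2})-(\d{2}))/) {
    @parts = ($1, $2, $3, $4);
}

print join(" | ", @parts), "\n";   # 2024-05-17 | 2024 | 05 | 17
```

The outer pair opens first, so it becomes $1 even though it closes last.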

So after "<CompanyName>" matched, $1 now contains "CompanyName".

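The next line captures .{0,60}, which means "any character, 0 to 60 times". It is greedy, so it takes as many characters as it can, up to 60. The pattern .*? is a non-greedy match on any number of characters: it takes as few as possible and stops at the first place where the next part of the pattern can match. Without the ?, .* would be greedy and run on to the last place where the rest of the pattern can still match.

As the .*? part is not in parens, it is not captured and is just forgotten.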
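The difference between greedy and non-greedy is easiest to see side by side. A minimal sketch, assuming a made-up input with two elements on one line (the tag name a and the variable names are only for illustration):

```perl
use strict;
use warnings;

# Hypothetical input with two elements on one line.
my $xml = "<a>one</a><a>two</a>";

# Greedy: .* grabs as much as possible, so it runs to the LAST </a>.
my ($greedy) = $xml =~ m{<a>(.*)</a>};

# Non-greedy: .*? grabs as little as possible, so it stops at the FIRST </a>.
my ($lazy) = $xml =~ m{<a>(.*?)</a>};

print "greedy: $greedy\n";   # greedy: one</a><a>two
print "lazy:   $lazy\n";     # lazy:   one
```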

The last part of the match is matching </\1> where \1 is the content of $1. We cannot use $1 there, as we are still inside the matching part. </\1> in this case is essentially the same as matching on </CompanyName>, but spelling the name out a second time is more typing and more error-prone.
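The backreference really pays off once the tag name is not fixed. The following is only a sketch with made-up input: it captures any tag name with \w+ and lets \1 insist that the closing tag uses the same name.

```perl
use strict;
use warnings;

# Hypothetical input: one well-formed element, one with a mismatched close tag.
my $text = "<b>bold</b> <i>oops</b>";

# <(\w+)> captures any tag name; </\1> demands the SAME name in the close tag.
# With /g in list context, the match returns every $1 it found.
my @tags = $text =~ m{<(\w+)>[^<]*</\1>}g;

print "well-formed: @tags\n";   # well-formed: b
```

The second element is skipped because </b> does not match the captured name i.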

After the closing } of the match, the substitution pattern puts it all together again: the opening tag, the (at most 60) characters kept in $2, and the closing tag. The x after the last } enables us to split up the matching pattern over several lines and add whitespace and comments.
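Putting it together, a small test script could look like this. The sample record (a 75-character name built from repeating "Acme ") and the variable names are invented for the test; only the substitution itself comes from the explanation above.

```perl
use strict;
use warnings;

# Hypothetical record: a 75-character company name (15 x "Acme ").
my $name = "Acme " x 15;
my $line = "<CompanyName>$name</CompanyName>";

# Copy $line into $short, then truncate the element content to 60 characters.
(my $short = $line) =~ s{ <(CompanyName)>    # opening tag, name kept in $1
                          (.{0,60}) .*?      # keep up to 60 chars in $2, drop the rest
                          </\1>              # closing tag with the same name
                        }{<$1>$2</$1>}x;

# Pull the content back out to check its length.
my ($kept) = $short =~ m{<CompanyName>(.*)</CompanyName>};
print length($name), " -> ", length($kept), "\n";   # 75 -> 60
```

A name of 60 characters or fewer passes through unchanged, because .{0,60} then takes everything and .*? matches nothing.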

 

Enjoy, Have FUN! H.Merijn