A little improvement to my webinput perl scripts

Steven E. Protter — Thu, 20 May 2004 22:29:47 GMT

I have some perl scripts that take input from a web page and process it into valid HTML documents.

It works pretty well right now.

I strip out the line feeds as follows so I can put them in with print statements later:

chop ($filedata) if ($filedata =~/\n$/);

In the output file I get the following results:

this the data^M

This ^M is a single character and I'd like to strip it out.

I know I can do it after the script runs with the dos2unix command, but I'd rather strip it in the program.

I imagine it's another chop statement, but I can't begin to figure out what it should be.

Bunny for tested code or an explanation as to why I can't do it.

SEP

Re: A little improvement to my webinput perl scripts

Mel Burslan — Thu, 20 May 2004 23:20:21 GMT

Steven,

This is not recently tested, but I remember running into a situation like this in a 7-line perl script (that is about the extent of my perl capacity, to tell you the truth). Instead of chop, I remember using chomp to eliminate the trailing carriage return character.

Hope it helps.

Re: A little improvement to my webinput perl scripts

curt larson_1 — Thu, 20 May 2004 23:45:46 GMT

I would guess this isn't working because the MS-DOS end of line is \r\n, and a single chop removes only the \n, leaving the \r.

You can use chomp to remove a substring at the end of a line. $/ specifies the substring, or $INPUT_RECORD_SEPARATOR if you use the English module.

Re: A little improvement to my webinput perl scripts

Steven E. Protter — Fri, 21 May 2004 06:50:46 GMT

Attaching a sample output file.

A bunny for a working chomp command.

SEP

Re: A little improvement to my webinput perl scripts

Ken Penland_1 — Fri, 21 May 2004 07:02:35 GMT

^M is considered whitespace by perl; you can strip it off with:

$filedata =~ s/\s+$//;
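
A quick check of the substitution above, as a minimal sketch. Because the pattern is anchored with $, only the whitespace at the end of the string should be removed, and \r counts as whitespace for \s. The sample string is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample line with a DOS-style line ending (\r\n).
my $filedata = "this is the data\r\n";

# \s matches space, tab, \r and \n; the $ anchor limits the match
# to the run of whitespace at the very end of the string.
$filedata =~ s/\s+$//;

print "[$filedata]\n";   # the spaces between words are left alone
```

Note that this also strips trailing spaces and tabs, which may or may not be wanted.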

Re: A little improvement to my webinput perl scripts

Geoff Wild — Fri, 21 May 2004 08:03:35 GMT

Will something like this work?

open(INF,"../links.txt");
@data = <INF>;
close(INF);
foreach $i (@data) {
chomp($i);
($name,$heading,$text) = split(/\|/,$i);
print "$heading\n";
}

Rgds...Geoff

Re: A little improvement to my webinput perl scripts

Mark Grant — Fri, 21 May 2004 08:16:37 GMT

Hi SEP,

How about (as suggested by curt)

open FILE, "filename" or die "oh no, not again\n";

$/="^M";

while(<FILE>){
chomp;
print;
}

Re: A little improvement to my webinput perl scripts

Steven E. Protter — Fri, 21 May 2004 08:22:01 GMT

These ideas look wonderful.

I will be trying them late this afternoon.

I'm concerned that Ken's code will remove all whitespace, including the spaces between words.

Since the program is already running, my preference if possible is to add a chomp command to the existing statement so I don't have to execute a second program.

Perhaps change:
chop ($filedata) if ($filedata =~/\n$/);
to
chop ($filedata) if ($filedata =~/^M\n$/);

except I know it's not caret M, it's a single character that I don't know the escape code for.

Thanks. If there are other ideas, I'll be happy to try them.

SEP

Re: A little improvement to my webinput perl scripts

curt larson_1 — Fri, 21 May 2004 08:28:34 GMT

what does

$\="\r\n";
chomp($filedata);

do for you?

Re: A little improvement to my webinput perl scripts

Steven E. Protter — Fri, 21 May 2004 08:29:53 GMT

Well Ken, it's not whitespace, I guess. Tried your idea, no joy. Seven points for the try.

SEP

Re: A little improvement to my webinput perl scripts

curt larson_1 — Fri, 21 May 2004 08:35:10 GMT

Well, chop only removes the last character, so
chop ($filedata) if ($filedata =~/^M\n$/);

is only going to remove the \n, leaving the \r. So you're still going to have the same issue that you have now.
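
The point about chop can be checked with a small sketch. The sample string is an assumption; the chop line is the one from the original script.

```perl
use strict;
use warnings;

my $filedata = "this is the data\r\n";   # hypothetical sample input

chop($filedata) if $filedata =~ /\n$/;   # removes only the final \n

# The carriage return (octal 015) is still the last character.
printf "last char is octal %03o\n", ord(substr($filedata, -1));
```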

Re: A little improvement to my webinput perl scripts

Mark Grant — Fri, 21 May 2004 08:36:14 GMT

SEP, you don't need the test. If you have set $/ to "^M^J" or possibly just "\r\n" then chomp won't do anything if they are not there.

Apologies though, my little snippet above forgot that there needs to be a line feed in there too :)

Re: A little improvement to my webinput perl scripts

Steven E. Protter — Fri, 21 May 2004 08:48:12 GMT

Curt,

$\="\r\n";
chomp($filedata);

results in an extra ^M after each line.

I think this is significant.

SEP
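
One possible explanation for the extra ^M, offered as a sketch rather than a diagnosis of the actual script: $\ is Perl's output record separator, which print appends to whatever it writes, while chomp consults $/, the input record separator. With $\ set to "\r\n", chomp would still remove only the default "\n", and every print would gain a fresh "\r\n". The sample data and the in-memory filehandle are assumptions for illustration.

```perl
use strict;
use warnings;

my $filedata = "this is the data\r\n";   # hypothetical sample input
my $out = '';

{
    local $\ = "\r\n";                   # output record separator, not $/
    chomp($filedata);                    # $/ is still "\n": strips only \n
    open(my $fh, '>', \$out) or die "cannot open in-memory file: $!";
    print $fh $filedata;                 # print appends $\ to the data
    close($fh);
}

# $out now holds the data, the leftover \r, and the \r\n added by print.
# Make the control characters visible before printing.
(my $visible = $out) =~ s/\r/^M/g;
$visible =~ s/\n/\\n/g;
print "$visible\n";
```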

Re: A little improvement to my webinput perl scripts

Steven E. Protter — Fri, 21 May 2004 08:58:45 GMT

Okay to be honest, the ^M is not so bad.

It was interfering with the following problem.

I have a line of data that looks like this.

* * *^M

I wanted to test for text

if ( $filedata eq "* * *") {
# process differently
}

I can close this thread if I can reliably test the first character of the array for an asterisk and take action.

That code will end this with a bunny.

Got a meeting, point assignment after.

SEP

Re: A little improvement to my webinput perl scripts

Mark Grant — Fri, 21 May 2004 09:16:06 GMT

if ( $filedata=~/^\*/){
#process differently
}
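
A small sketch of why the anchored match helps here, assuming the line still carries its trailing carriage return after the chop: the exact string comparison fails, while the regex only looks at the first character.

```perl
use strict;
use warnings;

my $filedata = "* * *\r";   # hypothetical line after the \n has been chopped

# Exact comparison fails because of the trailing carriage return.
my $exact = ($filedata eq "* * *") ? 1 : 0;

# The anchored match only inspects the first character, so the \r
# at the end makes no difference.
my $starts = ($filedata =~ /^\*/) ? 1 : 0;

print "exact=$exact starts=$starts\n";
```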

Re: A little improvement to my webinput perl scripts

Ralph Grothe — Fri, 21 May 2004 09:17:26 GMT

See "perldoc -f chomp", "perldoc -f chop", "perldoc perlvar", "perldoc perlop".

The chomp POD says that chomp() will chop off any string held in the input separator variable $/.
On Win32 the line separator is the sequence of \r\n or ^M^J.
Perl should automagically take care of the proper separator.
But to be explicit you could assign this char sequence to $/ (better localize $/)
e.g.
{ local $/ = "\r\n";
# parsing, chomping here
}
If you prefer you could as well use the octal or hex reps.

$/ = "\015\012";

or even this might work

$/ = "\cM\cJ";

To get rid of carriage returns it's more efficient to use the transliteration operator (tr, or y as known from sed) than a regexp.

while (<>) {
tr/\015//d;
...
}
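
The two approaches above can be compared side by side in a minimal sketch. The sample string and variable names are assumptions; a localized $/ lets chomp remove the whole "\r\n", while tr with the d modifier deletes only the carriage returns and keeps the newline.

```perl
use strict;
use warnings;

my $line = "this is the data\r\n";   # hypothetical sample input
my $no_cr = $line;                   # second copy for the tr variant

{
    local $/ = "\015\012";           # CR LF; restored when the block ends
    chomp($line);                    # removes the whole "\r\n" in one go
}

# tr with the d modifier deletes every carriage return in the string
# and leaves the newline in place.
$no_cr =~ tr/\015//d;

print "chomped: [$line]\n";
```

Localizing $/ keeps the change from leaking into later reads elsewhere in the script.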

Re: A little improvement to my webinput perl scripts

Steven E. Protter — Fri, 21 May 2004 09:26:57 GMT

Mark Grant's last idea worked.

I'm still going to try to strip that ^M but I've accomplished what I need on this one.

dos2unix works fine as a final processing step.

SEP

Re: A little improvement to my webinput perl scripts

Ralph Grothe — Fri, 21 May 2004 09:29:38 GMT

Put the asterisk in a character class and test for one or more occurrences:

while (<>) {
if (/[*]+/) {
# do something
}
}

Similarly, treating only lines that don't contain carriage returns:

unless (/[\r]/) {
}

Re: A little improvement to my webinput perl scripts

Steven E. Protter — Fri, 21 May 2004 09:32:27 GMT

RALPH!

You DUDE!

You got it.

after the chop

$/="\cM";
chomp ( $filedata );

I know I could probably do it in one line of code but I don't care.

THREAD CLOSED!

AWESOME

SEP
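
For the one-line version mentioned above, a possible sketch, assuming each line ends in either "\n" or "\r\n": a single substitution removes the newline along with an optional carriage return before it, and $/ is left at its default.

```perl
use strict;
use warnings;

my $filedata = "this is the data\r\n";   # hypothetical sample input

# Remove a trailing newline together with an optional carriage
# return in front of it; works for Unix and DOS line endings.
$filedata =~ s/\r?\n$//;

print "[$filedata]\n";
```

Unlike assigning to $/ globally, this leaves later reads and chomps in the same script unaffected.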