Operating System - HP-UX

A little improvement to my webinput perl scripts

 
Steven E. Protter
Exalted Contributor

A little improvement to my webinput perl scripts

I have some Perl scripts that take input from a web page and process it into valid HTML documents.

It works pretty well right now.

I strip out the line feeds as follows so I can put them in with print statements later:

chop ($filedata) if ($filedata =~/\n$/);


In the output file I get the following result:

this the data^M

This ^M is a single character and I'd like to strip it out.

I know I can do it after the script runs with the dos2unix command, but I'd rather strip it in the program.

I imagine it's another chop statement, but I can't begin to figure out what it should be.

A bunny for tested code, or an explanation as to why I can't do it.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Mel Burslan
Honored Contributor

Re: A little improvement to my webinput perl scripts

Steven,

This is not recently tested, but I remember running into a situation like this in a 7-line Perl program (that is about the extent of my Perl ability, to tell you the truth). Instead of chop, I remember using chomp to eliminate the trailing carriage return character.

Hope it helps.

________________________________
UNIX because I majored in cryptology...
curt larson_1
Honored Contributor

Re: A little improvement to my webinput perl scripts

I would guess the reason this isn't working is that the MS-DOS end of line is \r\n, and a single chop removes only the \n, leaving the \r.

You can use chomp to remove a trailing string from the end of a line. That string is taken from $/, the input record separator ($INPUT_RECORD_SEPARATOR if you use the English module).
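A minimal sketch of the chomp/$/ behaviour described above, assuming the browser delivers lines ending in \r\n (the sample string is made up for illustration). With the default $/ of "\n", chomp leaves the \r behind; with $/ localized to "\r\n", it removes both characters.

```perl
use strict;
use warnings;

# Hypothetical line as it arrives from a DOS/Windows browser: CR + LF.
my $line = "this the data\r\n";

# Default $/ is "\n", so chomp removes only the line feed.
my $default = $line;
chomp($default);                  # leaves "this the data\r"

# Localize $/ so the change cannot leak into later reads.
my $dos = $line;
{
    local $/ = "\r\n";
    chomp($dos);                  # removes the whole "\r\n"
}

printf "default: %s\n", ($default =~ /\r\z/ ? "CR still there" : "clean");
printf "dos:     %s\n", ($dos =~ /\r\z/ ? "CR still there" : "clean");
```

Localizing $/ inside a block is a design choice: the separator goes back to "\n" when the block ends, so later reads from other filehandles are unaffected.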
Steven E. Protter
Exalted Contributor

Re: A little improvement to my webinput perl scripts

Attaching a sample output file.

A bunny for a working chomp command.

SEP
Ken Penland_1
Trusted Contributor

Re: A little improvement to my webinput perl scripts

Perl treats ^M as whitespace, so you can strip it off with:

$filedata =~ s/\s+$//;
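A small check of the substitution above, using a made-up input line. Because the pattern is anchored with $, only the trailing run of whitespace (here the \r) is removed; spaces between words are left alone.

```perl
use strict;
use warnings;

# Hypothetical input: inner spaces plus a trailing carriage return.
my $filedata = "this is the data\r";

# \s matches \r, and the $ anchor limits the match to the end of the string,
# so only the trailing whitespace is removed.
(my $cleaned = $filedata) =~ s/\s+$//;

print "[$cleaned]\n";    # prints [this is the data]
```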
Geoff Wild
Honored Contributor

Re: A little improvement to my webinput perl scripts

Will something like this work?

open(INF, "../links.txt") or die "Cannot open ../links.txt: $!";
@data = <INF>;
close(INF);
foreach $i (@data) {
    chomp($i);
    ($name, $heading, $text) = split(/\|/, $i);
    print "$heading\n";
}

Rgds...Geoff
Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.
Mark Grant
Honored Contributor

Re: A little improvement to my webinput perl scripts

Hi SEP,

How about (as suggested by curt)

open FILE, "filename" or die "oh no, not again\n";

$/ = "\r";    # a real carriage return (^M), not the two characters ^ and M

while (<FILE>) {
    chomp;
    print;
}
Never precede any demonstration with anything more predictive than "watch this"
Steven E. Protter
Exalted Contributor

Re: A little improvement to my webinput perl scripts

These ideas look wonderful.

I will be trying them late this afternoon.

I'm concerned that Ken's code will remove all whitespace, including the spaces between words.

Since the program is already running, my preference, if possible, is to add a chomp command to the existing statement so I don't have to execute a second program.

Perhaps change:
chop ($filedata) if ($filedata =~/\n$/);
to
chop ($filedata) if ($filedata =~/^M\n$/);

except I know it's not a literal caret M; it's a single character that I don't know the escape code for.

Thanks. If there are other ideas, I'll be happy to try them.

SEP
curt larson_1
Honored Contributor

Re: A little improvement to my webinput perl scripts

What does

$\="\r\n";
chomp($filedata);

do for you?
Steven E. Protter
Exalted Contributor

Re: A little improvement to my webinput perl scripts

Well Ken, I guess it's not whitespace. I tried your idea, no joy. Seven points for the try.

SEP
curt larson_1
Honored Contributor

Re: A little improvement to my webinput perl scripts

Well, chop only removes the last character, so
chop ($filedata) if ($filedata =~/^M\n$/);

is only going to remove the \n, leaving the \r. So you're still going to have the same issue that you have now.
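To make curt's point concrete: the escape code for ^M in a Perl string or regex is \r. The sketch below, on a hypothetical CR+LF line, compares the original chop approach with a single substitution that strips an optional \r together with the \n, which would replace the existing statement in one line.

```perl
use strict;
use warnings;

my $filedata = "this the data\r\n";    # hypothetical CR+LF line

# The original approach: chop removes exactly one character, the \n.
my $chopped = $filedata;
chop($chopped) if $chopped =~ /\n$/;   # "this the data\r" remains

# One statement that strips an optional \r together with the \n.
my $stripped = $filedata;
$stripped =~ s/\r?\n\z//;              # "this the data"

print length($chopped), " ", length($stripped), "\n";    # prints "14 13"
```

Because the \r is optional in the pattern, the same statement also handles lines that already use plain Unix line endings.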
Mark Grant
Honored Contributor

Re: A little improvement to my webinput perl scripts

SEP, you don't need the test. If you set $/ to "\r\n" (^M^J), chomp simply does nothing when those characters are not there.

Apologies though, my little snippet above forgot that there needs to be a line feed in there too :)
Steven E. Protter
Exalted Contributor

Re: A little improvement to my webinput perl scripts

Curt

$\="\r\n";
chomp($filedata);

results in an extra ^M after each line.
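The extra ^M is most likely explained by the variable in the snippet: $\ is Perl's output record separator, which print appends after every call, while chomp uses $/, the input record separator. A sketch that reproduces the effect with an in-memory filehandle (Perl 5.8 or later assumed):

```perl
use strict;
use warnings;

# Capture print output in memory instead of writing a real file.
my $out = '';
open(my $fh, '>', \$out) or die "cannot open in-memory handle: $!";

{
    # $\ is the OUTPUT record separator: print appends it to every call.
    local $\ = "\r\n";
    print $fh "this the data\n";
}
close($fh);

# The explicit "\n" is followed by the appended "\r\n", i.e. an extra ^M line.
print "extra CR appended\n" if $out eq "this the data\n\r\n";
```

Assigning the same string to $/ instead would leave print output unchanged and only affect chomp and line reads.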

I think this is significant.

SEP
Steven E. Protter
Exalted Contributor

Re: A little improvement to my webinput perl scripts

Okay, to be honest, the ^M is not so bad.

It was interfering with the following problem.

I have a line of data that looks like this.

* * *^M

I wanted to test for that text:

if ( $filedata eq "* * *") {
# process differently
}

I can close this thread if I can reliably test the first character of the string for an asterisk and take action.

That code will end this with a bunny.

Got a meeting, point assignment after.

SEP
Mark Grant
Honored Contributor
Solution

Re: A little improvement to my webinput perl scripts

if ( $filedata=~/^\*/){
#process differently
}
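A quick illustration of why the anchored match works where the eq test did not, using a hypothetical line that still carries its carriage return:

```perl
use strict;
use warnings;

my $filedata = "* * *\r";    # hypothetical line with a leftover carriage return

# The exact comparison fails because of the trailing \r ...
my $exact = ($filedata eq "* * *") ? 1 : 0;

# ... while anchoring on the first character ignores whatever follows.
my $starts = ($filedata =~ /^\*/) ? 1 : 0;

print "eq match: $exact, first-char match: $starts\n";
```

The asterisk is escaped because a bare * is a quantifier in a regex.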
Ralph Grothe
Honored Contributor

Re: A little improvement to my webinput perl scripts

See "perldoc -f chomp", "perldoc -f chop", "perldoc perlvar", "perldoc perlop".

The chomp POD says that chomp() removes any trailing string that matches the input record separator variable $/.
On Win32 the line separator is the sequence \r\n, or ^M^J.
Perl should automagically take care of the proper separator.
But to be explicit, you could assign this character sequence to $/ (better yet, localize $/),
e.g.
{ local $/ = "\r\n";
# parsing, chomping here
}
If you prefer, you could also use the octal or hex representations.

$/ = "\015\012";

or even this might work

$/ = "\cM\cJ";
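The three spellings above produce the same two characters. A short check, assuming an ASCII platform such as HP-UX:

```perl
use strict;
use warnings;

# Three spellings of the same two-character DOS line ending.
my @endings = ("\r\n", "\015\012", "\cM\cJ");

for my $e (@endings) {
    # unpack "C*" lists the byte values: 13 is CR (^M), 10 is LF (^J).
    print join(",", unpack("C*", $e)), "\n";
}
```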

To get rid of carriage returns, it's more efficient to use the transliteration operator (tr, or its synonym y, as known from the tr command and sed's y command) than a regexp.

while (<>) {
    tr/\015//d;    # delete every carriage return in the line
    # ...
}
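A sketch of what tr/\015//d does to a hypothetical block of DOS text: it deletes every carriage return wherever it appears (not only at line ends) and returns the number it removed.

```perl
use strict;
use warnings;

# Hypothetical DOS-style text with a CR at the end of each line.
my $text = "line one\r\nline two\r\n";

# tr/\015//d deletes every carriage return (octal 015) anywhere in the
# string and returns how many it removed; no regex engine is involved.
my $removed = ($text =~ tr/\015//d);

print "removed $removed CRs\n";    # prints "removed 2 CRs"
```

That behaviour matches what dos2unix does for line endings, which is usually what is wanted here; if a CR could legitimately appear mid-line, an anchored substitution is the safer choice.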
Madness, thy name is system administration
Steven E. Protter
Exalted Contributor

Re: A little improvement to my webinput perl scripts

Mark Grant's last idea worked.

I'm still going to try to strip that ^M but I've accomplished what I need on this one.

dos2unix works fine as a final processing step.

SEP
Ralph Grothe
Honored Contributor

Re: A little improvement to my webinput perl scripts

Put the asterisk in a character class and test for one or more occurrences:

while (<>) {
    if (/[*]+/) {
        # do something
    }
}

Similarly, to treat only lines that don't contain carriage returns:

unless (/[\r]/) {
}
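Note that the character class matches an asterisk anywhere in the line, while /^\*/ only matches one in the first position. A comparison on a few made-up lines:

```perl
use strict;
use warnings;

# Hypothetical lines: only the first starts with an asterisk.
my @lines = ("* * *\r", "price * qty\r", "plain text\r");

my @any_star   = grep { /[*]+/ } @lines;   # an asterisk anywhere in the line
my @first_star = grep { /^\*/ }  @lines;   # an asterisk as the first character
my @no_cr      = grep { !/[\r]/ } @lines;  # lines without a carriage return

printf "anywhere: %d, first char: %d, no CR: %d\n",
    scalar(@any_star), scalar(@first_star), scalar(@no_cr);
```

For the "first character is an asterisk" requirement, the anchored pattern is the stricter test.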
Steven E. Protter
Exalted Contributor

Re: A little improvement to my webinput perl scripts

RALPH!

You DUDE!

You got it.

after the chop

$/="\cM";
chomp ( $filedata );

I know I could probably do it in one line of code but I don't care.
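One caution about the fix above: assigning $/ directly changes it for the rest of the script, so any later line read (for example from <FILE>) would split on ^M instead of newline. A minimal variation, assuming a CR+LF input line, that scopes the change with local as Ralph suggested:

```perl
use strict;
use warnings;

my $filedata = "this the data\r\n";   # hypothetical CR+LF input line

# Step 1, as in the thread: chop the trailing line feed.
chop($filedata) if $filedata =~ /\n$/;

# Step 2: chomp a carriage return, with $/ changed only inside this block.
{
    local $/ = "\cM";   # restored to its previous value ("\n") on block exit
    chomp($filedata);
}

print "[$filedata]\n";
print "record separator is back to newline\n" if $/ eq "\n";
```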

THREAD CLOSED!

AWESOME

SEP