Operating System - HP-UX

Delete ?' pattern in a streamed file

 
Kellogg Unix Team
Trusted Contributor


Hello,

I am looking for a command to delete occurrences of the ?' pattern in a streamed file. sed doesn't work on a streamed file, and I am unable to get tr to work on a pattern: it deletes ALL occurrences of ? and of ', which I don't want. For example, a test file with the following data -

This is a test?' file'
with'? some' test? data'

should be translated as -

This is a test file'
with'? some' test? data'

I also need the resultant file to be streamed. Any help is appreciated.

Thanks in advance
...Manjeet

work is fun ! (my manager is standing behind me!!)
19 REPLIES 19
Michael Schulte zur Sur
Honored Contributor

Re: Delete ?' pattern in a streamed file

Hi,

It is not clear to me what you want.
There are still ? and ' characters after the translation.

greetings,

Michael
Rodney Hills
Honored Contributor

Re: Delete ?' pattern in a streamed file

This won't work?

echo "abc?'efg" | sed -e "s/\?'//"

-- Rod Hills
There be dragons...
Kellogg Unix Team
Trusted Contributor

Re: Delete ?' pattern in a streamed file

Michael,

If you look closely at the output file, "?'" [? followed by ' - in that order] has been removed; other kinds of occurrences are not modified.

Rodney,

sed doesn't work on streamed files. I had tried that, but it didn't work.

...Manjeet
Hein van den Heuvel
Honored Contributor
Solution

Re: Delete ?' pattern in a streamed file


Manjeet,
Please explain what you mean by a streamed file. Is that different from data coming in through a pipe? A 'binary' stream with no clear line delineation?

Rodney's suggestion seems to work for me:

$ cat > x
This is a test?' file'
with'? some' test? data'
$ cat x | sed -e "s/\?'//"
This is a test file'
with'? some' test? data'

Similar with perl:

$ perl -pe "s/\?'//g" x
This is a test file'
with'? some' test? data'

hth,
Hein.


H.Merijn Brand (procura
Honored Contributor

Re: Delete ?' pattern in a streamed file

For every problem there is the right tool. In this case it proves to be the much underrated 'tr' util.

# process | tr -d '?'
# tr -d '?' < file

This is also supported in Perl, and it is MUCH faster than s///:

# process | perl -pe'tr/?//d'

lt09:/home/merijn 102 > perl -MBenchmark -e'$a="ab?cdef?gh" x 1000;timethese(-3,{s=>sub{($b=$a)=~s/\?//g},tr=>sub{($b=$a)=~tr/?//d}})'
Benchmark: running s, tr for at least 3 CPU seconds...
s: 4 wallclock secs ( 3.14 usr + 0.00 sys = 3.14 CPU) @ 1224.84/s (n=3846)
tr: 3 wallclock secs ( 3.13 usr + 0.00 sys = 3.13 CPU) @ 11271.57/s (n=35280)
lt09:/home/merijn 103 >

lt09:/home/merijn 124 > set time
lt09:/home/merijn 125 > perl -le'$a="ab?cdef?gh" x 1000;print$a for 1..1000' > file
0.007u 0.056s 0:00.06 83.3% 0+0k 0+0io 0pf+0w
lt09:/home/merijn 126 > ll file
1562524 -rw-rw-rw- 1 merijn users 10001000 2004-12-04 11:27 file
0.000u 0.001s 0:00.00 0.0% 0+0k 0+0io 0pf+0w
lt09:/home/merijn 127 > sed 's/\?//g' file | wc -c
8001000
1.440u 0.026s 0:01.51 96.6% 0+0k 0+0io 0pf+0w
lt09:/home/merijn 128 > tr -d '?' < file | wc -c
8001000
0.034u 0.018s 0:00.05 80.0% 0+0k 0+0io 0pf+0w
lt09:/home/merijn 129 > perl -pe's/\?//g' file | wc -c
8001000
0.867u 0.022s 0:00.92 95.6% 0+0k 0+0io 0pf+0w
lt09:/home/merijn 130 > perl -pe'tr/?//d' file | wc -c
8001000
0.143u 0.027s 0:00.17 94.1% 0+0k 0+0io 0pf+0w
lt09:/home/merijn 131 >

I bet that, seeing these timings, people are going to use 'tr' a lot more, both as a standalone utility and in Perl.
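One likely reason tr can be so fast is that deleting a set of single characters needs no pattern matching at all: each byte is looked up in a 256-entry table and either kept or dropped. A minimal C sketch of that idea follows; it illustrates the technique and is not the actual source of tr or of Perl's tr///, and the name delete_set is hypothetical. Like tr -d, it removes every listed character individually, so it cannot target a two-character sequence such as ?'.

```c
#include <stddef.h>

/* Hypothetical helper in the spirit of `tr -d SET`: delete every byte of s
   that appears in set, in place. Returns the new length. */
size_t delete_set(char *s, const char *set)
{
    unsigned char drop[256] = {0};  /* drop[c] != 0 means "delete c" */
    size_t r, w = 0;

    for (; *set != '\0'; set++)
        drop[(unsigned char)*set] = 1;

    for (r = 0; s[r] != '\0'; r++)
        if (!drop[(unsigned char)s[r]])
            s[w++] = s[r];          /* keep bytes not in the set */
    s[w] = '\0';
    return w;
}
```

Calling delete_set on "with'? some' test? data'" with the set "?'" yields "with some test data", which is exactly the over-deletion Manjeet wants to avoid.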

Enjoy, Have FUN! H.Merijn
Muthukumar_5
Honored Contributor

Re: Delete ?' pattern in a streamed file

Simply do this:

muthu # cat > t.log
This is a test?' file'
with'? some' test? data'

muthu # sed "s/\?'//" t.log
This is a test file'
with'? some' test? data'

muthu # sed "s/?'//" t.log
This is a test file'
with'? some' test? data'

muthu # perl -pe "s/\?'//" t.log
This is a test file'
with'? some' test? data'


HTH.
Easy to suggest when don't know about the problem!
Ralph Grothe
Honored Contributor

Re: Delete ?' pattern in a streamed file

Merijn,

thank you for reminding us of the efficiency of the tr translate command (or transliterate operator, as the Perl folks call it) over pattern substitution with regexps.
As your benchmarking vividly demonstrates one should always avoid the expense of regexps when there is no need for such a sharp knife.
Boy, almost 10 times as fast.
Madness, thy name is system administration
Michael Schulte zur Sur
Honored Contributor

Re: Delete ?' pattern in a streamed file

Merijn,
tr does not work here, because Manjeet wants to delete the sequence ?', not every ? and '.

Muthukumar,
as Manjeet already stated, sed does not work because he does not have a text file.

greetings,

Michael
Michael Schulte zur Sur
Honored Contributor

Re: Delete ?' pattern in a streamed file

Hi,

I hope the following C program does what you need. Create pattdel.c, compile it (for example, cc -o pattdel pattdel.c), and use
cat streamfile | ./pattdel

greetings,

Michael

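#include <stdio.h>

/* Copy stdin to stdout, deleting every "?'" sequence.
   No newlines are needed, so it works on streamed files. */
int main(void)
{
    int c1, c2;   /* int, not char: getchar() returns int so EOF is detectable */

    while ((c1 = getchar()) != EOF)
    {
        if (c1 != '?') {
            putchar(c1);
            continue;
        }
        c2 = getchar();          /* c1 is '?': look at the next byte */
        if (c2 == '\'')
            continue;            /* "?'" found: drop both bytes */
        putchar(c1);             /* keep a lone '?' */
        if (c2 != EOF)
            ungetc(c2, stdin);   /* re-check c2, it may start another "?'" */
    }
    return 0;
}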
Ralph Grothe
Honored Contributor

Re: Delete ?' pattern in a streamed file

Of course tr deletes characters from sets.
Here is an excerpt from the tr manpage:


-d Deletes all occurrences of input characters or
collating elements found in the array specified in
string1.



The trailing d modifier in Merijn's Perl script has the same effect.
Ralph Grothe
Honored Contributor

Re: Delete ?' pattern in a streamed file

Sorry, I haven't read carefully enough.
It's actually the string sequence "?'" he wants to get rid of.
Then you're right.
Fred Ruffet
Honored Contributor

Re: Delete ?' pattern in a streamed file

Ralph,

tr -d "?'" will delete every ? and every '. Manjeet does not want to suppress ? or ' or '?, just ?'.

Regards,

Fred
--

"Reality is just a point of view." (P. K. D.)
Ralph Grothe
Honored Contributor

Re: Delete ?' pattern in a streamed file

Fred,

I understood by now.
That's why I apologized above.
Still, in other cases, when you need to translate whole character sets, tr is the preferred choice for efficiency.
Fred Ruffet
Honored Contributor

Re: Delete ?' pattern in a streamed file

Sorry Ralph, I didn't see your post :)

Fred
Kellogg Unix Team
Trusted Contributor

Re: Delete ?' pattern in a streamed file

Hi, I am back ! :-) I am done with little firefighting.

First of all, thanks to all for participating. This is truly one of the best forums I have seen.

I am sorry if I have confused some of you with "streamed" file. A streamed file is a text file that has been stripped of its CR/LFs (up to the last line). I have attached a file, foo.streamed (a streamed file), to this message. If you do a line count on it, the following is returned -

# wc -l foo*
0 foo.streamed

If you run sed commands on a streamed file, they simply won't work, as an end-of-line is never detected. And I have not found a way to make tr remove the "?'" pattern, as some of you have already noticed.
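wc -l counts newline characters, so a file with no LF anywhere reports 0 even though it holds data; to a line-oriented tool the whole file is one unterminated line. POSIX specifies sed input as text files, so how a particular sed treats such input is implementation-dependent. A minimal sketch of the counting that wc -l does, using a hypothetical count_newlines():

```c
#include <stdio.h>

/* Hypothetical helper: count '\n' bytes in a stream, which is what
   `wc -l` reports. A "streamed" file with no LF at all gives 0. */
long count_newlines(FILE *fp)
{
    long n = 0;
    int c;                       /* int so EOF is distinguishable */

    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            n++;
    return n;
}
```

The two example lines concatenated without LFs give 0; the same text with an LF after each line gives 2.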

Having said that, I'm happy to see perl happily doing what I want! Now that the example file is there, I would like to know of any alternative solution that does not need perl. No, don't get me wrong, I am not against perl; it's just that I would like to avoid introducing an application if I can do the same with Unix commands.

Thanks!
...Manjeet
Fred Ruffet
Honored Contributor

Re: Delete ?' pattern in a streamed file

Well, what about doing something like this :
(cat foo.streamed ; echo) | sed "s/?'//g" | tr -d "\n"
(everybody's happy : both sed and tr :)

You add a \n, let sed process the resulting line, then remove the \n again.

Tested. Worked.

Regards,

Fred
A. Clay Stephenson
Acclaimed Contributor

Re: Delete ?' pattern in a streamed file

Avoiding Perl might have made sense as recently as 6 or 7 years ago but nowadays Perl is installed as "standard equipment" on virtually every UNIX platform and in the worst case is readily available for easy install. It's really your least evil weapon in this case.

If you try to do this with other "standard" UNIX commands, you stand a very good chance of hitting arbitrary buffer limits that vary from platform to platform and release to release. You could easily find that sed readily handles your file on platform A and explodes on platform B. All of the traditional text processing commands (awk, sed, ...) really assume line-oriented text files, but Perl allows you the freedom to process binary input exactly according to your instructions as well as line-oriented text. As a bonus, a well-written Perl script will run on Windows platforms as well --- without change.
If it ain't broke, I can fix that.
Kellogg Unix Team
Trusted Contributor

Re: Delete ?' pattern in a streamed file

Fred,

Thanks for the solution; nice way to achieve the results! :)

Clay,

I see your point. In fact, the files will indeed be very large EDI files, and it would be catastrophic to lose part of a file if a command failed because of a buffer limitation. I will run the tests and will probably use perl then! :)

You all are very helpful, my thanks to all of you.

...Manjeet
Kellogg Unix Team
Trusted Contributor

Re: Delete ?' pattern in a streamed file

Closing