Operating System - HP-UX

cut command - specifying multiple delimiters

 
Danny Fang
Frequent Advisor

Hi,

I'm attempting to use the "cut" command to extract elements from the filename having the form shown below:
A20051109.0215-0230_SubNetwork=ONRM_RootMo,SubNetwork=SNTDCAUJRNC002,MeContext=SNTDCAUJRNC002_statsfile.xml

I'd like to extract the following elements:
1) A20051109
2) 0215
3) 0230
4) SubNetwork=ONRM_RootMo,SubNetwork=SNTDCAUJRNC002,MeContext=SNTDCAUJRNC002
5) statsfile
6) xml

I've tried using "cut" in the following way, but it did not do the trick.

ls A20051109.0215-0230_SubNetwork=ONRM_RootMo,SubNetwork=SNTDCAUJRNC002,MeContext=SNTDCAUJRNC002_statsfile.xml |cut -f1-6 -d".,-"
cut: invalid delimiter
bash-3.00$

I've also tried using single quotes as in '.,-,_' to the -d option but it produced the same error shown above.

How do I specify multiple delimiters to the "cut" -d option?

Also, how do I assign each element obtained through "cut" into their respective variable names in a script?

Could anyone help show me how it's done?

Thanks

27 REPLIES
James R. Ferguson
Acclaimed Contributor

Re: cut command - specifying multiple delimiters

Hi Danny:

A look at the manpages for 'cut' will show you that the delimiter ('-d') switch supports only a single character as its argument.

You asked "How do I specify multiple delimiters to the "cut" -d option?". Consider :

# echo "a b|c d"|cut -d " " -f2|cut -d"|" -f2

...this extracts the "c" from the input string. We need to do two 'cut's each with a different delimiter.

To assign the extracted value to a variable, simply do:

# VAR=`echo "a b|c d"|cut -d " " -f2|cut -d"|" -f2`

# echo ${VAR} #...to see the value.

Regards!

...JRF...
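
Applied to the file name from the question, the same chaining idea might look like the sketch below. It is only an illustration and not part of the original reply: the variable names (name, p1 ... p6) are made up here, and it assumes the fourth element contains exactly one underscore, as in the sample name.

```shell
#!/bin/sh
# Sample file name taken from the question.
name='A20051109.0215-0230_SubNetwork=ONRM_RootMo,SubNetwork=SNTDCAUJRNC002,MeContext=SNTDCAUJRNC002_statsfile.xml'

# One cut per delimiter; chain them where two delimiters bound a field.
p1=$(echo "$name" | cut -d. -f1)                    # text before the first "."
p2=$(echo "$name" | cut -d. -f2 | cut -d- -f1)      # between "." and "-"
p3=$(echo "$name" | cut -d- -f2 | cut -d_ -f1)      # between "-" and the first "_"
p4=$(echo "$name" | cut -d. -f2 | cut -d_ -f2-3)    # assumes one "_" inside this element
p5=$(echo "$name" | cut -d. -f2 | cut -d_ -f4)      # between the last "_" and the last "."
p6=$(echo "$name" | cut -d. -f3)                    # text after the last "."

echo "$p1"; echo "$p2"; echo "$p3"; echo "$p4"; echo "$p5"; echo "$p6"
```

Each value costs one or two extra processes, which is fine for a handful of names but slow inside a large loop.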



jon2
Advisor

Re: cut command - specifying multiple delimiters

Change the delimiters so they are all the same before using cut. I chose spaces; awk then splits on whitespace without a field separator being set, so no cut is required. Read directly into variables.

e.g.
echo "XX.yy,zz-aa,bb" | sed "s/,/ /g
s/-/ /g" | awk '{print $1, $2, $3}' | read VAR1 VAR2 VAR3

If you like, you can use this with cut as before, but then you only need one delimiter type.
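
Another way to treat all delimiters alike, sketched here under stated assumptions: a POSIX awk interprets a multi-character -F value as a regular expression, so a bracket expression can name several delimiter characters at once. The variable names are hypothetical, the sketch assumes the elements contain no whitespace or shell wildcard characters and that the fourth element holds exactly one underscore, and very old awk implementations may not accept a regular expression for -F.

```shell
#!/bin/sh
# Sample file name taken from the question.
name='A20051109.0215-0230_SubNetwork=ONRM_RootMo,SubNetwork=SNTDCAUJRNC002,MeContext=SNTDCAUJRNC002_statsfile.xml'

# Split on ".", "_" or "-" in one pass. Pieces 4 and 5 are glued back
# together because the underscore between them belongs to the element.
fields=$(echo "$name" | awk -F'[._-]' '{ print $1, $2, $3, $4 "_" $5, $6, $7 }')

# Unquoted on purpose: let the shell split the six space-separated words.
set -- $fields
v1=$1 v2=$2 v3=$3 v4=$4 v5=$5 v6=$6

echo "$v1"; echo "$v2"; echo "$v3"; echo "$v4"; echo "$v5"; echo "$v6"
```

Using set -- with a command substitution avoids piping into read, so the variables are assigned in the current shell.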
harry d brown jr
Honored Contributor

Re: cut command - specifying multiple delimiters

With a decently complicated sed you can parse and reconstruct, replacing the delimiters with spaces:

sed "s/^\(.*\)\.\(.*\)\-\(.*\)\_\(.*\)\_\(.*\)\_\(.*\)\.\(.*\)/\1 \2 \3 \4_\5 \6 \7/"

thus

echo A20051109.0215-0230_SubNetwork=ONRM_RootMo,SubNetwork=SNTDCAUJRNC002,MeContext=SNTDCAUJRNC002_statsfile.xml | sed "s/^\(.*\)\.\(.*\)\-\(.*\)\_\(.*\)\_\(.*\)\_\(.*\)\.\(.*\)/\1 \2 \3 \4_\5 \6 \7/"

produces

A20051109 0215 0230 SubNetwork=ONRM_RootMo,SubNetwork=SNTDCAUJRNC002,MeContext=SNTDCAUJRNC002 statsfile xml

Note that I had to reinsert the underscore (_) in the "SubNetwork" string.

live free or die
harry d brown jr

Live Free or Die
Hein van den Heuvel
Honored Contributor

Re: cut command - specifying multiple delimiters

Your problem will be the ","
For some parts of the string this is apparently meant to be a separator, but not for all. You'll need (perl) logic to fix that.

Just doing the split is easy in perl:

#cat file1.tmp
A20051109.0215-0230_SubNetwork=ONRM_RootMo,SubNetwork=SNTDCAUJRNC002,MeContext=SNTDCAUJRNC002_statsfile.xml
#
#perl -ne "foreach $x (split /[.,\-_]/) {print \"$x\n\"}" file1.tmp
A20051109
0215
0230
SubNetwork=ONRM
RootMo
SubNetwork=SNTDCAUJRNC002
MeContext=SNTDCAUJRNC002
statsfile
xml

btw... you can pass the separators on the command line with -F, and perl (like awk) has built-ins for auto-splitting fields.


hth,
Hein.
jon2
Advisor

Re: cut command - specifying multiple delimiters

Do not forget that where the delimiter is desired to be a "_" and has been made a space it can be reconstructed with the variables when they are used. (no perl etc required).

e.g echo "${VAR2}_${VAR3}....
harry d brown jr
Honored Contributor

Re: cut command - specifying multiple delimiters

To assign the "fields" into script variables you could do this:

create a script:

somescript.ksh
#!/usr/bin/ksh
#
while read aline
do
echo $aline | sed "s/^\(.*\)\.\(.*\)\-\(.*\)\_\(.*\)\_\(.*\)\_\(.*\)\.\(.*\)/\1 \2 \3 \4_\5 \6 \7/"|read var1 var2 var3 var4 var5 var6
echo var1=$var1
echo var2=$var2
echo var3=$var3
echo var4=$var4
echo var5=$var5
echo var6=$var6
done

chmod a+x somescript.ksh

cat yourdata | ./somescript.ksh

live free or die
harry d brown jr
Live Free or Die
jon2
Advisor

Re: cut command - specifying multiple delimiters

Harry,
Do you mean like in my sample at 11.26 am ?
Peter Nikitka
Honored Contributor
Solution

Re: cut command - specifying multiple delimiters

Hi,

if you like a solution using shell features only (I did this in ksh):

name='A20051109.0215-0230_SubNetwork=ONRM_RootMo,SubNetwork=SNTDCAUJRNC002,MeContext=SNTDCAUJRNC002_statsfile.xml'
p1=${name%%.*}
remain=${name#$p1.}
p2=${remain%%-*}
remain=${remain#$p2-}
p3=${remain%%_*}
remain=${remain#${p3}_}
p4=${remain%_*}
remain=${remain#${p4}_}
p5=${remain%.*}
p6=${remain#*.}

print $name;print $p1;print $p2;print $p3;print $p4;print $p5;print $p6

A20051109.0215-0230_SubNetwork=ONRM_RootMo,SubNetwork=SNTDCAUJRNC002,MeContext=SNTDCAUJRNC002_statsfile.xml
A20051109
0215
0230
SubNetwork=ONRM_RootMo,SubNetwork=SNTDCAUJRNC002,MeContext=SNTDCAUJRNC002
statsfile
xml

The variables p1...p6 will contain the requested values (watch for the correct string operators!).
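
For readers unfamiliar with these operators, here is a minimal sketch of what each one removes. The string f and the result variable names are made up for illustration; the operators themselves are the standard POSIX shell parameter expansions.

```shell
#!/bin/sh
f='a.b.c'

short_suffix=${f%.*}     # "%"  removes the shortest match of ".*" from the end
long_suffix=${f%%.*}     # "%%" removes the longest match of ".*" from the end
short_prefix=${f#*.}     # "#"  removes the shortest match of "*." from the start
long_prefix=${f##*.}     # "##" removes the longest match of "*." from the start

echo "$short_suffix"     # a.b
echo "$long_suffix"      # a
echo "$short_prefix"     # b.c
echo "$long_prefix"      # c
```

This is why p4 above uses the single % with _*: it cuts only at the last underscore, so the underscore inside the fourth element survives.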

mfG Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
harry d brown jr
Honored Contributor

Re: cut command - specifying multiple delimiters

jon2,

close if you consider the use of the "read" statement, but the examples I provided do the following:
- without the use of awk or perl (not that there is anything wrong with them)
- use of pattern matching
- providing all SIX desired output fields
- example of putting the code in a loop

BTW, you wouldn't be claiming plagiarism, would you?

live free or die
harry d brown jr
Live Free or Die
Hein van den Heuvel
Honored Contributor

Re: cut command - specifying multiple delimiters

Harry, Good points!

Peter, excellent solution if the data is already in a shell variable and the results need to be in shell variables.
Note that these questions are often only a small part of larger file-based data processing, and I find it more efficient to stay in perl or awk longer and do all of the string manipulation there before going back to the shell or calling out to a shell (system call).

jon2 at Apr 10, 2006 13:05:01 GMT wrote: > Harry, Do you mean like in my sample at 11.26 am ?

Jon2, how does your solution solve part 4)?
It seems to me that it would be fragmented (like my solution).

Hein,
This part is broken: print \"$x\n\"
On hpux it should simply be: print "$x\n"


Danny,
A similar solution to Harry's, but with perl:

#perl -ne "print "$1 $2 $3 $4 $5 $6\n" if /^(\w+).(\w+)-(\w+)_(.*)_(\w+).(\w+)/" file1.tmp

A20051109 0215 0230 SubNetwork=ONRM_RootMo,SubNetwork=SNTDCAUJRNC002,MeContext=SNTDCAUJRNC002 statsfile xml

You can of course 'echo/print' the initial variable into perl when not using data from a file, and 'read' its output into shell variables if no further processing is needed.

hth,
Hein.


jon2
Advisor

Re: cut command - specifying multiple delimiters

Harry,

I was only highlighting that
'To assign the "fields" into script variables you could' - use the technique I had shown.

You are correct: I used awk and no pattern match, my variables only went up to VAR3, and, not knowing the source of the line to parse, I did not put it in an actual script.

Hein I had not offered a "full solution" as I had not logged on to try it....but now I have had to...

Mine would deal with 4 as follows:


cat /tmp/inputfile |sed "s/\./,/g
s/_/,/
s/_/\#/g
s/\#/_/
s/-/,/
s/\#/,/" |awk -F, '{print $1,$2,$3,$4,$5,$6,$7,$8}' | read VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8

I guess that the awk could be removed, spaces used instead of ",", and the output read directly into the variables, but there you go. Why I always use awk in that way I will never know; habit, I guess, along with avoiding pattern matches.


Sandman!
Honored Contributor

Re: cut command - specifying multiple delimiters

How about using awk:

# awk -F"." '{
> split($2,z,"_")
> split(z[1],x,"-")
> printf("%s\n%s\n%s\n%s_%s\n%s\n%s\n",$1,x[1],x[2],z[2],z[3],z[4],$NF)
> }' your_file

cheers!
Danny Fang
Frequent Advisor

Re: cut command - specifying multiple delimiters

Hi Harry,

I tried executing the example which you provided:
somescript.ksh
#!/usr/bin/ksh
#
while read aline
do
echo $aline | sed "s/^\(.*\)\.\(.*\)\-\(.*\)\_\(.*\)\_\(.*\)\_\(.*\)\.\(.*\)/\1 \2 \3 \4_\5 \6 \7/"|read var1 var2 var3 var4 var5 var6
echo var1=$var1
echo var2=$var2
echo var3=$var3
echo var4=$var4
echo var5=$var5
echo var6=$var6
done

chmod a+x somescript.ksh

cat yourdata | ./somescript.ksh

OUTPUT:
var1 is
var2 is
var3 is
var4 is
var5 is
var6 is

and also:

echo A20051109.0215-0230_SubNetwork=ONRM_RootMo,SubNetwork=SNTDCAUJRNC002,MeContext=SNTDCAUJRNC002_statsfile.xml | sed "s/^\(.*\)\.\(.*\)\-\(.*\)\_\(.*\)\_\(.*\)\_\(.*\)\.\(.*\)/\1 \2 \3 \4_\5 \6 \7/" |read var1 var2 var3 var4 var5 var6 echo var1 is $var1
echo var2 is $var2
echo var3 is $var3
echo var4 is $var4
echo var5 is $var5
echo var6 is $var6

var1 is
var2 is
var3 is
var4 is
var5 is
var6 is

I'm unable to get any values into var1, var2, var3, var4, var5 and var6.

Could you point out where I went wrong?

I tried the other methods posted in this forum and they all work. However, I'd also like to understand your method of using sed to achieve the results.

Could you help out?

Thanks
Danny Fang
Frequent Advisor

Re: cut command - specifying multiple delimiters

Hi Hein,

I tried your method using PERL but I obtained the error below:

prod-cingkl-linux01\ :/nfs/users/lows>perl -ne "print "$1 $2 $3 $4 $5 $6\n" if /^(\w+).(\w+)-(\w+)_(.*)_(\w+).(\w+)/" f1
Can't open n if /^(\w+).(\w+)-(\w+)_(.*)_(\w+).(\w+)/: No such file or directory.
A20060405.0011-0032_SubNetwork=SE,SubNetwork=RNC203,MeContext=RNC203_statsfile.xml
prod-cingkl-linux01\ :/nfs/users/lows >


I've also tried doing:
cat f1|perl -ne "print "$1 $2 $3 $4 $5 $6\n" if /^(\w+).(\w+)-(\w+)_(.*)_(\w+).(\w+)/"
Can't open n if /^(\w+).(\w+)-(\w+)_(.*)_(\w+).(\w+)/: No such file or directory.
A20060405.0011-0032_SubNetwork=SE,SubNetwork=RNC203,MeContext=RNC203_statsfile.xml
prod-cingkl-linux01\ :/nfs/users/lows >


Could you point out where I went wrong in this implementation?
Sandman!
Honored Contributor

Re: cut command - specifying multiple delimiters

Hi Danny,

That's because you need a newline before echo'ing the variables. Harry's command works but something may have been lost during the copy and paste.

Change...
read var1 var2 var3 var4 var5 var6 echo var1 is $var1
To...
read var1 var2 var3 var4 var5 var6
echo var1 is $var1

cheers!
Danny Fang
Frequent Advisor

Re: cut command - specifying multiple delimiters

Hi Sandman,

Attached is my script according to Harry's.

I do have the echo separated from line
read var1 var2 var3 var4 var5 var6

However, it's still not able to print the values in var1, var2, var3, var4, var5 and var6.

prod-cingkl-linux01\ :/nfs/users/lows >./testSed1.sh
var1 is
var2 is
var3 is
var4 is
var5 is
var6 is

Could you help point out where I went wrong?

Sandman!
Honored Contributor

Re: cut command - specifying multiple delimiters

Danny,

Run the script in debug mode and see what's wrong with it:

# ksh -x ./testSed1.sh

I copied the script into a file on my system and it worked.
Danny Fang
Frequent Advisor

Re: cut command - specifying multiple delimiters

Hi Sandman,

The output of the script in debug mode:

prod-cingkl-linux01\ :/nfs/users/lows >ksh -x ./testSed1.sh
+ echo A20051109.0215-0230_SubNetwork=ONRM_RootMo,SubNetwork=SNTDCAUJRNC002,MeContext=SNTDCAUJRNC002_statsfile.xml
+ sed s/^\(.*\)\.\(.*\)\-\(.*\)\_\(.*\)\_\(.*\)\_\(.*\)\.\(.*\)/\1 \2 \3 \4_\5 \6 \7/
+ read var1 var2 var3 var4 var5 var6
+ echo var1 is
var1 is
+ echo var2 is
var2 is
+ echo var3 is
var3 is
+ echo var4 is
var4 is
+ echo var5 is
var5 is
+ echo var6 is
var6 is
prod-cingkl-linux01\ :/nfs/users/lows >

I can't seem to detect where the error is.

Could you help out?

Thanks
jon2
Advisor

Re: cut command - specifying multiple delimiters

Try without the
|read .........
and the echos, as read behaves differently under Linux than under HP-UX.
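
One possible explanation, offered as an assumption and not as a confirmed diagnosis of the scripts in this thread: AT&T ksh runs the last command of a pipeline in the current shell, while bash and pdksh run it in a subshell, so variables set by "| read" there are lost when the pipeline ends. A minimal sketch that avoids the pipe into read by feeding it from a here-document follows; the sample name is the one from the question, and the sed expression is the one posted earlier with the unneeded backslashes before "-" and "_" dropped.

```shell
#!/bin/sh
# Sample file name taken from the question.
name='A20051109.0215-0230_SubNetwork=ONRM_RootMo,SubNetwork=SNTDCAUJRNC002,MeContext=SNTDCAUJRNC002_statsfile.xml'

# read runs in the current shell here, so the variables stay set afterwards.
read -r var1 var2 var3 var4 var5 var6 <<EOF
$(echo "$name" | sed 's/^\(.*\)\.\(.*\)-\(.*\)_\(.*\)_\(.*\)_\(.*\)\.\(.*\)/\1 \2 \3 \4_\5 \6 \7/')
EOF

echo "var1 is $var1"
echo "var2 is $var2"
echo "var3 is $var3"
echo "var4 is $var4"
echo "var5 is $var5"
echo "var6 is $var6"
```

The same form should behave identically in ksh, bash and a POSIX sh, which makes it easier to move a script between HP-UX and Linux.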
harry d brown jr
Honored Contributor

Re: cut command - specifying multiple delimiters

Danny,

I also cut and pasted the script you attached and it worked.

Can you check the path of ksh and some other things and post them here?

whereis ksh
which ksh
what `which ksh`
ls -l `which ksh`
uname -a

live free or die
harry d brown jr
Live Free or Die
Hein van den Heuvel
Honored Contributor

Re: cut command - specifying multiple delimiters

Ooop, sorry Danny.
I'm travelling and can't get at hpux so I used Windoze to try.
On windoze the perl command is enclosed in double-quotes and thus doublequotes also need to be escaped (with a backslash).

If you can, retry my example but replacing the " with ' around the 'program', or stick the program in a file.

HPUX perl one liners look like:

perl -e 'blah blah' filename

Hein.

harry d brown jr
Honored Contributor

Re: cut command - specifying multiple delimiters


If this is a Linux machine, then try changing the first line of the script from

#!/usr/bin/ksh

to

#!/bin/ksh

Additionally, instead of using
what `which ksh`
use
file `which ksh`

live free or die
harry d brown jr
Live Free or Die
Danny Fang
Frequent Advisor

Re: cut command - specifying multiple delimiters

Hi Harry,

prod-cingtoman :/users/lows >what `which ksh`
/usr/bin/ksh:
defs.c $Date: 2002/11/18 20:42:15 $Revision: r11.11/2 PATCH_11.11 (PHCO_27019)
edit.c $Date: 2002/11/18 20:43:06 $Revision: r11.11/2 PATCH_11.11 (PHCO_27019)
io.c $Date: 2002/11/18 20:47:57 $Revision: r11.11/3 PATCH_11.11 (PHCO_27019)
cmd.c $Date: 2002/11/18 20:41:14 $Revision: r11.11/1 PATCH_11.11 (PHCO_27019)
main.c $Date: 2002/11/18 20:52:03 $Revision: r11.11/4 PATCH_11.11 (PHCO_27019)
xec.c $Date: 2002/11/18 20:52:56 $Revision: r11.11/2 PATCH_11.11 (PHCO_27019)
macro.c $Date: 2002/11/18 20:51:03 $Revision: r11.11/2 PATCH_11.11 (PHCO_27019)
error.c $Date: 2002/11/18 20:44:16 $Revision: r11.11/2 PATCH_11.11 (PHCO_27019)
jobs.c $Date: 2002/11/18 20:49:04 $Revision: r11.11/2 PATCH_11.11 (PHCO_27019)
$Revision: @(#) all CUP11.11_BL2002_1129_1 PATCH_11.11 PHCO_27019
Fri Nov 29 08:52:39 PST 2002 $
$ B.11.11_LR Feb 8 2002 01:58:34 $
Version 11/16/88

prod-cingtoman :/users/lows >file `which ksh`
/usr/bin/ksh: PA-RISC1.1 shared executable dynamically linked
prod-cingtoman :/users/lows >

prod-cingtoman :/users/lows >./testSed1.sh
A20051109 0215 0230 SubNetwork=ONRM_RootMo,SubNetwork=SNTDCAUJRNC002,MeContext=SNTDCAUJRNC002 statsfile xml
var1 is
var2 is
var3 is
var4 is
var5 is
var6 is
prod-cingtoman :/users/lows >

prod-cingtoman :/users/lows >ls -l `which ksh`
-r-xr-xr-x 2 bin bin 159744 Nov 30 2002 /usr/bin/ksh

prod-cingtoman :/users/lows >whereis ksh
ksh: /usr/bin/ksh /usr/dt/man/man1/ksh.1 /usr/dt/share/man/man1/ksh.1 /usr/share/man/man1.Z/ksh.1
prod-cingtoman :/users/lows >


prod-cingtoman :/users/lows >uname -a
HP-UX toman B.11.11 U 9000/800 1854960616 unlimited-user license
prod-cingtoman :/users/lows >

LINUX:
prod-cingkl-linux01\ :/nfs/users/lows >uname -a
Linux kl-linux01 2.6.9-5.EL #1 Wed Jan 5 19:22:18 EST 2005 i686 i686 i386 GNU/Linux
prod-cingkl-linux01\ :/nfs/users/lows >ls -l `which ksh`
-rwxr-xr-x 1 root root 183492 Jul 7 2004 /bin/ksh
prod-cingkl-linux01\ :/nfs/users/lows >./testSed1.sh
A20051109 0215 0230 SubNetwork=ONRM_RootMo,SubNetwork=SNTDCAUJRNC002,MeContext=SNTDCAUJRNC002 statsfile xml
var1 is
var2 is
var3 is
var4 is
var5 is
var6 is

Could anyone help point out where the mysterious problem lies?



Danny Fang
Frequent Advisor

Re: cut command - specifying multiple delimiters

Hi everyone,

This may sound crazy, but the working solution provided by Peter Nikitka earlier in this thread is now producing this error:
prod-cingtuna\ :/mkl/users/lows >./separator1.sh
./separator1.sh: bad substitution

prod-cingtuna\ :/mkl/users/lows >cat ./separator1.sh
#!/bin/sh
name='A20051109.0215-0230_SubNetwork=ONRM_RootMo,SubNetwork=SNTDCAUJRNC002,MeContext=SNTDCAUJRNC002_statsfile.xml'
#echo NAME is $name
p1=${name%%.*}
echo P1 here is $p1
remain=${name#$p1.}
echo REMAIN 1 is $remain
p2=${remain%%-*}
echo $p2 is P2
remain=${remain#$p2-}
echo REMAIN 2 is $remain
p3=${remain%%_*}
echo P3 now is $p3
remain=${remain#${p3}_}
echo REMAIN 3 is $remain
p4=${remain%_*}
echo P4 is $p4
remain=${remain#${p4}_}
echo REMAIN 4 is $remain
p5=${remain%.*}
echo P5 is $p5
p6=${remain#*.}
echo P6 is $p6

echo $name;echo $p1;echo $p2;echo $p3;echo $p4;echo $p5;echo $p6

Could anyone point out where I went wrong in this implementation?

Thanks