Operating System - HP-UX

Difficult string extraction

 
SOLVED
SwissKnife
Frequent Advisor


Hi,

 

Here is the format of my filename:

aaaaaaaaaaaaaaaaaaaaaaa.bbb.cccccccccccccc.gz

 

How can I extract the first 8 characters of the section just before .gz?

 

Example:

SIEBER00_ora_38928476_1.aud.20170208163224.gz

=> 20170208

 

Any ideas?

 

kind regards

Den.

 

 

6 REPLIES
Patrick Wallek
Honored Contributor

Re: Difficult string extraction

If everything is the same format, then something like this may work:

 

# export VAR1=SIEBER00_ora_38928476_1.aud.20170208163224.gz

# echo $VAR1
SIEBER00_ora_38928476_1.aud.20170208163224.gz

# echo $VAR1 | awk -F . '{print $3}' | cut -c 1-8
20170208
SwissKnife
Frequent Advisor

Re: Difficult string extraction

Hi, thank you for your answer,

 

I should have given more details. I can't check, but the filename could perhaps contain more dots.

I failed to mention this, and of course your solution works as long as the format stays the same.

Is there a way to use the .gz as a marker and take the string just before it? Or is that too complicated?

 

 

Kind regards,

Den

 

PWallek
New Member
Solution

Re: Difficult string extraction

Try this. It prints the second-to-last field (the one before the .gz) and cuts out columns 1-8:

# echo $VAR1 | awk -F . '{print $(NF-1)}' | cut -c 1-8
20170208
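
The same idea can also be sketched with shell parameter expansion alone, so that no awk or cut process is needed. This is only an illustration, assuming a POSIX-style shell (sh or ksh); the function name first8_before_gz and the variables name and field are made up for the example and do not come from the thread.

```shell
# Hypothetical helper: print the first 8 characters of the dot-separated
# field that sits immediately before a trailing ".gz".
first8_before_gz() {
    name=${1%.gz}        # strip the trailing ".gz"
    field=${name##*.}    # keep only what follows the last remaining dot
    printf '%.8s\n' "$field"   # precision .8 truncates to 8 characters
}

first8_before_gz SIEBER00_ora_38928476_1.aud.20170208163224.gz
```

For the example filename this prints 20170208, however many dots appear earlier in the name.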

 

 

Steven Schweda
Honored Contributor

Re: Difficult string extraction

   Or, if "sed" is your only friend:

pro3$ echo 'aaaa.aaa.bbb.c1c2c3c4c5c6c7.gz' | \
 sed -e 's/^.*\.\([^.]*\)\.gz$/\1/' -e 's/\(........\).*/\1/'
c1c2c3c4

   The first expression looks for any characters at the beginning {^.*},
a dot {\.}, any non-dot characters {[^.]*}, and ".gz" at the end
{\.gz$}, and keeps the non-dot characters between those dots (the last
dot before ".gz", and the dot in ".gz").  The second expression keeps
the first eight characters from that result.
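
   The two expressions described above can probably be folded into one by capturing at most eight non-dot characters with an interval expression. This is a sketch only, assuming a sed that supports POSIX \{m,n\} intervals; the function name gz_prefix is hypothetical.

```shell
# Hypothetical helper: a single substitution that keeps up to 8 non-dot
# characters following the last dot before ".gz" and discards the rest.
gz_prefix() {
    printf '%s\n' "$1" | sed -e 's/^.*\.\([^.]\{1,8\}\)[^.]*\.gz$/\1/'
}

gz_prefix 'aaaa.aaa.bbb.c1c2c3c4c5c6c7.gz'
gz_prefix 'SIEBER00_ora_38928476_1.aud.20170208163224.gz'
```

   These two calls print c1c2c3c4 and 20170208. A name that does not end in ".gz" does not match the pattern and is printed unchanged.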

SwissKnife
Frequent Advisor

Re: Difficult string extraction

Hi,

perfect, thank you.

Kind regards, Den.

Dennis Handly
Acclaimed Contributor

Re: Difficult string extraction

You can of course program it in awk. Since the timestamp field is 14 characters long, start 14 characters before ".gz" and take 8:

echo "SIEBER00_ora_38928476_1.aud.20170208163224.gz" | awk '{print substr($0, index($0, ".gz") - 14, 8)}'
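
A variant that avoids counting characters back from ".gz" is to let awk split the name on dots and take the leading eight characters of the next-to-last field. This is a minimal sketch; the function name date_from_name is hypothetical and not from the thread.

```shell
# Hypothetical helper: split on "." and keep the first 8 characters of the
# next-to-last field (the one just before "gz").
date_from_name() {
    printf '%s\n' "$1" | awk -F. '{print substr($(NF-1), 1, 8)}'
}

date_from_name SIEBER00_ora_38928476_1.aud.20170208163224.gz
```

This prints 20170208 and does not depend on the length of the field before ".gz" or on how many dots come earlier in the name.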