Operating System - HP-UX

String handling in shell scripts

 
SOLVED
J Ruud
Advisor

I'm wondering if someone would be able to suggest an approach to resolving a problem that I'm having with string handling in a shell script.

I'm using ksh, and I have a script that reads a file consisting of two fields per record. The fields may or may not be separated by spaces. The record length is fixed (12), as are the fields (6 each). I want to parse the fields and feed them to an awk script, which will use them to find a matching record in another file.

I'm using a read loop to process the first file and the set command to determine how to parse the data. If the number of args is 2, then the fields are separated by one or more spaces and I can just assign $1 and $2. If I only have 1 arg, then I use a cut command to parse the data. The problem I'm dealing with is that the shell seems to drop leading blanks. So if the first field has one or more leading blanks and the second field doesn't, the cut ends up grabbing data from the second field. How do I get the shell to leave the record data as it is in the file?

Here's the code segment...

while read line
do
    set $line
    if [ $# -eq 2 ]
    then
        a=$1
        b=$2
    else
        a=`echo $line | cut -c 1-6`
        b=`echo $line | cut -c 7-12`
    fi
    buffer=`awk '{ if( $2 == '$a' && $3 == '$b' )
        print $2, $3, $4, $6, $7, $8, $9, $10, $11}' ${file2}`
    echo $buffer >> ${ifn}
done < $file1
15 REPLIES
Steven Schweda
Honored Contributor

Re: String handling in shell scripts

More quotation?

> line=' ab '

> e1=` echo $line `
> e2=` echo "$line" `

> echo '>>'"$e1"'<<'
>>ab<<

> echo '>>'"$e2"'<<'
>> ab <<

Your "echo $line" is tossing white space.
Kenan Erdey
Honored Contributor

Re: String handling in shell scripts

Hi,

> a=`echo $line | cut -c 1-6`
> b=`echo $line | cut -c 7-12`

try

a=${line:0:6}
b=${line:6:6}
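
For reference, the ${var:offset:length} form takes an offset and a length, and as far as I know it needs ksh93 or bash rather than an older ksh. A minimal sketch of the same split using cut, assuming a made-up 12-character sample record; the important part is quoting "$line" so the blanks survive:

```shell
# Sample 12-character record (made up) with a leading blank in field 1.
line=' 12345678901'

# Quoting "$line" keeps the blanks; cut counts characters starting at 1.
a=$(printf '%s\n' "$line" | cut -c 1-6)
b=$(printf '%s\n' "$line" | cut -c 7-12)

echo "a=[$a] b=[$b]"
```

This only helps if the variable still holds the leading blank when it reaches cut.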

Computers have lots of memory but no imagination
J Ruud
Advisor

Re: String handling in shell scripts

It looks like it's happening before I get to the lines with the cut command. I tried using the double quote approach to do a little debugging. I used echo 'x'"$line"'x'. And I found that where line = ' ab', I got 'xabx'. It looks like it's happening in the read.
J Ruud
Advisor

Re: String handling in shell scripts

So, I guess my question now is...
How do I read this file without having the shell drop leading white space?
Kenan Erdey
Honored Contributor
Solution

Re: String handling in shell scripts

Change the internal field separator (IFS) to a character other than whitespace.

Put IFS=':' at the top of your script.
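
The reason this helps: read strips leading and trailing whitespace only when that whitespace is listed in IFS, and the default IFS is space, tab and newline. A small sketch of the difference, using a made-up record; here IFS is set just for the one read command, which is my own variation on setting it at the top of the script:

```shell
# Sample record (made up): field 1 is " 12345" with a leading blank, field 2 is "678901".
record=' 12345678901'

# Default IFS: read strips the leading blank, leaving 11 characters.
default_line=$(printf '%s\n' "$record" | { read line; printf '%s' "$line"; })

# IFS set to ':' (no whitespace in it): the record is kept intact.
colon_line=$(printf '%s\n' "$record" | { IFS=':' read line; printf '%s' "$line"; })

echo "default: [${default_line}] length ${#default_line}"
echo "IFS=':': [${colon_line}] length ${#colon_line}"
```

Keep in mind that with IFS=':' in effect for the whole script, an unquoted $line is no longer split on blanks either, so "set $line" will always see a single argument.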
Computers have lots of memory but no imagination
J Ruud
Advisor

Re: String handling in shell scripts

Good Answer!
That did the job.
Thanks!
Peter Nikitka
Honored Contributor

Re: String handling in shell scripts

Hi,

To get around the space-handling problem, I suggest using pure awk.
I would also check whether the input really is in the correct format.
awk 'BEGIN {
  while ((getline < "file1") == 1) {
    n++   # getline from a file does not update NR, so count the lines here
    if (NF == 2) {
      if (!(length($1) == 6 && length($2) == 6)) print "incorrect data at", "file1", n > "error"
      else { a[++i] = $1; b[i] = $2 }
    } else if (length($0) < 12) print "incorrect:", "file1", n > "error"
    else { x = substr($0, 1, 6); y = substr($0, 7, 6)   # second field starts at column 7
           gsub(/ /, "", x); gsub(/ /, "", y); a[++i] = x; b[i] = y }
  }
}
NF >= 11 {
  for (j = 1; j <= i; j++)
    if ($2 == a[j] && $3 == b[j]) { print $2, $3, $4, $6, $7, $8, $9, $10, $11; next }
}' file2 >outfile

NB: Untested - no UNIX at hand here.
Look carefully for balanced () and {}.

mfG Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
J Ruud
Advisor

Re: String handling in shell scripts

Peter,
If I couldn't come up with a simple script solution, I was thinking about how I might be able to use more awk to solve the problem. But, not being that well versed in using it, I was struggling to come up with a solution. I was thinking that I would be able to use $NF, length and substr to determine the format of the input and parse it. But I wasn't sure how I could process the one file, one line at a time, while using its contents to process the other. I was looking into that when Kenan came up with the one line solution to my problem. But, thanks for your input. I'm going to save your answer for future reference.
Dennis Handly
Acclaimed Contributor

Re: String handling in shell scripts

You can use a mixed solution of shell read and awk to do what you're doing. But use awk to split the fixed-length fields:
while read line; do
    # Your solution, with fixes
    set -- $line    # "--" keeps an empty or dash-leading line from being misread by set
    if [ $# -eq 2 ]; then
        a=$1
        b=$2
    else
        a=$(echo "$line" | cut -c 1-6)
        b=$(echo "$line" | cut -c 7-12)
    fi
    buffer=$(awk -v a="$a" -v b="$b" '{ if( $2 == a && $3 == b)
        print $2, $3, $4, $6, $7, $8, $9, $10, $11}' ${file2})
    echo $buffer >> ${ifn}

    # awk solution (use in place of the block above, not in addition to it)
    awk -v line="$line" '
    BEGIN {
        a = substr(line, 1, 6)
        gsub(" ", "", a) # remove spaces
        b = substr(line, 7, 6)
        gsub(" ", "", b)
    }
    {
        if ($2 == a && $3 == b)
            print $2, $3, $4, $6, $7, $8, $9, $10, $11
    }' ${file2} >> ${ifn}
done < $file1

>The problem I'm dealing with is that the shell seems to drop leading blanks.

What do you want to do with those leading blanks? You won't find them in your second file.
J Ruud
Advisor

Re: String handling in shell scripts

My mixed solution of shell and awk was working fine except for one case, when the first field had leading spaces and the second field didn't. Then, the number of args would be one and I'd have to use the cut to parse the fields. The problem with the shell dropping the leading white space was that the "cut 1-6" would then grab data from the second field. I wasn't worried about the leading space in awk. It was about getting the fields parsed correctly.
J Ruud
Advisor

Re: String handling in shell scripts

Dennis,
Regarding your solution, it gives me an idea of another approach that I might take using awk. But, I don't think it would have worked as you have it coded. You see, with the space being dropped, the awk script would only get a string of 11 bytes. So the substr would have the same problem as my cut.
James R. Ferguson
Acclaimed Contributor

Re: String handling in shell scripts

Hi:

Instead of reading a line into a single variable, I would do:

...
while read A B X
...

Now, either you have two fields (or more, in which case the third..n-th are in 'X') or you have field 'A' with an empty 'B'.

If you have one field ('B' is empty), you could reject the record if the size of 'A' isn't exactly 12 characters (your requirement).

For instance, you could issue an error message and continue the 'read' loop by adding:

[ "${#A}" -ne 12 ] && { echo "bad_size"; continue; }
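
Put together, the approach above might look like the sketch below. The function name parse_records and the sample records are made up for illustration, and the cut calls for the packed case are my assumption about how the 12 characters would then be split:

```shell
# Hypothetical helper: reads records on stdin and prints the two parsed fields.
parse_records() {
    while read A B X
    do
        if [ -z "$B" ]; then
            # Only one word was read, so both fields should be packed into A.
            [ "${#A}" -ne 12 ] && { echo "bad_size: [$A]"; continue; }
            B=$(printf '%s\n' "$A" | cut -c 7-12)
            A=$(printf '%s\n' "$A" | cut -c 1-6)
        fi
        echo "a=[$A] b=[$B]"
    done
}

# Made-up sample data: separated fields, packed fields, and a packed record
# whose first field starts with a blank.
parse_records <<'EOF'
  1234  5678
123456789012
 12345678901
EOF
```

Note that the third record is rejected: with the default IFS, read strips its leading blank, so 'A' arrives with only 11 characters.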

If you are going to change the Internal Field Separator (IFS) to prevent field splitting and preserve both leading and trailing spaces, *at least* localize its action:

#!/usr/bin/sh
OLDIFS=${IFS}
IFS=''
while read A
do
echo "[${A}]"
echo "...and my size was: " ${#A}
done
IFS=${OLDIFS}
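
Another way to localize it, sketched here as an alternative to saving and restoring, is to put the assignment in front of the read itself so that it applies to that one command only. The helper name show_lines and the sample input are made up; the -r option is my addition and stops read from treating backslashes specially:

```shell
# Hypothetical helper: echoes each stdin line in brackets with its length.
show_lines() {
    while IFS= read -r A
    do
        echo "[${A}] size ${#A}"
    done
}

saved_ifs=$IFS
printf '%s\n' '  ab  ' ' 12345678901' | show_lines

# IFS was changed only for the duration of each read, so nothing needs restoring.
[ "$IFS" = "$saved_ifs" ] && echo "IFS unchanged"
```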

...

Regards!

...JRF...
J Ruud
Advisor

Re: String handling in shell scripts


Thanks for your input. The file will always have two fields, unless I change it. In case you're wondering, the format is such because of the number of different programs and files with which it is used. I'm not that experienced with programming in shell. Mainly, I've used it to call other programs and I learn as much as I need to get the job done. This whole exercise started when I decided to provide added functionality by modifying this script to take in a file of arguments, instead of the original two. So, first, I had to figure out how to read the file and pass the args to the awk script. Your suggestion shows me another way that shell can be used to perform file processing tasks. Also, I like the idea of localizing the IFS influence. And, although I don't have to worry about it having a negative impact on the processing of this script, I'll definitely implement your idea. Thanks to everyone for your responses. It's been a positive learning experience.
Dennis Handly
Acclaimed Contributor

Re: String handling in shell scripts

>with the space being dropped, the awk script would only get a string of 11 bytes. So the substr would have the same problem as my cut.

Yes, you need that IFS solution too. But you need to later remove those spaces.
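
Combining the two points, a sketch of the whole loop could look like this. The temporary directory, file contents and key values are all made up so the example runs on its own, and it assumes a mktemp that supports -d; the variable names file1, file2 and ifn follow the original script:

```shell
# Hypothetical files with made-up data, created here so the sketch is self-contained.
tmpdir=$(mktemp -d)
file1=$tmpdir/keys
file2=$tmpdir/data
ifn=$tmpdir/out

# Two 12-character records: one packed with a leading blank, one with blanks in both fields.
printf '%s\n' ' ab123cd4567' '  xy12  zz34' > "$file1"
cat > "$file2" <<'EOF'
r1 ab123 cd4567 d4 d5 d6 d7 d8 d9 d10 d11
r2 xy12 zz34 e4 e5 e6 e7 e8 e9 e10 e11
r3 nomatch nomatch f4 f5 f6 f7 f8 f9 f10 f11
EOF

: > "$ifn"
while IFS= read -r line    # empty IFS: the record keeps its leading blanks
do
    awk -v line="$line" '
    BEGIN {
        a = substr(line, 1, 6); gsub(" ", "", a)   # field 1, blanks removed
        b = substr(line, 7, 6); gsub(" ", "", b)   # field 2, blanks removed
    }
    $2 == a && $3 == b { print $2, $3, $4, $6, $7, $8, $9, $10, $11 }
    ' "$file2" >> "$ifn"
done < "$file1"

result=$(cat "$ifn")
echo "$result"
rm -rf "$tmpdir"
```

The blanks are kept long enough for substr to cut at the right columns, and removed only afterwards so the values compare equal to the fields in the second file.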