Operating System - HP-UX

Declan Heerey
Frequent Advisor

Finding and correcting curropt file lines

I have a file where each line starts with one of the following strings;

HDR
POL
OBJ

I have a script which searches the file and reports an error if one of the lines is corrupt. The script runs as follows:

cat | while read line
do grep -n -e "^HDR" -e "^POL" -e "^OBJ" -v > err_log.txt
done

I then manually edit the file at the lines reported in err_log.txt (if any exist).

Is there any way I can get the script to perform the corrections for me? Any help appreciated.

Declan
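
For reference, a minimal sketch of the detection step described above, assuming the data file name is passed as an argument (the function name and the sample data are made up for illustration). grep can read the file directly, so the `while read` loop is not needed; `-v` inverts the match and `-n` prefixes each reported line with its line number.

```shell
#!/bin/sh
# find_bad_lines FILE: list the lines that do NOT start with HDR, POL or OBJ.
# Output format is "lineno:text", as produced by grep -n.
find_bad_lines() {
    grep -n -v -e '^HDR' -e '^POL' -e '^OBJ' "$1"
}

# Demo on a small sample file (placeholder data, not from the real file).
sample="${TMPDIR:-/tmp}/find_bad_sample.$$"
printf '%s\n' 'HDR first part' 'second part' 'POL ok' 'OBJ ok' > "$sample"
find_bad_lines "$sample"    # prints: 2:second part
rm -f "$sample"
```

In the real script the output would be redirected, for example `find_bad_lines yourFile > err_log.txt`.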
Massimo Bianchi
Honored Contributor

Re: Finding and correcting curropt file lines

Hi,
I think you are out of luck, because corruption can show up in many different ways.

If you know exactly what kind of corruption to expect, there is a chance.

Otherwise, if you just want to strip out the offending lines, you can use a grep like yours without "-v" and redirect the output to a file, but I suspect you would lose some data that way.

Can you give any more hints about the kind of corruption?

Massimo
Umapathy S
Honored Contributor

Re: Finding and correcting curropt file lines

Declan,
You haven't said what to replace them with. Corruption is random. If you want, you can redirect all the valid lines to another file, but you would then lose the corrupted lines, which need your manual attention.

-Umapathy

Arise Awake and Stop NOT till the goal is Reached!
Declan Heerey
Frequent Advisor

Re: Finding and correcting curropt file lines

Apologies, I should have explained: corruption normally occurs when a line breaks before it is finished, so the rest of it starts a new line, i.e.

HDR more text>
POL
OBJ

I then go to the offending line (the end of the HDR line) and hit Ctrl J
Erik Heckers
Advisor

Re: Finding and correcting curropt file lines

Hello!

I guess your 'grep' is already doing the trick if you remove -n and -v.

Or use this shorter solution:

grep -E "^(HDR|POL|OBJ)" repaired.txt

Erik
if power_on; then
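
One caveat worth showing, as a sketch with made-up sample data: filtering with grep keeps only the lines that start with a marker, so the tail of a broken line is dropped rather than re-attached to the line it belongs to. The function name `keep_valid` is hypothetical.

```shell
#!/bin/sh
# keep_valid FILE: print only the lines starting with HDR, POL or OBJ.
keep_valid() {
    grep -E '^(HDR|POL|OBJ)' "$1"
}

# Demo: the second line is the broken-off tail of the HDR line.
sample="${TMPDIR:-/tmp}/keep_valid_sample.$$"
printf '%s\n' 'HDR some text' 'more text' 'POL ok' > "$sample"
keep_valid "$sample"    # "more text" does not appear in the output
rm -f "$sample"
```

So this cleans the file but loses data; joining the fragment back onto the previous line needs something other than a filter.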
TOMAS BERNABEU
Frequent Advisor

Re: Finding and correcting curropt file lines



Hi
Use the sed command to delete or change these lines.

To
curt larson_1
Honored Contributor

Re: Finding and correcting curropt file lines

a start would be

cat $yourFile |
while read line
do
    case $line in
    HDR*) if corrupt; then    # "corrupt" is a placeholder for your own test
              fix             # "fix" is a placeholder for your own repair
          fi
          print fixedLine;;
    POL*) if corrupt; then
              fix
          fi
          print fixedLine;;
    OBJ*) if corrupt; then
              fix
          fi
          print fixedLine;;
    *)    print "$line";;
    esac
done > newFile

# check your fixes, or just
mv newFile yourFile

You'll have to put in how to test for corruption
and what the fix would be.
curt larson_1
Honored Contributor

Re: Finding and correcting curropt file lines

you could try something like this

cat yourFile|
awk '
#assume the first line starts correct
#print it without a newline
NR == 1 {printf("%s",$0);next;}
# if the line match our pattern
# apply the newline and
#print it without a newline
/^HDR/ {printf("\n%s,$0);next;}
/^POL/ {printf("\n%s,$0);next;}
/^OBJ/ {printf("\n%s,$0);next;}
# doesn't match one of out patterns
# print the line, joining it to the
# previous line
{printf("%s,$0);}
# apply an ending newline if necessary
END {printf("\n");}
'> newfile

you can apply spacing on the join and ending newline as desired
Declan Heerey
Frequent Advisor

Re: Finding and correcting curropt file lines

Curt, that looks like the solution I need. However, when I run the script against the file in question I get the following error messages:

awk: The string
%s,$0);ne cannot contain a newline character.
The source line is 13.
The error context is
/^HDR/ {printf("\n%s,$0);next;} >>>
<<<
syntax error The source line is 14.
awk: The statement cannot be correctly parsed.
The source line is 14.
awk: There are 3 missing } characters.
awk: There are 3 missing ) characters.
./test_and_fix[15]: Syntax error at line 15 : `(' is not expected.


Any ideas?
Massimo Bianchi
Honored Contributor

Re: Finding and correcting curropt file lines

Curt's answer might work, with the proper double quotes!! :)

cat yourFile|
awk '
#assume the first line starts correct
#print it without a newline
NR == 1 {printf("%s",$0);next;}
# if the line match our pattern
# apply the newline and
#print it without a newline
/^HDR/ {printf("\n%s",$0);next;}
/^POL/ {printf("\n%s",$0);next;}
/^OBJ/ {printf("\n%s",$0);next;}
# doesn't match one of out patterns
# print the line, joining it to the
# previous line
{printf("%s",$0);}
# apply an ending newline if necessary
END {printf("\n");}
'> newfile


One day or another I will study awk seriously....

Massimo
Declan Heerey
Frequent Advisor

Re: Finding and correcting curropt file lines

One day I will do the same; until then, any additional help would be appreciated. With the double quotes corrected I get the following error:

awk: Cannot find or open file match.
The source line number is 17.
./test_and_fix[15]: Syntax error at line 15 : `(' is not expected.
Charles G.
Honored Contributor

Re: Finding and correcting curropt file lines

Hello Declan,

The following perl script will do what you expect :

#!/usr/bin/perl

# read the file and place it in a list
while (<>) {
chomp;
push(@liste,$_);

}

# print first line assumed ok
$line = shift(@liste);
print $line;

# handle other lines
foreach $line (@liste) {
    # start a new output line when this line begins with one of the markers;
    # otherwise it is appended to the previous line
    print "\n" if ($line =~ /^HDR/ || $line =~ /^POL/ || $line =~ /^OBJ/);
    print $line;
}

# print trailing newline
print "\n";


Save the script in a file xx.pl (if necessary, change the first line to match the path of the perl interpreter on your system), make the xx.pl file executable and use:

cat | xx.pl > file.out

Declan Heerey
Frequent Advisor

Re: Finding and correcting curropt file lines

Charles, you're a star, that perl script works a treat!!! thanks for all the help guys, much appreciated.

I'm going to get onto my boss and sort out a perl course!

Great stuff and thanks again! I'll be able to relax over the weekend

Have a good one

Declan
curt larson_1
Honored Contributor

Re: Finding and correcting curropt file lines

Declan

Massimo has pointed out my missing
double quotes in the printf's

and

you might already be satisfied with Charles's
perl script.

But it still looks like there was a syntax
issue with your script before that. If you're
still interested in finding out what it was,
post your version and I'm sure we can
figure out where the problem is.
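
A likely cause of the remaining errors, offered as a guess since the failing script was never posted: the comment "# doesn't match one of out patterns" contains an apostrophe. Inside a single-quoted awk program the shell treats that apostrophe as the closing quote, so awk is handed "match", "one", "of" and so on as file names (hence "Cannot find or open file match"), and the shell then trips over the "(" of the following printf line, which is line 15 of the script as posted. Below is a sketch of the same join logic with apostrophe-free comments; the function name `join_broken_lines` is made up for this example, and it assumes, as the original does, that the first line of the file is a valid record start.

```shell
#!/bin/sh
# join_broken_lines FILE: re-attach broken-off fragments to the previous record.
join_broken_lines() {
    awk '
        # First line: print it without a trailing newline.
        NR == 1 { printf("%s", $0); next }
        # A line starting with a marker begins a new record:
        # end the previous record, then print this one.
        /^(HDR|POL|OBJ)/ { printf("\n%s", $0); next }
        # Anything else is a fragment: append it to the current record.
        { printf("%s", $0) }
        # Terminate the last record, but only if there was any input.
        END { if (NR > 0) printf("\n") }
    ' "$1"
}
```

Usage would be along the lines of `join_broken_lines yourFile > newfile`. Fragments are joined with no separator, as in the original awk script; check newfile before moving it over the original.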