10-06-2006 03:28 AM
Re: help needed scripting urgent plzz
Of course you shouldn't blindly copy my typing errors into your script. You should understand (slowly, but surely in the end ...) what the characters that form the awk program do.
My mistake:
awk -v qu=#"' ...
should have been written as
awk -v qu='"' ...
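A minimal sketch of why the corrected quoting works, assuming a POSIX shell and awk: inside single quotes the shell passes the double quote through unchanged, so awk receives qu as a one-character string. The function name count_quotes is made up for this illustration.

```shell
#!/bin/sh
# count_quotes is a hypothetical helper: for each input line it prints
# how many double quotes the line contains.
count_quotes() {
    # '"' is single quote, double quote, single quote:
    # the shell hands awk one " character as the value of qu.
    # gsub replaces every match by itself ("&") and returns the count.
    awk -v qu='"' '{ print gsub(qu, "&") }'
}

printf '%s\n' '"hi","bh",12' | count_quotes    # prints 4
```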
Regards, Peter
who really has done some work in this thread. Here is some additional work for you, which really should be done:
http://forums1.itrc.hp.com/service/forums/helptips.do?#28
10-06-2006 06:31 AM
Cheers,
Hein.
--------------------- tmp.pl --------------
use warnings;
use strict;

# element:  0 1 2 3 4 5 6 7 8 9
my @len = (2,2,2,0,1,2,2,0,4,5);  # required length per field, 0 = any length
my @num = (0,0,1,0,0,0,1,0,0,1);  # 1 = field must be numeric

# Return the field without its surrounding double quotes, if it has any.
sub strip { return ($_[0] =~ /^"(.*)"$/) ? $1 : $_[0] }

while (<>) {
    chomp;
    #
    # Replace commas in quoted strings with spaces.
    # Splitting on " puts the quoted parts at the odd indexes; the limit -1
    # keeps a trailing empty piece, so a closing quote at end of line survives.
    #
    my @quoted = split /"/, $_, -1;
    my $i = 1;
    while ($i < @quoted) {
        $quoted[$i] =~ s/,/ /g;
        $i += 2;
    }
    $_ = join "\"", @quoted;
    #
    # Now find the real fields
    #
    my @values = split /,/;
    #
    # Deal with optional fields: field 3 is required when field 8 is filled in
    #
    if (strip($values[8]) ne "") {
        if (strip($values[3]) eq "") {
            print STDERR "Line $. field 3 - missing\n";
        }
    }
    #
    # Validate each field for length (if non-0 length) and numericness
    #
    for ($i = 0; $i < @values; $i++) {
        my $strip = strip($values[$i]);
        #debug print STDERR "-- $.:$i:$num[$i]:$len[$i]:$values[$i]:$strip\n";
        if ($len[$i] && (length($strip) != $len[$i])) {
            print STDERR "Line $. field $i - bad length\n";
        }
        if ($num[$i] && ($values[$i] =~ /\D/)) {
            print STDERR "Line $. field $i - not numeric\n";
        }
    }
    print "$_\n";
}
--- test data ----
--- notice comma in field on 2nd line ---
"hi","bh",12,"ih","j","mk",34,"hi","sony",12345
"hi","bh",12,"ih","j","mk",34,"hi,there","sony",12345
"hi","bh",12,"ih","j","mk",34567,"hi","sony",12345
"hi","bh",12,"","j","mk",34,"hi","sony",12345
"hi","bh",12,"","j","mk",34,"hi","",12345
"hi","bh",12,"ih","j","mk",34,"hi","sony",12345
"hi","bh",12,"ih","jxxx","mk",34,"hi","sony",12345
"hi","bh",12,"ih","j","mk",3x,"hi","sony",12345
"hi","bh",12,"ih","j","mk",34,"hi","sony",12345
---- sample run ---
> perl tmp.pl tmp.tmp > tmp.new
Line 3 field 6 - bad length
Line 4 field 3 - missing
Line 5 field 8 - bad length
Line 7 field 4 - bad length
Line 8 field 6 - not numeric
>
> cat tmp.new
"hi","bh",12,"ih","j","mk",34,"hi","sony",12345
"hi","bh",12,"ih","j","mk",34,"hi there","sony",12345
"hi","bh",12,"ih","j","mk",34567,"hi","sony",12345
"hi","bh",12,"","j","mk",34,"hi","sony",12345
"hi","bh",12,"","j","mk",34,"hi","",12345
"hi","bh",12,"ih","j","mk",34,"hi","sony",12345
"hi","bh",12,"ih","jxxx","mk",34,"hi","sony",12345
"hi","bh",12,"ih","j","mk",3x,"hi","sony",12345
"hi","bh",12,"ih","j","mk",34,"hi","sony",12345
10-06-2006 06:50 AM
When checking the field length, do you take the double quotes into account? For example, if a field is "hi", its length is 4 when the quotes are counted and 2 when they are not. Please clarify.
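For illustration, both readings of "length" can be computed side by side. This is only a sketch assuming a POSIX shell and awk; the function name field_lengths and the output format are made up here.

```shell
#!/bin/sh
# field_lengths is a hypothetical helper: for each comma-separated field
# it prints the field, its raw length, and its length without the
# double quotes.
field_lengths() {
    awk -F, -v qu='"' '{
        for (i = 1; i <= NF; i++) {
            bare = $i
            gsub(qu, "", bare)      # drop the double quotes from the copy
            printf("%s raw=%d bare=%d\n", $i, length($i), length(bare))
        }
    }'
}

printf '%s\n' '"hi",12' | field_lengths
# prints:
# "hi" raw=4 bare=2
# 12 raw=2 bare=2
```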
10-06-2006 03:02 PM
10-06-2006 04:57 PM
I have put our validation script in a function. How can I return NR (the record in which validation fails) to that function? When I try to use return $NR outside awk, it does not work. How can I do this?
10-07-2006 07:56 AM
You asked:
>>
...how can i return the NR(record in which validation fails) to that function??? when i try to give return $NR outside awk it is not taking.
<<
I ask: what script are you talking about?
If you combined all the solutions I gave you for your problems, it should look something like this:
awk 'trim_fields_with_commata' /original/file >/tmp/corrected_file
awk -v qu='"' -v len="$len" 'check_structure_of_records' /tmp/corrected_file
If you do not need the temporary file for inspection, just combine the two awk commands with a pipe. The output of the 2nd awk then contains all ill-formatted records.
If you want to process them further, simply redirect this output to another file and deal with this one.
Regards, Peter
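On the question of getting NR back into a shell function, here is one possible approach, sketched under assumptions: awk cannot set shell variables, and a shell return only carries a small exit status, so the awk program prints the NR of each failing record and the caller captures that output with command substitution. The function name failed_records and the simplified rule (every field must be exactly 2 characters long once the quotes are removed) are made up for this illustration; they are not the validation rules from this thread.

```shell
#!/bin/sh
# failed_records is a hypothetical function: it prints the record number
# (awk's NR) of every input line that fails a simplified check.
failed_records() {
    awk -F, -v qu='"' '{
        for (i = 1; i <= NF; i++) {
            f = $i
            gsub(qu, "", f)             # ignore the double quotes
            if (length(f) != 2) {       # simplified rule (assumption)
                print NR                # hand NR back via stdout
                break                   # one report per record is enough
            }
        }
    }' "$@"
}

# The caller captures the record numbers with command substitution.
bad=$(printf '%s\n' '"hi","bh"' '"hi","jxxx"' '"a","bh"' | failed_records)
# $bad is left unquoted on purpose so the numbers end up on one line.
echo "failing records:" $bad            # prints: failing records: 2 3
```

The same function also accepts file names as arguments, so it can be placed at the end of a pipe or pointed at the corrected temporary file.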
10-08-2006 11:35 AM
# myawkscr.sh inputfile
~hope it helps
========================myawkscr.sh========================
#!/usr/bin/sh
awk -F, '{
for (i=1;i<=NF;++i) {
if ($i~/^"[A-Za-z]+$/)
printf("%s ",$i)
else if ($i~/^[A-Za-z]+$/)
printf("%s ",$i)
else if ($i~/^[A-Za-z]+"$/)
printf((i
printf((i
}' $1 | awk -F, '{
vlen = 0
numrc = 0
for (i=1;i<=NF;++i) {
if (i==1 || i==2 || i==4 || i==5 || i==6 || i==8)
if (length($i) != 4)
vlen++
if (i==9)
if (length($i) != 6)
vlen++
if (i==3 || i==7) {
if (length($i) != 2)
vlen++
if ($i!~/^"/)
numrc++
}
if (i==10) {
if (length($i) != 5)
vlen++
if ($i!~/^"/)
numrc++
}
}
if (vlen || !numrc)
printf("line %d [validation fails]: %s\n",NR,$0)
else
}'
10-11-2006 04:46 AM
>>
scripting urgent plzz
<<
This no longer seems to be so urgent :-)
11-06-2006 10:20 PM
len='2 2 2 2 2 2 2 2 4 5'
awk -F, -v qu=#"' len="$len" 'BEGIN {f=split(len,l," ")}
{for(i=1;i<=NF;i++) {
if(length($i)!=l[i]){if(!(match($i,qu) && ((length($i)-2)==l[i])) printf("size mismatch in line %d, field %d\n",NR,i)}
if ((i==3)||(i==7)||(i==10)) if(match($i,"[^0-9]")) printf("line %d field %d contains non-numeric data,%s,\n",NR,i,$i)
}
}' file
The script above does not work when the fields in the file do not start with double quotes. Can you please suggest what can be done?
The input file will be
"ABC","124","dkfjkd","jkdf","45678"
"sony","home,hi","890"
NOTE: a comma can be present inside the double quotes.
11-07-2006 07:58 PM
In one of my replies I told you to correct a typo in the awk syntax. The characters in the assignment to the variable 'qu' at the -v option are singlequote-doublequote-singlequote: '"'.
To make the algorithm clearer, the existence of double quotes is checked first here.
awk -F, -v qu='"' -v len="$len" 'BEGIN {f=split(len,l," ")}
{for(i=1;i<=NF;i++) {
  if(match($i,qu)) len2chk=l[i]+2
  else len2chk=l[i]
  if(length($i)!=len2chk){printf("size mismatch in line %d, field %d\n",NR,i)}
  if ((i==3)||(i==7)||(i==10)) if(match($i,"[^0-9]")) printf("line %d field %d contains non-numeric data,%s,\n",NR,i,$i)
}
}' file
Looking at these two lines of your example input data
"ABC","124","dkfjkd","jkdf","45678"
"sony","home,hi","890"
I advise you to reread one of my previous replies in this thread:
You MUST consolidate your data first so that it fulfils this condition: NO additional commas inside the fields that are delimited by double quotes.
You can use a replacement character that is not used elsewhere (e.g. ^) and substitute it back later.
But for the above awk solution, and this is true for ALL possible solutions as I stated earlier, the data must be clean.
Regards, Peter
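The consolidation step described above could look roughly like the following sketch. It assumes a POSIX shell and awk, balanced double quotes on every line, and that ^ never occurs in the data; the function names protect_commas and restore_commas are made up for this illustration.

```shell
#!/bin/sh
# protect_commas (hypothetical name): replace every comma that sits inside
# double quotes by ^ so the record can then be split on commas safely.
protect_commas() {
    awk -F'"' -v OFS='"' '{
        # With " as the field separator, the even-numbered pieces are the
        # quoted parts of the record.
        for (i = 2; i <= NF; i += 2) gsub(/,/, "^", $i)
        print
    }'
}

# restore_commas (hypothetical name): turn the placeholder back into a comma
# once the validation is done.
restore_commas() {
    tr '^' ','
}

printf '%s\n' '"sony","home,hi","890"' | protect_commas
# prints: "sony","home^hi","890"
printf '%s\n' '"sony","home,hi","890"' | protect_commas | restore_commas
# prints: "sony","home,hi","890"
```

Between the two steps, a comma-based check such as the awk script above sees exactly one field per quoted string.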