Operating System - HP-UX
1833189 Members
2945 Online
110051 Solutions
New Discussion

Re: reformattings strings in a CSV file

 
SOLVED
Go to solution
rmueller58
Valued Contributor

reformattings strings in a CSV file

I have a flat file and need to reprocess it before passing it along to an application provider..


Requirements:
#Column #1 APPEND WF89 to front of existing string in #1
#Column #3 If Value = 8900003 then set value 177
#Column #3 if Value = 89000032 then set value 178
#Column #3 if Value = 0 then set value 177
#Column #4 (need more info)
#Column #5 EQUAL Column #1 (post process value)

I've figured out most items except Column #5

the value of which needs to be EQUAL to Column #5..

My raw data file looks as such:
2,2,890003,,,2,Patty,Peppermint,,f,,ppepper@DOMAINNAME.TLD,
3,3,8900032,,,3,Lucy,VanPelt,1/20/60,f,,lvanpelt@DOMAINNAME.TLD,
51,51,8900032,,,51,Charlie,Brown,2/21/00,M,,cbrown@DOMAINNAME.TLD,
56,56,8900032,,,56,Pig,Pen,1/18/56,M,,ppen@DOMAINNAME.TLD,



My script thus far handles all but Column #5

# Column #1 Formatting
## ftc_staffrex.csv RAW FILE
for record in `cat ftc_staffrex.csv`; do echo WF89$record >>stafftmp.csv done
M
### stafftmp.csv Post Column #1 Process ###
cp -f stafftmp.csv ftc_staffrex.csv

###
#COLUMN 2 and 3 Formatting
perl -pi -e s/"890003"/177/g ftc_staffrex.csv
perl -pi -e s/1772/178/g ftc_staffrex.csv
perl -pi -e s/",0,"/177/g ftc_staffrex.csv


I believe I will need to use AWK or PERL to get the positional $5 and to a s/$5/$1/g I am just not quite sure the best practice and am looking for suggested ways to handle it. Could it be handled in the same line as Column #1 processing?

Ideas and thought appreciated..


20 REPLIES 20
James R. Ferguson
Acclaimed Contributor

Re: reformattings strings in a CSV file

Hi:

Are you counting columns one-relative? (it would appear).

What do you want, field-1 (one relative) to assume the falue of field-5 (one relative) or the opposite???

Regards!

...JRF...
rmueller58
Valued Contributor

Re: reformattings strings in a CSV file

James COLUMN #5 = #1 in EACH record.
rmueller58
Valued Contributor

Re: reformattings strings in a CSV file

In the example:
2,2,890003,,,2,Patty,Peppermint,,f,,ppepper@DOMAINNAME.TLD,
3,3,8900032,,,3,Lucy,VanPelt,1/20/60,f,,lvanpelt@DOMAINNAME.TLD,

BECOMES:
WF892,2,177,,,WF892,Patty,Peppermint,,f,,ppepper@DOMAINNAME.TLD,
WF893,3,178,,,WF893,Lucy,VanPelt,1/20/60,f,,lvanpelt@DOMAINNAME.TLD,

James R. Ferguson
Acclaimed Contributor
Solution

Re: reformattings strings in a CSV file

Hi (again):

OK, so you want field-5 to be set to field-1.

# perl -pi -e '@F=split /,/;print join ",", @F[0..4],$F[0],@F[6..@F-1]' file

That said, all of your substitutions could be done in one pass by amending them before the 'split()' in the above. I suggest you better anchor your patterns (perhaps by specifying where the numbers have commas).

Regards!

...JRF...
James R. Ferguson
Acclaimed Contributor

Re: reformattings strings in a CSV file

Hi (again):

Oops. that should be:

# perl -ni -e '@F=split /,/;print join ",", @F[0..4],$F[0],@F[6..@F-1]'

...changing the '-p' to '-n' since we are explicitly printing. You don't want both the before and after image :-)

Looking again at what you are doing, this could all be done in _one_ tiny Perl script like this:

# cat ./reformat
#!/usr/bin/perl -ni.old
use strict;
use warnings;
s/,890003,/177/g;
s/,1772,/178/g;
s/",0,"/177/g;
my @F=split /,/;print join ",", "WF89", @F[0..4],$F[0],@F[6..@F-1];
1;

You may need to change the simple substitution patterns as I first indicated. The script retains a backup file suffixed with '.old' and leaves you with a modified file, in-place.

Run as:

# ./reformat file

Regards!

...JRF...
rmueller58
Valued Contributor

Re: reformattings strings in a CSV file

James, I always appreciate your insight. this fit the bill. My grasp of regex is sophomoric at best.
rmueller58
Valued Contributor

Re: reformattings strings in a CSV file

James (or who ever)

I need to insert a static value in Column #4

I have a value or "FACULTY" to insert into this file in the currently black Column #4 Any ideas on how to handle it?

This is the script that works for every thing else:
#!/bin/bash
sed s/" "/""/g TLCftc_staff.csv > ftc_staffrex.csv
# Column #1 Formatting
for record in `cat ftc_staffrex.csv`
do
echo WF89$record >>stafftmp.csv
done
cp -f stafftmp.csv ftc_staffrex.csv
#COLUMN 2 and 3 Formatting
perl -pi -e s/",890003,"/",177,"/g ftc_staffrex.csv
perl -pi -e s/",8900032,"/",178,"/g ftc_staffrex.csv
#perl -pi -e s/1772/178/g ftc_staffrex.csv
perl -pi -e s/",0,"/",177,"/g ftc_staffrex.csv


# CODE Column #5
perl -ni -e '@F=split /,/;print join ",", @F[0..4],$F[0],@F[6..@F-1]' ftc_staffrex.csv


James R. Ferguson
Acclaimed Contributor

Re: reformattings strings in a CSV file

Hi Rex:

> I need to insert a static value in Column #4

Simply change:

my @F=split /,/;print join ",", "WF89", @F[0..4],$F[0],@F[6..@F-1];

to:

my @F=split /,/;print join ",", "WF89", @F[0..3],"FACULTY",$F[0],@F[6..@F-1];

By the way, instead of running separate Perl scripts to do substitutions, stack them, seperated by semicolons!

perl -pi -e s/"890003"/177/g;s/1772/178/g;s/",0,"/177/g' ftc_staffrex.csv

...or simply use the script I offered.

Regards!

...JRF...

Hein van den Heuvel
Honored Contributor

Re: reformattings strings in a CSV file


Just write yourself a single simple perl script to implement all rules, current and future.

- loop though input
- split into columns
- deal with each column
- re-assemble columns
- print

----------------- process_columns.pl ------

while (<>) {
@columns = split /,/;
$columns[0] = 'WF89' . $columns[0];
$columns[2] = 177 if $columns[2] == 8900003 || $columns[2] == 0;
$columns[2] = 178 if $columns[2] == 89000032;
$columns[3] = 'FACULTY';
$columns[4] = $columns[0];
print join ',', @columns;


If you insists on a one-liner:

$ perl -i -F/,/ -lape '$F[0]="WF89$F[0]"; $F[2]=177 if $F[2]=8900003 || !$F[2]; $F[2]=178 if $F[2]
=89000032;$F[3]="FACULTY";$F[4]=$F[0]; $_=join ",",@F' your-file

Hein.

rmueller58
Valued Contributor

Re: reformattings strings in a CSV file

Thanks you both.. I humbled as always..

James, I appreciate the one liners.. I am very ignorant as to the step through of what is occuring in the command.. My perl and regex knowledge is very limited..

I tend to break scripts down into steps, which when dealing with the record field by field can add a lot of overhead..

I wanted to get the original idea in place 1st, then I will use James's or Hein's script to improve it.
James R. Ferguson
Acclaimed Contributor

Re: reformattings strings in a CSV file

Hi Rex:

Hein's script shows a much safer approach to the substitutions you want than mine (which really only addressed the fourth and fifth field updates that comprised you post).

Since we have split the record into its component fields, instead of using regular expressions in matching and substitution, it is far faster and more rigorous to compare string equality.

Regards!

...JRF...
H.Merijn Brand (procura
Honored Contributor

Re: reformattings strings in a CSV file

Just to add: *NEVER* use split on real CSV file data. What would it do if fields actually contain a comma? Or newlines?

This is valid CSV:

1,,"","Hello, world",100.2,"This
line continues on the next",12

That line has 7 fields, and is extremely hard to parse with split and <>

Use Text::CSV and install - if you can - Text::CSV_XS for speed.

use strict;
use warnings;
use Text::CSV_XS;
my $csv = Text::CSV_XS->new ({ binary => 1});
open my $fh, "<", "file.csv" or fie "file.csv: $!";
while (my $row = $csv->getline ($fh)) {
$row->[4] or next; # Field 5 should be true
# do some more with @$row
}
close $fh;

Enjoy, Have FUN! H.Merijn [ who chimed in too late for points ]
Enjoy, Have FUN! H.Merijn
rmueller58
Valued Contributor

Re: reformattings strings in a CSV file

Merlin,

the file is actually "comma" separated.. No quotes.
H.Merijn Brand (procura
Honored Contributor

Re: reformattings strings in a CSV file

I know. But is a general remark.

Once people were comfortable with using perl/split to handle CSV data that was ok to deal with this way, they tend to use the same way again when confronted with a similar problem.

That is where it goes pearshape.

Start learning Text::CSV now and use it the way it was intended and you will use it without problems and great joy ever on. I promise!.

Enjoy, have FUN! H.Merijn
Enjoy, Have FUN! H.Merijn
rmueller58
Valued Contributor

Re: reformattings strings in a CSV file

James there is ONE field I need to revisit on this..

the Field is now being populated with information.. I need to retain the value of the field for #5 and append a "W89" to the front of the value in the field.

What would need to do to the values in the PERL lines?

perl -ni -e '@F=split /,/;print join ",", @F[0..2],"STUDENT", $F[0],@F[6..@F-1]' ftc_studentrex.csv
perl -ni -e '@F=split /,/;print join ",", @F[0..3],$F[0],@F[6..@F-1]' ftc_studentrex.csv
rmueller58
Valued Contributor

Re: reformattings strings in a CSV file

I am going to try a re-write with hein's code.
rmueller58
Valued Contributor

Re: reformattings strings in a CSV file

Hein,

In your code where / how would you read in and output file

Would you do a shell script for / while loop?

for record in `cat filename.txt`
do
process.pl
done
James R. Ferguson
Acclaimed Contributor

Re: reformattings strings in a CSV file

Hi:

> James there is ONE field I need to revisit on this..the Field is now being populated with information.. I need to retain the value of the field for #5 and append a "W89" to the front of the value in the field.

As we agreed, Hein's solution is cleaner than my original one. Counting 0-relative as Perl does, change:

$columns[4] = $columns[0];

To:

$columns[4] = 'W89' . $columns[4];

The dot is a concatenation operator, so updates the fifth field with 'W89" ahead of its old contents.

Regards!

...JRF...
rmueller58
Valued Contributor

Re: reformattings strings in a CSV file

James,

I am finally getting it.. Thanks Again..
rmueller58
Valued Contributor

Re: reformattings strings in a CSV file

I used hein's one liner..

Thanks again to all.