Operating System - HP-UX

Appending space at end of each line

 
MAYIANAN
Advisor

Appending space at end of each line

Hi all,

I am getting a flat file with uneven line lengths. I need to format the lines to equal length by appending spaces at the end of each line, without creating a new file. Please help me out.




7 REPLIES
Dennis Handly
Acclaimed Contributor

Re: Appending space at end of each line

>I am getting a flat file with uneven line size.

This is how UNIX works: variable-length records, separated by newlines.

This creates a new file, which you can rename back to the original if you want.
$ awk -v LEN=40 '
{
    if (length($0) < LEN)
        printf "%-" LEN "s\n", $0    # pad to LEN chars
    else
        print $0
}' file

This pads short lines to 40 chars.

You could use dd(1):
$ dd conv=block cbs=40 if=file of=file.new

Here there are no newlines on the end.
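If a new file is acceptable after all, the awk approach can be followed by a rename to get the padded data back under the original name. A sketch with a made-up sample file; the temporary copy still needs as much free space as the padded result:

```shell
# Sketch: pad every line of "file" to 40 characters, then rename the
# padded copy over the original.  "file" is a sample name; the copy
# temporarily needs as much free space as the padded result.
printf 'short\na somewhat longer line\n' > file
awk -v LEN=40 '{ printf "%-" LEN "s\n", $0 }' file > file.new
mv file.new file
awk '{ print length($0) }' file    # both lines are now 40 characters
```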
MAYIANAN
Advisor

Re: Appending space at end of each line

The flat file which we are getting is 9 GB, so we cannot create a new file for it because of a lack of server space. We are looking for a solution which will do the append operation in the existing file.
Dennis Handly
Acclaimed Contributor

Re: Appending space at end of each line

>the flat file which we are getting is 9 GB, so we cannot create a new file for it because of a lack of server space.

Adding spaces on the end of the lines of the 9 GB file will also make it bigger.

>We are looking for a solution which will do the append operation in existing file.

You are not appending, you are inserting spaces in the middle which will move every byte down.

You could write a 64-bit application to just mmap the file: first make a pre-pass to find out how many spaces to insert, then do the insertion from the end to the beginning. Of course this will need 9 GB of swap.

Why do you need to do this on such a large file? A file this big is useless, no human will read it.

You could take my awk script and change it so it will print out how many bytes need to be added. What is the fixed length you want to pad to? Do you want the newline on the end of each line?

Otherwise you would have to break the 9 GB file into many smaller bite-sized pieces.

awk -v LEN=80 '
{
    if (length($0) < LEN)
        spaces += LEN - length($0)
}
END { printf "Need to add %d blanks\n", spaces }' giant-file

Fortunately it does appear that awk supports large files.
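To help choose that fixed length in the first place, a single awk pass can report the longest line in the file. A sketch on a small sample standing in for the real 9 GB file:

```shell
# Sketch: report the longest line length, to help choose LEN.
# "sample" stands in for the real giant file.
printf 'ab\nabcdef\nabcd\n' > sample
awk '{ if (length($0) > max) max = length($0) } END { print max }' sample    # prints 6
```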
Dennis Handly
Acclaimed Contributor

Re: Appending space at end of each line

You can of course use a pipe to enable you to process bite-sized pieces. Of course you couldn't write it back on top of itself.
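A sketch of that pipe idea, with wc -c standing in for whatever consumer needs the fixed-length records; nothing padded is ever written to disk:

```shell
# Sketch: pad records on the fly in a pipe instead of on disk.
# "wc -c" stands in for the real fixed-record consumer.
printf 'a\nbb\n' |
    awk -v LEN=10 '{ printf "%-" LEN "s\n", $0 }' |
    wc -c    # 2 records x (10 chars + newline) = 22 bytes
```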
James R. Ferguson
Acclaimed Contributor

Re: Appending space at end of each line

Hi:

> I am getting a flat file with uneven line size.

I suspect that you are accustomed to mainframe files with fixed record sizes where fields are mapped in terms of offsets from the beginning of each record (line).

Unix files are byte streams with the newline character (\012 or 0x0a) delimiting a line or record boundary.

Fields are delimited by some convenient character (a space, tab, colon, comma) that allows splitting a line into its component fields based on the appearance of the field delimiter. It doesn't matter if a field (or its delimiter) is of consistent size.

Hence, padding your file with trailing spaces is needless. If the file is going to be moved from a UNIX server to a mainframe, look at the various options of FTP. You should be able to create fixed-size records as your file is copied to the mainframe from the UNIX server.
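For example, assuming the receiving side is a z/OS FTP server that honors SITE commands (check with your mainframe group; the host and dataset names here are made up):

```shell
ftp mainframe.example.com
ftp> quote SITE RECFM=FB LRECL=80
ftp> put bigfile 'PROD.DATA.BIGFILE'
```

The SITE RECFM/LRECL settings make the server build fixed-length 80-byte records as the data arrives, so no padding is needed on the UNIX side.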

Regards!

...JRF...
Hein van den Heuvel
Honored Contributor

Re: Appending space at end of each line

>> need to format the line to equal length by appending spaces at the end of each line without creating a new file

As others indicated, this cannot be done without heroic efforts.

Ask yourself WHY spaces need to be added.
That will lead towards the right answer.

The answer could be a message to management that the task can not be done unless/until more disk space is made available.

>> lack of server space.

NOT a reasonable way to run a business!
What is the price of not solving the problem vs the price of storage?

Can you roll it out to tape and back in?

Anyway... Back to the WHY.

I can see a few reasons. The most likely reason is that the file will be input to a process which requires fixed length records.
If so, then just add the spaces while feeding the data through a pipe:

Instead of
# process-fixed input.fixed
Use:
# awk '{printf ("%-40s\n",$0)}' input.variable | process-fixed

Another reason is a need to FTP to another system where it is expected / desired to be in a certain format. Well... just tell management to put the onus on the other system. Surely they can do the conversion if it is a system / team worth its salt!

Finally, there might be an application on your system which would like to do random access to the large files using 'known' record numbers and lseeks. Ouch. You may want to create a look-aside list mapping record numbers to lseek offsets for that work.
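A sketch of building such a look-aside index with awk, on sample data; offsets assume one newline terminator per record:

```shell
# Sketch: map record number -> byte offset of that record's first byte.
# "records" is a sample file; real code would run this once over the
# giant file and keep the index around for later lseek()s.
printf 'aa\nbbbb\nc\n' > records
awk '{ print NR, off + 0; off += length($0) + 1 }' records > records.idx
cat records.idx    # 1 0 / 2 3 / 3 8
```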

The 'heroic' solution, not requiring a full copy of the data? Read through the whole file, leaving breadcrumbs as you go. The distance between breadcrumbs would be as many records as you can reasonably remember in an array, say 10,000. So ftell for offset values every 10,000 records.
As you reach the end, step back to the last 10,000-record boundary. Read all records from there to the end into the array. Multiply that starting record number by the target record size + 1 for the per-record terminator. Now fseek to that new byte address and start writing the expanded records from the array.
Step back to the breadcrumb corresponding to the chunk of records before the current batch, fseek back and read those 10,000 records, calculate and fseek to their new start, and write that batch. Repeat until back at the first record. Yikes!

Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting

Dennis Handly
Acclaimed Contributor

Re: Appending space at end of each line

>Hein: there might be an application on your system which would like to do random access to the large files using 'known' record numbers and lseeks.

Right, I worked on such a system and it and UNIX's variable length records were "sucky" for random access.

Then someone added the starting record # in dead space at the end of each block so you could binary search to find the block then read forward. Unfortunately this was for system spool files.

gencat(1) sets up an index.