Operating System - HP-UX

Zip files 1.7 Million files

 
ssheri
Advisor

Hi,

I have a filesystem containing 1.7 million files, dating back to Jan 2005. I need to gzip all files up to Dec 2008, on a yearly basis, i.e. one zip file for 2005, one zip file for 2006, and the same for the rest of the years.

Can anyone help me with the commands for this task?

Your help is much appreciated.
OldSchool
Honored Contributor

Re: Zip files 1.7 Million files

do the filenames have the date in them somewhere? or are you relying on the datestamps? or something else entirely?

Suraj K Sankari
Honored Contributor

Re: Zip files 1.7 Million files

Hi,
First make a tar file, then compress it with the compress or gzip utility.

tar -cvf 2005.tar /directory_name
gzip 2005.tar

Suraj
Michael Steele_2
Honored Contributor

Re: Zip files 1.7 Million files

cd dir
find . -atime 360 -exec ll {} \;

Verify your selection by listing everything captured by find

When ready

find . -atime 360 -exec gzip {} \;

This is for one year. 720 for two years, etc.

I'd also suggest using 'tar' after you gzip else you'll run out of space fast. Real, real fast. In fact, having another dir to work with would be good.

find . -name '*.gz' | pax -w -x ustar -f backup.tar
Support Fatherhood - Stop Family Law
OldSchool
Honored Contributor

Re: Zip files 1.7 Million files

"tar -cvf 2005.tar /directiory_name
gzip 2005.tar"

that assumes the OP had the files already segregated into directories by year, which may or may not be the case.


"find . -atime 360 -exec gzip {} \;"

is probably closer to what the OP wants, but will result in one zip file for each original file found...which may be what they're after.

or you could take the above "find" and "mv" the file to a separate directory, then gzip each, then tar the results....or mv the file, tar the directory and gzip *that*.

ssheri needs to remember that there is no "create date" stored in unix filesystems. M. Steele is going after the "access time", which may be a good bet. See the "man" page for "find", in particular the "-atime", "-mtime" and "-ctime" options, to see which best fits.

Another option would be to create two reference files with appropriate dates, and use "-newer" together with "! -newer" (find has no "-older" option) to sort out what you want.

All of the above is why I originally asked if the date was somehow "buried" in the filename.

some additional information about the original data layout, and the desired results might help in providing more appropriate responses.
Steven Schweda
Honored Contributor

Re: Zip files 1.7 Million files

> Another option would be to create two
> reference files [...]

This seems like a better scheme than any of
the "-time" options. Especially if
you're not running the job at 00:00 on 1
January. "-atime" would seem to be the
least likely to get the desired result
(unless no one ever looks at these files).

> or you could take the above "find" and
> "mv" the file [...]

I'd vote for moving them to year-specific
directories that way, and then doing
something like:

tar cf - year_2005_dir | \
gzip -c > year_2005_dir.tar.gz

Creating an actual "tar" archive file, and
_then_ hitting it with gzip tends to require
more disk space, at least temporarily.
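
A minimal sketch of checking such an archive without unpacking it to disk. The scratch directory and sample file below are made up purely for illustration; only the tar/gzip pipeline itself comes from the post above.

```shell
# Build a throwaway directory (hypothetical names) and archive it through a pipe.
work=${TMPDIR:-/tmp}/tgz.$$
mkdir -p "$work/year_2005_dir" && cd "$work" || exit 1
echo sample > year_2005_dir/file1.dat

# Same pipeline as above: no intermediate .tar file is written to disk.
tar cf - year_2005_dir | gzip -c > year_2005_dir.tar.gz

# Test the compressed stream, then list the archive contents without extracting.
gzip -t year_2005_dir.tar.gz && echo "gzip stream OK"
listing=$(gzip -dc year_2005_dir.tar.gz | tar tf -)
echo "$listing"
```

Checking the listing before removing any originals seems prudent with this many files.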

> find . -atime 360 -exec gzip {} \;
>
> This is for one year. 720 for two years,
> etc.

Around here, years are longer than 360 days.
Which calendar do you use? (And which does
"find" use?)
Viktor Balogh
Honored Contributor

Re: Zip files 1.7 Million files

If I wanted to separate the files based explicitly on the year, I would go this way to create a file list:

# find . -type f -exec ll {} + | awk '$8 == "2007"' | tee list_2007

This lists exactly the files from 2007 (1 Jan to 31 Dec) and also descends into subdirectories. Each line of the list is full ll output, so the filename is in the last column. After that you could feed the names to gzip/tar or whatever you want...

****
Unix operates with beer.
OldSchool
Honored Contributor

Re: Zip files 1.7 Million files

lots of options presented.....still waiting for "ssheri" to shed some light on the original directory layout and the desired output.

from what was originally stated, it could well be that the OP wants a gzip file for a given year that contains all the files for that year (as opposed to zipping a tar of those files).

If so, I don't think that option has been covered yet, and it might be a pain to implement.
ssheri
Advisor

Re: Zip files 1.7 Million files

Hi All,
Thanks for your quick responses. I hope I can explain my requirement in more detail.
=======================================

I have a filesystem which contains 1.7 million files. The files date from 2005 up to today. My requirement is to tar and zip the files for each year separately, i.e. one tar/zip file each for 2005, 2006, 2007 and 2008. The files can be identified by their timestamps, and there are no separate directories for each year. All files reside in a single directory.
======================================
OldSchool
Honored Contributor
Solution

Re: Zip files 1.7 Million files

"I have a filesystem which contains 1.7 million files. File are there since 2005 till today. My requirement is to tar and zip the files for each year separately. ie one tar/zip file for 2005, 2006,2007 and 2008. The files can be identified by their time stamp and there are no separate directories for each year. All files are residing on a single directory."

Ok, this could get ugly. Making the assumption that the files will be removed after archiving, then something like the following can be modified to work:

First, you need to realize that UNIX doesn't track a file timestamp for "creation time". It knows the following:

atime (File Access Time)
Access time shows the last time the data from a file was accessed - read by one of the Unix processes directly or through commands and scripts.

ctime (File Change Time)
ctime changes when you change a file's ownership or access permissions. It also changes whenever the file's contents are updated.

mtime (File Modify Time)
Last modification time shows time of the last change to file's contents. It does not change with owner or permission changes, and is therefore used for tracking the actual changes to data of the file itself.

So...which one you look at depends on what you want. IF you can guarantee that the contents of the file, once written, were never modified, then the mtime option of find should be ok. Access time is useless for this if the file has ever been read after writing. Ctime *might* work.

If none of the above apply, then you're toast, as you've no way to locate files written in 2005.
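
To see which timestamp looks usable, something like this rough sketch may help. It uses a scratch file with a made-up name rather than your real data; the ls options shown (-l, -lu, -lc) are the standard ones for mtime, atime and ctime.

```shell
# Scratch demo (hypothetical file name): show the three timestamps of one file.
demo=${TMPDIR:-/tmp}/stamps.$$
mkdir "$demo" && cd "$demo" || exit 1
echo data > sample.txt

# Backdate only the modification time to mid-2005.
touch -m -t 200506151200.00 sample.txt

ls -l  sample.txt    # mtime: shows the 2005 date set above
ls -lu sample.txt    # atime: last access
ls -lc sample.txt    # ctime: last inode change (the touch itself counts)

mtime_line=$(ls -l sample.txt)
```

Running the three ls commands against a handful of your real files should show quickly whether mtime still reflects the year each file was written.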

Let us say that mtime is workable in your case, and you are going to find those files in year 2005. I'd create two reference files representing the upper and lower limits of the times you wish to locate:

touch -a -m -t 200501010000.00 $HOME/first.ref
touch -a -m -t 200512312359.59 $HOME/last.ref

This should get everything between 01/01/2005 at 00:00 and 12/31/2005 at 23:59 and 59 seconds.


Then use find to locate the relevant files and move them to a directory by themselves:

mkdir /yourname/2005
cd /where_files_are

find . -xdev -type f -newer $HOME/first.ref -a ! -newer $HOME/last.ref -exec mv {} /yourname/2005/. \;

At that point, you should be able to tar the newly created directory and pipe that to gzip, as noted in one of the posts above.

Note that the above has not been tested; you might want to substitute something harmless, like ls, for the move until you get it sorted out.
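
A sketch of such a dry run, using -print instead of mv. The scratch directory and the three sample files are invented for the demo; on the real system you would cd to the real directory and keep the ref files wherever you created them.

```shell
# Scratch setup (hypothetical paths): one file before, inside, and after 2005.
work=${TMPDIR:-/tmp}/dryrun.$$
mkdir -p "$work/files" || exit 1
touch -m -t 200412311200.00 "$work/files/old.dat"
touch -m -t 200506151200.00 "$work/files/mid.dat"
touch -m -t 200601011200.00 "$work/files/new.dat"

# Reference files marking the start and end of 2005.
touch -a -m -t 200501010000.00 "$work/first.ref"
touch -a -m -t 200512312359.59 "$work/last.ref"

# Dry run: print what would be moved, and count it, but change nothing.
cd "$work/files" || exit 1
matches=$(find . -xdev -type f -newer "$work/first.ref" ! -newer "$work/last.ref" -print)
echo "$matches"
echo "$matches" | wc -l
```

One thing to keep in mind: -newer is a strict comparison, so a file stamped exactly 00:00:00 on 1 January would not match.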

Repeat the above for each year, after adjusting the timestamps on the ref files and creating the required directories.
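
Put together, the per-year repetition could be sketched as a loop like the one below. Like the rest of this post it is a sketch, not something run against your data: it assumes mtime is the right timestamp, and the src/archive directories and sample files are hypothetical stand-ins created in a scratch area.

```shell
# Scratch setup (hypothetical layout): src holds the files, archive gets the results.
base=${TMPDIR:-/tmp}/byyear.$$
src=$base/src
dest=$base/archive
mkdir -p "$src" "$dest" || exit 1
touch -m -t 200503011200.00 "$src/a.dat"
touch -m -t 200607011200.00 "$src/b.dat"
touch -m -t 200711011200.00 "$src/c.dat"
touch -m -t 200802011200.00 "$src/d.dat"

for year in 2005 2006 2007 2008
do
    # Reset the reference files to the first and last second of this year.
    touch -a -m -t "${year}01010000.00" "$base/first.ref"
    touch -a -m -t "${year}12312359.59" "$base/last.ref"
    mkdir -p "$dest/$year"

    # Move this year's files into their own directory.
    ( cd "$src" &&
      find . -xdev -type f -newer "$base/first.ref" ! -newer "$base/last.ref" \
           -exec mv {} "$dest/$year/" \; )

    # Archive and compress in one pipeline, so no bare .tar file hits the disk.
    ( cd "$dest" && tar cf - "$year" | gzip -c > "$year.tar.gz" )
done
ls "$dest"
```

With 1.7 million files, one mv per file will be slow, but it keeps the command line short. The year directories are left in place here so each archive can be checked before anything is removed.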