Operating System - HP-UX

Dave La Mar
Honored Contributor

find command with gzip

The command below, whether run from a script or from the command line, fails to zip all the eligible files found.

find /path_to/some_directory -mtime +30 -a ! -name "*.gz" -exec /usr/contrib/bin/gzip {} \;

To set the tone for my post:
1. I do not need an alternative to what the above is expected to perform.
2. I realize there are a host of other ways to produce the results the command is "supposed" to perform.
3. I am utilizing another solution.

With the above in mind, my post is simply:

1. Why would one need to run the posted command multiple times to get all eligible files zipped?
2. It appears to be similar to a buffer overrun where the find command produces results faster
than gzip can perform.

Has anyone a "qualified" explanation for this issue?

Thanks in advance for that "explanation".

Best regards,

dl
"I'm not dumb. I just have a command of thoroughly useless information."
17 REPLIES
Kevin Wright
Honored Contributor

Re: find command with gzip

2) It appears to be similar to a buffer overrun where the find command produces results faster than gzip can perform.

I would agree with this... does the command work using xargs?
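For what it's worth, a minimal self-contained sketch of the xargs variant Kevin suggests, with plain `gzip` standing in for /usr/contrib/bin/gzip and a back-dated temp file standing in for the real directory:

```shell
#!/bin/sh
dir=$(mktemp -d)
# Create a file and back-date it past the 30-day window
# (touch -t format: [[CC]YY]MMDDhhmm).
touch -t 202001010000 "$dir/old.log"
# Pipe find's matches to xargs, which batches them into as few
# gzip invocations as possible:
find "$dir" -type f -mtime +30 ! -name '*.gz' -print | xargs gzip
```

Note that plain xargs splits its input on whitespace, so this form is only safe when the filenames contain no spaces or newlines.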
John Dvorchak
Honored Contributor

Re: find command with gzip

I wonder if your assumption #2 isn't correct. From the man page:

find pathname -type d -print | xargs chmod 555

Note that output from find was piped to xargs(1) instead of using
the -exec primary. This is because when a large number of files
or directories is to be processed by a single command, the -exec
primary spawns a separate process for each file or directory,
whereas xargs collects file names or directory names into
multiple arguments to a single chmod command, resulting in fewer
processes and greater system efficiency. The + delimiter for the
-exec primary can be used to achieve the same efficiency.

Since it is doing all of this forking, I wonder if you are exceeding your max_procs.
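To illustrate the "+" delimiter the man page excerpt mentions, here is a self-contained sketch (plain `gzip` and back-dated temp files stand in for the real environment):

```shell
#!/bin/sh
dir=$(mktemp -d)
# Two files back-dated well past the 30-day window.
touch -t 202001010000 "$dir/a.log" "$dir/b.log"
# The "+" terminator makes find batch many pathnames into a single
# gzip invocation instead of forking one process per file:
find "$dir" -type f -mtime +30 ! -name '*.gz' -exec gzip {} +
```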
If it has wheels or a skirt, you can't afford it.
Pete Randall
Outstanding Contributor

Re: find command with gzip

Dave,

I really like John's thinking here and I think you could easily prove his case by observing Glance's System Tables Report while running your find command.


Pete
Todd McDaniel_1
Honored Contributor

Re: find command with gzip

I'm kind of wondering if it would be better in a script that first identifies the files, then performs the zip in a for loop.

find /path_to/some_directory -mtime +30 ! -name '*.gz' > file.list
for name in `cat file.list`
do
/usr/contrib/bin/gzip "$name"
echo "$name is zipped, proceeding to next file."
done


Can't remember if single or double quotes work... I think it is double quotes with a variable name...
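As a sketch of the same idea with the word-splitting pitfall avoided: a `while read` loop handles names containing spaces, which the backtick-`cat` form would split (temp files are stand-ins here, and plain `gzip` replaces /usr/contrib/bin/gzip):

```shell
#!/bin/sh
dir=$(mktemp -d)
touch -t 202001010000 "$dir/report one.log"
# file.list itself is excluded by -mtime +30, since it was just created.
find "$dir" -type f -mtime +30 ! -name '*.gz' -print > "$dir/file.list"
# read -r keeps backslashes literal; quoting "$name" preserves spaces.
while read -r name; do
    gzip "$name" && echo "$name is zipped, proceeding to next file."
done < "$dir/file.list"
```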
Unix, the other white meat.
Dave La Mar
Honored Contributor

Re: find command with gzip

Kevin -
Yeah, I'm leaning toward #2, just wondered if it could be quantified. Not interested in trying another way, since this is nearly identical to the man page example using -exec rm; just looking for a "why not this?".

There was a typo in the command I presented; it is precisely:
find /path_to/some_directory \( -mtime +30 -a ! -name '*.gz' \) -exec /usr/contrib/bin/gzip {} \;

John & Pete-
Not a blip in glance when I ran the command, not even close to limits.

Todd -
Sorry, not relevant to the post.

Still frustrated.

dl
Todd McDaniel_1
Honored Contributor

Re: find command with gzip

No disrespect... but I think my example is very relevant.

I can understand you wanting to do it as the manpage shows, but how can you say my example is irrelevant?

The problem with the gzip command is that you have a variable amount of time for the Gzip to finish...

The find command is forcing its output into the gzip command, which can only handle one file at a time. Using -exec rm is totally different unless the file is extremely large: rm will remove a file almost immediately, with little or no backlog, whereas gzip will take much longer in comparison.

You must find a way to hold off the data flow from the find command and allow the gzip to complete its work. My example does that... unless it has a syntax error.


Dave La Mar
Honored Contributor

Re: find command with gzip

Todd -
Whoa, certainly no disrespect intended on my part either. While your second post is more relevant to the content of my post, I am really looking for "the why answer", and yours is a fair conclusion.
And I thank you.

Best regards,

dl
>snip

To set the tone for my post:
1. I do not need an alternative to what the above is expected to perform.
2. I realize there are a host of other ways to produce the results the command is "supposed" to perform.
3. I am utilizing another solution.

With the above in mind, my post is simply:

1. Why would one need to run the posted command multiple times to get all eligible files zipped?
2. It appears to be similar to a buffer overrun where the find command produces results faster
than gzip can perform.

Has anyone a "qualified" explanation for this issue?
Patrick Wallek
Honored Contributor

Re: find command with gzip

Something else to keep in mind regarding find with the -mtime option is that the search depends not only on the date, but on the time. If you have files in this directory that are being created all the time, seconds or minutes apart, then you could very well get different results every time you run the command.

You could have a file there that was created 29 days, 23 hours, 55 minutes and 16 seconds ago that won't get picked up the first time you run the command, but if you wait 5 minutes, it will, since it is then more than 30 days old.
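Patrick's point can be seen in a small demo: -mtime +30 counts whole 24-hour periods from the file's modification time, so a fresh file is skipped while a genuinely old one matches (temp files only; the back-dated timestamp is arbitrary):

```shell
#!/bin/sh
dir=$(mktemp -d)
touch "$dir/fresh.log"                    # mtime = now, not matched
touch -t 202001010000 "$dir/ancient.log"  # far past the 30-day line
# Only the file older than 30 full days is printed:
find "$dir" -type f -mtime +30
```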
Dave La Mar
Honored Contributor

Re: find command with gzip

Patrick -
Good point. One that I did not consider, but after testing I find this is not the case.

Regards,

dl
Todd McDaniel_1
Honored Contributor

Re: find command with gzip

Dave,

hehe... everything is cool.

Just trying to keep an even keel here. I know I can come across rather strong sometimes in text, when you can't see my true intent.

I understand trying to solve "why". I am very much like that as well... Even if I found another answer that worked, I would still like to know why the first one can't work, if it doesn't.

Keep plugging away, and post your solution if you find one that will work with "find".
John Dvorchak
Honored Contributor

Re: find command with gzip

OK, let me take another stab at this. I am still thinking that you are running into a limit here, so could you test it again and monitor these kernel params:

max_thread_proc

maxuprc

I got this from running SAM and looking at Help:

* Overview of Process Management Parameters *

Configurable Parameters for Process Management:

maxdsiz maximum process data segment size

maxssiz maximum process storage segment size

max_thread_proc maximum number of threads that one process can create

maxtsiz maximum process text segment size

maxuprc maximum number of processes per user

nkthread maximum number of kernel threads allowed on the system at same time
Dave La Mar
Honored Contributor

Re: find command with gzip

John -
Looks like:

max_thread_proc - 256

maxuprc - 2500

maxssiz - 2GB

maxtsiz - 10GB

nkthread - 2048
I am looking for the metric for max_thread_proc in order to run glance -adviser_only for this while the script is running. The reason I suspect you may have found the cause is the value of 256.
Note, though, that of the 1036 files available for zipping, the last run only zipped 162. Do you really think a new thread is created for each file passed to gzip? If so, I think we are gaining ground on this.

Best regards,

dl
Alan Turner
Regular Advisor

Re: find command with gzip

find -exec is inefficient because it forks a process for each match, but does it actually run the processes in parallel? It seems quite likely that find would fork, exec, then wait for the child to complete before continuing. If this is the case, then maxuprc wouldn't come into it: only find and one gzip are running at any one time. However, since gzip modifies the directory being processed by find (which is probably doing a readdir()), is it possible that the updates to the directory are interfering with the results of readdir?
Dave La Mar
Honored Contributor

Re: find command with gzip

Alan -
Good thought, but the update would be the zipped file, which is covered by the ! -name part of the logic that ignores .gz files. Or were you on another train of thought?

Regards,

dl
Dave La Mar
Honored Contributor

Re: find command with gzip

John D. -
O.K. I ran glance in adviser_only mode, with a syntax file watching proc_thread_count and proc_proc_id, to cover the period of time the command was running.
Unfortunately, it came back indicating only one thread for the process at any given time it was running.
Sure was worth a try.
Thanks.

Any other ideas?

Best regards,
dl
Alan Turner
Regular Advisor

Re: find command with gzip

Dave

I was on a rather different train of thought:
a) find appears to do its execs one at a time (hence no obvious huge number of threads or processes)
b) since there are a lot of files to handle, find will exec several times
c) each command exec'd modifies the directory being searched (to create the .gz file and delete the original)
d) find is probably using the readdir() call to get the directory contents - will this cope gracefully with the directory being changed on every call? Perhaps the filesystem code every so often reorganises the directory to eliminate holes, e.g. by moving late entries to earlier positions. In that case a file's entry could be moved from beyond the current directory pointer (held locally by find) to before it, and thus be missed.

(BTW - I would have expected a "-type f" on the find, but I don't think it's relevant.)
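If Alan's readdir() theory is right, snapshotting the names before any compression starts sidesteps the interference, since find finishes its directory scan before gzip renames anything. A minimal sketch (plain `gzip`, temp files, and space-free names assumed for xargs):

```shell
#!/bin/sh
dir=$(mktemp -d)
touch -t 202001010000 "$dir/x.log" "$dir/y.log"
list=$(mktemp)
# Pass 1: collect names; the directory is not modified during the scan.
find "$dir" -type f -mtime +30 ! -name '*.gz' > "$list"
# Pass 2: compress from the saved list.
xargs gzip < "$list"
rm -f "$list"
```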
Dave La Mar
Honored Contributor

Re: find command with gzip

Alan -
Thanks for the well explained possibility.
I sure hate closing out this thread without a definitive answer.
There has been good participation and a lot of thought given by all who replied.
While I lean towards a couple of the replies, I will walk away with another unsolved mystery. At this point, I have wasted enough time of the talented forum participants as well as our own staff on this topic.

I apologize to all for not awarding higher than a 7, but I would not want this thread referred to in the future with the expectation of a solution.

Many thanks to all.

Best regards,

dl