Operating System - HP-UX
Curtelin
Advisor

Too many resources required when I use "gzip"

Hello,

Every day I need to compress a large text file (about 500 MB in size).
I must use the "gzip" command for this task, but the other applications do not like it: the CPU is overloaded (the "gzip" process shows nearly 100% WCPU when I display all processes with "top") and other processes may crash...

How can I reduce (limit) the CPU load when I use "gzip", so that the other processes don't crash? It is not a problem for me if the task takes longer.

I use an HP-UX "RP2405" server with 2 CPUs (650 MHz each) and 2 GB of memory.

Thanks a lot for your help and your answers!!!
32 REPLIES
DCE
Honored Contributor

Re: Too many resources required when I use "gzip"


One way is to use the nice command (man nice) to assign a lower priority to the gzip command.

Curtelin
Advisor

Re: Too many resources required when I use "gzip"

Yes, I've already tried "nice -39 gzip my_file" to give the lowest priority to the "gzip" command, but the process is still the first one displayed in the "top" list, with 100% WCPU...
A. Clay Stephenson
Acclaimed Contributor

Re: Too many resources required when I use "gzip"

You can start the process under nice to lower priority or you can use the renice command to lower the priority of an already running process. However, no other processes should crash simply because the machine is heavily loaded. That behavior indicates a basic design flaw in the applications.
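For example, something along these lines (the PID below is only a placeholder; check man renice for the exact option syntax on your release):

# find the PID of the running gzip
ps -ef | grep gzip
# then lower its priority; 20 is added to its current nice value
renice -n 20 -p 1234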
If it ain't broke, I can fix that.
DCE
Honored Contributor

Re: Too many resources required when I use "gzip"



What does ps -l show under the NI column when you run the command with nice?

The reason I ask is that nice -39 probably will not work, since the number after the nice command is added to the existing nice value (usually a default of 20), and the resulting nice level of 59 is outside nice's limits.

nice -19 should get you to a nice level of 39 (the max)
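For example (assuming gzip is in your PATH), this should start it at the lowest priority, and ps -l will show it in the NI column:

nice -19 gzip my_file &
ps -l | grep gzip     # the NI column should now read 39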
Curtelin
Advisor

Re: Too many resources required when I use "gzip"

I don't know all the processes used by the one application running on the server (how they work, etc.)... that's why I would like to prevent the "gzip" command from loading the CPU at 100%.

(It's true that the application probably has some problems with the OS...)
Tim Nelson
Honored Contributor

Re: Too many resources required when I use "gzip"

If no other processes are asking for the CPU, then your gzip will use all the CPU no matter what the nice value is. In this case, because it is a 2-CPU system, gzip will use only one CPU out of the two and your load should be about 50%.

If other processes need the CPU, then gzip will go to the bottom of the list since it was started with nice -19.

This is all normal.

Processes should not crash.

Now, I/O is a different story: if gzip has the disk pegged and other processes need to do I/O to that disk...


Curtelin
Advisor

Re: Too many resources required when I use "gzip"

Well, the "average" line shows about 50% (CPU, USER) when I use the "gzip" command, but the problem seems to be the "%WCPU" column: it is always at 98-100% (for example, if I launch 2 gzips, both processes show 98% WCPU).
How can I reduce this WCPU percentage? What does it mean, please?
Thanks.
Bill Hassell
Honored Contributor

Re: Too many resources required when I use "gzip"

The man page for nice is confusing because a lower priority means that the HP-UX priority number will be larger, so nice -39 was the wrong way to go. nice 39 will make gzip run only when there are no other processes needing a CPU.

Don't worry about 100% CPU usage. All flavors of Unix are timeshare systems and they can easily run 200% to 900% workloads with no problems at all. You will see this with the uptime command: a load of 2.00 on your 2-processor system means that both processors were fully used during the measurement period, and a load of 4 means that each CPU-bound process will get about half the available time. That is what timesharing is all about. I have never heard of processes that crash due to heavy CPU usage unless those processes heavily depend on timing (rather than semaphores). In that case, find new programmers: timing-dependent code is a great example of bad programming.

So gzip will indeed consume some CPU cycles, but like all data processing programs it will relinquish the CPU each time it performs a read or write. This can happen dozens of times per second, so everything will get its fair share of disk and CPU. The nice priority only affects compute time, so if this large file is on a mountpoint that is heavily used by your applications, gzip's I/O will slow things down and nice won't help.


Bill Hassell, sysadmin
A. Clay Stephenson
Acclaimed Contributor

Re: Too many resources required when I use "gzip"

Well, if nice or renice isn't good enough for you, then it would be rather easy to craft a very small Perl script that would read stdin and write to stdout but be passed 2 parameters (e.g. -l count -s seconds) so that after every "count" lines the process would sleep "seconds" seconds. Gzip will read from stdin and write to stdout, so your sleeper script would pipe its output to gzip. This would free up the CPU so that other processes can run more often.

By the way, by any chance have you "improved" your timeslice and set it to a very low value (1) rather than the default 10?
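If you want to check, the current value can be queried with the kernel tunable tools; which one you have depends on your 11.x release:

kmtune -q timeslice     # HP-UX 11.0 / 11i v1
kctune timeslice        # HP-UX 11i v2 and later

The default is 10.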
If it ain't broke, I can fix that.
Curtelin
Advisor

Re: Too many resources required when I use "gzip"

"nice 39 gzip toto.tar" doesn't work...I must put a "-" before 39 and then I got "nice = 39" for the gzip process when I display processes with the command "top".

What is the timeslice ? :)

A. Clay Stephenson
Acclaimed Contributor

Re: Too many resources required when I use "gzip"

It's a kernel tunable and if you have to ask, you are the wrong person to fix it.
If it ain't broke, I can fix that.
Curtelin
Advisor

Re: Too many resources required when I use "gzip"

Yes, I don't know enough about the Unix kernel...
I am just trying to find a solution so that the application running on the server doesn't crash when I gzip a large file...
Curtelin
Advisor

Re: Too many resources required when I use "gzip"

About the script, it's very interesting...
Is it difficult to create? I only know ksh or csh scripts...
Could you help me, please?

You mean I can "cut" my large file, compress each part with a delay between each compression, and still obtain only one compressed file at the end?
A. Clay Stephenson
Acclaimed Contributor

Re: Too many resources required when I use "gzip"

No, I mean read 5000 lines or so and then sleep 2 seconds or so and read another 5000 lines and ... until the file is completely read. You could do it in the shell but perl would be much more efficient.

This is a band-aid approach to fixing the underlying problem. Your applications should not crash.
If it ain't broke, I can fix that.
Curtelin
Advisor

Re: Too many resources required when I use "gzip"

OK, thanks, I "see" the mechanism...
The problem is that I "tar" the text file first, before zipping it (because sometimes there can be more than one text file...).
So I don't know if I can "count" the lines of a tar file.
A. Clay Stephenson
Acclaimed Contributor

Re: Too many resources required when I use "gzip"

No, you cannot, but I find it interesting that tar does not choke your box while gzip does. Gzip is more CPU intensive but should be no more I/O intensive than tar. I was leaning toward Perl because I thought there was more to the story, but the nice thing about Perl is that it can handle binary data, so rather than specifying lines you might specify a number of bytes.
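The idea, as a rough ksh sketch (this only illustrates the approach, not the Perl script; the block size, burst size and pause are values you would tune yourself):

#!/usr/bin/ksh
# throttled_gzip.sh <file> : copy <file> in small bursts, pausing between
# bursts so other processes get the CPU and the disk, and gzip the stream.
FILE=$1
BS=65536        # 64 KB blocks
BURST=10        # blocks copied per burst
PAUSE=2         # seconds to sleep between bursts

SIZE=$(wc -c < "$FILE" | awk '{print $1}')
BLOCKS=$(( (SIZE + BS - 1) / BS ))

SKIP=0
while [ $SKIP -lt $BLOCKS ]
do
    dd if="$FILE" bs=$BS skip=$SKIP count=$BURST 2>/dev/null
    SKIP=$(( SKIP + BURST ))
    sleep $PAUSE
done | gzip > "$FILE.gz"

With 10 blocks of 64 KB per 2-second pause this will be very slow on a 500 MB file, so you would raise BURST (or lower PAUSE) until the load on the other applications is acceptable.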
If it ain't broke, I can fix that.
Steven E. Protter
Exalted Contributor

Re: Too many resources required when I use "gzip"

Shalom,

Bottom line is gzip is CPU intensive. There is no way around that.

I would suggest that you check whether any patches are available for gzip and, if they are, that you install them.

Alternatively, you may wish to stop using gzip altogether.

You've got valid solutions for lowering the CPU use of gzip. What might also be causing problems and crashes is that while gzip is gzipping there are two copies of the file on disk: filename and filename.gz.

You may be stressing a critical filesystem. Perhaps try one with lots of extra space and low I/O.
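For example (the paths below are only placeholders), you can send the compressed output to a quieter filesystem and only remove the original once gzip has finished:

gzip -c /appdata/toto.tar > /scratch/toto.tar.gz && rm /appdata/toto.tar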

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Curtelin
Advisor

Re: Too many resources required when I use "gzip"

Is it better to have extra space when using gzip, even if there is already enough space to store the 2 files?

How can I find out my gzip version? Is it the version of "zlib"? I get zlib v1.1.4 when I run "swlist".
Cam Diaz
New Member

Re: Too many resources required when I use "gzip"


If you have PRM (Process Resource Manager) installed with your OE, you could configure it to do what you want.
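Roughly, and assuming PRM really is installed (the group name and share values below are made up; the real record format is documented in prmconf(4)), you would give batch work a low-share group and start gzip inside it:

# in the PRM configuration file, define a group with a small CPU share, e.g.
#   OTHERS:1:80::
#   batch:2:20::
prmconfig -i                     # load the configuration
prmrun -g batch gzip toto.tar    # run gzip in the low-share group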
Mark Ellzey
Valued Contributor

Re: Too many resources required when I use "gzip"

Curtelin,

Another thing you may want to try is to run gzip when other users are not using the machine so heavily. In other words, write a script to do the gzip, then create a cron entry for it at 3:00 am. Hopefully, all your users will be fast asleep at that time.
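For example (the script path is just a placeholder), a crontab entry like this runs the job every day at 03:00:

0 3 * * * /home/curtelin/bin/gzip_bigfile.sh >/tmp/gzip_bigfile.log 2>&1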

Regards,
Mark
A. Clay Stephenson
Acclaimed Contributor

Re: Too many resources required when I use "gzip"

OK, I spent about 10 minutes throwing together a Perl script that will read stdin to stdout in binary. By default it pauses 2 seconds after 10 64 KiB chunks have been processed and then repeats, but there are options to change the behavior. Invoke as cppause.pl -u for full usage. This should throttle your I/O sufficiently BUT this is still a Mickey Mouse, band-aid fix for your poorly designed applications.
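Based on the options mentioned earlier (-l for the chunk count, -s for the sleep time; run cppause.pl -u to see the real defaults), the invocation would look something like this (filenames are just examples):

./cppause.pl -l 10 -s 2 < toto.tar | gzip > toto.tar.gz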


If it ain't broke, I can fix that.
Curtelin
Advisor

Re: Too many resources required when I use "gzip"

Thanks a lot for your help!!!

I will try the script immediately!

The "perl" binary is in /opt/perl/bin... that's not a problem, is it? I just have to change the first line of the file...?
Curtelin
Advisor

Re: Too many resources required when I use "gzip"

Sorry, I'm not a Unix pro and I can't manage to use the script :(
I changed the first line of the script so that it finds the "perl" interpreter, then I launch the script with this syntax, for example: "./cppause.pl gzip toto.tar"

Is that the right command, please?
When I do a "top", the "perl" process is sleeping and I don't see any "gzip" process :(