Operating System - HP-UX
Any compression tool available which could use multiple CPU
11-29-2004 09:39 AM
I found pbzip2 (parallel bzip2), which has multiple-CPU support, but it has a 2GB file limit.
Does anyone have any recommendations?
Thanks in advance
Madhu
11-29-2004 10:54 AM
Maybe you should use compress. It compresses less well, but it is faster.
Regards,
Fred
"Reality is just a point of view." (P. K. D.)
11-29-2004 11:02 AM
You can design your script to run two zip processes at the same time:
while true
do
    # count running zip jobs; the [z] keeps grep from counting itself
    control=$(ps -ef | grep -c '[z]ipsomething')
    if [ "$control" -ge 2 ]
    then
        sleep 30
    else
        zipsomething &
        zipsomething &
    fi
done
That will at least ensure that two zip processes are running. You'll have to be careful building the zipsomething command line so the same file is not zipped by two processes.
As to the tool, gzip is good; it can go up to 8 GB with patching.
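A rough sketch of keeping two workers off the same files (untested, and the file names are only placeholders):
ls *.dat | sed -n 'p;n' > list1    # odd-numbered entries for worker 1
ls *.dat | sed -n 'n;p' > list2    # even-numbered entries for worker 2
xargs -n1 gzip < list1 &
xargs -n1 gzip < list2 &
wait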
SEP
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
11-29-2004 11:13 AM
Here is the link: http://compression.ca/pbzip2/
But as I said earlier, it has a 2GB file size limit, and bzip2 does not have that limit in the version I use.
In my case I have a single 35GB file.
11-29-2004 11:15 AM
You're right for multiple files, but here there is only one file. So what is really needed is a multi-threaded tool, and that does not exist, as far as I know.
Regards,
Fred
"Reality is just a point of view." (P. K. D.)
11-29-2004 05:22 PM
'split' the large single file into fifo pipes.
Launch compresses for each pipe.
Start transfers as the compress jobs finish.
Uncompress, appending to a single file on the other side.
The uncompress, much like the transfer, would be single stream, but that generally takes less time than compress.
Here is a perl script that was supposed to split in parallel:
$file = shift @ARGV or die "Please provide file to split and # chunks";
$chunks = shift @ARGV;
$chunks = 4 unless $chunks;
$chunks = 26 if $chunks > 26;
$total = -s $file;
die "puny file" unless ($total > 10000000);
$name = "xxx_";
$chunk = int( $total / $chunks );
$i = 0;
while ($i < $chunks) {
    $command = sprintf( "mknod %sa%c p", $name, ord("a") + $i++ );
    printf "-- $command\n";
    system ($command);
}
$command = "split -b $chunk $file $name";
$i = 0;
while ($i <= $chunks) {
    print "-- $command\n";
    exec ($command) unless fork();
    $letter = ord("a") + $i++;
    $command = sprintf( "cat %sa%c | gzip > %sa%c.gz", $name, $letter, $name, $letter );
}
$pid = 1;
$pid = wait() while ($pid > 0);
First problem was that gzip does not eat from fifo's... but cats do!
Biggest problem is that only one zip is going at a time because split is of course only writing one pipe at a time waiting for the result to be picked up.
One silly fix for that is to split into real intermediate files, and zip those. Yuck.
I think the better solution would be for the perl script to fork multiple reader streams which each seek to their own start point and then read (binmode) and feed data into their own gzips.
On the other side, I think I'd go for a single unzip to combine the files. I don't think it will work to output into a single file from multiple streams after individual seeks. Then again, I suppose that would work, notably when starting block-aligned.
Cheers,
Hein.
11-29-2004 06:04 PM
If you compress a single file, that limit is a burden; gzip had that problem for ages, but it was easy to overcome with more recent versions from GNU.
bzip2, just like gzip, has compression-level command line options (-1 .. -9) that influence CPU usage, but they do not control the number of CPUs involved, so your question is a good one.
The option of having a script take care of running two (or more) compressions at the same time is good, but why would pbzip2 not work on unlimited file sizes when in streaming mode?
# pbzip2 -options < very_very_large_file > compressed_file
And why write to a compressed file at all, rather than piping the output of
# pbzip2 -options
onward, using dd as a buffer?
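A rough sketch of what that streaming pipeline could look like (untested; it assumes pbzip2 really does read stdin and write stdout, and the tape device path is only an example):
dd if=very_very_large_file bs=1024k | pbzip2 -options | dd obs=1024k of=/dev/rmt/0m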
Another option would be to compile pbzip2 from source yourself, removing the file limit:
http://compression.ca/pbzip2/
http://compression.ca/pbzip2/pbzip2-0.8.tar.gz
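A possible sketch of such a rebuild (assumptions: the tarball unpacks into pbzip2-0.8, the Makefile honours CFLAGS, and the 2GB limit really does come from 32-bit file offsets):
gunzip -c pbzip2-0.8.tar.gz | tar xf -
cd pbzip2-0.8
make CFLAGS="-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64"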
Enjoy, Have FUN! H.Merijn
11-29-2004 08:46 PM
It is fixed at 4 threads, but that seems to be the optimum, even on my 12-CPU rp8400 where it consumes over 1000% (yes, one thousand) CPU in top.
It uses the zlib library, so you need that installed. It runs at between 2 and 3 times the speed of compress.
The decompression is single stream, but that has been shown to be quickest.
The other thing is that it is also 32-bit (i.e. a 2GB limit when you do not redirect stdout), but as has been pointed out, if you use it to read from a pipe and merely append or redirect stdout using |, > or >>, the 2GB limit does not apply.
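In other words, something like this sidesteps the limit (a sketch only; "pzip" is just a stand-in name for the program, and the file names are placeholders):
cat big_35gb_file | pzip > big_35gb_file.z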
Alternatively you can modify the fopen() call in the code to be fopen64() on the output file. The worst thing about it is that it isn't compatible with compress/gzip or bzip2.
Here is the link:
http://forums1.itrc.hp.com/service/forums/parseCurl.do?CURL=%2Fcm%2FQuestionAnswer%2F1%2C%2C0x026250011d20d6118ff40090279cd0f9%2C00.html&admit=716493758+1101807043917+28353475
Make sure you test it, though; it isn't commercial and comes with no warranty(!)
11-30-2004 04:38 AM
I liked some of the comments posted here and will give points to them.
11-30-2004 04:47 AM
Reliability is more important than speed.
Nice hat, Fred.
SEP
11-30-2004 06:07 AM
Solution
Anyway, over lunch I poked some more at a perl script to split a file and compress the parts, and it now works fine (after I moved the open + seek from the parent to the children).
Here is a sample session for a 4GB file on an ia64 HP rx7620 server (8 processors):
# time perl split.pl xx.dat 6
6 x 699072512 byte chunks. 10667 x 65536 byte blocks. 4194305024 bytes
real 1:35.60, user 0.64, sys 29.37
# That's with over 75% cpu busy and gives:
# ls -l xx*
4194305024 Nov 29 21:05 xx.dat
163707548 Nov 30 10:41 xx.dat_1.gz
167387395 Nov 30 10:41 xx.dat_2.gz
163581093 Nov 30 10:41 xx.dat_3.gz
162035968 Nov 30 10:41 xx.dat_4.gz
159506304 Nov 30 10:41 xx.dat_5.gz
159981309 Nov 30 10:41 xx.dat_6.gz
# Put them back together with:
for i in xx.dat*gz
do
gunzip -c $i >> xx
done
real 1:07.8, user 50.0, sys 16.3
# ls -l xx
4194305024 Nov 30 10:47 xx
# doublecheck
# time diff xx xx.dat
real 2:13.2, user 1:32.8, sys 24.4
The script:
$| = 1;
$file = shift @ARGV or die "Please provide file to split and # chunks";
open (FILE, "<$file") or die "Error opening $file";
close (FILE);
$chunks = shift @ARGV;
$chunks = 4 unless $chunks;
$chunks = 26 if $chunks > 26;
$total = -s $file;
die "puny file" unless ($total > 10000000);
# make last chunk the smallest
$block = 64*1024;
$blocks = 1 + int( $total / ($chunks * $block) );
$chunk = $blocks * $block;
print "$chunks x $chunk byte chunks. $blocks x $block byte blocks. $total bytes\n";
$i = 0;
while ($i < $chunks) {
    if ($pid = fork()) {
        $i++;
    } else {
        open (FILE, "<$file") or die "Error opening $file in child $i";
        binmode (FILE);
        $pos = sysseek (FILE, $chunk * $i++, 0);
        $name = "${file}_${i}.gz";
        open (ZIP, "| gzip > $name") or die "-- zip error child $i file $name";
        while ($blocks-- && $block) {
            $block = sysread (FILE, $buffer, $block);
            syswrite (ZIP, $buffer) if ($block);
        }
        exit 0;
    }
}
$pid = wait() while ($pid > 0);
Enjoy!
Hein.
11-30-2004 06:25 AM
live free or die
harry d brown jr
11-30-2004 08:21 AM
My program is truly multi-threaded and does it all in one pass over the file, but you won't have the confidence that the simplicity of Hein's perl solution gives you. You would also have to edit some of my C code to get the correct level of compression and fopen64.
For a 35GB file, you need something that does only one pass over the source file, or several small scans of the parts.
Hein, I take my hat off to you; that little script is probably what I was looking for 2 years ago.
11-30-2004 11:02 AM
It does. Each child starts and stops pretty much at the same time, the actual data contents defining the cpu time needed.
> you should get good performance and a high level of confidence out of it.
With a reasonable IO system I believe it gives a near inverse-linear improvement in elapsed time with the number of chunks selected, up to the number of available CPUs. For final performance tweaks you might want to toss an mpsched at the zip command, and force one per CPU.
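A rough sketch of that (the processor numbers and file names are placeholders; it assumes mpsched -c binds a command to one processor, as on HP-UX 11.x):
mpsched -c 0 gzip -c part_a > part_a.gz &
mpsched -c 1 gzip -c part_b > part_b.gz &
wait
In the script that would amount to opening the compression pipe as "| mpsched -c $cpu gzip > $name" instead of plain gzip, with a different $cpu per child.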
> perl solution gives you. You would also have to edit some of my C code to get
I find that it actually looks more like a C program than a perl script :^)
> I take my hat off to you, that little script is probably what I was looking for 2 years ago.
And a pretty wizards hat at that. Thanks! :-)
Obviously the script is still pretty rough. Only initial error handling, remnants of shady pasts (that '26' was for the split hack in the first attempt) and so on, but it should be a fine starting point for someone's specialized solution (different output selection, different zip params, automatically determining the (free) CPU count, ...).
By making the last chunk the smallest I could keep the loop control simple: just read a selected number of blocks, or until you could read no more (the last chunk).
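Worked through with the 4GB session above: blocks = 1 + int(4194305024 / (6 * 65536)) = 10667 and chunk = 10667 * 65536 = 699072512 bytes, so six such chunks cover 4194435072 bytes and only the sixth reader hits end-of-file early.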
Cheers,
Hein.