
very poor performance, prealloc command

Rajeev jain
Advisor

very poor performance, prealloc command

I ran the prealloc command "$ time prealloc test $((1000*1024*1024))" to write a 1GB file on RAID-10 internal drives and on SAN drives (8GB cache + RAID5), and the response times were very unsatisfactory. The local disk write completed in 10 seconds and the SAN disk write in 30 seconds. I have a Sun server connected to the same RAID group which writes a 1GB file in 8 seconds. These systems have no application or database running on them.

HP support has pretty much thrown up their hands, as they couldn't find any errors.

I have a rx3600 with Hitachi AMS200 Storage with 2 X 2GB FCs.

I would highly appreciate it if someone could post similar results from their environment, along with their hardware config at the level of detail I have listed above.

If anyone has experienced a similar issue, any suggestions would be appreciated as well, but I am really interested in knowing the result of the prealloc command.

Thanks

24 REPLIES
Bill Hassell
Honored Contributor

Re: very poor performance, prealloc command

prealloc is probably the slowest disk writing program I have ever seen. There is nothing to fix -- that's the way it works (or crawls). It proves the concept that asking HP-UX to run faster doesn't solve badly written code. Use dd and /dev/zero like this:

writing:
timex dd if=/dev/zero of=/var/tmp/test bs=1024k count=1000

reading:
timex dd if=/dev/rdsk/cZZtYYdXX of=/dev/null bs=1024k count=1000

Note that dd is by far the fastest method to read or write (as long as you override the default 512byte block size) but is a lousy test for performance, especially for smart disks (RAID, arrays, virtualized storage) as dd is single threaded. Only one CPU can run the code and only one channel will get through to the disk. And of course, large cache sizes in arrays will make the measurement unstable, that is, the first run will be much longer than subsequent runs.

A much better test is to run 10 or 20 copies, or run the xdd freeware program to generate multiple tasks.
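A minimal sketch of the "run 10 or 20 copies" idea, assuming a POSIX-style shell. The function name parallel_dd, the ddtest.N file names and the default values are illustrative and not from the thread. Note that timex is an external command and cannot time a shell function directly, so the usage lines rely on `time` being a shell keyword, as it is in bash and ksh.

```shell
#!/bin/sh
# Illustrative defaults (assumptions, not values from the thread).
TARGET_DIR=${TARGET_DIR:-/var/tmp}
WRITERS=${WRITERS:-10}

# Start N dd writers in the background, then wait for all of them.
parallel_dd() {
    dir=$1; n=$2; bs=$3; count=$4
    i=1
    while [ "$i" -le "$n" ]; do
        dd if=/dev/zero of="$dir/ddtest.$i" bs="$bs" count="$count" 2>/dev/null &
        i=$((i + 1))
    done
    wait    # returns once every background dd has exited
}

# Usage (left commented out so that sourcing this file writes nothing):
# time parallel_dd "$TARGET_DIR" "$WRITERS" 1024k 100
# rm -f "$TARGET_DIR"/ddtest.*
```

Each writer gets its own file, so the run exercises several outstanding I/O streams at once instead of the single stream of one dd.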


Bill Hassell, sysadmin
Rajeev jain
Advisor

Re: very poor performance, prealloc command

prealloc was my first test. I then ran "$ time cp test test1", where test is a 1GB file.

On HP it takes a little over 2 minutes and on Sun about 15 seconds.

I ran dd, which shows poor performance compared to prealloc.

It would be very helpful if you could run this in your environment so I have something to compare.

NESTER:root(/vm/guest/kalimdor)# timex dd if=/dev/zero of=/vm/guest/kalimdor/test bs=1024k count=1000
1000+0 records in
1000+0 records out

real 45.75
user 0.00
sys 0.16
Hein van den Heuvel
Honored Contributor

Re: very poor performance, prealloc command


prealloc is a utility command and it may, or might not have been implemented as a high-performance command. Its stated goal is NOT to write fast, but to create a file optimized for fast sequential reads and writes. It is probably using SYNC commands to guarantee that the IOs made it out to the storage and that the storage actually allocated disk chunks, for 'smart' controllers like an EVA which only promise space, but postpone allocation.

For prealloc only the end goal counts, not the path!

I suspect you are using prealloc as a method to evaluate the storage / filesystem performance potential. Correct?
As you may have discovered this is a treacherous method. The only proper way to measure performance is under actual load. Anything else may or might not hit or avoid good or bad attributes.

Surely it does not matter how fast your 'dd' or prealloc or tar or xyz goes... unless that's all your application is doing.

Now I'll admit that the behavior would concern me also, but I'd be more inclined to look for explanations and alternatives than to label it 'poor performance'.

Things I would check
- comparative DD results with if=/dev/zero for : bs=8k count = 128000 and for: bs=1024k count=1000... but 1GB is not enough!
- compare with RAW IO
- scsi queue_depth settings
- file system fragmentation ( only test on a clean file-system )
- LVM settings... just a single PV I hope?
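The first item in the list can be sketched as follows. Both block-size variants write the same number of bytes (8k * 128000 = 1024k * 1000 = 1048576000). The helper name dd_write and the file path are assumptions for illustration only, and `time` is used as the bash/ksh shell keyword.

```shell
#!/bin/sh
# Write COUNT blocks of size BS from /dev/zero into FILE, starting from a fresh file.
dd_write() {
    file=$1; bs=$2; count=$3
    rm -f "$file"    # remove any earlier copy so each run allocates new blocks
    dd if=/dev/zero of="$file" bs="$bs" count="$count" 2>/dev/null
}

# Usage (commented out; the path is hypothetical):
# time dd_write /var/tmp/test 8k 128000
# time dd_write /var/tmp/test 1024k 1000
```

As noted in the list, 1GB may be too small to get past the file system cache, so the counts would need to be scaled up for a meaningful comparison.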

On the SUN side...

When the pre-allocate returns, is all the IO actually done? (sync).

For example, when I use a simple 'time dd' to write 1GB, that finishes in 3 seconds. That is, the command returns. But the actual IO still has to start! Looking with glance, on the 'u' page, I can see the IO kick in several seconds later... for a minute long (slow single drive).

>> knowing the result of prealloc command.

On a single, clean U160 drive on my RX2600
# time prealloc /blah/test.tmp "$((1000*1024*1024))"
real 1:04.5

And GLANCE shows the same IO all along while busy, which drops to null when done, unlike the prior DD experiment.


Hope this helps some,
Hein van den Heuvel ( at gmail dot com )
HvdH Performance Consulting.


Rajeev jain
Advisor

Re: very poor performance, prealloc command

Did you run dd and prealloc in the same directory?
Hein van den Heuvel
Honored Contributor

Re: very poor performance, prealloc command

Hmm, I don't understand that question.
Directory is utterly irrelevant.

But yes i did run them on the same file system, which is what matters.

And I did pre-delete the file before re-running

Also, I of course had 4GB of filesystem cache.

Did you check the Sun cache and HP cache settings?

When I trimmed the filecache down to min=200MB, max=250MB, then the very same dd on the very same directory ( :-) ) took 40 seconds, with almost no IO after the dd command returned.

I'm sure you can figure out your lesson from there.

Hein.







Dennis Handly
Acclaimed Contributor

Re: very poor performance, prealloc command

>Hein: prealloc is a utility command and it may, or might not have been implemented as a high-performance command.

It calls prealloc(2) and that writes 8Kb (the filesystem blocksize) chunks, then fsync.

>It is probably using SYNC commands

Yes, one fsync(2) at the end.
Steven E. Protter
Exalted Contributor

Re: very poor performance, prealloc command

Shalom,

Maybe an OS patch will help with malloc.

Memory leak detector:
http://www.hpux.ws/?p=8

Performance monitor scripts
http://www.hpux.ws/?p=6

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Laurent Menase
Honored Contributor

Re: very poor performance, prealloc command

prealloc is better than dd in that it writes 8kb blocks from the kernel, without needing to copy data to user level. It is a syscall.


You will probably need to look at the queue lengths of your FC and the number of paths to the LUN.

But you should pursue with support and ask them to elevate.

Hein van den Heuvel
Honored Contributor

Re: very poor performance, prealloc command

>> But you should pursue with support and ask them to elevate.

WHY?
There is nothing broken except the end user expectation. Hire a consultant yes, but support no.
There is a misguided belief that a simple system tool can provide useful performance information across vendors without understanding of all the parameters involved. Specifically the size of the file system cache was not mentioned, yet is critical for dd experiments.


Just a thought... how does the filesystem cache get flushed when left on its own, without fsync instructions? Will it strictly write out in order of arrival, sweep low to high, or take a random approach? If it is not ordered, then a single fsync at the end is not good enough to guarantee the intended effect of prealloc. The storage subsystem may end up allocating storage segments for the file out of order.

Hein.


James R. Ferguson
Acclaimed Contributor

Re: very poor performance, prealloc command

Hi:

> Steven: Maybe an OS patch will help with malloc.

This isn't 'malloc' [which allocates memory] but _prealloc_ which allocates DISK.

...JRF...
Laurent Menase
Honored Contributor

Re: very poor performance, prealloc command

Indeed Hein, you are right: they should ask HP support or services for a perf consultation, not for support.

Dennis Handly
Acclaimed Contributor

Re: very poor performance, prealloc command

>Laurent: prealloc is better than dd in that it writes 8kb blocks from the kernel, without needing to copy data to user level. It is a syscall.

Where do you get this strange idea? prealloc(2) isn't a syscall. As tusc shows, it calls write(2) and fsync(2).
Laurent Menase
Honored Contributor

Re: very poor performance, prealloc command

In fact my mistake comes from the fact that it is documented as prealloc(2) --> 2 means syscalls, and a prealloc vnode operation exists.

So I may have made a bad assumption that prealloc(1) was using a prealloc(2) syscall.

But indeed libc prealloc is using write/fsync/fseek.

Dennis Handly
Acclaimed Contributor

Re: very poor performance, prealloc command

>Laurent: 2 means syscalls

Yes, intro(2) seems to wave its hands. But prealloc(2) is a libc function.
Rajeev jain
Advisor

Re: very poor performance, prealloc command

Thank you all for the input. To move away from the difference between the commands used on HP and Sun (prealloc and mkfile), I used the cp command this time.

What I wanted to see were the results from your environment.

There is a case sitting with Support and Performance Engineering.

After running a simple cp command within the same file system, HP returns in 2min 6 seconds and Sun in 15 seconds.

There is no app OR database on the servers, these are brand new servers connected to a brand new AMS200 for testing. Both Sun and HP servers have 32GB memory and 4 CPUs.
The disks, disk group, volume and filesystem are configured exactly the same.

The command used is "$ timex cp test test1", where test is a 1GB file.
Rajeev jain
Advisor

Re: very poor performance, prealloc command

I ran prealloc and mkfile on local root mirrored disks. HP and Sun both take between 10-12 seconds.

cp takes about 14 seconds on each system.
Hein van den Heuvel
Honored Contributor

Re: very poor performance, prealloc command

Maybe the test on sun had the input still cached, and hp not yet cached? The test file is _likely_ to have been cached as part of its creation, but you never know...

Anyway, your new test now measures the time it takes to read a file, as well as write (a copy), as well as measuring the ability of the system to cache that input file and retain it. The test has no control over where the cache stops and the IO starts, and gives no indication of how much time is spent reading versus writing.

Seems to me you took a step further away from reality and relevance. Unless of course the purpose of the system is to copy 1 gb files around. In that case you have constructed a perfect test, and would need to move on to the next phase: understanding what is wrong and fix it.

Did you check the file system cache min/max?
If you re-read my earlier reply it proved to potentially cause a 10x speed difference.

Due to past experiences, which are no longer valid and never were very valid, a good few HP-UX administrators like to severely cripple their systems by setting those too low. They'd pick values in the 2% - 5% of memory range, notably for systems which are expected to run Oracle.

Admittedly, that should still be enough to suck in a 1 GB file when 32 GB memory is present. But NOT when you also have to read in the input file. Check it (kctune)

Suggestions.

* Measure the reading part ... and load in memory.

- umount /test # flush cache
- mount /test
- time cp /test/file /dev/null # initial read
- time cp /test/file /dev/null # from cache?!

* Measure the writing part

- prealloc ?
- from /dev/zero with dd commands outlined earlier.
- from a MEMORY file system

* Now measure the cp. twice.
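The suggestions above can be sketched as three small helpers, so that each phase is timed on its own. This is only an outline under stated assumptions: MNT and SRC are hypothetical paths, the helper names are made up for illustration, remounting needs root plus an /etc/fstab entry for the mount point, and `time` is used as the bash/ksh shell keyword.

```shell
#!/bin/sh
# Hypothetical locations; adjust to the file system under test.
MNT=${MNT:-/test}
SRC=${SRC:-$MNT/file}

# Read only: the data is discarded, so no write to the file system is involved.
read_phase() { cp "$1" /dev/null; }

# Write only: the source is /dev/zero, so no read from the file system is involved.
write_phase() { dd if=/dev/zero of="$1" bs=1024k count="$2" 2>/dev/null; }

# Read plus write within the same file system.
copy_phase() { cp "$1" "$2"; }

# Usage (commented out; run as root on a test file system):
# umount "$MNT" && mount "$MNT"      # drop cached pages for this file system
# time read_phase "$SRC"             # initial read from disk
# time read_phase "$SRC"             # second read, possibly from cache
# time write_phase "$MNT/zero" 1000  # 1000 x 1024k = 1048576000 bytes
# time copy_phase "$SRC" "$SRC.copy" # first copy
# time copy_phase "$SRC" "$SRC.copy" # second copy
```

Comparing the two read timings shows how much the cache contributes, and comparing the write timing with the copy timings shows how much of the cp time is spent writing.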

( Dennis, did truss show alternating writes and fsyncs, or one fsync at the end? And how large were the IOs? I know... I should just try myself. But if you have it handly ( sorry :-), then please share here or in an Email ).


Hope this helps,
Hein van den Heuvel
HvdH Performance Consulting


Rajeev jain
Advisor

Re: very poor performance, prealloc command

Hein,
Your point would be valid if I saw the same read/write times on local drives and SAN drives. They are 3 times apart. I don't think the file cache within the OS distinguishes between local drives and SAN drives, right?

What makes me believe this is that either the paths to the disk are saturated, which could be a driver issue, or there is an issue with the cards themselves.

Rajeev
Hein van den Heuvel
Honored Contributor

Re: very poor performance, prealloc command

Ah! Good work.
I missed a step in reading your details.
Thanks for making it explicit.

>> Your point would be valid if I saw the same read/write times on local drives and SAN drives. They are 3 times apart. I don't think the file cache within the OS distinguishes between local drives and SAN drives, right?

Absolutely Correct.
The cache does not discriminate
(except possibly for RAM disks, and explicitly selected mount options: http://g4u0419c.houston.hp.com/en/B3921-90010/mount_vxfs.1M.html )

>> What makes me believe this is that either the paths to the disk are saturated, which could be a driver issue, or there is an issue with the cards themselves.

I would have to agree and would lean back towards something, potentially hardware, being broken or misconfigured.

How about the hp-ux scsi queue_depth?

Any counters on a SAN switch that might help you?
Any counters on the storage controller that might help you? Notably queue depth?

Any error/retry accumulators growing?

Hein.

Rajeev jain
Advisor

Re: very poor performance, prealloc command

q_depth is set to 255 with no help. I will check the parameters on Storage.
Rajeev jain
Advisor

Re: very poor performance, prealloc command

I would highly appreciate it if anyone could post the results of the two commands below, once from local drives and once from SAN drives. Please list your hardware model at a high level.

$ timex prealloc test $((1000*1024*1024))
$ timex cp test test1

This will help me a ton. I am leaning towards a hardware issue as well, but I am having a tough time getting through.


Patrick Wallek
Honored Contributor

Re: very poor performance, prealloc command

Here you go:

Hardware -- HP9000 rp5470 - HP-UX 11.11
Local Drives:
[ root@hquwh01:/home/root ]
# timex prealloc testxxx $((1000*1024*1024))

real 1:40.34
user 0.13
sys 6.56

[ root@hquwh01:/home/root ]
# timex cp testxxx testxxx1

real 1:42.50
user 0.04
sys 12.60

[ root@hquwh01:/home/root ]
# ll testxxx testxxx1
-rw------- 1 root sys 1048576000 May 29 15:25 testxxx
-rw------- 1 root sys 1048576000 May 29 15:27 testxxx1
[ root@hquwh01:/home/root ]
#


SAN Drives (VA7400 - 1Gb/s Fibre):
[ root@hquwh01:/sync ]
# timex prealloc test $((1000*1024*1024))

real 31.88
user 0.12
sys 7.99

[ root@hquwh01:/sync ]
# timex cp test test1

real 53.71
user 0.04
sys 13.91

[ root@hquwh01:/sync ]
# ll test test1
-rw------- 1 root sys 1048576000 May 29 15:32 test
-rw------- 1 root sys 1048576000 May 29 15:33 test1
[ root@hquwh01:/sync ]
#
Patrick Wallek
Honored Contributor

Re: very poor performance, prealloc command

And here's from another, slightly newer, system:

HP9000 rp4440 - HP-UX 11.11:

Local disks:
[ root@tnudb05:/usr ]
# timex prealloc test $((1000*1024*1024))

real 1:21.00
user 0.07
sys 2.78

[ root@tnudb05:/usr ]
# timex cp test test1

real 1:35.12
user 0.02
sys 5.52

[ root@tnudb05:/usr ]
# ll test test1
-rw-r--r-- 1 root sys 1048576000 May 29 15:35 test
-rw-r--r-- 1 root sys 1048576000 May 29 15:38 test1
[ root@tnudb05:/usr ]
#

SAN Disks (VA7410 - 2 Gb/s fibre):
[ root@tnudb05:/dump ]
# timex prealloc test2 $((1000*1024*1024))

real 12.06
user 0.07
sys 3.35

[ root@tnudb05:/dump ]
# timex cp test2 test3

real 25.05
user 0.02
sys 5.42

[ root@tnudb05:/dump ]
# ll test2 test3
-rw-r--r-- 1 root sys 1048576000 May 29 15:41 test2
-rw-r--r-- 1 root sys 1048576000 May 29 15:42 test3
[ root@tnudb05:/dump ]
#
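The timings posted above are easier to compare as throughput. A small sketch, assuming the timex "real" format seen in this thread (SS.ss or M:SS.ss) and 1 MB = 1048576 bytes; to_seconds and mb_per_sec are hypothetical helper names.

```shell
#!/bin/sh
# Convert a timex "real" value such as 31.88 or 1:40.34 into seconds.
to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# Print throughput in MB/s for BYTES written in the given elapsed time.
mb_per_sec() {
    awk -v b="$1" -v s="$(to_seconds "$2")" 'BEGIN { printf "%.1f\n", b / 1048576 / s }'
}
```

For example, `mb_per_sec 1048576000 12.06` converts the VA7410 prealloc timing, and `mb_per_sec 1048576000 1:21.00` converts the local-disk one.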

Dennis Handly
Acclaimed Contributor

Re: very poor performance, prealloc command

>Hein: did truss show alternating writes and fsyncs, or one fsync at the end?

In my first reply I said, at the end.

>And how large were the IOs?

I also said: writes 8Kb (the filesystem blocksize)

And that was my blocksize.