Operating System - HP-UX

Extremely poor disk throughput
Tony Horton
Frequent Advisor

Extremely poor disk throughput

Hi,

I'm setting up a new L1500 box to replace an L1000. The L1500 has dual 875MHz CPUs, the L1000 dual 440MHz.

I'm doing a Progress bulkload of a 4GB database and it is taking roughly 8 times longer than on the L1000, even though the disks on the L1500 are newer and faster than the L1000's.

The only differences (that I am aware of) are:

- L1000 runs 11.00, L1500 runs 11i
- L1000 has a 4MB extent size in the VG, L1500 has 8MB
- L1000 uses V3 VxFS, L1500 uses V4 VxFS

Mount options are the same, kernel parameters are the same, and the buffer cache is set at 300MB on both machines.

It appears from sar output (attached) that the buffer cache is pretty much not being used (even though it should be, given the mount options). Also, nothing seems to be maxing out. I'm confused!!! Should I go and re-read the 11i release notes? This machine should be a lot faster than the old one.

The database lvol is mirrored across two internal 18GB disks (it's on two internal 9GB disks on the L1000).

Any suggestions greatly appreciated, I don't want a database dump and load to go from 5 hours to 40 hours!!!

Regards,

Tony.
No man is an isthmus
19 REPLIES
Tony Horton
Frequent Advisor

Re: Extremely poor disk throughput

Oh, and one more thing: checking in glance shows no logical writes, only physical writes, so I guess no buffering is taking place.

Regards,

Tony.
No man is an isthmus
ketan_5
Valued Contributor

Re: Extremely poor disk throughput

Tony,
1) There may be a problem with the new HDD. This possibility can not be ruled out. Please check it with dd.
2) Also check the -o delaylog argument when mounting the volume.
3) Any significant change in RAM/swap?
Tony Horton
Frequent Advisor

Re: Extremely poor disk throughput

It's kinda weird: the first step in the load process, which formats the database extents, achieves 11 to 13MB/s and does use the buffer cache.

The L1500 has 4GB RAM, the L1000 3GB. At the moment I only have 4GB of primary swap on the L1500, but there is nothing else running and memory utilization is only 27%.

If I do a dd if=/dev/dsk/c1t2d0 of=/dev/null bs=4096k I get about a 19MB/s transfer rate; if I do the same on the other disk it only gets 9MB/s. If I use the raw device file, both get 27MB/s.
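For the record, the raw-device test is the same command against the character device (the count here is just to bound the run to 1GB; device names as per the sar output):

    dd if=/dev/rdsk/c1t2d0 of=/dev/null bs=4096k count=256
    dd if=/dev/rdsk/c2t2d0 of=/dev/null bs=4096k count=256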


I dd'ed the /usr lvol to a file on the offending filesystem and it's getting between 5 and 8MB/s throughput.

I might try splitting the mirror and trying on each of the two disks individually.
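(If I do, it would be something along these lines, assuming MirrorDisk/UX; the lvol name is just an example:

    lvreduce -m 0 /dev/vg01/dblvol /dev/dsk/c2t2d0   # drop the mirror copy on one disk
    lvextend -m 1 /dev/vg01/dblvol /dev/dsk/c2t2d0   # re-mirror afterwards

)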

The faster of the 18GB drives is a Seagate ST318203LC, FW HP04; the other one is an IBM DMVS18D, FW HP05.

Progress is loaded off the same media; they supply the same media for 11.00 and 11i.

Regards,

Tony.
No man is an isthmus
Jean-Louis Phelix
Honored Contributor

Re: Extremely poor disk throughput

Tony,

I don't fully agree with you when you say that nothing seems to be maxing out :^) ...

- %busy is quite high and the buffer cache read hit rate is quite low for an ORACLE DB
- I recently had a performance problem with ORACLE, and they are now beginning to accept larger buffer cache sizes. So you could first try to increase it
- depending on the average IO sizes and type of requests you are doing, an 8M PE could be a little too big ... and I don't see enough disks to need it
- VxFS 3.3/4 has changed the default values of some parameters. For example, discovered_direct_iosz, which wasn't available as a tuning parameter in 11.00, used to have a default value of 128k and now has a default value of 256k. See 'man vxtunefs'. Depending on your db_multiblock_xxx and db block size, an IO of 128k used to be a direct IO and is now a buffered IO, which could push much more data through the buffer cache (see the sketch below)
- You could also find some docs about mount options on Metalink
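If you want to check or change these on the fly, vxtunefs works on the mounted filesystem; a sketch, with the mount point as a placeholder:

    vxtunefs -p /db                                   # print current tunables
    vxtunefs -o discovered_direct_iosz=131072 /db     # back to the old 128k threshold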

Best regards.
It works for me (© Bill McNAMARA ...)
Tony Horton
Frequent Advisor

Re: Extremely poor disk throughput

OK, I have now tried V3 VxFS: no change. Each disk individually: no change.

I went back and checked the dump and reload I did on the L1000 last Sunday, and it took 33 minutes to load the data. The L1500 took a bit over 8 hours to load the same data. There is something severely wrong here. I can't put 11.00 on the machine to compare because of the 875MHz CPUs. I don't know what to try next. I've also tried different block sizes on the filesystem (1K, 4K, 8K); none seems to make any noticeable difference (the L1000 has 4K blocks). You would think that something that makes the system 16 times slower would stick out, but I can't see it :(

Regards,

Tony.
No man is an isthmus
Tony Horton
Frequent Advisor

Re: Extremely poor disk throughput

Hi,

It's actually a Progress DB, but yes, the buffer cache is sitting on 0% hits. I don't think it is using it at all, as glance shows 0 logical FS writes. The first build I tried (the one that took over 8 hours) was with max buffers set at 50% (i.e. 2GB), so I don't think it's the size of the buffer cache.
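For anyone playing along, the hit rates are easy to watch with sar's buffer activity report; the %rcache and %wcache columns are the buffer cache read and write hit rates:

    sar -b 5 5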

The disk is showing about 97% utilisation but is only doing 1.3MB/second and 167 I/Os per second. The first step of the process, which formats the 2 x 2GB db extents, takes roughly the same time on both the L1000 and the L1500 (about 3 minutes for each 2GB extent).

About the only thing I can think of is that under 11.00 the bulkload utility is using the OS buffer cache and under 11i it isn't. Even then, I would have thought I should get better than 1.3MB/sec out of a relatively modern SCSI drive that is basically doing sequential writes!!!

I think I should go home and see whether anything pops into mind over the weekend :)

I tried vxtunefs on the 11.00 box but it is an unknown command. I do vaguely remember playing around with filesystem parameters when I set up the L1000, but I didn't find anything that made a significant difference.

I'll try setting the PE size back to 4MB (there are only two disks in this VG anyway), but I'm going home now, it's nearly 8:00PM!

Regards,

Tony.
No man is an isthmus
Bruno Ganino
Honored Contributor

Re: Extremely poor disk throughput

Tony, try checking memory use with "vmstat".
If the "po" (page-out) column is non-zero, the system is paging; if it is constantly paging, it is probably a RAM problem.
P.S.
For a simpler display, use "vmstat -n".
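For example, to sample every 5 seconds, five times, and watch the po column:

    vmstat -n 5 5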
Bye
Bruno
Torino (Turin) +2H
Tony Horton
Frequent Advisor

Re: Extremely poor disk throughput

Hi Bruno,

The machine is only hitting around 27% memory utilisation, no page outs at all.

I did a complete build of the database overnight and compared results to the exact same load on the L1000 last Sunday.

L1500 Bulkload 5 hours 54 minutes, index build 4 hours 48 minutes. (faster than 1st load, probably due to no mirror)

L1000 Bulkload 33 minutes, index build 2 hours 44 minutes.

I'm currently doing a database build on a faster (Quantum Atlas 10K IV 73GB) disk. The formatting of the database extents took 40 seconds each compared to 3 minutes, but the bulkload is still taking forever (although it does appear to be managing a whopping 1.6MB/s compared to 1.3MB/sec on the other disk).

I also ran a database build on the L1000 (test data, but still comparable) to see what was happening I/O-wise. On the L1000 there are no logical IOs either, but interestingly the CPU utilisation is about 60% with 270 IOs/sec, while the L1500 with the faster disk is doing 3.6% CPU utilisation with about 188 IOs/sec.

The CPU difference is quite odd; I wouldn't have expected a 15x drop in CPU usage between a 440 and an 875.

The really weird thing is that the load on the L1000 didn't seem to be doing much higher MB/s rates than the L1500, maybe half as much again, yet it is significantly faster.

The only other difference I have remembered is that the L1000 has timeslice set at 4 and the L1500 is set at 10 (as per the performance tuning white paper's recommendation), but surely that wouldn't be having an effect, especially when the only process running on the system is the bulkload... if anything it should make it better.

My next experiment will be to change the DB block size to 4KB (it isn't that easy with the way our scripts work); it defaults to 1KB (and is that on both machines). Maybe 11i is really inefficient with small write sizes compared to 11.0?

Regards,

Tony.
No man is an isthmus
Bill Hassell
Honored Contributor

Re: Extremely poor disk throughput

RE: timeslice=4, is bulkload truly a single process, or does it have threads or other support processes running in parallel? As every sysadmin has found, timeslice=1 is a performance disaster; the reason is that the opsystem spends massive amounts of time needlessly context switching. I would change timeslice to 10 and see if both performance and % buffer cache usage go up.
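You can query the current value without touching the kernel; on 11i, kmtune should show it:

    kmtune -q timeslice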


Bill Hassell, sysadmin
Tony Horton
Frequent Advisor

Re: Extremely poor disk throughput

Hi Bill,

The timeslice=4 is on the "faster" (i.e. L1000) machine :) It's set to 10 on the machine that is having the performance problem. The bulk load is definitely a single process, no threads or support processes.

I've been playing with vxtunefs as suggested by Jean-Louis, and it is the most promising so far. Not there yet, but definitely improving :)

The one parameter which seems to make a BIG difference is changing max_buf_data_size from 8K to 64K: disk throughput in glance improves from 1.3MB/second to between 10 and 15MB/sec. I really don't know what this bulkload is doing, as the total database size is 4GB, roughly half of which is indexes, so at 10MB/s you'd think it would have loaded 2GB of data in a matter of minutes. Got me beat. Anyway, I'll keep plugging away with the parameters; I may try a combo of max_buf_data_size and reducing discovered_direct_iosz to 128K for my next test. Changing discovered_direct_iosz by itself didn't seem to make any difference.
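For the record, the combo I'm planning is roughly this (the mount point is just a placeholder for my db filesystem):

    vxtunefs -o max_buf_data_size=65536 /db
    vxtunefs -o discovered_direct_iosz=131072 /db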

Does anyone know how to find out the default settings of these VxFS params in 11.00, since it doesn't have vxtunefs? If I can find out all the defaults and try them, I might be onto a winner :)

Thanks for the help so far guys, things are definitely looking better :)

Regards,

Tony.
No man is an isthmus
Tony Horton
Frequent Advisor

Re: Extremely poor disk throughput

Well, I killed the last test at 4 hours, as I found something in Progress's knowledge base. I had previously looked there to see if there were any required patches and they listed none for HP-UX 11i. However, when I searched on bulk load they suggested installing:

Symptoms:
PHKL_28512:
( SR:8606271490 CR:JAGae35697 )
Posix Asynchronous I/O in JFS 3.3 uses Direct I/O
in place of Buffered I/O. Applications that rely
on data buffering may suffer performance degradation.


I've just installed this and the system is flying through the bulkload!!! Hooray!!!! Funny that HP hasn't included this in the June 2003 GoldQPK patch bundle. I haven't got the final timings yet, but I know it's going to be a lot faster; I think it's already past the point that took four hours to get to, and it's only been about 20 minutes.
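The install itself was just the standard patch procedure; something like this, with the depot path from memory (it's a kernel patch, so it rebuilds the kernel and reboots):

    swinstall -x autoreboot=true -s /tmp/PHKL_28512.depot PHKL_28512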

It seems my suspicions about it not using the buffer cache were correct :)

Once it finishes I'll still have a bit of a play with the vxtunefs options, and I also discovered that my 11.0 filesystem has a logsize of 1024 blocks and my 11i one only 512, so I'll experiment with that too.
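(I believe logsize is set at filesystem creation, so that experiment means a re-mkfs; roughly this, with a hypothetical device name:

    mkfs -F vxfs -o logsize=1024 /dev/vg01/rdblvol

)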

Well, there you go. While I have been writing this, the bulkload finished. It took 23 minutes, a bit of an improvement on 8 hours, and better than the L1000's 33 minutes. I think I have a problem-solved situation :)

Thanks for the suggestions guys.

Regards,

Tony.
No man is an isthmus
Tim D Fulford
Honored Contributor

Re: Extremely poor disk throughput

Hi

1 - How are you loading/unloading the database: tape, network, etc.? Are you SURE that this is not the bottleneck?

2 - It seems to me that c2t2d0 is MUCH slower and busier than its counterpart c1t2d0. My guess is that one disk is slowing the other down.

Regards

Tim
-
Tim D Fulford
Honored Contributor

Re: Extremely poor disk throughput

Nope, I think I'm wrong, they both have similar performance characteristics

http://www.hgst.com/hdd/ultra/ul18lzx.htm

http://www.seagate.com/support/disc/specs/scsi/st318203lc.html

Tim
-
Bruno Ganino
Honored Contributor

Re: Extremely poor disk throughput

I do not know if this is good counsel, but look at the dnlc_hash_locks and ncsize tunable kernel parameters.
Perhaps they have no effect, but in the documentation I read:

dnlc_hash_locks - number of locks for the Directory Name Lookup Cache

The minimum value allowed is 16. The maximum value allowed is 8192.
The value is further constrained in that it must be a power of 2, and it must be equal to or less than one eighth the number of DNLC entries (ncsize >= 8 * dnlc_hash_locks).

ncsize - number of Directory Name Lookup Cache entries

The minimum value allowed is 128. The maximum value allowed is memory limited.
The value is further constrained in that it must be equal to or greater than eight times the value of the number of locks for the DNLC (ncsize >= 8 * dnlc_hash_locks).
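You can query the current values with kmtune:

    kmtune -q ncsize
    kmtune -q dnlc_hash_locks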

HTH
Bruno
Torino (Turin) +2H
Tony Horton
Frequent Advisor

Re: Extremely poor disk throughput

Tim,

I had noticed that; at some stages in glance the Seagate was showing roughly 97% utilisation and the IBM about 45% (this remained fairly constant).

However, after loading the above patch everything is working MUCH better and it's faster than the L1000, so I'm quite happy :). I may still do some more tweaking to see if I can squeeze any more performance out of the disks, but the show-stopper problem is resolved.

Regards,

Tony.
No man is an isthmus
Tony Horton
Frequent Advisor

Re: Extremely poor disk throughput

Hi Bruno,

I can't find any more info on dnlc_hash_locks, so I don't think I'll mess with it (it's at the default of 512), especially now that the problem is solved.

Thanks,

Tony.
No man is an isthmus
Tim D Fulford
Honored Contributor

Re: Extremely poor disk throughput

Tony, just re-read some of the above..

> The disk is showing about 97% utilisation
> but is only doing 1.3MB/second and 167
> I/O per second.

167 IO/s is about 6ms service time.
1.3 MB/s with 167 IO/s is 8kB per IO

It looks like the head is doing a full seek for each block or IO written. The disks are about 8ms for a full seek and 0.5ms for a track-to-track seek! So I guess your patch fixed that, if you got 15MB/s out of it.

Regards

Tim
-
Tim D Fulford
Honored Contributor

Re: Extremely poor disk throughput

Tony, let me try again... my previous reply was written in a rush and probably makes little sense.

What I mean is that the patch you applied must have had the effect of allowing the head to remain on the current track (I guess buffered IOs allow that and direct IOs don't!). Thus, whereas previously the head seemed to be doing something like a full seek per IO, now it is just waiting for the platter to spin to the correct segment (rotational latency). The latency of both disks is 3ms (10krpm), which means you should get 333 IO/s. In reality you will get less than this, as the head will need to move track-to-track (0.5ms), so 300 would seem to be a conservative estimate.

300 IO/s at 8kB each ==> 2.3 MB/s
4GB at 2.3 MB/s gives just under 30 minutes.

This does NOTHING to help you with your problem, but it just shows that sometimes the manufacturers' info sheets help out.

Regards

Tim
-
Tony Horton
Frequent Advisor

Re: Extremely poor disk throughput

Thanks Tim,

That seems to make quite a lot of sense. It also suggests that the figures shown in glance for the current MB/s are somewhat off with the fairies :-)... I couldn't work out why, if glance was averaging 5-10MB/s, it was taking so long!!!!

It also leads me to believe that Progress's bulkload utility is somewhat less than optimal as far as IO size is concerned :-)

BTW, the final result with a 4KB block size on the filesystem and all other things at defaults was 2 hours 9 minutes for the complete bulkload and index build, a somewhat massive improvement over 10 hours 45 minutes... and a respectable improvement over the L1000.

Regards,

Tony.
No man is an isthmus