Slow I/O performance

 
meno
Advisor

Slow I/O performance

Hello,

I have an rp4440 box.
It has a PCI RAID 6402 controller
connected to two MSA30-SB enclosures filled with 27 146GB 15k disks.
I have created one logical drive with RAID 1+0 across 26 disks (1 disk is a spare).
On it I have several standard VxFS filesystems.
Two weeks ago one battery on the RAID controller failed. After the battery was replaced, I/O became slower.
When I copy a big file (an Oracle datafile) from one filesystem to another (both are on the logical disk described above), the transfer rate for this disk is about 50000 blks/s (sar) and the disk is 99-100% busy.
But I have another identical box, and the same copy there runs at about 170000-180000 blks/s.
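
To put those sar figures in more familiar units, here is a minimal sketch that converts blks/s to MiB/s. It assumes the blocks reported by sar -d are 512 bytes each (check sar(1M) on your release); the function name blks_to_mib is only illustrative.

```shell
# Convert a sar "blks/s" figure to MiB/s.
# Assumption: one block is 512 bytes.
blks_to_mib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b * 512 / 1048576 }'
}

blks_to_mib 50000     # slow box, prints 24.4
blks_to_mib 170000    # identical box, prints 83.0
```

Under that assumption the slow box moves roughly 24 MiB/s against roughly 83 MiB/s on the other one.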

What can be the problem?

The controller cache is OK: 50% read/50% write.
glance shows the wait reason CACHE for this cp process.

How can I find the reason for this slow I/O?

Thanks.

Marian
OFC_EDM
Respected Contributor

Re: Slow I/O performance

How much memory do you have?
How much swap do you have?

What are the values of
dbc_max_pct dbc_min_pct

Run glance (press m) and post the Memory report. It will show how much cache you have available.
The Devil is in the detail.
meno
Advisor

Re: Slow I/O performance

Hi,

I don't think there is a problem with swap or memory. There is 24 GB of memory and 20 GB of swap,
and there is no swapping: the sar values swpin/s, bswin/s, swpot/s and bswot/s are all 0.

$ swapinfo -tam
             Mb      Mb      Mb   PCT  START/      Mb
TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev       20480       0   20480    0%       0       -    1  /dev/vg00/lvol2
reserve       -   20479  -20479
memory    19670   12855    6815   65%
total     40150   33334    6816   83%       -       0    -
OFC_EDM
Respected Contributor

Re: Slow I/O performance

Are the VG's/Lvols on the "identical" box the same as the one having issues? I'm thinking of PE size.

Any errors on the controllers?

Is this SAN connected or is the transfer going across network?
Peter Nikitka
Honored Contributor

Re: Slow I/O performance

Hi,

I know nothing about this particular storage device, but many others have these modes:
"write_cache": report the write operation as successful as soon as it has reached the cache
OR
"write_through": bypass the cache and wait until the data has been written to the real disk

RAID firmware often switches to write-through when a single point of failure is detected, such as
- no redundant power supply found
- battery low

The terms used in your RAID may differ.
A switch to something like "write_through" results in a performance degradation nonetheless.

mfG Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
Tim Nelson
Honored Contributor

Re: Slow I/O performance

The Controller cache is OK 50% read/50% write.
glance on this cp process shows wait reason CACHE.


The glance wait reason CACHE refers to the filesystem buffer cache; it has nothing to do with the controller.

What is your filesystem buffer cache set to?
On 11.11: kmtune | grep dbc
On 11.23: kctune | grep dbc

If it is set to the default, then that is probably the issue. Reduce it to 5% or less, depending on the type of environment and the amount of memory.

meno
Advisor

Re: Slow I/O performance

I have
dbc_max_pct 4
dbc_min_pct 2

Can anybody help me?
Where should I start to find the root of this problem?

Marian
OFC_EDM
Respected Contributor

Re: Slow I/O performance

24GB of memory and 20GB of swap?
Is that correct?

I'll put this out there for discussion...

Isn't that too much swap?
I've never used more than 8GB of swap.

Could swap that size affect performance negatively?

whiteknight
Honored Contributor

Re: Slow I/O performance


Marian,

From your swapinfo -tam, your %used is 83%, which is quite high. Maybe you should consider adding more swap space.

Have you installed any performance patches before?

WK
Problem never ends, you must know how to fix it
Torsten.
Acclaimed Contributor

Re: Slow I/O performance

Back to the smartarray - if we assume the device file is /dev/ciss1, can you post

# sautil /dev/ciss1
# saconfig /dev/ciss1

Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

__________________________________________________
No support by private messages. Please ask the forum!

If you feel this was helpful please click the KUDOS! thumb below!   
Hein van den Heuvel
Honored Contributor

Re: Slow I/O performance

Marian wrote >> I have
dbc_max_pct 4
dbc_min_pct 2

On a box with 24GB of main memory this translates into almost 1GB of file system cache. That's probably a fine setting for running Oracle in production, but maybe not optimal for significant file copy jobs.
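
For reference, the arithmetic behind that figure can be sketched as below. The helper name dbc_size_mb is made up for illustration, and it assumes the percentage applies to the full 24GB of physical memory.

```shell
# Buffer cache size in MB for a given memory size (GB) and dbc percentage.
# Assumption: the percentage is taken of total physical memory.
dbc_size_mb() {
    echo $(( $1 * 1024 * $2 / 100 ))
}

dbc_size_mb 24 4    # dbc_max_pct 4, prints 983
dbc_size_mb 24 2    # dbc_min_pct 2, prints 491
```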

Is that the same on the identical box?
It probably is.

It is tempting to look for causes in the OS (settings), but the biggest clue is still that the MSA was bounced and brought back. So that's the most likely root problem cause area. There has to be a difference, somewhere.

Detailed study of the sa tools like Torsten suggest is probably the most productive step to find the difference, or at the very least remove an area of potential concern.

With the test as described, both input and output might be a factor. The test can be split into two parts.
Test read performance with
dd if=some-large-file of=/dev/null bs=1024k
Test write performance with
dd if=/dev/zero of=some-temp-target bs=1024k count=20000
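
If you want a throughput number rather than just an elapsed time, a small wrapper along these lines could help. This is only a sketch: dd_rate is a hypothetical helper, and it assumes your date command supports +%s (epoch seconds), which may not be true on every HP-UX release; if it does not, timing the plain dd commands with timex or time works just as well.

```shell
# Hypothetical helper: run one dd pass and report rough throughput.
# $1 = input file, $2 = output file, $3 = number of 1024k blocks to copy
# Assumption: "date +%s" is available on this system.
dd_rate() {
    start=$(date +%s)
    dd if="$1" of="$2" bs=1024k count="$3" 2>/dev/null || return 1
    end=$(date +%s)
    elapsed=$(( end - start ))
    # Very short runs finish within the same second; avoid dividing by zero.
    [ "$elapsed" -gt 0 ] || elapsed=1
    echo "$(( $3 / elapsed )) MB/s"
}

# Example calls (file names are placeholders, as in the commands above):
# dd_rate some-large-file /dev/null 20000      # read test
# dd_rate /dev/zero some-temp-target 20000     # write test
```

With one-second resolution the result is only meaningful for runs that take at least several seconds, so keep the count large.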

When the MSA was fixed, was HPUX rebooted?
Could the reboot have picked up a latent tuning change?

My first thought was scsi_max_qdepth, but that's dynamic. Still... is it the same (and larger than the default 8!) on both systems? A command like scsictl could also have been used to set queue_depth for a selected device, and that change may have been lost over a reboot (ditto for any tool using the SIOC_SET_LUN_LIMITS ioctl options).


Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting
Rasheed Tamton
Honored Contributor

Re: Slow I/O performance

Hello Marian,

It is better to add more swap space as suggested above. The processes have already reserved (20479) all your device swap size (20480) (reserve line on swapinfo).

If you have rebooted this particular system, is there any chance that your LAN card settings (half/full duplex) have somehow changed on the NIC or the switch? lanadmin -x would give you the info. Did you look closely at dmesg and syslog.log for any hints?

What does your sar -b show (buffer activity; look at the %rcache and %wcache columns)?

Regards,
Rasheed Tamton.
Dennis Handly
Acclaimed Contributor

Re: Slow I/O performance

>Rasheed: It is better to add more swap space as suggested above.

Hmm, I've never heard anyone say, add more swap to fix a performance problem. I've only heard adding more memory.

>O'Kevin: Isn't that too much swap? Could swap that size affect performance negatively?

Only if you were actually doing I/O to it.
meno
Advisor

Re: Slow I/O performance

Torsten,

I've posted output from sautil and saconfig.


Hein,

I tested read and write performance and both were very bad.
read - 40.000-45.000 blks/s
write - 35.000-40.000 blks/s

Second box was:
read - 350.000 blks/s
write - 150.000 blks/s

The box was rebooted after the bad battery was replaced.

I think the problem is with the MSA controller.
What do you think?
Should I shut down the server, power it off and then restart it?

Thank you.

Marian
Peter Nikitka
Honored Contributor

Re: Slow I/O performance

Hi,

check these parameters on the 2nd box:
Disk write cache enabled at spin up............ no
...
Disk write cache enabled in current page....... no
Disk write cache disabled in default page...... yes

I would enable the disk write cache on this array; its UPS function should prevent unwritten cache data from being lost on a power interruption.

mfG Peter
Torsten.
Acclaimed Contributor

Re: Slow I/O performance

Output from MSA1:

---- SCSI DEVICE 1:14 [DISK] -------------------------------------------------

Connector Location............................... external
Channel Number................................... 1
SCSI ID.......................................... 14
Device Type...................................... DISK
Disk Capacity.................................... 146.8 GB
Device Status.................................... OK
Device Vendor ID................................. COMPAQ
Device Product ID................................ BF14689BC5
Device Serial Number............................. DN01P7807U5S0733
Device Firmware Version.......................... HPB1
SCSI Transfer Rate............................... Ultra-320


Output from MSA2:

---- SCSI DEVICE 2:0 [DISK] --------------------------------------------------

Connector Location............................... external
Channel Number................................... 2
SCSI ID.......................................... 0
Device Type...................................... DISK
Disk Capacity.................................... 146.8 GB
Device Status.................................... OK
Device Vendor ID................................. COMPAQ
Device Product ID................................ BF14689BC5
Device Serial Number............................. DN01P7807U660733
Device Firmware Version.......................... HPB1
SCSI Transfer Rate............................... Sync


Transfer rate Ultra-320 vs. Sync?


Maybe the controller log will help.

Running sautil with the get_disk_err_log and get_fw_err_log options can help.

Hope this helps!
Regards
Torsten.

Torsten.
Acclaimed Contributor
Solution

Re: Slow I/O performance

Some more lines to compare:

MSA1:

Physical Disk Flags:
Disk present and operational................... yes
Non-disk device detected....................... no
Wide SCSI transfer enabled..................... yes
Synchronous (Fast/Ultra) transfer enabled...... yes
Narrow disk tray detected...................... no
Wide transfer failed, reverted to narrow....... no
Ultra SCSI transfer enabled.................... yes
Ultra-2 SCSI transfer enabled.................. yes


MSA2:

Physical Disk Flags:
Disk present and operational................... yes
Non-disk device detected....................... no
Wide SCSI transfer enabled..................... yes
Synchronous (Fast/Ultra) transfer enabled...... yes
Narrow disk tray detected...................... no
Wide transfer failed, reverted to narrow....... no
Ultra SCSI transfer enabled.................... no
Ultra-2 SCSI transfer enabled.................. no



So this looks like the second channel is operating slower than the first - this would explain your performance problems.

Not sure what is causing this, but maybe a controller reset could solve this.
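
A side-by-side comparison like the one above can be done mechanically. The following is a rough sketch, assuming the relevant sections of the sautil output have been saved to two text files and that each line has the "Name....... value" form shown above; flag_diff and the file names are illustrative only.

```shell
# Print every flag whose value differs between two saved listings.
# $1 = listing for the first device, $2 = listing for the second device
flag_diff() {
    awk '
        {
            # Split each line on the run of dots between name and value.
            n = split($0, part, /\.\.\.+ */)
            if (n != 2) next
            if (NR == FNR)
                first[part[1]] = part[2]          # remember values from file 1
            else if ((part[1] in first) && first[part[1]] != part[2])
                printf "%s: %s -> %s\n", part[1], first[part[1]], part[2]
        }
    ' "$1" "$2"
}

# Example call with placeholder file names:
# flag_diff msa1_disk.txt msa2_disk.txt
```

For the two flag listings above this would report only the Ultra and Ultra-2 lines, each changing from yes to no.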

Hope this helps!
Regards
Torsten.

meno
Advisor

Re: Slow I/O performance

Torsten,

thank you.

Controller reset has solved this issue.

Marian
Torsten.
Acclaimed Contributor

Re: Slow I/O performance

Good news.

Have fun!

Hope this helps!
Regards
Torsten.
