Storage performance

 
Ivan Ferreira
Honored Contributor

Hi all.

I'm having a problem with storage performance on HP-UX:

uname -a
HP-UX B.11.31 U ia64 0070246111 unlimited-user license

This is a new two-node Oracle cluster. I have just started configuring the disk devices for the OCR and voting disk.

Both nodes can access the same disks. The disks are located on HP EVA4100 storage. Both nodes have two HBAs.

In performance tests, one node has no problems, but the second node performs terribly. For example:

Node 1:

time dd if=/dev/rdisk/ocr of=/dev/null bs=8k count=131072
131072+0 records in
131072+0 records out

real 1m13.79s
user 0m0.09s
sys 0m1.07s

sar output:

device %busy avque r+w/s blks/s avwait avserv

disk8 98.80 0.50 1697 27149 0.00 0.58
disk8 98.40 0.50 1787 28596 0.00 0.55
disk8 91.20 0.50 1296 20742 0.00 0.70
disk8 100.00 0.50 485 7754 0.00 2.19

This node is "normal", if you can call it that.
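
For reference, the dd run above reads 131072 records of 8 KB, which is 1 GiB in total, so the elapsed time converts directly into throughput. A minimal sketch of that arithmetic, assuming a POSIX shell with awk (the function name dd_throughput_mib is only illustrative and is not part of any HP-UX tool):

```shell
#!/bin/sh
# Rough throughput of a dd run: records * block size / elapsed seconds.
# Arguments: record count, block size in bytes, elapsed "real" time in seconds.
dd_throughput_mib() {
    awk -v count="$1" -v bs="$2" -v secs="$3" 'BEGIN {
        # bytes transferred, converted to MiB, divided by elapsed time
        printf "%.2f\n", (count * bs) / (1024 * 1024) / secs
    }'
}

# Node 1 figures from the post: 131072 records of 8k in 1m13.79s (73.79 s).
dd_throughput_mib 131072 8192 73.79
```

For node 1 this works out to roughly 13.9 MiB/s.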

Node 2:

time dd if=/dev/rdisk/ocr of=/dev/null bs=8k count=131072



sar output:

11:22:42 device %busy avque r+w/s blks/s avwait avserv
disk8 80.00 0.50 2 32 0.00 400.96
disk8 100.00 0.50 2 32 0.00 500.50
disk8 99.80 0.50 2 32 0.00 500.00

As you can see, %busy is close to 100% and the service time (avserv) is extremely high: 400 to 500 ms, versus 0.55 to 2.19 ms on node 1.
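
The sar samples can also be checked mechanically. As a rough sketch, assuming the sar -d column layout shown above with avserv (in milliseconds) as the last column, the filter below prints any device row whose average service time exceeds a limit. The 20 ms threshold and the function name flag_slow_disks are arbitrary choices for illustration:

```shell
#!/bin/sh
# Print sar -d rows whose avserv (last column, ms) is above the given limit.
flag_slow_disks() {
    awk -v limit="$1" '
        # Data rows end with: device %busy avque r+w/s blks/s avwait avserv.
        # Columns are counted from the end so an optional leading timestamp is ignored.
        # The header row is skipped because its last field is not numeric.
        NF >= 7 && $NF ~ /^[0-9.]+$/ && $NF + 0 > limit + 0 {
            printf "%s avserv=%s ms r+w/s=%s\n", $(NF - 6), $NF, $(NF - 3)
        }
    '
}

# Sample rows copied from the node 2 output above.
flag_slow_disks 20 <<'EOF'
11:22:42 device %busy avque r+w/s blks/s avwait avserv
disk8 80.00 0.50 2 32 0.00 400.96
disk8 100.00 0.50 2 32 0.00 500.50
disk8 99.80 0.50 2 32 0.00 500.00
EOF
```

On a live system the same filter could be fed from sar directly, for example `sar -d 5 3 | flag_slow_disks 20`. At about 500 ms per request, a device with a single request outstanding can complete only about two requests per second, which is what the r+w/s column shows.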

I don't know what else to check. Node 1 works correctly, so it should not be a storage problem. I have tried with only one HBA enabled on node 2, with the same results. I have tried a new non-shared LUN, with the same results. The HBA seems to be in the correct bus at the correct speed.

Any help would be appreciated.

Hardware information:

Node 1:

Model: ia64 hp server rx6600
Main Memory: 16352 MB
Processors: 8
OS mode: 64 bit

/opt/fcms/bin/fcmsutil /dev/fcd0

Vendor ID is = 0x1077
Device ID is = 0x2422
PCI Sub-system Vendor ID is = 0x103C
PCI Sub-system ID is = 0x12D6
PCI Mode = PCI-X 266 MHz
ISP Code version = 4.0.90
ISP Chip version = 3
Topology = PTTOPT_FABRIC
Link Speed = 2Gb
Local N_Port_id is = 0x010300
Previous N_Port_id is = None
N_Port Node World Wide Name = 0x5001438001724859
N_Port Port World Wide Name = 0x5001438001724858
Switch Port World Wide Name = 0x200300051e35a5de
Switch Node World Wide Name = 0x100000051e35a5de
Driver state = ONLINE
Hardware Path is = 0/3/1/0
Maximum Frame Size = 2048
Driver-Firmware Dump Available = NO
Driver-Firmware Dump Timestamp = N/A
Driver Version = @(#) fcd B.11.31.0709 Jun 11 2007


Node 2:

System Hardware

Model: ia64 hp server rx2660
Main Memory: 16363 MB
Processors: 4
OS mode: 64 bit

/opt/fcms/bin/fcmsutil /dev/fcd0

Vendor ID is = 0x1077
Device ID is = 0x2422
PCI Sub-system Vendor ID is = 0x103C
PCI Sub-system ID is = 0x12D6
PCI Mode = PCI-X 266 MHz
ISP Code version = 4.0.90
ISP Chip version = 3
Topology = PTTOPT_FABRIC
Link Speed = 2Gb
Local N_Port_id is = 0x010400
Previous N_Port_id is = None
N_Port Node World Wide Name = 0x5001438001724791
N_Port Port World Wide Name = 0x5001438001724790
Switch Port World Wide Name = 0x200400051e35a5de
Switch Node World Wide Name = 0x100000051e35a5de
Driver state = ONLINE
Hardware Path is = 0/2/1/0
Maximum Frame Size = 2048
Driver-Firmware Dump Available = NO
Driver-Firmware Dump Timestamp = N/A
Driver Version = @(#) fcd B.11.31.0709 Jun 11 2007

Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
7 REPLIES
Steven E. Protter
Exalted Contributor

Re: Storage performance

Shalom,

Ivan, can you give details on the Oracle cluster: major version and patch level?

You may be looking at a storage problem, but it could also be caused by a missing Oracle patch or by a recently released OS patch that Oracle now requires.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Torsten.
Acclaimed Contributor

Re: Storage performance

SAN problems are always hard to find. First I would check what is different in the system configuration (2 Gb vs. 4 Gb speed settings, driver etc., load balancing policy, ...). I would also suspect connection problems, so check the port error statistics on the switch.

Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

__________________________________________________
No support by private messages. Please ask the forum!

Steven E. Protter
Exalted Contributor

Re: Storage performance

Shalom again,

I agree with Torsten.

Every silly little detail plus SAN storage utilities should be checked.

What kind of SAN is it? EVA? EMC?

I have experience with both of those brands.

SEP

Re: Storage performance

Ivan,

Some things to look at:

1. If you don't already have a copy of evainfo, get one and compare output between nodes for the LUN:

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?swItem=co-53627-1&lang=en&cc=us&idx=0&mode=4&

2. As /dev/disk/ocr is a non-standard disk name, I'm assuming you created it either as a symbolic link to a real disk or with mknod, as a new device file with the same major/minor details as a real disk. Either way, have you repeated the test on the real disk? Have you checked that the link or device file was created the same way on both nodes?

3. Looking at the "real" device special file, do you get the same output on both nodes for:

scsimgr get_info -D /dev/rdisk/diskX

4. Try clearing the stats for the disk and then repeating your test:

scsimgr clear_stat -D /dev/rdisk/diskX
time dd if=/dev/rdisk/diskX of=/dev/null bs=8k count=131072
scsimgr get_stat -D /dev/rdisk/diskX

Any significant differences?
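
The sequence in step 4 could be wrapped so that each node leaves its statistics in a file for later comparison. This is only a sketch under stated assumptions: the function name collect_lun_stats, the RUN switch, and the default output path are hypothetical, and only the scsimgr and dd invocations are taken from the steps above.

```shell
#!/bin/sh
# RUN is a hypothetical dry-run switch: with RUN=echo the commands are
# printed instead of executed; left empty, they run for real.
RUN=${RUN:-}

# Clear the LUN statistics, run the read test, then save the new statistics.
# Arguments: raw device file, optional output file.
collect_lun_stats() {
    dev=$1                                   # e.g. /dev/rdisk/diskX
    out=${2:-/tmp/scsimgr_stat.$(hostname)}  # assumed per-host file name
    $RUN scsimgr clear_stat -D "$dev"
    $RUN dd if="$dev" of=/dev/null bs=8k count=131072
    $RUN scsimgr get_stat -D "$dev" > "$out"
}

# Usage on each node (substitute the real device file):
#   collect_lun_stats /dev/rdisk/diskX
```

Running it on both nodes against the same LUN should give two files that can be compared line by line.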

HTH

Duncan

I am an HPE Employee
Ivan Ferreira
Honored Contributor

Re: Storage performance

Thank you all for your time.

Steven
======
>>> Can you give details on the Oracle Cluster, major version and patch

At this point, Oracle is not even installed; the tests were done with dd.

>>> What kind of SAN is it? EVA? EMC?

As noted above, it is an EVA4100.

Torsten
=======
>>> First I would check what is different in system configuration (2 Gb vs. 4 Gb speed settings, driver etc., load balancing policy, ...). I would also suspect connection problems - check the port error statistics on the switch.

As you can see in the fcmsutil output, the link speed is 2 Gb on both the node 1 and node 2 HBAs. And the service time is too high even for a 1 Gb HBA.

The switch ports report no problems, and I have already tested with different load balancing policies and with only one HBA.

Duncan
======

The disk name was created with mknod because Oracle had problems identifying the device names for the OCR and voting disk when the device name has more than 4 characters. The performance tests were also run against the "original devices", with the same results.

I do get the same output for scsimgr get_info.

I'm not sure that clearing the statistics will help, but I will give it a try when I go to the customer site again.

I did not know about evainfo; I will give it a try.

Thanks to all!

Re: Storage performance

Ivan

I didn't mean that clearing the device statistics would fix the issue, but that it would give you a clean starting point for comparing the output of "scsimgr get_stat" between the 2 nodes after repeating your test (e.g. to see whether one system has more I/O retries or has LUN paths going offline).
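
One way to make that comparison is sketched below. It assumes the saved "scsimgr get_stat" output consists mostly of "name = value" lines, which should be checked against the real output first; the function name diff_counters and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Print every "name = value" line whose value differs between two files.
# Arguments: statistics file from node 1, statistics file from node 2.
diff_counters() {
    awk -F '[ \t]*=[ \t]*' '
        NF != 2   { next }                    # skip headings and blank lines
                  { sub(/^[ \t]+/, "", $1) }  # trim leading whitespace from the name
        FNR == NR { first[$1] = $2; next }    # first file: remember each value
        ($1 in first) && first[$1] != $2 {
            printf "%s: %s -> %s\n", $1, first[$1], $2
        }
    ' "$1" "$2"
}

# Usage, with the statistics saved from each node:
#   diff_counters /tmp/scsimgr_stat.node1 /tmp/scsimgr_stat.node2
```

Counters that only differ by a few I/Os are expected; large differences in retry or error counters on one node would be the interesting ones.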

HTH

Duncan

Ivan Ferreira
Honored Contributor

Re: Storage performance

There was a hardware problem with one of the HBAs. Once replaced, the performance problem was solved.

Thanks to all.