Operating System - HP-UX

Re: scsi queue depth and hpux

 
SOLVED
Hein van den Heuvel
Honored Contributor
Solution

Re: scsi queue depth and hpux



>> But I hate it when it timeouts on you after you have typed a loooong never ending response to the posts . And have to type the whole thing again out of memory.

1) Before hitting enter... ^A, ^C to "select all" and "copy".
2) The "back" button tends to bring the typed text back as well
3) Always clone a second window to check on a long delay. Often the reply is in fact there.
4) Best advice, though one I fail to honor myself most of the time, is to type into an alternative tool (Gmail, Word, ... with a spellchecker) and occasionally paste into the reply window as a backup and for the final submit.

Hein.
Vadsys
New Member

Re: scsi queue depth and hpux

Thanks a lot for all the comments made to my OP.

We went through a lot of changes since I first knocked on ITRC's doors.

There was not "one" root problem causing the issue we had, but rather several issues that were overlooked or otherwise not addressed.

We completely segregated the storage for this database on the XP SAN from all other services.

Problems that had accrued and caused the issue:
1. Originally the storage was shared with Windows, Citrix and everything else.
2. RAID-5 on the XP SAN
3. A single 2Gb HBA for the server
4. Port contention at the XP SAN level (3 databases sharing 2 ports, one of which was always 88% utilized)
5. No load balancing on HP-UX, just failover for the LUNs added (see the sketch after this list)
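In case it helps anyone else, this is roughly how the failover-only setup shows up from the host; the device file and volume group names below are examples, not our real ones:

# list every physical path (primary and alternate HBA) to each LUN
ioscan -fnC disk

# LVM shows the standby path as an "Alternate Link"; plain PVLinks only
# fail over to it, they do not balance I/O across both paths
vgdisplay -v /dev/vg_oradata | grep -E "PV Name|Alternate"

# an alternate link is created by extending the VG with the second path
# to a PV that is already a member, for example:
vgextend /dev/vg_oradata /dev/dsk/c9t0d1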

First off, we took the pain, and I mean real pain, to allocate new disk space for the DB in question on the current XP SAN, move all the data off the original disks, and format the newly allocated space as RAID 0+1.

Secondly, we stopped sharing ports on the SAN fabric, so the other databases now go through one port of their own.

These steps helped us improve our performance by about 70% over the previous time!

These are steps that should have been done right in the first place, imho, but it is never too late to do them.

We still saw some contention at the HBA level, however, and there are plans to get another downtime to add more HBA(s) to the system. There is also some kernel-level tuning that should be (and eventually will be) done to get more bang for the buck, specifically related to vxfsd, biod etc. -- you get the gist.
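To give an idea of the kind of tuning I mean (OnlineJFS assumed; the mount point and tunable below are just examples, not our final values):

# current per-filesystem VxFS tunables (read_pref_io, read_nahead, ...)
vxtunefs -p /oradata

# query a kernel tunable before changing it, e.g. the VxFS inode cache
kmtune -q vx_ninode

# the number of NFS client biod daemons is set in the rc config file
grep NUM_NFSIOD /etc/rc.config.d/nfsconf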

BTW, I cannot log in with my old handle/account, so I had to create a new one, and I am afraid I may not be able to close the thread.

Thank you for taking interest in my problem and providing valuable inputs!

Regards,
-S
Jeff N. Graham
New Member

Re: scsi queue depth and hpux

Scary. I'm having a very similar situation. We are running:
HP 9000 rp7420 (8 PA-RISC CPUs, 16 GB RAM, 2 separate A6826A HBA cards (4 HBA ports total))
HP-UX 11i v1
Oracle 9i RAC (2 nodes/same hardware)
IBM SAN (8x8 array of disks)

We have an ETL process that must update 5 million rows in a 1.5 billion row table (188 GB) - i.e. lots of I/O. The ETL is well tuned and the table is partitioned fairly well. There is always room for improvement, but statspack shows our SQL to be efficient; by far most of our time is spent on db file sequential read (88% of the time - hardly any time on anything else).
                                                                   Avg
                                                     Total Wait   wait    Waits
Event                               Waits   Timeouts   Time (s)   (ms)     /txn
---------------------------- ------------ ---------- ---------- ------ --------
db file sequential read         3,092,363          0     28,319      9  4,507.8

Tablespace
------------------------------
                 Av      Av     Av                    Av        Buffer Av Buf
         Reads Reads/s Rd(ms) Blks/Rd       Writes Writes/s      Waits Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
DATA_PRICING_02
       555,304     182    9.2     1.0      286,131       94          0    0.0
DATA_PRICING_01
       483,639     159    8.9     1.0      268,175       88          1    0.0
DATA_PRICING_03
       405,730     133    8.7     1.0      282,873       93     56,048   12.2
DATA_PRICING_04
       306,559     101    8.7     1.0      141,780       47          2    0.0
INDEX_PRICING_02
       433,181     142   10.4     1.0           51        0          0    0.0
INDEX_PRICING_01
       409,071     134   10.2     1.0          367        0          1  180.0
INDEX_PRICING_04
       285,588      94   10.0     1.0           27        0     88,147    6.9
INDEX_PRICING_03
       176,245      58   10.1     1.0        1,009        0          1    0.0


We originally had only 1 HBA per node connected. We saw at least a 50% improvement when we connected fibre to the 2nd HBA card and installed multipathing (not to mention the much-needed high availability!).

We are still having issues and processes are running behind. The storage team shows the disks peaking around 30-40% and the switches barely reaching 8% utilization.
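One host-side cross-check I am keeping an eye on while the storage team watches the array (the interval and count are arbitrary):

# per-device view: a high avque with avwait well above avserv suggests
# the host/HBA side is queueing even though the array itself looks idle
sar -d 5 12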

Included are the usual sar/vmstat/Glance metrics.

Highlights:
sar -Muq 5 50

HP-UX dubhst05 B.11.11 U 9000/800 09/09/08

16:02:54     cpu    %usr    %sys    %wio   %idle
16:04:19       0      17      10      72       2
               1      12       6      77       5
               2      12       6      80       2
               3      14       4      79       3
               4      11       7      69      12
               5      13       8      77       2
               6      10       8      76       6
               7      11       8      79       2
          system      12       7      76       4

             cpu  runq-sz  %runocc  swpq-sz  %swpocc
               0      0.0        0
               1      0.0        0
               2      0.0        0
               3      0.0        0
               4      1.0       20
               5      1.0       20
               6      0.0        0
               7      1.0       20
          system      1.0        8      0.0        0


sar -b 5 50

HP-UX dubhst05 B.11.11 U 9000/800 09/09/08

16:02:54 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
16:03:44 0 23 100 3 14 81 1 1
16:03:49 0 978 100 3 65 95 3 4
16:03:54 0 362 100 2 26 94 3174 3
16:03:59 0 14 100 2 10 81 5015 6
16:04:04 0 48 100 2 19 90 5143 812
16:04:09 0 170 100 3 28 89 3894 215
16:04:14 0 32 100 7 19 63 2686 511

vmstat 5 50
procs memory page faults cpu
r b w avm free re at pi po fr de sr in sy cs us sy id
2 0 0 1245180 4578774 1144 0 0 0 0 0 0 3935 11206 770 3 4 93
2 0 0 1245180 4560624 496 6 0 0 0 0 0 16749 51650 6168 24 7 68
3 24 0 1313511 4560368 163 0 0 0 0 0 0 31123 98335 11080 41 13 46
3 24 0 1313511 4559388 80 0 0 0 0 0 0 41878 132330 14238 48 17 35
3 21 0 1241939 4563921 76 9 0 0 0 0 0 37473 117464 13265 31 13 57
3 21 0 1241939 4566861 25 2 0 0 0 0 0 27940 83425 10735 15 9 76
2 19 0 1208195 4566749 17 0 0 0 0 0 0 24451 69596 10150 13 8 79
2 19 0 1208195 4566748 12 6 0 0 0 0 0 20947 57637 9343 14 8 78
6 21 0 1221251 4566748 4 0 0 0 0 0 0 23511 63185 10861 16 9 75


We have experimented with the SCSI queue depth and found about a 5% improvement going to 32.
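For reference, roughly what we touched, with an illustrative device file (note that the per-device scsictl change does not survive a reboot, so it has to be re-applied or scripted at startup):

# global default queue depth (a dynamic kernel tunable on 11i v1)
kmtune -q scsi_max_qdepth

# per-LUN: show current settings, then raise the depth on one device
# and watch sar -d / statspack before rolling it out everywhere
scsictl -a /dev/rdsk/c9t0d1
scsictl -m queue_depth=32 /dev/rdsk/c9t0d1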

I would like opinions on the following, and on the order of importance (i.e. which should we try first?):

1. Oracle's documentation suggests a system is I/O bound if sar %wio is consistently above 20 (page 69): http://download.oracle.com/docs/pdf/A97297_01.pdf
Is that true?

2. The SAN supports many other servers. Even though they are not 100% utilized, would segregation still be helpful?

3. Would there be a benefit to bringing the other HBA ports online? Only 1 port on each A6826A card is being used; if we bring up the other 2, that would give us 4 ports per node (see the commands sketched after this list). Oracle 10g experts say that as a rule of thumb a well-balanced system will have 1 HBA per CPU (pp. 18-20): http://www.oracle.com/global/kr/download/seminar/2008/odd/0805/current_trends_in_database_performance_final.pdf Does that mean we should have 8 per node?

4. Is there benefit in presenting more LUNs? We have 4 partitions in the table going to 4 tablespaces going to 4 LUNs. Is there value in spreading that out? For the large 1.5B-row table we have 4 LUNs totalling 300 GB. (A sketch of what I have in mind for #3 and #4 follows.)
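For #3 and #4, this is the sort of thing I have in mind; the device files, volume group and sizes below are purely illustrative:

# which FC ports are present and online
# (the A6826A cards use the td driver, if I remember right)
ioscan -fnC fc
fcmsutil /dev/td0        # link state / speed of the port we use today
fcmsutil /dev/td1        # the unused second port on the same card

# one way to spread a tablespace across all 4 LUNs from the LVM side:
# a striped logical volume over the four PVs (stripe size in KB, size in MB)
lvcreate -i 4 -I 64 -L 76800 -n lv_pricing_01 /dev/vg_pricing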