scsi queue depth and hpux
07-09-2008 01:11 PM
We recently moved our storage from NetApp to an XP SAN (10K). We are facing some I/O issues, possibly related to large LUNs and the OS tunable kernel parameters scsi_max_qdepth and scsi_maxphys.
We are seeing a high number of blocked processes with vmstat (vmstat 1). The disks have a very low average queue (sar -d 1 3) and an acceptable response time (15 ms on average). Most LUNs are 100% busy, though, which could spell trouble.
Reading HP's forums and manuals, I found that we can tune the kernel parameter scsi_max_qdepth, which defaults to 8. I changed it to 128 to see if we could relieve at least some of the blocked processes.
However, we have not seen any change in the number of blocked processes at all.
Would you be able to comment on this problem?
Thanks,
-S
References:
http://publib.boulder.ibm.com/infocenter/dsichelp/ds8000ic/index.jsp?topic=/com.ibm.storage.ssic.help.doc/f2c_aghpfqudep_laxtvw.html
http://forums12.itrc.hp.com/service/forums/questionanswer.do?admit=109447627+1215615388033+28353475&threadId=634542
http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1057014
http://docs.hp.com/en/B3921-60631/scsi_max_qdepth.5.html
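For anyone following along, the commands involved look roughly like this on HP-UX (a sketch: the device file name is hypothetical, and whether you use kmtune or kctune depends on the 11i release):

```shell
# Global default queue depth (all devices without a per-device override):
kmtune -q scsi_max_qdepth          # 11i v1; on 11i v2/v3 use: kctune scsi_max_qdepth
kmtune -s scsi_max_qdepth=128      # dynamic tunable, takes effect without a reboot

# Per-device override for one hot LUN (example device file):
scsictl -m queue_depth /dev/rdsk/c4t0d1       # display the current depth
scsictl -m queue_depth=64 /dev/rdsk/c4t0d1    # set it for this device only
```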
07-09-2008 01:28 PM
Re: scsi queue depth and hpux
The vmstat "b" column does not necessarily mean disk: it counts processes blocked on any resource (paging, memory I/O, disk, semaphores, etc.), not just disk I/O.
What do your disk queues look like ?
How many HBAs ?
Bottle neck in the SAN ?
What does the array performance look like from the array's perspective ? If there is a performance issue on the array you can tune the OS until you are blue and never get anywhere.
07-09-2008 02:09 PM
Re: scsi queue depth and hpux
I don't - on my 10K XPs I use the "lotsa luns" theory. Reason: with one large LUN, you've got only one OS structure (the SCSI queue) to push all that I/O down. When you use "lotsa luns", you've got lots of SCSI structures to push data down, and with that I rarely have to increase the SCSI queue depth.
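A minimal sketch of what that layout looks like with HP-UX LVM, assuming a volume group already built from many small LUNs (the VG/LV names, sizes, and stripe count here are hypothetical):

```shell
# Build one logical volume striped across 8 physical volumes (LUNs),
# so 8 SCSI queues share the I/O instead of one.
# -i = number of stripes (PVs), -I = stripe size in KB, -L = size in MB
lvcreate -i 8 -I 1024 -L 102400 -n lv_data /dev/vg_xp
```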
However, before you begin to think about re-laying that stuff out: what about the application you're running - can it cache more disk I/O for you? What about the OS buffers - what are they set at? If they are low, it may be feasible to raise them. Also, what about the layout of the disks in the XP - is it all RAID 5? Some applications don't do so well with R5 for performance. What about the cache in the XP itself? Do you just have the default amount? You can install lots and lots of cache in your XP, and I generally recommend that sysadmins put as much in there as possible. Why? Well, if you didn't think what you were doing was I/O intensive (and therefore sensitive), you wouldn't have bought an XP storage server. Following that line of thought: if you didn't need throughput, you could have gone cheaper.
So I think you need to think about *WHY* you're using up so much I/O, and what you can do to reduce it, before you decide that what you've laid out isn't good enough.
All of that being said, 10K XPs are nothing less than awesome in most any layout, so I'm afraid that messing with it may not yield much gain; I already feel you're at the point of diminishing returns for incremental configuration changes. So I'd see if I could reduce I/O through tuning and buffering before I'd go tearing apart the storage configuration. And lastly, did you get any help from HP in laying out the disks in that server at least somewhat optimally for that style of hardware, or did you cut it up on your own? I'm assuming you've already had some best practices applied in the layout of the storage server.
07-09-2008 02:44 PM
Re: scsi queue depth and hpux
To check on process wait states system-wide, use UNIX95 and the -o argument for 'ps'. For example:
UNIX95= ps -ef -o etime,pcpu,vsz,rsz,pid,ppid,comm,state | sort -rn | head
Note state, etime, pcpu, and vsz especially. Also, move the arguments around, since you're sorting on the first column only.
Paste in anything you find interesting, i.e. the highest consumers. What I've seen in the past is that one or two processes will appear at the top of the highest consumers in several areas.
07-09-2008 03:52 PM
Re: scsi queue depth and hpux
We have a single HBA in our server. (fc 1 0/0/4/1/1 fcd CLAIMED INTERFACE HP 2Gb Dual Port PCI/PCI-X Fibre Channel Adapter (Port 2))
The disk queues are only about 0.5 to 1, but on some LUNs they can go as high as 10-15 during peak activity. Like I mentioned in my previous post, the utilization for 5 of these LUNs (datafiles/redo logs/undo and temp) is about 100% all the time. Something seems fishy here, doesn't it?
There are about 4 large LUNs on this server: 2x600GB and 2x1TB. The rest are 100 GB LUNs.
@Two Proc
Originally the plan was to get all 100 GB LUNs and use them; however, things did not go according to plan, and we are stuck with what we have at this time. The server in question is running an Oracle 9i database.
Our storage engineer carved the LUNs himself. I am sure hoping he followed all the best practices for doing that. I think we have 4GB of cache on one controller, and this is a 2-controller XP 10K. So I doubt that is the issue; however, I am not saying it cannot be.
@Michael
I am not sure I follow you when you say "use UNIX95". See, I am a new inductee into the HP-UX world and consequently may not understand a whole lot of the jargon. I will research it on the web, though. Also, the ps on my system does not have a "-o" switch, and the man page doesn't mention it either. Am I missing something?
Thanks for your inputs. BTW, did I mention this forum website is really cool? But I hate it when it times out on you after you have typed a loooong, never-ending response to the posts, and you have to type the whole thing again from memory.
-S
07-09-2008 04:06 PM
Re: scsi queue depth and hpux
You use UNIX95, which is an environment variable, to activate the -o switch.
Trust me Luke
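To illustrate the idea: setting UNIX95 with an empty value only for the duration of the one command enables the XPG4 (UNIX95) personality of ps, which accepts -o, without changing the rest of your shell session.

```shell
# UNIX95= (empty, on the same line) applies only to this command.
# -o selects the output columns; sort -rn orders by the first column (vsz here).
UNIX95= ps -ef -o vsz,pid,comm | sort -rn | head -5
```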
07-11-2008 01:52 PM
Re: scsi queue depth and hpux
1) Single-pathed server (multiple paths with load-balancing software may relieve the stress, or at least move it somewhere more visible).
2) Small number of large LUNs (large LUNs may impede: one fs stack, one lvol stack, one device stack - lots of 1s here). Too many small LUNs are a nightmare as well; aim for a happy medium.
3) Disk queues (caused by the single path, or by disk performance at the array?)
Everything together may each be adding a little bit to your issue.
Any stats from the array to report (sorry if I missed them)?
Are these reads or writes, or what percentage of each? A large number of writes filling the 2GB array cache may effectively disable the cache (need to review array stats to be sure).
Other options:
If you cannot make the hardware faster, then reduce its load by tuning the application. Maybe this is already done?
As you can see, there is an open road ahead that takes a lot of monitor, tweak, monitor.
07-14-2008 01:12 PM
Re: scsi queue depth and hpux
Are these LUNs in R0/1 or in R5? If any of this is R5, you're probably getting hurt big time. On the data areas you may be hitting disk hard because of sorts going to disk. Are you seeing lots of sorts on disk? (Ask the DBAs.) If so, you've probably got some statements out there that need tuning; get the DBA team to run statspack to start getting some tuning targets. You could also use Enterprise Manager Console for an easy way to look at the top-running SQL or the top-running sessions on the box. All of this will give you an idea of which processes are taxing the server.
For hot disks, try moving THOSE to R0/1 instead of R5. You'd see evidence of this if your cache hit ratio is low, OR your I/Os per second across the database are high. If this is the case, once again it is time to start tuning the worst-offending statements in the database. You'd probably also do well with more cache in the database buffer cache. However, review the tuning targets thoroughly FIRST before making this step.
07-14-2008 04:50 PM
Re: scsi queue depth and hpux
vgdisplay -v | grep -i -e 'PV Name' -e 'PV Status'
And also:
strings /etc/lvmtab
Thanks!
07-14-2008 06:45 PM
Re: scsi queue depth and hpux
That's just a horrible response time.
A random seek on a single disk very roughly equates to the rotation time. For a 10K rpm drive with no cache, that is roughly 1000*60/10000 = 6 ms, and for 15K rpm it would be 1000 (milli) * 60 (seconds) / 15000 (rpm) = 4 ms.
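That rotation-time arithmetic is easy to sanity-check in the shell (integer math is close enough here):

```shell
# One full revolution in milliseconds: 1000 ms * 60 s / rpm
rot_ms() { echo $(( 1000 * 60 / $1 )); }
rot_ms 10000   # -> 6  (ms per revolution at 10K rpm)
rot_ms 15000   # -> 4  (ms per revolution at 15K rpm)
```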
>> The server in question is running an Oracle database 9i.
So what does Oracle tell you about the reason for the blocked processes? Maybe they are waiting for the client processes to ask for more work (SQL*Net waits)? They would show as blocked. Maybe they are waiting for transactions to commit (log file sync)? Those processes would be blocked too.
For now, _please_ forget about the system management tools, other than IO/sec and MB/sec.
First and foremost, ask Oracle: what are its top wait events, and what I/O response time is it seeing (reads? writes?)?
Later, you may go back to the system tools to help understand, explain and correct what Oracle is reporting.
>> possibly related to large LUNs
Small vs. large LUNs is interesting... eventually.
That's likely a tweak (less than 10% impact). I suspect there are bigger issues.
>> We have a single HBA in our server.
Now that would worry me a lot from a performance perspective, and even more so from a sales/design perspective. Who made that decision, based on what? You may need a hose, but you have been given a straw to empty a keg!
A 2gb HBA can do 250 MB/sec on a good day.
Just 10 busy discs can deliver that.
You have 200+ disks right? 10x more!
The Data Bandwidth for an XP1000 is 8500 MB/sec... 40x more than a 2gb fibre.
250MB/sec @ 8KB/IO = max 30,000 IO/sec (much less with protocol overhead).
XP1000 cache performance is rated @ 700,000 IO/sec... 20x more.
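The bandwidth-to-IOPS arithmetic above can be checked with one line of shell:

```shell
# Ceiling on IO/sec for a 250 MB/sec HBA at 8 KB per IO (no protocol overhead):
echo $(( 250 * 1024 / 8 ))   # -> 32000, the "max 30,000 IO/sec" ballpark above
```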
Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting