Internal disk so busy it locks the rx2880-i2 HP-UX 11.31 system

Steve Post · ‎02-13-2013

rx2880-i2, HP-UX 11.31

I have a program that reads text into Sybase. The program uses Java. When it runs on the old PA-RISC boxes, it works fine. When it runs on my Itanium box, it hangs the box for 2 minutes.

I see the queue on the internal disk change from 0 to 43,000. I see the average wait to process data to the disk change from 0 to 23,569.85 milliseconds (about 23.6 seconds). I see the internal disks are 100% busy, and 100% of the system is stuck waiting on those darned internal disks.

I see a lot of paging in. I see 30 blocked processes (vs none). I see hardly any memory is in use at all.

I run saconfig /dev/ciss0 and see no cache on the internal disks. I see the "some cache card thingie" is not installed. It is not installed because I don't have it. It does not actually EXIST on the box. I know the internal disks are mirrored RAID 1/0. I do not want to mess with the internal disk layout because this is where HP-UX resides. I'm using those. I can't shut down the box.

One way I can greatly help performance is to move the work area of this job to disks on an EMC VNX disk array. The array disks seem to handle the stuff with no problem at all. But I would like to understand why a more modern system can actually be more junky-er.

Why do the internal disks run so lousy?

Is there a parameter I can tweak on /dev/ciss0 to help without rebooting or destroying anything?

Maybe a kernel parameter?

Maybe a parameter in saconfig or sautil?

One more thing... I am paging out when the system is locking up: pi=379, po=674. All of the swap logical volumes are on the internal disks. I ran ipcs -mob and saw no change in memory usage. I did NOT run swapinfo -tam (guess I should have). So I bet SOME more help would be to move some of that swap to the EMC VNX disk array.

Duncan Edmonstone · ‎02-13-2013

Steve,

Have a read of the following thread:

http://h30499.www3.hp.com/t5/General/IO-performance-puzzle/td-p/3977326

I can't explain why you didn't see this problem on a PA-RISC box - are you completely sure that was working off just a single internal boot disk?

I am an HPE Employee

Torsten. · ‎02-13-2013

I think the optional 512MB cache module for the P410i will speed up the disks.

Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.
__________________________________________________
No support by private messages. Please ask the forum!

If you feel this was helpful please click the KUDOS! thumb below!

Steve Post · ‎02-13-2013

It will take some time to go through the referenced thread.

I am not in a position to buy anything or reboot anything. I am trying to understand before I affect the use of the system. Well... affect it worse than I have.

Steve Post · ‎02-15-2013

I looked through the thread. There was a lot of talk. But the kernel parameter mentioned does not exist.

I will look at the queue for the internal disk array /dev/dsk/ciss0. I have a GUESS.....

The internal disks are busy, so they tell HP-UX to put their tasks on the side... I guess.

and HPUX does that.

But the swapspace logical volumes are also ALL on the same internal disk. So it would seem to run in an infinite loop. Just saying it sounds like an infinite loop.

Perhaps my imagination is not too good. Here is what I am imagining:

process: the disk is busy. put me in the swap space.

hpux: ok.

process in the swapspace: the disk is busy. put me in the swap space.

hpux: ok

process in the swapspace going into the swapspace: the disk is busy. put me in the swap space.

hpux: sure....why not?

process in the swapspace going into the swapspace that is going into the swapspace: hey... can't I get in here, too? that darned disk is busy. why not put me in swap space.

hpux: sigh.... you too? FINE.....

etc, etc, etc, etc,

Dennis Handly · ‎02-15-2013

>pi=379, po=674, I ran ipcs -mob and saw no change in memory usage. I did NOT run swapinfo -tam

It would help to show us the swapinfo output. I'm not sure what good ipcs would do you.

You can also get page in and out for disk I/O (mapped files).

>But the swapspace logical volumes are also ALL on the same internal disk.

But are you paging data to swap or just doing lots of I/O?

Duncan Edmonstone · ‎02-15-2013

>I looked through the thread. there was a lot of talk. But the kernel parameter mentioned does not exist.

Gee, funny I must have dreamt it then when I wrote that post...

Hang on...

# uname -rs
HP-UX B.11.31
# kctune -v disksort_seconds
Tunable             disksort_seconds
Description         Maximum time period for grouping incoming I/O requests (secs; 0=no limit)
Module              fs_util
Current Value       0 [Default]
Value at Next Boot  0 [Default]
Value at Last Boot  0
Default Value       0
Constraints         disksort_seconds >= 0
                    disksort_seconds <= 256
Can Change          Immediately or at Next Boot

As I think I said in my post in that thread - it isn't documented, as you are discouraged from using it, but the point in those posts about not doing large amounts of sequential I/O to system disks holds true...

chris huys_4 · ‎02-15-2013

Hi,

log a support call with HP.

and ask them to ask you to provide a kitrace output, when the issue appears.

Best Regards,

Chris

Steve Post · ‎02-18-2013

Dennis: I think I am just sending a lot of I/O. I will need to rerun my test while running swapinfo -tam. I was running ipcs -mab just to get more info while this error happens.

Dear Mr Edmonstone: LOL! I jump to conclusions too fast. But I got a kick out of your response. I will leave that kernel parameter alone.

Chris Huys: I am not calling HP yet. I don't like wasting their time with a question that is too vague.

At this point I might be running full bore the wrong way, but... well?

My error is very repeatable. But it might also be just plain nonsense. I have a job that reads/writes data to a database that is not on internal disks. But it uses Java, which is on /opt. I run one or two at once? No problem. A couple of them hang on for a while because they run notoriously slow. But others run easily under a second. Now if I add more of the jobs (the easy/fast ones), until I have 17 jobs at once? The sucker just hangs everything for 5-10 minutes. I can get the problem to go away as I move more and more program files to an array disk and link them back. But that is a band-aid.

I found another forum entry about this. The guy tried increasing max_q_depth via

scsimgr [set_attr|save_attr] -D /dev/rdisk/disk2 -a max_q_depth=32

His system ran a bit slower. But the big hang up in the system went away.

The current Q depth is 8 for that internal disk. I think I might be quickly running for the solution without identifying the cause. But I know the avwait of the disk is 27,754. And the avserv is 15.29. This seems wrong.
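To put a number on why this "seems wrong", here is a small sketch that takes the two sar -d figures just quoted and works out how much of a request's life is spent sitting in the queue rather than being serviced. It assumes both values are in the same unit (the HP-UX sar man page gives milliseconds for avwait and avserv). The function name is made up for illustration.

```python
def queue_wait_share(avwait, avserv):
    """Return (wait/service ratio, fraction of response time spent queued)."""
    total = avwait + avserv  # response time = time in queue + time being serviced
    return avwait / avserv, avwait / total


# Figures quoted in the post above.
ratio, share = queue_wait_share(avwait=27754.0, avserv=15.29)
print(f"wait is about {ratio:.0f}x the service time")
print(f"{share:.2%} of each request's response time is queueing")
```

Read this way, a service time of about 15 is unremarkable on its own. The time is going into the backlog in front of the disk, not into the disk doing each transfer.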

So my NEW test: run my stuff with swapinfo -tam on. Change the queue depth from 8 to 16. Repeat the test.

My best information that I can interpret has been from sar -d. I ran it every minute.

I'll let you know how it goes..... then maybe call HP.

Steve

oops. forgot the forum link.

http://h30499.www3.hp.com/t5/System-Administration/Disk-Utilization-at-100/td-p/5280090

update2:

I ran the repeatable job that hangs. It hangs up the system for 2-10 minutes.

waittime on the disk: 12,251.35. service time: 14.13.

Total time for the test about 15 minutes.

I changed max_q_depth to 32 via scsimgr.

hangtime: 0

waittime on the disk: 0.1. service time: 4.39.

Total time: 40 seconds.

It seems that I have the solution to my test. Increasing the queue sort of helped.

Well I am not sure. I undid it. I made the queue 8 again. And it still runs fast.

Maybe something else changed. So jury is still out. Once again. I've driven in a circle.

Torsten. · ‎02-18-2013

Well, if you read all the data from external disks, the next question would be about the SAN ... switches, arrays, paths, etc.

Hope this helps!
Regards
Torsten.

Steve Post · ‎02-18-2013

I found my error in my test. With that fixed, I was able to compare performance by running the same pile of jobs over and over.

Here is how it went.

I ran scsimgr set_attr -D /dev/rdisk/disk2 -a max_q_depth=8

queue of 8
00:00:00   device   %busy   avque   r+w/s  blks/s  avwait  avserv
10:18:00    disk2    4.00    0.51      22     566    0.07    3.36
10:20:17    disk2   87.61 6552.51     481   26045 12251.35   14.13
            disk2  100.00   43.58    3950  150000   48.60    8.49
    (two entries at 10:20 is not a typo).
10:21:00    disk2  100.00 3884.44     631   25356 3213.06   12.67
10:22:00    disk2   80.09  893.39     619   21720 5262.29    8.67
10:23:00    disk2    6.57    1.44      28     635    1.62    5.47

queue of 32
00:00:00   device   %busy   avque   r+w/s  blks/s  avwait  avserv
12:52:01    disk2    2.80    0.50      10     248    0.00    5.64
12:53:01    disk2    3.83    0.50      28     617    0.00    3.06
12:54:01    disk2   95.99 1915.28    1149   73149 1452.83   23.27
12:55:01    disk2   39.13 1488.82     361   26574 1085.59   19.07
12:56:01    disk2   10.14    1.84      61    1107    2.45    7.64
12:57:01    disk2    8.90    0.50      60    1135    0.00    2.98

queue of 64
00:00:00   device   %busy   avque   r+w/s  blks/s  avwait  avserv
13:06:00    disk2    8.97    0.69      48     877    0.35   10.59
13:07:07    disk2   26.95 2208.50     291   18046 1283.96   47.42
13:08:03    disk2  100.00 5702.70    1605  107088 3470.67   39.31
13:09:00    disk2   34.42    0.53     282    9685    4.06    3.86
13:10:00    disk2    8.58    0.58      49     764    0.17   12.20
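The three tables are easier to compare once they are reduced to their worst samples. The sketch below does that with the rows pasted from the runs above. The helper names (parse_rows, summarize) are hypothetical, and it assumes the six numeric columns always follow the device name, which is how the untimestamped 10:20 row gets handled.

```python
# sar -d rows copied from the three runs above (header and note lines omitted).
RUNS = {
    8: """
10:18:00    disk2    4.00    0.51      22     566    0.07    3.36
10:20:17    disk2   87.61 6552.51     481   26045 12251.35   14.13
            disk2  100.00   43.58    3950  150000   48.60    8.49
10:21:00    disk2  100.00 3884.44     631   25356 3213.06   12.67
10:22:00    disk2   80.09  893.39     619   21720 5262.29    8.67
10:23:00    disk2    6.57    1.44      28     635    1.62    5.47
""",
    32: """
12:52:01    disk2    2.80    0.50      10     248    0.00    5.64
12:53:01    disk2    3.83    0.50      28     617    0.00    3.06
12:54:01    disk2   95.99 1915.28    1149   73149 1452.83   23.27
12:55:01    disk2   39.13 1488.82     361   26574 1085.59   19.07
12:56:01    disk2   10.14    1.84      61    1107    2.45    7.64
12:57:01    disk2    8.90    0.50      60    1135    0.00    2.98
""",
    64: """
13:06:00    disk2    8.97    0.69      48     877    0.35   10.59
13:07:07    disk2   26.95 2208.50     291   18046 1283.96   47.42
13:08:03    disk2  100.00 5702.70    1605  107088 3470.67   39.31
13:09:00    disk2   34.42    0.53     282    9685    4.06    3.86
13:10:00    disk2    8.58    0.58      49     764    0.17   12.20
""",
}

FIELDS = ("busy", "avque", "rw_s", "blks_s", "avwait", "avserv")


def parse_rows(text, device="disk2"):
    """Turn pasted sar -d lines into dicts, with or without a leading timestamp."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if device not in parts:
            continue  # blank line or something that is not a sample row
        values = parts[parts.index(device) + 1:]
        rows.append(dict(zip(FIELDS, map(float, values))))
    return rows


def summarize(rows):
    """Worst-case queue length, queue wait and service time seen in one run."""
    return {
        "peak_avque": max(r["avque"] for r in rows),
        "peak_avwait": max(r["avwait"] for r in rows),
        "peak_avserv": max(r["avserv"] for r in rows),
    }


for depth, text in RUNS.items():
    s = summarize(parse_rows(text))
    print(f"max_q_depth={depth:>2}  peak avque={s['peak_avque']:.2f}  "
          f"peak avwait={s['peak_avwait']:.2f}  peak avserv={s['peak_avserv']:.2f}")
```

On these samples the peak avwait is lowest at a depth of 32 (1452.83, against 12251.35 at 8 and 3470.67 at 64), while the peak avserv climbs as the depth goes up (14.13, 23.27, 47.42). That is one run per setting, so treat it as a hint and not a benchmark.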

This is what I saw when I was watching swapinfo.

NOT BUSY......
             Mb      Mb      Mb   PCT  START/      Mb
TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev        8192     139    8053    2%       0       -    1  /dev/vg00/lvol2
dev       16384     141   16243    1%       0       -    1  /dev/vg00/lvol9
dev        8192       0    8192    0%       0       -    2  /dev/vg00/lvol14
reserve       -     632    -632
memory    31069   26199    4870   84%
total     63837   27111   36726   42%       -       0    -
BUSY......
             Mb      Mb      Mb   PCT  START/      Mb
TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev        8192    1247    6945   15%       0       -    1  /dev/vg00/lvol2
dev       16384    1258   15126    8%       0       -    1  /dev/vg00/lvol9
dev        8192       0    8192    0%       0       -    2  /dev/vg00/lvol14
reserve       -    4490   -4490
memory    31069   26124    4945   84%
total     63837   33119   30718   52%       -       0    -
The less /dev/vg00/lvol2 was in use, the faster everything was running.
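A quick way to see what actually moved between the two swapinfo snapshots is to subtract the "Mb USED" column row by row. This is only a sketch with the values typed in from the output above. The dictionary layout and the used_delta name are made up for illustration.

```python
# "Mb USED" column from the two swapinfo -tam snapshots above.
NOT_BUSY = {
    "/dev/vg00/lvol2": 139,
    "/dev/vg00/lvol9": 141,
    "/dev/vg00/lvol14": 0,
    "reserve": 632,
    "memory": 26199,
}
BUSY = {
    "/dev/vg00/lvol2": 1247,
    "/dev/vg00/lvol9": 1258,
    "/dev/vg00/lvol14": 0,
    "reserve": 4490,
    "memory": 26124,
}


def used_delta(before, after):
    """Change in MB used for every row present in the 'before' snapshot."""
    return {name: after[name] - before[name] for name in before}


delta = used_delta(NOT_BUSY, BUSY)
# Only the rows backed by logical volumes count as device swap.
dev_growth = sum(mb for name, mb in delta.items() if name.startswith("/dev/"))

for name, mb in delta.items():
    print(f"{name:<18} {mb:+6d} MB")
print(f"device swap growth: {dev_growth} MB, total growth: {sum(delta.values())} MB")
```

The deltas add up to the 6008 MB change in the "total" line (27111 to 33119). Device swap on vg00 grew by a bit over 2 GB and the reservation by almost 4 GB, while the memory row barely moved.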

So my conclusion would be to set the Max Queue depth to 32.

I would like to avoid swapping out also. Perhaps throttling down the number of times my java job can run at once will help. I can imagine two guys trying to get through the same door at the same time. I can also imagine someone's Mom saying "take HUMAN bites."

Is it a good idea to make a swap space on a disk array?

Dennis Handly · ‎02-18-2013

>Is it a good idea to make a swap space on a disk array?

A better idea would be to add more RAM. ;-)

Steve Post · ‎02-19-2013

Maybe I am reading it wrong (really?), but it looks like I have plenty of RAM. So why is it swapping out some stuff that does not need swapping out? My guess is that it is swapping out because the disk is getting pummelled with data. And the "swap" area is also on the same pummelled disk.

So it all runs like crap because it is sort of in an infinite loop.

At least changing the max_q_depth to 32 helped a lot. Avoiding my scenario will go further.

I don't like Java.

Dennis Handly · ‎02-19-2013

>it looks like I have plenty of RAM.

I see some device swap being used:

dev 8192 1247 6945 15%

dev 16384 1258 15126 8%

>So why is it swapping out some stuff that does not need swapping out?

It could be paging out mapped files and shared memory, not just data pages.

>And the "swap" area is also on the same pummelled disk.

That won't help.

Steve Post · ‎02-22-2013

I think I am done. I can just about eliminate my problem with a few things.....

Wait... I'll step back and explain in my twisted, unofficial examination. (like if I land a bowling ball on my foot I can deduce that is a BAD thing).

When I run 17 of these lousy java jobs at the same time, it causes the internal disk to get really busy. If I run 8 instead? Hardly any impact at all. But let's go back to running 17 of the jobs.

If I use scsimgr to change the max_q_depth from 8 to 32? It helps immensely. I also note that paging out is a lot bigger this way. And the memory is at 45% (a bit less than 100, I must say). So it appears that the disk requests are in a queue that drops to swapspace. Because I have pseudo swap on, most of these "thingies, doo-hickies whatchamacallits" are in actual memory. I also tried a queue depth of 64. It was not as good.

I also know that when I move more and more of the lousy java job to external disk, things run faster. Why? Because the external disk is an EMC VNX disk array that works fast.

So to summarize:

Increase max_q_depth on the internal disk from 8 to 32 and stop obsessing over memory.

Move what the script uses to external disks.

Do not run too many of this poor script at once.
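The last point can be enforced by a launcher instead of by discipline. Below is a minimal sketch of one way to cap the number of jobs running at once. The limit of 8 comes from the observation above that 8 at a time has hardly any impact. The run_throttled name and the stand-in commands are assumptions, since the thread never shows how the real Java job is started.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_throttled(commands, max_parallel=8):
    """Run each argv list as a subprocess, never more than max_parallel at once.

    Returns the exit codes in the same order as the commands.
    """
    def run_one(argv):
        return subprocess.run(argv).returncode

    # Each worker thread blocks on one child process, so the pool size is the cap.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Stand-in jobs: 17 trivial Python processes in place of the real Java loader.
    jobs = [[sys.executable, "-c", "pass"] for _ in range(17)]
    codes = run_throttled(jobs, max_parallel=8)
    print(f"{codes.count(0)} of {len(codes)} jobs exited cleanly")
```

All 17 jobs still run, but the ninth does not start until one of the first eight has finished, so the internal disk never sees the full burst.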

hmmmm...... Bowling Balls

Steve
