High sys waits in CPU

Stojcevski Dejan · ‎02-28-2005

Hi to all,
I have a high values in sys part of sar's output on my HP-UX 11.00 box (rp7400/4GB/4 550MHz CPU's).
What metrics should I watch for which will undoubutly show me what's going on my system?
In other words which metrics in MW are most important to resolve what is causing those sys cpu sycles?
Dejan.

Carpe Diem

G. Vrijhoeven · ‎02-28-2005

Dejan,

I would check top for a simple analyses in what processes are cousing the CPU cycles. Sys cycles could indicate a bottleneck on I/O or memory so sar -d and vmstat are commands you could use to take a look at those values. For measureware metrics you could start by ( installing) glance or gpm with that tool( free for 60 days) you can view current system load and it users the same metrics as measureware.

HTH,

Gideon

Stojcevski Dejan · ‎02-28-2005

I do have gpm installed and I do use it. My question is on what I should concentrate to resolve this thing (metrics)?
I do not beleave it is memory or disk issue because I have 1GB free mem out of a 4GB and I use VA7410 with average LUN response time of 2.5 ms (from sar -d).
What else should I check for and probably eliminate the cause of this behaviour?
Dejan.

Carpe Diem

Ivajlo Yanakiev · ‎02-28-2005

did you install last patch bundles ?
There was problems that is fixed with patches.

Stojcevski Dejan · ‎02-28-2005

OK,
I must admit: this thing started to appear after the changing the old VA7410 with the new one. The procedure was very simple and no problems occured during the activity. But when I rebooted the servers I noticed that in the sar output something is different: bigger sys portion for about 7 to 10 percent. Before it averaged about 4 in maximum. Now it is 13. With wio% of about 5 I get eated 18% of valuable cpu time for nothing! Patches are there and working.
Please advise!
Dejan.

Carpe Diem

Ivajlo Yanakiev · ‎02-28-2005

Try to stop VA and investigate problem.
If you have same problem hight SYS.
Problem is insite HPUX otherwise VA subsystem

Bonnyjose_1 · ‎03-02-2005

Hi Dejan
run the follwoing command and see what is the last line

UNIX95= ps -eo 'pcpu args user'|grep root|sort -nk1

Rgds
Bonny

Tim D Fulford · ‎03-02-2005

Hi

If you have swapped out old for new va check out settings.

RAID level
port speeds
F/W versions
Resiliance setting (Normal)
Q full depth threshold
Prefetch

Also check out what the RAID10 & RAID5DP space is ...

Basically, there are loads of settings on va7410 that can slow down or speed up response.

You may well be right and the va74x0 is not the problem, but it is suspiciuos that the va was changed.

as to the spcifics wrt OVPA.. I would check (for the disks)

BYDSK_PHYS_IO_RATE
BYDSK_UTIL
BYDSK_REQUEST_QUEUE
BYDSK_AVG_SERVICE_TIME

Check out that all the disks/Luns/vDisks have a similar disks utilisation, there are no queues and the service time is also low.

on the Global CPU

GBL_CPU_TOTAL_UTIL
GBL_CPU_SYS_MODE_UTIL
GBL_CPU_INTERRUPT_UTIL
GBL_CPU_CSWITCH_UTIL
GBL_CPU_SYSCALL_UTIL

SYS_MODE is the %sys in sar (roughly), the others are interupts & constext switching & system calls. If you have any historic data before new va you can compare & contrast..

per CPU ...BYCPU

BYCPU_CSWITCH_RATE
BYCPU_INTERRUPT_RATE
BYCPU_CPU_USER_MODE_UTIL
BYCPU_CPU_SYS_MODE_UTIL
BYCPU_CPU_TOTAL_UTIL

makeing sure that there is not a single CPU problem (for whatever reason)... I assume you are workiung in a multri CPU environment, if not the above is really covered by GBL stats.

Regards

Tim

-

Stojcevski Dejan · ‎03-02-2005

Hi Tim,
Here is what I see on the system (4x550MHz/4GB RAM):
1. About VA setting everything looks ok:
Raid = AutoRaid;
Port Speed = 2GB/sec;
Resilency is normal;
Queue depth is 4096;
Prefetch is enabled
There are many other settings on the VA but I think that is not a problem (everything was done by the book).

2. About GBL Metrics:
CPU_TOTAL_UTIL=15 (at the moment of snapshot - it goes up to 45 in busy hour);
CPU_SYS_MODE_UTIL=1.8;
CPU_INTERUPT_UTIL=0.9;
CPU_CSWITCH_UTIL=0.3;
CPU_SYS_CALL_UTIL=2.

3. Sar output shows avqueue=0.52 and avserv=2.9 ms (which is not so good for VA7410?)

In order to solve my problem I have several questions on their own:

1. There is a known "rule" that disk queue should be 0 at all times. I have set async driver in the kernel to be used by our database (Informix with KAIO if it matters in this discussion) and the CPU load dropped significantly! Whole system is more linear than before and the I/O's are much faster. Also I see average disk service time much less than before (presumably because of the load posed to disks).Disk queues however are little bit higher (1.5) than before (0.2). Is this a problem?

2. What is the limit of I/O's which can be handled by regular SCSI disks in KB/s? What about FC Disks?

After having answers to these questions we will continue with our quest :-).
Dejan.

Carpe Diem

Tim D Fulford · ‎03-13-2005

Hi

sorry for taking a while to get back to you...

The majority of what you are doing is excellent. KAIO fot Informix is good choice (I assume you are on HP-UX 11i, as HP-UX 11.0 KAIO is not so good). if you want to post your onconfig I can take a look at that.

You are also correct in assuming that the disks queues should be about 0. But this really depends upon how much load the va7410 is under. A fairly basic way of looking at the va7410 is to split it up.
- Controller IO. The controller is usually not stressed too much, but it is worth calculating things like the ammount of IO going down each port.
- Back-end IO. each disk in the array can potentially be a bottleneck. The array will try not to let a single disk become a bottleneck, but there is a limit to how much IO the back-end disks can take.

The service time of 2.9 ms is really neither here nor there, as it depends upon what it is measured against. As an example, 2 disks in a va7410 and 1 LUN with a service time of 2.9 ms implies that each disks is handling something like 240 IO/s; which is excellent (2.9 ms implies 1000/2.9=345 IO/s; doubling the writ IOs; assume 40%; gives *1.4 ==> 483 IO/s, and divide by the number of disks, ==> 241 IO/s. Ignore caching, which would generally reduce this number, but usually only by say 10%).

The SO to quantify your situation,
o How many IOs do you do, & what size
o what is the read:write ratio of the IO
o how many disks do you have in va7410 and what speed and size are they
o how big are the LUN(s) configured (is any of your space in RAID5DP)

15,000 rpm disks will do about 167 IO/s; 10,000 disks do 125 IO/s.

There are no absolute limits for a va7410, merely there are limits of the components used, how these interact define the limit to your host. You can find HP spec docs that talk about 64,000 IO/s, but these will be wholy to cache, and you will probably get no-where near these figures. It may well be that simple configuration changes could fix the situation. e.g. you are using RAID5DP for small random IOs, so adding disks will convert this to RAID10, AND provide more back-end oomph..

On a different tack, you can look at the disks IO rates using armperf. I'm struggling to remember the exact command

% armperf -c DISK -x COMMA -s -e
% armperf -c ARRAY -x COMMA -s -e

Regards

Tim

-

Stojcevski Dejan · ‎03-13-2005

I will check things with VA and get back to the discussion.
Meanwhile I am little bit confused about your answer for SCSI limitations:
"15,000 rpm disks will do about 167 IO/s; 10,000 disks do 125 IO/s. "
What you thing then about attached graph?
Left axis is OI/s and right is KB/s. Disks are 15k rpm 18GB Seagate. Confusion is about your statement about their limitations: according to the graph I already have a bottleneck! But when you look at the KB/s it is very low. So I guess the limitiations of the whole IO subsystem must not be measured only by IO's per s but also KB per IO.
Dejan.

Carpe Diem

Tim D Fulford · ‎03-14-2005

Hi

Dont confuse and throughput (IO/s) bottleneck with a bandwidth (kB/s) bottleneck. Informix uses 2kB pages, so it is likely that 1000 IO/s is about 2MB/s (or 2048 kB/s), as this is a small "frame" per IO it is unlikely that the controller is the bottelneck, more likely the back-end disks.

Looking at the "regular" performance 100 IO/s and about 250-300 kB/s ( so the IO size is 2.5-3kB/IO). Looking at the peaks, 13MB/s bandwidth and 600 IO/s (21 kB/IO). I'm guessing you are doing backups or large sequential scans during the peaks (Informix has a read-ahead function, which is probably increasing the kB/IO).

You still have not said how may disiks are in the array. or what the read:write ratio is. Until these figures are known I cannot say if the disks are being flattened? That said, the armperf commands will tell you 100% what is happening in the array.

Regards

Tim

-

Bill Hassell · ‎03-14-2005

I/O's per sec and Kb per sec are very different metrics. I/O rates in the THOUSANDS per second are easily obtainable even with JBODs. To see this, try:

dd if=/dev/rdsk/c0t6d0 of=/dev/null bs=512

(where c0t6d0 is whatever disk you'd like to test). Use glance or sar -b (preads) to see the I/O's per second. On an aging D360, a SCSI-2 disk gets about 3500 I/O's per second. However, during those 3500 I/O's, only 512 bytes were transferred and the system overhead skrocketed to 40%, exactly as expected since each I/O was so small. By the way, 3500 I/O's at 512 byes each = a whopping 1.75 megs per second.

Now change the I/O size from (a very poor value) 512 to 64k as in:

dd if=/dev/rdsk/c0t6d0 of=/dev/null bs=64k

and you'll see the I/O's drops to just a couple of hundred (225 per second on my old D350). Is that bad? Well, let's see how much data was transferred: 225 * 64k = 14400k which is 14megs/sec. Not bad for a dumb disk. Oh, the system overhead was about 6%.

So an increase in system overhead coupled with an increase in I/O's per second would indicate that the average I/O size is smaller. Now this is basic disk performance which applies to any system--larger I/Os are always faster and requires less system overhead. Granted, the dd measurement is artificial because it is reading the raw disk (no buffer cache) sequentially but the advantage is that such measurements are very repeatable. Database queries can be quite variable, even with the same data.

Bill Hassell, sysadmin

Categories

Company

Local Language

Forums

Discussions

Forums

Discussions

Forums

Discussions

Forums

Discussions

Forums

Discussions

Discussions

Forums

Forums

Discussions

Forums

Discussions

Forums

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Community

Resources

Other HPE Sites

Discussions

Forums

Blogs

High sys waits in CPU

High sys waits in CPU

Re: High sys waits in CPU

Re: High sys waits in CPU

Re: High sys waits in CPU

Re: High sys waits in CPU

Re: High sys waits in CPU

Re: High sys waits in CPU

Re: High sys waits in CPU

Re: High sys waits in CPU

Re: High sys waits in CPU

Re: High sys waits in CPU

Re: High sys waits in CPU

Re: High sys waits in CPU