vmstat reports many faults cy and sys% is very high

Frank de Vries · 10-31-2007

We have a 2-node N-class cluster running HP-UX 11.00 on PA-RISC with MC/ServiceGuard.

This week the vmstat and sar statistics
reported that sys% is often around or above 50%. Compared with previous reports, that is more than double.
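To put a number on "often", a minimal sketch that counts how many `sar -u` samples reach a given %sys threshold. It assumes the plain HP-UX `sar -u` layout of five columns per sample (`time %usr %sys %wio %idle`); the function name `sar_sys_above` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count sar -u samples whose %sys is at or above a threshold.
# ASSUMPTION: each sample row is "HH:MM:SS %usr %sys %wio %idle" (5 fields).
# The column header row and the "Average" row are skipped by the patterns below.
sar_sys_above() {
    threshold=${1:-50}
    awk -v t="$threshold" '
        NF == 5 && $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && $3 ~ /^[0-9.]+$/ {
            n++
            if ($3 + 0 >= t + 0) hi++
        }
        END {
            if (n == 0) { print "no samples"; exit 1 }
            printf "%d of %d samples (%.0f%%) at or above %s%% sys\n", hi, n, 100 * hi / n, t
        }'
}
```

Usage would be along the lines of `sar -u 5 60 | sar_sys_above 50`; running the same check on an older saved report gives a like-for-like comparison with the "more than double" observation.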

I could understand an occasional peak, but this has now been going on, rock solid, for more than 48 hours.

When I look at top (top processes), I see nothing really standing out. Processes come and go.

But when I look deeper at vmstat, it is clear that the faults sy column (system calls per second) is really high: over 50,000!
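A minimal sketch for summarising a captured vmstat run, so the system-call rate and sys% can be compared between days. It assumes the 18-column HP-UX 11.x layout (`r b w avm free re at pi po fr de sr in sy cs us sy id`), where field 14 is faults `sy` and field 17 is cpu `sy`; the function name `vmstat_syscall_summary` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: average and peak system-call rate plus average sys%.
# ASSUMPTION: HP-UX 11.x vmstat layout with 18 columns per data row:
#   r b w avm free re at pi po fr de sr in sy cs us sy id
# field 14 = faults "sy" (system calls/s), field 17 = cpu "sy" (%).
vmstat_syscall_summary() {
    awk '
        # Data rows start with a number; the two header rows do not.
        NF == 18 && $1 ~ /^[0-9]+$/ {
            n++
            calls += $14
            sys += $17
            if ($14 > maxcalls) maxcalls = $14
        }
        END {
            if (n == 0) { print "no samples"; exit 1 }
            printf "samples=%d avg_syscalls=%.0f max_syscalls=%d avg_sys_pct=%.1f\n",
                   n, calls / n, maxcalls, sys / n
        }'
}
```

For example `vmstat 5 12 | vmstat_syscall_summary`. On many systems the first vmstat data row reports averages since boot, so it may be worth discarding it before comparing runs.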

It seems as if one or more kernel system calls are trying to do something but failing.

dmesg and syslog don't show anything out of the ordinary.

Can you give me some diagnostic tips or advice that would help me dig deeper into this phenomenon so I can find the root cause?

Overall the system is okay: sar -u shows about 2 to 10% idle.
The users are not complaining yet.

But I want to figure this out.
I have uploaded a file with the vmstat output.
Thanks

Look before you leap

Duncan Edmonstone · 10-31-2007

Frank,

Do you have MeasureWare installed on this host?

Try:

mwa status

It can be very useful in situations like this (e.g. to identify processes that spend a large amount of time doing system calls).

HTH

Duncan

I am an HPE Employee

Frank de Vries · 10-31-2007

It seems we have it, but it is deactivated.

See output:

[root@orasrv1:]/picnew/backup/bin<>>> mwa status
MeasureWare scope status:
WARNING: scopeux is not active (MWA data collector)

MeasureWare background daemon status:
(Should always be running when the system is up)
WARNING: ttd is not active (Transaction Tracker daemon)

MeasureWare server status:
WARNING: alarmgen is not active (alarm generator)
WARNING: agdbserver is not active (alarm database server)
WARNING: perflbd is not active (location broker)
WARNING: rep_server is not active (repository server)

You have mail in /var/mail/root
[root@orasrv1:]/picnew/backup/bin<>>>

What exactly will I be monitoring in MeasureWare? How do I use it?


Frank de Vries · 10-31-2007

I have started it.
I remember that last year we got disk-full errors.
I will have to be careful.

[root@orasrv1:]/picnew/backup/bin<>>> mwa start

The Transaction Tracker daemon is being started.
The Transaction Tracker daemon
/opt/perf/bin/ttd has been started.

The MeasureWare scope collector is being started.
The performance collection daemon
/opt/perf/bin/scopeux has been started.

The MeasureWare server daemons are being started.
The MeasureWare Location Broker daemon
/opt/perf/bin/perflbd has been started.

[root@orasrv1:]/picnew/backup/bin<>>>


Hein van den Heuvel · 10-31-2007

[Frank, thanks for attaching the vmstat data. That's great, but if it is just text then PLEASE just attach it as a simple .TXT file next time. Not all of us want to, or can, run MS Office (or even OpenOffice) tools to see simple contents.]

A good reason for high system time is often a lack of free memory, but that does not appear to be the case.
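One way to confirm the memory side from the same vmstat capture is a minimal sketch like the one below, reporting the lowest free-list value and how many intervals showed page-outs. It assumes the same 18-column HP-UX layout, where field 5 is `free` (in pages) and field 9 is `po`; the function name `vmstat_memory_check` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: lowest free-list size and number of samples with page-outs.
# ASSUMPTION: HP-UX 11.x vmstat layout with 18 columns per data row;
# field 5 = free (pages), field 9 = po (page-outs).
vmstat_memory_check() {
    awk '
        NF == 18 && $1 ~ /^[0-9]+$/ {
            n++
            if (n == 1 || $5 < minfree) minfree = $5
            if ($9 > 0) paging++
        }
        END {
            if (n == 0) { print "no samples"; exit 1 }
            printf "samples=%d min_free_pages=%d samples_with_pageouts=%d\n", n, minfree, paging
        }'
}
```

A steady free value with no page-outs supports the view that memory pressure is not what is driving the system time.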

Frank, with the system being called "orasrv", is it safe to assume it does a lot of Oracle work?

If so, then I would focus on oracle.
Is it used in RAC mode?
Is there an oracle process clocking up lots of CPU?
Can you trace (truss) the system calls from that process for a short while?

What is the current setting of the Oracle parameter 'STATISTICS_LEVEL'?

What does Oracle STATSPACK give as top wait events? Maybe there is a tell-tale of trouble there?

http://download-east.oracle.com/docs/cd/B19306_01/server.102/b14237/initparams210.htm

Kind regards,
Hein.

Duncan Edmonstone · 10-31-2007

Frank,

Sorry for not coming back sooner - had a few conf calls to attend...

If you think you are going to run into problems with space for MeasureWare, you can look at the size of the files in /var/opt/perf/datafiles and then at the entries in the /var/opt/perf/parm file that control the size of these files.
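A minimal sketch of that space check, using the two paths given above as defaults. The function name `mwa_space_check` is hypothetical, and it assumes the size limits sit on a line beginning with the keyword `size` in the parm file; adjust the grep pattern if your parm file is laid out differently.

```shell
#!/bin/sh
# Hypothetical helper: space used by the MeasureWare log files and the
# size settings from the parm file.
# ASSUMPTION: size limits are on a line starting with "size" in the parm file.
mwa_space_check() {
    datadir=${1:-/var/opt/perf/datafiles}
    parmfile=${2:-/var/opt/perf/parm}
    echo "Space used by files in $datadir (KB, largest first):"
    du -sk "$datadir"/* 2>/dev/null | sort -rn
    echo "Size settings in $parmfile:"
    grep -i '^[[:space:]]*size' "$parmfile"
}
```

Running it with no arguments checks the default locations; comparing the two halves of the output shows whether the log files are close to their configured limits.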

Anyway, if MeasureWare has now been running and collecting data for the last few hours, you should have some data to look at. A good place to start might be processes spending a large amount of time doing system calls. The following code should extract this from MeasureWare for you...

cd /tmp

cat > syscall-report.cfg << EOF
DATA TYPE PROCESS
DATE
TIME
PROC_PROC_ID
PROC_PROC_NAME
PROC_CPU_SYSCALL_UTIL
EOF

extract -p -xp -r /tmp/syscall-report.cfg -f /tmp/syscall-report.out,purge -b TODAY-1 -e LAST
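To pick the suspects out of that report, a minimal sketch that totals the system-call utilisation per process name and lists the highest first. It assumes each data row in /tmp/syscall-report.out is whitespace-separated and ends with the process name followed by PROC_CPU_SYSCALL_UTIL, matching the column order in syscall-report.cfg, and that process names contain no spaces; the function name `top_syscall_procs` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rank process names by summed syscall CPU utilisation.
# ASSUMPTION: data rows end with "... PROC_PROC_NAME PROC_CPU_SYSCALL_UTIL";
# rows whose last field is not a number (headers) are skipped.
# Output columns: summed utilisation, number of intervals, process name.
top_syscall_procs() {
    report=${1:-/tmp/syscall-report.out}
    count=${2:-10}
    awk '
        $NF ~ /^[0-9]+(\.[0-9]+)?$/ { total[$(NF-1)] += $NF; rows[$(NF-1)]++ }
        END { for (p in total) printf "%.1f %d %s\n", total[p], rows[p], p }
    ' "$report" | LC_ALL=C sort -k1,1nr | head -n "$count"
}
```

Summing per name rather than per PID suits a workload where processes come and go: short-lived processes with the same name still add up to a visible total.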

HTH

Duncan


Frank de Vries · 11-04-2007

Duncan: Thanks !
Hein, thanks, but we have Oracle processes coming and going, and the stats look as they should. Nothing out of the ordinary.
Also, we have truss, but which system call do I trace? Shouldn't I first have a reasonable suspect?

However, I noticed something else:
we have variable processes from our Unicenter AutoSys product.

I will investigate this first, and use Hein's
startup into mwa.

Ta
