TruCluster

[Kernel Idle] process is degrading performance

SOLVED
mortadalb
Occasional Visitor

[Kernel Idle] process is degrading performance

Dear All,

I am facing a problem on a cluster of two ES45 servers.

The [Kernel Idle] process on either node, Billing1 or Billing2, and sometimes on both, is taking a lot of CPU cycles (sometimes beyond 200%). The patch kit applied on TruCluster 5.1B is PK3.

I have attached a file that shows the kernel process threads on both Billing1 and Billing2. The kernel thread snapshots were taken at a time when Billing1's uptime was below 1 and Billing2's was wavering between 6 and 10.

This problem started around 4 to 5 weeks ago.

Does anyone have an idea why this is happening?

Rgds,

Mohamad






11 REPLIES
Michael Schulte zur Sur
Honored Contributor

Re: [Kernel Idle] process is degrading performance

Hi,

As far as I understand it, this process takes whatever CPU time is not needed by other processes. Why do you think it is slowing down your machine?
What do you mean by uptime?
How many CPUs do your machines have?

greetings,

Michael
Eric van Dijken
Trusted Contributor

Re: [Kernel Idle] process is degrading performance

Are you sure this has only been happening for the last 4-5 weeks, or did you just start noticing the behavior then?

Why do you think this is a problem? What I see is normal behavior of a Tru64 UNIX kernel.

Could you explain your problem again?

If you have performance problems, try tools like "top" and "atsar", which will do a much better job of pinpointing the cause.
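For example, something along these lines (exact flags can differ per version, so check the man pages first):

```shell
# Sample the busiest processes at 10-second intervals; redirect the
# output to a file on each node so you can compare them afterwards.
top -d 10

# atsar reports CPU/disk activity over time, e.g. CPU usage with
# 12 samples at 5-second intervals:
atsar -u 5 12
```

Run them on both nodes at the same time, so you can see whether the load follows the application or the cluster interconnect.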

What you are saying to us is something like: if I start my car, it makes this humming noise.

Eric.

ps: If you stop rebooting your system, your uptime will go up.
Watch, Think and Tinker.
Han Pilmeyer
Esteemed Contributor

Re: [Kernel Idle] process is degrading performance

The "[Kernel Idle]" process, as you call it, runs a lot of kernel threads. It also contains the idle loop(s). Seeing a lot of CPU time accounted here is not a problem.

If you look at the amount of system time (using tools like monitor, top, collect, iostat...) then there may be good reason to take a closer look at this. Tools to use could be kprofile or lockinfo. We will tell you how to use those if you come to that point.

It does appear that your kernel is using a lot of memory. I'm not sure what you are running or why you think you have a performance issue, so a better description of that would help. You might take a look with "vmstat -M" to see if anything unusual sticks out.
mortadalb
Occasional Visitor

Re: [Kernel Idle] process is degrading performance

Thanks for your replies,

My cluster consists of two ES45s and an HSG80 storage array. Each ES45 has 4x 1 GHz CPUs and 12 GB of memory. The application running is Oracle 9i RAC in failover mode.

What I meant by uptime is the load average: the number of jobs in the run queue over the last 5, 30, and 60 seconds. According to the Tru64 tuning guide, a load average between 3 and 10 means the system is loaded, whether due to I/O, memory, or something else.
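For reference, these are the figures I am reading, from output like this (the numbers here are illustrative, not from the actual cluster):

```shell
# On Tru64 UNIX, uptime reports load averages over the last
# 5, 30, and 60 seconds (unlike the 1/5/15-minute averages
# on many other systems):
$ uptime
  14:02  up 12 days,  3:41,  4 users,  load average: 6.21, 7.04, 5.88
```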

I have other standalone ES45 + MSA1000 systems with the same OS image and Oracle 9i running intensively on them, and their [Kernel Idle] process never reaches 5%.

I do admit that the application running on the cluster is both memory and I/O intensive. But until 4 or 5 weeks ago, the load average never stayed constantly above 3; now it often reaches 12 or 13. I know this because I'm running collect on both nodes.
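For anyone interested, this is roughly how collect is being run and played back (option letters are from memory and may differ slightly by version; see collect(8)):

```shell
# Record system statistics continuously at 60-second intervals
# into a binary data file on each node:
collect -i 60 -f /var/adm/collect_billing1

# Later, play a recorded file back for analysis:
collect -p /var/adm/collect_billing1*
```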

What I'm asking is whether this kernel process behavior indicates a memory shortage, or whether it is a known issue in Patch Kit 3 that means I should install PK4.

Thanks,
Mortada
Eric van Dijken
Trusted Contributor

Re: [Kernel Idle] process is degrading performance

I would try the "monitor" program; it can be used to monitor the system in (almost) real time. Before you can fix this problem, you first have to pinpoint the cause.

The monitor program comes from the "Open Source Software Collection for Tru64 UNIX" which can be found at http://h30097.www3.hp.com/demos/ossc/html/0Welcome.htm

Run this program on both nodes at the same time.
Watch, Think and Tinker.
Han Pilmeyer
Esteemed Contributor

Re: [Kernel Idle] process is degrading performance

Would you mind posting the output of "iostat 10" for a couple of minutes and the output of "vmstat -M"? If it's easy to provide a collect file, that would be nice too.
Han Pilmeyer
Esteemed Contributor

Re: [Kernel Idle] process is degrading performance

"vmstat -P" would be useful too.
mortadalb
Occasional Visitor

Re: [Kernel Idle] process is degrading performance

Hi

I have included some collect files from both cluster nodes and the output of vmstat.

As for the monitor utility, I don't have it installed on my system. I hope the output of collect will help pinpoint the problem instead.
Victor Semaska_3
Esteemed Contributor
Solution

Re: [Kernel Idle] process is degrading performance

mortadalb,

High CPU usage by [Kernel Idle] may indicate a configuration or hardware problem. We had two such situations in our cluster of five ES45s.

One was because we had a vdump going over an NFS mount using the TCP protocol. TCP for NFS mounts of disks in a TruCluster sucks (Telephone Support's words). We switched to UDP and the problem went away.
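If it helps, something like the following should force a mount over UDP and let you watch the symptoms (the exact option names may vary on your Tru64 version, so check mount(8) before relying on them):

```shell
# Illustrative: mount the export explicitly over UDP instead of TCP.
mount -t nfs -o proto=udp nfsserver:/export /mnt/export

# Client-side NFS statistics; high retransmission/timeout counters
# are a good hint that the transport is the problem:
nfsstat -c
```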

The other was when one of the two NICs in a NetRAIN set went bad. It seems it kept generating interrupts that the OS had to keep processing.
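A quick way to spot that kind of NIC problem is to watch the interface error counters (the interface name here is just an example):

```shell
# Per-interface packet and error counters; a failing NetRAIN member
# usually shows Ierrs/Oerrs steadily climbing:
netstat -i

# State of the NetRAIN virtual interface and its member NICs:
ifconfig nr0
```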

Vic
There are 10 kinds of people, one that understands binary and one that doesn't.
Mohamad Mortada
Occasional Visitor

Re: [Kernel Idle] process is degrading performance

Hi Vic,

You may be right, because the date on which I configured the cluster as an NFS client matches exactly the time at which I started noticing the performance degradation and the high usage of the [Kernel Idle] process.

For now, I have changed the NFS server settings through the sysman interface to use UDP threads only.
I am going to wait and see whether this was truly the original cause of my problem.

Thanks.
Mortada
mortadalb
Occasional Visitor

Re: [Kernel Idle] process is degrading performance

Thanks guys,

The problem has been resolved by forcing NFS to use UDP instead of TCP, as Vic suggested.