Operating System - Linux

Re: cmascsid uses 100% cpu after restart hpasm

 
Coenen_1
New Member

cmascsid uses 100% cpu after restart hpasm

We have Red Hat 2.1 AS Update 5 installed on our server (DL380 G3). We updated the PSP from 7.10 to 7.11 and that went fine. The problem appears once the server has booted and we restart hpasm with 'service hpasm stop; service hpasm start': after that, the cmascsid daemon takes 100% CPU. Does anybody have a solution for this problem?
HGN
Honored Contributor

Re: cmascsid uses 100% cpu after restart hpasm

Hi

We have several servers running Red Hat 2.1 AS and AS 3.0 with HPASM. This may be a problem with the ASM package; there might be a newer revision you can try upgrading to, and if that still does not fix it you will need to ask HP.

We have not seen any issues, but I know that the package replaces the SNMP RPM that comes with the Red Hat OS.

Rgds

HGN
Coenen_1
New Member

Re: cmascsid uses 100% cpu after restart hpasm

To my knowledge this is the latest version. Our onsite HP support says that we have to upgrade to the latest firmware version before they will log a call, so maybe we have to do that first.
Ross Minkov
Esteemed Contributor

Re: cmascsid uses 100% cpu after restart hpasm

Parker Johnson
Occasional Advisor

Re: cmascsid uses 100% cpu after restart hpasm

Hmm... seems like we have the same problem. Interesting that we posted within a few hours of each other. I opened a case with HP, but based on my previous experience with their support team I doubt much will come of it. If I get an answer, I'll keep you posted.
Mike Jagdis
Advisor

Re: cmascsid uses 100% cpu after restart hpasm

I think this is the 5th thread about the same problem so I'll paste the same reply:

-----------------------------------------------
The explanation is somewhat involved...

If you trace cma*d you'll find that it doesn't do anything but open the device, ioctl, close. Admittedly rather more times than should be necessary but that's just incidental bad design.

You'll find the delay - and system time consumption - seems to happen on the close. From here you need a fairly good working knowledge of the Linux kernel...

OK? Still with me then?

Run oprofile for a while and you'll find the cpu time is being consumed by invalidate_bdev. Which is interesting :-).

Invalidate_bdev is called from kill_bdev. Kill_bdev is called from the block device release code. Release is what happens on last close. Now the monitoring daemon is opening the unpartitioned disk device, which it is pretty certain nothing else has open. (Offhand I'm not sure if even having an fs on the device counts as it being open. There are subtle differences and I *think* I'm right in saying that block device access and fs access are considered different at this level. Don't quote me or blame me!)

So, each close triggers invalidate_bdev. Why is this so bad? Well, the idea is that when the last close happens on a device you need to flush any cached data because, with much PC HW, you can't be sure when the media gets changed. Invalidate_bdev isn't *meant* to be called often. It works by scanning through the entire list of cached data for block devices to find and drop data related to the device being closed. So it sucks system time and the amount is proportional to the amount of cached (from any device) data you have.

WORKAROUND:
All you need to do is to make sure that each time the cma*d daemon closes the device it isn't the *last* close - i.e. some other process has the device open. The other process doesn't even need to *do* anything. Try something along the lines of:

sh -c 'kill -STOP $$' < /dev/cciss/c0d0 > /dev/null 2>&1 &
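
For anyone puzzling over that one-liner: the redirect `< /dev/cciss/c0d0` makes the device the stdin of a throwaway shell, and `kill -STOP $$` makes that shell stop itself, so it sits there holding the device open without using any CPU. Below is a minimal sketch of the same idea wrapped in a function so the PID is kept for later cleanup. The names `hold_open` and `HOLDER_PID` are made up for illustration and are not part of hpasm.

```shell
# hold_open PATH
# Start a background shell whose stdin is PATH, then have it stop itself.
# The stopped shell uses no CPU but keeps PATH open, so a later close()
# by another process is not the last close.
# hold_open and HOLDER_PID are illustrative names (assumptions).
hold_open() {
    sh -c 'kill -STOP $$' < "$1" > /dev/null 2>&1 &
    HOLDER_PID=$!   # PID of the stopped holder shell
}

# Example with the device path from the post (needs read access to it):
#   hold_open /dev/cciss/c0d0
# To release the device again:
#   kill -KILL "$HOLDER_PID"
```

The holder does not survive a reboot, so it would need to be started again after each boot.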

Hope that's all clear! (As mud... :-) )

(HP: As well as blind debugging I do Linux & OSS consultancy. I happen to know the answer to this one as it came up at a major investment bank...)
Colin Stuckless
New Member

Re: cmascsid uses 100% cpu after restart hpasm


I used Mike's suggested workaround on our DL380G3 and it seems to work fine. cmaidad was periodically (every 15 seconds I guess) using 10-15% CPU time and now my load average when idle is all zeros as it should be.

I have two controllers in my server, the 5i and a 6400, so I had to run two scripts, one for /dev/cciss/c0d0 and one for /dev/cciss/c1d0.
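
With more than one controller, the same trick could probably be done in a single script instead of two. This is only a sketch under that assumption: `hold_all` and `HOLDER_PIDS` are hypothetical names, and the device paths are simply the two mentioned above.

```shell
# hold_all DEV...
# Park one stopped shell per readable device, using the one-liner from
# the workaround. PIDs of the holders are collected in HOLDER_PIDS.
# hold_all and HOLDER_PIDS are illustrative names (assumptions).
hold_all() {
    HOLDER_PIDS=""
    for dev in "$@"; do
        if [ ! -r "$dev" ]; then
            # Skip devices that are absent or not readable by this user.
            echo "skipping $dev (not readable)" >&2
            continue
        fi
        sh -c 'kill -STOP $$' < "$dev" > /dev/null 2>&1 &
        HOLDER_PIDS="$HOLDER_PIDS $!"
    done
}

# Example for the two controllers above (run as root):
#   hold_all /dev/cciss/c0d0 /dev/cciss/c1d0
# To release them all again:
#   kill -KILL $HOLDER_PIDS
```

Skipping unreadable paths means the same script can be reused on a box with only one controller.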

Thanks Mike

Colin