Re: cmascsid process taking up all of CPU?

 
Parker Johnson
Occasional Advisor

cmascsid process taking up all of CPU?

I am running RH 2.1 with the 2.4.9-e.27enterprise kernel on a 4-processor Xeon DL580 G2. Whenever I start up hpasm while the machine is idle (.01 load), my load climbs to over 1 and I see cmascsid go to the head of the line in top and use 99.9% of the CPU. My machine has an uptime of 102 minutes, and cmascsid has taken up 96 minutes of CPU time! What gives?

I installed the hpasm/hprsm (version 7.1.1-87) agents via the PSP. Has anyone else seen this problem? I attached some top, uname, uptime, and rpm output. Any assistance would be appreciated. Please don't make me call the 1-800 HP support number! haha.
HGN
Honored Contributor

Re: cmascsid process taking up all of CPU?

Hi

Looks like there is some issue with 7.x.
Refer to this thread:
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=718797

Hope this has some answers for you

Rgds

Gopi
Parker Johnson
Occasional Advisor

Re: cmascsid process taking up all of CPU?

Thanks for the link to the thread, but I am not kernel panicking (at least not yet). Not running the agents is an option, but I have over 30 disks attached to this system that I need failure notification for. Are there any viable alternatives to the agents when administering servers remotely?
Ross Minkov
Esteemed Contributor

Re: cmascsid process taking up all of CPU?


Do you have:

any amber lights on the SCSI disks?

any mirrors or raidsets that are reconstructing?
Parker Johnson
Occasional Advisor

Re: cmascsid process taking up all of CPU?

Nope, no amber lights and no reconstructing RAID sets.
Parker Johnson
Occasional Advisor

Re: cmascsid process taking up all of CPU?

Got a response back from HP on my case and they are currently trying to reproduce it in their environment. The tech I spoke to thought the cmascsid process and my QLogic HBA were not playing nice. In the meantime, I was told to make the following adjustment to /opt/compaq/cma.conf and restart the hpasm service to work around the problem:

exclude cpqrid cmascsid
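For anyone applying the same change on several machines, here is a minimal sketch of doing it idempotently. The helper name `add_exclude_line` is made up for illustration, and the only thing it knows about cma.conf is the exact line quoted above; it makes no other assumption about the file's syntax.

```sh
#!/bin/sh
# add_exclude_line FILE (hypothetical helper): append the exclude line
# quoted in this post to FILE unless an identical line is already there.
add_exclude_line() {
    conf=$1
    line='exclude cpqrid cmascsid'
    [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }
    # -x matches whole lines only, -F treats the pattern as a fixed string
    if ! grep -qxF "$line" "$conf"; then
        cp -p "$conf" "$conf.bak" || return 1    # keep a backup before editing
        # if the file does not end in a newline, add one first
        [ -n "$(tail -c 1 "$conf")" ] && echo >> "$conf"
        printf '%s\n' "$line" >> "$conf"
    fi
}
```

Usage would be `add_exclude_line /opt/compaq/cma.conf` followed by a restart of the hpasm service (for example `/etc/init.d/hpasm restart`, assuming that is where the init script lives on your install).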
TJ_16
Frequent Advisor

Re: cmascsid process taking up all of CPU?

I am running RH 3.0 Update 3 with Qlogic HBA cards and I too am seeing cmascsid pegged out at 100% CPU usage.

Was wondering if you ever got a better fix than taking cmascsid out of the hpasm startup?

Thanks,
Parker Johnson
Occasional Advisor

Re: cmascsid process taking up all of CPU?

Nope, I never received a better explanation. The bottom line for me is that HP doesn't have any management tools ready for prime time on Linux. I am gonna have to resort to writing hokey scripts that work with the command-line ACU utility to pick up on bad drives. Too bad I'll never know about other failed components.
TJ_16
Frequent Advisor

Re: cmascsid process taking up all of CPU?

I am going to open up a ticket about this and if I learn anything will post here...

I find HP's Management Software to be severely lacking and quite a pain to run.
Mark Satayathum
Occasional Advisor

Re: cmascsid process taking up all of CPU?

We are experiencing the exact same thing (after being told by Red Hat and HP that the 7.11 psp is the panacea to all of our mgmt agent problems).
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=734349
Mike Jagdis
Advisor

Re: cmascsid process taking up all of CPU?

The explanation is somewhat involved...

If you trace cma*d you'll find that it doesn't do anything but open the device, ioctl, close. Admittedly rather more times than should be necessary but that's just incidental bad design.
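A minimal sketch of that trace, assuming strace is installed and you run it as root. The wrapper name `trace_device_calls` is made up for illustration; the strace options (-f, -T, -e trace=, -p) are standard ones.

```sh
#!/bin/sh
# trace_device_calls PID (hypothetical helper): attach strace to a running
# daemon and show only its open/ioctl/close calls. -T appends the time
# spent inside each call, which is where a slow close shows up.
trace_device_calls() {
    pid=$1
    case "$pid" in
        ''|*[!0-9]*) echo "usage: trace_device_calls PID" >&2; return 2 ;;
    esac
    # kill -0 sends no signal; it only checks that the process exists
    kill -0 "$pid" 2>/dev/null || { echo "no such process: $pid" >&2; return 1; }
    strace -f -T -e trace=open,ioctl,close -p "$pid"
}
```

Something like `trace_device_calls "$(pidof cmascsid)"` should do, assuming a single running instance. On much newer systems the C library opens files with openat rather than open, so that call would need adding to the list there.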

You'll find the delay - and system time consumption - seems to happen on the close. From here you need a fairly good working knowledge of the Linux kernel...

OK? Still with me then?

Run oprofile for a while and you'll find the cpu time is being consumed by invalidate_bdev. Which is interesting :-).

The function invalidate_bdev is called from kill_bdev, and kill_bdev is called from the block device release code. Release is what happens on last close. Now the monitoring daemon is opening the unpartitioned disk device, which it is pretty certain nothing else has open. (Offhand I'm not sure if even having an fs on the device counts as it being open. There are subtle differences and I *think* I'm right in saying that block device access and fs access are considered different at this level. Don't quote me or blame me!)

So, each close triggers invalidate_bdev. Why is this so bad? Well, the idea is that when the last close happens on a device you need to flush any cached data because, with much PC hardware, you can't be sure when the media gets changed. The function isn't *meant* to be called often. It works by scanning through the entire list of cached data for block devices to find and drop data related to the device being closed. So it sucks system time, and the amount is proportional to the amount of cached data (from any device) you have.
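A rough way to see this cost for yourself is to time a batch of open/close cycles on the device. This is only a sketch: `open_close_loop` is a made-up helper, and it does nothing but open the path read-only and close it again, which is enough to hit the last-close path when nothing else has the device open.

```sh
#!/bin/sh
# open_close_loop PATH [COUNT] (hypothetical helper): open PATH read-only
# and close it again COUNT times (default 100). Prints the number of
# completed cycles.
open_close_loop() {
    path=$1
    count=${2:-100}
    [ -r "$path" ] || { echo "cannot read: $path" >&2; return 1; }
    i=0
    while [ "$i" -lt "$count" ]; do
        # the redirection makes the shell open PATH, run the no-op, then close it
        true < "$path" || return 1
        i=$((i + 1))
    done
    echo "$i"
}
```

In bash, `time open_close_loop /dev/cciss/c0d0 100` (the device name is just an example) reports the system time used. If the explanation above holds, that figure should grow with the amount of cached data on the box and shrink sharply when some other process already has the device open.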

WORKAROUND:
All you need to do is to make sure that each time the cma*d daemon closes the device it isn't the *last* close - i.e. some other process has the device open. The other process doesn't even need to *do* anything. Try something along the lines of:

sh -c 'kill -STOP $$' < /dev/cciss/c0d0 > /dev/null 2>&1 &
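The one-liner above can be wrapped so that it is easy to run once per monitored disk and to undo later. A sketch only: `hold_open` and `release_holder` are made-up names, and the technique is exactly the same stopped-shell trick.

```sh
#!/bin/sh
# hold_open DEVICE (hypothetical helper): start a background shell that has
# DEVICE open on stdin and immediately stops itself, so the device stays
# open without using any CPU. Prints the PID of the holder.
hold_open() {
    dev=$1
    [ -r "$dev" ] || { echo "cannot read: $dev" >&2; return 1; }
    sh -c 'kill -STOP $$' < "$dev" > /dev/null 2>&1 &
    echo "$!"
}

# release_holder PID (hypothetical helper): end a holder started by hold_open.
# SIGKILL is used because a stopped process does not act on SIGTERM until
# it is continued.
release_holder() {
    kill -KILL "$1"
}
```

For example, `holder=$(hold_open /dev/cciss/c0d0)` at boot, and `release_holder "$holder"` if you ever need the device to be fully closed again. Substitute whatever device names your controller exposes.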

Hope that's all clear! (As mud... :-) )

(HP: As well as blind debugging I do Linux & OSS consultancy. I happen to know the answer to this one as it came up at a major investment bank...)
Stephen_126
Occasional Advisor

Re: cmascsid process taking up all of CPU?

Mike,

I saw this problem and slowed the loop until I could return to it (a cheat, yes!). I just ran into this thread.

I understand, but not where to insert this dummy close.

SuSE8/UL1.0-SP3

-Stephen