06-21-2004 03:45 AM
management agent performance issues on DL580 G2/RHEL 3
Well, sure enough, the cmaidad process seems to poll on a 15-second cycle, and it was always at the top of the run queue when we lost keyboard response. Running strace on cmaidad showed several ioctl calls similar to the following at every slowdown:
open("/dev/cciss/c0d0", O_RDONLY) = 3
ioctl(3, CCISS_PASSTHRU, 0xbfffddd4) = 0
close(3) = 0
If we disable cmaidad, the problem goes away.
Our DL580 has 2 x 2.5 GHz Xeons, with 8 GB of RAM and a Smart Array 5i+ running RAID 5.
The Red Hat install is fully patched, with kernel rev 2.4.21-15.ELsmp.
For hpasm/cmastor/cmanic we've tried versions 7.0.0-23/7.0.0-16/7.0.0-4 and 7.1.0-145/7.1.0-12/7.1.0-5 with the same result.
Interestingly, a DL380G3 with a 5i controller and an *identical* OS/software installation does not demonstrate the same problem.
Also, the affected DL580G2 previously had RHEL 2.1 with recent versions of the agents installed without exhibiting this issue.
The keyboard response is merely annoying, but we're also concerned about other impacts on performance that this may be causing.
Any thoughts?
Thanks
Matthew Lee
Fleming College
06-21-2004 11:43 PM
Re: management agent performance issues on DL580 G2/RHEL 3
06-22-2004 04:26 AM
Re: management agent performance issues on DL580 G2/RHEL 3
Thanks
Matthew
06-29-2004 01:58 AM
Re: management agent performance issues on DL580 G2/RHEL 3
I've had to turn off the agents; otherwise Oracle (the only app on the box) slows to a crawl during periods of medium-heavy activity.
As for nicing processes, per Vernon's suggestion, it has negligible impact, and it certainly shouldn't be necessary.
Anyone running the agents on a DL580G2 with RHEL3 and *not* having problems?
Thanks
Matthew
07-11-2004 10:55 PM
Re: management agent performance issues on DL580 G2/RHEL 3
This makes me think there's something wrong with the management agents as it's affecting only one class of server.
Just to check though, we're running a 5i for system and 6400 controller for data on the 580's, and only a 5i and/or 5300 on the 380's. Are you running the same?
Do you have a support contract? Have you logged a call or got it fixed yet? If not, I might log a call myself, as the amount of CPU time this process has clocked up on these servers is quite worrying...
07-12-2004 07:30 AM
Re: management agent performance issues on DL580 G2/RHEL 3
We're using the 5i for the OS and also have an external FC array for data connected via an Emulex HBA, but no 6400. We had a similar setup on our 380.
We're no further along in resolving the issue. We don't have a support contract beyond what the warranty provides, but like you, we're at the point where we'll need to open a call with HP to get this resolved.
Thanks
Matthew
07-26-2004 03:07 AM
Re: management agent performance issues on DL580 G2/RHEL 3
10-21-2004 11:56 AM
Re: management agent performance issues on DL580 G2/RHEL 3
10-21-2004 07:43 PM
Re: management agent performance issues on DL580 G2/RHEL 3
We do have a support contract, and I logged a call with HP themselves. My problem was more that the cmaidad process was using far too much processor time, but I guess the side effect could have been as you stated. We don't really use the console here, so I couldn't tell you!
Anyway, this problem is apparently now fixed in PSP7.2, which you'll note isn't out yet. HP are trying to do an interim release of just the agents, but it doesn't appear to be published yet.
My last update shows that the PSP release is due at the end of the year... I guess we'll have to wait!
10-23-2004 06:52 AM
Re: management agent performance issues on DL580 G2/RHEL 3
12-05-2004 09:28 AM
Re: management agent performance issues on DL580 G2/RHEL 3
Actually you're half way there. You just lack final kernel enlightenment :-)
-----------------------------------------------
The explanation is somewhat involved...
If you trace cma*d you'll find that it doesn't do anything but open the device, ioctl, close. Admittedly rather more times than should be necessary but that's just incidental bad design.
You'll find the delay - and system time consumption - seems to happen on the close. From here you need a fairly good working knowledge of the Linux kernel...
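If you want to see the cost of the open/close pair on its own, without the agent in the picture, something like the following rough sketch may help. The function name time_open_close is made up for illustration, and it assumes GNU date (for the %N nanosecond format). It only times an open followed by a close and issues no ioctl, so treat the numbers as indicative rather than exact.

```shell
#!/bin/sh
# time_open_close PATH [COUNT]   (hypothetical helper, not part of any HP tool)
# Opens PATH read-only and closes it again COUNT times (default 10),
# then prints the total elapsed time in milliseconds.
# Assumption: GNU date, whose %N format prints nanoseconds.
time_open_close() {
    path=$1
    count=${2:-10}
    start=$(date +%s%N)
    i=0
    while [ "$i" -lt "$count" ]; do
        # The redirection makes the shell open PATH, run the no-op
        # builtin "true", and close PATH again straight afterwards.
        true < "$path"
        i=$((i + 1))
    done
    end=$(date +%s%N)
    # Convert the nanosecond difference to milliseconds.
    echo $(( (end - start) / 1000000 ))
}
```

Run as root, time_open_close /dev/cciss/c0d0 20 would time twenty open/close cycles on the device from the original post. Comparing that figure on a busy box and a freshly booted one should show whether the close is where the time goes.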
OK? Still with me, then?
Run oprofile for a while and you'll find the CPU time is being consumed by invalidate_bdev. Which is interesting :-).
invalidate_bdev is called from kill_bdev. kill_bdev is called from the block device release code. Release is what happens on last close. Now, the monitoring daemon is opening the unpartitioned disk device, which it is pretty certain nothing else has open. (Offhand, I'm not sure if even having an fs on the device counts as it being open. There are subtle differences, and I *think* I'm right in saying that block device access and fs access are considered different at this level. Don't quote me or blame me!)
So, each close triggers invalidate_bdev. Why is this so bad? Well, the idea is that when the last close happens on a device you need to flush any cached data because, with much PC hardware, you can't be sure when the media gets changed. invalidate_bdev isn't *meant* to be called often. It works by scanning through the entire list of cached data for block devices to find and drop data related to the device being closed. So it sucks system time, and the amount is proportional to the amount of cached data you have (from any device).
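Since the cost scales with how much is cached, it can be worth checking that figure when the slowdowns occur. A minimal sketch, assuming the usual "Buffers:" and "Cached:" lines in /proc/meminfo (the function name cached_kb is invented here, and which of the two counters matters most for this code path is something I have not verified):

```shell
#!/bin/sh
# cached_kb [FILE]   (hypothetical helper)
# Prints the sum of the "Buffers:" and "Cached:" values, in kB, read
# from FILE, which defaults to /proc/meminfo. "SwapCached:" is not
# counted because its first field does not match exactly.
cached_kb() {
    awk '$1 == "Buffers:" || $1 == "Cached:" { sum += $2 }
         END { print sum + 0 }' "${1:-/proc/meminfo}"
}
```

On a box with 8 GB of RAM and a busy database, this number can be large, which would fit with the problem only showing up on the bigger, busier server.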
WORKAROUND:
All you need to do is to make sure that each time the cma*d daemon closes the device it isn't the *last* close - i.e. some other process has the device open. The other process doesn't even need to *do* anything. Try something along the lines of:
sh -c 'kill -STOP $$' < /dev/cciss/c0d0 > /dev/null 2>&1 &
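The one-liner above can be hard to undo later because the PID is lost. Here is the same trick wrapped in two small functions, as a sketch only: hold_open and release_hold are names made up for this post, not anything shipped with the agents.

```shell
#!/bin/sh
# hold_open DEVICE   (hypothetical helper)
# Starts a background shell whose stdin is DEVICE and has that shell
# stop itself with SIGSTOP. A stopped process keeps its file
# descriptors, so DEVICE stays open while the holder uses no CPU.
# Prints the PID of the holder so it can be released later.
hold_open() {
    sh -c 'kill -STOP $$' < "$1" > /dev/null 2>&1 &
    echo "$!"
}

# release_hold PID   (hypothetical helper)
# Ends the holder. SIGKILL is used because a stopped process would not
# act on SIGTERM until it was continued.
release_hold() {
    kill -KILL "$1"
}
```

Usage would be along the lines of holder=$(hold_open /dev/cciss/c0d0) as root, then release_hold "$holder" once fixed agents are installed. The holder does not survive a reboot, so it would need to be started again before the agents come up.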
Hope that's all clear! (As mud... :-) )
(HP: As well as blind debugging I do Linux & OSS consultancy. I happen to know the answer to this one as it came up at a major investment bank...)