Community Home > Servers and Operating Systems > HPE BladeSystem > BladeSystem - General
10-24-2010 06:07 PM
BL460c G6 CPU Spikes?
Jonathan was looking for advice on a customer issue:
*******************
My customer has seen a significant performance issue on the BL460c G6, where they are seeing CPU spikes every 30/120 seconds.
Can anyone advise whether they have seen this issue with BL460c G6 servers, and whether there is a resolution? I have searched the advisories and can find no reference to it.
*******************
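Because the spikes recur on a fixed interval, it may be worth confirming that interval from sampled data before chasing causes. A minimal Python sketch, assuming you already have CPU % readings taken at a fixed sampling period (for example, exported from Performance Monitor); the function name and the threshold are illustrative, not part of any HP or Microsoft tool:

```python
from typing import List, Sequence


def spike_intervals(samples: Sequence[float], period_s: float,
                    threshold: float) -> List[float]:
    """Seconds between the starts of successive CPU spikes.

    samples:   CPU % readings taken every period_s seconds (assumed input).
    threshold: reading at or above which a sample counts as "spiking".
    A spike starts at a high sample whose predecessor was below threshold,
    so several consecutive high samples count as one spike.
    """
    starts: List[float] = []
    prev_high = False
    for i, value in enumerate(samples):
        high = value >= threshold
        if high and not prev_high:
            starts.append(i * period_s)  # time at which this spike began
        prev_high = high
    # Gap between each spike start and the next one
    return [b - a for a, b in zip(starts, starts[1:])]
```

With 10-second samples `[5, 92, 6, 4, 88, 7, 5, 95, 6]` and a threshold of 80, the spikes start at 10 s, 40 s and 70 s, so the function returns `[30.0, 30.0]`. A steady gap like that makes it easier to line the spikes up against anything else running on a timer.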
Mark replied:
********************
I can only help slightly, and only if this is Windows-based. The following assumes this is a one-off blade, not lots of them doing the same thing.
Just getting %CPU for the process isn't particularly helpful, given that a process has both a user-mode element and a kernel-mode element. The SmartArray SAS/SATA driver is obviously a driver, so that's pure kernel, but the management agents have a user-mode element too.
BTW, I wouldn't suggest running without the SmartArray Event Notification, just in case you get an error such as a drive failure. Without it you wouldn't see the error until a reboot (and even then it would be at POST, so you may still miss it). The Event Notification tool passes the error to the System Log; the management agents see it there and send it on to the management station. If you ditch this, the system could be running with a non-redundant array for quite some time without you knowing about it.
So the driver (kernel) side is going to be difficult to troubleshoot. It's normally done with a utility called XPerf, but ideally you need the public symbols, which ISS L3 will not entertain supplying to you. So the question is whether you can reproduce it; if you can, get an ISS L2 (GCC) call raised so they can do the XPerf session.
Regarding the agents, I'd try to look at what the process is doing. Is it spending more time in kernel mode or in user space? What are the Interrupt Service Routine (ISR) percentage and the Deferred Procedure Call (DPC) percentage? If these are high compared with the process's CPU percentage, then the time is going in the kernel, and it's not the fault of the process. If it's pure user space, you can use tools like Process Monitor (from Sysinternals) to try to work out what it's doing, or, worst case, take a user-mode dump of the process when it hits the CPU percentage you think is bad (the Sysinternals tool ProcDump is really useful for that from a timing perspective). The user-mode dump can be analysed by a Microsoft engineer at L2, but to get to the nitty-gritty of the functions used you'd need public symbols, which is another chat with ISS L2/L3.
Those support calls would help, as the procedures for doing the above (particularly XPerf) are quite complicated.
*******************
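Mark's triage (split the process's CPU into user and kernel time, then compare interrupt and DPC time against it) can be written down as a small decision rule. This is a Python sketch under stated assumptions: the cumulative user/kernel CPU seconds are taken to come from something like the Win32 `GetProcessTimes` call, and the ISR/DPC figures from the `Processor(_Total)` counters `% Interrupt Time` and `% DPC Time`. The class, both functions and the 0.5 cut-off for "high compared with the process" are hypothetical, chosen only for illustration:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CpuTimes:
    user_s: float    # cumulative user-mode CPU seconds for the process
    kernel_s: float  # cumulative kernel-mode CPU seconds for the process


def mode_split(before: CpuTimes, after: CpuTimes,
               wall_s: float, n_cpus: int) -> Tuple[float, float]:
    """(user %, kernel %) of total machine capacity over one sampling interval."""
    capacity_s = wall_s * n_cpus  # CPU-seconds available across all logical CPUs
    user_pct = 100.0 * (after.user_s - before.user_s) / capacity_s
    kernel_pct = 100.0 * (after.kernel_s - before.kernel_s) / capacity_s
    return user_pct, kernel_pct


def where_is_the_time(user_pct: float, kernel_pct: float,
                      isr_pct: float, dpc_pct: float,
                      ratio: float = 0.5) -> str:
    """Rough triage: is the spike in drivers, the process's user code, or its kernel calls?

    isr_pct / dpc_pct: system-wide interrupt and DPC time, on the same
    whole-machine scale as mode_split's output (assumed).
    ratio: illustrative cut-off for "high compared with the process".
    """
    process_pct = user_pct + kernel_pct
    if process_pct == 0 and isr_pct + dpc_pct == 0:
        return "idle"
    if isr_pct + dpc_pct >= ratio * process_pct:
        return "isr/dpc"  # driver territory: candidate for an XPerf session
    if user_pct >= kernel_pct:
        return "user"     # Process Monitor, or a user-mode dump
    return "kernel"       # the process's own kernel-mode work
```

For example, a process that used 4 s of user time and 2 s of kernel time over a 2-second window on a 4-CPU machine works out at 50% user and 25% kernel of machine capacity; with only 3% combined ISR/DPC time, the rule points at user space, which is where Process Monitor or a user-mode dump comes in.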
And Richard had some input:
**************
If the "spikes" were short (much less than a second), I would suggest increasing the NIC's RX queue. This assumes the drops are happening at the NIC and not, say, up at a UDP socket buffer.
I'd probably also be inclined to make sure that the cores taking interrupts from the NICs were distinct from those on which these agents run.
********************
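Both of Richard's suggestions come down to simple arithmetic. A larger RX queue only helps if it can hold every packet that arrives while the host is stalled, and keeping the agents off the interrupt-handling cores means building a CPU affinity bitmask that excludes those cores. A Python sketch, assuming you know (or can estimate) the packet rate, the stall length and which logical CPUs service the NIC interrupts; the function names and the figures in the usage note are hypothetical:

```python
from typing import Iterable


def rx_descriptors_needed(packets_per_s: int, stall_ms: int) -> int:
    """RX descriptors needed to buffer every packet arriving during a stall.

    Integer ceiling of packets_per_s * stall_ms / 1000, avoiding float rounding.
    """
    return -(-packets_per_s * stall_ms // 1000)


def agent_affinity_mask(n_cpus: int, nic_irq_cpus: Iterable[int]) -> int:
    """Bitmask of logical CPUs that do NOT take NIC interrupts (bit i = CPU i)."""
    all_cpus = (1 << n_cpus) - 1
    irq_bits = 0
    for cpu in nic_irq_cpus:
        if not 0 <= cpu < n_cpus:
            raise ValueError(f"CPU {cpu} out of range for {n_cpus} CPUs")
        irq_bits |= 1 << cpu
    mask = all_cpus & ~irq_bits
    if mask == 0:
        raise ValueError("no CPUs left for the agents")
    return mask
```

With made-up figures of 100,000 packets/s and a 20 ms stall, `rx_descriptors_needed(100_000, 20)` gives 2000, so a ring smaller than that would still overflow. If CPUs 0 and 1 of an 8-CPU blade take the NIC interrupts, `agent_affinity_mask(8, [0, 1])` gives 252, or `fc` in the hex form Windows affinity settings are usually expressed in.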
Anyone else have some input?