12-24-2007 08:54 AM
Diagnosing performance issues
We had an issue recently where there was no paging, no memory contention and no CPU contention, yet more than 300 processes were in the waiting state. The whole system became unresponsive for approximately 5 minutes, with nothing in syslog etc. I gathered the output from vmstat, swapinfo, getsysinfo and ps (with UNIX95, showing each process's memory usage), but I am not really getting anywhere in finding the root cause...
Any pointers greatly appreciated !
12-24-2007 09:06 AM
Re: Diagnosing performance issues
12-24-2007 09:34 AM
Re: Diagnosing performance issues
While I do have glance, I have never used it in anger. If there are any docs suggesting what to look for in glance, they would be a real help. My problem at the moment is that I don't know what I am looking for, so even with glance open, unless something jumps out at me, I am pretty much in the dark.
12-24-2007 09:35 AM
Re: Diagnosing performance issues
Complete diagnostics:
http://www.hpux.ws/?p=6
Based on what you posted, the system may simply have too many processes running on it. These symptoms point to a process tying the CPU up in I/O. I/O wait can cause this if all the processes are waiting for I/O or for one another.
SEP
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
12-24-2007 10:27 AM
Re: Diagnosing performance issues
I'm not afraid of reading - just have not yet found anything comprehensive which gives any detail about the values shown and how to interpret / what to look for.
12-24-2007 10:57 AM
Re: Diagnosing performance issues
If you were seeing occasional significant slowdowns under load already then yes, treat it as a performance problem with a very steep knee.
>> during the problem, I could not run *any* commands
So how do you know there was no paging/memory contention and so on, as indicated? Some long-term log? Some vmstat running, perchance?
A long-term log would be handy to catch this should it happen again.
What was still working?
Sounds like you got terminal echo at least.
Did any slaves / daemons say anything in their reports?
It could be a high-priority CPU loop, but this sounds like a connectivity issue: some network or fibre switch going burp.
Was there anyone physically near the system at the time of the problem?
Something powered down accidentally and reconnected? You'd expect errors and/or time-outs, but still...
Check for reboots on all controllers, switches and such.
good luck!
Hein van den Heuvel
HvdH Performance Consulting.
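A long-term log of the kind suggested above could be sketched roughly as follows. This is only an illustration, assuming Python 3 is available on the box (it may not be on HP-UX); the function name `log_with_timestamps` and the log path in the usage comment are hypothetical. It runs a sampling command such as `vmstat 5` and prefixes every output line with the wall-clock time, so a stall shows up as a jump in the timestamps.

```python
import subprocess
import time
from typing import Callable, Sequence


def log_with_timestamps(
    cmd: Sequence[str],
    log_path: str,
    clock: Callable[[], float] = time.time,
) -> int:
    """Run cmd and append each line it prints to log_path with a timestamp.

    Returns the number of lines written once the command exits.
    """
    written = 0
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc, open(log_path, "a", buffering=1) as log:  # line-buffered file
        for line in proc.stdout:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(clock()))
            log.write(f"{stamp} {line.rstrip()}\n")
            written += 1
    return written


# Hypothetical usage (runs until interrupted; the path is an assumption):
# log_with_timestamps(["vmstat", "5"], "/var/tmp/vmstat.log")
```

If the logger itself stalls during a hang, nothing is written for that period, but the missing timestamps still mark exactly when the stall began and ended.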
12-24-2007 11:25 AM
Re: Diagnosing performance issues
This issue happened twice within one hour and at no other time. Looking through the sar and vmstat output from just after the system came back to life, I am still struggling to work out what caused it. We have Patrol on the server, and the data it gathered shows only the number of processes in a waiting state increasing from 0 to 300 for the duration of the outages. Nothing else looks out of the ordinary.
The one thing that jumps out at me is that the first hang occurred immediately after Oracle had a failed shutdown and the DBAs were investigating, and the second hang occurred after they gave up (when they reported the problem to me) and retried sorting Oracle out. I will ask the DBAs for more info about what they were doing, but I wouldn't have thought Oracle should be able to almost hang the whole server, no matter what it was doing.
>>> during the problem, I could not run *any* commands
>>So how do you know there was no paging/memory contention and so on as indicated. Some long term log? Some vmstat running per chance?
I was able to establish a new logon onto the server, but from that session, running commands such as uptime or glance just didn't do anything. Once the slowdown ceased, they all sprang into life.
I did look at the measureware stats to conclude that there was no paging etc., *BUT* having just looked a second time, I see I didn't pay enough attention: there is no measureware data for the duration of the two hangs. There is data immediately before and after each lockup, but nothing during.
>>Was there anyone physically near the system at the time of the problem?
No - server was in a locked room with no-one in or near it.
>>Something powered down accidentally and reconnected? You'd expect errors and/or time-outs, but still...
I had connectivity into the system as I was able to open a new ssh session as root.
If there was an issue with the SAN disks, or network, I would have expected errors in syslog/cstm/event.log... there is nothing.
Interestingly, syslog has got entries for my ssh connections etc so syslog was clearly working... I am starting to think the issue was only affecting new processes (but then if that was the case, why would ssh still be able to fork and give me a new session?)
>>Check for reboots on all controllers, switches and such.
I will do this when I am next in the office but I am still suspicious it was something local to the server.
Thanks for all the feedback so far guys...
12-24-2007 06:41 PM
Re: Diagnosing performance issues
I have observed many system hangs, but I never had a problem collecting the mwa/PV statistics, which are very helpful in determining what was happening on the system. (Syslog was getting updated, so why not the mwa data, which also resides in /var? The mwa processes are definitely not new and may well have been running for a long time.)
Do you see any /var/tombstones/ts99 created?
12-24-2007 08:04 PM
Re: Diagnosing performance issues
Bill Hassell, sysadmin
12-25-2007 07:33 AM
Re: Diagnosing performance issues
The measureware stats are simply missing for the two windows. There is an entry at 20:40, then the next entry is at 21:07, continuing until 21:25, and then a gap until 21:40.
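For anyone wanting to pin down such windows from exported sample times, here is a minimal sketch in Python. It assumes a nominal 5-minute logging interval, which is an assumption on my part and should be checked against the collector's configuration. The function name `find_gaps` is hypothetical. Only 20:40, 21:07, 21:25 and 21:40 come from the post; the other sample times are made up to fill out the series.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple


def find_gaps(
    samples: Sequence[datetime],
    expected: timedelta,
    tolerance: float = 1.5,
) -> List[Tuple[datetime, datetime]]:
    """Return (last_before, first_after) pairs where logging skipped samples.

    A gap is any spacing larger than expected * tolerance.
    """
    gaps = []
    for earlier, later in zip(samples, samples[1:]):
        if later - earlier > expected * tolerance:
            gaps.append((earlier, later))
    return gaps


def at(hhmm: str) -> datetime:
    """Parse HH:MM; the date part is a placeholder and is not meaningful."""
    return datetime.strptime(hhmm, "%H:%M")


# 20:40, 21:07, 21:25 and 21:40 are from the post; the rest are illustrative.
times = [at(t) for t in ("20:35", "20:40", "21:07", "21:10", "21:15",
                         "21:20", "21:25", "21:40", "21:45")]
for start, end in find_gaps(times, expected=timedelta(minutes=5)):
    print(f"no data between {start:%H:%M} and {end:%H:%M}")
```

With these sample times the two windows reported are 20:40 to 21:07 and 21:25 to 21:40, which can then be lined up against the Oracle shutdown attempts and any SAN or switch logs.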
>>do you see any /var/tombstones/ts99 created?
Nothing.
>>Do you have a NFS or SAMBA/CIFS filesystems mounted to this system? How about SAN disks?
We have NFS mounts, although there were no issues on other systems accessing these same mounts, and nothing in any logs suggesting issues accessing these filesystems.
There are SAN disks (a *lot* of SAN disks) but I don't have access to the arrays/switches, so I will have to ask the Storage guys to look at this. Nothing was reported in syslog, although we have already identified that during the problem some processes, e.g. MWA, were not writing to their logs, so this isn't conclusive. I will have to see what our Storage guys can tell me about the SAN at that time.
12-25-2007 04:18 PM
Re: Diagnosing performance issues
Bill Hassell, sysadmin
12-25-2007 05:34 PM
Re: Diagnosing performance issues
Here is a quick guide to performance troubleshooting. Hope this helps.
WK
don't forget to assign points
12-26-2007 12:55 PM
Re: Diagnosing performance issues
What is your DNS configuration?
Contents of /etc/nsswitch.hp_defaults or /etc/nsswitch.conf
and /etc/resolv.conf
If you are using DNS, are the hosts doing the resolving under stress at certain times? That could cause freezes on your host when names cannot be resolved quickly.
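One rough way to check whether name resolution is slow is to time lookups through the system resolver. The sketch below assumes Python 3 is available; the function name `time_lookup` is hypothetical. `socket.getaddrinfo` goes through the platform's normal resolution path, so it should reflect whatever order the name service switch configuration specifies.

```python
import socket
import time
from typing import Optional, Tuple


def time_lookup(hostname: str, port: Optional[int] = None) -> Tuple[float, bool]:
    """Resolve hostname once; return (elapsed_seconds, resolved_ok)."""
    start = time.monotonic()
    try:
        socket.getaddrinfo(hostname, port)
        resolved = True
    except socket.gaierror:
        # The lookup failed; how long the failure took is still informative.
        resolved = False
    return time.monotonic() - start, resolved


elapsed, ok = time_lookup("localhost")
print(f"localhost: {elapsed:.3f}s resolved={ok}")
```

Running this periodically against a few names the applications actually use, and logging the results, would show whether lookups take seconds rather than milliseconds around the time of a freeze.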
12-26-2007 03:30 PM
Re: Diagnosing performance issues
Does the CPU/memory/swap utilization look normal in the time windows for which you have data? Or was it gradually increasing until data collection stopped at some point?
I hope this server is not part of a cluster. Otherwise the server could have done a TOC, and you may have a full crash dump for further analysis.
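To answer the "gradually increasing" question with something firmer than eyeballing, a least-squares slope over the last few samples before the gap is one option. This is only a sketch: the function name `slope_per_sample` is hypothetical and the utilization figures in the example are invented, not taken from this system.

```python
from typing import Sequence


def slope_per_sample(values: Sequence[float]) -> float:
    """Least-squares slope of values against their index (units per sample)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


# Invented example: memory utilization (%) in the samples before a gap.
before_gap = [61.0, 62.0, 61.5, 62.5, 62.0]
print(f"trend: {slope_per_sample(before_gap):+.2f} % per sample")
```

A slope near zero right up to the gap would suggest the hang was abrupt rather than the end of a steady climb in utilization.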