Operating System - HP-UX

Re: Need help troubleshooting performance issue

 
Raj D.
Honored Contributor

Re: Need help troubleshooting performance issue

Again, Tony,

From the output:
kernel (pid=12326) --> top CPU consumer
java process (pid=18018) --> top memory consumer
swap utilization --> normal
disk I/O --> needs to be measured at the exact time of the issue, or historically while heavy jobs are running.


- Also, this data was taken when CPU utilization was around 55%, not during the 100% period.

You can prepare one or more scripts in advance and have them ready to run during the performance crunch to pinpoint the cause.
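A minimal sketch of such a capture script, written in Python for illustration. The command list (sar -u, sar -d, vmstat, swapinfo -tam, ps -ef), the output directory and the sampling defaults are assumptions, not something posted in this thread; adjust them to what your HP-UX build provides. All collectors are started together so their sampling windows overlap.

```python
import shutil
import subprocess
import time
from contextlib import ExitStack
from pathlib import Path


def build_snapshot_commands(interval, count):
    """Collectors to start together when the slowdown begins (hypothetical selection)."""
    return {
        "sar_cpu": ["sar", "-u", str(interval), str(count)],   # CPU utilization
        "sar_disk": ["sar", "-d", str(interval), str(count)],  # per-disk activity
        "vmstat": ["vmstat", str(interval), str(count)],       # run queue, paging
        "swapinfo": ["swapinfo", "-tam"],                       # one-shot swap summary (MB)
        "ps": ["ps", "-ef"],                                    # one-shot process list
    }


def run_snapshot(outdir, interval=5, count=12):
    """Run all collectors in parallel, one output file each; return names not found on PATH."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    missing, procs = [], []
    with ExitStack() as stack:
        for name, argv in build_snapshot_commands(interval, count).items():
            if shutil.which(argv[0]) is None:
                missing.append(name)
                continue
            out = stack.enter_context(open(outdir / f"{name}-{stamp}.txt", "w"))
            procs.append(subprocess.Popen(argv, stdout=out, stderr=subprocess.STDOUT))
        for proc in procs:
            proc.wait()  # sar/vmstat finish after roughly interval * count seconds
    return missing
```

For example, `run_snapshot("/var/tmp/perfsnap", interval=5, count=360)` (a hypothetical path) would sample for 30 minutes, long enough to cover a slowdown like the one described later in this thread.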

Hth,
Raj.
" If u think u can , If u think u cannot , - You are always Right . "
Tony Williams
Regular Advisor

Re: Need help troubleshooting performance issue

Thanks Raj,

Here the question would be:
- Did you see any increased load at that time, i.e. more Oracle processes, more Java processes, more applications than usual, or more batch jobs executed?

No increase. Every process that was running during the problem was also running earlier in the day.

- How many CPUs do you have? What is the server model?

16, on a Montecito-based Superdome.

- How many processes were running during that time, and how many run under the usual load?

A modest increase in active processes. For most of the day there were 1800 ~ 2000 active processes. During the 30-minute problem they jumped to 2400 ~ 2500, then dropped back to 2000.

- What was the load factor at that time? Obviously it would be more than 1 or 2.

A big increase in load, >6.

- What does the MeasureWare 'extract' report show for historical CPU/memory/I/O/swap/network in/out data?
From the above we can narrow down the cause.

I have attached a text file of global metrics for the 30-minute period in which the problem happened.
Tony Williams
Regular Advisor

Re: Need help troubleshooting performance issue

Hi Michael,

What is this process?

1049892 R 18018 1 java : First in virtual memory, and its parent is init. Is it normal for it to belong to init, or should it have a parent PID?

This is a SAP NetWeaver process. I don't know if it's normal, but whenever I look at that process its PPID is init.

What is this process?

90.82 R 18669 18375 jlaunch : 2nd in CPU activity, behind only the kernel.

It's a second NetWeaver process; both have virtual memory sizes of > 6 GB.

Question to Others:

Is it normal for 'kernel' to be consuming the most CPU time?

'kernel' is a SAP application process, so yes, it's normal. SAP and Oracle consume a lot of this server, and it's normal for most system resources to be high (~80%). I'm pretty sure it's one of 5 processes that pushed the server over the edge: the 3 SAP processes, an Oracle Enterprise Manager (OEM) process, or a backup process. The server goes back to normal when the OEM process is stopped.

So was it that one process, and if so, what did it do to overconsume the server? Or was it a bad combination of the 5 processes all increasing their load at the same moment?
Michael Steele_2
Honored Contributor

Re: Need help troubleshooting performance issue

It's hard for me to tell because of the formatting, but from 16:30 to 16:55 disk I/O was at 100%.

Would you attach the totals of the sar -d report?
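If the full sar -d output is long, the totals can be pulled out mechanically. A minimal sketch, assuming HP-UX-style sar -d layout (columns device, %busy, avque, r+w/s, blks/s, avwait, avserv, with only the first device row of each block labelled and the rest indented); it returns the devices from the "Average" block ordered by %busy.

```python
def parse_sar_d_averages(text):
    """Return (device, %busy) pairs from the Average block of `sar -d` output, busiest first.

    Assumes HP-UX-style columns (device %busy avque r+w/s blks/s avwait avserv),
    where only the first device row of a block carries the timestamp/"Average" label
    and the following rows are indented continuation lines.
    """
    rows = []
    in_average = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            in_average = False  # a blank line ends a block
            continue
        if fields[0] == "Average":
            in_average = True
            fields = fields[1:]
        elif not (in_average and line[0].isspace()):
            in_average = False  # headers and timestamped interval rows are skipped
            continue
        if len(fields) >= 2:
            rows.append((fields[0], float(fields[1])))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Sorting by %busy puts any disk sitting near 100% at the top, which is the quickest way to see whether a single device or the whole I/O subsystem was saturated.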
Support Fatherhood - Stop Family Law
Raj D.
Honored Contributor

Re: Need help troubleshooting performance issue

Tony,

>> a modest increase in active processes, for most of the day active processes were 1800 ~ 2000. During the 30 minute problem the processes jumped up to 2400 ~ 2500, then back down to 2000.

- Well, going from 2000 to 2400 processes is a sizeable bump, and it will consume a large amount of resources. In this case the extra processes are CPU-intensive, since CPU consumption went up.



>> A big increase in load >6,
- This is a huge load for an HP-UX system; I have seen a load factor of 3 to 4 make a server freeze.

- 16:30 to 16:55 CPU utilization was 100%.
- At that time the only noticeable change is a small increase in swap usage: 4%.
That means the additional processes are consuming more CPU.
- The next step would be to track down the process and application details and figure out whether it is normal for those extra processes to consume 70% of the CPU,
as it was bumped from 30% to 70%.
I have seen a 128-CPU Montecito Superdome perform poorly as load increased, so the team that puts load on that server keeps asking us how much load there is and increases it accordingly.

- If you diff the process list (ps -ef) taken under normal load against the one taken under the increased load, you can tell the application team that these 400 extra processes caused CPU to go from 30% to 70%, and ask them to verify whether that is normal. If it is normal, then the system may need more 'horsepower'.
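The ps -ef comparison can be done with a short script. A minimal sketch, assuming standard `ps -ef` columns (UID PID PPID C STIME TTY TIME COMMAND) with the header on the first line; because STIME can be one token ("16:31:10") or two ("Mar 12"), it locates the TIME column by pattern. The function names are hypothetical. It counts the new processes per command, so the application team sees which programs account for the extra 400.

```python
import re
from collections import Counter

# Cumulative CPU TIME column of `ps -ef`, e.g. "0:05" or "123:45" (exactly one colon,
# so it cannot be confused with an hh:mm:ss STIME value).
_TIME_FIELD = re.compile(r"^\d+:\d{2}$")


def parse_ps_ef(text):
    """Map PID -> command basename from `ps -ef` output; the first line is taken as the header."""
    procs = {}
    for line in text.splitlines()[1:]:
        fields = line.split()
        # STIME is one token ("16:31:10") or two ("Mar 12"), so find TIME by pattern.
        for j in range(4, len(fields) - 1):
            if _TIME_FIELD.match(fields[j]):
                procs[fields[1]] = fields[j + 1].rsplit("/", 1)[-1]
                break
    return procs


def new_processes_by_command(baseline_ps, busy_ps):
    """Count processes present in the busy snapshot but not in the baseline, per command."""
    before = parse_ps_ef(baseline_ps)
    after = parse_ps_ef(busy_ps)
    # A PID reused by a different command also counts as new.
    return Counter(cmd for pid, cmd in after.items() if before.get(pid) != cmd)
```

`new_processes_by_command(normal_snapshot, busy_snapshot).most_common(10)` then lists the ten commands that grew the most between the two snapshots.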


Hth,
Raj.
Raj D.
Honored Contributor

Re: Need help troubleshooting performance issue

Correction:
bumped 30% to 70% --> should read "30% to 100%"
[That means a sudden 70-percentage-point increase in CPU resources.]
Dennis Handly
Acclaimed Contributor

Re: Need help troubleshooting performance issue

memory 524023 124095 399928 24%
total 1251063 691602 559321 55%

You actually have 524 GB of memory and an extra ~700 GB of device swap?

You probably should remove lots of that device swap.
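The size of that device swap follows from the two quoted lines. A minimal sketch, assuming they come from `swapinfo -tam` (TYPE AVAIL USED FREE PCT-USED, values in MB) and that the "total" row's AVAIL sums the device, filesystem and memory (pseudo-swap) entries; the thread does not state which flags were used.

```python
def swap_summary_mb(swapinfo_output):
    """Return (memory_mb, total_mb, swap_beyond_memory_mb) from `swapinfo -tam` output.

    Assumes the TYPE AVAIL USED FREE PCT-USED layout, where the "total" row's AVAIL
    sums device, filesystem and memory (pseudo-swap) entries, all in MB with -m.
    """
    avail = {}
    for line in swapinfo_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in ("memory", "total"):
            avail[fields[0]] = int(fields[1])  # AVAIL column
    memory_mb, total_mb = avail["memory"], avail["total"]
    return memory_mb, total_mb, total_mb - memory_mb


excerpt = """\
memory 524023 124095 399928 24%
total 1251063 691602 559321 55%
"""
print(swap_summary_mb(excerpt))  # (524023, 1251063, 727040)
```

727040 MB is exactly 710 GiB, which is the "extra ~700 GB" beyond physical memory.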
Michael Steele_2
Honored Contributor

Re: Need help troubleshooting performance issue

Note: you attached your collection script but not the sar -d data. Please provide only the TOTALS!

So you've got a process bottleneck, a cpu bottleneck and a disk bottleneck. But which process caused it?
Michael Steele_2
Honored Contributor

Re: Need help troubleshooting performance issue

16:35 - disk bottleneck
16:50 - CPU bottleneck
17:05 - free memory drops from 651 x 10**6 MB
-to- 9 x 10**6 MB, or about 98% less than normal if 651 is normal.
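The size of that drop is easy to check. A minimal helper, assuming 651 is the normal free-memory level as stated; the units cancel, so the 10**6 MB scale does not affect the percentage.

```python
def percent_drop(normal, observed):
    """Percentage reduction from a baseline (normal) value to an observed value."""
    return (normal - observed) / normal * 100.0


print(f"{percent_drop(651, 9):.1f}%")  # 98.6%
```

Only 9/651, about 1.4%, of the normal free memory was left at 17:05.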

So the first thing to happen was a disk bottleneck, based on your MWA (MeasureWare) data.
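The reasoning above, "which resource saturated first", can be made mechanical: for each metric, find the first sample where it crossed a threshold, then sort by time. A minimal sketch; the metric names, thresholds and intermediate values are illustrative placeholders, not MWA metric names or the posted data, and only the 16:35 / 16:50 / 17:05 ordering mirrors the timeline above.

```python
def bottleneck_order(samples, rules):
    """Return [(metric, first_time_hit), ...] ordered by when each threshold was first hit.

    samples: chronological list of (time_label, {metric: value}).
    rules:   {metric: (limit, "above" | "below")}.
    """
    first_hit = {}
    for label, metrics in samples:
        for name, (limit, direction) in rules.items():
            if name in first_hit or name not in metrics:
                continue
            value = metrics[name]
            crossed = value >= limit if direction == "above" else value <= limit
            if crossed:
                first_hit[name] = label
    # Labels are zero-padded "HH:MM" strings from the same day, so string order is time order.
    return sorted(first_hit.items(), key=lambda item: item[1])


# Illustrative values only (not the posted MWA data); metric names are placeholders.
samples = [
    ("16:30", {"disk_busy_pct": 60, "cpu_util_pct": 55, "free_mem": 651}),
    ("16:35", {"disk_busy_pct": 100, "cpu_util_pct": 70, "free_mem": 600}),
    ("16:50", {"disk_busy_pct": 100, "cpu_util_pct": 100, "free_mem": 300}),
    ("17:05", {"disk_busy_pct": 100, "cpu_util_pct": 100, "free_mem": 9}),
]
rules = {"disk_busy_pct": (95, "above"), "cpu_util_pct": (95, "above"), "free_mem": (50, "below")}
print(bottleneck_order(samples, rules))
# [('disk_busy_pct', '16:35'), ('cpu_util_pct', '16:50'), ('free_mem', '17:05')]
```

Feeding it a real extract (with the thresholds you consider "saturated") gives the same first/second/third ordering without eyeballing a badly formatted report.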

Paging jumped astronomically, which explains the disk bottleneck.

A high-priority page-in request will cause processing to stop until the page is found.

This is most certainly an application issue.

What in the application causes high-priority requests?

What was running or happening, and what were the users doing, at 16:45?

Note: 16:45 is end of day, so possibly some monster report / select statement.
Michael Steele_2
Honored Contributor

Re: Need help troubleshooting performance issue

Hi

I think this is a global select statement. Why? It's not just one high-priority request, it's a lot of them: so many that memory filled up while the request was still incomplete and still looking for more when the box crashed.

I also think this was run by a power user in SAP. It fits: basic users aren't going to have high-priority privileges. You can verify user priorities with SAP.

That leaves an SAP admin, and it's going to be a fight to get it out of a colleague.