Operating System - Tru64 Unix

Help with performance analysis

 
SOLVED
OFC_EDM
Respected Contributor


I'm working on a Tru64 OSF1 V5.1 2650 alpha server.

There's an application on the server which generates 3 types of reports. 1 of the reports now times out.

We're trying to identify where the problem lies.

I'm not used to analyzing performance on Tru64 (I have an HP-UX background) and was wondering:
1) where to start :)
2) What tools are available to collect data
3) What types of bottlenecks on Tru64 could cause a process to time out? (i.e. the equivalents of run queue, gbl mem cache, etc.)

Any advice I could get on this would be appreciated.

We suspect it's an App problem as I'm seeing the following while the report is running.

Low End
==========================================
load averages of 0.04, 0.03, 0.03 in top
CPU states: 0.0% user, 0.0%nice, 7.3% system, 92.6%idle
Memory Real 1051M/1979M act/tot Virtual: 3143M use/tot Free:751M


High End
==========================================
load averages of 1.03, 0.61, 0.38 in top
CPU states: 50.1% user, 0.0%nice, 1.2% system, 48.5%idle
Memory Real 1053M/1979M act/tot Virtual: 3143M use/tot Free:649M

So I'm wondering if there are any queues, kernel thresholds, etc. in Tru64 I should be looking at to tell whether the problem is at the OS level.
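
For comparing the two snapshots above, here is a minimal sketch in Python (not part of the original post). It parses the "CPU states" lines exactly as pasted here and estimates how many CPUs are busy. The CPU count is not stated in this thread, so `NCPUS = 2` is purely an assumption, and the regular expression is only known to fit the format shown above.

```python
import re

# Matches the "CPU states" line as pasted in this thread; other top
# versions may format it differently (assumption).
CPU_RE = re.compile(r"([\d.]+)%\s*(user|nice|system|idle)")


def parse_cpu_states(line):
    """Return a dict such as {'user': 50.1, 'idle': 48.5, ...}."""
    return {name: float(value) for value, name in CPU_RE.findall(line)}


def busy_cpus(states, ncpus):
    """Rough number of CPUs kept busy, derived from the non-idle percentage."""
    return (100.0 - states["idle"]) / 100.0 * ncpus


low = parse_cpu_states("CPU states: 0.0% user, 0.0%nice, 7.3% system, 92.6%idle")
high = parse_cpu_states("CPU states: 50.1% user, 0.0%nice, 1.2% system, 48.5%idle")

NCPUS = 2  # assumption: the CPU count is not given in the thread

for label, s in (("low", low), ("high", high)):
    print(f"{label}: user={s['user']}% system={s['system']}% "
          f"~{busy_cpus(s, NCPUS):.2f} CPUs busy")
```

If the two-CPU assumption holds, the high-end figure would be consistent with a single CPU-bound process, which would point at the application rather than at an OS-level queue. With a different CPU count the reading changes, so treat this as a way to frame the question, not as a conclusion.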

Cheers
The Devil is in the detail.
13 REPLIES
Steven Schweda
Honored Contributor

Re: Help with performance analysis

What do you mean by a process timing out?
Does it die unexpectedly? Does it get hung
up somewhere, and stop using CPU? Does it
get into a loop and use all the CPU? What,
exactly, happens?
OFC_EDM
Respected Contributor

Re: Help with performance analysis

The report is generated by a custom app and has vendor support.

The only information provided, which is a point of frustration, is that 1 of their 3 reports is timing out....and that's it. The vendor told the client it's an OS issue...we don't know how they concluded that...and now we have to show if it is or is not an OS issue.

They haven't even provided the name of the process.

All I have is the output from top in 1 minute increments from a window of time when they ran the reports.

I'm hoping to get them to re-run the report so that I can collect more detailed data which I can then analyze and hopefully find something useful.

But I have to figure out the appropriate tools to collect the information and which information will most likely identify an issue.

Very poor position to be in...

The Devil is in the detail.
Hein van den Heuvel
Honored Contributor
Solution

Re: Help with performance analysis


>> Tru64 OSF1 V5.1 2650 alpha server.

Clustered?

>> 1 of the reports now times out.

There is no such thing as 'report timeout' in Tru64. Nor in any OS for that matter.
If there is a timeout, then that is an application choice and its decision needs to be reviewed. With any luck the comments give a clue.

>> We're trying to identify where the problem lies.

Then see where the timeout is generated and try to describe in functional terms what was being waited on... for data to come in over the network?

>> I'm not used to analyzing performance on Tru64 (I have an HP-UX background) and was wondering:
1) where to start :)

Do NOT look at the OS.
Think application.

>> 2) What tools are available to collect data

Uh... "collect". Seriously. Comes with reasonable GUI.
Also 'sar', and folks often toss commands like vmstat, iostat into a data collection environment.

My favourite for 'live' watching is a 'freeware' tool called 'monitor', but it does not collect.

>> 3) What are the types of bottlenecks on Tru64 which could cause a process to time out? (aka run queue, gbl mem cache etc equivalents)

NONE. There is no such concept.

The biggest performance surprise Tru64 packs, which other systems do not have, is high write (allocation) activity from one cluster member to a mountpoint which another member owns (cfsmgr).

The biggest slowdown that Tru64, and every other Unix on the planet, can present is pretty much the old one: running low on memory, with the swapper sucking all the life out of the system disk. (vmstat 100 100)
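
A minimal sketch of a collection wrapper around such a command, assuming Python 3.7+ is available on the box (on a V5.1 system it may well not be, in which case a shell loop would do the same job). The names `stamp` and `run_and_log` are hypothetical helpers invented for this illustration. It only prefixes each output line with a wall-clock timestamp, so the log can be lined up with the time the report was run; it does not interpret any vmstat columns.

```python
import subprocess
import time


def stamp(lines, clock=time.time):
    """Yield each line prefixed with a wall-clock timestamp."""
    for line in lines:
        ts = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(clock()))
        yield f"{ts} {line.rstrip()}"


def run_and_log(cmd, out_path):
    """Run cmd and write each line of its output to out_path, timestamped."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc, \
            open(out_path, "w") as out:
        for stamped in stamp(proc.stdout):
            out.write(stamped + "\n")
            out.flush()  # keep the log current if the run is interrupted
    return proc.returncode
```

A call such as `run_and_log(["vmstat", "100", "100"], "vmstat.log")` would capture the command quoted above. For a problem that only shows for about a minute, a shorter interval is probably more useful, though that is a judgment call.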

>> Any advice I could get on this would be appreciated.

Engage a consultant to help you get up to speed in this new (to you) environment. (I know a few good ones :-).

>> We suspect it's an App problem as I'm seeing the following while the report is running.

Yes. Try to understand the application tasks. What is it reporting on? Where does the data come from? FTP? File searches? Oracle?

Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting



OFC_EDM
Respected Contributor

Re: Help with performance analysis

Thank you for that reply.

The only additional information I can provide is that:
1) It's not clustered
2) The app and the database are local on the host. So it "shouldn't" be a network issue.

I'm going to arrange another run of the report and collect more data with the "collect" utility and vmstat.

Best Regards,
Kevin
The Devil is in the detail.
OFC_EDM
Respected Contributor

Re: Help with performance analysis

Steven...

The CPU goes up to about 50% used for about a minute then all goes back to normal.

That's what I see from the OS side.

The "timeout" is terminology used by the client. I don't know whether that means it hangs and they give up or if it dies.
The Devil is in the detail.
Hein van den Heuvel
Honored Contributor

Re: Help with performance analysis

Tracing syscalls (truss, trace) has often given me clues for unknown applications.
Although it can be tedious, often generates reams of data, and may influence the measurement, it can be surprisingly helpful.

Maybe the Database has performance/usage reporting tools?

Hein.

Kapil Jha
Honored Contributor

Re: Help with performance analysis

Hey Kevin,
As said above, it is not an OS issue at all. Ask the vendor how they concluded that.
They must have some grounds for it.
CPU being at 50% does not mean it crashed the application, although it may be the cause, and your vendor is supposed to prove it.
They must have some traces or logs, which I suppose UNIX people may not be able to fully understand.
Hope this helps.
BR,
kapil
I am in this small bowl, I wane see the real world......
OFC_EDM
Respected Contributor

Re: Help with performance analysis

I put this issue back to the vendor, with a list of questions for them to address...

Waiting to see what comes of it.

For those waiting for points I'll do that once this is resolved.

Thanks for the help so far....your input has helped me move this forward.

Cheers
The Devil is in the detail.
Hein van den Heuvel
Honored Contributor

Re: Help with performance analysis

>> For those waiting for points I'll do that once this is resolved.

But I needs those points NOW

1) Buddy Dennis Handly is at 19,500+ today and moving quickly. I'd like to beat him to 20,000.

2) I'm in Las Vegas this week amongst several forum participants. I'll have reduced time to try to help out, but if I could make it there this week... that would guarantee a few free beverages!

Just kidding!
Points just happen and are not critical.
And free drinks and food also just happen during the HPTF.

Cheers,
Hein