Operating System - HP-UX

Karthik S S
Honored Contributor

Feedback required on Diagnostics Products Usage

Dear All,

Our IT Group wants to find out if Administrators are making use of Diagnostics tools that are available with HP-UX.

1. Which tools do you commonly use to diagnose/troubleshoot an HP-UX system (like STM/logtool/offline diag)? If you use any such tool, how frequently do you use it?

2. What features do you find very useful with these tools?

3. What new features would you like to see included in these tools?

4. Please post any success stories you have had with these tools.

Thanks in advance for your time and valuable input.

Thanks,
Karthik S S
For a list of all the ways technology has failed to improve the quality of life, please press three. - Alice Kahn
Steven E. Protter
Exalted Contributor

Re: Feedback required on Diagnostics Products Usage

1. I regularly use stm, mstm and xstm. xstm is for showing management the red indicators so they agree to downtime.

2. I like the ability to test hardware and know when it's bad.

3. An easier user interface.

4. I just used cstm to figure out definitively that my CPU was not about to fail when performance got all screwed up.

I also like EMS though I've just begun to really use it.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Naveej.K.A
Honored Contributor

Re: Feedback required on Diagnostics Products Usage

Hi Karthik,

I have been using nickel to collect all the hardware diagnostics. It creates a wonderful diagnostics HTML page for you. Internally, the nickel script simply calls the cstm utility.

With best wishes
naveej
practice makes a man perfect!!!
Hoefnix
Honored Contributor

Re: Feedback required on Diagnostics Products Usage

Hi,

We use stm (xstm ...) to analyse, for example, EMS messages. EMS alarms are picked up by ITO, which generates tickets.

We also use glance/top/sar/vmstat for performance troubleshooting, and we are also lucky to have MeasureWare and PerfView on our systems to help with tuning and to predict performance trouble.

So all of the above is very useful for us.

Regards,
Peter
Naveej.K.A
Honored Contributor

Re: Feedback required on Diagnostics Products Usage

To add to my previous reply:

1. To diagnose or troubleshoot an HP-UX system, we constantly use top/sar/vmstat/iostat/bdf output to work out roughly where the problem could be. To diagnose at the hardware level we use mstm/cstm, or nickel as I already mentioned.

2. sar has been helpful in showing us the activity on our block devices, using the -d option.

3. I haven't thought about this yet.

4. Once we had a system crash, and from the cstm output we could clearly see that we had a defective memory module, as it was showing a lot of single-bit errors. We replaced the DIMM and the system was up and running again.
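As a rough illustration of the `sar -d` point above, here is a minimal Python sketch that parses `sar -d` style text and lists the devices whose %busy reached a threshold. It assumes the HP-UX column order `device %busy avque r+w/s blks/s avwait avserv`, with the interval timestamp printed only on the first row of each interval; please check this against your own release. The function names, the 50% threshold and the sample figures are hypothetical.

```python
import re

# Assumed HP-UX style column order for `sar -d`; verify against your release.
COLUMNS = ["device", "%busy", "avque", "r+w/s", "blks/s", "avwait", "avserv"]
TIME_RE = re.compile(r"^\d{1,2}:\d{2}:\d{2}$")

# Hypothetical sample output: the figures are made up for illustration.
SAMPLE = """\
10:00:00   device   %busy   avque   r+w/s  blks/s  avwait  avserv
10:00:05   c1t6d0   85.00    0.50      40     640    5.00   21.00
           c2t6d0    3.00    0.50       2      32    4.00    8.00
"""


def parse_sar_d(text):
    """Return one dict per device sample found in sar -d style text."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Average":
            continue  # skip blank lines and the summary block
        if TIME_RE.match(fields[0]):
            fields = fields[1:]  # drop the interval timestamp
        if len(fields) != len(COLUMNS) or fields[0] == "device":
            continue  # skip banners and column headers
        try:
            values = [float(f) for f in fields[1:]]
        except ValueError:
            continue  # not a data row
        rows.append(dict(zip(COLUMNS, [fields[0]] + values)))
    return rows


def busy_devices(rows, busy_threshold=50.0):
    """Sorted names of devices whose %busy reached the threshold in any sample."""
    return sorted({r["device"] for r in rows if r["%busy"] >= busy_threshold})


if __name__ == "__main__":
    print(busy_devices(parse_sar_d(SAMPLE)))  # prints ['c1t6d0']
```

In practice you would feed it the captured text of something like `sar -d 5 12` instead of `SAMPLE`; the avserv column is another candidate for flagging a disk that is starting to fail.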

With best wishes
naveej
Robert Binkhorst
Trusted Contributor

Re: Feedback required on Diagnostics Products Usage

Hi,

1. We use cstm/glance/top/perfview/sar/vmstat/EMS regularly. We have also implemented monitoring for proactive checks and a syslog host for reactive checks. We're looking into using MC/SG Consistency software to regularly check the consistency of our clusters.

2. I like glance very much as an overview tool, plus it has great capabilities for digging deeper. I use perfview to drill down through the history of the box.

3. I'm not missing anything serious in these tools at the moment.

4. Recently we've used sar/glance to diagnose a performance problem on a disk (which was starting to fail) and the disk has been replaced before it really failed.

HTH,

Robert
linux: the choice of a GNU generation
Colin Topliss
Esteemed Contributor

Re: Feedback required on Diagnostics Products Usage

- mstm, cstm, xstm (depending on whether I want to query something specific or take a general look at health). I use these quite often, as they give you lots of diagnostic information.

- EMS. This is used to generate mail/ITO alerts. It's running all of the time. The problem with it is that it can generate false alarms (depending on the firmware revisions of components). We see alerts for tape problems and power problems where no such problem exists.
We also sometimes get large numbers of alerts. I haven't found a way to set thresholds (i.e. only send an EMS alert if you get 10 specific errors in a certain timeframe).

- ISEE. This is HP's latest offering. It automatically contacts HP when a problem occurs. It's proactive (you may find HP calling you and telling you there is a problem before you know you have one). The early version was a pain. Security may have concerns, as it needs to make an HTTP connection to HP. It's not exactly easy to set up, in that some of the questions are not very clear. If you screw up the installation, it means removing the software and starting again (at least with the earlier version). There is a bug in the current version where the cron job that prunes the incidents fails (the usual problem with cron - no environment gets set, and their script doesn't use the full path to the executable).
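The threshold behaviour wished for in the EMS item above can be approximated outside EMS. The following minimal Python sketch is not an EMS feature or API; it is a hypothetical sliding-window filter you might place between the alert source and the mail/ITO step, forwarding an alert only when `count` events with the same key arrive within `window` seconds. The class name, the key strings and the defaults (10 events in 600 seconds) are assumptions, and timestamps are assumed to be non-decreasing seconds.

```python
from collections import deque


class ThresholdFilter:
    """Forward an alert only after `count` matching events within `window` seconds."""

    def __init__(self, count=10, window=600.0):
        self.count = count
        self.window = window
        self.events = {}  # event key -> deque of recent timestamps (seconds)

    def observe(self, key, timestamp):
        """Record one event; return True if it should raise an alert."""
        recent = self.events.setdefault(key, deque())
        recent.append(timestamp)
        # Discard events that have fallen out of the sliding window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        if len(recent) >= self.count:
            recent.clear()  # reset so one burst produces one alert
            return True
        return False


if __name__ == "__main__":
    f = ThresholdFilter(count=3, window=60)
    print([f.observe("tape", t) for t in (0, 10, 20, 200)])
    # prints [False, False, True, False]
```

Keying by event type (for example one key per monitored component) keeps a noisy tape drive from masking a real power event; clearing the window after an alert is one design choice among several, chosen here so that a single burst produces a single ticket.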

For general troubleshooting, I use the plethora of tools out there - each one has its own use depending on what the problem is. Tools include:
sar
vmstat
tusc
tcpdump
top
glance
Measureware

and so on.

The list of features is too long to go through here. Each tool has its own merits.

As for success stories - well, I still have a job! :-)
Ulrich Deiters
Frequent Advisor

Re: Feedback required on Diagnostics Products Usage

I had never heard of *stm before. It would be nice if HP-UX came with some README files or other information tools telling you what some of the software or services are for.

I mostly use ioscan, glance, top for local problems, and various network tools for remote troubles.
Karthik S S
Honored Contributor

Re: Feedback required on Diagnostics Products Usage

Hi Ulrich,

I agree with you. Most admins are not aware of STM, ODE and EMS.

Regards,
Karthik S S
Shaikh Imran
Honored Contributor

Re: Feedback required on Diagnostics Products Usage

1) I normally use xstm to diagnose/troubleshoot an HP-UX system. I use it almost every day.
2) a) The Information tools: these give you information on all your hardware.
b) The Exercisers: these check your hardware for intermittent problems, if any exist.
c) The Verifiers: these require a licence to run, so I haven't used them.

3)ODE
4) I was able to detect memory errors, e.g. single-bit errors, on L- and N-class systems.
I replaced the DIMMs to get rid of those errors.
Also, one of the CPUs in an L-class system was suddenly deactivated; I found out about it from xstm.
And most importantly, I successfully upgraded the firmware revision of a DLT tape drive in the DLT library with xstm.

That's all I have done.
I'll sleep when i am dead.
Andrew Rutter
Honored Contributor

Re: Feedback required on Diagnostics Products Usage

Hi, I use either stm or cstm quite frequently to check things out, to verify full functionality of systems, to test various parts of the systems we have, and to read the infologs.

ODE is also a good tool to boot into if you are having problems.

I think it would be better if there were a few more test points within stm. In my opinion the CPU, memory, disc and tape tests are all very good.
Peter Leddy_1
Esteemed Contributor

Re: Feedback required on Diagnostics Products Usage

Hi,

As Peter and Colin have already mentioned, we use EMS to generate ITO alarms and then use the stm tools to gain more information on the problems. I would agree that EMS tends to generate false alarms, but when you are managing a lot of servers (hundreds or thousands) it comes in extremely handy and is very time-efficient. I have seen some false power and temperature events, but I would rather have a few of these than risk missing something that could be very costly to a production environment.