appropriate means to check for disk failures

Ralph Grothe · ‎11-01-2000

Hi,

Although I have only been let loose on a couple of HP-UX servers for a short time, I have already had to deal with several disk failures and replacements. :-(

Therefore I now regularly run a script that includes a statement similar to the line below.

for lv in `vgdisplay -v|awk '/LV Name/ {print $NF}'`;do lvdisplay -v $lv|grep -i stale;done

But I doubt this is sufficient to be notified of disk outages.
(N.B. the disks are either mirrored or in disk arrays.)
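
A minimal sketch of a slightly more informative variant of that check: it names each logical volume that reports anything stale and sets a non-zero exit status that a cron job could act on. The function name stale_lvs is made up for illustration, and it reads lvdisplay -v text on stdin, assuming the usual "LV Name" lines precede each volume's details.

```sh
#!/bin/sh
# Print each logical volume that has any "stale" line in its lvdisplay -v
# output (LV Status or an individual extent). Exit status is 1 if any
# stale volume was found, 0 otherwise.
stale_lvs() {
    awk '
        /^ *LV Name/ { lv = $NF }            # remember the current volume
        tolower($0) ~ /stale/ && !(lv in seen) {
            seen[lv] = 1                      # report each volume only once
            print lv
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

It could be fed from the same loop as above, for example: for lv in `vgdisplay -v|awk '/LV Name/ {print $NF}'`;do lvdisplay -v $lv;done | stale_lvs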

Any suggestions of a better way to check?

Maybe add a dd to /dev/null from the devices?
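
The dd idea could be sketched roughly as follows. The function name read_check and the 1 MB read size are arbitrary choices for illustration, not a recommendation; the function just reads the start of whatever path it is given and reports whether dd succeeded.

```sh
#!/bin/sh
# Read up to the first megabyte of a device (or file) and discard it.
# A read error makes dd exit non-zero, which is reported as FAIL.
read_check() {
    if dd if="$1" of=/dev/null bs=64k count=16 >/dev/null 2>&1; then
        echo "OK $1"
    else
        echo "FAIL $1"
        return 1
    fi
}
```

Something like for d in /dev/rdsk/*; do read_check "$d"; done would then walk the raw disk devices. Reading only the first megabyte will not notice bad blocks further in, and a read from a mirrored logical volume may be satisfied by the healthy copy, so the raw physical devices are probably the more telling target.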

Regards
Ralph

Madness, thy name is system administration

Stefan Farrelly · ‎11-01-2000

Use STM (XSTM). This is the proper diagnostic tool for logging all hardware problems. Once you've got it running (if installed; it's called DIAGNOSTICS on the Support Plus CD), go to TOOLS -> UTILITY -> RUN, select LOGTOOL, and view the raw or formatted summary log for a list of all hardware problems. If a disk is on its way out, errors will be logged here. This should be the first indication of an impending disk failure. The other piece of software to install is PREDICT. It reads and analyses the hardware logs and sends an email to root recommending that you replace a piece of hardware because it is heading for failure.

I'm from Palmerston North, New Zealand, but somehow ended up in London...

Victor BERRIDGE · ‎11-01-2000

I agree with Stefan.
That way you can see where the logs are; afterwards, just keep an eye on their growth, which tells you when to have a detailed look with stm. I wonder what happens in stm if you have redundancy such as 2 controllers and alternate links: will it react as it does with 2 processors?
I had the case of a K360 that rebooted one weekend some time ago. Looking in stm I found nothing wrong, all was green, until I realised it had "autoconfigured" itself as a single-processor machine... and so reported nothing wrong with the hardware.

Regards
Victor

Devbinder Singh Marway · ‎11-01-2000

In your script you can check the number of mirror copies (run the lvdisplay command for each filesystem's logical volume), and if mirror copies is 0, send out an alert.
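
A rough sketch of that check, assuming lvdisplay prints a "Mirror copies" line after each "LV Name" line. The function name unmirrored_lvs is hypothetical; it only parses text from stdin, so how the alert is sent is left open.

```sh
#!/bin/sh
# Print the name of every logical volume whose "Mirror copies" count is 0.
# Reads lvdisplay output on stdin.
unmirrored_lvs() {
    awk '
        /^ *LV Name/       { lv = $NF }               # current volume
        /^ *Mirror copies/ { if ($NF == 0) print lv } # no mirror left
    '
}
```

Any output means a volume is running without a mirror, so a wrapper could mail the list to root whenever it is non-empty. Volumes that were never mirrored in the first place would need to be excluded.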

Seek and you shall find

James R. Ferguson · ‎11-01-2000

Ralph:

The Predictive Support module that Stefan mentioned can be configured to email you (root) AND to send HP engineering a notification of a failure or degradation, by email or modem transfer. System checks are generally configured to run nightly. If you use an internal, dedicated modem for Predictive Support, Predictive rule sets will be updated automatically (pushed to you). This allows Predictive to alert you to refresh things like disk firmware, as it compares your version to the latest standard.

You can also easily install the EMS (Event Management System) monitoring tool from the Support CD. This too performs "health check" functions.

For Predictive Support see:

http://docs.hp.com/hpux/onlinedocs/H2571-90009/H2571-90009.html

For EMS see:

http://docs.hp.com/hpux/onlinedocs/B6191-90020/B6191-90020.html

...JRF...

Ralph Grothe · ‎11-01-2000

Thank you all for your suggestions.

I have to apologize for my belated response, but I was busy with other things in the meantime.

To Stefan,
I have diagmond running on my servers and have actually used stm a couple of times in menu mode.
The other day a disk silently passed away (maybe it hadn't been silent, and I just missed the device's epitaph owing to my incompetence). When an HP support engineer guided me remotely through mstm on the phone, the defunct device wasn't accessible at all, even from stm.
A few days ago I posted a question about invoking stm from a script and was instantly served with an answer.
About the PREDICT software:
is this standard HP fare or a third-party tool (under GPL)?

To James,
I already had EMS installed too, but unfortunately I wasn't included in the hp-ux admin mailing list in the aliases file of our MDA.
Now it occasionally happens that I'm flooded by EMS mails from other HP-UX servers in our domain which I do not administer.
To overcome this, yesterday I installed procmail in my $HOME on the MDA, and I will soon be setting up rules that pipe into corresponding Perl scripts, which I still have to write.
On CPAN I also discovered a Net::SMS module, which comes in handy for writing a filter to send alarms to my mobile.
Due to lack of time I haven't yet been able to follow the links you posted, but I will do so later.

To all,
many thanks for your efforts.

Madness, thy name is system administration
