marko asplund
Frequent Advisor

Monitoring mirrored disk failures

i've set up our HP-UX 11i v1 system to mirror disks using MirrorDisk/UX. i've also made a hot spare disk available.

now i'd like to have the system automatically monitor and report disk failures. how is this typically done on HP-UX? does HP-UX include tools for this?


br. aspa
17 REPLIES
Henk Geurts
Esteemed Contributor

Re: Monitoring mirrored disk failures

you can use the online diagnostics. check swlist for the bundle:
OnlineDiag B.11.11.13.14 HPUX 11.11 Support Tools Bundle, Dec 2003
then set up EMS through SAM > Resource Management > Event Monitoring Service to configure monitoring for the disks (among other configurable items).

regards.
Henk Geurts
Esteemed Contributor

Re: Monitoring mirrored disk failures

look here :
http://docs.hp.com/en/diag.html

check EMS.

regards.
marko asplund
Frequent Advisor

Re: Monitoring mirrored disk failures

when i try to start Resource Management / Event Monitoring Service from sam i get the following error:

Client connect failed. An error occurred while trying to set up a socket: Connection reset by peer.

any idea on what might be causing this? i tried browsing through the Diagnostics docs but couldn't find any help there.

the system came pre-installed with OnlineDiag B.11.11.15.13.
Henk Geurts
Esteemed Contributor

Re: Monitoring mirrored disk failures

Marko
check your system with the replies in this thread :

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=206322

let me know.
marko asplund
Frequent Advisor

Re: Monitoring mirrored disk failures

access to the EMS registrar service was not allowed. i changed this through sam / Networking and Communications / System Access / Internet Services and now i'm able to access the EMS gui.

now i'm having problems adding monitoring requests. when i try to add a request e.g. for /storage/status/disks i get the following error messages:

The selected monitor is not currently available.

No resource instances of /storage/status/disks found.
Henk Geurts
Esteemed Contributor

Re: Monitoring mirrored disk failures

any output from

# resls /


?
Henk Geurts
Esteemed Contributor

Re: Monitoring mirrored disk failures

in sam > Resource Management > EMS, add /storage/status/disks

is there a group called default?
if so, select it and you'll see the hardware path to the disk,
and you'll be able to configure the monitor.

can you post output from
ps -ef|grep resmon
?

marko asplund
Frequent Advisor

Re: Monitoring mirrored disk failures

here's the output from 'resls /':

Contacting Registrar on foo.bar.net

NAME: /
DESCRIPTION: This is the top level of the Resource Dictionary
TYPE: / is a Resource Class.

There are 7 resources configured below /:
Resource Class
/system
/StorageAreaNetwork
/adapters
/connectivity
/cluster
/storage
/net
marko asplund
Frequent Advisor

Re: Monitoring mirrored disk failures

i tried installing a monitoring request for /system/filesystem/availMb/home and that seemed to work fine.

i'm not sure i understand what you mean by the default group part. when i try to add a monitoring request and navigate to /storage/status/disks i get the following error message immediately after selecting (double clicking) disks:

The selected monitor is not currently available.

No resource instances of /storage/status/disks found.


here's the process list output:

foo% ps -ef|grep resmon
root 22483 1 0 10:30:19 ? 0:00 /etc/opt/resmon/lbin/p_client
root 22480 1 0 10:30:19 ? 0:00 /etc/opt/resmon/lbin/emsagent
root 25828 25707 0 11:39:11 pts/1 0:00 sh -c LANG=C LC_ALL=C /opt/resmon/bin/emsui /opt/resmon/bin/EMS
root 25831 3522 0 11:39:11 ? 0:00 /etc/opt/resmon/lbin/registrar
root 25829 25828 0 11:39:11 pts/1 0:00 /opt/resmon/bin/emsui /opt/resmon/bin/EMSconfig.ui
root 25955 3522 0 11:42:04 ? 0:00 /etc/opt/resmon/lbin/registrar
aspa 26156 21143 1 11:47:05 pts/1 0:00 grep resmon
root 23633 1 0 10:44:47 ? 0:00 /etc/opt/resmon/lbin/mibmond
root 25957 25831 0 11:42:09 ? 0:00 /etc/opt/resmon/lbin/fsmond
root 23635 1 0 10:44:47 ? 0:00 /etc/opt/resmon/lbin/krmond
Henk Geurts
Esteemed Contributor
Solution

Re: Monitoring mirrored disk failures

try this (not 100% sure, but it can't hurt either)

cd /etc/opt/resmon/lbin
./monconfig
a (add)
select the number of /storage/status/disks/default

3 3 5 c y
enable

then try to configure ems through sam again.

marko asplund
Frequent Advisor

Re: Monitoring mirrored disk failures

after doing the monconfig thing i'm now able to configure monitoring requests for the disks.

thanks very much for your help Henk!
Devender Khatana
Honored Contributor

Re: Monitoring mirrored disk failures

Hi,

Apart from EMS monitoring you can also check for stale extents periodically. Just run the command or automate it through cron.

# lvdisplay -v /dev/vg*/lvol* | grep stale

If there is no output, you are safe. If grep prints any line containing "stale", there is a problem with a mirror copy.
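
For the cron case, the same check could be wrapped roughly as below. This is only a sketch: count_stale is a made-up helper name, and the mailx recipient and subject in the commented-out usage lines are assumptions to adapt. It applies exactly the same test as the grep above (any line containing "stale").

```shell
#!/bin/sh
# Sketch only. count_stale is a hypothetical helper, not an HP-UX command.
# It reads lvdisplay -v output on stdin and prints how many lines mention "stale".
count_stale() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # that number is 0, so the exit status is ignored and only the count is used.
    grep -c stale || true
}

# Possible use from a cron job (left commented out; recipient is an assumption):
# n=$(lvdisplay -v /dev/vg*/lvol* 2>/dev/null | count_stale)
# if [ "$n" -gt 0 ]; then
#     echo "$n stale lines in lvdisplay output" | mailx -s "stale extents on $(hostname)" root
# fi
```

Counting instead of just grepping makes it easy to stay silent when nothing is wrong, so cron only sends mail when there is something to look at.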


HTH,
Devender
Andrew Merritt_2
Honored Contributor

Re: Monitoring mirrored disk failures

Hi Marko,
What you've set up, following Henk's instructions, is EMS Resource Monitoring. This will tell you when the disk becomes too full (or whatever you've configured).

There are also the EMS Hardware Monitors, which do something much closer to what you were asking for, which is reporting failures on the disk. If the OnlineDiags are installed, this will be happening by default.

The status of these can be checked by using 'monconfig'; for disks look under the heading "/storage/events/disks/default" for the device paths which are being monitored. (Note that the Hardware monitor paths have the second element as "events", whereas the Resource Monitoring has "status".)
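
To make the "events" versus "status" distinction concrete, here is a small sketch that looks only at the second element of a resource path. The helper name ems_path_kind and its output labels are invented for illustration; paths that have neither element there (for example /system/filesystem/availMb/home) are simply reported as "other".

```shell
#!/bin/sh
# Sketch only. ems_path_kind is a hypothetical helper, not part of EMS.
# It classifies a resource path by its second element, as described above.
ems_path_kind() {
    # For "/storage/events/disks/default", splitting on "/" gives an empty
    # first field, "storage" as field 2 and "events" as field 3.
    second=$(printf '%s\n' "$1" | cut -d/ -f3)
    case "$second" in
        events) echo "hardware-monitor" ;;
        status) echo "resource-monitor" ;;
        *)      echo "other" ;;
    esac
}

# Example:
# ems_path_kind /storage/events/disks/default
# ems_path_kind /storage/status/disks/default/0_1_1_1.2.0
```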

I would recommend installing the latest version of the OnlineDiags, A.47.00 for 11.11, plus the latest patch for that version, which is PHSS_32492, in order to have any known problems fixed.

Andrew

marko asplund
Frequent Advisor

Re: Monitoring mirrored disk failures

the OnlineDiag package revision on my system is B.11.11.15.13 which apparently corresponds to STM version A.47.00. i installed PHSS_32492 as per your instructions.

it is the disk hardware failure events i'm planning on monitoring.

i'm a bit confused here, however. when i check the monitoring request parameters through SAM / Resource Management / EMS i see a monitoring request for /storage/status/disks/default/0_1_1_1.2.0 for example with the notify parameters: 'When value is Not Equal UP (0)'. When i click 'Show Instance Description' i get the following explanation:

UP:
A state of "UP" means the hardware is available for use.

DOWN:
A state of "DOWN" means the hardware is unavailable for use. The /opt/resmon/bin/set_fixed command must be used with the resource name once the hardware is fixed to reset the hardware to the "UP" state.

UNKNOWN:
A state of "UNKNOWN" means it is unknown whether the hardware is unavailable for use or not. The hardware should be considered DOWN.


Is the system now monitoring hardware failures or available space in the file system?

Andrew Merritt_2
Honored Contributor

Re: Monitoring mirrored disk failures

Hi Marko,
The status of the disk is another thing that can be monitored by the Resource Monitors, as well as available disk space (I did say "or whatever you've configured"). I'd forgotten about the status monitoring details as I'm mostly interested in the HW monitors. Sorry for any confusion.

This status is based on what is reported by the EMS HW monitor ('disk_em' is the monitor for disks). When the disk monitor reports a Critical level event for a disk path, the status of the device changes from UP to DOWN, and you'll get the notification you configured in SAM. You'll need to set the status back to 'UP' manually, using set_fixed, after replacing any failed disk.
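
Since set_fixed needs the resource name rather than the hardware path, a helper like the following could build it. This is a sketch under one assumption: that the instance name is the hardware path with every "/" replaced by "_", which is inferred only from the single instance quoted in this thread (0_1_1_1.2.0). The function name ems_disk_resource is made up; confirm the real instance names with resls before relying on it, and check the set_fixed documentation for its exact invocation.

```shell
#!/bin/sh
# Sketch only. ems_disk_resource is a hypothetical helper.
# Assumption: EMS instance name = hardware path with "/" turned into "_",
# as suggested by the instance 0_1_1_1.2.0 quoted in this thread.
ems_disk_resource() {
    inst=$(printf '%s\n' "$1" | tr '/' '_')
    printf '/storage/status/disks/default/%s\n' "$inst"
}

# Example: print the resource name for a replaced disk, then pass that
# name to /opt/resmon/bin/set_fixed as its documentation describes.
# ems_disk_resource 0/1/1/1.2.0
```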

The HW monitor will report other, less severe, events, which you may also want to monitor; the output of 'monconfig' shows where the events of different severity are logged to. If you're only interested in a complete disk failure, then what you've set up in SAM will do the job.

Andrew
marko asplund
Frequent Advisor

Re: Monitoring mirrored disk failures

thanks very much for your valuable note, Andrew.
Steven E. Protter
Exalted Contributor

Re: Monitoring mirrored disk failures

The attached checkhardware script will let you know when any disk, mirrored or otherwise, is in trouble. It's a backup to EMS.

SEP