
HPSIM RSP issue

Mike Dunphy_1
Occasional Contributor

Hello,

All of our servers were being monitored with ISEE and EMS. That setup catches
hardware failures just fine. We have tried unsuccessfully to switch to HPSIM
and RSP with SFM.

For the 11.11 servers we ran into the SFM module degrade, for which
there is a workaround. I have not implemented it yet, because I wanted
to test the 11.23 servers first, which I had already switched to SFM. I also
realize that 11.00 is not supported. Anyway, I know that we have until June 2009
to get this to work, so I started by installing everything needed on our
11.23 servers and enabling SFM.

To make a long story short, one of our 11.23 servers suffered a
failed internal disk drive. I incorrectly assumed that by following
all the instructions, HPSIM/RSP/ISEE would all just work. I was
waiting for just such an event to prove that it is indeed working,
so that I could change to SFM on the rest of our servers.

dmesg reported

snip ..................
SCSI: Request Timeout; Abort Tag -- lbolt: -1744923218, dev: 1f026000, io_id: 2b91c17

SCSI: Async write error -- dev: b 31 0x026000, errno: 126, resid: 8192,
blkno: 26101768, sectno: 52203536, offset: 26728210432, bcount: 8192.

SCSI: Async write error -- dev: b 31 0x026000, errno: 126, resid: 8192,
blkno: 24197424, sectno: 48394848, offset: 24778162176, bcount: 8192.
WARNING: VxVM vxio V-5-3-0 voldsio_timeout: Timeout value 240 seconds, actual io time 240 seconds, I/O Timedout, bad disk?
WARNING: VxVM vxio V-5-3-0 voldio: I/O Timedout on disk c2t6d0, 1 I/Os hung on the disk!
WARNING: VxVM vxio V-5-3-0 voldio: I/O hung; disallowing all I/Os to the disk.
WARNING: VxVM vxio V-5-0-151 error on Plex optvol-02 while writing volume optvol offset 5270384 length 16
WARNING: VxVM vxio V-5-0-4 Plex optvol-02 detached from volume optvol

snip ..................


However, SFM never reported an incident, and nothing ever showed up on the
HPSIM server under Service events. A fellow admin of mine even pulled
the drive out. With ISEE/EMS running, we would get a call from a TAM
within hours of such an event.

I do know that SFM needs to be monitoring the devices and it
is ...

/opt/sfm/bin/sfmconfig -w -q
EMS hardware monitors are disabled & SysFaultMgmt is monitoring devices.


I also tried to send a test event

/opt/sfm/bin>> sfmconfig -t -a
Sending test event for fpl_em monitor.
Sending test event for ia64_corehw monitor.

Apparently it sent the 2 test events into thin air. I cannot
find them on HPSIM -> HP Service events.

/opt/sfm/bin>> sfmconfig -m list -t ALL
Filter Name : General Filter
Filter Type : HP Defined Filter
Filter Unique Identifier : 1
Filter Query : Select * from HP_DeviceIndication
Filter Query Language : WQL
Filter Source Namespace : root/cimv2
Filter Description : General Device Indications.
Filter State : Enabled Filter State
Filter Last Operation : No Operation
========================================================

So my questions, for anyone who can help, are:

1. Is there a way to view any and all SFM events, so that I can
tell that it did indeed see the event and that a message was sent
somewhere?

2. If the message is being sent, where is it being sent to,
and do I need to configure something on the client so that it talks to
our HPSIM server with the RSP module?

3. Where are the "sfmconfig -t -a" test events going?

4. Apparently all our 11.23 servers are now broken as far as
monitoring goes. Should I switch back to ISEE/EMS, or is there
something I can quickly test or check, either on the client side or the
HPSIM side, to verify that hardware monitoring is indeed
working? The servers are all entitled and everything looks green, but it
doesn't work.

Thanks for any help in this matter.

Regards
-mjd

P.S. Some more output from the server:

(hxsdev1) /opt/sfm/bin>> cimprovider -l -m SFMProviderModule
CPUProvider
CPUStatusProvider
EMSWrapperProvider
SFMIndicationProvider
EventIndicationConsumer
MemoryProvider
MemoryStatusProvider
DiskProvider
DiskStatusProvider
StateChangeIndicationProvider
ChassisProvider
CoolingStatusProvider
PowerStatusProvider
ThermalProvider
VoltageProvider
MPProvider
MPStatusProvider
FirmwareRevisionProvider
HPUX_ControlProvider
FMDProvider
HealthStateProvider
EMArchiveConsumer
EMEmailConsumer
emdprovider
SubscriptionConfigAssociationProvider
ThrottlingConfigInstanceProvider
WBEMToEMSConsumer
(hxsdev1) /opt/sfm/bin>> cimprovider -l -s
MODULE                          STATUS
OperatingSystemModule           OK
ComputerSystemModule            OK
ProcessModule                   OK
IPProviderModule                OK
DNSProviderModule               OK
NTPProviderModule               OK
NISProviderModule               OK
SDProviderModule                OK
IOTreeModule                    OK
HP_VParProviderModule           OK
HPUXLANProviderModule           OK
HP_iCODProviderModule           OK
HP_iCAPProviderModule           OK
HP_GiCAPProviderModule          OK
SGProvidersModule               OK
HPUX_ProviderModule             OK
HPUXLVMProviderModule           OK
HP_UtilizationProviderModule    OK
HP_NParProviderModule           OK
HPUXRAIDSAProviderModule        OK
HPUXSCSICSProviderModule        OK
HPUXSCSIProviderModule          OK
HPUXFCCSProviderModule          OK
HPUXFCIndicationProviderModule  OK
HPUXFCProviderModule            OK
SFMProviderModule               OK
1 REPLY
NMCI Group
Advisor

Re: HPSIM RSP issue

One thing you should look at is:
evweb subscribe -L -b external

This will list your WBEM subscriptions. If you subscribed successfully from your CMS, you will see 2 or 3 subscriptions with its hostname/IP address. The major one you want to see starts with HPWEBES_ipaddress....from HP_AlertIndication....

That is your WEBES subscription, which forwards those events to WEBES on the CMS. If it doesn't exist, then the CMS didn't subscribe properly.
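
If you want to repeat that check on several nodes, here is a rough sketch that just greps the listing for the HPWEBES_ prefix. It is not from HP documentation. It assumes evweb lives in /opt/sfm/bin like the other SFM tools in this thread, and that the subscription name shows up in the listing with that prefix; the function name has_webes_subscription is made up for illustration.

```shell
#!/bin/sh
# Sketch only: succeeds if the subscription listing read from stdin
# contains a name with the HPWEBES_ prefix.
# has_webes_subscription is a hypothetical helper, not an SFM command.
has_webes_subscription() {
    grep -q 'HPWEBES_'
}

# Assumed location of evweb; adjust if your install differs.
EVWEB=/opt/sfm/bin/evweb

# Only run the check where evweb is actually installed.
if [ -x "$EVWEB" ]; then
    if "$EVWEB" subscribe -L -b external | has_webes_subscription; then
        echo "WEBES subscription found"
    else
        echo "No WEBES subscription; the CMS did not subscribe properly"
    fi
fi
```

A match only shows that the subscription exists on the node. It does not prove that events actually reach WEBES on the CMS.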

Aaron