Server Management - Systems Insight Manager

System is unreachable Notification

 
SOLVED
Mike Angley
Advisor

Are there any configuration options for "system is reachable/unreachable"? I cannot run an event notification on this because of too many false positives. In some cases I receive "server is reachable" just seconds or minutes after I receive "server is unreachable". I would like to be able to set retries or a timeout before notification.
I did see another thread about traps showing a fault and then no fault seconds later. I also experience this with several SCSI devices on several servers, and I have to turn off SCSI monitoring for those units. I understand this will be fixed with the new client, but I did not think the server-down check relied on the client.

Pat Wilson
Valued Contributor
Solution

Re: System is unreachable Notification

System unreachable messages originate from the 'Hardware Status Polling' tasks. The system reachable messages originate from either the polling tasks, or when a discovered system sends an SNMP message. If the polling task is marking the system as unreachable even when it is working, check the following:

1) Check the Global Protocol Settings - check the default ping (ICMP) settings.
2) Check the Hardware Status Polling tasks (Logs -> View All Scheduled Tasks) and verify the protocol settings. Enable ping. This means that if nothing else works but the HPSIM server can still ping the device, the device is still 'reachable' and won't generate the 'unreachable' message.
Mike Angley
Advisor

Re: System is unreachable Notification

I was unaware that it was generated by the Hardware Status Polling task. I have reset the ping settings in the Global Protocol Settings and set the hardware polling tasks to a 15-second timeout and 5 retries. I will re-enable my notification task and see how many I get.
I will let you know how it goes.

Thanks a bunch
Mike
Pat Wilson
Valued Contributor

Re: System is unreachable Notification

Be careful with too many retries and long time-outs. They could make your polling task run long. If you set 15 secs x 5 retries, that makes for 75 secs per unavailable server, just using ping. Remember that the polling task will try each protocol, so if you've enabled SNMP in your polling task, and you've got time-outs and retries set for it, you might get the jobs piling up. The 'task scheduler' enters a job on the time schedule without checking to see if the previous job finished first. I ran into this problem when I had set it to run every 5 minutes, and for a while the task was taking over 7 minutes to complete.
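
The arithmetic above can be sketched as a quick check. This is only a rough model, not anything taken from HPSIM itself: it assumes unreachable systems are polled one after another, that "retries" means total attempts (as in the 15 x 5 = 75 figure), and the function names are made up for illustration.

```python
def worst_case_poll_seconds(timeout_s, retries, unreachable_systems, protocols=1):
    """Upper bound on time spent waiting on unreachable systems in one run.

    Assumption: systems are polled sequentially and every attempt on every
    protocol runs to its full timeout.
    """
    return timeout_s * retries * unreachable_systems * protocols


def runs_overlap(poll_seconds, interval_minutes):
    """True if one run takes longer than the schedule interval,
    so the next run would start before this one finishes."""
    return poll_seconds > interval_minutes * 60


# 15 s timeout x 5 attempts = 75 s for a single unreachable server (ping only)
print(worst_case_poll_seconds(15, 5, 1))

# Ten unreachable servers on a 5-minute schedule: 750 s, longer than 300 s
total = worst_case_poll_seconds(15, 5, 10)
print(total, runs_overlap(total, 5))
```

If the second check comes back true for a realistic number of down systems, either shorten the time-outs and retries or lengthen the schedule interval.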

Good Luck :)
Mike Angley
Advisor

Re: System is unreachable Notification

I currently have it set for 5 minutes, and it is taking 3 minutes, 33 seconds to run. I see your point: if I lost all network connections, it would take slightly over 18 hours to run just the ping check part. I am going to increase the interval to 10 minutes.

Since there are 3 hardware status jobs (ping, non-server status and server status), do they queue behind each other, or do they run independently? Can any one of these report server unreachable?
Pat Wilson
Valued Contributor

Re: System is unreachable Notification

I'm not sure what you mean by a 'ping' status job. Is this the Autodiscovery?

There is an 'Initial Hardware Status Polling' task that is driven by the discovery of a system (you can see it is event/node driven), and it is this task that generates the system discovered event. This task is not regularly scheduled.

I see two regularly scheduled hardware status polling jobs. They run independently against the list of devices identified in HPSIM as servers or non-servers (everything else, i.e. Unknown, Unmanaged, Switch, etc.), and these can be configured to use ping.

The downside to setting the status jobs to run every 10 minutes is that a system could be unreachable for as long as 10-13 minutes before you receive notification.
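
One way to see where a figure like 10-13 minutes comes from, as a rough sketch: assume a system fails right after it was last polled, and that the next run reaches it only near the end of that run. The function name is hypothetical, and the 213-second run time is the 3 minutes, 33 seconds reported earlier in this thread, which was measured on the 5-minute schedule.

```python
def worst_case_detection_delay(interval_s, run_duration_s):
    """Longest time a failure could go unnoticed under this simple model:
    a full schedule interval until the next run starts, plus the time
    that run takes to reach the failed system."""
    return interval_s + run_duration_s


# 10-minute interval, run taking 3 min 33 s (213 s)
delay = worst_case_detection_delay(10 * 60, 213)
print(divmod(delay, 60))  # (minutes, seconds)
```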
Mike Angley
Advisor

Re: System is unreachable Notification

My apologies. I created a hardware status polling job that checks against "All Systems" instead of just servers and non-servers. I called this job "Hardware status ping", since I did not know whether the other 2 lists combined encompass all of the systems. I have too many systems showing as unmanaged or unknown to leave them off the list. In some cases I cannot repair the unmanaged/unknown status because of cross-domain, firewall, company or DNS structures, but I still need to know when the system goes down. If servers/non-servers encompass all systems, I can delete this job and rely on the other 2.
Pat Wilson
Valued Contributor

Re: System is unreachable Notification

You should be OK with the default two. The nice thing is that you can set different execution frequencies for the two jobs.

Let me know how it goes!

:)