Server Management - Systems Insight Manager

Blackscreen of Death

 
Annette Jones_2
Regular Advisor

Blackscreen of Death

Is anyone else seeing ProLiant servers that simply go to a black screen? Even ping fails whilst the server is in this state.

Interestingly, HPSIM 4.1 failed to alert on this server; we only found out when we got in this morning. HPSIM seems to be very hit and miss when it comes to system unreachable alerts.

Anyone got any ideas on how to improve both of the above?

Also, the CMS failed to alert at the time of the BLKSOD, but it did have the following stderr in the task. Does anyone know what causes this problem? Once the offending server was bounced, the CMS alerted OK.

Thanks A
6 REPLIES
Rob Buxton
Honored Contributor

Re: Blackscreen of Death

If it goes into that state again, try pinging the server.
It may be that there's enough of the TCP/IP stack functioning to fool HPSIM.

No idea what's wrong with the task though.
Annette Jones_2
Regular Advisor

Re: Blackscreen of Death

Rob, I will, but it tends to happen at unreasonable hours. I do know that when we have dialed in to look at these systems, 'ping' does not work.

Ping is a sure way of testing the state of the server.

A
Rich Purvis
Honored Contributor

Re: Blackscreen of Death

Just a question, as I am really not sure what the issue is. Are you running with Automatic Server Recovery enabled? If you are, is it not doing a reset of the server when you get in this state?

-Rich
Why does my tivo keep recording Nickelodeon?
Annette Jones_2
Regular Advisor

Re: Blackscreen of Death

Rich

We have disabled ASR. The question really is why HPSIM didn't alert that the server was unreachable for at least 7 hours.

The hardware status task polls every 10 mins, and the server is not pingable whilst in this state.

A
Rob Buxton
Honored Contributor

Re: Blackscreen of Death

Then yes, it should pick up the outage.

Is the server recognised in HPSIM as a Server? I ask because there are typically separate Server and Non-Server polling tasks, and they have different polling intervals.

Check the Server Hardware Polling task. I've split mine up: I've changed the default task to NOT use ping and added a new, more responsive Ping Only Hardware Polling task that runs every minute.
But check what protocols it is using.

You may need to try pinging the server from the CMS during the problem. A duplicate IP address, perhaps?
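
As a stopgap while the HPSIM polling is being sorted out, the same ping check could be run from the CMS by a small script that works independently of HPSIM. The following is only a rough sketch, assuming Python is available on the CMS. The names `ping_once`, `alert_polls` and `watch`, the threshold of three missed pings and the one-minute interval are all illustrative assumptions; none of them are part of HPSIM.

```python
import platform
import subprocess
import time
from typing import Callable, Iterable, Iterator, List


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo using the operating system's ping command."""
    if platform.system() == "Windows":
        # -n is the echo count, -w the reply timeout in milliseconds.
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        # -c is the echo count; the overall wait is bounded by run() below.
        cmd = ["ping", "-c", "1", host]
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 1,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # Caveat: some Windows versions return 0 for a "destination host
    # unreachable" reply, so the exit code alone is only a rough signal.
    return done.returncode == 0


def alert_polls(results: Iterable[bool], threshold: int = 3) -> Iterator[int]:
    """Yield the poll index at which each outage reaches `threshold` misses."""
    failures = 0
    for index, ok in enumerate(results):
        failures = 0 if ok else failures + 1
        if failures == threshold:
            yield index


def watch(
    host: str,
    polls: int,
    interval_s: float = 60.0,
    threshold: int = 3,
    probe: Callable[[str], bool] = ping_once,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Poll `host` a fixed number of times and report each outage once."""

    def stream() -> Iterator[bool]:
        for _ in range(polls):
            yield probe(host)
            sleep(interval_s)

    fired = []
    for index in alert_polls(stream(), threshold):
        print(f"ALERT: {host} missed {threshold} consecutive pings (poll {index})")
        fired.append(index)
    return fired
```

Calling something like `watch("myserver", polls=1440)` (where `myserver` is a placeholder host name) would cover roughly a day at one-minute intervals. Requiring several consecutive misses before alerting is a deliberate choice to avoid noise from a single dropped packet, and the `print` call would be replaced by whatever notification you actually use.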
Annette Jones_2
Regular Advisor

Re: Blackscreen of Death

Rob,

Yes, it's recognised as a server. The daft thing was that as soon as it was rebooted, HPSIM reported it as reachable.

System Unreachable alerts are generated from the Hardware Polling task, is that right?

If that's the case, and I create my own ping task and disable the default polling task, will the new task then be used to generate the alerts via HPSIM? I did create a task, but I couldn't see how to ensure that HPSIM used it to generate the alerts.

Thanks A