Proliant DL360p Gen8 becomes unreachable. Getting Event ID 16002 prior to crash.

TommyL
Occasional Contributor

Hello all!

I have a recurring issue with a Domain Controller that has been a PITA for some time now. The DC runs Windows Server 2012 R2 Standard on an HP ProLiant DL360p Gen8.

The system randomly becomes unreachable by other systems. Active Directory then starts raising alarms that replication has been interrupted, and our NOC gets a notification that the system has gone down.  When we log on to the DC through the out-of-band network, we can see that the OS is still up, but nothing is being sent or received through the NIC.

Checking the System Event Log just before rebooting (while the system is unreachable), we see about 330 Warning events with Event ID 16002, Source: AFD:

"Closing a UDP socket with local port number [55048-54920] in process 1044 is taking longer than expected. The local port number may not be available until the close operation is completed. This happens typically due to misbehaving network drivers. Ensure latest updates are installed for Windows and any third-party networking software including NIC drivers, firewalls, or other security products." 

The port range listed in the event description isn't even a range that we use.  I have a netstat monitor running, and none of those ports show up in the monitor logs before, during, or after the event.
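For anyone wanting to repeat that check, here is a minimal sketch of filtering saved `netstat -ano` output for UDP endpoints owned by one process. The function name, the sample text, and the default range are assumptions made for illustration. 49152-65535 is the default dynamic (ephemeral) port range on Windows Server 2008 and later, assuming it has not been changed on the server, and both ports quoted in the event fall inside it, so short-lived sockets there could be missed by a periodic netstat sample.

```python
def udp_endpoints(netstat_text, pid=None, port_range=(49152, 65535)):
    """Return (port, pid) pairs for UDP rows of `netstat -ano` output.

    port_range defaults to the Windows default dynamic port range;
    that default is an assumption about the server's configuration.
    """
    found = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # UDP rows have four columns: Proto, Local Address, Foreign Address, PID
        if len(fields) != 4 or fields[0].upper() != "UDP":
            continue
        port_text = fields[1].rsplit(":", 1)[-1]  # also handles [::]:port
        if not (port_text.isdigit() and fields[3].isdigit()):
            continue
        port, owner_pid = int(port_text), int(fields[3])
        if pid is not None and owner_pid != pid:
            continue
        if port_range[0] <= port <= port_range[1]:
            found.append((port, owner_pid))
    return found


# Hypothetical sample in the layout `netstat -ano` prints; not taken from the server.
SAMPLE = """
  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       780
  UDP    0.0.0.0:53             *:*                                    1044
  UDP    0.0.0.0:55048          *:*                                    1044
  UDP    [::]:54920             *:*                                    1044
  UDP    0.0.0.0:60000          *:*                                    2200
"""

if __name__ == "__main__":
    for port, owner in udp_endpoints(SAMPLE, pid=1044):
        print(f"UDP port {port} owned by PID {owner}")
```

Running something like this against each monitor snapshot, with the PID from the event text, would show whether the process ever held sockets in that range. The range actually configured on a server can be checked with `netsh int ipv4 show dynamicport udp`.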

Further, I have run DCDIAG and BPA on the system.  DCDIAG yielded nothing useful (all tests passed), and neither did BPA (nothing unusual, and nothing that doesn't also show up on other systems that aren't affected by this issue).

The NIC was originally using the Microsoft drivers, and we installed the HP drivers to see if that would help.  It hasn't.

Also, since I don't know the cause of the issue, I cannot manually recreate the error but simply have to wait until it happens again.

Has anyone else seen this issue and if so, how did you remediate it?

 

7 REPLIES
AlecKeeler
Advisor

Re: Proliant DL360p Gen8 becomes unreachable. Getting Event ID 16002 prior to crash.

Have you tried updating the firmware and drivers using the HP SPP (Service Pack for ProLiant) DVD?

http://h17007.www1.hpe.com/us/en/enterprise/servers/products/service_pack/spp/index.aspx

You can reboot the server from the DVD, and it will use SUM to check whether any of the firmware on the various devices has outstanding updates.

Just be aware that you should have access to the console when booting from the HP SPP DVD: it pops up a menu asking whether you want an automated or an interactive update, and if you don't choose interactive within 30s, it proceeds to update everything it can automatically.

Alec

Torsten.
Acclaimed Contributor

Re: Proliant DL360p Gen8 becomes unreachable. Getting Event ID 16002 prior to crash.

First of all I would check if the BIOS is up to date.

Fix in 2015.07.01:

Problems Fixed:

Addressed an issue where a device interrupt may not be handled properly and result in a lost interrupt or an uncorrectable machine check exception. This issue is NOT unique to HP servers. HP recommends that users experiencing these issues update to this revision of the System ROM before replacing any hardware components.

http://h20566.www2.hpe.com/hpsc/swd/public/detail?sp4ts.oid=5194968&swItemId=MTX_4a66f6bc8f0948368f65aa1dfd&swEnvOid=4103#tab4


Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

__________________________________________________
No support by private messages. Please ask the forum!

AlecKeeler
Advisor

Re: Proliant DL360p Gen8 becomes unreachable. Getting Event ID 16002 prior to crash.

The HP SPP DVD will also update the BIOS to the latest version. If you run it in interactive mode, you can see what it intends to update and choose which updates to allow.

Alec

TommyL
Occasional Contributor

Re: Proliant DL360p Gen8 becomes unreachable. Getting Event ID 16002 prior to crash.

I updated the BIOS to the latest version (July 2015) this past Friday.  No dice.  I had another event yesterday with the same issues and same symptoms.

NJK-Work
Honored Contributor

Re: Proliant DL360p Gen8 becomes unreachable. Getting Event ID 16002 prior to crash.

Did you ever get this resolved?  I am having the same problem on Gen8 servers (DL380p).  The event is AFD 16002, and we see tens of them in the System log of the server (Windows Server 2012 R2 Standard) when the problem happens.  The only fix we have found is to reboot.
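To compare occurrences across servers, the 16002 warnings can be tallied by process and port. This is only a rough sketch: it assumes the message wording quoted in the original post with a single port number per event, and the function name and sample messages are made up for illustration.

```python
import re
from collections import Counter

# Pattern follows the AFD 16002 text quoted earlier in this thread;
# a single port number per message is an assumption.
AFD_16002 = re.compile(
    r"Closing a UDP socket with local port number (?P<port>\d+) "
    r"in process (?P<pid>\d+) is taking longer than expected"
)


def summarize(messages):
    """Count matching warnings per PID and collect the ports each PID was closing."""
    counts = Counter()
    ports = {}
    for msg in messages:
        match = AFD_16002.search(msg)
        if match is None:
            continue  # not an AFD 16002 message, skip it
        pid = int(match["pid"])
        counts[pid] += 1
        ports.setdefault(pid, set()).add(int(match["port"]))
    return counts, ports


# Hypothetical messages built from the quoted wording; not real log data.
SAMPLE_MESSAGES = [
    "Closing a UDP socket with local port number 55048 in process 1044 "
    "is taking longer than expected.",
    "Closing a UDP socket with local port number 54920 in process 1044 "
    "is taking longer than expected.",
    "The system uptime is 86400 seconds.",
]

if __name__ == "__main__":
    counts, ports = summarize(SAMPLE_MESSAGES)
    for pid, n in counts.most_common():
        print(f"PID {pid}: {n} warnings, ports {sorted(ports[pid])}")
```

If one PID accounts for nearly all of the warnings, `tasklist /FI "PID eq <pid>"` run before the reboot would show which process owns the stuck sockets.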

I am hoping to find a permanent solution (firmware, driver, etc.), but if none is available, I am hoping for a short-term fix that does not require a reboot (for example, restarting a service or disabling and re-enabling a driver).

Thanks

NK

Jimmy Vance
HPE Pro

Re: Proliant DL360p Gen8 becomes unreachable. Getting Event ID 16002 prior to crash.

Anything listed in the iLO or IML logs?




__________________________________________________
No support by private messages. Please ask the forum!      I work for HPE

NJK-Work
Honored Contributor

Re: Proliant DL360p Gen8 becomes unreachable. Getting Event ID 16002 prior to crash.

No, I am not seeing anything at the hardware level in terms of logs or alerts.  The only thing I see is that the server is up and running but stops communicating on the network - I use the iLO to get to it and reboot.  Once rebooted, I see AFD 16002 errors in the Windows logs during the time of the outage.

I reviewed the revision list for firmware and drivers for this model, and I don't see anything that resembles a fix for the symptoms I am seeing.  These servers are NOT up to the latest firmware/drivers, but they are pretty critical, so I don't want to schedule an outage to update them unless I know for sure that a specific driver/firmware will fix the issue.

I am having our networking team review the speed/duplex on the switch ports to see if maybe there is a mismatch there.  The servers are set to AUTO...does anyone have a recommendation on this?  Is AUTO good (assuming switch is at AUTO) or should I be hard-coding GIG FULL at both ends?

Thanks

NK