ProLiant Servers (ML,DL,SL)

iLO 5 SNMP read stops working after several hours

 
alexmern
New Member


Hello,

I'm using Zabbix 4 LTS with its standard template for HPE iLO for server monitoring. All sensor readings work well, but after some time (about 10-12 hours after a reset) iLO stops responding. This problem appears on several ProLiant DL560 Gen10 servers with iLO 5 (1.35). Only an iLO reset helps.

Anyone using Zabbix SNMP for monitoring?

 

20 REPLIES
Jazz_ISS
HPE Pro

Re: iLO 5 SNMP read stops working after several hours

Hello, 

The reported issue could be a compatibility issue with your third-party software. HPE has not tested Zabbix with HPE iLO, so I can't confirm this.
HPE iLO for ProLiant supports delivery of SNMP server agent alerts, as well as internally generated management processor alerts, to a management console such as HPE Systems Insight Manager (HPE SIM), Insight Control, or OneView.

HPE Integrated Lights Out (iLO 5) for Gen10 Servers - Configuring iLO Management Settings:
https://support.hpe.com/hpesc/public/docDisplay?docId=a00105236en_us&page=GUID-8FE3C00D-92C9-4801-A5EE-3E283932CDEB.html

ILO5 User Guide:
https://support.hpe.com/hpesc/public/docDisplay?docId=a00105236en_us

 Regards, 

Jazz

[Moderator edit: Updated the broken links.]


I work for HPE


Bart Van Cutsem
Advisor

Re: iLO 5 SNMP read stops working after several hours

Yes, running Zabbix 4 LTS with a DL380 Gen10 and iLO 5 firmware v1.37. Here, iLO regularly stops responding to SNMP requests but recovers; after a few days it stops responding completely and needs to be reset.
Bart Van Cutsem
Advisor

Re: iLO 5 SNMP read stops working after several hours

Also monitoring a bunch of DL360p Gen8 and DL380 Gen9 servers, no issues (with iLO4).

tivapi
Occasional Visitor

Re: iLO 5 SNMP read stops working after several hours

Hello,

I see the same issue with iLO 5 (firmware 1.20 through 1.37 inclusive).

Steps to reproduce:

1.) Log in to iLO on a Gen10 server with iLO 5 firmware 1.20 through 1.37 inclusive

2.) Go to Management

3.) In SNMP Settings, set Read Community 1 to the desired community name, for example: public

4.) Under Status, double-check that SNMP is enabled (by default the SNMP service is enabled on port 161)

5.) Run snmpwalk -v 1 -c public ILO5_IP-address sysName.0 once every hour
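The hourly check in step 5 could be automated with something like the sketch below, so the time of the first failed read gets recorded. It assumes the net-snmp `snmpwalk` binary is on the PATH; the function names, the 5-second timeout, and the single retry are my own illustrative choices, not anything iLO or Zabbix prescribes.

```python
import subprocess
import time


def build_walk_cmd(host, community="public", version="1",
                   oid="sysName.0", timeout_s=5, retries=1):
    """Build the snmpwalk command from step 5 (timeout/retries are assumed values)."""
    return ["snmpwalk", "-v", version, "-c", community,
            "-t", str(timeout_s), "-r", str(retries), host, oid]


def probe(host, run=subprocess.run, **cmd_options):
    """Return True if the agent answered, False on timeout or any other error."""
    cmd = build_walk_cmd(host, **cmd_options)
    try:
        result = run(cmd, capture_output=True, text=True)
    except OSError:
        return False  # snmpwalk is missing or could not be started
    return result.returncode == 0


def monitor(host, rounds, interval_s=3600, run=subprocess.run,
            sleep=time.sleep, clock=time.time):
    """Probe `rounds` times, one probe per interval, and return (timestamp, ok) pairs."""
    history = []
    for i in range(rounds):
        history.append((clock(), probe(host, run=run)))
        if i < rounds - 1:
            sleep(interval_s)
    return history
```

Calling `monitor("ILO5_IP-address", rounds=24)` would cover one day; the first entry with `ok` set to False marks roughly when the SNMP service stopped answering.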

I have also tried, as suggested here: https://support.hpe.com/hpesc/public/docDisplay?docId=a00105236en_us, filling in all values in SNMP settings (system location, system contact, system role, system role detail, read community 1, read community 2, read community 3) instead of only read community 1.

After a couple of hours SNMP stops responding on port 161. Tested with version 1 and version 2. Only resetting ILO helps to restore the connection to the SNMP server.

For the last 10 years I have been collecting server health information via SNMP (v1 or v2 on port 161 with community name public). Most of my servers are DL380 (from G6 to G10 inclusive). I see this issue only on Gen10 servers, which have iLO 5. I believe it is irrelevant which third-party software I'm using, since that is not related to why the SNMP service stops responding.

My Gen10 servers arrived with firmware 1.20 and I upgraded the firmware to version 1.37. I upgraded only the iLO firmware. However, on every version from 1.20 up to 1.37 the SNMP service stops working after a couple of hours.

I've already tried configuring remote syslog and SNMP traps to check what is going on. Unfortunately, they show no information about why the SNMP server stops responding.

If you need any other information, please let me know.

Could you please suggest how to debug why the integrated SNMP server in iLO 5 stops working? (I've already tried collecting information via remote syslog and traps, but found nothing related to why the SNMP server stops working.)

Thank you

---

Edit: Forgot to mention that I'm using ILO Dedicated Network Port

 

[Moderator edit: Updated the valid link.]

RMSTO
Occasional Visitor

Re: iLO 5 SNMP read stops working after several hours

Hello,

we have the same problem here.


ProLiant DL380 Gen10
Server System ROM: U30 v1.42 (06/20/2018)
ILO Firmware Version: 1.37 Oct 25 2018

ILO SNMP v2 public

iLO stops responding to SNMP requests after a few hours or days. Only a reset of the iLO restores the SNMP connection.

Zabbix has no SNMP implementation of its own; it uses the default Linux/Unix net-snmp utilities to access the iLO SNMP agent. When Zabbix gets no values, it is also impossible to get values with snmpwalk, snmpget, or any other SNMP MIB browser.

Seems to be a problem of the ILO Board.

CU

Thomas

HPYEEE
New Member

Re: iLO 5 SNMP read stops working after several hours

I have the same problem with SNMP on a ProLiant DL380 Gen9 with iLO 4. Please, fix it!

Jason_E
Occasional Visitor

Re: iLO 5 SNMP read stops working after several hours

Uncheck 'Use bulk requests' for the snmp settings for the host in Zabbix.
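To check from a shell whether request batching matters, a rough sketch like the one below may help. It rests on an assumption: that Zabbix's bulk option packs several OIDs into one SNMP GET request. It emulates that with net-snmp's `snmpget`, which accepts multiple OIDs per call. The helper name and the OID list are illustrative only.

```python
def snmpget_batches(host, oids, community="public", batch_size=1):
    """Split OIDs into snmpget commands carrying `batch_size` OIDs each.

    batch_size=1 mimics bulk requests switched off (one OID per request);
    a larger value mimics several OIDs packed into a single request.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    commands = []
    for start in range(0, len(oids), batch_size):
        chunk = oids[start:start + batch_size]
        commands.append(["snmpget", "-v", "2c", "-c", community, host, *chunk])
    return commands
```

Running the generated commands (for example with `subprocess.run`) at both batch sizes against an affected iLO would show whether only the batched form triggers the hang.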

Hyper-Bob
Frequent Visitor

Re: iLO 5 SNMP read stops working after several hours

BUMP...

HPE, can we please have a solution to this issue?

I have a number of Gen9 DL380s which function completely fine using SNMP sensor checks in PRTG.

PRTG monitoring has the same issue described here: I have 3 new Gen10 DL380s which show the exact same symptom.

All firmware is the latest, and SNMP checks have been set up using SNMP version 2 and version 3 without a suitable result.

HPE has even gone as far as replacing one of the system boards with integrated iLO, which resolved the issue until the latest SPP firmware release.

 

 

archspangler
Occasional Visitor

Re: iLO 5 SNMP read stops working after several hours

HPE,

Any resolutions to this?  I am seeing the same problem with ilo5 1.37.

baber2
Occasional Contributor

Re: iLO 5 SNMP read stops working after several hours

More than 2 years have passed and the problem still exists on G10; I am seeing this issue with iLO 2.65.

Is there any solution?

Sunitha_Mod
Honored Contributor

Re: iLO 5 SNMP read stops working after several hours

Hello @baber2

Thank you for reaching out to us! Since you have posted in an old topic and there is no response yet, I would recommend creating a new topic using the "New Discussion" button, so the experts can check and help you further.

Smartyp4nts
Visitor

Re: iLO 5 SNMP read stops working after several hours

Hey Baber,

I had similar issues with some items not refreshing and others refreshing randomly. I figured out that in the "HP iLO SNMP" template in Zabbix, all the items have a Preprocessing setting "Discard unchanged with heartbeat". This setting prevents items from being shown as refreshed in the latest data if they are unchanged.

Removing that parameter from all the discovery item prototypes fixed it for me. I personally prefer to change the update intervals and store all the data rather than have missing data. It makes nicer graphs.
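For anyone wondering why that setting makes items look stale, here is a minimal sketch of the idea behind "discard unchanged with heartbeat", assuming the usual semantics: a value is stored only when it differs from the last stored one or when the heartbeat period has elapsed. The class name and the boundary handling are my own and not taken from Zabbix's code.

```python
class DiscardUnchangedWithHeartbeat:
    """Keep a sample only if it changed or the heartbeat interval has passed."""

    def __init__(self, heartbeat_s):
        self.heartbeat_s = heartbeat_s
        self._last_value = None
        self._last_kept_at = None

    def keep(self, value, now):
        """Return True if the sample taken at time `now` (seconds) should be stored."""
        first = self._last_kept_at is None
        changed = value != self._last_value
        expired = (not first) and (now - self._last_kept_at >= self.heartbeat_s)
        if first or changed or expired:
            self._last_value = value
            self._last_kept_at = now
            return True
        return False
```

With a one-hour heartbeat and a sensor that keeps reporting the same status, only one sample per hour is stored, so "latest data" can look stale even though polling works.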

Let me know if it helps.

placka
Regular Visitor

Re: iLO 5 SNMP read stops working after several hours

Still present on StoreOnce with iLO 5 2.72 (Sep 04 2022).

Smartyp4nts
Visitor

Re: iLO 5 SNMP read stops working after several hours

I forgot to mention in my reply above that Jason_E's comment about disabling bulk requests is also valid. With bulk requests checked, it does not work for me either. I agree that there's definitely a bug in iLO and HP should look into it. It should work right out of the box; that technology is 25 years old. But for now I have had it working for more than 6 months on multiple servers without interruptions. I haven't tried it on StoreOnce, though I imagine iLO should work the same on every platform.

ram-sys
Senior Member

Re: iLO 5 SNMP read stops working after several hours

Same here

We use CheckMK for monitoring and have about 12 HPE servers,
but only one has this error. We have exactly the same machine twice, and the other one does not have the problem.
When the error occurs, I can run a simple snmpwalk from my workstation and it times out after 5 answers.

Strangely, an iLO reset does not fix the issue. I have to reboot the whole server.

I have increased the SNMP timeout to 60 seconds. Now it happens only after about two weeks.
HPE Support has not been really helpful so far; I still have an open case.

But it is a problem with the iLO!!

ram-sys
Senior Member

Re: iLO 5 SNMP read stops working after several hours

Hi

Just got a response from the CheckMK forum.
They suggested using Redfish.
I tested it on my monitoring system while SNMP was broken and it worked!
So I will not use SNMP anymore.

CheckMK Forum 

I moved all servers from SNMP to Redfish and I'm happy. I also read that Redfish is the future and that SNMP will not be developed further on the iLO platform.

Maybe there are also Redfish plugins for Zabbix, PRTG and the other tools.
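For a quick manual check without any plugin, a Redfish health read could look roughly like the sketch below. It uses only the Python standard library and HTTP Basic authentication. The resource path `/redfish/v1/Systems/1` and the `Status.Health` property are assumptions about what a given iLO exposes, so verify them by browsing from `/redfish/v1/` first; the function names are illustrative.

```python
import base64
import json
import ssl
import urllib.request


def extract_health(system):
    """Pull Status.Health out of a Redfish ComputerSystem document (a dict)."""
    status = system.get("Status") or {}
    return status.get("Health", "Unknown")


def fetch_json(url, user, password, verify_tls=True, timeout_s=10):
    """GET a Redfish resource with Basic auth and return the parsed JSON."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Only for lab use with self-signed iLO certificates.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout_s, context=context) as resp:
        return json.load(resp)
```

Usage would be along the lines of `extract_health(fetch_json("https://ILO5_IP-address/redfish/v1/Systems/1", user, password))`, with a read-only iLO account.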

placka
Regular Visitor

Re: iLO 5 SNMP read stops working after several hours

Sure, I know about Redfish, but not all customers allow it (HTTPS traffic), and if traffic is routed via a firewall you'll get longer response times compared to "simply working" SNMP. Where did you read that it will not be developed on iLO?

In my opinion this is a bug on a fully supported system (iLO) which is not EOL, and it has been there without resolution for years and still is.

GoldyGopher
Occasional Visitor

Re: iLO 5 SNMP read stops working after several hours

Also some of us are not running an OS we can manipulate (toaster-ized storage node) on our servers so it's snmp queries against the ILO card or nothing. I guess I wouldn't mind it if it was all or nothing but I have two multi node clusters where all of the ILO cards work except 1 - why? And other clusters where all of them work.

Fence5050
Visitor

Re: iLO 5 SNMP read stops working after several hours

Hi,

Also having this issue on 4 servers. I guess HPE isn't interested in helping clients since the first message was 6 years ago. Nevertheless, hoping to see some HPE folks over here to help us.

 

nhs192
Regular Visitor

Re: iLO 5 SNMP read stops working after several hours

I also faced a similar issue; HPE really doesn't respond to customers, which is quite disappointing.