ML350 Gen10 Error IDs 129 & 5008

 
dlcole0708
Frequent Visitor

I have an ML350 Gen10 server that has been running with no issues for some time. Last night we started getting error IDs 129 and 5008 from SmartSAMD.sys on this server. File access is extremely slow, especially on large files or on folders with more than 50 or so files. I tried restarting the server; performance was slightly better at first, but within 15-20 minutes it was just as slow again. All hard drives appear to be functioning normally.

Any ideas on what could be the issue?

Thanks,

Dave

support_s
System Recommended

Query: ML350 Gen10 Error IDs 129 & 5008

System recommended content:

1. HPE ProLiant ML350 Gen10 Server - Troubleshooting

2. HPE ProLiant ML350 Gen10 Server Maintenance and Service Guide

 


dlcole0708
Frequent Visitor

Re: Query: ML350 Gen10 Error IDs 129 & 5008

I reviewed the troubleshooting information, and all of the LEDs on the server report no issues. Some additional information:

1. The slow performance only happens when reading large files from the RAID array. When writing to the array, performance is fine.

2. When reading from or writing to other drives on the server (a pair of mirrored SATA drives and a USB drive), performance is fine.

3. It does not seem to happen in all folders on the array. I moved a 1.5 GB file from a folder where it was experiencing the issue to the root of the array and was able to access it normally.

4. The errors always happen in pairs, first the 5008 then the 129 error.

5. They do not happen all the time. We just had a period of over 30 minutes without an error and then 8 pairs in 16 seconds. Performance was still slow when the errors were not happening, but not quite as bad.
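The pairing and burst pattern in points 4 and 5 can be checked against an exported event log. Below is a minimal sketch, assuming the events are available as (timestamp, event ID) tuples; the sample timestamps, the 5-second pairing window, and the 1-minute quiet period are all invented for illustration and are not taken from this server.

```python
from datetime import datetime, timedelta

# Hypothetical sample: (timestamp, event ID) tuples as they might be exported
# from the Windows System log. These values are invented for illustration.
events = [
    (datetime(2024, 1, 1, 22, 0, 0), 5008),
    (datetime(2024, 1, 1, 22, 0, 1), 129),
    (datetime(2024, 1, 1, 22, 0, 2), 5008),
    (datetime(2024, 1, 1, 22, 0, 3), 129),
    (datetime(2024, 1, 1, 22, 35, 0), 5008),
    (datetime(2024, 1, 1, 22, 35, 1), 129),
]


def pair_events(events, first_id=5008, second_id=129,
                max_gap=timedelta(seconds=5)):
    """Return (first_time, second_time) pairs where second_id follows
    first_id within max_gap (the window is an assumed value)."""
    pairs = []
    pending = None  # timestamp of the most recent unmatched first_id event
    for ts, event_id in sorted(events):
        if event_id == first_id:
            pending = ts
        elif event_id == second_id and pending is not None:
            if ts - pending <= max_gap:
                pairs.append((pending, ts))
            pending = None
    return pairs


def group_bursts(pairs, quiet=timedelta(minutes=1)):
    """Group pairs into bursts; a gap of at least `quiet` starts a new burst."""
    bursts = []
    for pair in pairs:
        if bursts and pair[0] - bursts[-1][-1][1] < quiet:
            bursts[-1].append(pair)
        else:
            bursts.append([pair])
    return bursts


pairs = pair_events(events)
bursts = group_bursts(pairs)
for burst in bursts:
    print(f"burst starting {burst[0][0]}: {len(burst)} pair(s)")
```

Lining up the burst start times with the times users report slow reads would show whether the errors track specific files or folders.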

Could a hard drive be failing while still showing a green LED?

Thanks,

Dave

 

Suman_1978
HPE Pro

Re: Query: ML350 Gen10 Error IDs 129 & 5008

Hi,

I am not sure what type of RAID controller you have, but it's good practice to keep the controller firmware updated.

If you believe the hard drive is failing, you may perform Array Diagnostics on the server.

Thank You!
I work with HPE but opinions expressed here are mine.

dlcole0708
Frequent Visitor

Re: Query: ML350 Gen10 Error IDs 129 & 5008

Thank you for your response. Last night the server returned to normal function. I'm not sure why at this point, but at my next downtime I will verify that the firmware is up to date and also perform an Array Diagnostics run.

Thanks,

Dave

dlcole0708
Frequent Visitor

Re: Query: ML350 Gen10 Error IDs 129 & 5008

Last night the issue returned. I ran Array Diagnostics, and it shows the following error:

Consolidated Error Report:
Controller: HPE Smart Array P408i-a SR Gen10 in Embedded Slot
Device: Logical Drive 2
Severity: Warning
Message: This logical drive has Unrecoverable Media Errors Detected on Drives during previous Rebuild or Background Surface Analysis (ARM) scan.
Errors will be fixed automatically when the sector(s) are overwritten.
Backup and Restore are recommended.

From the sound of this, the error should be corrected automatically, but that doesn't seem to be happening. I'm assuming I should replace Logical Drive 2?

Some additional information: in Windows Performance Monitor, the % Idle Time for the array is 0.000 and the Avg. Disk Queue Length goes as high as 250.
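For anyone comparing their own counters against these numbers, here is a rough sketch of a saturation check. The thresholds (a queue of more than about 2 outstanding requests per spindle, and idle time under 5%) are common rules of thumb treated here as assumptions, and the spindle count of 4 is hypothetical since the number of drives in the array is not stated.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    idle_pct: float       # value of the "% Idle Time" counter
    avg_queue_len: float  # value of the "Avg. Disk Queue Length" counter


def is_saturated(sample, spindles, max_queue_per_spindle=2.0, min_idle_pct=5.0):
    """Flag a sample as saturated when the disk is almost never idle and the
    queue is longer than the assumed per-spindle rule of thumb allows."""
    queue_limit = spindles * max_queue_per_spindle
    return sample.idle_pct < min_idle_pct and sample.avg_queue_len > queue_limit


# Values reported above; spindles=4 is a hypothetical drive count.
reported = DiskSample(idle_pct=0.0, avg_queue_len=250.0)
print(is_saturated(reported, spindles=4))
```

With a queue length of 250, the result does not depend much on the assumed drive count: the array is being asked for far more I/O than it is completing.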

Thanks,

Dave

dlcole0708
Frequent Visitor

Re: Query: ML350 Gen10 Error IDs 129 & 5008

Update: Drilling through the Array Diagnostics, I realized I had misread the error message: it did not refer to physical drive 2 but to the second logical drive on the controller. It looks like I do have a physical drive that is reporting a lot of errors, so I'm getting a replacement ordered. Does anyone think it would be worth doing a backup and restore as recommended in the error message above, or would I just be wasting my time?
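The logical-versus-physical mix-up is easy to make when skimming the report. As an illustration only, here is a small sketch that splits the Consolidated Error Report quoted earlier in the thread into fields and checks what kind of device it names. The parser and its list of known keys are assumptions based on that single report, not on any documented Array Diagnostics format.

```python
# Report text copied from the post above.
REPORT = """\
Controller: HPE Smart Array P408i-a SR Gen10 in Embedded Slot
Device: Logical Drive 2
Severity: Warning
Message: This logical drive has Unrecoverable Media Errors Detected on Drives during previous Rebuild or Background Surface Analysis (ARM) scan.
Errors will be fixed automatically when the sector(s) are overwritten.
Backup and Restore are recommended.
"""

# Assumed field names, taken only from the report quoted in this thread.
KNOWN_KEYS = ("Controller", "Device", "Severity", "Message")


def parse_report(text):
    """Parse 'Key: value' lines; any line that does not start with a known
    key is treated as a continuation of the previous field."""
    fields = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if sep and key in KNOWN_KEYS:
            current = key
            fields[current] = value.strip()
        elif current is not None:
            fields[current] += " " + line
    return fields


report = parse_report(REPORT)
is_logical = report["Device"].lower().startswith("logical drive")
print(report["Severity"], "-", report["Device"])
print("Refers to a logical drive:", is_logical)
```

A logical drive is a volume built on top of the physical disks, so a warning against it does not by itself say which physical disk to pull.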