serious issues with HP ProLiant DL 380 G5

 
chief_media
Occasional Contributor

I've recently been tasked with the wonderful "luxury" of managing a Windows 2003 R2 Standard server running on an HP ProLiant DL. I don't have the exact model or generation at the moment, but I'll add that tomorrow when I get back into the server room. I'm not sure what the previous IT admins did, but the HP software looks incomplete, so I'm unable to obtain any useful information even when logged in remotely.

 

- Is there anything I can install to determine the model/generation of this machine?

- last night this server shut down and I can't find anything useful to figure out why... there is no HP System Management software on this machine for some reason. looking through the Windows Event Viewer I can see that activity stopped around 11:15pm, but nothing that points to a cause. Application and Security logs show nothing, while System shows an Event ID: 6008 Source: Eventlog from when I rebooted the machine this morning.

 

the only thing I see in the System log that might cause a system crash is "The file system structure on the disk is corrupt and unusable.", which looks to have been showing up every day at a specific time, 6:46pm. This system is a RAID5 array that's split into two partitions, and this System log entry looks to be occurring on the second partition, which is used for file serving.

 

- around 9:30am this morning when I rebooted this machine it took approximately 10 minutes before it gave the usual bootup beeps, woke up the monitor and proceeded to boot. at that point it wouldn't load the Windows boot loader and kept giving the message "Disk Error". I checked out the F9, F10, F11 options; on exit the system rebooted, and this time Windows began loading. disk checks ensued for about half an hour before Windows came up and services were restored.

 

now I'm trying to determine:

1. the easiest way to install the HP software that should have originally been on this machine

2. cause of the shutdown last night

 

if anyone has any suggestions or needs further information that can be obtained remotely, please HELP!

6 REPLIES
chief_media
Occasional Contributor

Re: serious issues with HP ProLiant DL 380 G5

just found out that this machine is an HP ProLiant DL380 G5
scharchouf
Trusted Contributor

Re: serious issues with HP ProLiant DL 380 G5

Remote access can be done via iLO.

 

The last HP server software release for Windows 2003 is CA 2013.09.0(B).

 

If you can send me an HPSreport so I can analyse the logs, I will try to help you with the reboot issue.

 

http://update.external.hp.com/HPS/HPSreports/

I work for HP

waaronb
Respected Contributor

Re: serious issues with HP ProLiant DL 380 G5

The long bootup time and subsequent disk errors mean it probably did see some errors getting the drives going, although you probably should have seen video much earlier than you mentioned, even if the drives were doing a click-of-death dance.

 

Is there any particular drive showing a flashing or solid amber LED indicating it's bad (or any that fail to light up at all)?

 

Go to the download page for the DL380 G5 and you shouldn't have any problem getting all of the latest (if you can call it that) drivers, software, firmware for Windows 2003, or if you have access to the latest SPP download (or can find an older one), use that.

 

Get the ILO configured if it isn't already, so you can remotely connect to that and check system status, any errors it logged, etc.

 

I'm also curious if the system takes that long to boot each time, or if it was just that one time because maybe it hadn't been rebooted in a while or something.  I have some old G5 machines and the button batteries in them (CR2032) are starting to die.  If they're without power for too long now, they lose all their BIOS settings and I need to set up the ILO again.  That even includes losing the serial/model # info, which is interesting.

 

If they did happen to lose their BIOS settings, it does take a little longer to boot up, I've noticed.  Not 10 minutes, but maybe a couple, plus the usual memory check it does when it detects a change in memory configuration.

 

Try booting into the diagnostics when it's starting up and run through things in there while you're at it.  Also go into the array config boot utility and check the status of the config from there.

chief_media
Occasional Contributor

Re: serious issues with HP ProLiant DL 380 G5

@waaronb - not sure if all click of death events are always audible clicks, but there was nothing strange in terms of audible sounds to indicate this... I don't recall the number of drives exactly, but I remember there being enough to support RAID5 and all of their lights were green. there was an extra drive, however, that did not have any lights on, and since the HP Array Diagnostics and SmartSSD Wear Gauge Utility keeps crashing, I can't get any info from it as to whether this was a standby drive or part of the array. Just realizing this now and going off my own failing memory, the two partitions add up to 410GB, so if each disk was 75GB then that last one has to be part of the array... unless my memory/math is wrong. If it was part of the array, then has that last disk with no lights failed?

 

I wish I could set up iLO, but attempts to start that application crash... and when it doesn't crash, I get this:

ERROR: Unable to establish communication with iLO/RILOE-II.
Management processor is busy

 

oh I can assure you that this long bootup can be reproduced! my first experience with this server was maybe two weeks ago when I restarted it to run the disk check over the weekend. I may have been occupied for a bit, but it got my attention when I finally heard the beeps at bootup. then about a week ago I was hoping the IE crashes would be resolved with a reboot, but unfortunately they were not. during that restart it took so long I thought it was a goner. that was pretty scary.

chief_media
Occasional Contributor

Re: serious issues with HP ProLiant DL 380 G5

logs are pretty big... you can find them here https://db.tt/RKSrcq0R

 

waaronb
Respected Contributor

Re: serious issues with HP ProLiant DL 380 G5

I skimmed the logs you attached but they're probably not going to show anything related to the hardware... you'd want to check the IML for anything the server might have logged.

 

It sounds like you're saying the system has 8 x 72GB drives, and there's 430 GB of space.  Tossing aside the 1024 vs 1000 difference, that says there are 6 drives' worth of data and 1 drive's worth of parity (well, the parity is striped across all 7 disks, but you know what I mean).  That 8th drive might be assigned as a spare.
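If you want to sanity-check that arithmetic yourself, here's a quick Python sketch. It's only a rough estimate: the 72 GB drive size and the drive counts are guesses from this thread, not values read from the array controller, and the function names are just made up for illustration. RAID 5 gives up one drive's worth of space to parity, and Windows reports sizes in 1024-based units while drive labels use 1000-based ones.

```python
# Rough RAID 5 capacity check. Drive size and drive counts are guesses taken
# from this thread, not values read from the array controller.

def raid5_usable_gb(drive_count: int, drive_size_gb: float) -> float:
    """Usable space of a RAID 5 set: one drive's worth of capacity goes to parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (drive_count - 1) * drive_size_gb


def decimal_gb_to_binary_gb(gb: float) -> float:
    """Convert label gigabytes (10^9 bytes) to the 2^30-byte units Windows shows."""
    return gb * 1000**3 / 1024**3


# Compare a few possible array sizes against the partition total Windows shows.
for drives in (6, 7, 8):
    usable = raid5_usable_gb(drives, 72)
    print(f"{drives} drives: {usable:.0f} GB on the label, "
          f"about {decimal_gb_to_binary_gb(usable):.0f} GB as seen by Windows")
```

Whichever drive count lands closest to the partition total you see in Disk Management is the likely array size; a drive left over after that would fit the spare theory, but only the array config utility can confirm it.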

 

When you first turn the system on, all of the drives should light up initially even if the drives are unassigned, so that would be the time to check for drive activity.  If you don't hear any clicking or weird noises and all the lights look good, the drives should be okay.  Enterprise hard drives are thankfully designed to avoid the horrible repeated access requests that can kill performance... if a drive read or write error happens, it may try a couple of times but then let the controller take over, which means failing that drive or whatever.  That click-of-death is when a desktop drive has an error and it resets the read/write heads over and over again.  Drives tuned for A/V systems are better too (DVRs for instance).

 

If the ILO isn't responding, I wonder if it got scanned by a Heartbleed scanner, and it doesn't have the fixed firmware (2.25).  That would knock it offline until power is physically removed from the server.

 

I'd try that... remove power, take some time to check the interior for dust or anything that could cause heat issues.  Plug back in, boot up, go into the ILO config and get that set up or at least check it, go into the array config... it's a simple interface from the boot utility, but you can see how the drives are assigned.  And then go into the system setup options as well and check out things there.

 

Gen 5 servers are 6 years old by now so things will happen to them.  Servers tend to run pretty hard in 24/7 environments.  I've still got a handful of them myself but we've retired a lot of them as well.  You reach that point where it seems like every week some drive failed, array battery died, PSU died, some memory module is reporting errors, etc.  Until the motherboard itself dies (and some people are happy to replace even those with used parts from sellers on eBay).