ProLiant Servers (ML,DL,SL)

Re: ProLiant DL380 G3 weekly ASR reboot

 
SOLVED
Michael Mize
Occasional Advisor

ProLiant DL380 G3 weekly ASR reboot

I have an HP ProLiant DL380 running Windows Server 2003. Every week at 6:00AM for the past three weeks, the Integrated Management Log has recorded "ASR Detected by System ROM". The server is running MSDE 2000. There are no jobs of any kind scheduled for 6:00AM or even close to that time. The Windows Event Viewer does not contain any errors. As expected, the System Log reports an unexpected shutdown, but there are no other indications of a cause. I understand that ASR can be triggered by hardware or software malfunctions. The consistency of this error and the absence of any apparent cause are baffling. I have considered disabling the ASR feature, but I would rather fix the problem. Any suggestions?
Prashant (I am Back)
Honored Contributor

Re: ProLiant DL380 G3 weekly ASR reboot

Hi,

Disabling ASR won't help you. What is the exact error reported in the IML in this case? Is there a bug check error or any other notification? Is there any error information in the Event Viewer?

Regards,
Prashant S.
Nothing is impossible
Michael Mize
Occasional Advisor

Re: ProLiant DL380 G3 weekly ASR reboot

Thank you for your response. Here is the exact IML entry:
"Index Description Time Of Event Update Time Count
1 ASR Detected by System ROM 4/21/2004 6:10AM 4/21/2004 7:57AM 1"
There are no further errors. The Windows Event Viewer does not report any errors. I utilize Compaq Insight Manager. It reports the following Minor error: "Event description:The server is operational again. The server has previously been shutdown by the Compaq Automatic Server Recovery (ASR) feature and has just become operational again."
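
For anyone comparing several of these entries, here is a minimal Python sketch that splits the IML row quoted above into fields. It assumes the column layout exactly as pasted in this post (Index, Description, Time Of Event, Update Time, Count); `parse_iml_row` is a hypothetical helper written for this thread, not an HP tool or export format.

```python
import re
from datetime import datetime

# Pattern for a row as pasted in this thread (an assumption, not an HP spec):
# index, free-text description, two "M/D/YYYY H:MMAM" timestamps, count.
IML_ROW = re.compile(
    r"^(?P<index>\d+)\s+"
    r"(?P<description>.+?)\s+"
    r"(?P<event>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}[AP]M)\s+"
    r"(?P<update>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}[AP]M)\s+"
    r"(?P<count>\d+)$"
)
TIME_FORMAT = "%m/%d/%Y %I:%M%p"


def parse_iml_row(line):
    """Split one pasted IML row into typed fields."""
    match = IML_ROW.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised IML row: {line!r}")
    return {
        "index": int(match["index"]),
        "description": match["description"],
        "time_of_event": datetime.strptime(match["event"], TIME_FORMAT),
        "update_time": datetime.strptime(match["update"], TIME_FORMAT),
        "count": int(match["count"]),
    }


row = parse_iml_row(
    "1 ASR Detected by System ROM 4/21/2004 6:10AM 4/21/2004 7:57AM 1"
)
# weekday() counts from Monday = 0, so 2 means Wednesday.
print(row["description"], row["time_of_event"], row["time_of_event"].weekday())
```

Once the event time is a `datetime`, it is easy to line it up against any weekly job or service schedule.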
Prashant (I am Back)
Honored Contributor

Re: ProLiant DL380 G3 weekly ASR reboot

Hi,

Run Server Diagnostics, if feasible, to determine whether this is a hardware or software issue, and increase the ASR timeout in the RBSU. If that still does not help, then disable ASR.

Regards,
Prashant S.
Nothing is impossible
Michael Mize
Occasional Advisor

Re: ProLiant DL380 G3 weekly ASR reboot

Thank you again for the response.
I ran Server Diagnostics and no errors were reported. I have Googled this topic and understand that I am not the only one encountering this situation. Perhaps there is a flaw in the ASR implementation. Would a BIOS upgrade potentially correct this problem?
robert lojek_4
New Member

Re: ProLiant DL380 G3 weekly ASR reboot

We have the same problem, except we're running Red Hat Advanced Server 2.1AS. We disabled ASR to try to gather more kernel debugging information during lockups, but the kernel is usually locked up hard. The problem isn't reproducible, and diagnostics (SmartStart 7.x) never show anything.

So far, we're exploring bad ServerWorks chipset revisions with HP--you might want to confirm that your motherboards don't have a flawed ServerWorks GC-LE chipset.

See this link for the story:
http://www.linuxworld.com/story/35213.htm

Beware the ServerWorks GC-LE Chipset

Summary
There's a bug in some of the ServerWorks Grand Champion LE chipsets that's bad enough to hang the servers they're used in. The company says that - after IBM found the problem - ServerWorks ran up a software utility that will let OEMs screen out defective boards.

Both the DL360 G3 and DL380 G3 contain this chipset. So far, I'm unaware of a consumer tool to screen for defective chipsets.
robert lojek_4
New Member

Re: ProLiant DL380 G3 weekly ASR reboot

FYI: We see lockups at least once per machine per 3-week period, but the timing is unpredictable: a few machines lock up within a few days of a reboot, while most lock up within 2-3 weeks of being rebooted.

Here are all the things we've tried to solve this problem (to no avail):

- Kernel upgrades (up to 2.4.9-e.39 using RedHat's Enterprise series)
- upgrading/turning off all HP server/health agents
- hardware swapping:
  1. memory
  2. CPU
  3. motherboard
  4. PPMs (power modules for each CPU)
- multiple diags pre/post lockup
- BIOS upgrades: system ROM & 5i RAID controller ROM
- OS reinstallation/rebuilding
- ethernet controller driver switch (HP's bcm5700 -> RedHat's tg3)

Also: we sent a machine to HP's lab in Texas about a month ago, but they haven't found anything yet.

Please post if you get any new information or clues.
Michael Mize
Occasional Advisor

Re: ProLiant DL380 G3 weekly ASR reboot

My ASR reboots are happening at exactly 6AM every Wednesday morning. While a hardware problem is possible, I am excluding it for now. As elementary as this sounds, I am going to reset the date/time to Wednesday morning at 5:55AM and see if I can duplicate the problem. If it does not recur, that would point away from the server hardware/software as the cause. Then I might try to identify an outside environmental factor that causes the issue. Who knows? Our Operations staff might be hosing down the servers every Wednesday morning.... I will post results.
Michael Mize
Occasional Advisor

Re: ProLiant DL380 G3 weekly ASR reboot

Changing the system time did not reproduce the problem. I will let the nightly backup run; perhaps that interaction plays a role in the problem. Then, tomorrow morning, I will again set the clock to the time at which this problem always occurs.
kris rombauts
Honored Contributor
Solution

Re: ProLiant DL380 G3 weekly ASR reboot

Hi Michael,

The default schedule of the survey service is Wednesday at 6 am, on a weekly basis (or after every reboot), so it looks like something that happens during the survey service's data collection process is causing your server to lock up. You say no jobs are scheduled; you probably mean your own custom jobs, since the above looks like a bit too much of a coincidence to me.

Check whether you're using the latest versions of the management agents and surveyor.exe, or disable it until you find a fix.

Attached is a doc on how to change the default schedule of the survey service.


HTH

Kris
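
The coincidence described in this reply can be checked mechanically. The following Python sketch measures how long after a weekly slot an event occurred. The Wednesday 6 am slot comes from the reply above and the 6:10AM timestamp from the IML entry earlier in the thread; the 30-minute window and both function names are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

WEDNESDAY = 2  # datetime.weekday() counts from Monday = 0


def minutes_after_weekly_slot(event, weekday, hour, minute=0):
    """Minutes between `event` and the latest weekly slot at or before it."""
    slot = event.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # Step back to the requested weekday within the same week.
    slot -= timedelta(days=(event.weekday() - weekday) % 7)
    if slot > event:
        # Same weekday but earlier in the day: use last week's slot.
        slot -= timedelta(days=7)
    return (event - slot).total_seconds() / 60


def near_weekly_slot(event, weekday, hour, window_minutes=30):
    """True if `event` falls within `window_minutes` after the weekly slot.

    The 30-minute default is an arbitrary illustrative window.
    """
    return minutes_after_weekly_slot(event, weekday, hour) <= window_minutes


asr_event = datetime(2004, 4, 21, 6, 10)  # from the IML entry in this thread
print(minutes_after_weekly_slot(asr_event, WEDNESDAY, 6))  # 10.0
print(near_weekly_slot(asr_event, WEDNESDAY, 6))           # True
```

Running the same check on each ASR timestamp in the IML would show whether every reboot lands shortly after the survey service's scheduled start, or whether some fall outside it.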