ProLiant DL380 G3 weekly ASR reboot

SOLVED
Michael Mize
Occasional Advisor

ProLiant DL380 G3 weekly ASR reboot

I have an HP ProLiant DL380 running Windows Server 2003. Every week at 6:00AM for the past three weeks, the Integrated Management Log has recorded "ASR Detected by System ROM". The server is running MSDE 2000. There are no jobs of any nature scheduled for 6:00AM or even close to that time. The Windows Event Viewer does not contain any errors. As expected, the System Log reports an unexpected shutdown, but there are no other indications of a cause. I understand that ASR can be caused by hardware or software malfunctions. The consistency of this error and the absence of any potential cause are baffling. I have considered disabling the ASR feature, but I would rather fix the problem. Any suggestions?
26 REPLIES
Prashant (I am Back)
Honored Contributor

Re: ProLiant DL380 G3 weekly ASR reboot

Hi,

Disabling ASR is not going to help you. What is the exact error reported in the IML? Is there a bug check error or any other notification, and is there any error information in the Event Viewer?

Regards,
Prashant S.
Nothing is impossible
Michael Mize
Occasional Advisor

Re: ProLiant DL380 G3 weekly ASR reboot

Thank you for your response. Here is the exact IML entry:
"Index Description Time Of Event Update Time Count
1 ASR Detected by System ROM 4/21/2004 6:10AM 4/21/2004 7:57AM 1"
There are no further errors. The Windows Event Viewer does not report any errors. I utilize Compaq Insight Manager. It reports the following Minor error: "Event description:The server is operational again. The server has previously been shutdown by the Compaq Automatic Server Recovery (ASR) feature and has just become operational again."
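
The IML row quoted above can be split into its fields to check which weekday the event fell on. This is only a minimal sketch: the column layout is assumed from the single row in this post, and `parse_iml_row` is a hypothetical helper, not part of any HP tool.

```python
import re
from datetime import datetime

# Timestamp shape as it appears in the quoted row, e.g. "4/21/2004 6:10AM".
STAMP = r"\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}[AP]M"
# Assumed columns: Index, Description, Time Of Event, Update Time, Count.
ROW = re.compile(rf"^(\d+)\s+(.+?)\s+({STAMP})\s+({STAMP})\s+(\d+)$")


def parse_iml_row(line: str) -> dict:
    """Split one IML text row into typed fields (hypothetical helper)."""
    match = ROW.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised IML row: {line!r}")
    index, description, event, update, count = match.groups()
    fmt = "%m/%d/%Y %I:%M%p"
    return {
        "index": int(index),
        "description": description,
        "time_of_event": datetime.strptime(event, fmt),
        "update_time": datetime.strptime(update, fmt),
        "count": int(count),
    }


row = parse_iml_row(
    "1 ASR Detected by System ROM 4/21/2004 6:10AM 4/21/2004 7:57AM 1"
)
# weekday() uses Monday = 0, so 2 means Wednesday.
print(row["description"], row["time_of_event"].weekday())
```

Running the same check over several weeks of IML rows would show whether every event lands on the same weekday and time.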
Prashant (I am Back)
Honored Contributor

Re: ProLiant DL380 G3 weekly ASR reboot

Hi,

Run Server Diagnostics, if feasible, to determine whether this is a hardware or software issue, and increase the ASR timeout in the RBSU. If that still does not help, then go for disabling ASR.

Regards,
Prashant S.
Nothing is impossible
Michael Mize
Occasional Advisor

Re: ProLiant DL380 G3 weekly ASR reboot

Thank you again for the response.
I ran Server Diagnostics and no errors were reported. I have Googled this topic and understand that I am not the only individual encountering this situation. Perhaps there is a flaw in the ASR implementation. Would a BIOS upgrade potentially correct this problem?
robert lojek_4
Occasional Visitor

Re: ProLiant DL380 G3 weekly ASR reboot

We have the same problem, except we are running Red Hat Advanced Server 2.1AS. We disabled ASR to try to gather more kernel debugging information during lockups, but the kernel is usually locked up hard. The problem isn't reproducible, and diagnostics (SmartStart 7.x) never show anything.

So far, we're exploring bad ServerWorks chipset revisions with HP--you might want to confirm that your motherboards don't have a flawed ServerWorks GC-LE chipset.

See this link for the story:
http://www.linuxworld.com/story/35213.htm

Beware the ServerWorks GC-LE Chipset

Summary
There's a bug in some of the ServerWorks Grand Champion LE chipsets that's bad enough to hang the servers they're used in. The company says that - after IBM found the problem - ServerWorks ran up a software utility that will let OEMs screen out defective boards.

Both the DL360 G3 and DL380 G3 contain this chipset. So far, I'm unaware of a consumer tool to screen for defective chipsets.
robert lojek_4
Occasional Visitor

Re: ProLiant DL380 G3 weekly ASR reboot

FYI: We see lockups at least once per machine per 3-week period, but the timing is unpredictable: we've had a few lock up within a few days of a reboot, while most lock up within 2-3 weeks of being rebooted.

Here are all the things we've tried to solve this problem (to no avail):

- Kernel upgrades (up to 2.4.9-e.39 using RedHat's Enterprise series)
- upgrading/turning off all HP server/health agents
- hardware swapping:
1. memory
2. cpu
3. motherboard
4. ppms (power modules for each CPU)
- multiple diags pre/post lockup
- BIOS upgrades: system ROM & 5i RAID controller ROM
- OS reinstallation/rebuilding
- ethernet controller driver switch (HP's bcm5700 -> RedHat's tg3)

Also: we sent a machine to HP's lab in Texas about a month ago, but they haven't found anything yet.

Please post if you get any new information or clues.
Michael Mize
Occasional Advisor

Re: ProLiant DL380 G3 weekly ASR reboot

My ASR reboots are happening at exactly 6AM every Wednesday morning. While a hardware problem is possible, I am excluding it temporarily. As elementary as this sounds, I am going to reset the date/time to Wednesday morning at 5:55AM and see if I can duplicate the problem. That test should tell me whether the server hardware/software itself is the cause. If it is not, I might try to identify an outside environmental factor that causes the issue. Who knows? Our Operations staff might be hosing down the servers every Wednesday morning.... I will post results.
Michael Mize
Occasional Advisor

Re: ProLiant DL380 G3 weekly ASR reboot

Changing the system time did not reproduce the problem. I will let the nightly backup run; perhaps that interaction plays a role. Then, tomorrow morning, I will again set the clock to the time at which this problem always occurs.
kris rombauts
Honored Contributor
Solution

Re: ProLiant DL380 G3 weekly ASR reboot

Hi Michael,

The default schedule of the survey service is Wednesday at 6 AM, repeating weekly (or after every reboot), so it looks like something that happens during the survey service's data collection causes your server to lock up. You say no jobs are scheduled; you probably mean your own custom jobs, since the match in timing looks like a bit too much of a coincidence to me.

Check whether you're using the latest versions of the management agents and surveyor.exe, or disable it until you have found a fix.

Attached is a doc on how to change the default schedule of the survey service.


HTH

Kris
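
To compare IML timestamps against the default schedule described in this reply, the next weekly run can be computed from any point in time. This is a rough sketch only: `next_weekly_run` is a hypothetical helper that models the Wednesday 6 AM weekly slot and ignores the extra run after a reboot.

```python
from datetime import datetime, timedelta


def next_weekly_run(now: datetime, weekday: int = 2, hour: int = 6) -> datetime:
    """Return the first weekday/hour slot strictly after `now` (Monday = 0)."""
    # Start from today's slot, then move forward to the requested weekday.
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    # If that slot has already passed, the next one is a week later.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


# Asking on Monday 19 April 2004 gives the following Wednesday at 06:00.
print(next_weekly_run(datetime(2004, 4, 19, 12, 0)))
```

An ASR entry that repeatedly appears shortly after the computed slot would support the survey service as the trigger.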
Oliver Manz
Occasional Contributor

Re: ProLiant DL380 G3 weekly ASR reboot

Hi Michael,

We had a similar problem with a ProLiant ML370 G3. The server was rebooted from time to time by ASR, never at the same time, so we couldn't find out what caused the error. The event logs had no entries either.

Now, after changing the memory modules everything works fine again.

Bye

Oli
Prashant (I am Back)
Honored Contributor

Re: ProLiant DL380 G3 weekly ASR reboot

Hi,

What about increasing the ASR timeout in the RBSU? If that still does not help, then go for disabling ASR.

Regards,
Prashant S.
Nothing is impossible
Michael Mize
Occasional Advisor

Re: ProLiant DL380 G3 weekly ASR reboot

Thank you, kris rombauts. The survey utility is definitely causing this weekly failure. For now, I am disabling this service using the survey.exe command-line parameters. I will check whether HP has released a Survey Utility update. Thank you all for your prompt responses and helpful information.
Prashant (I am Back)
Honored Contributor

Re: ProLiant DL380 G3 weekly ASR reboot

Good Man.

I see the same thing here as well; I have just checked.
-----------
Sample interval is every 7 days
The next 3 samples are scheduled for:
Wednesday 12/17/2003 at 6:00:00 hours
Wednesday 12/24/2003 at 6:00:00 hours
Wednesday 12/31/2003 at 6:00:00 hours
-------------
But which PSP version is on that server?

Regards,
Prashant S.
Nothing is impossible
Michael Mize
Occasional Advisor

Re: ProLiant DL380 G3 weekly ASR reboot

ProLiant Support Pack for Microsoft Windows Server 2003. v7.00A dated Dec. 17th, 2003.
Lisa Jackson_3
Occasional Visitor

Re: ProLiant DL380 G3 weekly ASR reboot

We have the same issue on both DL370 and DL380 servers. We attempted to upgrade to PSP 7 and it died completely at the update of surveyor.exe. I am assuming this wouldn't have helped anyway, as it appears that you may be running this version already.

Have you had any luck resolving the issue?
dwight rumph
Occasional Visitor

Re: ProLiant DL380 G3 weekly ASR reboot

What is the fix for this problem? I am having a similar problem. My DL380 G3 with Server 2003 has been rebooting. The error is:
"System Information Agent: Health: The server is operational again. The server has previously been shutdown by the Automatic Server Recovery (ASR) feature and has just become operational again.
[SNMP TRAP: 6025 in CPQHLTH.MIB]"

Please supply me with a fix if any.
Thanks

Lisa Jackson_3
Occasional Visitor

Re: ProLiant DL380 G3 weekly ASR reboot

We may have resolved it, but have yet to test on a second server.

1. Uninstalled Sophos Antivirus (do you have this?)
2. Uninstalled HP Surveyor Utility/ProLiant Support Pack.
3. Installed V7 of the above (latest from the HP site). The engineer noted that this installed an older version of the network drivers than we had installed. We also note that other people who have reported the ASR reboot problem had this version already.
4. Reinstalled Sophos Antivirus.

Prior to this, the server had also just been patched with *all* the latest Microsoft Patches - barring the update that came out this morning.

Michael Mize
Occasional Advisor

Re: ProLiant DL380 G3 weekly ASR reboot

My temporary solution was to disable the survey utility. There are several ways to do this:
-There is a service that can be disabled.
-You can disable survey.exe using command line options.
-You can open the HP Management Home Page for your server and disable it there.

I have HP Management Agents for Servers v7.00.0.0 installed and have not found updates for this software.

Without understanding the Survey utility's purpose, I have no problem disabling it. I noted that the Survey utility reports on server hardware, for example, but I have other methods of collecting that information. I use Compaq Insight Manager 7, and it still receives updated Data Collection reports, and I can still view the health of all server hardware.

Obviously, there is a problem with the Survey utility. Without a compelling reason to use it, I will leave it disabled until the next Management Agents update.
Daniel Harrison
Occasional Visitor

Re: ProLiant DL380 G3 weekly ASR reboot

I have a DL380 G3 cluster solution on which I have just run the latest MS critical updates, and both nodes now report the unexpected shutdown in the Event Viewer System log. I did one node first and it had this issue (which I thought was my mistake in a setting on the iLO card). Then I updated the second node and it had the same issue. I am considering uninstalling the updates, but am a little worried about the implications.
Lisa Jackson_3
Occasional Visitor

Re: ProLiant DL380 G3 weekly ASR reboot

We have a cluster on the same hardware which has just been patched with everything and is still running OK. Be aware that one of the cluster configuration options is that errors in the cluster will show up in all Event Logs. So if you get an ASR reset on one server, the event will be logged on all servers in the cluster. If you check the event logs, the entry identifies the name of the server on which the event actually occurred.
Prashant (I am Back)
Honored Contributor

Re: ProLiant DL380 G3 weekly ASR reboot

Hi,

Can you confirm whether your server falls under the following category?

The server is attached to 40 (or more) hard drives.

ProLiant server running Microsoft Windows 2000 or Microsoft Windows Server 2003
(any edition) and any of the following versions of the Survey Utility for Windows:
- Version 2.46.4.0
- Version 2.52.7.0
- Version 2.53.0.0
- Version 2.56.8.0

Then try this :
------------------------------------------------
To avoid the application exception message, disable the drive volume information as
follows:
1. Stop the Surveyor service.
2. Navigate to the \Compaq\Survey directory.
3. Use Notepad to open the SURVEY.INI file.
4. Modify the SURVEY.INI file. Change the following line:
; + all IDI_WINNT_VOLUME_INFORMATION
to:
- all IDI_WINNT_VOLUME_INFORMATION
5. Save the file and close it.
6. Restart the Surveyor service.
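
Step 4 above can be scripted rather than done by hand in Notepad. The following is a minimal sketch that works on the file contents as a string; `disable_volume_information` is a hypothetical helper, the sample text is invented for illustration, and the match on an optional leading `;` simply follows the line as quoted in this post. Stopping and restarting the Surveyor service (steps 1 and 6) and reading and writing SURVEY.INI are left out.

```python
import re

SETTING = "IDI_WINNT_VOLUME_INFORMATION"

# Matches the line quoted in step 4: optional ';', then '+ all <SETTING>'.
PATTERN = re.compile(
    r"^[ \t]*;?[ \t]*\+[ \t]*all[ \t]+" + SETTING + r"[ \t]*$",
    re.MULTILINE,
)


def disable_volume_information(ini_text: str) -> str:
    """Return the INI text with the volume-information line set to '- all'."""
    return PATTERN.sub("- all " + SETTING, ini_text)


# Hypothetical sample content; a real SURVEY.INI will contain other lines.
sample = (
    "; + all IDI_WINNT_VOLUME_INFORMATION\n"
    "+ all IDI_SOME_OTHER_ITEM\n"
)
print(disable_volume_information(sample))
```

Applying the function a second time leaves the text unchanged, so it is safe to rerun after a partial or repeated edit.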




Whereas HP says this:
---------------------------------
The Survey Utility for Windows is an end-of-life product; therefore,
no additional resolution is planned.


Check the document for confirmation.

Regards,
Prashant S.
Nothing is impossible
Sebastien Gonzalve
Occasional Visitor

Re: ProLiant DL380 G3 weekly ASR reboot

I had the same problem with a ProLiant DL380 running Linux RH 8.0. I tried several things, and it seems that flashing the ROM with the latest version solved the problem. The server has now been up for more than one week with no further reboots (it was previously rebooting nearly once a day due to ASR).
I hope this will help you.
Regards
David Martin_12
Occasional Visitor

Re: ProLiant DL380 G3 weekly ASR reboot

I have 2 ProLiant servers, a DL360 G3 and a DL380 G3, affected by "ASR Detected by System ROM" at 06:10AM on Wednesday mornings. Both are about 2 months old (latest BIOS, PSP), both in the same rack, both running Windows Server 2003, both attached to their own SDLT 320 tape drive. Other ProLiant servers of the same models in the same rack do not seem to be affected.

As a result of reading this thread, I experimented by running a job on the DL360 this morning at 07:45AM to restart the Surveyor service and found "ASR Detected by System ROM" in the IML with a time of 07:55AM.

This points at the survey utility as being at least a cause of the problem (the DL360 has version 2.56.8.0).

I did read the document at http://forums1.itrc.hp.com/service/forums/getattachment.do?attachmentId=107910&ext=.PDF

although neither affected server has anything like 40 hard drives. The suggestion that no fix to the survey utility will be available seems to leave disabling the Surveyor service as my only option.
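
The timings reported in this thread can be compared directly. The sketch below measures the gap between each trigger time and the matching IML timestamp, using only the times quoted in these posts; `minutes_between` is a hypothetical helper, and the thread does not state the ASR timeout actually configured on these servers.

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> int:
    """Minutes from `start` to `end`, both given like '07:45AM' on the same day."""
    fmt = "%I:%M%p"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


observations = {
    "scheduled weekly survey run": ("06:00AM", "06:10AM"),
    "manual Surveyor service restart": ("07:45AM", "07:55AM"),
}
for label, (trigger, iml_time) in observations.items():
    print(label, minutes_between(trigger, iml_time), "minutes")
```

The gap is the same in both cases, which is consistent with the hang starting when the survey run begins and ASR resetting the server after a fixed interval.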
Baris_3
Occasional Visitor

Re: ProLiant DL380 G3 weekly ASR reboot

I have the same problem with two DL380 G3 servers running an MS 2000 cluster. After reading the whole thread I removed all HP-related management agents and other software to see whether the problem persists. One of the servers rebooted itself again, but this time there is no indication of ASR, presumably because the software was removed. I have doubts about the other one too.

Is ASR something that is also embedded in the BIOS?

I'll try downloading the newest firmware to fix it.
Can anyone suggest something else?

Please help if you can.