STOP error after PSP 7.8

 
Joshua Small_2
Valued Contributor

Re: STOP error after PSP 7.8

Can anyone suggest any method of

- Technical escalation
- Bringing this issue to someone in management?

We've got the most replies on any thread I've ever seen on this forum from people with the same issue.

It simply can't be my imagination any more.

Technical support has stopped even responding to my emails, apparently on the basis that the issue is definitely resolvable by simply "installing the latest drivers".

We paid for RAID drives, redundant fans and power supplies because we expected a highly available server. Something that's only supported when running a driver that takes it offline daily doesn't live up to that expectation.

Before our rollback, we saw corruption, that is to say, Exchange data damaged as a result of blue screens. If I were willing to accept data loss, I would have purchased a workstation and placed it on its side.

At the moment, we have workstations with more uptime than our server.

Re: STOP error after PSP 7.8

I still have an open case with both Microsoft and HP on this, and late yesterday afternoon HP asked me to turn off the Storage Agents.

Coincidentally, I did get a STOP 0xC4 driver fault BSOD; when the dump is run through the debugger, it shows the Storage Agents process faulting in the storport.sys driver.

I turned the agents off and enabled all Driver Verifier settings to put pressure on the system and see if it makes a difference. You may all want to try the same. I'll post my results after a while.

Robert Mader
Occasional Advisor

Re: STOP error after PSP 7.8

Hey Bruce,

I remember windbg pointing at storport.sys in one of my first BSODs, but I assumed it was highly likely that a third-party driver was the root cause and had left storport.sys in a state where it would fail.

I never tried disabling the Storage Agents. Are you still on 7.80?

wbr
robert

Re: STOP error after PSP 7.8

Yes, still on 7.8, though I have downgraded the P400 controller to 6.4.0.64 per an earlier HP recommendation.

Storport.sys has been fingered a couple of times, once by Microsoft itself, and they provided me with a newer patched version.

Unfortunately, even with the Agents disabled and Driver Verifier running, it took about 10 minutes before the machine blue-screened again.
Jesse Zellmer
Frequent Advisor

Re: STOP error after PSP 7.8

I also still have a support case open with Microsoft. I have sent them several dumps, but they are still unable to clearly identify the issue. Most recently they had me change some GFlags settings to get more detailed diagnostics from the dumps and run Driver Verifier on TCPIP.SYS.

I am going on almost a week with no BSOD since rolling back to PSP 7.7A. I still have the NICs teamed (NFT) with TOE enabled. Smart Array P400.

- DL360 G5 - W2K3 SE w/ SP2
GSD_2
Occasional Advisor

Re: STOP error after PSP 7.8

Joshua
I work for a reseller and so I'll ask our HP account manager and see if he can do something here.
Jonathan Rees
Occasional Advisor

Re: STOP error after PSP 7.8

My testing indicates there are 2 separate faults. TOE is randomly enabled on servers; this results in a server "freeze" which can only be cleared by a reset or power down. This fault applies to both PSP 7.7 and 7.8. The random BSOD fault applies to PSP 7.8 only, and is affected in some way by load. Backing up more than 200 GB of data (using DataProtector 5.5) will result in a guaranteed BSOD. Interestingly, restoring data does not have the same effect. Normal production use will also result in the random BSODs.
Morten Dalgaard
Occasional Advisor

Re: STOP error after PSP 7.8

The BSOD issue is most definitely related to the HP NC-Series Multifunction Driver (Broadcom driver) version 3.0.7.0, which is also included on PSP 7.80.
This is effectively the driver all DL380 G5 machines use for their on board NICs. Running with PSP 7.80, but rolling the NIC driver back to 2.8.22.0 fixed the BSOD issue for me.
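
For anyone auditing several machines, here is a minimal sketch of checking a NIC driver version string against the two versions named above. It assumes you have already read the version from Device Manager or an inventory export; the function names and the "reported-bad" / "reported-good" labels are hypothetical and reflect only what was reported in this thread, not an official HP list.

```python
# Versions taken from this thread only; not an official HP advisory.
REPORTED_BAD = "3.0.7.0"    # shipped with PSP 7.80, reported to BSOD
REPORTED_GOOD = "2.8.22.0"  # rollback version reported as stable


def parse_version(text):
    """Turn a dotted version string such as '3.0.7.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def classify_nic_driver(version):
    """Label a driver version using only the reports in this thread."""
    parsed = parse_version(version)
    if parsed == parse_version(REPORTED_BAD):
        return "reported-bad"
    if parsed == parse_version(REPORTED_GOOD):
        return "reported-good"
    return "unknown"
```

Any other version comes back as "unknown", since nothing here says whether versions between or after these two are affected.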

Interestingly enough, in my attempt to locate the problem, I installed Win2k3 Std R2 x86 + Service Pack 2 on the server at some point, and that stopped the BSODs. The same install, just x64 instead, started the BSODs again.

About the TOE issue: I do not get lockups from having TOE enabled, but I do notice network traffic "stalls" when it is enabled. The machine simply stops transmitting and receiving any data for a couple of seconds, and then resumes.
Disabling TOE fixed that; rolling the driver back didn't, so I guess I will be running without it enabled.

I'm appalled by HP releasing a defective driver, and by the ongoing issues that TOE causes. Why can't they use some proper NICs/drivers, or at least admit the issue and release a fix or workaround?
PZel
Trusted Contributor

Re: STOP error after PSP 7.8

Does it have something to do with the Broadcom firmware included on the firmware CD 7.8?

When you boot the firmware CD, the firmware upgrade for Broadcom for Linux (CPxxxx.scexe) is selected by default. For W2K3 systems this is not recommended (but harmless?). You de-select it by left-clicking it. Afterwards, once the O/S has booted, you can insert the firmware CD again and then select the firmware for Broadcom for Windows (cp007672.exe).
PZ

Re: STOP error after PSP 7.8

Well, I have a completely stable system currently, and here's how I did it.

I uninstalled everything that says "HP" in Add/Remove Programs.

This leaves drivers installed for required components, but removes the management software, and has resulted in a rock-stable system so far.

Previously, the easiest way to get these machines to blue screen was to turn on Driver Verifier with all options; it would blue screen in under an hour with this setting. After removing all the management software, this system has been running Driver Verifier non-stop and has stayed stable.

Which leads me to believe it's a management driver issue, or some interaction between the management software and installed driver or firmware on a component. Which one that is, is unclear currently, but I will run them like this until HP can isolate the issue.