ProLiant Servers (ML,DL,SL)

Re: STOP error after PSP 7.8

 
SOLVED
Jded
Occasional Advisor

Re: STOP error after PSP 7.8

The first thing that produced a result here was to dissolve the team and then disable one NIC.
That was with PSP 7.7.
The only problem that we had was the BSOD on every restart or shutdown (it is better than a random BSOD).

Now, with PSP 7.8 and all the firmware updates applied, I re-enabled the NIC and rebuilt the team. The server is running fine (I think).
Joshua Small_2
Valued Contributor

Re: STOP error after PSP 7.8

I have new light to shed. After a clean reboot (Start -> Shut Down -> Restart), I got this today in my IML:

POST Error: 1792-Drive Array Reports Valid Data Found in Array Accelerator

I know exactly what this error is supposed to mean: the card kept data in its cache after some kind of unclean shutdown. But the shutdown was perfectly clean, and the event log shows no sign of a STOP error.

Am I correct in assuming there could be something wrong with my E200 RAID card?
I'm seeing continuing event log entries relating to checksum errors in my SQL service, and I'm really suspicious that *something* is wrong with my storage.

I've had a case open with HP for several days, and I'm really hoping the next person to call me back suggests something other than reflashing the system board again.
Jonathan Rees
Occasional Advisor

Re: STOP error after PSP 7.8

I have had this error as well, but it came after a server crash, so it was expected. Testing so far seems to indicate that using PSP 7.7 has fixed the problem, although we still have all firmware and BIOS at the latest versions. We have been able to induce the crash by backing up 200+ GB of data. We are using DataProtector 5.5, fully patched. It seems to be related to server load, i.e. the backup generates enough load to trigger the fault. If you are running Storport drivers, there is a new release from 28 March that fixes a number of STOP errors. Check http://support.microsoft.com/kb/912944/en-us
Joshua Small_2
Valued Contributor

Re: STOP error after PSP 7.8

Hi Jonathan,

That's an old storport patch :)
We already applied this one to try resolving the issue ourselves:

http://support.microsoft.com/kb/932755

It didn't :(

Jonathan Rees
Occasional Advisor

Re: STOP error after PSP 7.8

Oops, I posted the wrong link; I meant the one you posted. It did not fix our problem either, but the MS blurb does mention STOP errors with other HP RAID cards, i.e. 5x/6x, which require driver updates.
Mikearm
Occasional Advisor

Re: STOP error after PSP 7.8

Hi Joshua
Sorry, I've been away for a while. I understand your frustration with random errors; they are a nightmare to troubleshoot.

To summarise your problem:
Standard DL380 G5, updated to the latest PSP 7.8.
Since then you have had problems with fltmgr.sys, one relating to memory_corruption, cpqteam.sys, and most recently the array accelerator issue.

If it were me, I'd follow this course of action, one step at a time, letting the server run for 24 hours before changing anything else:
- Roll back to the 7.7 agents, drivers and firmware.
- Remove the BBWC, to prove it's not corrupt RAID cache memory.
- Add a second RAID controller and put the disks on it, to prove it's not your E200.
- Put the disks in a second, spare server (if possible).
- Reduce the memory modules to as few as possible, and try swapping them if the BSODs keep happening.
- If all this fails, either rebuild or get HP to swap the motherboard out.

I know this is all time-consuming but, as you say, you can't keep having a production server dropping out of service.
hwn_1
Occasional Advisor

Re: STOP error after PSP 7.8

We had 6 crashes with all kinds of BSOD error codes within the last 2 days. (Analysing the dumps with WinDbg shows a different memory location and different drivers every time, so that is no help.)

The system (DL380 G5, Windows 2003 R2 SP2 Standard 32-bit, PSP 7.80) was installed last week and ran stable until it was put into production as the primary file server on Monday. After that it crashed every day between 7-8, 11-12 and 17-18 o'clock.
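
Crashes that recur in the same time windows usually point at load peaks rather than random hardware faults. A minimal Python sketch of how bugcheck times could be bucketed by hour to check for such a pattern follows. The function name and the timestamps are made-up placeholders, not data from this server; the real times would come from the System event log or the minidump file dates.

```python
from collections import Counter
from datetime import datetime

def crashes_per_hour(timestamps):
    """Count crash events per hour of day (0-23)."""
    return Counter(
        datetime.strptime(t, "%Y-%m-%d %H:%M").hour for t in timestamps
    )

# Placeholder timestamps for illustration only (NOT taken from the thread).
sample = [
    "2000-01-03 07:40", "2000-01-03 11:15", "2000-01-03 17:30",
    "2000-01-04 07:55", "2000-01-04 11:50", "2000-01-04 17:05",
]

for hour, count in sorted(crashes_per_hour(sample).items()):
    print(f"{hour:02d}:00-{hour:02d}:59  {count} crash(es)")
```

If the busy hours line up with logon, backup or shadow copy schedules, that narrows down which subsystem to look at first.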

Boot and storage are on the SAN. The local discs (RAID 1) are only used for shadow copies. After the first crash I moved the pagefile from the SAN to the local discs. After the next crash I downgraded the QLogic FC adapter firmware, the QLogic driver and the system BIOS to the same versions as another system that is running stable. No change. After the next crash I removed McAfee 8.0 (with HF15) completely. The only software installed now is the BackupExec Agent. It still crashes.

Last thing I did yesterday was:
- disabled shadow copies
- upgraded the P400 firmware from 2.08 to 2.10
- disabled teaming (NIC drivers still at 3.0.7, I will downgrade them on the next crash)

It has now been running without a crash for 20 hours.

We had a very negative performance impact with TOE (TCP Offload Engine) and teaming enabled with some XP clients: only a 150 KByte/s transfer rate. Between servers, and over 1 Gbit connections, everything was running fine (around 100 MByte/s). After disabling TOE I get 4-6 MByte/s to all clients. So keep an eye on TOE, and disable at least this.

I have four additional DL380 G5 systems: one with PSP 7.70 that has been running for 2 months without a problem, and 3 with PSP 7.80 currently in testing without a problem, but there isn't any load on them at the moment.

So my advice: remove teaming, go back to the last NIC drivers and/or disable TOE in the HP drivers or the Windows registry! Maybe it's memory corruption caused by cpqteam, as Mikearm wrote.
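
For the registry route, the post above does not say which values to change. As far as I know, Microsoft's guidance for the Windows Server 2003 SP2 Scalable Networking Pack uses the three DWORD values shown below under the Tcpip\Parameters key, so treat those names as an assumption to verify against Microsoft's documentation before touching a production box. The Python sketch only builds the text of a .reg file for review; it does not modify the registry, and `build_reg_file` is a hypothetical helper name.

```python
# Assumption: these are the OS-level SNP switches on Windows Server 2003 SP2.
# Verify the names against Microsoft's documentation before importing anything.
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
SNP_VALUES = ("EnableTCPChimney", "EnableRSS", "EnableTCPA")

def build_reg_file(values=SNP_VALUES, key=KEY):
    """Return .reg file text that sets each named DWORD value to 0."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    lines += [f'"{name}"=dword:00000000' for name in values]
    return "\r\n".join(lines) + "\r\n"

# Print the text so it can be reviewed before it is saved and imported.
print(build_reg_file())
```

This only covers the operating system side. The TOE and RSS options in the HP network configuration utility are separate settings, and a reboot is normally needed before a registry change of this kind takes effect.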

We have around 30 servers, most of them HP, but I've never seen such a mess. I hope HP will do some more testing before releasing drivers like these.
Joshua Small_2
Valued Contributor

Re: STOP error after PSP 7.8

Hi,
I've gone with hwn's suggestion for the moment since it's the least disruptive.
I'll carry on with further suggestions next.

I'll note that we had lots of issues in the past with McAfee 8.0. I'd recommend you upgrade to 8.5 Patch 1, which has been flawless.
Joshua Small_2
Valued Contributor

Re: STOP error after PSP 7.8

I just had an opposing issue.
Disabling TOE made my server so slow that RDP sessions dropped out.
Re-enabling it appears to have fixed that.

Perhaps the difference between us is that I applied this update earlier:

http://h18023.www1.hp.com/support/files/server/us/download/27259.html

I'll note this doesn't appear in the Version Control Agent, or on the Firmware Maintenance CD as far as I could tell.
hwn_1
Occasional Advisor

Re: STOP error after PSP 7.8

Hi Joshua,

Our system has now been stable for 48 hours. I've just checked the network performance, and iperf consistently gives 950 Mbit/s (110 MByte/s) with a window size of 64 KB between different systems. So for us I can't see a big performance drawback from having TOE disabled.
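
As a rough sanity check on those iperf numbers, here is a small Python sketch of the unit conversion and of the round-trip time a single 64 KB window can tolerate at that rate. It assumes the window is 64 KiB (65536 bytes) and that one full window is in flight per round trip; the function names are just illustrative.

```python
def mbit_to_mbyte(mbit_per_s):
    """Decimal megabits per second to decimal megabytes per second."""
    return mbit_per_s / 8

def mbit_to_mibyte(mbit_per_s):
    """Decimal megabits per second to binary mebibytes per second."""
    return mbit_per_s * 1_000_000 / 8 / (1024 * 1024)

def max_rtt_ms(window_bytes, mbit_per_s):
    """Largest round-trip time (ms) at which one window per RTT sustains the rate."""
    return window_bytes * 8 / (mbit_per_s * 1_000_000) * 1000

rate = 950            # Mbit/s, as reported by iperf in the post
window = 64 * 1024    # bytes, assuming "64 KB" means 64 KiB

print(f"{mbit_to_mbyte(rate):.2f} MByte/s (decimal)")
print(f"{mbit_to_mibyte(rate):.2f} MiByte/s (binary)")
print(f"max RTT for this window: {max_rtt_ms(window, rate):.3f} ms")
```

950 Mbit/s works out to a little under 119 MByte/s in decimal units, so the quoted 110 MByte/s is in the right range once binary units or protocol overhead are allowed for. The tolerable round-trip time comes out at roughly half a millisecond, which is plausible on a local gigabit LAN.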

NIC BIOS: 3 systems already had 2.1.05B installed by default. Running the installer on the two systems in production shows that only the iSCSI firmware has changed (1.0.0 to 1.1.7), so this shouldn't make a real difference.

Yesterday evening I tried to disable TOE on the teaming configuration of the 3 servers in testing. The first one blue-screened immediately after I pressed OK. So I dissolved the teams completely on the DL380 G5 systems with PSP 7.80 and disabled the second Ethernet port.

I have the following DL380 G5 systems running at the moment:
- old NC BIOS, driver 2.8.13: production, running since Feb 2007 without a problem, TOE not enabled by default, teaming still active
- old NC BIOS, driver 3.0.7: stable without load, crashed regularly in production, TOE and RSS enabled by default, seems stable without teaming and TOE
- new NC BIOS, driver 3.0.7: testing, low load, crashed while disabling TOE on the team, teams removed
- new NC BIOS, driver 3.0.7: testing, low load, no problems, but teams removed now
- new NC BIOS, driver 3.0.7: testing, low load, no problems, but teams removed now
All systems run Windows 2003 with SP2; some are R2, and one is x64 R2.

Maybe it is safe to use TOE on a NIC as long as it is not in a team? Maybe something like RSS or checksum offload also has an impact? I only modified the settings in the HP network configuration utility, never in the registry.

Thanks for the McAfee 8.5 hint. Almost all of our file and terminal servers run 8.0 (for a long time with HF13, then HF14 and now HF15) without a problem. Around HF10 there were some problems (especially with logons on Citrix), if I remember correctly. McAfee 8.5 is only used on Vista clients at the moment, but I will have a look at it if problems occur.

Hope you find your solution soon!