ML350 G5 Problem

 
SOLVED
Walter R-T
Occasional Advisor

ML350 G5 Problem

We have a ML350G5 with similar symptoms to what Chris Devine described in his thread here:

http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1184406

We are running Windows Server 2003 and SQL Server 2005 Standard, with no antivirus. The only other application is ARCserve for tape backups.

Basically, Explorer sits and waits for pending I/O requests to the HpCISSs2 driver. The server doesn't really crash: the open applications continue to run (except the Start bar, which locks up) and the keyboard and mouse are still responsive, but nothing can access the disks at all, and network traffic also stops. The only way around this is to restart the server with SQL services disabled.

Note that because nothing can access the HDDs once this happens, nothing is logged in Event Viewer or in the SQL logs. We don't get any error messages or blue screens. The applications think that the server is still running and you can even Alt+Tab between them; it's just that nothing can access the disks. If you start Task Manager before the crash (you can't afterwards, presumably because it would need to be loaded from disk, which is unavailable), then CPU usage drops to 0% except for a small spike of 2-3% every 4 seconds or so.

It is also worth mentioning that when this problem occurs, we get the regular blinking of the drive lights on C: (mirror, used for the OS) that indicates an array problem. The drive lights on D: (RAID 5, where the database and SQL backup files reside) remain on solid.

This does seem to be triggered by SQL in our case. Microsoft Product Support have analysed our memory dump files (we can get these by using the Ctrl+Scroll Lock method for a user-generated crash dump; not sure why this still works even though nothing else can access the drives), and they think the problem is with the HpCISSs2 driver, because Explorer is waiting for pending I/Os to it.

However, we always get this problem when restoring one particular SQL backup from file: the restore reaches 100% and then the above problem occurs. If we restore an older version of the backup (same database, same file size), the restore completes successfully and the server runs stably.

We also have versions of the raw database files (.mdf and .ldf), and these trigger the above symptoms as soon as they are loaded/mounted. If they are put in place on the server with SQL disabled, the server crashes as soon as the SQL service starts.

We can restore other backups and move large (nearly 10 GB) files around on the RAID array without issue. The crash occurs only when we either restore this one version of the backup file or use these particular .mdf files.

We don't think this is due to the HpCISSs2 driver, despite what MS are saying. We are pretty sure that it is due to some sort of corruption in the database. The version of HpCISSs2 we are using is dated July this year; we updated it from the HP site just a couple of days ago, and that made no difference. Everything else on the server is very stable, except these specific files.

Also, when we first started getting this problem a couple of months ago, we tried to restore the database onto a fresh install of Server 2003 and SQL 2005 on an ML110 G4, with exactly the same result/symptoms. That server was just using SATA HDDs, not a RAID array. Unfortunately, back then we didn't know about the Ctrl+Scroll Lock method of generating a dump file, so we have no further debugging info from our efforts on that server. We also no longer have that server available for testing, so we can't recreate the problem there anymore.

I'm just downloading the firmware updates as suggested in Chris Devine's thread, will let you know how I get on with that. Does anyone else have any other suggestions?
5 REPLIES
TRS
Frequent Advisor

Re: ML350 G5 Problem

Hi,

Please share the following details:
1) OS: Windows Server 2003 Standard or Enterprise?
2) Firmware versions of the system board and array controller
3) Amount of memory

I faced the same problem with a DL380 G2 running Server 2003 Enterprise with 16 GB.
Before my first visit, the customer was running 2003 Standard with 4 GB, and at that time the server worked fine.
The customer then upgraded to 16 GB while still on 2003 Standard, and hit a problem like yours.
I suggested he go to 2003 Enterprise, since 2003 Standard does not support more than 4 GB.
He upgraded so the 16 GB could be used. After that we faced the same problem again. On my second visit I upgraded the firmware (system and array controller), and now the server is working fine.

Try one thing: update the firmware and test again.
Walter R-T
Occasional Advisor

Re: ML350 G5 Problem

OK, I've run the firmware updates suggested in Chris' thread. The CD updated the system firmware, NIC, and iLO; the array controller was up to date already. No change: symptoms are the same as before.

We are using Server 2003 Standard.

Interestingly, we originally had 6GB of memory in the server, but we have pulled 2GB out, leaving a total of 4GB. However, Windows only recognises 3.5GB. POST shows 4GB, and I've run CPUz which detects 4GB as well, but the OS only shows 3.5GB. How do we get it to recognise all 4GB? Could this be having an effect on the system?

Thanks for your help and advice.

- Walter
KarloChacon
Honored Contributor
Solution

Re: ML350 G5 Problem

Hi Walter,

Weird issue, given that it depends on the version of the backup. I'm afraid I can't help with that part.


Regarding the OS showing 3.5 GB instead of 4 GB, check this:

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c00883105&jumpid=reg_R1002_USEN
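
For anyone wondering where the missing 0.5 GB goes, here is a rough sketch of one common explanation. It is an assumption on my part and not taken from the linked document: on a 32-bit OS, memory-mapped device I/O shares the 4 GB address space with RAM, so whatever the devices reserve is hidden from Windows. The 512 MB reservation below is inferred from Walter's 4 GB vs 3.5 GB figures, and the function name is made up for illustration.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def visible_ram(installed_bytes, mmio_reserved_bytes, address_bits=32):
    """Estimate the RAM an OS can address when device MMIO shares the address space.

    Simplified model: RAM that does not fit below the top of the address
    space, after the MMIO window is carved out, is not visible to the OS.
    """
    address_space = 2 ** address_bits
    return min(installed_bytes, address_space - mmio_reserved_bytes)


installed = 4 * GIB
mmio = 512 * MIB  # assumed reservation, inferred from the 3.5 GB reading

# Plain 32-bit addressing: part of the RAM is shadowed by the MMIO window.
print(visible_ram(installed, mmio) / GIB)

# 36-bit physical addressing (as with PAE): the address space is large
# enough that all installed RAM fits alongside the MMIO window.
print(visible_ram(installed, mmio, address_bits=36) / GIB)
```

In this model the first call gives 3.5 GB and the second gives the full 4 GB. Whether that matches what this particular server does depends on the BIOS and OS settings.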


regards
Didn't your momma teach you to say thanks!
Chris Devine
Occasional Advisor

Re: ML350 G5 Problem

Hi Walter

I seem to have fixed my problem from this thread:

http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1184406

Well, so far so good anyway.

I am no DBA, but it seems your problem is database related. Does the server lock up only when you are restoring this one database?

You could test restoring this database using the free version, SQL Server 2005 Express, on just a PC. That would eliminate anything specific to your server.

If there are any offline database maintenance tools, you could use them to clean up the database before you attempt to restore it.
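
As a rough sketch of what such a check could look like on a test PC, the snippet below just builds two standard T-SQL statements: RESTORE VERIFYONLY to check that the backup file is readable, and DBCC CHECKDB to check the database once it has been restored. The backup path and database name are hypothetical placeholders, and the helper functions are made up for illustration. The statements would still need to be run in a query tool. Since mounting these files is what triggers the hang, this is only worth trying on a machine you can afford to lock up.

```python
def quote_literal(value):
    """Quote a string as a T-SQL Unicode literal, doubling single quotes."""
    return "N'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Bracket-quote a T-SQL identifier, doubling closing brackets."""
    return "[" + name.replace("]", "]]") + "]"


def verify_backup_sql(backup_path):
    """Build a statement that checks the backup set is complete and readable."""
    return "RESTORE VERIFYONLY FROM DISK = " + quote_literal(backup_path) + ";"


def check_database_sql(database_name):
    """Build a statement that runs consistency checks on a restored database."""
    return (
        "DBCC CHECKDB (" + quote_identifier(database_name) + ") "
        "WITH NO_INFOMSGS, ALL_ERRORMSGS;"
    )


# Hypothetical path and database name, for illustration only.
print(verify_backup_sql(r"C:\Backups\suspect.bak"))
print(check_database_sql("SuspectDb"))
```

Note that RESTORE VERIFYONLY does not examine the structure of the data inside the backup, so a backup can pass it and still contain a damaged database. That is why the second check is there.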

Cheers
Chris
Walter R-T
Occasional Advisor

Re: ML350 G5 Problem

Hi Chris,

Looks like we were having the same problem: we have been given that same patch by Microsoft. It appears our server is now stable. We are putting the server live again in about 2 hours' time, so that will be the real test. Fingers crossed, but it's helpful to know that the updated Storport.sys has fixed your problem as well.

Also, we are now able to access all 4GB of RAM thanks to Karlo's solution above.

Appreciate your help, guys.

- Walter