
CHKDSK errors on DL380 G7

 


Hello.

 

We have a problem with 2 out of 3 identical servers we bought together.

 

When we run CHKDSK in read-only mode, it reports errors that it cannot repair in that mode. After running CHKDSK on reboot the errors are gone, but they always reappear the next day.

 

The servers are all DL380 G7 with P410i + 256 MB BBWC and two 72 GB hard disks running in a simple RAID 1 configuration. One server also uses FC SAN storage on multiple P2000 G3 FC boxes.

 

They are all on current, identical firmware (as of December 2011) and run Server 2003 R2 SP2 x64. Patch level and software are identical on all servers, as they are provisioned completely unattended by scripts.

 

Interestingly, the server that is also attached to the SAN shows those errors only on the DAS drives. On the SAN devices CHKDSK runs flawlessly.

 

All servers are covered by Care Packs and we have already opened an incident, but I'm quite confused by the answer we got from HP support after we submitted the HPS report.

 

They told us that they could not find any hardware problems so this must

be a software problem and we should call Windows for support. They also

asked us to try running a newer version of CHKDSK (not sure what he was talking

about).

 

Besides the fact that I don't know how to call Windows (I would try MS instead), we are running 60 servers here, of which 45 run Windows 2003 R2 SP2 x86 and 10 run Windows 2003 R2 SP2 x64, and none of them shows similar problems.

 

Any suggestions on what to do next?

 

Thanks for your suggestions.

 

Cheers.

 

12 REPLIES
rguha
Advisor

Re: CHKDSK errors on DL380 G7

Hi, 

 

Please provide the array diagnostics report for analysis.

You can download the utility from the link given below:

 

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=15351&prodSeriesId=428936&prodNameId=3288114&swEnvOID=4024&swLang=8&taskId=135&swItem=MTX-c3e7acb4f845425d9be5dea323&mode=3

 

 


Re: CHKDSK errors on DL380 G7

Thanks for your response.

 

Here is the report.

Johan Guldmyr
Honored Contributor

Re: CHKDSK errors on DL380 G7

Hey,

 

See the section "Monitor and Performance Statistics" for each disk.

 

There are no errors reported for either of the two disks.

 

So there doesn't appear to be a problem with the disks, physically.

 

Maybe the problem lies with what's writing to the disks (OS, drivers, bad applications).

 

Are you running chkdsk while the server is online - and then seeing errors?

And when you reboot they are not seen? Maybe you only see the errors because you are running chkdsk while the server is online?

Why are you running chkdsk? Are you having any actual issues?

Re: CHKDSK errors on DL380 G7

Hi Johan.

 

Thanks for your response.

 

You are right, there are no physical errors on the disks themselves, so the question is why the file system on those disks gets corrupted. So maybe it is a problem with the controller cache, the controller itself, the cabling or the backplane?

 

We see those errors when running CHKDSK online.

Then we correct them by running CHKDSK on reboot.

After that, running CHKDSK online does not show errors.

The next day, running CHKDSK online shows errors again.

 

Strangely enough, CHKDSK on the attached FC LUNs works flawlessly.

The second machine has the same problem (but there is no SAN attached).

The third machine has no problem.

 

All machines have the same OS, the same FW and the same support pack components.

 

We sometimes also get a Windows message that directory structure

has been corrupted and cannot be repaired.

 

We also sporadically see the following eventlog messages:

 

Source: bfad

ID: 11

Description: The driver has detected a controller failure on \Device\RaidPort1

 

and

 

Source: bfad

ID: 129

Description: Reset to device \Device\RaidPort1 was issued

 

I agree with you that the problem lies with what is written to disk.

But as I understand it, this is outside the scope of an application, so what's left is OS, driver, hardware.

 

Cheers.

 

Edit:

 

The bfad errors appear only twice in the event log, and only on the machine with the FC controller, so they do not seem to be related to the problem.

 

Johan Guldmyr
Honored Contributor

Re: CHKDSK errors on DL380 G7

Hi,

 

Usually when there's a backplane issue, the 'bus faults' counters in the ADU report are larger than zero.

 

One thing that's easy to check - as you are running RAID1 - is to run it without the BBWC.

 

I am not sure, but probably the disks from the FC HBA are using a different driver.

 

As you say that the error comes "the next day", maybe you could try to find out exactly how long it takes before you see the errors? Then it might be possible to figure out what the OS has been doing in between (scheduled jobs, etc.).

 

Is it possible to find out if it's always the same part of the disk / file that is having these errors via CHKDSK? Or is it always different?
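
One way to answer the "same part or always different" question is to keep the text output of each online CHKDSK run and compare the runs. The following is only a rough Python sketch under several assumptions: the output of each run has been saved as plain text, one run taken right after a repair serves as a clean baseline, and progress lines contain the words "percent complete". The function names `new_findings` and `recurring` are made up for this example.

```python
import re
from typing import List

# Assumption: progress lines look like "10 percent completed." and carry no findings.
_PROGRESS = re.compile(r"percent complete", re.IGNORECASE)


def _meaningful(text: str) -> List[str]:
    """Return non-empty, whitespace-normalized lines without progress noise."""
    lines = []
    for raw in text.splitlines():
        line = " ".join(raw.split())
        if line and not _PROGRESS.search(line):
            lines.append(line)
    return lines


def new_findings(clean_run: str, later_run: str) -> List[str]:
    """Lines that appear in a later run but not in the clean baseline run."""
    baseline = set(_meaningful(clean_run))
    seen = set()
    findings = []
    for line in _meaningful(later_run):
        if line not in baseline and line not in seen:
            seen.add(line)
            findings.append(line)
    return findings


def recurring(findings_by_day: List[List[str]]) -> List[str]:
    """Findings that show up on every day, in the order of the first day."""
    if not findings_by_day:
        return []
    common = set(findings_by_day[0])
    for day in findings_by_day[1:]:
        common &= set(day)
    return [line for line in findings_by_day[0] if line in common]
```

If `recurring` keeps returning the same file or index names day after day, that points at one specific object on the volume. If the findings differ every day, something is more likely corrupting writes in general. Summary lines with changing numbers (free space and so on) will also show up as "new", so the result still needs to be read by a human.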

Re: CHKDSK errors on DL380 G7

>>One thing that's easy to check - as you are running RAID1 - is to run it without the BBWC.

 

We have checked that in the meantime.

The problem persists.

 

What is strange is that chkdsk states that errors were found and that it should be run in read-write mode, but I don't see any errors being corrected when it runs in read-write mode. After that, chkdsk states that everything is OK.

 

Cheers.

Johan Guldmyr
Honored Contributor

Re: CHKDSK errors on DL380 G7

Perhaps if you provided the logs from the scans that would help us, at least so we can see what these 'errors' are that you're referring to.
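
For the online (read-only) runs, the output can simply be captured to a file each time. Below is a minimal Python sketch of that idea. It assumes a reasonably recent Python 3 is available on the machine (which may not be realistic on Server 2003) and that the script runs with administrator rights. The function name, the log directory and the `runner` and `now` parameters are made up for this example, and the exit code descriptions paraphrase Microsoft's chkdsk documentation, so please verify them for your version. It does not capture the write-mode run at reboot, which happens before any script can start.

```python
import subprocess
from datetime import datetime
from pathlib import Path

# Exit codes as described in Microsoft's chkdsk documentation (paraphrased).
EXIT_MEANING = {
    0: "no errors found",
    1: "errors found and fixed",
    2: "cleanup performed, or skipped because /F was not specified",
    3: "could not check the disk, or errors not fixed because /F was not specified",
}


def run_readonly_chkdsk(drive="C:", log_dir="chkdsk_logs",
                        runner=subprocess.run, now=None):
    """Run chkdsk without /F and save its output to a timestamped log file.

    `runner` and `now` exist only so the function can be tried without
    actually launching chkdsk.
    """
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    result = runner(
        ["chkdsk", drive],          # no /F: read-only check
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # keep error text in the same log
        universal_newlines=True,
    )
    out_dir = Path(log_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    log_path = out_dir / "chkdsk-{}-{}.log".format(drive.rstrip(":"), stamp)
    meaning = EXIT_MEANING.get(result.returncode, "unknown exit code")
    log_path.write_text(
        "exit code {}: {}\n{}".format(result.returncode, meaning, result.stdout),
        encoding="utf-8",
    )
    return log_path, result.returncode
```

Scheduling this every hour or so with the Windows task scheduler would also narrow down when the errors first appear after a repair.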

Re: CHKDSK errors on DL380 G7

Stupid question, but where do I find the logs created by chkdsk?

 

As I said, chkdsk moans that it should be run in write mode because it has detected errors, but when it ran in write mode on reboot I did not see any errors displayed on the screen. Maybe it went by too fast and I overlooked it.

 

Thanks for your help.

 

Cheers.

Johan Guldmyr
Honored Contributor

Re: CHKDSK errors on DL380 G7

Sorry for the delay. I don't know if you can find them, but Google says that some logs are in the Event Viewer.

 

But anyway, could it be that at some point something bad was written to the file system and chkdsk cannot handle it?

Re: CHKDSK errors on DL380 G7

Today I was at the customer's site and rechecked.

 

There is no entry in the event log after it ran chkdsk in write mode on reboot.

Also we saw chkdsk moaning again today that it found errors and to run it in write mode.

Running out of ideas.

 

BTW: Got a call from HP support today. They told us that they do not support ADIC tape libraries and as such will close the call?

 

Hmm, it is correct that the server accesses an ADIC library, which is located on the SAN, via Fibre Channel, but what does this have to do with the problem? Next time they will deny support because there are Ricoh printers on the LAN and they don't support Ricoh?

 

If anyone still has a suggestion, I would be really thankful to hear it.

 

Cheers.

Johan Guldmyr
Honored Contributor

Re: CHKDSK errors on DL380 G7

Are both servers accessing the ADIC library?

Maybe you could disconnect / zone out the connection between the server and the library and see if the problem persists?

 

 

Re: CHKDSK errors on DL380 G7

No.

 

The second server has no FC controller at all.

I could pull the FC plug to omit the 2T part of the B2D2T backup strategy, but I doubt it will change anything regarding the CHKDSK errors on the DAS disks.

 

Seems like HP is looking for something to deny support in the first place.

 

Cheers.