StoreEver Tape Storage

LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

 
Curtis Ballard
Honored Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

I'm not aware of any Storport driver dependencies. Later drivers introduce a "feature" that requires a new registry entry to control what happens when a command can't be sent immediately.

See:
http://support.microsoft.com/kb/932755

You should set BusyRetryCount to at least 75; higher wouldn't hurt. If you have that registry entry, there shouldn't be any problem with the most recent Storport drivers.

The failure your system is logging might be consistent with what would happen if that registry entry wasn't set, as the OS could give up and abort the command without it ever being sent out. I don't know exactly what the event log signature is for the failures caused by this registry entry not being set correctly.
CLEB
Valued Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

Great, I'll give this a go.

I've checked and there isn't a key for this at all. Perhaps I should create it?
CLEB
Valued Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

I've installed that hotfix.

When I rebooted the server the event log filled up with these messages:

Event Type: Error
Event Source: Lsi_sas
Event Category: None
Event ID: 11
Date: 29/09/2009
Time: 3:00:00 AM
User: N/A
Computer: CANFP1
Description:
The driver detected a controller error on \Device\RaidPort1.

Data:
0000: 0f 00 18 00 01 00 68 00 ......h.
0008: 00 00 00 00 0b 00 04 c0 .......À
0010: 00 00 13 31 00 00 00 00 ...1....
0018: 00 00 00 00 00 00 00 00 ........
0020: 00 00 00 00 00 00 00 00 ........
0028: 00 00 00 00 00 00 00 00 ........
0030: 00 00 00 00 0b 00 04 c0 .......À
0038: 00 00 00 00 00 00 00 00 ........
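
For anyone trying to read the Data block above, here is a minimal Python sketch that pulls two 32-bit little-endian values out of the dump. It assumes the data follows the usual Windows IO_ERROR_LOG_PACKET layout, with ErrorCode at offset 0x0C and UniqueErrorValue at offset 0x10; that layout is an assumption, not something stated in this thread. The helper names `parse_dump` and `dword_at` are made up for illustration, and what the unique value means is specific to the lsi_sas driver and is not decoded here.

```python
import struct

# First three rows of the Data block from the CANFP1 event (ASCII column omitted).
DUMP = """\
0000: 0f 00 18 00 01 00 68 00
0008: 00 00 00 00 0b 00 04 c0
0010: 00 00 13 31 00 00 00 00
"""

HEX_DIGITS = set("0123456789abcdefABCDEF")


def parse_dump(text):
    """Convert 'offset: xx xx ...' rows into bytes, ignoring any trailing ASCII column."""
    data = bytearray()
    for line in text.splitlines():
        if ":" not in line:
            continue
        _, rest = line.split(":", 1)
        # Each row carries at most 8 byte values; stop at the first non-hex token.
        for token in rest.split()[:8]:
            if len(token) != 2 or not set(token) <= HEX_DIGITS:
                break
            data.append(int(token, 16))
    return bytes(data)


def dword_at(data, offset):
    """Read a 32-bit little-endian unsigned integer at the given byte offset."""
    return struct.unpack_from("<I", data, offset)[0]


packet = parse_dump(DUMP)
error_code = dword_at(packet, 0x0C)    # assumed ErrorCode field
unique_value = dword_at(packet, 0x10)  # assumed UniqueErrorValue field

print(f"ErrorCode        = 0x{error_code:08x}")    # 0xc004000b
print(f"Event ID (low 16) = {error_code & 0xFFFF}")  # 11
print(f"UniqueErrorValue = 0x{unique_value:08x}")  # 0x31130000
```

The low 16 bits of the value at 0x0C come out as 11, which matches the Event ID shown in the log entry.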

Event Type: Error
Event Source: PlugPlayManager
Event Category: None
Event ID: 12
Date: 29/09/2009
Time: 3:00:00 AM
User: N/A
Computer: CANFP1
Description:
The device 'Hewlett Packard LTO Ultrium-4 drive' (SCSI\Sequential&Ven_HP&Prod_Ultrium_4-SCSI&Rev_U52D\5&2f1e44f4&0&000500) disappeared from the system without first being prepared for removal.

Data:
0000: 00 00 00 00 ....

The drive is appearing and disappearing from Device Manager constantly.
Curtis Ballard
Honored Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

If you installed a hotfix from the Microsoft KB article I pointed to, you probably want to remove it. The information about the required new registry entry is in that document, but the hotfix is old and the changes have been rolled into the standard distribution.

The "BusyRetryCount" registry entry has to be created and then set to at least 75.
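
As a rough illustration of what creating that entry could look like, the Python sketch below only generates the text of a .reg file; it does not touch the registry. The key location (a `Device Parameters\Storport` subkey under the device's Enum entry) is an assumption and should be checked against the KB article before anything is imported. The device instance path is the one reported in the PlugPlayManager event earlier in this thread, and the function name `busy_retry_reg` is hypothetical.

```python
# Device instance path as reported in the PlugPlayManager event in this thread.
INSTANCE_PATH = r"SCSI\Sequential&Ven_HP&Prod_Ultrium_4-SCSI&Rev_U52D\5&2f1e44f4&0&000500"


def busy_retry_reg(instance_path, count=75):
    """Build .reg file text that sets BusyRetryCount as a DWORD.

    ASSUMPTION: the value lives under
    HKLM\\SYSTEM\\CurrentControlSet\\Enum\\<instance>\\Device Parameters\\Storport.
    Verify the exact key against KB 932755 before importing the result.
    """
    key = (
        "HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Enum\\"
        + instance_path
        + "\\Device Parameters\\Storport"
    )
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{key}]\n"
        f'"BusyRetryCount"=dword:{count:08x}\n'
    )


print(busy_retry_reg(INSTANCE_PATH))
```

The decimal value 75 is written as `dword:0000004b` because .reg files store DWORD values in hexadecimal. Note that the instance path includes the firmware revision, so it may change after a firmware update.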

It sounds like things are getting worse. You are getting errors on boot up? Any of the known issues really should be infrequent and usually only during activity. If you are having errors during the boot phase that sounds like there must be some faulty hardware. I would be a bit suspicious of the SAS signal conditioner in the 1U enclosure. That is P/N 403721-002. Your CE should be aware of it.

If there is any way you could get a loaner standalone drive, that would be a really interesting test.
CLEB
Valued Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

After a power cycle, the tape drive is now showing a hardware error. LTT reports that the drive should no longer be used. I have an open support case for this, so it's really up to HP how they want to progress now.

I have installed hotfix KB957910, storport.sys version 4485, on the other server that had the same issues. I also downgraded the StorageWorks 1760 firmware to U29D. I've been able to run several successful backups using ARCserve.
CLEB
Valued Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

Yes, it looks like I have a stable backup on my DR server now.

SC44Ge FW 1.23.43.0, Driver 1.28.2.1
Ultrium 1760 FW U29D, Driver 1.0.5.2

Currently getting a throughput of 5600 MB/minute.
CLEB
Valued Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

Forgot to mention: ProLiant Support Pack 8.30 was installed, along with Storport.sys version 5.2.3790.4485. No registry changes were made.
Curtis Ballard
Honored Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

I wouldn't recommend leaving a 1760 at U29D with the SC44Ge controller long term. Eventually, if you load it heavily, there will be an error, and U52D can recover from that error.

I'm glad to hear that there has been some improvement.
CLEB
Valued Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

I downgraded from U52D. With U52D, a backup would fail a minute or two into an NT Backup job, and ARCserve failed almost immediately.
CLEB
Valued Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

I was able to do several backups on my 2nd server. I did a scan and merge of the tape this morning and there is an error in the log. There is also an LSI_SAS error in the system log:

Event Type: Error
Event Source: Lsi_sas
Event Category: None
Event ID: 11
Date: 30/09/2009
Time: 1:53:41 AM
User: N/A
Computer: CANFP2
Description:
The driver detected a controller error on \Device\RaidPort1.

Data:
0000: 0f 00 18 00 01 00 68 00 ......h.
0008: 00 00 00 00 0b 00 04 c0 .......À
0010: 00 00 19 31 00 00 00 00 ...1....
0018: 00 00 00 00 00 00 00 00 ........
0020: 00 00 00 00 00 00 00 00 ........
0028: 00 00 00 00 00 00 00 00 ........
0030: 00 00 00 00 0b 00 04 c0 .......À
0038: 00 00 00 00 00 00 00 00 ........
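
This Data block is nearly identical to the one logged on CANFP1 the day before. A small Python sketch like the following shows exactly where the two differ. Only the first three rows of each dump are included, since the remaining rows are the same in both events; the name `diff_offsets` is made up for illustration.

```python
# First 24 bytes of the Data block from each Lsi_sas event in this thread.
EVENT_CANFP1 = bytes.fromhex(
    "0f 00 18 00 01 00 68 00 "
    "00 00 00 00 0b 00 04 c0 "
    "00 00 13 31 00 00 00 00"
)
EVENT_CANFP2 = bytes.fromhex(
    "0f 00 18 00 01 00 68 00 "
    "00 00 00 00 0b 00 04 c0 "
    "00 00 19 31 00 00 00 00"
)


def diff_offsets(a, b):
    """Return (offset, byte_in_a, byte_in_b) for every position where a and b differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


for offset, x, y in diff_offsets(EVENT_CANFP1, EVENT_CANFP2):
    print(f"offset 0x{offset:04x}: {x:02x} vs {y:02x}")  # offset 0x0012: 13 vs 19
```

The only difference is the byte at offset 0x12 (13 on CANFP1, 19 on CANFP2). What that byte encodes is specific to the driver and is not decoded here.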

ARCserve seems quite happy to carry on, though, and I was able to restore a 100GB file.

We have an option to change the drive to a StorageWorks 1840 SAS. Would this model suffer from the same issue?

My boss is suggesting we should throw money at the problem now as we have not had a decent backup for so long.