
LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

 
SOLVED
CLEB
Valued Contributor

LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

I have two DL380G5 servers with SC44Ge HBA cards connected to StorageWorks 1760 1U rackmount tape drives.

Both servers fail to back up using NT Backup or ARCserve 12.0 SP2.

The event log shows:
Event Type: Error
Event Source: Lsi_sas
Event Category: None
Event ID: 11
Description:
The driver detected a controller error on \Device\RaidPort1.

I have upgraded the HBA driver to v1.28.2.1 (B) and the firmware to 06.18.05.00 (1.23.43.00A)


Tape drive firmware is U52D

I have disabled all insight agent services too.
30 REPLIES
CLEB
Valued Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

Tape drive has been replaced.

Still get lsi_sas errors in event log and LTT read/write test fails.
CLEB
Valued Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

The HBA has been replaced also.
GustavoT
Valued Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

What version of the OS are you running? Check whether you are also getting Event ID 129 along with Event ID 11. Also, if you are running Windows 2003 Server SP2, you may want to check the Storport driver. All SP2 systems are required to run the Storport update KB945119. Upgrading the Storport driver could help you fix this problem.
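One way to check for Event ID 129 alongside Event ID 11 is to export the System log as text and tally the records. The following is a minimal Python sketch, not a tested tool: the function name `tally_events` and the sample text are illustrative, and it assumes each record uses the `Event Source:` / `Event ID:` field layout seen in the records posted in this thread.

```python
import re
from collections import Counter


def tally_events(log_text):
    """Count (source, event_id) pairs in event records pasted as text."""
    counts = Counter()
    source = None
    for line in log_text.splitlines():
        m = re.match(r"\s*Event Source:\s*(\S+)", line)
        if m:
            # Remember the source until the matching Event ID line shows up.
            source = m.group(1).lower()
            continue
        m = re.match(r"\s*Event ID:\s*(\d+)", line)
        if m and source is not None:
            counts[(source, int(m.group(1)))] += 1
            source = None
    return counts


# Illustrative sample in the same layout as the records in this thread.
sample = """\
Event Type: Error
Event Source: Lsi_sas
Event Category: None
Event ID: 11

Event Type: Error
Event Source: PlugPlayManager
Event Category: None
Event ID: 12
"""

counts = tally_events(sample)
print("Lsi_sas ID 11: ", counts[("lsi_sas", 11)])
print("Lsi_sas ID 129:", counts[("lsi_sas", 129)])
```

A count of zero for `("lsi_sas", 129)` would mean that only Event ID 11 is being logged.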
CLEB
Valued Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

OS is Windows 2003 R2 SP2.

I am only getting EventID 11.

I have tried every hotfix for storport.sys bar this one!

I have another server which is running the same hardware but the firmware version and scsi driver have never been updated since install. The storport.sys is SP2 version though.

Unfortunately I cannot downgrade the firmware on the hba as it does not allow that.

I will try this storport.sys, thank you for your help.

Curtis Ballard
Honored Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

Post the full DWord format binary details from the event log.

The 1U rack mount enclosure has an active interface card internal to the box. Has that been replaced or the cables?
CLEB
Valued Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

The cables and the HBA were replaced. I believe just the actual drive was shipped. This server is on the other side of the world from me.

If the internal interface card is faulty then this would make sense as my local technician swapped the complete rackmount unit with another at the DR site. Perhaps there is an issue with this model?

I have since tried all newer firmware and driver revisions.

Event Type: Error
Event Source: Lsi_sas
Event Category: None
Event ID: 11
Date: 25/09/2009
Time: 8:50:55 AM
User: N/A
Computer: CANFP1
Description:
The driver detected a controller error on \Device\RaidPort1.

0000: 0018000f 00680001 00000000 c004000b
0010: 31130000 00000000 00000000 00000000
0020: 00000000 00000000 00000000 00000000
0030: 00000000 c004000b 00000000 00000000

Event Type: Error
Event Source: Lsi_sas
Event Category: None
Event ID: 11
Date: 15/05/2009
Time: 9:04:01 PM
User: N/A
Computer: CANFP1
Description:
The driver detected a controller error on \Device\RaidPort1.

Data:
0000: 0010000f 00680001 00000000 c004000b
0010: ad0f1600 00000000 00000000 00000000
0020: 00000000 00000000 00000000 00000000
0030: 00000000 c004000b

First message was before the tape drive was swapped.

I see now that HP have closed my support case even though I told them I still cannot perform a backup successfully.
Curtis Ballard
Honored Contributor
Solution

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

This error record dump looks familiar. I think you may have posted it on another thread where we were looking at a different issue.

The issues that we have identified and fixed in the various firmware versions were for an abort after a command was sent to the device. In generic terms, a "hang" condition.

This is a different failure and the error data is reporting that the host issued an abort before the HBA had even processed the command to send on to the target. I'm not certain what might cause that.

You mention that the CE replaced the rack enclosure at your DR site. Is that the site that is having problems or was that another site that previously had problems but is now working?

You may have mentioned it before but I can't see most of your comments during a reply. What length cables are you using? I assume you are using the standard skinny "tape" cable. If you have a normal full width SAS cable it might be worth trying that just to eliminate the possibility that somehow you got multiple bad cables. There is nothing unique about the special tape cable except that it is a little cheaper than the standard cables because it doesn't need all the extra SAS pairs.
CLEB
Valued Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

The tape drive and HBA have definitely been replaced. I'm told it was the actual drive that was swapped. The 1U enclosure is still the same.

I have exactly the same setup at the DR site and the SAS cable was taken from there to try. The cable being used is the one that came with the tape unit.

Both sites still have issues, with my primary site getting progressively worse. This morning there are lots of these events:

Event Type: Error
Event Source: PlugPlayManager
Event Category: None
Event ID: 12
Date: 28/09/2009
Time: 7:21:02 AM
User: N/A
Computer: CANFP1
Description:
The device 'Hewlett Packard LTO Ultrium-4 drive' (SCSI\Sequential&Ven_HP&Prod_Ultrium_4-SCSI&Rev_U52D\5&2f1e44f4&0&000500) disappeared from the system without first being prepared for removal.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 00 00 00 00 ....

I have just checked and the tape drive is showing in device manager again.

I will run a test and post the results.

Unfortunately this site is in Canada, I am in the UK and my local support person is away for the next few weeks. I am trying to get hold of an alternative technician.
CLEB
Valued Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

What storport version is required? Someone suggested 5.2.3790.4189. I have tried this but have since reverted to 5.2.3790.3959.
Curtis Ballard
Honored Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

I'm not aware of any Storport driver dependencies. Later drivers introduce a "feature" that requires a new registry entry for the behavior when a command can't be sent immediately.

See:
http://support.microsoft.com/kb/932755

You should set the BusyRetryCount to at least 75 and higher wouldn't hurt. If you have that registry entry then there shouldn't be any problem with the most recent Storport drivers.

The failure your system is logging might be consistent with what would happen if that registry entry wasn't set as the OS could give up and abort the command without it ever getting sent out. I don't know exactly what the event log signature is for the failures caused by this registry entry not being set correctly.
CLEB
Valued Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

Great, I'll give this a go.

I've checked and there isn't a key at all for this. Perhaps I should create it?
CLEB
Valued Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

I've installed that hotfix.

When I rebooted the server the event log filled up with these messages:

Event Type: Error
Event Source: Lsi_sas
Event Category: None
Event ID: 11
Date: 29/09/2009
Time: 3:00:00 AM
User: N/A
Computer: CANFP1
Description:
The driver detected a controller error on \Device\RaidPort1.

Data:
0000: 0f 00 18 00 01 00 68 00 ......h.
0008: 00 00 00 00 0b 00 04 c0 .......À
0010: 00 00 13 31 00 00 00 00 ...1....
0018: 00 00 00 00 00 00 00 00 ........
0020: 00 00 00 00 00 00 00 00 ........
0028: 00 00 00 00 00 00 00 00 ........
0030: 00 00 00 00 0b 00 04 c0 .......À
0038: 00 00 00 00 00 00 00 00 ........

Event Type: Error
Event Source: PlugPlayManager
Event Category: None
Event ID: 12
Date: 29/09/2009
Time: 3:00:00 AM
User: N/A
Computer: CANFP1
Description:
The device 'Hewlett Packard LTO Ultrium-4 drive' (SCSI\Sequential&Ven_HP&Prod_Ultrium_4-SCSI&Rev_U52D\5&2f1e44f4&0&000500) disappeared from the system without first being prepared for removal.

Data:
0000: 00 00 00 00 ....

The drive is appearing and disappearing from Device Manager constantly.
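The byte-format dump in this post holds the same data as the DWord-format dump posted earlier in the thread; the DWord view simply reads each group of four bytes as a little-endian 32-bit value. The sketch below shows the conversion in Python. The function name `bytes_to_dwords` is illustrative, and the field split at the end assumes the standard NTSTATUS bit layout (severity in the top two bits, facility in bits 16 to 27, code in the low 16 bits).

```python
import struct


def bytes_to_dwords(hex_bytes):
    """Convert a space-separated byte dump into little-endian DWord strings."""
    raw = bytes.fromhex(hex_bytes)
    count = len(raw) // 4
    values = struct.unpack("<%dI" % count, raw[: count * 4])
    return ["%08x" % v for v in values]


# Offsets 0000 to 000f and 0010 to 0017 of the byte-format record above.
row0 = bytes_to_dwords("0f 00 18 00 01 00 68 00 00 00 00 00 0b 00 04 c0")
row1 = bytes_to_dwords("00 00 13 31 00 00 00 00")
print(" ".join(row0))
print(" ".join(row1))

# Split the fourth DWord (c004000b) into fields, assuming NTSTATUS layout.
status = int(row0[3], 16)
severity = status >> 30
facility = (status >> 16) & 0xFFF
code = status & 0xFFFF
print(severity, facility, code)
```

Under that assumption, the low 16 bits of `c004000b` come out as 11, which matches the Event ID shown in the record.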
Curtis Ballard
Honored Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

If you installed a hotfix from the Microsoft KB article I pointed to you probably want to remove that. The information about the new registry entry that is required is in the document but the hotfix is old and the changes have been rolled into the standard distribution.

The "BusyRetryCount" registry entry has to be created and then set to at least 75.

It sounds like things are getting worse. You are getting errors on boot up? Any of the known issues really should be infrequent and usually only during activity. If you are having errors during the boot phase that sounds like there must be some faulty hardware. I would be a bit suspicious of the SAS signal conditioner in the 1U enclosure. That is P/N 403721-002. Your CE should be aware of it.

If there is any way you could get a loaner standalone drive that would be a really interesting test.
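As a rough illustration of creating the "BusyRetryCount" entry, here is a Python sketch using the standard `winreg` module. The registry location is an assumption on my part (a `Device Parameters\Storport` subkey under the device's entry in `SYSTEM\CurrentControlSet\Enum`); confirm the exact path against the Microsoft KB article linked earlier before changing anything. The helper names are hypothetical, and the device instance ID is the one from the PlugPlayManager event posted earlier in the thread.

```python
# ASSUMPTION: per-device Storport settings live under this Enum subtree.
ENUM_ROOT = r"SYSTEM\CurrentControlSet\Enum"

# Device instance ID copied from the PlugPlayManager event in this thread.
DRIVE_ID = r"SCSI\Sequential&Ven_HP&Prod_Ultrium_4-SCSI&Rev_U52D\5&2f1e44f4&0&000500"


def storport_key_path(device_instance_id):
    """Build the assumed Storport parameters key path for one device."""
    return "\\".join(
        [ENUM_ROOT, device_instance_id, "Device Parameters", "Storport"]
    )


def set_busy_retry_count(device_instance_id, count=75):
    """Create the key if needed and write BusyRetryCount as a DWORD."""
    if count < 75:
        raise ValueError("the advice in this thread is at least 75")
    import winreg  # Windows only; imported late so the path helper runs anywhere

    with winreg.CreateKeyEx(
        winreg.HKEY_LOCAL_MACHINE,
        storport_key_path(device_instance_id),
        0,
        winreg.KEY_SET_VALUE,
    ) as key:
        winreg.SetValueEx(key, "BusyRetryCount", 0, winreg.REG_DWORD, count)


path = storport_key_path(DRIVE_ID)
print(path)
```

Calling `set_busy_retry_count(DRIVE_ID)` would need administrative rights on the server. Note that the instance ID contains the firmware revision (`Rev_U52D`), so the key path may differ after a firmware change.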
CLEB
Valued Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

After a power cycle the tape drive is showing a hardware error now. LTT reports the drive should no longer be used. I have an open support case for this so it's really up to HP how they want to progress now.

I have installed hotfix KB957910, storport.sys version 4485, onto the other server I had the same issues with. I downgraded the StorageWorks 1760 firmware to U29D also. I've been able to run several successful backups using ARCserve.
CLEB
Valued Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

Yes it looks like I have a stable backup on my DR server now.

SC44Ge FW 1.23.43.0, Driver 1.28.2.1
Ultrium 1760 FW U29D, Driver 1.0.5.2

Currently getting a throughput of 5600MB/minute.
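For comparison with per-second figures, the throughput above can be converted with simple arithmetic. This is a small Python sketch; the 200 GB job size is only an example, and it assumes 1 GB = 1024 MB.

```python
MB_PER_MINUTE = 5600  # throughput reported above

mb_per_second = MB_PER_MINUTE / 60


def minutes_for(gigabytes, mb_per_minute=MB_PER_MINUTE):
    """Estimate backup time in minutes, assuming 1 GB = 1024 MB."""
    return gigabytes * 1024 / mb_per_minute


minutes_200gb = minutes_for(200)
print(round(mb_per_second, 1), "MB/s")
print(round(minutes_200gb, 1), "minutes for 200 GB")
```

At this rate the drive is moving a little over 93 MB/s, and a 200 GB job would take roughly 37 minutes if the rate held steady.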
CLEB
Valued Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

Forgot to mention: ProLiant Support Pack 8.30 was installed, along with Storport.sys version 5.2.3790.4485. No registry changes were made.
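Three storport.sys builds have come up in this thread (5.2.3790.3959, 5.2.3790.4189 and 5.2.3790.4485). When checking which one a server has against another, it is safer to compare the dotted fields as numbers, since plain string comparison can misorder fields of different lengths. A minimal Python sketch, with an illustrative helper name:

```python
def parse_version(text):
    """Turn a dotted version string into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split("."))


# Storport.sys versions mentioned in this thread.
versions = ["5.2.3790.4189", "5.2.3790.3959", "5.2.3790.4485"]

ordered = sorted(versions, key=parse_version)
newest = ordered[-1]
print(ordered)
print("newest:", newest)
```

This only orders the builds; it says nothing about which build is required for a given HBA or drive firmware.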
Curtis Ballard
Honored Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

I wouldn't recommend leaving a 1760 at U29D with the SC44Ge controller long term. Eventually if you load it heavily there will be an error but U52D can recover from that error.

I'm glad to hear that there has been some improvement.
CLEB
Valued Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

I downgraded from U52D. The backup would fail a minute or two into an NT Backup job. ARCserve failed almost immediately.
CLEB
Valued Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

I was able to do several backups on my 2nd server. I have done a scan and merge of the tape this morning and there is an error in the log. There is an LSI_SAS error in the system log also:

Event Type: Error
Event Source: Lsi_sas
Event Category: None
Event ID: 11
Date: 30/09/2009
Time: 1:53:41 AM
User: N/A
Computer: CANFP2
Description:
The driver detected a controller error on \Device\RaidPort1.

Data:
0000: 0f 00 18 00 01 00 68 00 ......h.
0008: 00 00 00 00 0b 00 04 c0 .......À
0010: 00 00 19 31 00 00 00 00 ...1....
0018: 00 00 00 00 00 00 00 00 ........
0020: 00 00 00 00 00 00 00 00 ........
0028: 00 00 00 00 00 00 00 00 ........
0030: 00 00 00 00 0b 00 04 c0 .......À
0038: 00 00 00 00 00 00 00 00 ........

ARCserve seems to be quite happy to carry on though and I was able to restore a 100GB file.

We have an option to change the drive to a StorageWorks 1840 SAS. Would this model suffer from the same issue?

My boss is suggesting we should throw money at the problem now as we have not had a decent backup for so long.
Curtis Ballard
Honored Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

It is hard to say if the 1840 would behave similarly to the 1760. The SAS chipset in the two drives is completely different and there are quite a few differences in the firmware.

From the description of your history I don't think the problem is so much the drive hardware as it is a configuration issue.

Occasional LSI_SAS errors in the log usually come from at least one app (often Microsoft Plug and Play) that is constantly checking "are you there". Occasionally the drive won't be able to respond in time, so an error will occur on that "ping", but it doesn't cause any problems with backup/restore. The error you posted was not an ABORT of any kind and didn't even contain a bus error code, so it may not have been a bus error.
CLEB
Valued Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

That's interesting info.

Thanks very much for your help on this.

HP US support have admitted that there is a known issue with the signal controller card that you mentioned.

We're waiting now to see how this is going to be progressed.

CLEB
Valued Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

We have had the signal card replaced inside the 1U rackmount unit. We were able to perform a normal backup afterwards. About 90GB.

I wanted to back up some big archive files, four files that are 80-85GB in size.

This got to about 200GB and then failed with a Windows NT SCSI port error. Same error message in the event log as previously.

Event Type: Error
Event Source: Lsi_sas
Event Category: None
Event ID: 11
Date: 17/10/2009
Time: 6:50:45 AM
User: N/A
Computer: SERVER1
Description:
The driver detected a controller error on \Device\RaidPort1.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 0f 00 18 00 01 00 68 00 ......h.
0008: 00 00 00 00 0b 00 04 c0 .......À
0010: 00 00 19 31 00 00 00 00 ...1....
0018: 00 00 00 00 00 00 00 00 ........
0020: 00 00 00 00 00 00 00 00 ........
0028: 00 00 00 00 00 00 00 00 ........
0030: 00 00 00 00 0b 00 04 c0 .......À
0038: 00 00 00 00 00 00 00 00 ........
CLEB
Valued Contributor

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

Forgot to mention the drive is running U52D firmware.
Henrich Kovacik
New Member

Re: LSI_SAS event ID 11 with SC44Ge and StorageWorks 1760 1U Rackmount

I have the same problem with the same hardware and software, particularly when I try to back up bigger files (more than 20GB).
If you find any solution, please post it to this thread.