

 
Oliver Charni
Trusted Contributor

SCSI Data Overrun on Storagetek L700 Library Controller

I've got a problem with a Storagetek L700 library connected directly via SCSI to an HP N-Class. For the past 3-4 weeks there have been recurring spontaneous "resets" of the library. I don't think it actually resets, because it stays pingable the whole time. A SCSI reset (it shows up in syslog) is sent to the controller device, and that forces the controller to reboot/reset.
It always happens after a Data Overrun on the controller:
May 13 08:39:43 uxbupn05 vmunix: SCSI: Data Overrun -- lbolt: 5784081, dev: cb065000

Does anyone know how I can trace the data overrun back, or have a clue what could produce this error?

regards
oliver
if it smell's funny on the outside, it's worse on the inside
Thayanidhi
Honored Contributor

Re: SCSI Data Overrun on Storagetek L700 Library Controller

Hi,

There could be a SCSI cable/terminator problem; even poor grounding can cause lbolt errors. I am not very sure about "data overrun"; it may have something to do with a buffer or the SCSI cumulative patch.

Analyse the lbolt errors as below.
=================================

The lbolt value is itself a time stamp.

For example, suppose you receive:
vmunix: SCSI: Request Timeout; Abort -- lbolt: 120842710, dev: cd061000, io_id: 6684ee3

The SCSI error occurred 120842710 clock ticks after the system booted (lbolt counts ticks since boot, normally 10 ms each on HP-UX). You can find the system uptime with #uptime and calculate when the error occurred.
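
The calculation can be sketched in a few lines of Python. This is only an illustration: the function name is made up for the example, and it assumes lbolt counts clock ticks at 100 per second (10 ms per tick); pass a different rate if your system differs.

```python
from datetime import timedelta

# Assumption: lbolt counts clock ticks since boot at 100 ticks per second
# (10 ms per tick). Pass a different rate if your system differs.
ASSUMED_TICKS_PER_SECOND = 100


def lbolt_to_uptime(lbolt, ticks_per_second=ASSUMED_TICKS_PER_SECOND):
    """Return how long the system had been up when the message was logged."""
    seconds, leftover_ticks = divmod(lbolt, ticks_per_second)
    milliseconds = leftover_ticks * 1000 // ticks_per_second
    return timedelta(seconds=seconds, milliseconds=milliseconds)


# lbolt value taken from the example message above
elapsed = lbolt_to_uptime(120842710)
print(elapsed)  # 13 days, 23:40:27.100000
```

Subtracting this value from the syslog time stamp of the message gives the boot time, which you can cross-check against #uptime.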

The second useful field is "dev". In the example dev: cd061000, "cd" is the major number of the device in hex.

The decimal value of "cd" is 205.

#ll /dev/(r)dsk/*
or
#ll /dev/rmt/*

Check for major number 205.

06 is the bus number. Use ioscan to find the bus:

#ioscan -kfnC ext_bus

Check for the ext_bus entry with instance number 6.

The next two digits, 10, give SCSI ID 1 and LUN 0.
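
The decoding above can be written as a small Python sketch. The function names are hypothetical, and the field layout (two hex digits for the major number, two for the bus, one each for SCSI ID and LUN) is simply the one described in this post; treat it as an assumption for other device types.

```python
def decode_dev(dev_hex):
    """Split a dev value from a SCSI syslog message into its fields.

    Assumed layout, as described in the post: MM BB T L xx
    (major number, bus, SCSI ID, LUN, remaining bits ignored here).
    """
    value = int(dev_hex, 16)
    return {
        "major": (value >> 24) & 0xFF,
        "bus": (value >> 16) & 0xFF,
        "target": (value >> 12) & 0xF,
        "lun": (value >> 8) & 0xF,
    }


def device_file_suffix(fields):
    """Build the cXtYdZ part of a legacy device file name."""
    return "c{bus}t{target}d{lun}".format(**fields)


example = decode_dev("cd061000")
print(example)                      # {'major': 205, 'bus': 6, 'target': 1, 'lun': 0}
print(device_file_suffix(example))  # c6t1d0
```

For the value from the original question, decode_dev("cb065000") gives major 203, bus 6, SCSI ID 5, LUN 0, i.e. a device file ending in c6t5d0.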

Once you have identified the bus/device, make sure all the cables are secured properly.

Replace the cables/terminators if they are suspect.

Make sure the latest SCSI/IO cumulative patches are installed.

If it is a disk device, change the timeout value with the pvchange -t option.

Also see the driver associated with the device may need t
Attitude (not aptitude) determines altitude.
Oliver Charni
Trusted Contributor

Re: SCSI Data Overrun on Storagetek L700 Library Controller

I already found the appropriate device file.
dev: cb065000 is the library controller (/dev/rac/pickers and /dev/rac/c6t5d0).
The terminator as well as the cable have already been replaced, and we also tried a different driver. Changing the driver only made things worse, because since then we also encounter hangs on the SCSI bus, which force us to reboot the library to get it working normally again.

if it smell's funny on the outside, it's worse on the inside
Alzhy
Honored Contributor

Re: SCSI Data Overrun on Storagetek L700 Library Controller

Hmm..

1. How long has this environment been up and running?
2. Is this a new configuration/setup?
3. How many drives from the L700 do you have?
4. What kind of SCSI (Differential) cards do you use and how many?

I suspect you've daisy chained the robot along with one or 2 drives. I always hook up a Tape library's robotics to its own SCSI port if it is directly attached to a server.

You may also look at updating your SCSI patches.
Hakuna Matata.
Thayanidhi
Honored Contributor

Re: SCSI Data Overrun on Storagetek L700 Library Controller

I think you have already done enough to probe the cause!

Along with the robotics, are there many devices on the same SCSI bus? Are you up to date with HWE and QPK patches? (Consider the minimum SCSI patches.)

How often does this error occur? At those times, was this SCSI bus busy with some other device?

See also

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=721761

Attitude (not aptitude) determines altitude.
Oliver Charni
Trusted Contributor

Re: SCSI Data Overrun on Storagetek L700 Library Controller

The environment has been running stably for about 1 1/2 years now, without any significant changes in the last 3 months. The last thing we did was the upgrade to DP5.5 in March. The machine itself is patched with the latest patch bundle supplied by HP. We can't patch on our own due to support contracts.
The error occurs on an irregular basis: one week nothing, then twice in 4 days. The controller was daisy-chained with a drive, but we changed that about 2 weeks ago, with no improvement.
There's a total of 16 drives: 6 through SAN on NT hosts, 1 on an NS700 (NDMP), and 9 connected to the UNIX host (directly through SCSI).
if it smell's funny on the outside, it's worse on the inside
Michael Lampi
Trusted Contributor

Re: SCSI Data Overrun on Storagetek L700 Library Controller

Any time you get a Data Overrun error on a SCSI bus, the integrity of that bus is in severe doubt. The error means that more bytes were received than were expected. This is a hardware error.

Since you have already replaced the cables and terminator, that leaves the library and the HP SCSI controller. I'd suggest replacing the HP SCSI controller. If that doesn't fix the problem, then replace the L700 controller.

Regards,

Michael Lampi
A journey of 1000 steps ends in a mile.