SCSI: Abort abandoned -- lbolt:

Duffs · ‎03-06-2006

Hi,

An N-class server appeared to crash at the weekend and I need some help working out why. I received an alert that the ssh service had stopped on the box, so I tried to telnet to it remotely, without success. I hooked a console up to the server and the screen was blank. The disk LEDs showed a flashing green light on the 2nd disk but not on the primary disk. As I could not get access to the GDP, I rebooted the box (power off/on). The system did not come back up. I could now access the GDP, but was unable to boot the box from the pri or alt disk. Nor was I able to boot into single user mode.

I was able to boot into maintenance mode from the ISL (hpux -lm) and from here managed to manually bring the system up through the run levels to multi-user mode.

The only error I could find to explain what happened was in OLDsyslog.log, which had the following as its last entry:

vmunix: SCSI: Abort abandoned -- lbolt: 1410939057, dev: 1f026000, io_id: 20c3f76, status: 200

The two system disks appear to be fine: I tested them using 'dd' and no I/O errors were detected. Has anyone seen a similar issue before?

Rgds,
Duffs.

Peter Godron · ‎03-06-2006

Duffs,
lbolts are normally down to SCSI bus problems, caused by failed disks, power interrupts, cable faults or termination issues.

1f026000 I think translates:
1f = Decimal 31, which is sdisk device major number
and 026000 identifies minor number.
So, faulty device should be c2t6d0.
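
The decoding above can be sketched in a few lines of Python. This is only an illustration, not an HP tool: the function name decode_dev is made up here, and the minor number layout (8-bit controller instance, 4-bit target, 4-bit LUN, 8 option bits) is an assumption based on the usual legacy sdisk naming, not something stated in this thread.

```python
def decode_dev(dev_hex):
    """Split an HP-UX dev number (hex string) into major, minor and a cXtYdZ name.

    Assumes the legacy sdisk minor layout 0xIITLoo:
    II = controller instance, T = target, L = LUN, oo = option bits.
    """
    dev = int(dev_hex, 16)
    major = dev >> 24                 # top 8 bits of the 32-bit dev number
    minor = dev & 0xFFFFFF            # remaining 24 bits
    instance = (minor >> 16) & 0xFF
    target = (minor >> 12) & 0xF
    lun = (minor >> 8) & 0xF
    name = "c%dt%dd%d" % (instance, target, lun)
    return major, minor, name


major, minor, name = decode_dev("1f026000")
print(major, "0x%06x" % minor, name)
```

For the dev value in the syslog entry this gives major 31, minor 0x026000 and the name c2t6d0, which matches the reading above.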

Can you do a:
diskinfo -v /dev/rdsk/c2t6d0

and check the output?

Duffs · ‎03-06-2006

Hi Peter,

The output of the following command:

# diskinfo -v /dev/rdsk/c2t6d0

SCSI describe of /dev/rdsk/c2t6d0:
vendor: SEAGATE
product id: ST318203LC
type: direct access
size: 17783240 Kbytes
bytes per sector: 512
rev level: HP01
blocks per disk: 35566480
ISO version: 0
ECMA version: 0
ANSI version: 2
removable media: no
response format: 2
(Additional inquiry bytes: (32)52 (33)46 (34)33 (35)33 (36)36 (37)35 (38)33 (39)0 (40)0 (41)0 (42)0 (43)0 (44)0 (45)0 (46)0 (47)0 (48)0 (49)0 (50)0 (51)0 (52)0 (53)0 (54)0 (55)0 (56)0 (57)0 (58)0 (59)0 (60)0 (61)0 (62)0 (63)0 (64)0 (65)0 (66)0 (67)0 (68)0 (69)0 (70)0 (71)0 (72)0 (73)0 (74)0 (75)0 (76)0 (77)0 (78)0 (79)0 (80)0 (81)0 (82)0 (83)0 (84)0 (85)0 (86)0 (87)0 (88)0 (89)0 (90)0 (91)0 (92)43 (93)6f (94)70 (95)79 (96)72 (97)69 (98)67 (99)68 (100)74 (101)20 (102)28 (103)63 (104)29 (105)20 (106)31 (107)39 (108)39 (109)39 (110)20 (111)53 (112)65 (113)61 (114)67 (115)61 (116)74 (117)65 (118)20 (119)41 (120)6c (121)6c (122)20 (123)2 (124)1e (125)b3 (126)90 (127)0 (128)0 (129)2 (130)0 (131)0 (132)0 (133)0 (134)0 (135)0 (136)0 (137)0 (138)0 )
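
The additional inquiry bytes are easier to judge once the hex values are turned into characters. A minimal Python sketch, assuming each entry has the form (offset)hexvalue as in the output above; the helper name inquiry_ascii is hypothetical, and the sample is bytes 92 to 117 copied from that output.

```python
import re


def inquiry_ascii(text):
    """Render '(offset)hex' pairs from diskinfo -v as printable ASCII text."""
    pairs = re.findall(r"\((\d+)\)([0-9a-fA-F]{1,2})", text)
    data = {int(off): int(val, 16) for off, val in pairs}
    chars = []
    for off in sorted(data):
        b = data[off]
        # Non-printable bytes are shown as '.' so offsets stay aligned.
        chars.append(chr(b) if 32 <= b < 127 else ".")
    return "".join(chars)


sample = ("(92)43 (93)6f (94)70 (95)79 (96)72 (97)69 (98)67 (99)68 (100)74 "
          "(101)20 (102)28 (103)63 (104)29 (105)20 (106)31 (107)39 (108)39 "
          "(109)39 (110)20 (111)53 (112)65 (113)61 (114)67 (115)61 (116)74 "
          "(117)65")
print(inquiry_ascii(sample))
```

For this sample the bytes spell out "Copyright (c) 1999 Seagate", so that part of the data is just a vendor string rather than an error indication.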

To me this looks normal?

Rgds,
Duffs

Steven E. Protter · ‎03-06-2006

shalom Duffs,

With the exception of when I swap out a hot swap drive, every time I get an lbolt it eventually results in drive replacement.

I'd get good backups made and prepare for that eventuality.

SEP

Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com

Matti_Kurkela · ‎03-06-2006

By GDP I assume you mean GSP, right?

If you use a console terminal, the screen will of course be blank if you've just connected it to the server. The server does not generally keep track of what should be on the console's screen: the console does it by itself. After connecting the console, you generally press some keys to see if the server reacts.

Pressing Enter once or twice should normally bring up a login prompt. If that does not work, press Ctrl-B to access the GSP. If that does not work either, the GSP is probably hung. The server may be fine; the data just isn't going through the GSP to the console.

See the image on page 97 of this file:
http://docs.hp.com/en/3687/rp7400_customer_hardwaremanual.pdf

Item 20 is the GSP reset button. If you cannot access the GSP, try pushing that first. It resets only the GSP without bothering the server proper.

Resetting the server from the power switch will not actually reset the GSP. Only pulling all the power cords physically off will remove power from the GSP.

Check the GSP firmware version (from the console, press Ctrl-B, then enter command HE and see the top line of the help information). If it's very old, it might be useful to update it. Newer GSP firmware versions are generally more stable than the old ones.

Did you check the GSP error log? (Ctrl-B and command SL, then E for error)

You can decode the dev: -number from the lbolt message. The first two digits are 1f hexadecimal, which is 31 in decimal. That tells us it's a device in /dev/dsk. The device nodes in /dev/dsk all have major number 31, and the rest of the dev:-number is the minor number.

Do a "ll /dev/dsk" and see which of the disk devices has the numbers "31 0x026000" in its listing.
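
If the listing is long, the matching can be done mechanically. A rough Python sketch, assuming each line shows the major number followed by a 0x-prefixed minor number and ends with the device name; the function name find_device and the two listing lines are invented for illustration and are not output from the server in this thread.

```python
def find_device(listing, major, minor_hex):
    """Return device names whose major/minor columns match, from ll-style text."""
    matches = []
    for line in listing.splitlines():
        fields = line.split()
        # Look for the major number directly followed by the minor number.
        for i in range(len(fields) - 1):
            if fields[i] == str(major) and fields[i + 1].lower() == minor_hex.lower():
                matches.append(fields[-1])   # device name is the last column
                break
    return matches


# Hypothetical listing, made up for illustration only.
listing = """\
brw-r-----   1 bin        sys         31 0x020000 Jan  1  2006 c2t0d0
brw-r-----   1 bin        sys         31 0x026000 Jan  1  2006 c2t6d0
"""
print(find_device(listing, 31, "0x026000"))
```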

Check the firmware version of the disks using the "diskinfo -v" command. There was a disk firmware problem that caused disks to fail in use and then come back after the power was cycled. It was mainly with the L-class servers, but I guess an N-class server might be just the right age to have the same problem.

MK

Andrew Rutter · ‎03-07-2006

hi,

lbolt errors are usually down to termination and timeouts with larger systems. If it's just a small system with not many disks, then this could be the start of a disk failure.

It is, however, a bit strange that this error was in OLDsyslog.log. What was the date stamp on the log?

Are the disks mirrored? The box shouldn't have hung if one disk failed, and it should have let you boot from the alt disk.

From the diskinfo output you posted, you do have an older version of firmware on the disk, which should be upgraded to stop the disks from going offline. Excerpt from the patch details:
PF_DSEACH3HP04:
A problem has been identified with certain Cheetah III
disk drives that use a Cypress SRAM (9, 18 and 36 GB).
The most common symptom seen is that the drive goes
offline and is inaccessible to the system. In some
instances, the drive has been reported to have a solid
LED or a flash code. The problem may be temporarily
corrected by a bus reset or by unplugging and
reconnecting the drive. Updated firmware that addresses
the problem is labeled HP04.

Downloadable from here:

http://www4.itrc.hp.com/service/patch/patchDetail.do?BC=patch.breadcrumb.main|patch.breadcrumb.search|&patchid=PF_DSEACH3HP04&context=firmware:disk
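
Checking several disks against the fixed revision can be scripted. A minimal Python sketch, assuming the "rev level:" line format shown in the diskinfo output earlier in the thread, and assuming revisions are "HP" plus a number with higher numbers being newer; the helper names rev_level and needs_update are hypothetical, and whether a given drive model is covered by the patch still has to be checked against the patch text.

```python
import re


def rev_level(diskinfo_output):
    """Extract the 'rev level' field from diskinfo -v output, or None if absent."""
    m = re.search(r"^\s*rev level:\s*(\S+)", diskinfo_output, re.MULTILINE)
    return m.group(1) if m else None


def needs_update(rev, fixed="HP04"):
    # Assumption: revisions look like 'HP' + digits and compare numerically.
    m_rev = re.fullmatch(r"HP(\d+)", rev)
    m_fix = re.fullmatch(r"HP(\d+)", fixed)
    if not (m_rev and m_fix):
        raise ValueError("unexpected revision format")
    return int(m_rev.group(1)) < int(m_fix.group(1))


# Two lines copied from the diskinfo output posted above.
output = """product id: ST318203LC
rev level: HP01
"""
rev = rev_level(output)
print(rev, needs_update(rev))
```

With the posted output this reports HP01 as older than HP04.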

Probably unlikely for both disks to encounter the same state at the same time, though?

As you were in fact able to go through the run levels to get the box up, I would also check the LIF area on the disks, and check the mirroring.

I would also check the logs in the GSP and check for any HPMCs with STM or pdcinfo.

Andy

Duffs · ‎03-07-2006

Matti,

Yes, I mean GSP. Maybe I should have been clearer: yes, I hit a few keys after hooking up the console, and Ctrl-B didn't prompt for a GSP login either.

I checked the GSP error logs but they didn't tell me much. I will check the GSP firmware version, thanks for your help.

Andrew,

The OLDsyslog.log entry was timestamped around the time when the server alerts began, which I suspect is when the server hung. The LIF area looks fine on both disks, as does the mirroring. I will look into possible firmware upgrades and hope that eliminates the chances of this occurring again. Thanks for your help!

Rgds,
Duffs
