Operating System - HP-UX

seanh
Occasional Advisor

Resetting SCSI -- lbolt: errors.......in /var/adm/syslog/syslog.log

Hi there,

I noticed the following in /var/adm/syslog/syslog.log :

May 25 07:44:33 sul7it11 vmunix: SCSI: Resetting SCSI -- lbolt: 325099977, bus: 0
May 25 07:44:33 sul7it11 vmunix: SCSI: Reset detected -- lbolt: 325099977, bus: 0

May 25 07:44:39 sul7it11 EMS [1922]: ------ EMS Event Notification ------ Value: "MAJORWARNING (3)" for Resource: "/storage/events/disks/default/0_0_1_0.10.0" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 125960200 -r /storage/events/disks/default/0_0_1_0.10.0 -n 125960193 -a
May 25 07:44:40 sul7it11 EMS [1922]: ------ EMS Event Notification ------ Value: "MAJORWARNING (3)" for Resource: "/storage/events/disks/default/0_0_1_0.9.0" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 125960197 -r /storage/events/disks/default/0_0_1_0.9.0 -n 125960194 -a
May 25 07:44:40 sul7it11 EMS [1922]: ------ EMS Event Notification ------ Value: "MAJORWARNING (3)" for Resource: "/storage/events/disks/default/0_0_1_0.11.0" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 125960203 -r /storage/events/disks/default/0_0_1_0.11.0 -n 125960195 -a
May 25 07:44:41 sul7it11 EMS [1922]: ------ EMS Event Notification ------ Value: "MAJORWARNING (3)" for Resource: "/storage/events/disks/default/0_0_1_0.8.0" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 125960194 -r /storage/events/disks/default/0_0_1_0.8.0 -n 125960196 -a

....and then....................

May 25 11:05:17 sul7it11 vmunix: SCSI: Resetting SCSI -- lbolt: 326304419, bus: 0
May 25 11:05:17 sul7it11 vmunix: SCSI: Reset detected -- lbolt: 326304419, bus: 0

....later this morning.


The system appears to be fine but these errors are all for the disks on one side of the controller.

The EMS commands it advised me to run all point to the following document:

http://docs.hp.com/hpux/content/hardware/ems/scsi.htm#100091

IOSCAN results...........

# ioscan -fnC disk
Class  I   H/W Path      Driver  S/W State  H/W Type  Description
========================================================================
disk   0   0/0/1/0.8.0   sdisk   CLAIMED    DEVICE    SEAGATE ST318404LC
           /dev/dsk/c0t8d0   /dev/rdsk/c0t8d0
disk   1   0/0/1/0.9.0   sdisk   CLAIMED    DEVICE    SEAGATE ST318404LC
           /dev/dsk/c0t9d0   /dev/rdsk/c0t9d0
disk   2   0/0/1/0.10.0  sdisk   CLAIMED    DEVICE    SEAGATE ST318404LC
           /dev/dsk/c0t10d0  /dev/rdsk/c0t10d0
disk   3   0/0/1/0.11.0  sdisk   CLAIMED    DEVICE    SEAGATE ST318404LC
           /dev/dsk/c0t11d0  /dev/rdsk/c0t11d0
disk   4   0/0/1/1.2.0   sdisk   CLAIMED    DEVICE    SEAGATE ST318404LC
           /dev/dsk/c1t2d0   /dev/rdsk/c1t2d0
disk   5   0/0/2/1.2.0   sdisk   CLAIMED    DEVICE    HP DVD-ROM 304
           /dev/dsk/c3t2d0   /dev/rdsk/c3t2d0
disk   6   0/4/0/0.8.0   sdisk   CLAIMED    DEVICE    SEAGATE ST318203LC
           /dev/dsk/c4t8d0   /dev/rdsk/c4t8d0
disk   7   0/4/0/0.9.0   sdisk   CLAIMED    DEVICE    SEAGATE ST118202LC
           /dev/dsk/c4t9d0   /dev/rdsk/c4t9d0
disk   8   0/4/0/0.10.0  sdisk   CLAIMED    DEVICE    SEAGATE ST318404LC
           /dev/dsk/c4t10d0  /dev/rdsk/c4t10d0
disk   9   0/4/0/0.11.0  sdisk   CLAIMED    DEVICE    SEAGATE ST318404LC
           /dev/dsk/c4t11d0  /dev/rdsk/c4t11d0
disk   10  0/4/0/0.12.0  sdisk   CLAIMED    DEVICE    SEAGATE ST318404LC
           /dev/dsk/c4t12d0  /dev/rdsk/c4t12d0
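
To tie the EMS resource names in the syslog to the ioscan listing, here is a minimal Python sketch. It assumes the underscores in a resource name such as 0_0_1_0.8.0 simply stand in for the slashes of the hardware path, which agrees with the hardware paths in the ioscan listing. The function names and the shortened IOSCAN sample are only illustrative.

```python
# Two disks copied from the ioscan listing above, one on each bus.
IOSCAN = """\
disk 0 0/0/1/0.8.0 sdisk CLAIMED DEVICE SEAGATE ST318404LC
/dev/dsk/c0t8d0 /dev/rdsk/c0t8d0
disk 6 0/4/0/0.8.0 sdisk CLAIMED DEVICE SEAGATE ST318203LC
/dev/dsk/c4t8d0 /dev/rdsk/c4t8d0
"""


def resource_to_hw_path(resource):
    """Turn an EMS resource like .../default/0_0_1_0.8.0 into 0/0/1/0.8.0.

    Assumption: underscores in the last path component replace slashes.
    """
    return resource.rsplit("/", 1)[-1].replace("_", "/")


def device_files_by_path(ioscan_text):
    """Map each hardware path to the block device file listed under it."""
    mapping = {}
    current_path = None
    for line in ioscan_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "disk":
            # Third column of a "disk" row is the hardware path.
            current_path = fields[2]
        elif fields and fields[0].startswith("/dev/dsk/") and current_path:
            mapping[current_path] = fields[0]
    return mapping


path = resource_to_hw_path("/storage/events/disks/default/0_0_1_0.8.0")
print(path, device_files_by_path(IOSCAN).get(path))
```

Used this way, the four resources flagged by EMS all resolve to paths under 0/0/1/0, i.e. the c0 device files and not the c4 ones.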


Can anyone advise further as to what may be causing this please.........

thanks,
Sean
Tim Nelson
Honored Contributor

Re: Resetting SCSI -- lbolt: errors.......in /var/adm/syslog/syslog.log

In my experience it has been one of the following.

A disk on that chain is causing SCSI resets on the bus.

Loose cables or termination issues are causing the bus to reset.

If this is a DS-series subsystem, I have seen power problems cause this.

Look at each disk using stm. If the number of errors is increasing on any one disk, you may need to replace it. If the errors are on all disks, then some isolation is needed: remove a disk and see if the errors stop. If not, put it back and remove a different one.

DCE
Honored Contributor

Re: Resetting SCSI -- lbolt: errors.......in /var/adm/syslog/syslog.log

Sean,

EMS is telling you there is more info available on the error. Run the suggested command and it will display the info and suggest next steps:

/opt/resmon/bin/resdata -R 125960194 -r /storage/events/disks/default/0_0_1_0.8.0 -n 125960196 -a


Darrel Louis
Honored Contributor

Re: Resetting SCSI -- lbolt: errors.......in /var/adm/syslog/syslog.log

Sean,

Can you post the output of
/opt/resmon/bin/resdata -R 125960194 -r /storage/events/disks/default/0_0_1_0.8.0 -n 125960196 -a

When you run an ioscan, do you still see the disks as CLAIMED?

Darrel
Albert_31
Trusted Contributor

Re: Resetting SCSI -- lbolt: errors.......in /var/adm/syslog/syslog.log

Please send /var/opt/resmon/log/event.log as well.
seanh
Occasional Advisor

Re: Resetting SCSI -- lbolt: errors.......in /var/adm/syslog/syslog.log

Hi guys,

Thanks for all the replies.

For your information, no more lbolt messages have been issued since the last ones yesterday at around 11:00.

Here are the results of the command that EMS advised me to run:


# /events/disks/default/0_0_1_0.8.0 -n 125960196 -a <

CURRENT MONITOR DATA:

Event Time..........: Thu May 25 07:44:41 2006
Severity............: MAJORWARNING
Monitor.............: disk_em
Event #.............: 100091
System..............: sul7it11

Summary:
Disk at hardware path 0/0/1/0.8.0 : Software configuration error


Description of Error:

The device is in a condition where it requires action on the part of the
device driver or a human operator.

Probable Cause / Recommended Action:

The device has been reset by a Bus Device Reset message, a hard reset
condition, or a power-on reset.

If this is the case, no action is necessary.

Alternatively, a removable medium has been loaded or replaced.

If this is the case, no action is necessary.

Alternatively, the mode parameters, microcode, or inquiry data for the
device have been changed.

If this is the case, no action is necessary.

Alternatively, the installed version of the device driver does not match
that of the installed version of HP-UX. Install the correct version of the
driver.

Additional Event Data:
System IP Address...: 127.127.72.192
System IP Address...: 127.127.72.192
Event Id............: 0x4475525900000000
Monitor Version.....: B.01.00
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_disk_em.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
0x4475525600000003
Additional System Data:
System Model Number.............: 9000/800/L2000-44
OS Version......................: B.11.00
STM Version.....................: A.42.00
EMS Version.....................: A.03.20
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/scsi.htm#100091

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v



Component Data:
Physical Device Path...: 0/0/1/0.8.0
Device Class...........: Disk
Inquiry Vendor ID......: SEAGATE
Inquiry Product ID.....: ST318404LC
Firmware Version.......: HP04
Serial Number..........: 3BT1SA0L00002116JC24

Product/Device Identification Information:

Logger ID.........: sdisk
Product Identifier: SCSI Disk
Product Qualifier.: SEAGATEST318404LC
SCSI Target ID....: 0x08
SCSI LUN..........: 0x00

I/O Log Event Data:

Driver Status Code..................: 0x0000000B
Length of Logged Hardware Status....: 22 bytes.
Offset to Logged Manager Information: 24 bytes.
Length of Logged Manager Information: 34 bytes.

Hardware Status:

Raw H/W Status:
0x0000: 00 00 00 02 70 00 06 00 00 00 00 0A 00 00 00 00
0x0010: 29 02 02 00 00 00

SCSI Status...: CHECK CONDITION (0x02)
Indicates that a contingent allegiance condition has occurred. Any
error, exception, or abnormal condition that causes sense data to be
set will produce the CHECK CONDITION status.

SCSI Sense Data:

Undecoded Sense Data:
0x0000: 70 00 06 00 00 00 00 0A 00 00 00 00 29 02 02 00
0x0010: 00 00

SCSI Sense Data Fields:
Error Code : 0x70
Segment Number : 0x00
Bit Fields:
Filemark : 0
End-of-Medium : 0
Incorrect Length Indicator : 0
Sense Key : 0x06
Information Field Valid : FALSE
Information Field : 0x00000000
Additional Sense Length : 10
Command Specific : 0x00000000
Additional Sense Code : 0x29
Additional Sense Qualifier : 0x02
Field Replaceable Unit : 0x02
Sense Key Specific Data Valid : FALSE
Sense Key Specific Data : 0x00 0x00 0x00

Sense Key 0x06, UNIT ATTENTION, indicates that the target has been
reset by a BUS DEVICE RESET message, a hard reset condition, or by a
power-on reset. If not a reset, then one of the following may have
occurred.
1. A removable medium may have been changed.
2. The mode parameters in effect for this initiator have been
changed by another initiator.
3. The version or level of microcode has been changed.
4. Tagged commands queued for this initiator were cleared by
another initiator.
5. INQUIRY data has been changed.
6. The mode parameters in effect for this initiator have been
restored from non-volatile memory.
7. A change in the condition of a synchronized spindle.
8. Any other event that requires the attention of the initiator.

SCSI Command Data Block:

Command Data Block Contents:
0x0000: 2A 00 00 73 28 C2 00 00 02 00

Command Data Block Fields (10-byte fmt):
Command Operation Code...(0x2A)..: WRITE
Logical Unit Number..............: 0
DPO Bit..........................: 0
FUA Bit..........................: 0
Relative Address Bit.............: 0
Logical Block Address............: 7547074 (0x007328C2)
Transfer Length..................: 2 (0x0002)

Manager-Specific Data Fields:
Request ID.............: 0x008D7A5B
Data Residue...........: 0x00000400
CDB status.............: 0x00000002
Sense Status...........: 0x00000000
Bus ID.................: 0x00
Target ID..............: 0x08
LUN ID.................: 0x00
Sense Data Length......: 0x12
Q Tag..................: 0x65
Retry Count............: 0


Ioscan shows all disks as being CLAIMED.

hmmmmmmm

Albert_31
Trusted Contributor

Re: Resetting SCSI -- lbolt: errors.......in /var/adm/syslog/syslog.log

Hello Sean,

Did you check whether any backup programs were running at that time?

Other than that, there is not much to probe: it was a reset on the bus, and the event summary classifies it as a software configuration error.

regards

albert
Darrel Louis
Honored Contributor

Re: Resetting SCSI -- lbolt: errors.......in /var/adm/syslog/syslog.log

Hi,

How many times did the error message occur?

Have you done a dd test to see whether the disk is really OK? For example, for the disk at 0/0/1/0.8.0:
dd if=/dev/rdsk/c0t8d0 of=/dev/null bs=64k

Darrel
seanh
Occasional Advisor

Re: Resetting SCSI -- lbolt: errors.......in /var/adm/syslog/syslog.log

Albert,

No backups running...........
Andrew Merritt_2
Honored Contributor

Re: Resetting SCSI -- lbolt: errors.......in /var/adm/syslog/syslog.log

Hi Sean,
A number of comments:

'lbolt' is effectively just a timestamp; it's not an error indication in itself, just a record of when something happened.
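
A quick way to convince yourself of this is to compare the lbolt difference between the two reset messages with the difference in their syslog timestamps. The Python sketch below assumes lbolt counts 100 ticks per second; that rate is not stated anywhere in this thread, but it is consistent with the two log lines.

```python
from datetime import datetime

HZ = 100  # assumed lbolt ticks per second (not stated in the thread)


def lbolt_delta_seconds(first, second, hz=HZ):
    """Elapsed seconds between two lbolt values at the assumed tick rate."""
    return (second - first) / hz


def wall_clock_delta_seconds(first, second, fmt="%b %d %H:%M:%S"):
    """Elapsed seconds between two syslog timestamps in the same year."""
    start = datetime.strptime(first, fmt)
    end = datetime.strptime(second, fmt)
    return (end - start).total_seconds()


# Values taken from the two "Resetting SCSI" lines in the original post.
ticks = lbolt_delta_seconds(325099977, 326304419)
wall = wall_clock_delta_seconds("May 25 07:44:33", "May 25 11:05:17")
print(ticks, wall)
```

Both come out at roughly 12044 seconds (about 3 hours 20 minutes), so the lbolt value is just tracking elapsed time.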

The root cause is whatever is causing the SCSI resets on the bus. The EMS events are just a consequence of that, as each disk reports the reset. Most likely it's a hardware problem, as Tim suggests.
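
To see that relationship in a longer syslog, a rough Python sketch like the one below groups each EMS notification under the bus reset that preceded it. The 60-second window and the function names are arbitrary choices for illustration, and the matched substrings are taken from the log lines in the original post.

```python
import re
from datetime import datetime, timedelta

# Leading syslog timestamp, e.g. "May 25 07:44:33 ".
STAMP_RE = re.compile(r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) ")


def stamp(line):
    """Parse the syslog timestamp; the year is not logged, so 1900 is implied."""
    match = STAMP_RE.match(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%b %d %H:%M:%S")


def events_after_resets(lines, window=timedelta(seconds=60)):
    """Return a list of (reset_time, [EMS lines seen within `window` after it])."""
    groups = []
    for line in lines:
        when = stamp(line)
        if when is None:
            continue
        if "SCSI: Resetting SCSI" in line:
            groups.append((when, []))
        elif "EMS Event Notification" in line and groups:
            reset_time, events = groups[-1]
            if timedelta(0) <= when - reset_time <= window:
                events.append(line)
    return groups
```

Feeding it the lines of /var/adm/syslog/syslog.log (for example via open(...).read().splitlines()) should show the four 07:44 EMS events under the 07:44:33 reset.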

How often are you seeing this happen? If more than once, is it always at a certain time?

You can also see the EMS events in the /var/opt/resmon/log/event.log file (easier than running 'resdata').

Running STM probably won't show you any error stats; the larger disks these days use SMART and don't make the error stats visible.

Your OnlineDiags version is old: A.42.00 (September 2003 release). You should upgrade to the latest, A.44.00 (March 2004), and then install PHSS_34286. (This won't stop the events, but it will fix a number of known problems and give you a supported version of the OnlineDiags.)
The link to the OnlineDiags is:

http://www.software.hp.com/portal/swdepot/displayProductInfo.do?productNumber=B6191AAE

You can also go to http://www.software.hp.com and then type "B6191AAE" in the search box.

http://docs.hp.com/en/diag/stm/stm_ptch.htm shows the latest patches.

Hope this helps,
Andrew
seanh
Occasional Advisor

Re: Resetting SCSI -- lbolt: errors.......in /var/adm/syslog/syslog.log

Hi guys,

Just a further update.
No more messages were issued over the weekend, but at 07:00 this morning the following was issued:


May 30 07:00:52 sul7it11 vmunix: SCSI: Abort averted -- lbolt: 368039648, dev: 1f009000, io_id: e17ec5, status: 02


Any thoughts ?
Mridul Shrivastava
Honored Contributor

Re: Resetting SCSI -- lbolt: errors.......in /var/adm/syslog/syslog.log

The device named in that message (dev: 1f009000) is c0t9d0.
I suggest checking the firmware version of this disk, as well as that of the HBA it is connected to.
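
For reference, the dev value in the "Abort averted" message can be decoded with a few shifts. This Python sketch assumes the legacy HP-UX 11.x layout: an 8-bit major number followed by a 24-bit minor number, with the minor split into controller instance (8 bits), target (4 bits), LUN (4 bits) and option bits (8 bits). That layout is an assumption on my part, although it does reproduce the c0t9d0 named in this reply.

```python
def decode_dev(dev_hex):
    """Split a dev value such as '1f009000' into (major, instance, target, lun).

    Assumed layout: 8-bit major, then a 24-bit minor made of
    instance (8 bits), target (4 bits), LUN (4 bits), options (8 bits).
    """
    dev = int(dev_hex, 16)
    major = (dev >> 24) & 0xFF
    minor = dev & 0xFFFFFF
    instance = (minor >> 16) & 0xFF
    target = (minor >> 12) & 0xF
    lun = (minor >> 8) & 0xF
    return major, instance, target, lun


def device_name(dev_hex):
    """Build the cXtYdZ device file name from the decoded fields."""
    _, instance, target, lun = decode_dev(dev_hex)
    return f"c{instance}t{target}d{lun}"


print(decode_dev("1f009000"), device_name("1f009000"))
```

The name can then be looked up in the ioscan listing from the original post to find the matching hardware path.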
Time has a wonderful way of weeding out the trivial