HP L1000 running HPUX 11.0 is crashing often -

Occasional Contributor

Every week we get a show-stopping crash, and I have to cycle the power to bring the system back up. This is what I see in syslog. I am new to this and need a clue:

Jul 4 02:45:12 vision vmunix: SCSI: Abort abandoned -- lbolt: 31165162, dev: 1f022000, io_id: 201acf4, status: 200
Jul 4 02:45:13 vision vmunix:
Jul 4 02:45:13 vision vmunix: SCSI: Read error -- dev: b 31 0x022000, errno: 126, resid: 2048,
Jul 4 02:45:13 vision vmunix: blkno: 8, sectno: 16, offset: 8192, bcount: 2048.
Jul 4 02:45:13 vision vmunix: LVM: vg[1]: pvnum=0 (dev_t=0x1f022000) is POWERFAILED
Jul 4 06:17:41 vision /usr/lbin/ups_mond[1099]: /usr/lbin/ups_mond: UPS /dev/tty0p1 AC POWER FAILURE - running on UPS battery
Jul 4 06:17:41 vision /usr/lbin/ups_mond[1099]: /usr/lbin/ups_mond: AC Power to all recognized, system critical UPS's OK! System will not shutdown.
Jul 4 06:17:44 vision /usr/lbin/ups_mond[1099]: /usr/lbin/ups_mond: UPS /dev/tty0p1 OK: AC Power back on
Jul 4 06:17:44 vision /usr/lbin/ups_mond[1099]: /usr/lbin/ups_mond: AC Power to all recognized, system critical UPS's OK! System will not shutdown.

Thanks,
Lynn
4 REPLIES
Trusted Contributor

Re: HP L1000 running HPUX 11.0 is crashing often -

I agree; it looks like a bad drive.

Running vgdisplay -v lists the physical volumes in each volume group along with their status, which may help you identify which drive is causing the problem.

Positive Results require Positive Thinking
Honored Contributor

Re: HP L1000 running HPUX 11.0 is crashing often -

Here is how to decode "lbolt" messages; doing so tells you the exact device. (HP-UX could tell you directly, but that would be too easy.)

How to decode an lbolt error
=====================================================================
gbo390-d:~abramss/doc/11/lbolt.decode SDA 11/11/98


1. Get the "dev:" entries from the lbolt messages:

# dmesg | grep lbolt | grep dev:

SCSI: Abort -- lbolt: 18346341, dev: e7015000, io_id: 122e9a3
SCSI: Request Timeout -- lbolt: 18351441, dev: e7015000
SCSI: Abort -- lbolt: 18351441, dev: e7015000, io_id: 122e9be
SCSI: Request Timeout -- lbolt: 18356641, dev: e7015000
SCSI: Abort -- lbolt: 18356641, dev: e7015000, io_id: 122e9cf
SCSI: Request Timeout -- lbolt: 18362141, dev: e7015000
SCSI: Abort -- lbolt: 18362141, dev: e7015000, io_id: 122e9e0
SCSI: Request Timeout -- lbolt: 74105435, dev: 1f000000
SCSI: Abort Tag -- lbolt: 74105435, dev: 1f000000, io_id: 4ead34

Here we have two different devices, whose dev: values begin with:

1f
e7
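If the syslog or dmesg output is long, picking out the distinct dev: values by eye gets tedious. The Python sketch below does the same filtering as the dmesg | grep pipeline in step 1. The function name lbolt_devices and the regular expression are illustrative assumptions fitted only to the message format shown above; this is not an HP-supplied tool.

```python
import re

# Sample lines copied from the dmesg output above.
SAMPLE = """\
SCSI: Abort -- lbolt: 18346341, dev: e7015000, io_id: 122e9a3
SCSI: Request Timeout -- lbolt: 18351441, dev: e7015000
SCSI: Request Timeout -- lbolt: 74105435, dev: 1f000000
SCSI: Abort Tag -- lbolt: 74105435, dev: 1f000000, io_id: 4ead34
"""

# Matches the 8 hex digits that follow "dev:" on an lbolt line (assumed format).
DEV_RE = re.compile(r"lbolt: \d+, dev: ([0-9a-fA-F]{8})")


def lbolt_devices(text):
    """Return the distinct dev: values found on lbolt lines, in first-seen order."""
    seen = []
    for match in DEV_RE.finditer(text):
        dev = match.group(1).lower()
        if dev not in seen:
            seen.append(dev)
    return seen


devs = lbolt_devices(SAMPLE)
print(devs)                   # ['e7015000', '1f000000']
print([d[:2] for d in devs])  # ['e7', '1f']
```

The first two characters of each value are the hex major numbers listed above.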

2. The first two hex digits of the dev: value are the major number of the device in question. Convert them from hex to decimal:

# printf "%d\n" 0x1f
31
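The same conversion can be done for the whole dev: value at once. This sketch assumes, as these steps do, that the first two hex digits (top 8 bits) are the major number and the remaining six (low 24 bits) are the minor number; split_dev is a hypothetical helper name.

```python
def split_dev(dev_hex):
    """Split a 32-bit dev: value (hex string, e.g. '1f022000') into (major, minor).

    Assumed layout: top 8 bits = major number, low 24 bits = minor number.
    """
    value = int(dev_hex, 16)   # base 16 also accepts an optional '0x' prefix
    major = (value >> 24) & 0xFF
    minor = value & 0xFFFFFF
    return major, minor


major, minor = split_dev("1f022000")
print(major, f"0x{minor:06x}")  # 31 0x022000
```

Applied to Lynn's "dev: 1f022000", this gives major 31 and minor 0x022000, which agrees with the "dev: b 31 0x022000" read-error line in the original syslog.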

3. Find out which driver uses this major number. That tells us the type of device:

# lsdev 31

Character Block Driver Class
188 31 sdisk disk

So this is probably a disk.
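When several majors need checking, the lsdev listing can be searched programmatically. This hypothetical parser (driver_for_block_major) assumes the four-column layout shown above (character major, block major, driver, class). It also returns the character major, which is the number step 4b greps for.

```python
# Output copied from the lsdev example above.
LSDEV_OUTPUT = """\
Character Block Driver Class
188 31 sdisk disk
"""


def driver_for_block_major(lsdev_output, block_major):
    """Return (char_major, driver, class) for a block major, or None if not listed."""
    for line in lsdev_output.splitlines():
        fields = line.split()
        # Data rows have exactly four columns; the header row's second
        # column is the word "Block", so it never matches a number.
        if len(fields) == 4 and fields[1] == str(block_major):
            return int(fields[0]), fields[2], fields[3]
    return None


print(driver_for_block_major(LSDEV_OUTPUT, 31))  # (188, 'sdisk', 'disk')
```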


4. Find the device file entry from the remainder of the lbolt error:

SCSI: Abort Tag -- lbolt: 74105435, dev: 1f000000, io_id: 4ead34

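The remaining six hex digits (000000) are the minor number of the failing device. Look for a device file with that minor number and either the block major (31) or the character major (188):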

a. Block device:

# ll -R /dev/ | grep 31 | grep 0x000000

brw-r----- 1 bin sys 31 0x000000 Jul 15 16:25 c0t0d0

Or:

b. Character Device:

# ll -R /dev/ | grep 188 | grep 0x000000
crw-r----- 1 bin sys 188 0x000000 Oct 11 07:15 c0t0d0
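Instead of searching /dev, the minor number can also be translated straight into a cXtYdZ name. This sketch assumes the legacy HP-UX 11.0 disk minor-number layout 0xIITL00 (II = card instance, T = SCSI target, L = LUN). That layout is consistent with the examples in this thread, but confirm the result with ll or lssf before pulling a drive. minor_to_dsk is a hypothetical name.

```python
def minor_to_dsk(minor):
    """Translate an HP-UX 11.0 legacy disk minor number into a cXtYdZ name.

    Assumed layout 0xIITL00: II = card instance (bits 16-23),
    T = SCSI target (bits 12-15), L = LUN (bits 8-11).
    The low 8 bits are not part of the name and are ignored.
    """
    instance = (minor >> 16) & 0xFF
    target = (minor >> 12) & 0xF
    lun = (minor >> 8) & 0xF
    return f"c{instance}t{target}d{lun}"


print(minor_to_dsk(0x000000))  # c0t0d0  (the lbolt example above)
print(minor_to_dsk(0x022000))  # c2t2d0  (Lynn's dev: 1f022000)
```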

5. Find the Hardware Address:

# lssf /dev/dsk/c0t0d0
sdisk card instance 0 SCSI target 0 SCSI LUN 0 section 0
at address 0/0/0.0.0 /dev/dsk/c0t0d0
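The lssf output is easy to pick apart if you want to cross-check it against the device file name: the card instance, target and LUN should match the c, t and d numbers. The regular expression below is an assumption fitted to the single sample shown here (which the forum wrapped onto two lines), so other drivers' output may need a different pattern; parse_lssf is a hypothetical helper.

```python
import re

LSSF_OUTPUT = """\
sdisk card instance 0 SCSI target 0 SCSI LUN 0 section 0
at address 0/0/0.0.0 /dev/dsk/c0t0d0"""

# DOTALL lets ".*?" span the line break the forum inserted.
LSSF_RE = re.compile(
    r"(\S+) card instance (\d+) SCSI target (\d+) SCSI LUN (\d+)"
    r".*?at address (\S+) (\S+)",
    re.DOTALL,
)


def parse_lssf(text):
    """Return driver, addressing fields, hardware path and device file, or None."""
    m = LSSF_RE.search(text)
    if m is None:
        return None
    driver, instance, target, lun, hw_path, dev_file = m.groups()
    return {
        "driver": driver,
        "instance": int(instance),
        "target": int(target),
        "lun": int(lun),
        "hw_path": hw_path,
        "dev_file": dev_file,
    }


info = parse_lssf(LSSF_OUTPUT)
print(info["hw_path"], info["dev_file"])  # 0/0/0.0.0 /dev/dsk/c0t0d0
```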


6. Find the type of device:

# diskinfo /dev/rdsk/c0t0d0
SCSI describe of /dev/rdsk/c0t0d0:
vendor: DGC
product id: C2300WDR1
type: direct access
size: 4102875 Kbytes
bytes per sector: 512


So we have a Nike disk at hardware address 0/0/0.0.0, device file /dev/dsk/c0t0d0.
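If you are collecting details for several suspect disks (for example, before logging a hardware call), the diskinfo output can be turned into key/value pairs. This sketch assumes every field line has the "name: value" form shown above; leading spaces, which real diskinfo output uses to align the colons, are stripped. parse_diskinfo is a hypothetical helper.

```python
DISKINFO_OUTPUT = """\
SCSI describe of /dev/rdsk/c0t0d0:
vendor: DGC
product id: C2300WDR1
type: direct access
size: 4102875 Kbytes
bytes per sector: 512
"""


def parse_diskinfo(text):
    """Map each "name: value" line of diskinfo output to a dict entry."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # The "SCSI describe of ...:" header has nothing after its colon, so skip it.
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


info = parse_diskinfo(DISKINFO_OUTPUT)
print(info["vendor"], info["product id"])  # DGC C2300WDR1
```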


Honored Contributor

Re: HP L1000 running HPUX 11.0 is crashing often -

Here is what this line means:

LVM: vg[1]: pvnum=0 (dev_t=0x1f022000) is POWERFAILED

It refers to the "0-th" (first) disk listed in /etc/lvmtab for the volume group whose group file has minor number "01":

# ll /dev/*/group
crw-r--r-- 1 root sys 64 0x4a0000 Mar 18 09:28 /dev/05inst98/group
crw-r--r-- 1 root sys 64 0x490000 Mar 5 10:24 /dev/05inst99/group
crw-r--r-- 1 root sys 64 0x010000 Aug 16 2001 /dev/05vg01/group

(The /dev/05vg01/group file above has minor number 0x010000, i.e. "01".)

crw-r--r-- 1 root sys 64 0x020000 Aug 16 2001 /dev/05vg02/group
crw-r--r-- 1 root sys 64 0x030000 Aug 16 2001 /dev/05vg03/group
crw-r--r-- 1 root sys 64 0x050000 Aug 16 2001 /dev/05vg04/group
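The first part of this lookup (reading vg[N] and pvnum out of the LVM message, then finding the group file whose minor number starts with N) can be sketched in Python. This follows the interpretation given in this reply, that vg[N] corresponds to the group file with minor number 0xNN0000; the helper names and the regular expression are illustrative assumptions.

```python
import re

LVM_MSG = "LVM: vg[1]: pvnum=0 (dev_t=0x1f022000) is POWERFAILED"

# A few lines copied from the "ll /dev/*/group" listing above.
GROUP_LISTING = """\
crw-r--r-- 1 root sys 64 0x4a0000 Mar 18 09:28 /dev/05inst98/group
crw-r--r-- 1 root sys 64 0x490000 Mar 5 10:24 /dev/05inst99/group
crw-r--r-- 1 root sys 64 0x010000 Aug 16 2001 /dev/05vg01/group
crw-r--r-- 1 root sys 64 0x020000 Aug 16 2001 /dev/05vg02/group
"""

LVM_RE = re.compile(r"vg\[(\d+)\]: pvnum=(\d+) \(dev_t=0x([0-9a-fA-F]+)\)")


def parse_lvm_msg(msg):
    """Return (vg_number, pvnum, dev_t hex string) from an LVM syslog line, or None."""
    m = LVM_RE.search(msg)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), m.group(3).lower()


def vg_dir_for(listing, vg_num):
    """Return the VG directory whose group file (major 64) has minor 0xNN0000, NN == vg_num."""
    for line in listing.splitlines():
        fields = line.split()
        # Expected columns: mode links owner group major minor month day time/year path
        if len(fields) < 10 or fields[4] != "64" or not fields[-1].endswith("/group"):
            continue
        minor = int(fields[5], 16)
        if (minor >> 16) & 0xFF == vg_num:
            return fields[-1][: -len("/group")]
    return None


vg_num, pvnum, dev_t = parse_lvm_msg(LVM_MSG)
print(vg_num, pvnum, dev_t)               # 1 0 1f022000
print(vg_dir_for(GROUP_LISTING, vg_num))  # /dev/05vg01
```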

# strings /etc/lvmtab | grep dev | more
/dev/vg00
/dev/dsk/c1t6d0
/dev/dsk/c2t6d0
/dev/05vg01
/dev/dsk/c3t2d2 <== The "0-th" disk
/dev/dsk/c5t2d2
/dev/dsk/c7t2d2
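The last step, finding the pvnum-th disk under a volume group in the strings /etc/lvmtab output, is a simple scan. This sketch relies on what the output above shows (each VG name followed by its /dev/dsk/... disks) and assumes, as this reply does, that pvnum counts those disks from zero in the order listed. pv_for is a hypothetical name.

```python
# Lines copied from the "strings /etc/lvmtab | grep dev" output above.
LVMTAB_STRINGS = """\
/dev/vg00
/dev/dsk/c1t6d0
/dev/dsk/c2t6d0
/dev/05vg01
/dev/dsk/c3t2d2
/dev/dsk/c5t2d2
/dev/dsk/c7t2d2
"""


def pv_for(strings_output, vg_dir, pvnum):
    """Return the pvnum-th (0-based) disk listed after vg_dir, or None."""
    current_vg = None
    index = 0
    for entry in strings_output.split():
        if entry.startswith("/dev/dsk/"):
            # A physical volume: count it only if it belongs to the wanted VG.
            if current_vg == vg_dir:
                if index == pvnum:
                    return entry
                index += 1
        else:
            # Any other /dev entry is treated as the start of a new VG.
            current_vg = entry
            index = 0
    return None


print(pv_for(LVMTAB_STRINGS, "/dev/05vg01", 0))  # /dev/dsk/c3t2d2
```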
Honored Contributor

Re: HP L1000 running HPUX 11.0 is crashing often -

Hi Lynn,

It appears to me that you have two separate problems here:

1) Disk c2t2d0 (dev 1f022000 in your syslog) is causing SCSI errors; I'd replace it at your earliest opportunity.

2) Either you're losing power or you have a bad/flaky power monitor board. If you can verify that you're not actually losing power, I'd log a hardware call with HP, as there are known issues with early L-class power monitor boards that cause all sorts of "false" errors. It's also possible that the UPS itself is the root cause. Either way, this is the problem most likely to be causing the reboots.

HTH,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!