
Re: root disk failures

 
lastgreatone
Regular Advisor

root disk failures

For the second time within one month there has been a disk failure: first the mirror disk, now the primary disk. Each failure occurred right after the ora.shut script spawned by OB II 4.0, backing up to a DLT 8/8 on the same system. The latest occurred Jan. 1 (01:00). Anyone else have a similar problem?
A. Clay Stephenson
Acclaimed Contributor

Re: root disk failures

Hi Frankie:

I've been running OB2 4.0 on my K-260 sandbox for about 3 months and have observed nothing like that. One thing you have not made clear is whether you are suffering actual hardware failures, or whether the data is being wiped/clobbered while the underlying hardware remains good. Also, are your two disks on the same bus or on different buses? A bit more data, please ...

Clay
If it ain't broke, I can fix that.
Wim Rombauts
Honored Contributor

Re: root disk failures

Is it a disk failure (media error) or a disk corruption? It's important to make the distinction.

A disk failure is not caused by any software you run, but the software may try to read a damaged block and make your system panic. Try reading your disks with "dd" a couple of times and check that they complete successfully (dd if=/dev/rdsk/c.... of=/dev/null bs=1024k).

A corruption on only one disk in a mirrored configuration is very odd. In that case I would start by checking the patch levels of the mirroring software.
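A minimal sketch of that dd read-check wrapped in a loop, so both halves of the mirror get read end to end (the device paths at the bottom are placeholders; substitute your own /dev/rdsk entries):

```shell
#!/bin/sh
# Read each raw device from start to finish; a media error will surface
# as a dd I/O error. Device paths below are placeholders only.
check_disks() {
    for d in "$@"; do
        if dd if="$d" of=/dev/null bs=1024k 2>/dev/null; then
            echo "$d: read completed OK"
        else
            echo "$d: READ ERRORS -- suspect media"
        fi
    done
}

check_disks /dev/rdsk/c0t6d0 /dev/rdsk/c0t5d0
```

Running it a couple of times, as suggested, helps catch intermittent errors that a single pass might miss.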
Krishna Prasad
Trusted Contributor

Re: root disk failures

Is your DLT on the same bus as your root disk?

Sometimes heavy tape I/O can cause SCSI timeouts on the bus.

If your DLT is on the same bus as your root disks, try moving it to a separate bus.
Positive Results requires Positive Thinking
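On HP-UX you can list the hardware paths with `ioscan -fnC disk` and `ioscan -fnC tape` and compare the adapter portion. As a small illustration (assuming paths of the usual adapter.target.lun form, e.g. 0/0/1/1.2.0; the helper below is not an HP-UX tool, just a sketch), stripping the last two dot-separated fields leaves the adapter, so two devices share a bus when those prefixes match:

```shell
#!/bin/sh
# same_bus HWPATH1 HWPATH2 -- compare the adapter portion of two HP-UX style
# hardware paths (adapter.target.lun). Illustrative helper, not an HP-UX tool.
same_bus() {
    a=${1%.*.*}   # strip ".target.lun" -> adapter path
    b=${2%.*.*}
    if [ "$a" = "$b" ]; then
        echo "same bus"
    else
        echo "different buses"
    fi
}

same_bus 0/0/1/1.2.0 0/0/1/1.6.0    # -> same bus
same_bus 0/0/1/1.2.0 0/0/2/0.3.0    # -> different buses
```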
Roger Baptiste
Honored Contributor

Re: root disk failures

Hi Frankie,

Is the problem on external disks? Are they on the same channel as the tape drives?

Ok, looking at your subject line now, which says root disk, I assume these are internal disks. I don't think this is related to the OB backup or the dbshut program. It's probably a coincidence that both disks broke down.

Could you post the error messages from when the disks errored out?

-raj
Take it easy.
fg_1
Trusted Contributor

Re: root disk failures

Frankie

Please send the following info:

1) System model#
2) HW paths of the disk and the tape drive
3) The last day's entries in syslog
4) dmesg output

This is something that cannot really be diagnosed without some of this data. I have been running OB2 here for a long time and haven't encountered this problem at all.

Love to help but send this info first.
lastgreatone
Regular Advisor

Re: root disk failures

I ran cstm and read the latest log, which indicates a failed disk (pvstatus = unavailable); the dd command hangs. The DLT is off lba 782 0/6.
lastgreatone
Regular Advisor

Re: root disk failures

L1000/rp5400, OS 11/64
HW path: 0/0/1/1.2.0
syslog:
Jan 1 01:00:04 hp002 : su : + tty?? root-oracle
Jan 1 01:05:18 hp002 vmunix: DIAGNOSTIC SYSTEM WARNING:
Jan 1 01:05:18 hp002 vmunix: The diagnostic logging facility has started receiving excessive
Jan 1 01:05:18 hp002 vmunix: errors from the I/O subsystem. I/O error entries will be lost
Jan 1 01:05:18 hp002 vmunix: until the cause of the excessive I/O logging is corrected.
Jan 1 01:05:18 hp002 vmunix: If the diaglogd daemon is not active, use the Daemon Startup command
Jan 1 01:05:18 hp002 vmunix: in stm to start it.
Jan 1 01:05:18 hp002 vmunix: If the diaglogd daemon is active, use the logtool utility in stm
Jan 1 01:05:18 hp002 vmunix: to determine which I/O subsystem is logging excessive errors.
Jan 1 01:05:43 hp002 vmunix: scb->cdb: 28 00 00 62 a3 64 00 00 04 00
Jan 1 01:05:43 hp002 vmunix: SCSI: Abort abandoned -- lbolt: 160413359, dev: 1f012000, io_id: 113d0b6, status: 200
Jan 1 01:05:43 hp002 vmunix: scb->cdb: 2a 00 00 61 06 90 00 00 10 00
Jan 1 01:05:43 hp002 vmunix: scb->cdb: 28 00 00 39 ce 40 00 00 70 00
Jan 1 01:05:43 hp002 vmunix: SCSI: Abort abandoned -- lbolt: 160413463, dev: 1f012000, io_id: 113d0b7, status: 200
Jan 1 01:05:43 hp002 vmunix: SCSI: Abort abandoned -- lbolt: 160413492, dev: 1f012000, io_id: 113d0b5, status: 200
Jan 1 01:05:43 hp002 vmunix: LVM: vg[0]: pvnum=0 (dev_t=0x1f012000) is POWERFAILED
Jan 1 01:05:43 hp002 vmunix: scb->cdb: 28 00 00 39 ce 40 00 00 70 00
Jan 1 01:06:14 hp002 vmunix: DIAGNOSTIC SYSTEM WARNING:
Jan 1 01:06:14 hp002 vmunix: The diagnostic logging facility is no longer receiving excessive
Jan 1 01:06:14 hp002 vmunix: errors from the I/O subsystem. 97 I/O error entries were lost.
Jan 1 01:53:59 hp002 : su : + tty?? root-oracle
dmesg:
DIAGNOSTIC SYSTEM WARNING:
The diagnostic logging facility has started receiving excessive
errors from the I/O subsystem. I/O error entries will be lost
until the cause of the excessive I/O logging is corrected.
If the diaglogd daemon is not active, use the Daemon Startup command
in stm to start it.
If the diaglogd daemon is active, use the logtool utility in stm
to determine which I/O subsystem is logging excessive errors.
scb->cdb: 28 00 00 62 a3 64 00 00 04 00
SCSI: Abort abandoned -- lbolt: 160413359, dev: 1f012000, io_id: 113d0b6, status: 200
scb->cdb: 2a 00 00 61 06 90 00 00 10 00
scb->cdb: 28 00 00 39 ce 40 00 00 70 00
scb->cdb: 28 00 00 39 ce 40 00 00 70 00
SCSI: Abort abandoned -- lbolt: 160413463, dev: 1f012000, io_id: 113d0b7, status: 200
SCSI: Abort abandoned -- lbolt: 160413492, dev: 1f012000, io_id: 113d0b5, status: 200
LVM: vg[0]: pvnum=0 (dev_t=0x1f012000) is POWERFAILED
DIAGNOSTIC SYSTEM WARNING:
The diagnostic logging facility is no longer receiving excessive
errors from the I/O subsystem. 97 I/O error entries were lost.
DIAGNOSTIC SYSTEM WARNING:
The diagnostic logging facility has started receiving excessive
errors from the I/O subsystem. I/O error entries will be lost
until the cause of the excessive I/O logging is corrected.
If the diaglogd daemon is not active, use the Daemon Startup command
in stm to start it.
If the diaglogd daemon is active, use the logtool utility in stm
to determine which I/O subsystem is logging excessive errors.
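Worth noting: every one of the "Abort abandoned" entries references the same device number (dev: 1f012000), which matches the POWERFAILED physical volume. A small sketch of pulling those device numbers out of a log excerpt like the one above (the grep/sed pattern is keyed to the "Abort abandoned ... dev:" lines; it is an illustration, not an HP-UX diagnostic tool):

```shell
#!/bin/sh
# List the unique device numbers from "SCSI: Abort abandoned ... dev: NNN"
# log lines, to see whether one device or several are timing out.
failing_devs() {
    grep 'Abort abandoned' "$1" | sed -n 's/.*dev: \([0-9a-f]*\).*/\1/p' | sort -u
}

# Demo on an excerpt like the one above:
tmp=/tmp/scsi_excerpt.$$
cat > "$tmp" <<'EOF'
SCSI: Abort abandoned -- lbolt: 160413359, dev: 1f012000, io_id: 113d0b6, status: 200
SCSI: Abort abandoned -- lbolt: 160413463, dev: 1f012000, io_id: 113d0b7, status: 200
EOF
failing_devs "$tmp"    # -> 1f012000
rm -f "$tmp"
```

A single repeated device number points at one physical volume; several different numbers would point more toward the bus or controller.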
Volker Borowski
Honored Contributor

Re: root disk failures

Hi Frankie,

with two messages clearly stating a POWERFAIL, I would like to know whether the root disks reside in an external unit, and whether they have been physically moved around recently (maybe without you knowing it?).

With a powerfailed disk I tend to suspect bad power cables somewhere, or an overloaded power supply (which might drop when the DLT starts to spin up?).

Yeah, this sounds weak, but basic problems usually have basic solutions.

Do not know if this helps.
Volker

fg_1
Trusted Contributor

Re: root disk failures

Frankie

The powerfail message is a definite indication of failure, but in this case it points more at the controller than at the disk itself. Are there any other disks on this controller? Since these are internal disks, they should not be failing out.

What is the HW path of the tape drive?

Also, you need to take a look at your diagnostics configuration, since you are receiving a lot of messages.

The problem with the controller could be intermittent, so just to be safe I would have HP remove and replace the affected card to eliminate that possibility.

lastgreatone
Regular Advisor

Re: root disk failures

The final diagnosis: a failed main board. It is being replaced by HP.

Thanks, all.