Re: Numerous disk errors

Rick Garland · 02-11-2003

Hi all:

Got a problem with disk drives in an XP256 that some folks want to associate with the backups performed by Data Protector; I am hesitant to do so.

First, a description of the configuration:

All servers run HP-UX 11.00 and are either L class or RP class. Each server has two HBA cards (A5158): one for data traffic and one for backup traffic. The primary data is stored on an XP256 disk array. Backups are done with Data Protector 5.0 to an HP 70/700 silo.

From one HBA on each server, the data traffic goes to the XP via an HP Hub S10. From the other HBA, the backup traffic is routed through a Brocade Silkworm 2800, then to either a SCSI Bridge FC 4/2 or a Bridge 2/1 LV (for the DLT or LTO tape drives).

I have attached a portion of a syslog file from one of the servers.

Granted, the times match the backup in this example, but that is not true on all servers.

The end result is that the system crashes, the backup fails, or the backup takes much longer than usual to complete.

This only happens when backing up the Oracle database. If I back up the OS or the archives on the same system, none of these errors appear and things work OK.

Hopefully someone out there has encountered this issue before.

Many thanks!

T G Manikandan · 02-11-2003

SCSI errors are usually due to:

1. Improper SCSI termination or SCSI cabling.

2. A disk not responding within the timeout period. The timeout should be increased with pvchange -t.

3. As your messages relate to read errors, I suspect the hard disk 0x06b400.

Find out which disk that is using

ll /dev/dsk | grep 0x06b400

Then run diskinfo against the matching /dev/rdsk device to check the bad disk.
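Item 2 can be applied to every physical volume in one pass. This is a minimal sketch, assuming HP-UX LVM with pvchange -t available and root access; the function name set_pv_timeout and the device paths in the usage comment are hypothetical. 180 seconds is simply the value used later in this thread, not a recommendation.

```shell
# Sketch only: raise the LVM I/O timeout on each physical volume given.
# Assumes HP-UX LVM with "pvchange -t"; run as root.
# set_pv_timeout is a hypothetical helper name.

# set_pv_timeout SECONDS PV...
set_pv_timeout() {
    timeout=$1
    shift
    status=0
    for pv in "$@"; do
        # pvchange -t sets the I/O timeout (in seconds) for this PV
        if ! pvchange -t "$timeout" "$pv"; then
            echo "pvchange failed on $pv" >&2
            status=1
        fi
    done
    # Non-zero if any PV could not be changed
    return "$status"
}

# Usage (hypothetical device paths):
#   set_pv_timeout 180 /dev/dsk/c6t11d4 /dev/dsk/c7t11d4
```

Afterwards, pvdisplay on each PV should reflect the new I/O timeout.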

The message saying 0x1f07b400 is POWERFAILED is probably because the devices were not responding, so the other disk reported those messages as well.

Let us know what you find.
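The hex values in the syslog messages can also be turned into device names directly. This is a minimal sketch, assuming the HP-UX 11.x legacy disk minor-number layout 0xIITLxx (bus instance, target, LUN, option bits) and a dev_t whose top 8 bits are the major number; under that assumption 0x1f07b400 is major 31 (typically the sdisk block driver) with minor 0x07b400. The function names are hypothetical; confirm the result against ll /dev/dsk before pulling any hardware.

```shell
# Sketch only: decode the hex numbers from syslog into cXtYdZ names.
# ASSUMPTION: legacy disk minor layout 0xIITLxx
#   II = bus instance, T = target, L = LUN, xx = option bits
# ASSUMPTION: dev_t = 8-bit major followed by 24-bit minor.
# decode_minor and decode_devt are hypothetical helper names.

# decode_minor MINOR   (hex such as 0x06b400, or decimal)
decode_minor() {
    m=$(( $1 ))
    inst=$(( (m >> 16) & 0xff ))   # bus instance -> the "c" number
    tgt=$(( (m >> 12) & 0xf ))     # SCSI target  -> the "t" number
    lun=$(( (m >> 8) & 0xf ))      # LUN          -> the "d" number
    echo "c${inst}t${tgt}d${lun}"
}

# decode_devt DEVT   (full dev_t such as 0x1f07b400)
decode_devt() {
    d=$(( $1 ))
    major=$(( (d >> 24) & 0xff ))
    minor=$(( d & 0xffffff ))
    printf 'major %d minor 0x%06x %s\n' "$major" "$minor" "$(decode_minor "$minor")"
}
```

Under these assumptions, decode_minor 0x06b400 prints c6t11d4, and decode_devt 0x1f07b400 prints "major 31 minor 0x07b400 c7t11d4"; the matching /dev/rdsk device can then be checked with diskinfo.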

T G Manikandan · 02-11-2003

Also make sure that all the LVM patches on the system are up to date.

T G Manikandan · 02-11-2003

Also check this doc

http://www1.itrc.hp.com/service/cki/docDisplay.do?docLocale=en_US&docId=200000062953903

Steven E. Protter · 02-11-2003

Are there any unused disks on the SCSI chain?

I've seen problems like this when a disk with no data on it has failed. It can affect other disks up and down the chain.

The lack of lbolts shows nothing is critical yet, though that could soon change.

I had problems like this for months on a D-class box, with disks holding data failing about once a month. When I finally had time, I took an inventory and tested all the disks, with data or not, using xstm. I exercised each disk and found the bad one.

After hot swapping the bad disk out, the messages stopped.

SEP


Rita C Workman · 02-11-2003

Have you checked for errors on your HBAs?

fcmsutil /dev/fcms<#> stat > fcms<#>.txt

Then go and check that file. I look for Link Failed Count, Loss of Signal Count and the like. If you see non-zero numbers there, you may have a problem with your HBA getting flaky.

Not sure this is your problem, but it is something to check. Besides, I just got to use these commands this past weekend... and wondered how long it would be before I found a thread where they might apply.

Rgrds,
Rita

...had a similar problem with some very old A3404A cards once...
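Rita's check can be run against all HBAs at once. This is a minimal sketch, assuming fcmsutil is on the PATH (or FCMSUTIL is set to its full path) and that the counter lines contain the words "Link Failed" and "Loss of Signal"; the exact output wording is an assumption, so adjust the grep pattern to what your fcmsutil prints. Device file names vary by card type (A5158A cards may use /dev/td# rather than /dev/fcms#), so take them from ioscan. The function name check_fcms_counters is hypothetical.

```shell
# Sketch only: show the link-failure and loss-of-signal counters from
# "fcmsutil <device> stat" for each HBA device file given.
# ASSUMPTION: counter lines contain "Link Failed" / "Loss of Signal".
FCMSUTIL=${FCMSUTIL:-fcmsutil}

# check_fcms_counters DEVICE...
check_fcms_counters() {
    for dev in "$@"; do
        echo "== $dev"
        # Keep only the counters of interest (case-insensitive match)
        "$FCMSUTIL" "$dev" stat | grep -i -e 'link fail' -e 'loss of signal'
    done
}

# Usage (device files vary; take them from ioscan):
#   check_fcms_counters /dev/fcms0 /dev/fcms1
```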

Rick Garland · 02-11-2003

Still hacking at it. I have verified and exercised the disks in the XP, previously set the timeout (pvchange -t) to 180, removed some suspect fiber cables, etc.

Still nothing.

At this point the thinking is that we have too many LUNs for the hardware to handle, and this could be causing a LIP storm. I have written some quick and dirty scripts to collect fcmsutil data on each system over time and see what may be happening.

Right now, I sure don't know.

Any other suggestions are very welcome.
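A collection script along the lines Rick describes might look like the following. It is a sketch only, assuming fcmsutil is on the PATH (or FCMSUTIL points at it); the function name collect_fcms_stats, the interval, the count, the log directory and the device files are all placeholders. Taking timestamped snapshots at a fixed interval makes it possible to line up counter jumps with the backup window.

```shell
# Sketch only: snapshot "fcmsutil <dev> stat" for each HBA every INTERVAL
# seconds, COUNT times, appending timestamped output to one log per device.
# collect_fcms_stats is a hypothetical helper name; run as root.
FCMSUTIL=${FCMSUTIL:-fcmsutil}

# collect_fcms_stats INTERVAL COUNT LOGDIR DEVICE...
collect_fcms_stats() {
    interval=$1
    count=$2
    logdir=$3
    shift 3
    mkdir -p "$logdir" || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
        stamp=$(date '+%Y-%m-%d %H:%M:%S')
        for dev in "$@"; do
            log="$logdir/$(basename "$dev").log"
            # Timestamp header first, then the raw stat output (and any errors)
            { echo "### $stamp"; "$FCMSUTIL" "$dev" stat; } >> "$log" 2>&1
        done
        i=$((i + 1))
        # Sleep only between snapshots, not after the last one
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Usage (placeholder values): every 5 minutes for 24 hours
#   collect_fcms_stats 300 288 /var/tmp/fcms_stats /dev/td0 /dev/td1
```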
