Operating System - HP-UX

Re: Help with async write and uncorrectable read errors

 
Tony Williams
Regular Advisor

Help with async write and uncorrectable read errors

Hi,

We had HP local support replace a 4Gb dual-port FC HBA (A379A) because of a bad port 1 in an rx7620 Integrity server running 11.23. I used olrad to perform the replacement. As soon as olrad brought the slot and driver back online after the replacement, we started getting async write errors on the disks attached to an EMC disk array:

Jul 30 16:26:08 idevl1 vmunix: SCSI: Async write error -- dev: b 31 0x7e7500, errno: 126, resid: 8192,
Jul 30 16:26:08 idevl1 vmunix: blkno: 297464, sectno: 594928, offset: 304603136, bcount: 8192.
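
The fields in that message can be cross-checked mechanically, which helps tie the error to a specific path and LUN. Here is a minimal sketch. It assumes the legacy HP-UX sdisk minor-number layout (8-bit controller instance, 4-bit target, 4-bit LUN, then 8 option bits), 1 KB units for blkno, and 512-byte units for sectno. The function name decode_async_write_error is made up for illustration.

```python
import re

# The two syslog lines from the post, joined, with timestamps dropped.
LOG = (
    "SCSI: Async write error -- dev: b 31 0x7e7500, errno: 126, resid: 8192, "
    "blkno: 297464, sectno: 594928, offset: 304603136, bcount: 8192."
)

def decode_async_write_error(text):
    """Extract the numeric fields and decode the device minor number.

    Assumption: legacy sdisk minor layout 0xIITLOO
    (II = controller instance, T = target, L = LUN, OO = option bits).
    """
    dev = re.search(r"dev: (\w) (\d+) (0x[0-9a-fA-F]+)", text)
    if dev is None:
        raise ValueError("no dev field found")
    minor = int(dev.group(3), 16)
    instance = (minor >> 16) & 0xFF
    target = (minor >> 12) & 0xF
    lun = (minor >> 8) & 0xF

    # Collect the "name: number" pairs (errno, resid, blkno, ...).
    result = {k: int(v) for k, v in re.findall(r"(\w+): (\d+)", text)}
    result["major"] = int(dev.group(2))
    result["device"] = "c%dt%dd%d" % (instance, target, lun)
    # Sanity check: all three position fields should name the same byte offset.
    result["consistent"] = (
        result["blkno"] * 1024 == result["offset"] == result["sectno"] * 512
    )
    return result

info = decode_async_write_error(LOG)
print(info["device"], info["consistent"])
```

Under those assumptions the minor number 0x7e7500 maps to controller instance 126, target 7, LUN 5, and the block, sector, and byte offsets all agree, so the message itself is internally consistent.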

We also got uncorrectable read errors:

Jul 30 16:26:33 idevl1 vmunix: WARNING: VxVM vxio V-5-0-3 Plex pdata1 block 2280263912:
Jul 30 16:26:33 idevl1 vmunix: Uncorrectable read error on Subdisk data1_idevl1_2606.102-01 block 14711296
Jul 30 16:26:33 idevl1 vmunix: WARNING: VxVM vxio V-5-0-2 Subdisk data1_idevl1_2606.102-01 block 14711336: Uncorrectable read error

Unfortunately I didn't see this until 30 minutes later. I then persistently disabled the disk and tape SAN ports, disabled the HBA ports, unmounted the filesystems, deported the VxVM (ver 4.1) Device Groups, and ran a forced fsck on the filesystems.

Right now the volumes in the DGs are started and the filesystems are mounted on the remaining path.

I have talked with HP's level one support, level 2 support, and 2 duty managers. HP's response has been to reboot the server, update the device driver, or replace the HBA.

I was hoping to get a more in-depth answer as to what the root cause of the errors is.

I would like to know with some certainty if the problem was:

Hardware - Faulty HBA
Hardware - Faulty installation
Hardware - Damaged FC cable during the installation of the HBA
Software - Veritas Volume Manager inability to handle an online HBA replacement
Software - Veritas Volume Manager Dynamic Multi-Pathing (DMP) inability to handle an online HBA replacement
Software - Fibre Channel device driver inability to handle an online HBA replacement
Software - olrad
Human error - Incorrect online card replacement procedures

Does anyone have experience with async write errors and uncorrectable read errors after an olrad HBA replacement?

Thanks


Steven E. Protter
Exalted Contributor

Re: Help with async write and uncorrectable read errors

Shalom,

A step was left out.

When you replace a Fibre Channel card, you need to get the new card's World Wide Name (WWN) and pass it to your storage team so that the LUNs presented to the old card are presented to the new one.

If the system is up, use fcmsutil to get the WWN information and relay it to your storage team.

Human error - Incorrect online card replacement procedures

I've done this with olrad. It does work.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Tony Williams
Regular Advisor

Re: Help with async write and uncorrectable read errors

Thanks Steven,

We use Port Zoning, so WWNs are not involved.
Tony Williams
Regular Advisor

Re: Help with async write and uncorrectable read errors

Steven, I think you just nudged a brain cell in the right direction.

We don't use WWN zoning, but you must still MASK the HBAs via WWN to the Fibre Adapters on the EMC disk array.

When the HBA changed, the WWN changed, and the array database should have been updated to reflect the change. It wasn't.
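
A check for this kind of mismatch can be sketched as a comparison between the port WWN the host reports and the WWNs masked on the array. This is only a sketch: it assumes fcmsutil prints a line of the form "N_Port Port World Wide Name = 0x...", and the sample text and WWN values below are invented placeholders, not output from this system or array.

```python
import re

# Illustrative placeholder text, NOT captured from the server in this thread.
SAMPLE_FCMSUTIL = """\
N_Port Node World Wide Name = 0x50060b0000aaaa01
N_Port Port World Wide Name = 0x50060b0000aaaa00
"""

def normalize_wwn(wwn):
    """Reduce '0x5006...', '50:06:...' or '5006...' to 16 lowercase hex digits."""
    s = wwn.strip().lower().replace(":", "")
    if s.startswith("0x"):
        s = s[2:]
    return s

def port_wwn(fcmsutil_text):
    """Return the port WWN from fcmsutil-style text (assumed line format)."""
    m = re.search(
        r"N_Port Port World Wide Name\s*=\s*(0x[0-9a-fA-F]{16})", fcmsutil_text
    )
    if m is None:
        raise ValueError("port WWN line not found")
    return normalize_wwn(m.group(1))

def is_masked(host_wwn, masked_wwns):
    """True if the host's port WWN appears in the array's masking list."""
    return normalize_wwn(host_wwn) in {normalize_wwn(w) for w in masked_wwns}

# Hypothetical masking list still holding only the OLD card's WWN.
array_masking = ["50:06:0b:00:00:bb:bb:00"]
print(is_masked(port_wwn(SAMPLE_FCMSUTIL), array_masking))
```

Running a comparison like this for each HBA port right after an online replacement, before re-enabling the paths, would flag a stale masking entry before any I/O is sent down the new card.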