Help with async write and uncorrectable read errors

 
Tony Williams
Regular Advisor

Hi,

We had HP local support replace a 4Gb dual-port FC HBA (A379A) because of a bad port 1 in an rx7620 Integrity server running 11.23. I used olrad to perform the replacement. As soon as olrad brought the slot and driver back online, we started getting async write errors on the disks attached to an EMC disk array:

Jul 30 16:26:08 idevl1 vmunix: SCSI: Async write error -- dev: b 31 0x7e7500, errno: 126, resid: 8192,
Jul 30 16:26:08 idevl1 vmunix: blkno: 297464, sectno: 594928, offset: 304603136, bcount: 8192.
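
For anyone trying to map a message like this back to a disk, here is a minimal Python sketch that pulls the fields out of the log text and decodes the device number. It assumes the legacy HP-UX sdisk minor-number layout 0xIITL00 (II = card instance, T = target, L = LUN); that layout is an assumption on my part and does not come from the log itself. The function name `parse_async_error` is just an illustrative helper.

```python
import re

# The two vmunix lines from the post, joined into one message.
LOG = (
    "SCSI: Async write error -- dev: b 31 0x7e7500, errno: 126, resid: 8192, "
    "blkno: 297464, sectno: 594928, offset: 304603136, bcount: 8192."
)


def parse_async_error(text):
    """Extract the numeric fields and decode the device from an async write error."""
    dev = re.search(r"dev: (\w) (\d+) (0x[0-9a-fA-F]+)", text)
    if dev is None:
        raise ValueError("no 'dev:' field found")

    # Every 'name: decimal' pair (errno, resid, blkno, sectno, offset, bcount).
    info = {name: int(value) for name, value in re.findall(r"(\w+): (\d+)", text)}

    minor = int(dev.group(3), 16)
    info["dev_type"] = dev.group(1)      # 'b' = block device
    info["major"] = int(dev.group(2))

    # ASSUMPTION: legacy sdisk minor layout 0xIITL00.
    instance = (minor >> 16) & 0xFF      # card instance
    target = (minor >> 12) & 0xF         # SCSI target
    lun = (minor >> 8) & 0xF             # LUN
    info["device"] = "c%dt%dd%d" % (instance, target, lun)
    return info


info = parse_async_error(LOG)
print(info["device"])                    # c126t7d5 under the assumed layout
print(info["blkno"] * 1024 == info["offset"])   # blkno is in 1 KB units
print(info["sectno"] * 512 == info["offset"])   # sectno is in 512-byte sectors
```

The last two checks show that the three position fields in the message agree with each other: block 297464 in 1 KB units and sector 594928 in 512-byte units both land at byte offset 304603136.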

We also got uncorrectable read errors:

Jul 30 16:26:33 idevl1 vmunix: WARNING: VxVM vxio V-5-0-3 Plex pdata1 block 2280263912:
Jul 30 16:26:33 idevl1 vmunix: Uncorrectable read error on Subdisk data1_idevl1_2606.102-01 block 14711296
Jul 30 16:26:33 idevl1 vmunix: WARNING: VxVM vxio V-5-0-2 Subdisk data1_idevl1_2606.102-01 block 14711336: Uncorrectable read error

Unfortunately I didn't see this until 30 minutes later. I persistently disabled the disk and tape SAN ports, disabled the HBA ports, unmounted the filesystems, deported the VxVM (ver 4.1) Device Groups, and ran a forced fsck on the filesystems.

Right now the volumes in the DGs are started and the filesystems are mounted on the remaining path.

I have talked with HP's level 1 support, level 2 support, and two duty managers. HP's response has been to reboot the server, update the device driver, or replace the HBA.

I was hoping to get a more in-depth answer as to what the "root" cause of the errors is.

I would like to know with some certainty if the problem was:

Hardware – Faulty HBA
Hardware – Faulty installation
Hardware – Damaged FC cable during the installation of the HBA
Software – Veritas Volume Manager inability to handle an online HBA replacement
Software – Veritas Volume Manager Dynamic Multi-Pathing (DMP) inability to handle an online HBA replacement
Software – Fiber Channel device driver inability to handle an online HBA replacement
Software – olrad
Human error – Incorrect online card replacement procedures

Does anyone have experience with async write errors and uncorrectable read errors after an olrad HBA replacement?

Thanks


3 REPLIES
Steven E. Protter
Exalted Contributor

Re: Help with async write and uncorrectable read errors

Shalom,

A step was left out.

When you replace a fiber channel card, you need to get the new card's World Wide Name (WWN) and give it to your storage team so that the LUNs presented to the old card are presented to the new one.

If the system is up, use fcmsutil to get the WWN information and relay it to your storage team.

Human error – Incorrect online card replacement procedures

I've done this with olrad. It does work.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Tony Williams
Regular Advisor

Re: Help with async write and uncorrectable read errors

Thanks Steven,

We use Port Zoning, so WWNs are not involved.
Tony Williams
Regular Advisor

Re: Help with async write and uncorrectable read errors

Steven, I think you just nudged a brain cell in the right direction.

We don't use WWN zoning, but the HBAs still have to be masked by WWN to the Fiber Adapters on the EMC disk array.

When the HBA changed, the WWN changed. The array database should have been updated to reflect the change, and it wasn't.
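
To make the masking point concrete, here is a toy Python model of an array port that grants LUN access per initiator WWN. It is only a sketch of the idea and not how the EMC array actually stores its masking database; the class `ArrayPort`, its methods, the LUN numbers, and both WWN values are hypothetical.

```python
class ArrayPort:
    """Toy model of an array front-end port that masks LUNs by initiator WWN."""

    def __init__(self):
        # initiator WWN (lower case) -> set of LUN ids it may access
        self.masking = {}

    def grant(self, wwn, luns):
        """Allow the initiator with this WWN to see the given LUNs."""
        self.masking.setdefault(wwn.lower(), set()).update(luns)

    def can_access(self, wwn, lun):
        """Return True only if this WWN has been masked to the LUN."""
        return lun in self.masking.get(wwn.lower(), set())

    def replace_initiator(self, old_wwn, new_wwn):
        """Move all LUN grants from the old HBA's WWN to the new one."""
        luns = self.masking.pop(old_wwn.lower(), set())
        self.grant(new_wwn, luns)


# Hypothetical WWNs for the old and the replacement HBA port.
OLD_WWN = "50:06:0b:00:00:00:00:01"
NEW_WWN = "50:06:0b:00:00:00:00:02"

port = ArrayPort()
port.grant(OLD_WWN, {0, 1, 2})

# After the card swap the host logs in with NEW_WWN. Port zoning still lets it
# reach the array port, but the masking table has no entry for it yet.
before_update = port.can_access(NEW_WWN, 1)

# The missing step: update the masking entries for the new WWN.
port.replace_initiator(OLD_WWN, NEW_WWN)
after_update = port.can_access(NEW_WWN, 1)

print(before_update, after_update)
```

Port zoning decides which switch ports can talk to each other, so it keeps working after the swap. Masking is keyed on the WWN, so it has to be updated whenever the HBA is replaced.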