HP 11i, 2 A6795A HBA's, pvlinks failures (system hang)

 
William Shaw
Frequent Advisor

I am running HP-UX 11i with the Dec 2002 patches installed. The server is an L1000 with two A6795A HBAs installed. I am connected to an HDS disk array via a fabric switch, and have alternate paths to the drives with PVlinks (Mirror-UX). I experience occasional system hangs when testing path failure (simple cable pulls). Most of the time the failover to the alternate path and the failback to the primary path work fine, but 5% to 10% of the time the system hangs. CDE continues to function, more or less (I can switch between windows and sessions), but almost all commands just hang (ps, ls, mount, shutdown). The system answers pings and gives a login prompt over telnet, but after I enter a login name it never prompts for a password. The only "solution" to this condition is to power cycle the system. I read a similar problem description here:

http://forums.itrc.hp.com/cm/QuestionAnswer/1,,0x981736e69499d611abdb0090277a778c,00.html

But I believe I have the recommended patches applied already.

Any ideas? Thanks!

Bill
11 REPLIES
Ian Kidd_1
Trusted Contributor

Re: HP 11i, 2 A6795A HBA's, pvlinks failures (system hang)

I don't have a solution - all I wanted to say is that I've experienced the same issue with some of my servers and an EMC array. EMC recommended installing PowerPath to overcome this problem, but my company was unwilling to pay for it. I don't know if HDS has a similar product, though.

If at first you don't succeed, go to the ITRC
John Poff
Honored Contributor

Re: HP 11i, 2 A6795A HBA's, pvlinks failures (system hang)

Hi,

We just came off of an EMC array this past weekend and moved onto an XP1024. We used PowerPath with EMC and the corresponding product for HP is AutoPath. That might help your situation. AutoPath will balance the load down all the available paths and will handle the loss of a path.

My only other suggestion would be that if the problem only happens when you pull the cable, well, just stop pulling the cable! :)

JP
William Shaw
Frequent Advisor

Re: HP 11i, 2 A6795A HBA's, pvlinks failures (system hang)

I believe that AutoPath is simply another name for PVLinks, part of HP's Mirror-UX software. So unless I am mistaken on that, I'm already running AutoPath.

Also, FYI, I work in a test lab, so pulling cables is part of my job. ;-)
John Poff
Honored Contributor

Re: HP 11i, 2 A6795A HBA's, pvlinks failures (system hang)

Ok. I understand now. I work in a production environment so I would get yelled at for pulling cables. If I got caught, that is. :)

Auto Path is a separate product from the MirrorUX software and PV links. Auto Path will use all of the paths you have available to a set of disks and will balance the reads and writes along those paths. Here is a link to the Auto Path product:

http://www.hp.com/products1/storage/products/disk_arrays/xpstoragesw/autopathxp/

JP


John Poff
Honored Contributor

Re: HP 11i, 2 A6795A HBA's, pvlinks failures (system hang)

I just thought of something else. While we were switching things around this past weekend, we swapped out some of the old A5158A Fibre Channel cards for the newer A6795A cards. Our guys brought up a system with a couple of the A6795A cards that didn't have anything attached to them. About an hour later that system crashed. We had HP on site already for the work we were doing, and they dug into the problem. They found out that the A6795A cards need to be terminated if nothing is plugged into them. Perhaps you are running into the same problem where your A6795A cards are having problems when the cable is unhooked for a certain length of time?

JP
Sridhar Bhaskarla
Honored Contributor

Re: HP 11i, 2 A6795A HBA's, pvlinks failures (system hang)

Hi Bill,

By default the PV timeout is set to 30 seconds, so the system will only recognize the failure after at least 30 seconds. I would expect to see this behaviour if you are using raw logical volumes, since the application has to take care of handling the data on them. Such applications can misbehave and cause the system to hang.

BTW, how long did you have to wait before recycling the system?
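For reference, here is a minimal sketch of how the PV timeout could be inspected and changed with `pvdisplay` and `pvchange -t`. The function name and the dry-run wrapper are only illustrative, and the device path and timeout value are whatever the caller supplies. By default the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch only: inspect and change the LVM PV timeout for one disk path.
# Commands are printed unless DRY_RUN=0 is set (run for real only on HP-UX).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: set_pv_timeout PV_PATH SECONDS   (hypothetical helper name)
set_pv_timeout() {
    pv="$1"
    secs="$2"
    run pvdisplay "$pv"            # show the PV's current settings first
    run pvchange -t "$secs" "$pv"  # set the I/O timeout, in seconds
}
```

For example, `set_pv_timeout /dev/dsk/c4t0d0 60` (a made-up path and value) prints the two commands it would run.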

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
William Shaw
Frequent Advisor

Re: HP 11i, 2 A6795A HBA's, pvlinks failures (system hang)

Hi people, thanks for the replies. I would hate to think that simply pulling a cable would make the system crash (or hang), but I guess that is possible. The thing is, this only happens on 11i with the 6795's. 11.00 is fine with the 6795's: we can pull cables, put them back, whatever. Alternate paths take over when necessary, and the primary path is restored when the cable is returned. 11i is also fine with the 5158 cards. It's just the combination of 11i and 6795's that gives us this problem.

Thanks again
Bill
Keith De Gray
New Member

Re: HP 11i, 2 A6795A HBA's, pvlinks failures (system hang)

Bill,
I would look at the driver and firmware for the 6795 cards, as pulling the cables should not cause the problem. I would also need more information about the HDS array: is it a 9900 series or a 9500 series? The 9900 is an active/active system and the 9500 is active/passive. An active/passive array can cause a lot of problems unless you are using the respective software, which for HDS would be the HDLM driver.
Thayanidhi
Honored Contributor
Solution

Re: HP 11i, 2 A6795A HBA's, pvlinks failures (system hang)

Hi,

I think the best way to test PV links is with commands. I do it this way instead of pulling cables.

When PV links are configured, LVM changes the PV paths automatically to optimise performance. For testing purposes, disable that first. Then change the path to the alternate links and test, and finally re-enable auto switching.

Step 1: pvchange -S n /dev/dsk/.....
Repeat for all currently used PV paths.
Step 2: pvchange -s /dev/dsk/.....
Test all alternate paths.
Run vgdisplay -v between steps to confirm which path is "current".
Once testing is complete:
Step 3: pvchange -S y /dev/dsk/....
Repeat for all PV paths.

Note: If you try to switch the path (-s) without first disabling auto switching (-S n), the system may switch back automatically because of the auto switching feature.

Also note that even with auto switching disabled, the PV path will still switch when the current path is not accessible.

Refer to man page of pvchange for more info.
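The steps above can be sketched as a small script. This is only an illustration for a single primary/alternate pair: the function name and the dry-run wrapper are made up for the example, and the volume group and device paths are whatever the caller passes in. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: the three-step PV-link test for one primary/alternate pair.
# Commands are printed unless DRY_RUN=0 is set (run for real only on HP-UX).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: pvlink_test VG PRIMARY_PATH ALTERNATE_PATH   (hypothetical helper name)
pvlink_test() {
    vg="$1"
    primary="$2"
    alternate="$3"

    # Step 1: disable auto switching so the path is not switched back.
    run pvchange -S n "$primary"
    run vgdisplay -v "$vg"

    # Step 2: switch I/O to the alternate link, then check which is current.
    run pvchange -s "$alternate"
    run vgdisplay -v "$vg"

    # Step 3: re-enable auto switching once testing is complete.
    run pvchange -S y "$primary"
    run vgdisplay -v "$vg"
}
```

For example, `pvlink_test /dev/vg01 /dev/dsk/c4t0d0 /dev/dsk/c6t0d0` (made-up names) prints the six commands in order. With more than one disk, call it once per PV.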

I am not sure why 11.00 and 11i behave differently for you!

Regards

TT
Attitude (not aptitude) determines altitude.