
Problems with LVM running under 11.0 <-> EMC Clarrion

Dr. Peer-Joachim Koch
Frequent Advisor

Hi,

We have been running HP-UX 11.0 on our file server for a couple of years. EMC wanted to update the FLARE code of our FC4700 CLARiiON to the newest release.
After this update we ran into serious trouble.
We use PV links to attach the LUNs of the RAIDs. If I move an LV from one LUN to another, the whole system starts to trespass heavily between both paths. One pvmove even caused a storage processor to reboot!
All parameters have been adjusted as EMC requested, but the behavior is still the same. EMC is working on the problem but does not seem to know how to stop the excessive trespassing.
However, I cannot work on the file server without running into serious trouble (one file system has already been corrupted!).

So does anybody have experience in this field?

Maybe someone has had the same trouble and knows a workaround...

Hardware: L-Class, 2 A5158 adapters, 2 Brocade 2800, 2 FC4700, ~10 TB disk space, 16 LUNs (8 on each CLARiiON). Software: HP-UX 11.0, LVM, OnlineJFS.

Thanks, Peer
peko
6 REPLIES
Bernhard Mueller
Honored Contributor

Re: Problems with LVM running under 11.0 <-> EMC Clarrion

Peer,

Up front: I'm not familiar with the issue you have. However, there are two things you could check:

1. Did you implement single-initiator zoning on the Brocades? If not, this would be the first thing I'd suggest.

2. pvmove is similar to lvextend -m 1 followed by lvreduce -m 0. So you might try this two-step approach instead of pvmove and see if you get the same problem when simply adding or removing an LVM mirror.
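
The two-step approach from point 2 could look roughly like the sketch below. It is only an illustration: the LV and device file names are hypothetical placeholders, it assumes MirrorDisk/UX is installed (lvextend -m needs it), and by default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: LV and PV names are hypothetical placeholders, not taken
# from the system discussed in this thread.
# Assumption: MirrorDisk/UX is installed, since lvextend -m requires it.
LV=${LV:-/dev/vg01/lvol1}          # logical volume to relocate
SRC_PV=${SRC_PV:-/dev/dsk/c4t0d0}  # LUN the data currently lives on
DST_PV=${DST_PV:-/dev/dsk/c6t0d1}  # LUN the data should move to
DRY_RUN=${DRY_RUN:-1}              # 1 = only print the commands

# Print a command and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

two_step_move() {
    # Step 1: add one mirror copy on the destination LUN.
    run lvextend -m 1 "$LV" "$DST_PV" || return 1
    # Look at the extent status (no stale extents) before dropping a copy.
    run lvdisplay -v "$LV" || return 1
    # Step 2: remove the mirror copy held on the source LUN.
    run lvreduce -m 0 "$LV" "$SRC_PV" || return 1
}

two_step_move
```

Splitting the move this way lets you watch for trespasses after each step separately, so you can tell whether adding the mirror or removing it triggers the problem. Set DRY_RUN=0 only once the printed commands match your real volume and device names.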

Regards,
Bernhard
Dr. Peer-Joachim Koch
Frequent Advisor

Re: Problems with LVM running under 11.0 <-> EMC Clarrion

I'm not sure what you mean, but I guess you mean the zoning of the SPs and the HBAs, right?
Every HBA is in its own zone and sees only one SP. As mentioned before, it worked for nearly four years WITHOUT a single fault...

At the moment I will not test any further. I'm waiting for a response from EMC. The risk is much too high: I would be experimenting on our MAIN file server...

Thanks, Peer
peko
Bernhard Mueller
Honored Contributor

Re: Problems with LVM running under 11.0 <-> EMC Clarrion

Yes, that is what I meant: every HBA in its own zone. So it's up to EMC.

Regards,
Bernhard
Martha Mueller
Super Advisor

Re: Problems with LVM running under 11.0 <-> EMC Clarrion

We had the exact same problem, including the corruption. It turned out that our FC4700 was simply overworked and couldn't handle the load. Whenever the 1 GB of cache was overrun and data had to come from disk, we would start to have problems. EMC had to give us a Symmetrix because they couldn't fix the problem.
Tom Geudens
Honored Contributor

Re: Problems with LVM running under 11.0 <-> EMC Clarrion

Greetings,
I have some knowledge in this area, as we used both the FC4500 and the FC4700. However, we have since upgraded to the CX600 (due to the somewhat inherent instability of the FCs), so I don't know whether my knowledge is up to date.

Things to check:
- Your agent.config file on the host. Does it contain the "OptionsSupported AutoTrespass" option?
- Are you using PowerPath in combination with Navisphere Agent? If so, things are pretty tricky. There's a manual specifically for that combination.
- How is your failovermode (navicli command) set?
These three things interact with each other, and the settings for each may cause the results you are describing. However, if you are having corruption you should have EMC on-site, and they have probably already checked these things...
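
For the first item, a quick check could look like the sketch below. The default file location and the assumption that lines starting with # are comments are mine, so pass the real path of your agent.config as the argument.

```shell
#!/bin/sh
# Sketch: report whether an agent config file enables AutoTrespass.
# Assumptions: the default path below is a guess, and lines that start
# with '#' are treated as comments.
check_autotrespass() {
    cfg=${1:-/etc/Navisphere/agent.config}
    if [ ! -r "$cfg" ]; then
        echo "cannot read $cfg"
        return 2
    fi
    # Drop comment lines, then look for the option named in the list above.
    if grep -v '^[[:space:]]*#' "$cfg" | grep -q 'OptionsSupported.*AutoTrespass'; then
        echo "AutoTrespass is set in $cfg"
    else
        echo "AutoTrespass is not set in $cfg"
        return 1
    fi
}
```

Call it as check_autotrespass /path/to/agent.config. It only tells you what the file says; whether that setting is the right one depends on the failovermode and on whether PowerPath is in use.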

Oh yes, is the physical volume you are "pvmoving" visible on the same SP as the physical volume you are moving to?

Hope this at least gives you a pointer to "where to start looking" ...

Regards,
Tom

P.S. EMC should have checked this before doing the upgrade, but is the combination of FC4700 / microcode / A5158A adapter still supported in their matrix?
A life ? Cool ! Where can I download one of those from ?
Dr. Peer-Joachim Koch
Frequent Advisor

Re: Problems with LVM running under 11.0 <-> EMC Clarrion

Hi,

We have now "solved" the problem by switching from PV links to PowerPath. Everything was reconfigured: the HBA driver, the Navi client and the PDC were updated, and nearly everything is running smoothly now (one machine shows a load of 1 although the system is 100% idle, and I cannot find the reason; there is no pending I/O or anything like that).

However, during the reconfiguration one naviagent used old settings and switched the settings to HP-Trespass. PowerPath was no longer able to see the device. We found the problem and fixed it, and the device was accessible again, but the file systems on the LV were both corrupt (fsck -o full ...).
We again lost > 100 GB. The file system had not been reachable: no I/O, nothing.

So why are the file systems broken?

A few years ago we had to work a lot with a different FC RAID, and we often used the following trick to fix problems on the RAID: we simply disconnected the FC link, all I/Os were queued, and after 1 or 2 hours we re-established the link and everything worked fine.

Any ideas?

EMC fault or HP fault?

Bye, Peer
peko