Charles McCary
Valued Contributor

eva / switch / lvol / filesystem question

Group, need some understanding.

Both of my fibre switches rebooted Sunday morning (a firmware issue that I'm addressing). This appears to have left a window during which neither path to the array was available to the system. However, I saw no consequences from a database or application standpoint, i.e. both kept working as if nothing had happened. Does this depend on the PV timeout value? If so, mine is set to the default, which I believe is 30 seconds. You can see from the log below that, technically, the LUN was dead for just under two minutes. I'm trying to figure out why I didn't have any filesystem problems, and therefore no db or app problems.

Jul 8 01:10:04 sys1 vmunix: CPQswsp: Path c5t0d1 Failed (LUN 600508B4001023BC0000900000600000 Controller P66C5E2AAQT016 Array 50001FE1500463F0 HBA td0)

Jul 8 01:10:05 sys1 vmunix: CPQswsp: Availability for LUN 600508B4001023BC0000900000600000 changed to Reduced


Jul 8 01:12:06 sys1 vmunix: CPQswsp: Path c15t0d1 Failed (LUN 600508B4001023BC0000900000600000 Controller P66C5E2AAQT016 Array 50001FE1500463F0 HBA td1)

Jul 8 01:12:06 sys1 vmunix: WARNING: CPQswsp: All paths for Target/LUN 0/1 (WWID=600508B4001023BC0000900000600000) on Controller P66C5E2AAQT016 failed

Jul 8 01:12:07 sys1 vmunix: CPQswsp: Availability for LUN 600508B4001023BC0000900000600000 changed to Critical

Jul 8 01:12:13 sys1 vmunix: WARNING: CPQswsp: Availability for LUN 600508B4001023BC0000900000600000 changed to Dead

Jul 8 01:14:05 sys1 vmunix: CPQswsp: Availability for LUN 600508B4001023BC0000900000600000 changed to Reduced

Jul 8 01:16:10 sys1 vmunix: CPQswsp: Availability for LUN 600508B4001023BC0000900000600000 changed to Alive
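
The intervals in the log above can be worked out mechanically rather than by eye. The following is a minimal Python sketch, not anything Secure Path provides: it pulls the "Availability ... changed to" lines out of the syslog excerpt and adds up how long the LUN spent in each state. The function names are made up for illustration, the year is an arbitrary placeholder (syslog lines carry none), and it assumes a single LUN and English month abbreviations.

```python
import re
from datetime import datetime

# Availability lines copied from the syslog excerpt above (one LUN only).
LOG = """\
Jul 8 01:10:05 sys1 vmunix: CPQswsp: Availability for LUN 600508B4001023BC0000900000600000 changed to Reduced
Jul 8 01:12:07 sys1 vmunix: CPQswsp: Availability for LUN 600508B4001023BC0000900000600000 changed to Critical
Jul 8 01:12:13 sys1 vmunix: WARNING: CPQswsp: Availability for LUN 600508B4001023BC0000900000600000 changed to Dead
Jul 8 01:14:05 sys1 vmunix: CPQswsp: Availability for LUN 600508B4001023BC0000900000600000 changed to Reduced
Jul 8 01:16:10 sys1 vmunix: CPQswsp: Availability for LUN 600508B4001023BC0000900000600000 changed to Alive
"""

PATTERN = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}).*Availability for LUN (\w+) changed to (\w+)"
)


def availability_events(text):
    """Return (timestamp, lun, state) tuples in log order."""
    events = []
    for line in text.splitlines():
        match = PATTERN.match(line)
        if match:
            # Assumption: the year is a placeholder, since syslog omits it.
            stamp = datetime.strptime("2000 " + match.group(1), "%Y %b %d %H:%M:%S")
            events.append((stamp, match.group(2), match.group(3)))
    return events


def seconds_in_state(events, state):
    """Sum the time between each entry into `state` and the next state change."""
    total = 0.0
    for (start, _, current), (end, _, _) in zip(events, events[1:]):
        if current == state:
            total += (end - start).total_seconds()
    return total


if __name__ == "__main__":
    events = availability_events(LOG)
    for state in ("Reduced", "Critical", "Dead"):
        print(state, seconds_in_state(events, state))
```

For this excerpt the Dead interval comes out to 112 seconds (01:12:13 to 01:14:05), well beyond a 30-second timeout, which is what makes the lack of I/O errors worth asking about.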

I appreciate any thoughts or comments.
Torsten.
Acclaimed Contributor

Re: eva / switch / lvol / filesystem question

If a path fails, the system waits for the timeout (30 to 90 seconds is typical) and then switches to the next alternate path.
But your output looks like Secure Path is in use. The behaviour is more or less the same there.

Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

__________________________________________________
No support by private messages. Please ask the forum!

Charles McCary
Valued Contributor

Re: eva / switch / lvol / filesystem question

Secure Path is in use.

I'm just wondering why the timeout didn't kick in during the period above between the LUN being marked as Dead (01:12:13) and its being changed back to Reduced (01:14:05), which is 112 seconds.

What should have occurred here? I would assume that, at the very least, the database would have written something to the alert log, but that didn't happen.

Don Mallory
Trusted Contributor

Re: eva / switch / lvol / filesystem question

I can't point to why, but I can share my experiences. Most of my array failures have been due to hardware on VA7410s, so I've seen a lot of them.

By and large, in my experience, when a device shows as NO_HW (ioscan -fnC disk) during a failure, an HP-UX box simply doesn't complete the read or write operations that are pending. It waits.

I've had periods of up to 12 hours where Oracle Financials on Oracle 10g2 sat with all associated LUNs stuck as NO_HW, and when the storage came back, the database continued on as if nothing had happened. It's when you reboot in the middle that there's a serious problem, because what's stuck in the cache didn't get written out to the logs.

Now, this isn't always going to be true, but if the data is stable on the disk, it's been okay from what I've seen.

A call to backline support might answer the WHY better.
Charles McCary
Valued Contributor

Re: eva / switch / lvol / filesystem question

Thanks for sharing the info. This explains what could have occurred (I feel marginally better anyway).

I of course opened a ticket when all of this occurred. I originally thought we had an issue with a controller on the EVA. I put the question to the folks on the storage team and they didn't have a good answer (redundant paths was their best effort).

Anyway, thanks. I'll leave this open a while longer in case anyone else wants to comment.

Sudalaimani
Frequent Advisor

Re: eva / switch / lvol / filesystem question

Hi,

What load balance policy (set_lbpolicy) is set for Secure Path on this box? Is it SQL (Shortest Queue Length)?

Sometimes this also helps to narrow down the issue.

# autopath display all   -> to check the load balance policy


Just want to share my 2 cents.

regards

Mani

A Long Journey Starts with Single Foot Step
Charles McCary
Valued Contributor

Re: eva / switch / lvol / filesystem question

spmgr shows the following (I've included only one LUN due to space):

Command: spmgr display
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Storage:
Load Balance: On    Auto-restore: On    Balance Policy: Round Robin
Path Verify: On     Verify Interval: 30
HBAs: td0 td1
Controller: P016, Operational
            P02B, Operational
Devices: c13t0d0 c13t0d1 c13t0d2 c13t0d3 c13t0d4 c13t0d5 c13t0d6
         c13t0d7 c13t1d0 c13t1d1 c13t1d2 c13t1d3 c13t1d4 c13t1d5
         c13t1d6 c13t1d7 c13t2d0 c13t2d1 c13t2d2 c13t2d3 c13t2d4
         c13t2d5 c13t2d6 c13t2d7 c13t3d0 c13t3d1 c13t3d2 c13t3d3
         c13t3d4 c13t3d5 c13t3d6 c13t3d7 c13t4d0 c13t4d1 c13t4d2
         c13t4d3 c13t4d4 c13t4d5

TGT/LUN   Device    WWLUN_ID                                  H/W_Path        #_Paths
0/ 0      c13t0d0   6005-08B4-0010-23BC-0000-9000-0060-0000                   4
                                                              255/255/0/0.0

Controller   Path_Instance   HBA   Preferred?   Path_Status
P016                               no
             c5t0d1          td0   YES          Active
             c15t0d1         td1   YES          Active

Controller   Path_Instance   HBA   Preferred?   Path_Status
P02B                               no
             c7t0d1          td0   no           Standby
             c17t0d1         td1   no           Standby
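
For anyone wanting to summarise a path table like the one above across many LUNs, here is a rough Python sketch that groups the path lines by controller. It is not an spmgr feature. It assumes the whitespace-separated layout exactly as pasted in this post (real spmgr output may differ between Secure Path versions), and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Path table copied from the spmgr output above.
SAMPLE = """\
Controller Path_Instance HBA Preferred? Path_Status
P016 no
c5t0d1 td0 YES Active
c15t0d1 td1 YES Active

Controller Path_Instance HBA Preferred? Path_Status
P02B no
c7t0d1 td0 no Standby
c17t0d1 td1 no Standby
"""

# Device file names such as c5t0d1 (controller/target/device).
DEVICE = re.compile(r"^c\d+t\d+d\d+$")


def paths_by_controller(text):
    """Map each controller name to a list of (device, hba, status) tuples."""
    result = defaultdict(list)
    controller = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] in ("YES", "no"):
            # Assumption: a two-field line is "<controller> <preferred?>".
            controller = fields[0]
        elif len(fields) == 4 and DEVICE.match(fields[0]) and controller:
            device, hba, _preferred, status = fields
            result[controller].append((device, hba, status))
    return dict(result)


def hbas_with_status(paths, status):
    """Return the sorted set of HBAs that have at least one path in `status`."""
    return sorted(
        {hba for entries in paths.values() for _, hba, s in entries if s == status}
    )


if __name__ == "__main__":
    paths = paths_by_controller(SAMPLE)
    for name, entries in paths.items():
        print(name, entries)
    print("HBAs with Active paths:", hbas_with_status(paths, "Active"))
```

On the pasted data, both Active paths (c5t0d1 via td0 and c15t0d1 via td1) go to controller P016. These are the same two path instances that the syslog excerpt at the top of the thread reports as failed.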