Disk Arrays

Nike 20 problem

Warren_9
Honored Contributor

I have a Nike 20 configured with 3 LUNs:

LUN0=A0,B0,C0,D0 (Raid 10)
LUN1=A1,B1,C1,D1 (Raid 10)
LUN2=E0,E1 (Mirror)

A SCSI error was found in the syslog, reporting a powerfail.

The SPA console showed that the status of LUN2 was toggling between ENA and RDY every 2-3 seconds. No errors were found on LUN0 and LUN1.

The HP engineer has already replaced the SPA, the midplane, and the E0 and E1 hard disks, but the problem still exists.

Any ideas?
8 REPLIES
RAC_1
Honored Contributor

Re: Nike 20 problem

These errors are logged when an I/O operation does not complete within the PV timeout value; LVM then starts its powerfail handling.
I had the same problem on a VA7400 for some LUNs.

Check whether you have these patches:

PHCO_25870
PHKL_26743
PHKL_27153
PHKL_27751
PHKL_28096
PHSS_26799

I also ran pvchange -t 120 on the disks (to increase the timeout value).
This can be done online.

Also check the exerciser option in STM. Are you getting any visual indication on
the Nike 20?
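As a rough illustration of the timeout logic described above, here is a minimal Python sketch. It assumes that LVM starts its powerfail handling whenever a single I/O to a PV takes longer than the PV timeout; the function name and the latency values are hypothetical, so this models the reasoning, not HP-UX internals.

```python
def powerfail_events(latencies_s, pv_timeout_s):
    """Return indices of I/Os whose completion time exceeds the PV timeout.

    Hypothetical model: every I/O that does not complete within
    pv_timeout_s seconds would trigger LVM powerfail handling.
    """
    return [i for i, t in enumerate(latencies_s) if t > pv_timeout_s]


# Hypothetical completion times (seconds) for I/Os to a heavily loaded LUN.
latencies = [0.02, 0.5, 45.0, 0.03, 75.0, 130.0]

# A shorter (hypothetical) timeout versus the value set with pvchange -t 120.
print(powerfail_events(latencies, 30))   # [2, 4, 5]
print(powerfail_events(latencies, 120))  # [5]
```

With these made-up numbers, raising the timeout from 30 to 120 seconds hides two of the three slow I/Os. A longer timeout only masks slow I/O, though; it does not explain why the I/O is slow in the first place.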

Hope this helps.
There is no substitute to HARDWORK
Warren_9
Honored Contributor

Re: Nike 20 problem

The primary paths of LUN0 and LUN2 go through SPA, and the primary path of LUN1 goes through SPB.

Changing the timeout value may eliminate the powerfail problem.

But why is the status of LUN2 toggling?
Eugeny Brychkov
Honored Contributor

Re: Nike 20 problem

If Nike's presentation utility shows a LUN with status 'ENA', this SP owns the LUN and all current transactions are going through it. If it shows 'RDY', the other controller owns the LUN and all transactions are going through that controller.
If you see the LUN jumping between ENA and RDY, this may mean that the host is trying to access the LUN through both controllers. Why? Because one of the paths may not be responding to the host correctly. It could be an OS patch problem, as Anil stated, but it could also be a connectivity problem.
Please check which path LUN2 uses as its primary path, and which paths the other LUNs use. If LUN2's primary path goes through SPA and there are problems with this path (bad cable, bad SCSI controller, etc.), the host will switch to the alternate path when it sees these problems. If the SCSI controller (its driver) then, for some reason, tells the host that the primary path is alive and available again, the host will switch back to it, and again, and again.
So my idea is:
- check the primary path for LUN2, and if it goes through SPA, change it to go through SPB;
- watch whether the LUN state still switches between ENA and RDY.
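The failover loop Eugeny describes can be sketched as a small Python model. It assumes, as stated above, that the host uses the primary path whenever the driver reports it available and fails over otherwise, and that the SP on the path in use shows the LUN as ENA. The function name and the sequence of driver reports are hypothetical.

```python
def simulate_path_flapping(primary_ok, primary="SPA", alternate="SPB"):
    """Return the SP serving the LUN at each step and the number of switches.

    Hypothetical model: the host uses the primary path whenever its driver
    reports that path as available and fails over to the alternate path
    otherwise; the SP on the path in use shows the LUN as ENA.
    """
    serving = [primary if ok else alternate for ok in primary_ok]
    # Count every change of serving SP, i.e. every ENA/RDY toggle.
    switches = sum(1 for prev, cur in zip(serving, serving[1:]) if prev != cur)
    return serving, switches


# Hypothetical driver reports for a marginal primary path (e.g. a bad cable).
reports = [True, False, True, False, True, True]
serving, switches = simulate_path_flapping(reports)
print(serving)   # ['SPA', 'SPB', 'SPA', 'SPB', 'SPA', 'SPA']
print(switches)  # 4

# Eugeny's suggestion: make the path through SPB primary. If that path is
# healthy, the marginal SPA path is only used on failover.
_, switches = simulate_path_flapping([True] * 6, primary="SPB", alternate="SPA")
print(switches)  # 0
```

In this model, every bad report on the primary path costs two ownership changes (fail over, then fail back), which is what makes the state appear to flash.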
Eugeny
Bill McNAMARA_1
Honored Contributor

Re: Nike 20 problem

LUN2=E0,E1 (Mirror)
is pretty much a stupid idea. It is slow and
it is not HA (both disks are on internal bus E).
Use a hot spare (HS) and an individual (IND) disk instead (this requires FE mode).

Use mstm or xstm to select the SP, run the expert tool and get the array information.
Attach it here.
If I were you, I'd pull the owning SP and reseat it. You do know the owning SP is NOT necessarily the ENA SP, right?

I think in any case this has been fixed in a firmware upgrade; it was known as the Christmas Tree Effect. Quite timely!


It works for me (tm)
Warren_9
Honored Contributor

Re: Nike 20 problem

I'm getting confused...

As Eugeny said, "ENA" means the SP owns the LUN and all transactions are going through this SP.
But Bill said the owning SP is NOT necessarily the ENA SP.

BTW, the HP software team confirmed that the running OS is fine and already patched, and the hardware team has already replaced some hardware. HP support said it should be an application problem: the application generates too much load on the LUN, and this makes the LUN unstable. No problem was found when LUN2 had no load.
Eugeny Brychkov
Honored Contributor

Re: Nike 20 problem

Warren,
if you wish, you can run these tests:
- shut down the host. Do you still see the ENA/RDY problem?
- boot into single-user mode and stress the array. Do you see the problem?
- disconnect SPB on the disk array side and try accessing the array. Do you see the problem?
- try the same with SPA.
Bill is right, you must check the Nike firmware. Can you post the versions here (bootcode and microcode)?
Since no such problem is observed on the other LUNs, I suspect that some code (a program) running on the host is polling LUN2 through the alternate path. When one application accesses LUN2 through the primary path, you see ENA on the SP on this path and RDY on the SP on the other path, but when the other software polls LUN2 through the alternate path, the picture is reversed.
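Eugeny's second hypothesis differs from the failover case: both paths are healthy, and the flip-flop comes from two I/O sources using different paths. A minimal Python model, assuming (as in the post) that the SP receiving the I/O shows the LUN as ENA and the other shows RDY; the function names and the access sequence are hypothetical.

```python
def sp_status_trace(accesses):
    """Return the (SPA, SPB) status pair shown after each access to the LUN.

    Hypothetical model: the SP that receives the I/O shows the LUN as ENA
    and the other SP shows it as RDY.
    """
    return [("ENA", "RDY") if sp == "SPA" else ("RDY", "ENA") for sp in accesses]


def count_toggles(trace):
    """Count how many times the displayed status pair changes."""
    return sum(1 for prev, cur in zip(trace, trace[1:]) if prev != cur)


# Application I/O via the primary path (SPA) interleaved with polls from a
# hypothetical monitoring program that reaches LUN2 via the alternate path (SPB).
accesses = ["SPA", "SPA", "SPB", "SPA", "SPA", "SPB", "SPA"]
trace = sp_status_trace(accesses)
print(trace[2])              # ('RDY', 'ENA')
print(count_toggles(trace))  # 4

# Without the poller, all I/O uses SPA and the display is stable.
print(count_toggles(sp_status_trace(["SPA"] * 7)))  # 0
```

In this model the toggling disappears as soon as all I/O goes through one path, which is what the single-user-mode test above is meant to check.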
Eugeny
Bill McNAMARA_1
Honored Contributor

Re: Nike 20 problem

Pull one of the E disks.

Then reseat it.

Did you check the UEL for both SPs (from mstm or xstm)?

Later,
Bill
Warren_9
Honored Contributor

Re: Nike 20 problem

1. The LUN returned to normal after rebooting the host.
2. No problem was found when running a few 'dd' processes with bs=2k.
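Warren's load test used dd directly. For anyone wanting to repeat it with several concurrent readers and a 2 KiB block size, here is a minimal Python sketch. It assumes read-only access to the LUN's raw device file; the thread doesn't give that path, so the demo uses a temporary file as a stand-in, and the function name is hypothetical. It only reads, so it does not reproduce the application's real I/O pattern.

```python
import os
import time


def sequential_read(path, block_size=2048, max_blocks=None):
    """Read `path` in fixed-size blocks, like `dd if=path of=/dev/null bs=2k`.

    Returns (bytes_read, elapsed_seconds). `path` may be a regular file or,
    with sufficient privileges, a raw device file.
    """
    total = 0
    blocks = 0
    start = time.monotonic()
    fd = os.open(path, os.O_RDONLY)
    try:
        while max_blocks is None or blocks < max_blocks:
            chunk = os.read(fd, block_size)
            if not chunk:  # end of file/device
                break
            total += len(chunk)
            blocks += 1
    finally:
        os.close(fd)
    return total, time.monotonic() - start


if __name__ == "__main__":
    import tempfile
    from concurrent.futures import ThreadPoolExecutor

    # Stand-in for the LUN: a 1 MiB temporary file.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(os.urandom(1024 * 1024))
        path = f.name
    try:
        # "A few dd" processes: four concurrent readers with bs=2k.
        with ThreadPoolExecutor(max_workers=4) as pool:
            results = list(pool.map(sequential_read, [path] * 4))
        for nbytes, secs in results:
            print(nbytes, f"{secs:.3f}s")
    finally:
        os.remove(path)
```

Running the readers concurrently is closer to "a few dd" than a single loop; the block size and number of workers are the knobs to vary when trying to approach the application's load.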