EVA4400 cache battery failure forces SAN offline

 
SOLVED
Stu Gepp
Occasional Advisor

We have deployed 6 separate EVA4400s, each with a single disk shelf, in our production environment.

On Wednesday last week one of these suffered a double cache battery failure - the second battery died 65 minutes after the first.

This caused the whole SAN to go offline and we lost access to all LUNs.

HP have replaced the batteries and everything is working again, but they say that going offline is by design. I find this incredibly bizarre, as I wouldn't expect the cache batteries to be in the critical path - they are there just in case external power fails.

Is this behaviour common across all HP SANs?
27 REPLIES
kunalsahoo
Valued Contributor

Re: EVA4400 cache battery failure forces SAN offline

Hi,

This is by design across all EVAs and this is for data safety...

When both batteries go down, there is always a possibility that unflushed data is still in the cache, so the controllers lock host access in this scenario to keep the data from being corrupted or lost. Once the batteries are back, the unflushed data is first written to the disks and then the controllers automatically release the lock...

Attached is the table which describes Vdisk access in the different battery states...



IBaltay
Honored Contributor

Re: EVA4400 cache battery failure forces SAN offline

Hi,
Surely this is a mistake. Could you provide your current controller firmware version?
the pain is one part of the reality
IBaltay
Honored Contributor

Re: EVA4400 cache battery failure forces SAN offline

Could you provide the termination code of the issue?
Uwe Zessin
Honored Contributor
Solution

Re: EVA4400 cache battery failure forces SAN offline

Sure, it's for data safety, but the problem is that this behaviour cannot be turned off to at least run minimal services in write-through mode! Instead you are left "dead in the water"!

We had a problem with 2 EVAs some years ago: 3 dead cache batteries and no replacements available! I've asked around, but it seems that there isn't even a 'secret bypass' to enable access to the data.

I would not have a problem if the customer had to sign a paper that he accepts the risk of a cache loss and then somebody comes and flips a bit to allow access again. Most systems are connected through an external UPS anyway.

In what can be thought of as the previous generation of storage array, based on the HSG controller, it was possible to tell the module to ignore the state of the cache battery, and this was documented in the manual.
.
IBaltay
Honored Contributor

Re: EVA4400 cache battery failure forces SAN offline

OK, I see I overlooked that the question was about the cache policy preventing access to the whole DG. It starts with the battery system on one controller being detected as no longer good, and the disk drives being moved to the other controller. If the battery system then fails on the other controller too, there is no disk presentation...
My questions were rather meant to find out whether the double battery failure was related to a controller firmware bug, so that a recurrence of such a failure could be prevented...
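
For anyone who wants the sequence spelled out, here is a rough sketch in Python of the policy as it has been described in this thread. It is only a toy model under stated assumptions, not HP's implementation: the names `Controller`, `ArrayModel`, `set_battery` and `dirty_cache_blocks` are made up for illustration, and the real XCS firmware logic is not documented here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Controller:
    name: str
    battery_ok: bool = True


class ArrayModel:
    """Toy model of the policy described in this thread (hypothetical, not HP code)."""

    def __init__(self) -> None:
        self.controllers: Dict[str, Controller] = {
            "A": Controller("A"),
            "B": Controller("B"),
        }
        # Controller currently presenting the disks; None means no presentation.
        self.owner: Optional[str] = "A"
        # Stand-in for unflushed write data held in cache.
        self.dirty_cache_blocks = 0

    def host_access(self) -> bool:
        return self.owner is not None

    def set_battery(self, name: str, ok: bool) -> None:
        self.controllers[name].battery_ok = ok
        healthy = [c.name for c in self.controllers.values() if c.battery_ok]
        if not healthy:
            # Both battery systems bad: host access is locked.
            self.owner = None
        elif self.owner not in healthy:
            if self.owner is None:
                # Coming back from the locked state: flush cache first.
                self.dirty_cache_blocks = 0
            # Move the disks to a controller with a good battery system.
            self.owner = healthy[0]


array = ArrayModel()
array.set_battery("A", ok=False)   # disks move to controller B
array.set_battery("B", ok=False)   # second failure: no disk presentation
print(array.host_access())         # False
array.set_battery("A", ok=True)    # replacement fitted: flush, then unlock
print(array.host_access())         # True
```

Note that the model has no write-through fallback, which matches the complaint above that access is simply denied once both battery systems are bad.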
kunalsahoo
Valued Contributor

Re: EVA4400 cache battery failure forces SAN offline

@IBaltay
Is there any known bug in the EVA4400 firmware that takes out both batteries at the same time? Please share the firmware version and, if possible, the release notes of the version in which it is fixed. Does the new one have the fix (21000, to be specific)?
IBaltay
Honored Contributor

Re: EVA4400 cache battery failure forces SAN offline

I meant it historically, as seen in the advisory "(Revised) HP StorageWorks 4400/6400/8400 Enterprise Virtual Array XCS 09522000 released":
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c01850026&jumpid=reg_R1002_USEN

which shows that a lot of versions have been inactivated...

E.g. the HP StorageWorks 4400 Enterprise Virtual Array release notes (XCS 09006000), p. 4, mention a fix for a firmware issue which could result in false reporting of controller component and battery failure.

http://h20000.www2.hp.com/bc/docs/support/SupportManual/c01658405/c01658405.pdf?jumpid=reg_R1002_USEN
Stu Gepp
Occasional Advisor

Re: EVA4400 cache battery failure forces SAN offline

The controller firmware is 09522000. I am aware of the previous version(s) falsely reporting battery failure but in my experience the failure cleared in a matter of milliseconds.

These batteries definitely died and rendered the SAN unusable.

I find it bizarre that the policy is that if both the batteries die I am denied access to my data. I don't even have the option to move it elsewhere where I can do writes safely - regardless of the fact that the EVA is in a datacentre with UPS and generator backup.
kunalsahoo
Valued Contributor

Re: EVA4400 cache battery failure forces SAN offline

Unfortunately, that's how it works. However, I'm a bit surprised to see both batteries fail at the same time, which is quite rare...