EVA 8000 and AIX 5.3 TL5

Jim Norkus
Occasional Visitor

EVA 8000 and AIX 5.3 TL5

Hi Folks, I researched but didn't find any information that was helpful. We've had some issues recently with the rootvg. We boot off of SAN EVA disk (multibos as well). It appears that some of the LVM filesystems were corrupted.

My question is: Does HP recommend that bad block reallocation be turned off at the OS level? If so, can you point me to some documentation on that?

I've worked with EMC in the past and I know they recommend it but I'm not sure about this IBM gear.

Thanks
-jim
7 REPLIES
IBaltay
Honored Contributor

Re: EVA 8000 and AIX 5.3 TL5

Hi, for EMC storage (e.g. DMX3 2500), the LVM bad block reallocation should be turned off for all volume groups and all logical volumes on all EMC arrays. Otherwise, FS corruption can result. According to the LVM best practices for HP-UX servers using XP disks (most probably the same applies to the EVA platform), lvols should also have bad block relocation set to none.
From http://www.redbooks.ibm.com/abstracts/SG245432.html?Open
it is clear that the IBM recommendation is to disable bad block reallocation on concurrent access VGs, for every logical volume.
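
For anyone who wants to check where their logical volumes currently stand, here is a minimal Python sketch. It assumes that `lslv <lvname>` on AIX prints a `BB POLICY:` field and that `chlv -b n <lvname>` is the command that turns bad block relocation off. The sample text and the LV names `hd2` and `hd4` are illustrative only, and the script just builds command strings for review; it does not run anything. Nothing in this thread confirms that HP recommends this setting for EVA.

```python
import re


def bb_policy(lslv_output):
    """Return the BB POLICY value found in `lslv <lv>` text, or None if absent."""
    match = re.search(r"BB POLICY:\s+(\S+)", lslv_output)
    return match.group(1) if match else None


def chlv_commands(policies):
    """Build (but do not run) `chlv -b n` commands for LVs still set to relocatable."""
    return [
        f"chlv -b n {lv}"
        for lv, policy in sorted(policies.items())
        if policy == "relocatable"
    ]


# Illustrative excerpt only; real lslv output contains many more fields.
SAMPLE = "STALE PPs:          0                      BB POLICY:      relocatable\n"

# Hypothetical inventory: hd2 parsed from the sample, hd4 assumed already changed.
policies = {"hd2": bb_policy(SAMPLE), "hd4": "non-relocatable"}

for command in chlv_commands(policies):
    print(command)
```

On a real system the `lslv` text would come from the command itself, one call per logical volume, and the printed commands would be reviewed before anyone ran them.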
the pain is one part of the reality
Jim Norkus
Occasional Visitor

Re: EVA 8000 and AIX 5.3 TL5

Hi,

Thanks for the reply. I knew the EMC and HP server settings, and I also assume it'd be the same for EVA and AIX gear. I'm not running concurrent (multiple servers seeing the same drives at the same time), so I couldn't find anything definitive in that document. Thanks for the help though.
-jim
Tom O'Toole
Respected Contributor

Re: EVA 8000 and AIX 5.3 TL5


I've never seen anything in the HP docs about this, and we have never done this here. We have many AIX systems booting from and using data disks served up by EVA arrays.

Could you provide more info about the errors you encountered, and the software you are using (MPIO, antemeta, etc...)? Would be very interested in what happened, and under what circumstances. Thanks.
Can you imagine if we used PCs to manage our enterprise systems? ... oops.
Jim Norkus
Occasional Visitor

Re: EVA 8000 and AIX 5.3 TL5

Hi sure -

Here we boot off of the SAN drives. Three of our servers suffered OS corruption. We were lucky to have an old multibos image to boot from so we could fsck the system filesystems. The SAN guys were unable to determine whether these all shared the same physical drive in the EVA. The core issue may have been something else, but it brought to my attention that maybe we should have the OS-level bad block reallocation turned off. On some of our servers we do run MPIO. I wish this issue were cut and dried, but it's not. One server experienced issues running simple commands like ls or lsattr, while others simply hung. Anytime I've witnessed unexplainable events like this, I've found them to be related to bad block reallocation.

Sorry for all the words here, but my posting at this point was mostly trying to determine best practices for the EVA and IBM. I know EMC provides this info, but I'm not sure I can find anything related to this combination. I agree that it works fine for the longest time, but if you ever get a strange event that's unexplainable but that a "reboot" fixes, then you'd know what I'm talking about.

I will pay more attention to patterns, but at this point they're all p570 LPARs running their OS on EVA 8000.

-peace
Tom O'Toole
Respected Contributor

Re: EVA 8000 and AIX 5.3 TL5


It sounds like you are saying that a physical disk failure on the EVA seemed to precipitate the outage you had; is that correct? I say this because of your comment about systems sharing a physical disk.

It's easy to determine whether systems are using a certain disk -- all vdisks in a disk group share all the physical disks in that group.

Unless the vdisks are vraid0, a single disk failure should not affect the operation of vdisks.

Your post is the first I've heard about BB relocation policy for san devices...

Can you imagine if we used PCs to manage our enterprise systems? ... oops.
Jim Norkus
Occasional Visitor

Re: EVA 8000 and AIX 5.3 TL5

Hi - I guess it was a little confusing. The shared disk I was referring to was the physical EVA drive which presented multiple virtual LUNs to different hosts. (Much clearer, right? lol) And you are correct about the RAID: we didn't experience a loss of a physical drive. But my question was: if one physical drive (hosting multiple LUNs, i.e. a 300GB drive with six 50GB LUNs presented to multiple servers) experienced a bad block, the EVA would relocate that block and possibly the operating system, AIX, would as well. The blocks would then be in different locations. This would cause data corruption, and since I boot off the SAN, it would cause very strange results that are hard to trace. I was familiar with EMC DMX and HP-UX, where you'd have to set bad block reallocation to NONE on the OS side. I was wondering if I needed to carry that same thought over to EVA and AIX. One person called EVA support and the tech was like Duhhhh (meaning it should be off) BUT has no documented policy for it.

As for the issues I had, they've all been resolved by running fsck. Usually that would end this until the next time, but this time I'm not going to let it go until I find what I'm looking for.

Thanks for reading.
Tom O'Toole
Respected Contributor
Solution

Re: EVA 8000 and AIX 5.3 TL5


Hi Jim, if you had a bad block on an EVA physical disk, it should be dealt with on the array and never get back to the host systems that are using it. That's one of the big benefits of virtualizing storage.

But I say "should" because it seems that over the years we've been using EVA storage, messages DO sometimes get sent back to the host about back-end events, in the form of 'inquiry data has changed' UNIT ATTENTIONS. These can confuse some systems that aren't expecting them, and can cause error counters to increment, and device drivers to go through needless error recovery.

Can you imagine if we used PCs to manage our enterprise systems? ... oops.