HPE EVA Storage
 
Brian Nielsen_4
Advisor

Ungroup of disk causes disruption on Oracle server

Hi,
I have a problem with an EVA5000 2C8D VCS3.028.
When ungrouping a disk from the disk group, an Oracle server loose its vdisks for a minute or two. It doesn't happen every time, but often. This means that, right now, we have to shut down Oracle before we start an ungroup. We are migrating to larger drives, which is why we have to ungroup. The disk group also contains vdisks presented to some VMware servers, but these servers don't loose any vdisks.
Any ideas how to prevent this situation?

Regards Brian
Del_3
Trusted Contributor

Re: Ungroup of disk causes disruption on Oracle server

This is not normal. How are your servers zoned and what OS is running Oracle? Also very important: Are your servers and especially the ESX servers single hba?
Brian Nielsen_4
Advisor

Re: Ungroup of disk causes disruption on Oracle server

All servers are zoned according to best practice: only one server and one EVA in every zone. The Oracle server is running on Windows 2000 SP4. All servers have dual HBAs. The HBA drivers and firmware were updated about half a year ago. I haven't seen this problem before, even on other EVAs. I think it's a strange problem, because it's triggered when you start an ungroup, but it is only seen on the Oracle server. So where should I start troubleshooting?

Regards Brian
Tom O'Toole
Respected Contributor

Re: Ungroup of disk causes disruption on Oracle server

I haven't seen this with an EVA 5000 running VCS, but I'm using AIX, not Windows. With EVA 8000 running XCS we have seen various back-end events sometimes cause AIX issues. The VMS systems on these arrays don't see any problems. We would sometimes get disk operation errors on AIX during, for example, locate operations, which should not be seen by a host. I think these turned out to be unit attentions sent by the EVA.

What does this have to do with your situation? I think the EVA could be sending "unit attention" messages back to the host when certain things happen. The host may not always handle these properly and log an error condition. That is one thing for you to investigate.

The other is the I/O timeout. I think Oracle has its own value, and I think it's generally more stringent than the host OS. It would be worth investigating what the Oracle I/O timeout is, and whether it can be changed.

By the way, don't you mean "lose", not "loose"?
Can you imagine if we used PCs to manage our enterprise systems? ... oops.
Brian Nielsen_4
Advisor

Re: Ungroup of disk causes disruption on Oracle server

Thank you for your answer. The problem doesn't really have anything to do with Oracle itself, because it's a Windows message I get. It's something like "Delayed write failure", which means that the Windows server has lost the connection to the vdisk, and therefore the Oracle DB goes down. The Windows server reconnects to the vdisk after about a minute, and Oracle can be started again. I think you're right that it's a background operation that shouldn't be seen by the hosts, like the errors you get on AIX. Maybe a firmware upgrade would solve the problem?
Rgds Brian
Rob Leadbeater
Honored Contributor

Re: Ungroup of disk causes disruption on Oracle server

Hi Brian,

How many disks are in the disk group, and how many times has this happened?

I'm wondering whether there's a 'strange' event when one of the quorum disks in the group gets ungrouped...

Regardless you should probably upgrade to VCS3.110, details of which are here:

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=110&prodSeriesId=315127&prodTypeId=12169&objectID=c01037929

There are quite a few fixes mentioned there, some of which might be relevant...

Hope this helps,

Regards,

Rob
Del_3
Trusted Contributor

Re: Ungroup of disk causes disruption on Oracle server

Before you go updating firmware - I do not believe this is the problem - I would make a good health check of the SAN.

- Check the switch firmware and be sure it's the supported version for the VCS.
- Clear the switch error counters and then monitor the "porterrshow" output for lost frames, enc_out, etc. Enc_out should be 0 or not climbing. Transceivers do go bad, sometimes in a non-binary way.
- Be sure your HBA driver is the appropriate version. If you are using the Storport drivers, there is a required patch from Microsoft: KB 916048.
- Check your multipath software too. Are you on Secure Path or MPIO basic? I would verify it is current and fails over properly between fabrics.
- And I would open a support call with HP and send them the controller logs.

Lost writes on Windows are usually very "physical" issues.
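
The "clear the counters, then watch whether they climb" check above can be sketched in a few lines of Python. This is only an illustration: the dictionary shape, the function name `climbing_counters`, and the counter name `frames_lost` are assumptions made for the example, not the real "porterrshow" output format, so the counters would first have to be copied or parsed into this form by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: {port_number: {"enc_out": 3, "frames_lost": 0, ...}}
Counters = Dict[int, Dict[str, int]]


def climbing_counters(
    before: Counters,
    after: Counters,
    watch: Tuple[str, ...] = ("enc_out",),
) -> List[Tuple[int, str, int]]:
    """Return (port, counter, increase) for every watched counter that grew
    between two snapshots taken some time apart."""
    findings = []
    for port, now in sorted(after.items()):
        earlier = before.get(port, {})
        for name in watch:
            # A counter missing from a snapshot is treated as 0.
            delta = now.get(name, 0) - earlier.get(name, 0)
            if delta > 0:
                findings.append((port, name, delta))
    return findings


if __name__ == "__main__":
    # Made-up numbers, purely to show the call.
    snapshot_1 = {0: {"enc_out": 0}, 1: {"enc_out": 5}}
    snapshot_2 = {0: {"enc_out": 0}, 1: {"enc_out": 9}}
    for port, name, delta in climbing_counters(snapshot_1, snapshot_2):
        print(f"port {port}: {name} rose by {delta}")
```

A port that shows up in the result on every comparison is a candidate for a cable or transceiver swap; a port that never shows up is probably fine.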

Owen_15
Valued Contributor

Re: Ungroup of disk causes disruption on Oracle server

Hi Brian,

Your issue is going to be around the appropriate compatible versions of firmware and drivers being loaded throughout the entire end to end solution.

You need to determine, for an EVA running VCS code 3.028, what the appropriate revisions of fabric OS, HBA firmware, HBA boot BIOS, multipathing software, and HBA driver are.

The hba driver that you are using in this configuration is especially important. The latest driver for the hba card may not be appropriate.

If you update us with what server, hba card, driver, firmware, multipathing driver and version, and san switch and fabric OS you are using I may be able to determine what revision of everything you should be running.

Regards
Owen
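
The end-to-end version check described above could be sketched as follows. Everything in it is a placeholder: the component names, the version strings, and the function name `find_mismatches` are invented for illustration, and the supported revisions would have to come from HP's support documentation for the VCS level in use.

```python
from typing import Dict, List, Set


def find_mismatches(
    installed: Dict[str, str],
    supported: Dict[str, Set[str]],
) -> List[str]:
    """Compare each installed component version with the set of versions
    listed as supported, and describe anything that does not match."""
    problems = []
    for component, allowed in supported.items():
        version = installed.get(component)
        if version is None:
            problems.append(f"{component}: version not recorded")
        elif version not in allowed:
            problems.append(f"{component}: {version} is not in the supported set")
    return problems


if __name__ == "__main__":
    # Placeholder values only; these are not real revision numbers.
    installed = {"hba_driver": "A", "fabric_os": "C"}
    supported = {"hba_driver": {"B"}, "fabric_os": {"C"}, "multipath": {"D"}}
    for line in find_mismatches(installed, supported):
        print(line)
```

Keeping the supported set separate from the installed inventory makes it easy to re-run the comparison after a VCS upgrade changes what is supported.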
Brian Nielsen_4
Advisor

Re: Ungroup of disk causes disruption on Oracle server

Thank you for your answers. The EVA, SAN switches, HBAs, and drivers were updated by HP about half a year ago, so it's not a compatibility problem, and I don't think it's a switch problem, since it can be triggered by an ungroup. It must be something on the EVA. I think I'll contact HP and ask for a firmware update.
Thank you for your inputs.
Rgds Brian.
Rob Leadbeater
Honored Contributor

Re: Ungroup of disk causes disruption on Oracle server

Hi Brian,

VCS 3.110 is now customer installable, if you see fit...

I'd still want HP to analyse the log files though, to try and determine the root cause.

Cheers,

Rob

P.S. Remember to assign points...