BladeSystem Virtual Connect

Virtual Connect Domain going offline when there is no OA communication?

chuckk281
Trusted Contributor


Arnout was looking for some help:

 

*****************

 

Hello experts,

 

Today I was at a customer site doing a firmware upgrade on BL860c i2 / BL870c i2 blades.

 

FW:

OA: 3.21

VC: 3.15

 

 

Due to some updates not running (see separate mail) I decided – among other things – to reset the OA modules.

So I took out OA1 and failover occurred to OA2.

After inserting OA1 again, I took out OA2 (so OA is again running on the primary module).

 

Now, it seems I took out OA2 before OA1 had completely booted up again, and apparently the whole VC Domain went down because no OA was available.  As a result, all Unix clusters failed over to the other site and we had an unplanned downtime of 15 minutes.  The customer is of course not happy with this.

 

Is this normal behaviour?  I thought that VC works independently of the OA.  Surely a complete failure of all OAs should not trigger a failure of VC?

 

*************************

 

Mark replied with some info:

 

********************

 

The messages in red that you have highlighted are not an indication that there was a network outage or module reset. They indicate only that VC-OA communication was lost, which is why “Unknown” is reported and the Domain is shown as Failed.

I looked at the attached support dump and see that VC-OA communication went Down/Up on the 24th and a few times on the 25th. In all 3 events the domain recovered and nothing was re-configured. (See below)

If the VC modules had been reset or re-configured because of the No-COMM event, I would be able to see that clearly in the support dump logs. That evidence is not there.

 

Under normal circumstances, an OA-VC communication failure should not result in an outage on the data network. Something else happened here, and I would suggest opening a case with the support organization.

 

*********************

 

Arnout responded:

 

*****************

 

Mark,

 

Thank you for the analysis.

 

It is indeed possible that the VC-OA communication also went down yesterday, since we did the OA update yesterday.  Today the VC-OA communication went down because we removed all OAs.

But it seems that the domain stayed online and that we need to look elsewhere for the cluster failover.

 

I just heard from the customer that apparently only one blade failed over to the other site.  So indeed, it can’t have been a complete domain outage.

 

I will discuss these findings first and will take the necessary steps with the customer.

 

Kind Regards and thanks again!

 

*********************

 

Comments?

 

1 REPLY
John Naysmith
Occasional Visitor

Re: Virtual Connect Domain going offline when there is no OA communication?

We had horrible results applying the official fix.

A workaround that we tested and applied successfully across our estate of 100+ c-Class enclosures is to create a blank reverse DNS zone in our corporate DNS for 46.48.49.in-addr.arpa
This brings all of the OAs and VCs that are in a no-comm state back into a manageable state.
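
For anyone unsure which addresses that zone actually covers, here is a minimal Python sketch using only the standard `ipaddress` module. It simply reverses the labels of the zone name to get the /24 network. The helper names `zone_to_network` and `in_zone`, and the sample address 49.48.46.1, are made up for illustration; they are not taken from any HP tool or log.

```python
import ipaddress

# Zone name quoted in the post above.
ZONE = "46.48.49.in-addr.arpa"


def zone_to_network(zone: str) -> ipaddress.IPv4Network:
    """Return the IPv4 /24 network covered by a three-label reverse zone."""
    suffix = ".in-addr.arpa"
    if not zone.endswith(suffix):
        raise ValueError("not an in-addr.arpa zone: " + zone)
    labels = zone[: -len(suffix)].split(".")
    if len(labels) != 3:
        raise ValueError("this sketch only handles /24 zones (three labels)")
    # Reverse zones list the octets in reverse order.
    octets = list(reversed(labels))
    return ipaddress.ip_network(".".join(octets) + ".0/24")


def in_zone(address: str, zone: str) -> bool:
    """True if a reverse lookup for `address` would fall inside `zone`."""
    return ipaddress.ip_address(address) in zone_to_network(zone)


if __name__ == "__main__":
    print(zone_to_network(ZONE))
    # Hypothetical sample address, only to show the PTR name format.
    print(ipaddress.ip_address("49.48.46.1").reverse_pointer)
```

So the blank zone answers reverse lookups for anything in 49.48.46.0/24 locally instead of forwarding them elsewhere.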

We only found the IP via a couple of posts, including http://viktorbalogh.net/blog/virtualization/virtual-connect-bug-found

Hope this helps.