BladeSystem - General

Virtual Connect Domain failure

 
Steve MacKenzie
Occasional Advisor

Virtual Connect Domain failure

Hi,
I have the following issue: whenever I try to edit or delete a particular server profile, the entire VC domain resets with the following error:
Enclosure state FAILED : All Enet modules not OK or DEGRADED
[Critical] Domain state FAILED : 1+ enclosures not OK or DEGRADED
VCM initialized

All blades lose network connectivity until the modules re-initialize.

We are running:
OA version 3.21
VC version 3.15
All Blades on BIOS l24 09/02/2010

Any help appreciated.
25 REPLIES
dchmax
Frequent Advisor

Re: Virtual Connect Domain failure

I've not seen this before. You could try failing the active VC domain module over to the standby and then making the same changes. Hopefully it can be narrowed down to a specific module issue. Support will more than likely walk you through reseating all the modules and the backplanes. What type of modules do you have in interconnect bays 1 and 2?
Steve MacKenzie
Occasional Advisor

Re: Virtual Connect Domain failure

We have - HP VC Flex-10 Enet Modules.
We have logged a support call and they are still going through the logs.
dchmax
Frequent Advisor

Re: Virtual Connect Domain failure

Post what you find out. I'm about to upgrade the firmware in our test chassis.
anthony.bucci
Occasional Visitor

Re: Virtual Connect Domain failure

Have you figured out what is wrong here? I have the same issue right now.

OA version 3.21
VC version 3.15

VC-ENet gives a no communication error.
VC-FC seems to be working just fine.

Thanks!
Steve MacKenzie
Occasional Advisor

Re: Virtual Connect Domain failure

This is the response from the support team:

"recommended that the only way to resolve this is to deleting and recreating the Domain.
This issue could be caused due to corruption in the Virtual Connect Domain Configuration. Hence to resolve the VCD crashing issue every time a profile is edited or deleted, is to delete the existing VCD configuration and create a new Domain."

Which isn't good when I'm running critical production servers in the enclosure.

I have escalated the issue, but have yet to receive another response.
anthony.bucci
Occasional Visitor

Re: Virtual Connect Domain failure

Thanks for the update...
I've done that and it does not seem to help either...

Very bad answer on the part of HP.

Can't wait to hear what the next level has to say.

Thanks again.
ChrisTan_1
Advisor

Re: Virtual Connect Domain failure

Hi Steve,

Are you using any of the new features in ver 3.15? I noticed you are still using Flex-10 and not FlexFabric modules, so VC firmware ver 3.10 should be sufficient.

Force a VC firmware downgrade to ver 3.10?
Steve MacKenzie
Occasional Advisor

Re: Virtual Connect Domain failure

The issue started while we were running 3.10
We were advised we needed to upgrade our firmware to the revisions mentioned in my 1st post.
After upgrading the issue remains.

Regards,
Steve
DerekS_1
Frequent Advisor

Re: Virtual Connect Domain failure

I had a similar problem in December when we did VC/OA firmware upgrades. Two linked chassis, and VC died on us such that all blades in both chassis lost network connectivity. Critical production servers were taken offline and I was not at all happy.

After several hours of being on the phone with HP, I gave up. I'm not sure what finally cured the problem, but I ended up de-assigning and assigning a profile to a blade then things got back in sync and the servers came back online.

I'm not at all a fan of VC, and will avoid future purchases of this technology as my faith in its reliability is nearly gone.
Cindy Mayer
Occasional Visitor

Re: Virtual Connect Domain failure

This might help some people

The first thing I'd like to do is rule out an issue that can cause VCM to display no communication with Virtual Connect modules. It occurs when DNS is enabled in the Onboard Administrator under Enclosure Settings > Enclosure Bay IP Addressing > Interconnect Bays. If EBIPA is used to assign an IP address, make sure there is no DNS information entered. If there is, remove it and update by hitting the Apply button at the lower right. (This applies if you use EBIPA to assign IP addresses. If you use an external DHCP server, then we would want to remove the DNS info there.)

The symptoms of the DNS issue are VCM reporting no communication for one or more VC modules and stacking links showing as failed.



After I replied that this fixed it, I got this in return:



Glad to hear that took care of it. There is a customer advisory on this issue, c02720395, but I don't believe it's available on our website yet. There will be a future firmware fix, but for now the workaround is to remove the DNS information.
Jeroen_Kleen
HPE Pro

Re: Virtual Connect Domain failure

Hello Folks,

This DNS issue as described in the document c02720395 is resolved in VC FW 3.17 that is our next targeted release of VC FW.

Cheers, Jeroen
(I am an HPE employee and a HPE OneSphere evangelist)
Engage as well with HPE OneSphere experts at our new slack channels for quick and agile discussions around HPE OneSphere and HPE Dev: https://www.labs.hpe.com/slack
If your question is resolved then please acknowledge that and/or provide Kudo's.
RBC09
Occasional Visitor

Re: Virtual Connect Domain failure

Hey All,

We've done what HP's advisory asked and cleared the DNS records for our interconnect cards' IPs. We run DHCP, not EBIPA. We also temporarily set DNS not to update the IP scope. We are still experiencing the No communication error. Has anyone tried something else that resolved this issue? I am new to blade systems and am even having a hard time getting an HP engineer to resolve the issue.

Any advice or a "try this, it worked for me" would be greatly appreciated.
HRT
Advisor

Re: Virtual Connect Domain failure

I cleared the DNS settings, and once I reseated the secondary module, everything resynchronized correctly.
The Brit
Honored Contributor

Re: Virtual Connect Domain failure

For the older fw versions you need to remove the DNS info from the Device Bays page on the OA EBIPA.

Since you use DHCP, this might not help, but I would do it anyway.
JMG-IT
Occasional Visitor

Re: Virtual Connect Domain failure

Hello - Jeroen or anyone else out there:

Is there a release date for VC FW 3.17?

Anyone's input is greatly appreciated.

Thanks!
Chris House
Frequent Advisor

Re: Virtual Connect Domain failure

I removed the DNS entries from my EBIPA settings for the Interconnect modules, and within a few minutes it cleared up the loss of communication between my VC Ethernet modules, without needing to reseat or reboot anything. However, we were using a hostname in our LDAP server field, so that had to be changed to the IP equivalent in order to use LDAP authentication again.

... looking forward to weird new bugs in 3.17..........
Emarthi
Occasional Visitor

Re: Virtual Connect Domain failure

Hi,

I have the exact same problem: a brand new C7000 with two VC Eth modules.
All on the latest FW.
I have also tried removing the DNS entry, without any luck!
We have not put the enclosure in production yet, so I also tried deleting the domain and recreating it; it still does not work.
I also tried flashing the FW on the VC modules again.
I have opened a case with HP, but the response isn't very quick. The only solution from HP so far is the DNS workaround...

Erik
Jeroen_Kleen
HPE Pro

Re: Virtual Connect Domain failure

Hello JMG-IT,

We don't have a formal release date yet, although we are working on an updated schedule to push it out within a month from now or sooner.

Erik, feel free to post your CaseID so I can lookup the status for your case.

An updated Customer Advisory at the same URL, http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c02720395, should be available very soon.

Cheers, Jeroen
Emarthi
Occasional Visitor

Re: Virtual Connect Domain failure

Hi Jeroen,

Thank you for the quick response!
My case number with HP is: 4626776895.

Thanks
Emarthi
Occasional Visitor

Re: Virtual Connect Domain failure

Hi Jeroen!

Your solution to the problem worked for me, very nice! Thanks!

After I set new IPs on both VC modules, everything started to work again.
I have pasted your solution below; maybe it will work for other systems with the same problem.

Erik

"I see that you are using corporate DHCP. Funnily enough, I don't see any kind of DNS/gateway listed in your EBIPA settings.
For now I would suggest the following:

1. Do a VC CLI: show interconnect or a show status (this may show the no comm status on your VC bay 1 & 2).

Reserve in DHCP the IPs used for Bay 1 & 2.
Then configure static EBIPA for the interconnects in bay 1 & 2,
like this example in the CLI:
#NOTE: SET EBIPA commands are only valid for OA v3.00 and later
SET EBIPA INTERCONNECT VCM.IP.bay.1 255.255.248.0 1
SET EBIPA INTERCONNECT GATEWAY x.x.x.x 1
SET EBIPA INTERCONNECT DOMAIN "test.com" 1
ADD EBIPA INTERCONNECT DNS 0.0.0.0
ADD EBIPA INTERCONNECT DNS 0.0.0.0
SET EBIPA INTERCONNECT NTP PRIMARY x.x.x.x 1
SET EBIPA INTERCONNECT NTP SECONDARY x.x.x.x 1
ENABLE EBIPA INTERCONNECT 1
SET EBIPA INTERCONNECT VCM.IP.BAY.1 255.255.248.0 2
SET EBIPA INTERCONNECT GATEWAY x.x.x.x 2
SET EBIPA INTERCONNECT DOMAIN "test.com" 2
ADD EBIPA INTERCONNECT DNS 0.0.0.0
ADD EBIPA INTERCONNECT DNS 0.0.0.0
SET EBIPA INTERCONNECT NTP PRIMARY x.x.x.x
SET EBIPA INTERCONNECT NTP SECONDARY 16 x.x.x.x
ENABLE EBIPA INTERCONNECT 2

If you change your original VC bay 1 & 2 IPs to different ones than those now in use, then you need to {clear VCmode} on the OA CLI.

After the new IP, subnet mask & GW are set on the interconnects (without DNS), just power off and power on the secondary VC module first:
OA CLI:
poweroff interconnect 2
poweron interconnect 2
Then, if needed, do a VC CLI: reset vcm -failover (this might not work; if so, go straight to the next step)
poweroff interconnect 1
poweron interconnect 1
"
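For anyone scripting the static EBIPA step for several bays, here is a minimal sketch that only renders the command text. It assumes the argument order shown in the quoted example (address, netmask, bay number last). The function name and the 192.0.2.x addresses are hypothetical placeholders, and the DNS, domain and NTP lines are deliberately left out, so check the output against your own OA CLI before pasting anything.

```python
import ipaddress


def ebipa_interconnect_commands(bay, ip, netmask, gateway):
    """Render OA CLI lines for one interconnect bay (hypothetical helper).

    Command forms are copied from the quoted example in this thread;
    nothing here talks to an Onboard Administrator.
    """
    # Fail early on typos: each value must parse as a dotted-quad IPv4 string.
    for value in (ip, netmask, gateway):
        ipaddress.IPv4Address(value)
    return [
        f"SET EBIPA INTERCONNECT {ip} {netmask} {bay}",
        f"SET EBIPA INTERCONNECT GATEWAY {gateway} {bay}",
        f"ENABLE EBIPA INTERCONNECT {bay}",
    ]


# Placeholder addresses from the documentation range, not real settings.
for bay, ip in ((1, "192.0.2.11"), (2, "192.0.2.12")):
    print("\n".join(ebipa_interconnect_commands(bay, ip, "255.255.248.0", "192.0.2.1")))
```

The generated lines still have to be entered on the OA CLI by hand or by whatever session tool you already use.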
Dennis Hinson
Occasional Visitor

Re: Virtual Connect Domain failure

I would like to thank everyone who contributed to the knowledge on this forum regarding this issue specifically.

6 hours of my life and 15 minutes after I read this blog we have 2 chassis problems fixed!

Bob Firek
Regular Advisor

Re: Virtual Connect Domain failure

Hello Everybody,

 

In my enclosure settings I have the following fields: Bay, Enabled, EBIPA Address, Subnet Mask, Gateway, Domain, DNS Servers, Autofill, and Current Address. All fields are filled in. When they say remove the DNS information, do they mean just the IP addresses in the DNS Servers field? We also have our domain in the Domain field; should that be removed as well? We are currently on OA 3.11 and VC 3.10 and will be looking at upgrading soon. I need to ask a stupid question: is it wise to upgrade when the Domain is in a failed state, even if everything seems to be working?

 

Any and all tips will be appreciated.

Thanks,

Bob

Mark Wibaux
Trusted Contributor

Re: Virtual Connect Domain failure

You only need to remove the IP addresses in the DNS server fields to work around the issue.

I would implement the workaround and get your VC modules fully back online before performing the firmware update.

 

Below is some detail on what actually went wrong in the firmware (collected at the time from some ITRC posts) and a link to the customer advisory about it.

HP Virtual Connect - Virtual Connect Manager May Be Unable to Communicate (NO_COMM) if DNS Is Enabled for Virtual Connect Ethernet Modules

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c02720395

 

 

 

This is a bug, which translated 10. from ASCII to binary (ASCII 1 is 49, ASCII 0 is 48, ASCII . (period) is 46) giving a class C address in the 49.48.46.0 range. The bug was ignored, because that class C was not valid.

This Class C was allocated mid February. DNS queries to this (previously bogus) class C began to resolve, and returned errors. For instance, nslookup (for the 4 ASCII characters of 10.1) of 49.48.46.49 shows

mx-ll-49.48.46-49.dynamic.3bb.co.th
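The arithmetic in that explanation can be checked with a small sketch. It is not the firmware's code, only an illustration of what happens when the first four ASCII characters of a dotted address string are read as four raw address bytes; the function name is made up for this example.

```python
def ascii_bytes_as_ipv4(text: str) -> str:
    """Misread the first four ASCII characters of `text` as raw IPv4 octets."""
    raw = text.encode("ascii")[:4]
    if len(raw) < 4:
        raise ValueError("need at least four characters")
    # Each character's ASCII code becomes one octet of the bogus address.
    return ".".join(str(byte) for byte in raw)


# '1' is 49, '0' is 48, '.' is 46, '1' is 49
print(ascii_bytes_as_ipv4("10.1"))  # prints 49.48.46.49
```

Any address string starting with "10." lands in 49.48.46.x under this misreading, which matches the range described above.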

This was discovered (and reported on the SAN security mailing list) when outbound SSH connections to this Class C were observed by sites following best practice and monitoring outbound connection attempts.

This caused a huge problem around the world. You could be affected if you were running any c-Class chassis with any Virtual Connect. The temporary fix was to erase the DNS server settings in the Onboard Administrator under Enclosure Bay IP Addressing (for each tab). Sometimes that resulted in a complete reset of the VC interconnect devices.

Bob Firek
Regular Advisor

Re: Virtual Connect Domain failure

Mark,

 

Thank you for the clarification. I'm wondering if you can help clarify something else for me. The advisory says to select "Interconnect Bays" and remove the DNS server IP addresses, but the comments included in your posting say to remove the DNS server IP addresses from each tab. In your experience, should I remove the DNS Server entries just from the Interconnect Bays tab or from each tab?

 

Thanks again for your help,

 

Bob