OA communication problem.

The Brit
Honored Contributor

I think we might have encountered the issue described in Advisory c02720395.

Question: if I remove the DNS information, do I have to do anything in VCM to get things to correct themselves?

If not, how long is the update likely to take?

Dave.
12 REPLIES
Steven Clementi
Honored Contributor

Re: OA communication problem.

From the looks of it, all you need to do is remove the DNS info. Once you "set" it, the new information is sent to the devices and they should do their lookup/reverse lookup tests (which should fail, since there is no DNS info for the module to check with).

I'd expect it to happen shortly after you apply the change, but it is DNS related, so it might take slightly longer.

Just guessing here. Never had to do this... yet.



Steven
Steven Clementi
HP Master ASE, Storage and Clustering
MCSE (NT 4.0, W2K, W2K3)
VCP (ESX2, Vi3, vSphere4, vSphere5)
RHCE
NPP3 (Nutanix Platform Professional)
cjb_1
Trusted Contributor

Re: OA communication problem.

I think I had this today too.

Same issues and symptoms as described in the advisory. In addition, I had no access to my VC domain IP. I saw no degradation in service on either LAN or SAN.

I removed the DNS entries and waited half an hour. Still no access to the VC domain IP, but I could see the errors had gone by accessing the VC Manager console in bay 2 (couldn't get on bay 1).

I had to power down the Enet module in bay 1 and disable the VC domain IP from the CLI on the bay 2 Enet module.
This got rid of the VC domain IP problem. I powered up bay 1, then powered down bay 2. The OA recognised this, so I assumed it was communicating OK again.

I set the VC domain IP again and all was OK.

In answer to your question, I think you may have to reset the Enet modules after you change the DNS settings. That said, I strongly suspect the removal of the DNS did nothing that the resets wouldn't have done anyway. I just hope it stops the problem happening again.

I have a call in with support so they can go through the logs and tell me I need to upgrade my firmware :). Will update if I get anything useful back.
The Brit
Honored Contributor

Re: OA communication problem.

Update.

Symptoms:

1. 07:00: Oracle cluster crashed (system NICs not teamed). This coincided with a VC Failover event.

2. Enclosure setup is
Bay 1 Flex10 VC
Bay 2 Flex10 VC
Bay 3 1/10 VC
Bay 4 1/10 VC
Vertical stacking links Bay 1 -> Bay 3, and Bay 2 -> Bay 4.
There are no UPLINKs from Bays 3/4, all uplinks are from Bays 1/2

Systems with NIC teaming between Bays 1 & 2 remained up.

3. ENet interconnect modules show no communication. (See Attachment)

4. Under "Stacking Links": a) the enclosure is not listed, and/or b) the stacking links are not listed.

I removed the DNS information under OA::Enclosure Settings/EBIPA/Interconnect Bays, waited ~10 minutes, and nothing happened. I then cycled the VC modules in Bays 1 & 2, and the problem cleared up.

This problem does not appear to be related to the VC version. This particular enclosure had just been upgraded to VC 3.15 (Saturday 26th, PM). We have a second enclosure running VC 2.33, and it began experiencing the same problem this morning at 08:00 (i.e. a VC failover, then the symptoms above).

Both of the above enclosures are attached to the same pair of network switches. Both of these switches were cycled on Saturday night (~9pm).

No issues were experienced until 07:00 today.

Because the systems in enclosure 2 are more important, we have postponed the VC reset until this evening.

We still have concerns because we have 2 other enclosures (production) which have not (yet) experienced this problem. They are connected to different Network switches and use a different primary DNS.

A deeper explanation of what is happening here would be nice.

Dave.
cjb_1
Trusted Contributor

Re: OA communication problem.

As a happy coincidence, this was our UAT system too. Thanks for the screenshot, as I missed getting one from ours.

The first message in my VC log indicates that the Enet module in bay 2 reset.

For your reference, we are on 3.11 and 2.34 VC.

Will push support on this and see where we get.
cjb_1
Trusted Contributor

Re: OA communication problem.

This is the initial message from the VC log.

2011-02-28T13:00:59+00:00 VCEFTW vcmd: [ENET:enc0:iobay2:3005:Warning] Enet Module power off
2011-02-28T13:00:59+00:00 VCEFTW vcmd: [VCD:04_vc:1025:Warning] Domain state NO_REDUNDANCY : Stacking Links not redundant
2011-02-28T13:00:59+00:00 VCEFTW vcmd: [NET:x-develop:7013:Minor] Enet Network state DEGRADED : Component partially operational, but capacity lost
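For anyone comparing logs across enclosures, here is a minimal Python sketch that splits lines like the ones above into fields. The layout it assumes (timestamp, host, `vcmd:`, a bracketed tag of subsystem:object:code:severity, then the message) is inferred only from the lines quoted in this thread, not from any HP documentation, and the function and field names are made up for illustration.

```python
import re

# Assumed layout, inferred from the quoted log lines only:
# <timestamp> <host> vcmd: [<subsystem>:<object...>:<code>:<severity>] <message>
VCMD_LINE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<host>\S+)\s+vcmd:\s+"
    r"\[(?P<tag>[^\]]+)\]\s+(?P<message>.*)$"
)


def parse_vcmd_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = VCMD_LINE.match(line.strip())
    if match is None:
        return None
    parts = match.group("tag").split(":")
    if len(parts) < 3:
        return None
    return {
        "timestamp": match.group("timestamp"),
        "host": match.group("host"),
        "subsystem": parts[0],
        # The object part can itself contain colons (e.g. enc0:iobay2).
        "object": ":".join(parts[1:-2]),
        "code": parts[-2],
        "severity": parts[-1],
        "message": match.group("message"),
    }


if __name__ == "__main__":
    sample = (
        "2011-02-28T13:00:59+00:00 VCEFTW vcmd: "
        "[ENET:enc0:iobay2:3005:Warning] Enet Module power off"
    )
    print(parse_vcmd_line(sample))
```

Filtering the parsed entries on `subsystem == "ENET"` or on a severity is then a one-line list comprehension.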
Steven Clementi
Honored Contributor

Re: OA communication problem.

Dave:

What are your Onboard Administrator firmware version(s)?

There are some oddities with 3.x and VC modules that get fixed in 3.21.


Steven
Steve MacKenzie
Occasional Advisor

Re: OA communication problem.

Hi,
We have the same problem with 2 stacked production enclosures (see attachment).

Is the only solution a reset of the Ethernet modules?

This occurred after the creation of a new Server Profile (note: the newly created profile has no Ethernet connectivity). All existing blades are still running as normal.

System Log events:
2011-03-01T07:01:34+11:00 VCEXTW2941027Y vcmd: [VCD:XXXXXXXX_vc_domain:1034:Info] User Operation : setVcXmlProfilesConfiguration (mackes_admin@10.21.8.252)
2011-03-01T07:03:09+11:00 VCEXTW2941027Y vcmd: [PRO:XXXXXXXXX:6040:Info] Debug Msg: : Set personality on bay number: 1#012#011EmStored Signature: 294293954#012#011BladeStored Signature: 294293954#012#011Calculated Signature: 294293954#012#011PersonalityChecksum Signature: 2549673591
2011-03-01T07:03:09+11:00 VCEXTW2941027Y vcmd: [PRO:XXXXXXXX:6004:Info] Profile assigned : Bay 1
2011-03-01T07:03:09+11:00 VCEXTW2941027Y vcmd: [PRO:XXXXXXXXX:6001:Info] Profile added
2011-03-01T07:03:09+11:00 VCEXTW2941027Y vcmd: [ENET:enc1:iobay2:3023:Major] Enet Module state NO_COMM : Cannot communicate with component
2011-03-01T07:03:09+11:00 VCEXTW2941027Y vcmd: [NET:Bay2_Enc06_VLAN1:7011:Warning] Enet Network state UNKNOWN : Port set UNKNOWN
2011-03-01T07:03:09+11:00 VCEXTW2941027Y vcmd: [PRO:XXXXXXXX:6013:Minor] Profile state DEGRADED : At least 1 connection OK but not all connections OK
2011-03-01T07:03:10+11:00 VCEXTW2941027Y vcmd: [VCD:XXXXXX_vc_domain:1024:Minor] Domain state DEGRADED : 1+ enclosures & profiles OK, DEGRADED, UNKNOWN, NOT-MAPPED
2011-03-01T07:03:10+11:00 VCEXTW2941027Y vcmd: [PRO:XXXXXXX:6013:Minor] Profile state DEGRADED : At least 1 connection OK but not all connections OK
2011-03-01T07:03:10+11:00 VCEXTW2941027Y vcmd: [PRO:XXXXXX:6013:Minor] Profile state DEGRADED : At least 1 connection OK but not all connections OK
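The `#012` and `#011` sequences in the 6040 "Debug Msg" line look like the octal escapes that some syslog daemons (rsyslog, for example) substitute for control characters: octal 012 is a line feed and 011 is a tab. That is an assumption about how this log was collected, not something stated in the thread. Under that assumption, a small Python sketch (the function name is made up) restores the original multi-line message:

```python
import re

# Matches '#' followed by exactly three octal digits, e.g. #012 or #011.
_ESCAPE = re.compile(r"#([0-7]{3})")


def decode_syslog_escapes(text):
    """Turn #ooo octal escapes back into control characters.

    Only codes for control characters (below 0x20, or 0x7f) are decoded,
    so ordinary text such as '#123' is left untouched.
    """
    def _replace(match):
        code = int(match.group(1), 8)
        if code < 0x20 or code == 0x7F:
            return chr(code)
        return match.group(0)

    return _ESCAPE.sub(_replace, text)


if __name__ == "__main__":
    raw = (
        "Set personality on bay number: 1#012#011EmStored Signature: 294293954"
        "#012#011BladeStored Signature: 294293954"
    )
    print(decode_syslog_escapes(raw))
```

Decoded this way, the message reads as one header line followed by indented signature lines, which makes it easier to see that the EmStored, BladeStored and Calculated signatures match.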
The Brit
Honored Contributor

Re: OA communication problem.

The DNS fix worked for one of our enclosures (running OA 3.21/VC 3.15), i.e. remove the DNS information from the OA and cycle the VC modules (although the VC logs indicate that the VC cycle might not have been necessary, since items appeared to be coming online on their own).

Unfortunately, the fix didn't work for a second, more important enclosure (running OA 2.60/VC 2.33). (The importance of the enclosure is the reason we are running several F/W levels behind; we need the F/W to be tried-and-true before we upgrade.)

After removing the DNS information on the OA, and cycling all of our VC modules, we still have "NO COMM" and no Stacking Links.

In addition (probably unrelated), our standby Flex10 module in Bay 2 died. It is currently powered off, so we have lost 50% of our network redundancy.

We will be back on the phone with HP shortly.

(watch this space)!!

Dave
Le Coq Manuel
Frequent Advisor

Re: OA communication problem.

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c02720395&lang=en&cc=us&taskId=115&prodSeriesId=3794423&prodTypeId=3709945
Cordially
The Brit
Honored Contributor

Re: OA communication problem.

Thank you Manuel,
but that is the Advisory I referenced in the very first post of this thread.

My final comment:

The problem is resolved.

For newer f/w releases, removing the DNS information from the interconnect bays is sufficient.
For older f/w releases, it needs to be removed from the Device Bays pages as well.

After doing this, our older enclosures cleared up in ~10mins.

Dave.
The Brit
Honored Contributor

Re: OA communication problem.

After applying the solution above, no VC reset was required.

Dave.
babbu
Occasional Visitor

Re: OA communication problem.

We had the same problem, and removing the DNS entries from the OA solved it. No need to reboot anything; it takes a couple of minutes and then everything looks OK.