Insight Control for Linux

c-Class Blade Server discovery problems

Hello.

We've been trying to discover a c-Class blade server with Control Tower 1.5, but at the "starting tincan" step the following error is displayed:

Fatal error: Uncaught SoapFault exception: [Client] looks like we got no XML document in /var/rct/includes/provision/tincan/public/startTinCan.php:28
Stack trace:
#0 /var/rct/includes/provision/tincan/public/startTinCan.php(28): SoapClient->__call('handshake', Array)
#1 {main}
thrown in /var/rct/includes/provision/tincan/public/startTinCan.php on line 28

Has anyone faced similar problems with c-Class blades (BL460c G1)?

Nothing is shown in the Control Tower GUI: no events, nothing in Component Manager, and so on. We currently have 4 blades in the chassis (3 + CT), but CT cannot discover any of them. All iLO interfaces and the OA have been set to DHCP; they can be seen in the CT DHCP leases view and accessed with a browser.

Please find attached a screenshot of an iLO Remote Console window showing the error.

Br,


/teemu
14 REPLIES

Re: c -class Blade Server discovery problems

Hi.

Ok, most likely problem solved. A further look at the manuals by a colleague turned up the following in the Onboard Administrator User Guide.

Control Tower makes XML queries to the OA about the hardware, and by default the OA does not respond. From page 69, Network Access:

XML Reply - This checkbox is not selected by default. Selecting this checkbox enables XML replies from the Onboard Administrator onto the network.
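
For context, PHP's SoapClient raises "looks like we got no XML document" when the response body it receives cannot be parsed as XML (for example an empty body or an HTML error page). The sketch below is only an illustration of that check in Python; the function name is hypothetical, it is not part of Control Tower or the OA, and it does not contact any device.

```python
import xml.etree.ElementTree as ET


def is_xml_document(body):
    """Return True if body parses as well-formed XML, False otherwise.

    Hypothetical helper for illustration only: pass it whatever text
    came back from the device you queried.
    """
    # An empty or whitespace-only reply is never a valid XML document.
    if not body or not body.strip():
        return False
    try:
        ET.fromstring(body)
    except ET.ParseError:
        return False
    return True
```

Feeding it a saved response body is enough to tell whether the peer answered with XML at all or with something else.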

So we'll try it out later this week.



Br,


/teemu
klemerS
Frequent Advisor

Re: c -class Blade Server discovery problems

Hello Teemu.

I am not an expert and hope Robert Crockett will see your question soon.

I installed 16 BL460c blades in 2 enclosures with no problems at all in the discovery step.

Did you connect NIC1 (eth1) to the management network?

You can check whether CT created a user "ctadmin" in the BL460c iLOs.

Maybe something is really wrong with the /var/rct/includes/provision/tincan/public directory.

Hope you will solve the problem soon.

Shalom.

Re: c -class Blade Server discovery problems

Hello.

Yes, the interfaces are connected to the mgmt network.

We used CT for deploying a bunch of p-series blades without any problems, so I was quite amazed when this error appeared.

But we'll check the OA configuration and other settings later this week, and hopefully we can continue with the deployment after that.

Thanks for the reply.

Br,

/teemu
Robert Crockett
Valued Contributor

Re: c -class Blade Server discovery problems

Hello Teemu, the main issue with the BL460c registration process is the NICs. Be sure that ONLY NIC1 is in the VLAN that connects to the ICLE Management network. There is a known problem with that specific blade (BL460c) where all NICs try to work on the ICLE Management network, which is expected to be isolated. It is written up in the Release Notes for the next upgrade (v.1.60).
The easiest way to test for this scenario is to use the VGA console (or the remote VGA console via the iLO). When the blade PXE boots to the error mentioned in your first post, go to that console prompt (which is a Linux ramdisk) and type 'ifconfig'. Look for more than one NIC with a '10.128.xxx.xxx' IP. If that is the case, you must set up a VLAN that has only NIC1 in it for the blades you are trying to register. This scenario occurs ONLY on the BL460c.
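
If the 'ifconfig' output can be copied off the console, the check described above could be sketched roughly as follows. This is only an illustration, not an ICLE tool: the function name is hypothetical, and it assumes the classic net-tools output layout where each interface block starts at column 0 and addresses appear as "inet addr:A.B.C.D".

```python
import re

# Management range mentioned in this thread (10.128.xxx.xxx).
MGMT_PREFIX = "10.128."


def mgmt_interfaces(ifconfig_text, prefix=MGMT_PREFIX):
    """Return the names of interfaces whose IPv4 address starts with prefix.

    Hypothetical helper; assumes net-tools style `ifconfig` output.
    """
    found = []
    current = None
    for line in ifconfig_text.splitlines():
        # A non-indented, non-empty line starts a new interface block.
        if line and not line[0].isspace():
            current = line.split()[0].rstrip(":")
        match = re.search(r"inet (?:addr:)?(\d+\.\d+\.\d+\.\d+)", line)
        if match and current and match.group(1).startswith(prefix):
            found.append(current)
    return found
```

More than one name in the returned list would correspond to the multi-NIC situation described above.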

Robert

Re: c -class Blade Server discovery problems

Hi.

Thanks for the reply. The problem was with the network interfaces being connected to the same VLAN.

What's strange though is that before connecting the other NIC, it did not work either. But after removing it, it worked. Go figure...



/teemu
Robert Crockett
Valued Contributor

Re: c -class Blade Server discovery problems

Glad to hear you were able to resolve this issue. I'm not sure why it behaved the way you describe; my assumption is a routing or NIC issue. That specific blade gave us trouble when developing ICLE, but none of the other HP blades had this issue.

Robert

Re: c -class Blade Server discovery problems

Hi.

Ok. Thanks for the information.

We also have problems with image capture from that particular blade type. Is that also just 460c specific and will there be enhancements to that in the next release?

Br,

/teemu
Robert Crockett
Valued Contributor

Re: c -class Blade Server discovery problems

What kind of image capture problem are you having? How many partitions do you have on the OS you are trying to capture, and what type are they (ext2, ext3, reiserfs, etc.)?
What OS and version?
Can you give me an error log from ICLE on a failed capture?

Robert

Re: c -class Blade Server discovery problems

Hi.

The image capture times out on "wait for RAPIDS" in ICLE.

-- klip --
[2007-05-22 10:14:02] Failed to contact ramdisk () after 921 seconds
[2007-05-22 10:14:02] Failed
-- klip --

In the console, a "mount error 101 - Network is unreachable" message is displayed. It seems that the RAPIDS image is unable to get an IP address via DHCP.


Br,


/teemu
Robert Crockett
Valued Contributor

Re: c -class Blade Server discovery problems

Ok, when that happens, get the IP that the ramdisk has from the VGA console of that blade (we use the SSTK ramdisk; SSTK is the Smart Start Toolkit, which is developed by that team). Then SSH into the ICLE Linux OS and try to ping that IP to see whether ICLE can reach it (check routes, etc.). You should also be able to ping back to the ICLE Linux OS from the blade ramdisk on the ICLE Management Network (10.128.xxx.xxx, where the default ICLE Linux OS IP in that subnet is 10.128.255.253).
Verify your VLAN configuration and make sure each blade is getting only one 10.128.xxx.xxx IP assigned from the ICLE DHCP Server.

Robert

Re: c -class Blade Server discovery problems

Hi Teemu,

We faced a similar issue ("mount error 101 - Network is unreachable"), but we finally solved it by enabling the PortFast feature on all ports of the core Ethernet switch where our blades and Control Tower are connected.

You can try this option; it might help you.

thanks
Naveen Gulia
Robert Crockett
Valued Contributor

Re: c -class Blade Server discovery problems

Teemu, I believe your problem here is the known issue with the BL460c. There are other threads in this forum that relate to this.

The issue is that this specific hardware has problems enumerating the NICs in the SSTK ramdisk that is used by the ICLE product. You need to set up a VLAN in the enclosure switch (or external switch if using pass-thrus) and place only the first NIC (eth0) of each blade in that VLAN. To test whether this is your problem, get to the prompt of one of the blades that is trying to register and, at the bash prompt you showed me in your attachment, type 'ifconfig' and see whether more than one of the blade NICs has a 10.128.xxx.xxx IP assigned to it. If so, that is the problem, and you will need to set up a VLAN that allows only eth0 (NIC1) into the ICLE Management network, so that only it gets a 10.128.xxx.xxx IP.

One other thing to look at while you are at this prompt is whether we are able to gather all of the data needed to register a blade with ICLE. At that same bash prompt, type 'cat /etc/sysconfig/session' and verify there are no empty fields. The main one to look for is the Rack name; it cannot be empty or registration will fail.
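
As a rough illustration of the empty-field check, the sketch below scans text for keys with no value. It is an assumption that /etc/sysconfig/session uses KEY=VALUE lines (the usual convention for sysconfig files); the thread does not show the file, and the function name is hypothetical.

```python
def empty_fields(session_text):
    """Return the keys that have an empty value.

    Hypothetical helper; ASSUMES KEY=VALUE lines, optionally quoted,
    which may not match the real layout of /etc/sysconfig/session.
    """
    missing = []
    for raw in session_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Treat a bare pair of quotes as empty too.
        if not value.strip().strip("\"'"):
            missing.append(key.strip())
    return missing
```

An empty result would mean no blank fields were found under that assumed format; any key it returns is a candidate for the registration failure described above.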

Let me know how it goes,

Robert

Re: c -class Blade Server discovery problems

Hello.

This problem was actually solved already in May, but I totally forgot to send a response here. Just been too busy...

The problem was that the switch port(s) did not have portfast enabled. We usually enable portfast in the blade environment, but this time we had so many other things to do that we forgot to double-check the switches when we noticed the problem.

I thank you for all the answers regarding the case.

Br,

/teemu
Robert Crockett
Valued Contributor

Re: c -class Blade Server discovery problems

Hello Teemu, I responded to this yesterday but don't see my reply anywhere.

Anyway, here it is again:

The BL460c is the ONLY blade that has a different characteristic in regard to NICs and the ICLE Management network. We documented this behaviour in the Release Notes. The problem is that both of the NICs get into the ICLE Mgmt network and get an IP from the ICLE DHCP Server in the 10.128.xxx.xxx range. To solve this, the customer will need to create a VLAN and place ONLY the first NIC (eth0) of each blade, plus the iLO and OA ports, in that VLAN. The exception is when the OAs and iLOs should be on a different network than the ICLE Mgmt network. In that case the OAs and iLOs must have their IPs assigned BEFORE registration with ICLE, and the IPs must not change; if they do, a re-registration must occur. To test this scenario, get to the 'bash' prompt that you show in your attached picture, type 'ifconfig', and see whether both blade NICs get a 10.128.xxx.xxx IP. If so, that is your problem, and the VLAN must be set up to filter the onboard blade NICs such that only eth0 (NIC1) gets into the ICLE DHCP Server network.

One other thing to mention is the data needed to register blades and enclosures in ICLE. At that same bash prompt, type 'cat /etc/sysconfig/session' and see whether any of the fields are empty. If they are, you must remedy that. If it is the Rack name, you have to give the OA a rack name before registration. If there are missing serial numbers, you need to verify that the hardware has a serial number in its FRU data, which can be found in the iLO or other places.

Let me know how it goes...

Robert