Operating System - Linux

Re: Capturing Image from BL680c G5 Server

 
Mitchell Kulberg
Valued Contributor

Re: Capturing Image from BL680c G5 Server

Mario,

Where and when exactly did you see this stack trace message?

In the CMS system log? CMS SIM log? In the SIM GUI? On the managed node? Etc...

And tell me at what point in the capture did you see it.

Also, you wrote that SIM was able to discover the blade and the iLO. Can you verify that the iLO is properly associated with the server as described above?

Also, there are some additional steps that need to be done when you use SIM to discover a node. These are:

- Verify the iLO association
- Run 'configure or repair agents' against that node
- Apply an ICE-Linux license

Can you verify that these steps were performed?

Thanks,
Mitch
Mario Couthino
Frequent Advisor

Re: Capturing Image from BL680c G5 Server

Mario,

I see these messages on the console of the BL680c when I boot the target server (BL680c) in PXE boot mode.

I don't see this during the capture.


ILO is running.

Ran the configure or repair agent and here is the output

Configuration of agents started, waiting for it to be completed.
Configure Agents and Providers (START) ...
Configuring SSH authentication (START)...
Configure SSH for host based authentication (DONE)................. [SUCCESS]

LINUX configuration command (START)...
Configuring SNMP Settings (START)...
Stopping SNMP daemon...
Stopping snmpd: [ OK ]
Adding read community string public
Trap destination address ifdssim already in /etc/snmp/snmpd.conf
Restarting SNMP Daemon...
Starting snmpd: [ OK ]
Setting SNMP trap destination / SNMP read community string (DONE)....[SUCCESS]


Set Trust relationship to "Trust by Certificate" (START) ...
Setting Trust for System Management Home Page.
Stopping System Management Home Page
Stopping hpsmhd:
Copying /var/opt/mx/tmp/ifdssim.pem to /opt/hp/hpsmh/certs
Restarting System Management Home Page
Starting hpsmhd:
...hpsmhd: Could not determine the server's fully qualified domain name, using 172.25.41.247 for ServerName
[ OK ]
Set Trust relationship to "Trust by Certificate" (DONE)............. [SUCCESS]


Setting admin password/Trust for Insight Management Agents 7.1/earlier (START)..
Setting admin password/Trust for legacy HP Server Management Agents..[SKIPPED] HP Server Management Drivers and Agents, is not installed.


Linux configuration commands (DONE)................................. [SUCCESS]

Re-identifying system to get update information (START) ...
Re-Identification of system (DONE).............................. [SUCCESS].

Subscribing to WBEM / WMI indications ...
Subscribe to WBEM / WMI Indications (DONE).......................... [FAILED]
Check whether target system met the requirements and all of the software required to support indications is installed.
WBEM protocol settings are not valid/enabled for this system in HP SIM. Check your HP SIM settings.
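
With output this long, it can help to pull out only the steps that did not succeed. Here is a minimal sketch in Python, assuming the output above was saved to a text file and that each result line ends in dots followed by a bracketed status, as it does here. The function names are made up for illustration and are not part of HP SIM.

```python
import re

# Matches result lines such as
#   "Subscribe to WBEM / WMI Indications (DONE)........ [FAILED]"
# Assumption: a step name, a run of two or more dots, then the status.
STATUS_RE = re.compile(
    r"^(?P<step>.*?)\s*\.{2,}\s*\[(?P<status>SUCCESS|FAILED|SKIPPED)\]"
)


def summarize(log_text):
    """Return (step, status) pairs for every result line in the output."""
    results = []
    for line in log_text.splitlines():
        match = STATUS_RE.search(line)
        if match:
            results.append((match.group("step").strip(), match.group("status")))
    return results


def failed_steps(log_text):
    """Return only the names of the steps reported as FAILED."""
    return [step for step, status in summarize(log_text) if status == "FAILED"]
```

Run against the output above, `failed_steps` should report only the WBEM / WMI indications subscription. Lines such as `Starting snmpd: [ OK ]` are ignored because they carry no dotted result marker.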


ICE Linux license is applied

Thanks,
Mitch
Mitchell Kulberg
Valued Contributor

Re: Capturing Image from BL680c G5 Server

You see the stack trace on the console of the managed system when you PXE boot? Correct?

Let's try this:

- Record the MAC address of the managed server

- Shut down the managed node

- Go to the SIM CMS and delete the managed node and its iLO from SIM

- Go to /opt/repository/boot/pxelinux.cfg and delete any files you see there that match the MAC address you recorded in step 1

- Power on the node, have it PXE boot, and watch the console

- It should boot up into the ICE-Linux ramdisk and automatically be rediscovered in SIM

- After about 3-5 minutes, the node should shut down and reboot back into the OS you had running

Please let me know if this works, or if you get the stack trace message when it boots the ramdisk.
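
If it helps, the file cleanup step can be scripted on the CMS. This is only a sketch in Python. It assumes the files follow the usual pxelinux naming, where the MAC address appears in lowercase hex separated by dashes, typically after a `01-` prefix; I have not confirmed that ICE-Linux names them exactly that way, so the match is on the last 12 hex digits of the file name. The function names and the MAC address in the usage note are placeholders.

```python
import re
from pathlib import Path


def mac_digits(mac):
    """Reduce a MAC address to 12 lowercase hex digits, ignoring separators."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def find_pxe_files(cfg_dir, macs):
    """Return files in cfg_dir whose names end with one of the given MACs."""
    wanted = {mac_digits(m) for m in macs}
    matches = []
    for path in sorted(Path(cfg_dir).iterdir()):
        if not path.is_file():
            continue
        # Strip dashes and other non-hex characters from the file name.
        name_digits = re.sub(r"[^0-9a-f]", "", path.name.lower())
        if any(name_digits.endswith(w) for w in wanted):
            matches.append(path)
    return matches


def delete_pxe_files(cfg_dir, macs, dry_run=True):
    """Delete matching files; with dry_run=True only report what would go."""
    names = []
    for path in find_pxe_files(cfg_dir, macs):
        if not dry_run:
            path.unlink()
        names.append(path.name)
    return names
```

For example, `delete_pxe_files("/opt/repository/boot/pxelinux.cfg", ["00:1A:4B:AA:BB:CC"])` only lists the matching files. Pass `dry_run=False` once the list looks right.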

Thanks
Mario Couthino
Frequent Advisor

Re: Capturing Image from BL680c G5 Server

When I powered on the node and rebooted via PXE, it kept going in a never-ending loop. When I typed in maintenance at the prompt, it gave the stack error. One thing I have noticed is that initially, when I tried to capture a Linux image, I was getting an error at step 1; now it is at step 2. The error is below.

Setting one time PXE.
Could not set one time PXE:
Error retrieving BMC for server. Root cause:Could not determine the BMC associated with the server (admelprd1)
in the database. Probably not discovered yet.


Mitchell Kulberg
Valued Contributor

Re: Capturing Image from BL680c G5 Server

This new error you're getting is the result of SIM losing the server to iLO association. So it doesn't know which iLO belongs to its server. Let's not worry about that too much right now. It will go away if we ever get this working.

The fact that you got this error, though, and the fact that you get into the endless loop of PXE booting, makes me think you didn't get the server properly deleted in SIM.

Shut down the managed node.

On the CMS, go to 'all systems', select that server, and select its iLO, and click 'delete' at the bottom of the screen.

Then, log into the CMS and cd to /opt/repository/boot/pxelinux.cfg

In there you will see files with names that look like MAC addresses. You need to delete any files that match any of the MAC addresses on your managed server.

Once those files are deleted, try PXE booting the managed node again and let me know what happens. It should NOT loop, but instead boot into the RAMdisk and discover itself in SIM.

Watch the rediscovery on the console of the managed node. If it works, you should see the system start to reboot after about 2 to 5 minutes. If it stack traces going into the ramdisk, then we need to look elsewhere.
Mario Couthino
Frequent Advisor

Re: Capturing Image from BL680c G5 Server

Followed the process, and it stack traced going into the ramdisk.
Mitchell Kulberg
Valued Contributor

Re: Capturing Image from BL680c G5 Server

That is what I was afraid of...

So there is something special about this system that is causing our ramdisk, which normally boots fine on all supported platforms, to crash on yours.

First - is there any special hardware on this server? Any mezzanine cards or storage blades or accelerators or anything that might be considered a deviation from a standard server? How many CPUs do you have on this node and how many cores in each?

Second - I have to ask. Is your firmware up to date? Our software relies on recent, if not the latest versions of firmware for your server's BIOS and iLO. Consider downloading and booting the latest firmware update CD. It upgrades not just the iLO and BIOS, but just about every other thing in your server.

Third - If that doesn't reveal anything, I'd like you to reset the system BIOS to factory defaults and try again.

Let me know the results of these items.
Thanks,
Mitch
Mario Couthino
Frequent Advisor

Re: Capturing Image from BL680c G5 Server

This is a BL680c with 4 quad-core CPUs. No mezzanine cards or storage blades.

Firmware version is at 8. iLO is at 1.5.

Mitchell Kulberg
Valued Contributor

Re: Capturing Image from BL680c G5 Server

Well - it doesn't get more up to date than that.

Let me see if anyone around here knows anything about this.

Mitch
Mitchell Kulberg
Valued Contributor

Re: Capturing Image from BL680c G5 Server

On a long shot, did you try to reset the BIOS to factory defaults? It's not like this is a known problem or anything like that, but it couldn't hurt.

In the meantime, I am checking to see if we have tested with this particular configuration.