
Eth0 and Eth1 Lose Information after Reboot

ramizkhan
Advisor



Problem:
=========
eth0 and eth1 are bonded and work fine, but after I reboot the OS they do not come up. When I run the command "ifup bond0", it gives me the following messages:



netxen_nic device eth0 doesnt seem to be present, delaying intialization


netxen_nic device eth1 doesnt seem to be present, delaying intialization


Server: HP 580 Proliant

OS information:
==============

[root@at1ohypdb101 admin]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.3 (Tikanga)

[root@at1ohypdb101 admin]# uname -r
2.6.18-128.el5

Network Card :
=============

HP NC522SFP Dual Port 10GbE Server Adapter


I am using the card above. It works perfectly and I have bonded its two ports, but when I reboot, eth0 and eth1 lose this driver, and the RPMs I have installed are gone too.

Any help would be highly appreciated.

Thanks,

15 REPLIES
Elmar P. Kolkman
Honored Contributor

Re: Eth0 and Eth1 Looses Information after Reboot

Check your configuration files...
You might be missing the modules needed to use the cards. Perhaps there is a misconfiguration in modprobe.conf or something like that.
Check the lspci output (to make sure the card is still visible).
An ifconfig -a might also help to find the cause.
I think for this card you need the netxen_nic module.
Every problem has at least one solution. Only some solutions are harder to find.
ramizkhan
Advisor

Re: Eth0 and Eth1 Looses Information after Reboot

This is my modprobe.conf file

==============
alias bond0 bonding
alias eth0 bnx2
alias eth1 bnx2
alias eth2 bnx2
alias eth3 bnx2
alias scsi_hostadapter cciss
alias scsi_hostadapter1 qla2xxx
alias scsi_hostadapter2 usb-storage
###BEGINPP
include /etc/modprobe.conf.pp
###ENDPP
alias eth4 e1000e
==================

Also, the lspci output shows the following:


13:00.0 Ethernet controller: NetXen Incorporated NX3031 Multifunction 1/10 Gigabit Server Adapter (rev 42)

13:00.1 Ethernet controller: NetXen Incorporated NX3031 Multifunction 1/10 Gigabit Server Adapter (rev 42)
[root@at1osoadb101 ~]#


When you say modules, what exactly do you mean?
I did install the driver for this card, and I can see it when I run:

rpm -qa | grep hp

thanks
ramizkhan
Advisor

Re: Eth0 and Eth1 Looses Information after Reboot

Sorry, please disregard my message above.

My modprobe.conf is as follows:

======
[root@at1osoadb101 ~]# cat /etc/modprobe.conf
alias bond0 bonding
alias eth0 netxen_nic
alias eth1 netxen_nic
alias eth2 bnx2
alias eth3 bnx2
alias scsi_hostadapter cciss
alias scsi_hostadapter1 ata_piix
alias scsi_hostadapter2 qla2xxx
alias scsi_hostadapter3 usb-storage
###BEGINPP
include /etc/modprobe.conf.pp
###ENDPP
==========
Steven E. Protter
Exalted Contributor

Re: Eth0 and Eth1 Looses Information after Reboot

Shalom,

cat /etc/redhat-release
uname -a
# please post.

I'm not sure bonding of 10GbE interfaces is even supported at this point.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Modris Bremze
Esteemed Contributor

Re: Eth0 and Eth1 Looses Information after Reboot

Also, the contents of
/etc/sysconfig/network-scripts/ifcfg-bond0
/etc/sysconfig/network-scripts/ifcfg-eth0
/etc/sysconfig/network-scripts/ifcfg-eth1
could be useful.

What exactly did you mean by "the RPMS I have installed , they are gone too."?
Gerardo Arceri
Trusted Contributor

Re: Eth0 and Eth1 Looses Information after Reboot

ramizkhan:
I've been through the same problem. The solution is to disable "udev hotplug" loading of the drivers, so that they are loaded when the network interfaces are started rather than earlier by udev.
Go ahead and create a file called /etc/modprobe.d/networkfix containing:
##### DO NOT AUTOLOAD THESE MODULES
blacklist bnx2x
blacklist e1000e
#####

Reboot and you should get the interfaces working properly.

By the way, I couldn't help but notice that you are bonding eth0 and eth1 together, which is a very bad idea (TM) if redundancy is your objective. I would use one interface on the onboard bnx2x card and the other on the dual e1000e, so that if the onboard card happens to fail it will not bring down the whole bonded interface.

Regards, and I hope this information was helpful. If so, please don't forget to assign points.
ramizkhan
Advisor

Re: Eth0 and Eth1 Looses Information after Reboot

Thanks Gerardo Arceri,

Before creating the file you suggested above, I looked in the following directory and found this:

[root@at1osoadb102 netxen]# cd /etc/modprobe.d/
[root@at1osoadb102 modprobe.d]# ls -l
total 44
-rw-r--r-- 1 root root 831 Jun 30 11:46 blacklist
-rw-r--r-- 1 root root 833 Jan 21 2009 blacklist-compat
-rw-r--r-- 1 root root 83 Jan 21 2009 blacklist-firewire
-rw-r--r-- 1 root root 810 Jun 30 11:46 blacklist.saved
-rw-r--r-- 1 root root 6111 Jan 21 2009 modprobe.conf.dist


When I viewed the "blacklist" file, I found the following entry in it. Do you think this entry is the reason I am losing the driver? Should I go ahead and remove it?

========
blacklist netxen_nic
=======



Also, /etc/modprobe.conf looks like this:
======
alias bond0 bonding
alias eth0 nx_xport
alias eth1 nx_xport
alias eth2 bnx2
alias eth3 bnx2
alias scsi_hostadapter cciss
alias scsi_hostadapter1 ata_piix
alias scsi_hostadapter2 qla2xxx
alias scsi_hostadapter3 usb-storage
###BEGINPP
include /etc/modprobe.conf.pp
###ENDPP
install netxen_nic /bin/true
install nx_xport /sbin/modprobe nx_nic || /sbin/modprobe nx_xport
================
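
To see at a glance which configuration lines keep a given module from loading, a small scan like the one below can help. This is only a sketch: find_module_blocks is a hypothetical helper name, and it assumes that "blacklist MODULE" and "install MODULE /bin/true" (or /bin/false) are the only blocking forms of interest, as in the files shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of any RHEL or HP package): print every line
# in the given modprobe config files that keeps a module from loading.
# Usage: find_module_blocks MODULE FILE...
find_module_blocks() {
    module=$1
    shift
    # Match "blacklist MODULE" and "install MODULE /bin/true" (or /bin/false),
    # allowing leading whitespace. -H prefixes each hit with its file name.
    grep -HE "^[[:space:]]*(blacklist[[:space:]]+${module}|install[[:space:]]+${module}[[:space:]]+/bin/(true|false))[[:space:]]*$" "$@"
}

# Example on the files shown in this thread:
#   find_module_blocks netxen_nic /etc/modprobe.conf /etc/modprobe.d/blacklist
```

With the files posted above, this would list both the blacklist entry and the "install netxen_nic /bin/true" line, since each of them prevents netxen_nic from loading. The function returns a non-zero status when nothing blocks the module.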







Also, about your suggestion to bond a 10Gb port with the onboard Broadcom NIC: the Broadcom card is 1Gb and the HP card is 10Gb. Will that work?


Also, one more thing to note: I just rebooted the server and it came up fine. I can connect to it with PuTTY and the bonding is there now. Weird, right? But after 3 or 4 reboots it will lose everything. I am pasting the log from yesterday, when I was not able to connect with PuTTY. It shows the following messages:

===/var/log/messages/ from Yesterday=====

Jun 30 06:27:23 at1osoadb102 kernel: unm_init_module: Remove the nx_nic driver first
Jun 30 06:27:23 at1osoadb102 modprobe: FATAL: Error inserting nx_xport (/lib/modules/2.6.18-128.el5/extra/hp-nx_nic/nx_xport.ko): Operation not permitted
Jun 30 06:27:23 at1osoadb102 kernel: unm_init_module: Remove the nx_nic driver first
Jun 30 06:27:23 at1osoadb102 modprobe: FATAL: Error inserting nx_xport (/lib/modules/2.6.18-128.el5/extra/hp-nx_nic/nx_xport.ko): Operation not permitted
Jun 30 06:27:23 at1osoadb102 kernel: unm_init_module: Remove the nx_nic driver first
Jun 30 06:27:23 at1osoadb102 modprobe: FATAL: Error inserting nx_xport (/lib/modules/2.6.18-128.el5/extra/hp-nx_nic/nx_xport.ko): Operation not permitted
Jun 30 06:27:23 at1osoadb102 kernel: unm_init_module: Remove the nx_nic driver first
Jun 30 06:27:23 at1osoadb102 modprobe: FATAL: Error inserting nx_xport (/lib/modules/2.6.18-128.el5/extra/hp-nx_nic/nx_xport.ko): Operation not permitted
Jun 30 06:27:23 at1osoadb102 kernel: unm_init_module: Remove the nx_nic driver first
Jun 30 06:27:23 at1osoadb102 modprobe: FATAL: Error inserting nx_xport (/lib/modules/2.6.18-128.el5/extra/hp-nx_nic/nx_xport.ko): Operation not permitted
Jun 30 06:27:23 at1osoadb102 kernel: unm_init_module: Remove the nx_nic driver first
Jun 30 06:27:23 at1osoadb102 modprobe: FATAL: Error inserting nx_xport (/lib/modules/2.6.18-128.el5/extra/hp-nx_nic/nx_xport.ko): Operation not permitted
Jun 30 06:27:23 at1osoadb102 kernel: unm_init_module: Remove the nx_nic driver first
Jun 30 06:27:23 at1osoadb102 modprobe: FATAL: Error inserting nx_xport (/lib/modules/2.6.18-128.el5/extra/hp-nx_nic/nx_xport.ko): Operation not permitted
Jun 30 06:27:23 at1osoadb102 kernel: unm_init_module: Remove the nx_nic driver first
Jun 30 06:27:23 at1osoadb102 modprobe: FATAL: Error inserting nx_xport (/lib/modules/2.6.18-128.el5/extra/hp-nx_nic/nx_xport.ko): Operation not permitted

Jun 30 07:22:06 at1osoadb102 gconfd (root-8067): Resolved address "xml:readwrite:/root/.gconf" to a writable configuration source at position 0
Jun 30 07:23:04 at1osoadb102 system-config-network[8208]: -+ //etc/modprobe.conf eth0 alias nx_nic
Jun 30 07:23:04 at1osoadb102 system-config-network[8208]: -+ //etc/modprobe.conf eth1 alias nx_nic
Jun 30 07:23:04 at1osoadb102 system-config-network[8208]: -+ //etc/modprobe.conf eth2 alias bnx2
Jun 30 07:23:04 at1osoadb102 system-config-network[8208]: -+ //etc/modprobe.conf eth3 alias bnx2
Jun 30 07:23:04 at1osoadb102 system-config-network[8208]: -+ //etc/modprobe.conf bond0 alias bonding
Jun 30 07:23:04 at1osoadb102 system-config-network[8208]: chmod 0644 //etc/sysconfig/networking/devices/ifcfg-eth2
Jun 30 07:23:04 at1osoadb102 system-config-network[8208]: chmod 0644 //etc/sysconfig/networking/devices/ifcfg-bond0
Jun 30 07:23:04 at1osoadb102 nm-system-settings: ifcfg-rh: updating /etc/sysconfig/network-scripts/ifcfg-bond0
Jun 30 07:23:04 at1osoadb102 system-config-network[8208]: chmod 0644 //etc/sysconfig/networking/devices/ifcfg-eth3
Jun 30 07:23:04 at1osoadb102 nm-system-settings: ifcfg-rh: updating /etc/sysconfig/network-scripts/ifcfg-eth3
Jun 30 07:23:04 at1osoadb102 system-config-network[8208]: rm //etc/sysconfig/networking/devices/ifcfg-eth0
Jun 30 07:23:04 at1osoadb102 system-config-network[8208]: rm //etc/sysconfig/network-scripts//ifcfg-eth0
Jun 30 07:23:04 at1osoadb102 system-config-network[8208]: rm //etc/sysconfig/networking/devices/ifcfg-eth1
Jun 30 07:23:04 at1osoadb102 system-config-network[8208]: rm //etc/sysconfig/network-scripts//ifcfg-eth1
Jun 30 07:23:04 at1osoadb102 system-config-network[8208]: rm //etc/sysconfig/networking/profiles/default/ifcfg-eth0
Jun 30 07:23:04 at1osoadb102 system-config-network[8208]: rm //etc/sysconfig/networking/profiles/default/ifcfg-eth1
Jun 30 07:23:05 at1osoadb102 nm-system-settings: ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-eth0 ...
Jun 30 07:23:05 at1osoadb102 nm-system-settings: ifcfg-rh: error: Couldn't parse file '/etc/sysconfig/network-scripts/ifcfg-eth0'
Jun 30 07:23:05 at1osoadb102 nm-system-settings: ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-eth1 ...
Jun 30 07:23:05 at1osoadb102 nm-system-settings: ifcfg-rh: error: Couldn't parse file '/etc/sysconfig/network-scripts/ifcfg-eth1'
Jun 30 07:28:44 at1osoadb102 system-config-network[8276]: -+ //etc/modprobe.conf eth0 alias nx_nic
Jun 30 07:28:44 at1osoadb102 system-config-network[8276]: -+ //etc/modprobe.conf eth1 alias nx_nic
Jun 30 07:28:44 at1osoadb102 system-config-network[8276]: -+ //etc/modprobe.conf eth2 alias bnx2
Jun 30 07:28:44 at1osoadb102 system-config-network[8276]: -+ //etc/modprobe.conf eth3 alias bnx2
Jun 30 07:28:44 at1osoadb102 system-config-network[8276]: -+ //etc/modprobe.conf bond0 alias bonding









ramizkhan
Advisor

Re: Eth0 and Eth1 Looses Information after Reboot

Steven E. Protter,

Yes, it is supported:

[root@at1osoadb102 modprobe.d]# uname -r
2.6.18-128.el5
[root@at1osoadb102 modprobe.d]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.3 (Tikanga)
[root@at1osoadb102 modprobe.d]# uname -a
Linux at1osoadb102.onehreem.com 2.6.18-128.el5 #1 SMP Wed Jan 21 08:45:05 EST 2009 x86_64 x86_64 x86_64 GNU/Linux




You can take a look; this is the card:

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareIndex.jsp?lang=en&cc=us&prodNameId=3913538&prodTypeId=329290&prodSeriesId=3913537&swLang=13&taskId=135&swEnvOID=4004
ramizkhan
Advisor

Re: Eth0 and Eth1 Looses Information after Reboot

I am posting the ifconfig -a output because the server is up now, but I am pretty sure that if I reboot 2 or 3 times the network will go down and it will lose the driver.

===
inet addr:10.92.200.10 Bcast:10.92.200.255 Mask:255.255.255.0
inet6 addr: fe80::f6ce:46ff:feaf:6e00/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:18511 errors:0 dropped:0 overruns:0 frame:0
TX packets:11914 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1431792 (1.3 MiB) TX bytes:23096994 (22.0 MiB)

eth0 Link encap:Ethernet HWaddr F4:CE:46:AF:6E:00
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:17983 errors:0 dropped:0 overruns:0 frame:0
TX packets:11709 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1378030 (1.3 MiB) TX bytes:23071650 (22.0 MiB)
Interrupt:75

eth1 Link encap:Ethernet HWaddr F4:CE:46:AF:6E:00
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:528 errors:0 dropped:0 overruns:0 frame:0
TX packets:207 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:53762 (52.5 KiB) TX bytes:25668 (25.0 KiB)
Interrupt:139

eth2 Link encap:Ethernet HWaddr 18:A9:05:59:2E:66
inet6 addr: fe80::1aa9:5ff:fe59:2e66/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:29 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:5434 (5.3 KiB)
Interrupt:169 Memory:dc000000-dc012100

eth3 Link encap:Ethernet HWaddr 18:A9:05:59:2E:68
inet addr:192.168.1.8 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::1aa9:5ff:fe59:2e68/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:56 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:9768 (9.5 KiB)
Interrupt:177 Memory:de000000-de012100

=====


[root@at1osoadb102 netxen]# /opt/netxen/nxudiag -i eth0
NETXEN Interface is eth0
Interface eth0 is UP.
************* Board Info *************
Board Ser# : CB01BK0937
Chip Rev : B2
Board Type : 0x26 (HP NC522SFP Dual Port 10GbE Server Adapter)
Core clock : 330 MHz
Mem clock : 300 MHz
Crystal freq : 20 MHz
SRE Mode : LEGACY
Firmware in : LEGACY mode
DDR size : 256 MB, ECC Enabled
QDR size : 64 MB, ECC Enabled
Peg ICACHE : OK, Enabled
Peg DCACHE : OK, Enabled
Firmware ver.: 4.0.526
Driver ver. : 4.0.520
MAC Addr 0 : F4:CE:46:AF:6E:00
MAC Addr 1 : F4:CE:46:AF:6E:04
**************************************
Junction Temperature = 38 degrees C , Status = 'NORMAL'
[root@at1osoadb102 netxen]#
Matti_Kurkela
Honored Contributor

Re: Eth0 and Eth1 Looses Information after Reboot

You might want to use these commands to gather information now as the system is working OK:

ethtool -i eth0
ethtool -i eth1
lsmod | grep -e netxen_ -e nx_

Save the output of these commands to a file, e.g. "works.txt".

If the problem reappears, run the commands again, and save the results to another file, e.g. "fails.txt".

Comparing these two files might give a lot of clues about what is going wrong. The appropriate parts of the "dmesg" command output and /var/log/messages in the failing case wouldn't hurt either...
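
The capture-and-compare step could be scripted roughly as follows. This is only a sketch: snapshot_nic_state is a hypothetical helper name, and the interface names eth0 and eth1 are taken from this thread.

```shell
#!/bin/sh
# Sketch: save the driver state of the bonded NICs to a file for later diffing.
# snapshot_nic_state is a hypothetical helper name; the commands it runs are
# the ones suggested above.
snapshot_nic_state() {
    outfile=$1
    {
        for nic in eth0 eth1; do
            echo "== ethtool -i $nic =="
            # "|| true" keeps going if ethtool fails, e.g. when the interface
            # is missing; its error message still lands in the file.
            ethtool -i "$nic" 2>&1 || true
        done
        echo "== lsmod | grep -e netxen_ -e nx_ =="
        # grep exits non-zero when no module matches; that is not an error here.
        lsmod 2>&1 | grep -e netxen_ -e nx_ || true
    } > "$outfile"
}

# While the network works:   snapshot_nic_state works.txt
# After a failed boot:       snapshot_nic_state fails.txt
# Then compare the two:      diff -u works.txt fails.txt
```

Because a failing ethtool call is written into the snapshot instead of aborting it, the "fails.txt" file stays useful even when eth0 or eth1 has disappeared.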

Based on what I've seen so far, I might *guess* the following:

- The driver for the 10GbE network adapter is apparently built up from several kernel modules, not just one. This is common for more complex hardware.

- Your system seems to have two sets of drivers for your 10GbE network adapter: the netxen_* set of modules, and the nx_* set. If the system tries to mix modules from the two sets, it will probably fail: the system should be loading modules exclusively from either the netxen_* set or the nx_* set.

As far as I can determine, the netxen_* set is the standard version included in the RHEL 5 distribution, and the nx_* set is provided by the HP driver RPM.
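
A quick way to check which of the two sets is actually loaded is to classify the lsmod output by module-name prefix. The following is a rough sketch; classify_nic_modules is a hypothetical helper name, and the prefix-to-origin mapping simply follows the guess above (netxen_* from RHEL, nx_* from the HP RPM).

```shell
#!/bin/sh
# Sketch: read lsmod-style output on stdin and report which driver family
# is loaded. classify_nic_modules is a hypothetical helper name.
classify_nic_modules() {
    awk '
        $1 ~ /^netxen_/ { rhel = 1 }   # in-box RHEL modules, e.g. netxen_nic
        $1 ~ /^nx_/     { hp = 1 }     # HP driver RPM modules, e.g. nx_nic, nx_xport
        END {
            if (rhel && hp) print "mixed"
            else if (rhel)  print "netxen"
            else if (hp)    print "nx"
            else            print "none"
        }'
}

# On a live system: lsmod | classify_nic_modules
```

A result of "mixed" or "none" after a failed boot would point toward the driver-set conflict described here; "netxen" or "nx" alone suggests the module side is consistent and the problem lies elsewhere.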

- The automatic configuration tools (like kudzu in RHEL) might have a built-in preference for the drivers included in the RHEL distribution, even though the HP-provided drivers might be better.

The "preference" might also be an accidental effect, caused by the loading order of things:
- When you install an updated kernel RPM, the modules in the HP driver RPM will need to be recompiled to match the updated kernel. Fortunately, the driver RPM probably includes a script that will do this automatically as necessary while the system is booting. But...

- If kudzu runs in the boot-up sequence *before* the module-recompilation script, it might "think" that your current NIC driver configuration is wrong (because the correct set of NIC drivers has not been recompiled yet), and start adjusting it... causing the configuration to break.

- Once the modules have been successfully recompiled and the sysadmin has fixed the configuration, the system will again be able to reboot without issues... until the next kernel upgrade happens.

Kudzu tries to add a bit of Artificial Intelligence to the hardware configuration of RHEL, but I've found it sometimes turns into Artificial Stupidity instead :)

If you want to use the HP-provided drivers, you may have to disable kudzu to stop it from changing the configuration on its own:

chkconfig kudzu off

A more appropriate fix might be to tweak the start-up order of kudzu vs. the HP RPM recompilation script, but I would want to know more about the situation before doing that.

MK
Ishwar_1
Frequent Advisor

Re: Eth0 and Eth1 Looses Information after Reboot


There are two possible ways to retain the same IP address after a reboot:

1> The DHCP server should have an entry binding the server's MAC address to a static IP.

2> Make a static entry for the IP address in the files below:
/etc/sysconfig/network-scripts/ifcfg-eth0
/etc/sysconfig/network-scripts/ifcfg-eth1
Or
you can use the neat-tui command for a text-mode interface.

Regards
Ishwar
Elmar P. Kolkman
Honored Contributor

Re: Eth0 and Eth1 Looses Information after Reboot

You are missing the bonding lines in your modprobe.conf...
At least, we needed them to make sure bonding was working, so I think your system has the same problem.

The lines we inserted:

alias bond0 bonding
alias bond1 bonding
alias bond2 bonding
alias bond3 bonding
options bonding miimon=100 max_bonds=4
options bond0 miimon=100 mode=4

The last one was needed to have that bond working in an active/active setup, in combination with the Cisco setup on the switches. Your mode can be different.
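
For completeness, the interface files that go with those modprobe.conf lines might look roughly like this on RHEL 5. This is a sketch under assumptions, not a tested configuration: the IP address and netmask are copied from the ifconfig output earlier in this thread, mode=4 only suits a switch set up for it as described above, and BONDING_OPTS is assumed to be supported by the installed initscripts (if it is not, keep the "options" line in modprobe.conf instead).

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0 (sketch)
DEVICE=bond0
# Address and netmask as shown for the bond in this thread
IPADDR=10.92.200.10
NETMASK=255.255.255.0
BOOTPROTO=none
ONBOOT=yes
USERCTL=no
# Assumption: per-bond options set here rather than in modprobe.conf
BONDING_OPTS="mode=4 miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for eth1 with DEVICE=eth1)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes
USERCTL=no
```

The slave files carry no IP address of their own; MASTER and SLAVE are what tie eth0 and eth1 to bond0 when "ifup bond0" runs.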

As for the blacklist: that has to do with the Proliant Support Pack containing HP's version of the drivers and RedHat linux delivering its own version. Try to find out which version is newest and use that one.

And bonding with this card works fine, as long as the card is not in the wrong place in some HP servers. We had the card working until it got way too hot and shut itself down. We needed to upgrade the firmware of the card and change a BIOS setting. This happened on both of the servers we had with this card at that time.
Every problem has at least one solution. Only some solutions are harder to find.
ramizkhan
Advisor

Re: Eth0 and Eth1 Looses Information after Reboot

Thanks Matti Kurkela and Gerardo Arceri .

I have solved the issue. Here is what I did:

Turned off kudzu.
Removed the driver and installed the updated 5.26 version from the HP site. RHEL calls the driver NETXEN_NIC and HP calls it NX_XPORT, so the two were conflicting at OS reboot, and that is what was causing the issue.

ethtool -i also verifies that.
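
To check the bound driver in a script rather than by eye, the "driver:" field of the ethtool -i output can be pulled out as in the sketch below. driver_of is a hypothetical helper name, and the only assumption about the output format is a line of the form "driver: NAME".

```shell
#!/bin/sh
# Sketch: extract the driver name from "ethtool -i" output read on stdin.
# driver_of is a hypothetical helper name.
driver_of() {
    # Split each line on the first ": " and print the value of the driver field.
    awk -F': *' '$1 == "driver" { print $2; exit }'
}

# On a live system: ethtool -i eth0 | driver_of
```

After each reboot, the name printed for eth0 and eth1 should be the same driver every time; a change between boots would indicate that the two driver sets are still competing.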

I also created the networkfix file with the same contents Gerardo gave above.

Also, in the blacklist file I have seen
blacklist netxen_nic, which we need, so this is the proper entry.

The other Linux admin and I totally agree with what Matti said above.

I really appreciate your help MATTI.


One more thing I would like to share: in the RHEL 5.4 release I don't even need the updated HP driver, and NETXEN_NIC works perfectly fine. We are on 5.3, so I had to update the HP driver. RHEL 5.4 probably has a fix in it, which is why it works there.

thanks
Steven E. Protter
Exalted Contributor

Re: Eth0 and Eth1 Looses Information after Reboot

Shalom,

Turning off Kudzu may mask a problem.

When Kudzu continually detects a change in networking, this is a sign of a potentially serious hardware problem.

I would definitely boot the server into diagnostic mode and do a full hardware diagnostic.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Sameerna Desai
Occasional Visitor

Re: Eth0 and Eth1 Looses Information after Reboot

Hi,
I have a similar issue on OEL 5.5. I am using the HP hp-nx_nic-4.0.534-2 driver, have disabled kudzu, and have added nx_nic entries to the modprobe.conf file. The Ethernet card model is HP NC522SFP, used for the RAC interconnect. The onboard cards are also disabled.

ethtool -i eth8
driver: nx_nic
version: 4.0.534
firmware-version: 4.0.534
bus-info: 0000:21:00.0

ethtool -i eth0
driver: e1000e
version: 1.0.2-k3
firmware-version: 5.12-2
bus-info: 0000:15:00.0


cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth10
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth10
MII Status: up
Link Failure Count: 0
Permanent HW addr: d8:d3:85:a1:d0:a8

Slave Interface: eth8
MII Status: up
Link Failure Count: 0
Permanent HW addr: d8:d3:85:a0:ee:d0