11-04-2009 05:27 PM
ICE-Linux mond issues with mdadm
Nov 4 14:56:58 usorl03p307 mdadm: DeviceDisappeared /dev/md0
Nov 4 14:56:58 usorl03p307 mdadm: DeviceDisappeared /dev/md2
Nov 4 14:56:58 usorl03p307 mdadm: DeviceDisappeared /dev/md1
Nov 4 14:56:59 usorl03p307 mdadm: DeviceDisappeared /dev/md0
Stopping mond stops the messages.
/etc/init.d/mond stop
11-05-2009 11:57 AM
Re: ICE-Linux mond issues with mdadm
These critical alerts are associated with the "Syslog Alerts" Service, correct?
I'd like to see if I can reproduce this. Which version of RHEL 5 is installed on your managed nodes (e.g. 32-bit or 64-bit; update 1 or 2)?
If you're not interested in seeing these mdadm critical alerts, you should be able to stop them by modifying the /opt/hptc/nagios/etc/syslogAlertRules file.
Try this and let me know if the alerts stop.
Edit syslogAlertRules (make a backup copy first) and change the mdadm rule to look as follows (i.e. add DeviceDisappeared to the list of mdadm events to ignore).
rule mdadm_errors {
name (! /(NewArray)|(SparesMissing) (DeviceDisappeared)/)
relevance ($subsystem =~ /mdadm/)
format "$timestamp $message"
}
Thanks,
Donna
11-05-2009 01:02 PM
Re: ICE-Linux mond issues with mdadm
The nodes are running RHEL 5.4 x86_64 on BL495G5 blades in a C7000 chassis.
I have also been working with Mitch on other issues, but not this one.
We are interested in seeing valid mdadm alerts, but these are not valid, and they start after mond is started.
I will make your suggested changes and report back.
11-05-2009 01:12 PM
Re: ICE-Linux mond issues with mdadm
Maybe it should be: (SparesMissing)|(DeviceDisappeared)/)
11-05-2009 01:58 PM
Re: ICE-Linux mond issues with mdadm
rule mdadm_errors {
name (! /(NewArray)|(SparesMissing)|(DeviceDisappeared)/)
relevance ($subsystem =~ /mdadm/)
format "$timestamp $message"
}
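The effect of the missing "|" can be checked outside Nagios. The following is a minimal sketch, assuming the rule pattern behaves like an ordinary extended regular expression (how syslogAlertRules actually evaluates it is not shown in this thread). is_ignored is a hypothetical helper, not part of ICE-Linux.

```shell
#!/bin/sh
# Hypothetical helper: prints "ignored" if the event text matches the
# pattern, "alert" otherwise. Assumes ERE semantics for the rule regex.
is_ignored() {
    pattern=$1
    event=$2
    if printf '%s\n' "$event" | grep -Eq "$pattern"; then
        echo ignored
    else
        echo alert
    fi
}

# Pattern as first posted (no "|" before DeviceDisappeared).
OLD='(NewArray)|(SparesMissing) (DeviceDisappeared)'
# Corrected pattern.
NEW='(NewArray)|(SparesMissing)|(DeviceDisappeared)'

is_ignored "$OLD" 'DeviceDisappeared /dev/md0'   # prints: alert
is_ignored "$NEW" 'DeviceDisappeared /dev/md0'   # prints: ignored
```

Without the "|", the second alternative only matches the literal text "SparesMissing DeviceDisappeared", so a lone DeviceDisappeared event still raises an alert.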
Donna
11-05-2009 02:03 PM
Re: ICE-Linux mond issues with mdadm
Donna
11-06-2009 04:52 AM
Re: ICE-Linux mond issues with mdadm
mond -> /opt/hptc/supermon/etc/init.d/mond-setup
With mond stopped, no more messages are generated in /var/log, so something that ICE-Linux (supermon) is doing is causing the messages in the first place.
We need to find the root cause of the messages.
I can provide you a virtual room connection if it would help.
11-06-2009 06:46 AM
Re: ICE-Linux mond issues with mdadm
On the CMS, vi /opt/hptc/nagios/etc/nagios_vars.ini. In this file you will see mdadminfo and MDADMCOLLECTIONPERIOD.
MDADMCOLLECTIONPERIOD is set to 15 minutes, which means that on the target nodes supermon calls /opt/hptc/mdadm/sbin/getMdadmEvents every 15 minutes. You can change this collection period to anything you like.
If you log in to one of your target nodes, you can look at /opt/hptc/mdadm/sbin/getMdadmEvents, which calls mdadm-handler. mdadm-handler sends all messages returned by /sbin/mdadm to syslog.
We recently fixed an issue in our next IC-Linux release (V6.0) where this script was failing because it was being run as nagios and not root, so I'm wondering if you're hitting that issue.
Can you run a test for me? On the target node, run /opt/hptc/mdadm/sbin/getMdadmEvents as root, tail /var/log/messages, and let me know what you see.
Then log in as nagios (su - nagios), run getMdadmEvents again, and let me know what you see in /var/log/messages.
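One way to keep the output of the two runs apart is to note the length of the log before each run and then look only at the lines added afterwards. This is a minimal sketch; line_count and new_mdadm_lines are hypothetical helpers, not part of ICE-Linux.

```shell
#!/bin/sh
# Hypothetical helper: current number of lines in a file.
line_count() {
    wc -l < "$1" | tr -d ' '
}

# Hypothetical helper: print the mdadm lines that were appended to a log
# file after a given line count.
#   $1 = log file, $2 = number of lines the file had before the test run
new_mdadm_lines() {
    logfile=$1
    before=$2
    # tail -n +K starts output at line K, so this skips the first $before
    # lines. "|| true" keeps the exit status zero when nothing matches.
    tail -n "+$((before + 1))" "$logfile" | grep 'mdadm:' || true
}
```

For each user, store the result of line_count /var/log/messages in a variable, run getMdadmEvents, then pass the log path and the stored count to new_mdadm_lines. Syslog may take a moment to write the new entries.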
Regarding the DeviceDisappeared event, do you think that /sbin/mdadm is incorrectly reporting this error? Or has the device really disappeared?
One workaround I can think of is to modify mdadm-handler to check for the DeviceDisappeared event and not call syslog.
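That workaround might look like the sketch below. It assumes mdadm-handler is invoked the way mdadm's --program option documents it: the event name as the first argument, the md device as the second, and an optional component device as the third. The shipped mdadm-handler is not shown in this thread, so handle_event, LOGGER_CMD, and the use of logger are illustrative assumptions rather than the real script.

```shell
#!/bin/sh
# Illustrative stand-in for mdadm-handler (the shipped script is not shown
# in this thread). mdadm --program passes: event, md device, [component].
handle_event() {
    event=$1
    md_device=$2
    component=${3:-}

    # Drop the event that is being reported spuriously.
    if [ "$event" = "DeviceDisappeared" ]; then
        return 0
    fi

    # Build the message in the same "event device" form seen in the logs.
    msg="$event $md_device"
    if [ -n "$component" ]; then
        msg="$msg $component"
    fi

    # LOGGER_CMD is a hypothetical override so the function can be tried
    # without writing to syslog; by default it calls logger(1).
    ${LOGGER_CMD:-logger -t mdadm} "$msg"
}
```

Note that a filter like this also hides a genuine DeviceDisappeared event, so it only makes sense as a stopgap until the cause of the bogus events is found.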
Donna
11-06-2009 10:54 AM
Re: ICE-Linux mond issues with mdadm
When run as nagios, getMdadmEvents generates the following each time:
Nov 6 13:45:53 usorl03p309 mdadm: DeviceDisappeared /dev/md1
Nov 6 13:45:53 usorl03p309 mdadm: DeviceDisappeared /dev/md0
Nov 6 13:45:59 usorl03p309 mdadm: DeviceDisappeared /dev/md2
Nov 6 13:45:59 usorl03p309 mdadm: DeviceDisappeared /dev/md1
Nov 6 13:45:59 usorl03p309 mdadm: DeviceDisappeared /dev/md0
I believe the messages are bogus and the devices are NOT disappearing.
dave
11-06-2009 12:15 PM
Re: ICE-Linux mond issues with mdadm
I realize that's a lot of questions, but I'm just trying to figure out why mdadm would be reporting the error.
11-06-2009 12:45 PM
Re: ICE-Linux mond issues with mdadm
mdadm.conf
# mdadm.conf written out by anaconda
DEVICE partitions
MAILADDR root
ARRAY /dev/md0 level=raid1 num-devices=2 uuid=aa4f5616:1f85a679:04e92872:8cb15fe7
ARRAY /dev/md1 level=raid1 num-devices=2 uuid=6787038e:e6c35d9c:fa5a0916:9729dd5f
ARRAY /dev/md2 level=raid1 num-devices=2 uuid=c90d94d7:2f54ad8e:74248664:92872716
dave
11-06-2009 03:00 PM
Re: ICE-Linux mond issues with mdadm
11-07-2009 06:16 PM
Re: ICE-Linux mond issues with mdadm
usorl03p309 ~ -1277> cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
208704 blocks [2/2] [UU]
md1 : active raid1 sdb2[1] sda2[0]
12586816 blocks [2/2] [UU]
md2 : active raid1 sdb3[1] sda3[0]
49721088 blocks [2/2] [UU]
unused devices: <none>
usorl03p309 ~ -1278>
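The same check can be scripted across nodes. A minimal sketch, assuming the usual /proc/mdstat layout in which the line containing "blocks" ends with a status string such as [UU]; degraded_arrays is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: list md arrays whose member status string (for
# example [UU]) contains "_", which marks a missing or failed member.
# Reads /proc/mdstat-style text from the file given as $1.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }      # remember the current array
        /blocks/ && name != "" {
            status = $NF                 # last field, e.g. [UU] or [U_]
            if (status ~ /_/) print name
            name = ""
        }
    ' "$1"
}
```

Running degraded_arrays /proc/mdstat prints nothing for the output above, since all three arrays show [UU].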
dave
11-09-2009 05:53 AM
Re: ICE-Linux mond issues with mdadm
I'm curious: can you reproduce this error on any servers other than this one? Any chance you've got USB devices on this server?
It's a long shot, but I've had questionable USB devices do that for real.
Thanks,
Mitch
11-10-2009 01:42 PM
Re: ICE-Linux mond issues with mdadm
After further investigation, it looks like this bogus DeviceDisappeared event occurs because we are running mdadm as the nagios user. We changed mond (which calls getMdadmEvents) to run as nagios instead of root for security purposes, but when we made that change we forgot to update the mdadm call to use sudo. So there's a defect in V2.11: we should be using "sudo /sbin/mdadm" inside getMdadmEvents.
This defect is fixed in the next IC-Linux release (V6.0), which should be available in January 2010.
Do you know if Siemens is planning to move to V6.0 when it becomes available?
In the interim, you could manually work around this issue by making the following changes on every managed system. This is the exact same fix that will be available in our V6.0 release.
1) Add the following line to /etc/sudoers on every managed system.
nagios ALL = NOPASSWD: /sbin/mdadm
And
2) Add "sudo" to the following line in /opt/hptc/mdadm/sbin/getMdadmEvents
`/usr/bin/sudo /sbin/mdadm --monitor --scan --program=/opt/hptc/mdadm/sbin/mdadm-handler --oneshot`;
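With many managed systems, it may help to confirm that step 1 was applied on each one. A minimal sketch, assuming the entry is written on a single line as shown above; has_nopasswd_entry is a hypothetical helper and only does a text match, it does not evaluate sudoers rules the way sudo does.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the sudoers-style file $1 contains a
# NOPASSWD entry for user $2 that names command $3.
has_nopasswd_entry() {
    file=$1
    user=$2
    cmd=$3
    # Match "<user> <hosts> = NOPASSWD: ... <cmd>" with flexible spacing.
    # The command must be preceded by whitespace or a comma and followed
    # by whitespace, a comma, or end of line.
    grep -Eq "^[[:space:]]*${user}[[:space:]]+[^=]+=[[:space:]]*NOPASSWD:(.*[[:space:],])?${cmd}([[:space:],]|\$)" "$file"
}
```

For example, has_nopasswd_entry /etc/sudoers nagios /sbin/mdadm returns success once the line is in place (reading /etc/sudoers requires root). Editing the file with visudo, or running visudo -c afterwards, checks the syntax before sudo next reads it.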
Let me know if this helps.
Thanks,
Donna
11-10-2009 04:52 PM
Re: ICE-Linux mond issues with mdadm
I'll give your suggestions a try and report back to you.
dave
11-10-2009 06:04 PM
Re: ICE-Linux mond issues with mdadm
Ready for another one? Something is trying to open /dev/mcelog at 15-minute intervals and getting permission denied.
Nov 10 20:28:27 usorl03p309 mcelog: Cannot open /dev/mcelog
Nov 10 20:43:26 usorl03p309 mcelog: Cannot open /dev/mem for DMI decoding: Permission denied
Nov 10 20:43:26 usorl03p309 mcelog: Cannot open /dev/mcelog
Nov 10 20:58:27 usorl03p309 mcelog: Cannot open /dev/mem for DMI decoding: Permission denied
dave
11-10-2009 07:31 PM
Re: ICE-Linux mond issues with mdadm
The mcelog event is the exact same issue, so you need to apply the same workaround.
1) Add /usr/sbin/mcelog to /etc/sudoers (a NOPASSWD entry for nagios, as for /sbin/mdadm), and
2) Add /usr/bin/sudo to the following line in /opt/hptc/mcelog/sbin/getMcelogEvents.
e.g.
`/usr/bin/sudo /usr/sbin/mcelog --syslog`;
These were the only two sudo issues fixed for V6.0, so you should be all set now.
Donna
11-11-2009 05:06 AM
Re: ICE-Linux mond issues with mdadm
I applied the changes for mcelog as well.
The last issue I'm working on with Mitch so far is that the wrong system name is picked up when multiple IPs are plumbed on the same NIC. Mitch should have all the details, but maybe I'll open a new forum thread for this one as well.
Thanks for your support.
dave
11-11-2009 06:02 AM
Re: ICE-Linux mond issues with mdadm
Mitch described the NIC/hostname issue to me. I'm going to try and reproduce it and will let you know what I find.
Donna
11-12-2009 06:23 AM
Re: ICE-Linux mond issues with mdadm
I defined multiple NICs on managed system pluto as shown below, and after I discovered it with SIM, I correctly see the one IP address for eth0 and the host name pluto in SIM.
Is this configuration similar to your multi-NIC configuration? Please open a new forum entry for this discussion.
[root@poseidon image]# mxnode -ld pluto
System name: pluto
Host name: pluto.usa.hp.com
IP addresses: 16.118.197.34
OS name: LINUX
[root@pluto ~]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:16:35:C6:C8:F6
inet addr:16.118.197.34 Bcast:16.118.207.255 Mask:255.255.240.0
inet6 addr: fe80::216:35ff:fec6:c8f6/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:6602599 errors:0 dropped:0 overruns:0 frame:0
TX packets:120564 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:651843140 (621.6 MiB) TX bytes:17638634 (16.8 MiB)
Interrupt:169 Memory:f6000000-f6012800
eth0:0 Link encap:Ethernet HWaddr 00:16:35:C6:C8:F6
inet addr:16.118.197.163 Bcast:16.255.255.255 Mask:255.0.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:169 Memory:f6000000-f6012800
eth0:1 Link encap:Ethernet HWaddr 00:16:35:C6:C8:F6
inet addr:16.118.198.249 Bcast:16.255.255.255 Mask:255.0.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:169 Memory:f6000000-f6012800
eth0:2 Link encap:Ethernet HWaddr 00:16:35:C6:C8:F6
inet addr:16.118.199.254 Bcast:16.255.255.255 Mask:255.0.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:169 Memory:f6000000-f6012800
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:613664 errors:0 dropped:0 overruns:0 frame:0
TX packets:613664 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:238774063 (227.7 MiB) TX bytes:238774063 (227.7 MiB)
Thanks,
Donna