06-16-2014 12:23 PM
HP MSA 2040 - Dual Controller iSCSI - Disappeared. Critical Error: OSMEnterDebugger
I purchased a brand-new HP MSA 2040 Dual Controller about 30 days ago. It was configured in minutes and worked great. This morning at 4:40 AM it completely disappeared.
My apologies for the long-winded post.
Info:
-Dual Controller
-24 X 900GB Dual Port Enterprise Drives
-Used with VMware
-2 ESXi hosts, each with 2 DAC connections (each host has 1 connection to controller A and 1 connection to controller B)
-Configured per HP's best-practices document for the MSA 2040 with vSphere
-Round Robin enabled, multiple links active (each host has 1 connection to each controller)
-2 vDisks: controller A owns disks 1-12, controller B owns disks 13-24
-Different subnets used on each link
-4:40 AM: errors and warnings start to appear in the event log (I discovered this later when reviewing it)
-6:00 AM: I wake up and notice the storage is down; the servers are OK
-Check the physical SAN: fans are at 100%, amber health LED is illuminated (degraded health)
-Check the controllers: NICs are down on controller A and up on controller B; amber health LED flashing on both controllers
-Try to log in to the web interface: the page loads but states it's unavailable, and I cannot log in
-SSH in: login succeeds, but every command gets a reply of:
Error: The MC is not ready. Wait a few seconds then retry the request. (2014-06-16 06:32:00)
After trying everything, I still couldn't do anything. I unplugged the power cables, waited a couple of seconds, and plugged them back in.
-The unit came up with the amber LED illuminated. Within 15 minutes this cleared by itself (judging by the logs, I think it recovered and wrote back the flash).
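The MC's reply asks the caller to wait a few seconds and retry. If you script against the CLI, a minimal Python sketch of that retry loop might look like the following. `run_command` is a hypothetical callable (for example, a wrapper around your own SSH session) and the command string is only an example. Note that in the situation above no amount of retrying helped; only a power cycle did.

```python
import time

# Substring of the reply quoted above; assumed to be stable across firmware versions.
MC_NOT_READY = "The MC is not ready"


def run_with_mc_retry(run_command, command, attempts=5, delay_s=5.0, sleep=time.sleep):
    """Run `command` via `run_command`, retrying while the reply says the MC is not ready.

    `run_command` is a hypothetical callable that takes a command string and
    returns the CLI's text output. `sleep` is injectable so the loop can be tested.
    """
    output = ""
    for attempt in range(1, attempts + 1):
        output = run_command(command)
        if MC_NOT_READY not in output:
            return output
        if attempt < attempts:
            # Linear backoff: wait a little longer after each "not ready" reply.
            sleep(delay_s * attempt)
    raise TimeoutError(f"MC still not ready after {attempts} attempts: {output.strip()}")
```

Injecting `sleep` keeps the helper testable without real delays; in practice you would leave it as `time.sleep`.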
I extracted these logs with the "save log" feature. They are very similar to the logs shown in the management interface. I'll attach a screenshot of the actual logs from the web interface.
B440 2014-06-16 04:38:03 194 INFORMATIONAL Auto-write-through trigger event: partner processor is not up.
B441 2014-06-16 04:38:03 71 INFORMATIONAL Failover started. (failed or shutdown controller: A)
B442 2014-06-16 04:38:05 19 INFORMATIONAL A rescan-bus operation was done. (number of disks that were found: 24, number of enclosures that were found: 1) (rescan reason: initiated by internal logic, rescan reason code: 2)
B443 2014-06-16 04:38:05 77 INFORMATIONAL Write-back cache was initialized for controller A. Write-back data was found.
B444 2014-06-16 04:40:07 107 ERROR Critical Error: OSMEnterDebugger p1: 0x03259E6, p2: 0x0325E43, p3: 0x03268AD, p4: 0x0326DCB CThr: IcMsgMon, DbgRegNum=255
B445 2014-06-16 06:37:19 56 INFORMATIONAL Storage Controller booted up (cold boot - power up). SC firmware version: GLS105R04-01
B446 2014-06-16 06:37:52 84 WARNING Killed partner controller. (reason: Non volatile device flush or restore failure)
B447 2014-06-16 06:37:52 204 INFORMATIONAL The system has come up normally and the NV device is in a normal expected state. (p1: 0x0, p2: 0x2F, p3: 0x0, p4: 0x0)
B448 2014-06-16 06:38:23 204 INFORMATIONAL The system has come up normally and the NV device is in a normal expected state. (p1: 0x0, p2: 0x30, p3: 0x0, p4: 0x0)
B449 2014-06-16 06:38:27 211 INFORMATIONAL The SAS topology changed (components were added or removed). (Channel: 0, number of elements: 95, expanders: 1, native levels: 1, partner levels: 0, device PHYs: 25)
B450 2014-06-16 06:38:33 112 INFORMATIONAL Host link down. (port: 1)
B451 2014-06-16 06:38:34 112 INFORMATIONAL Host link down. (port: 2)
B452 2014-06-16 06:38:34 112 INFORMATIONAL Host link down. (port: 3)
B453 2014-06-16 06:38:34 112 INFORMATIONAL Host link down. (port: 4)
B454 2014-06-16 06:38:34 310 INFORMATIONAL Discovery and initialization of enclosure data was completed following a rescan.
B455 2014-06-16 06:38:35 188 INFORMATIONAL Write-back cache was disabled.
B456 2014-06-16 06:38:35 190 INFORMATIONAL Auto-write-through trigger event: supercapacitor charging.
B457 2014-06-16 06:38:35 310 INFORMATIONAL Discovery and initialization of enclosure data was completed following a rescan.
B458 2014-06-16 06:38:35 77 INFORMATIONAL Write-back cache was initialized for controller B. Write-back data was found.
B459 2014-06-16 06:38:41 77 INFORMATIONAL Write-back cache was initialized for controller A. Write-back data was found.
B460 2014-06-16 06:38:50 19 INFORMATIONAL A rescan-bus operation was done. (number of disks that were found: 24, number of enclosures that were found: 1) (rescan reason: initiated by internal logic, rescan reason code: 27)
B461 2014-06-16 06:39:01 202 INFORMATIONAL Auto-write-through: Write-back cache was reenabled.
B462 2014-06-16 06:39:01 191 INFORMATIONAL Auto-write-through trigger event: supercapacitor good.
B463 2014-06-16 06:39:49 81 INFORMATIONAL Kill was released (that is, the partner controller was allowed to boot up), automatic.
B464 2014-06-16 06:39:54 211 INFORMATIONAL The SAS topology changed (components were added or removed). (Channel: 1, number of elements: 91, expanders: 1, native levels: 0, partner levels: 1, device PHYs: 25)
B465 2014-06-16 06:39:54 310 INFORMATIONAL Discovery and initialization of enclosure data was completed following a rescan.
B466 2014-06-16 06:39:56 19 INFORMATIONAL A rescan-bus operation was done. (number of disks that were found: 24, number of enclosures that were found: 1) (rescan reason: initiated by internal logic, rescan reason code: 27)
B467 2014-06-16 06:40:26 363 INFORMATIONAL Firmware versions match those in the firmware bundle. (controller: B)
B468 2014-06-16 06:40:26 181 INFORMATIONAL Management Controller configuration parameters were set.
B469 2014-06-16 06:40:26 181 INFORMATIONAL Management Controller configuration parameters were set.
B470 2014-06-16 06:40:26 141 INFORMATIONAL The Management Controller IP address changed. (new IP address: IP: 10.127.32.16/255.255.255.0/10.127.32.5)
B471 2014-06-16 06:40:26 139 INFORMATIONAL The Management Controller booted up. MC firmware version: GLM105R009-01 (baselevel: L100)
B472 2014-06-16 06:40:28 111 INFORMATIONAL Host link up. (port: 3, speed: 10 Gbps)
B473 2014-06-16 06:40:31 111 INFORMATIONAL Host link up. (port: 4, speed: 10 Gbps)
B474 2014-06-16 06:40:32 112 WARNING Host link down. (port: 3)
B475 2014-06-16 06:40:33 111 INFORMATIONAL Host link up. (port: 3, speed: 10 Gbps)
B476 2014-06-16 06:40:35 112 WARNING Host link down. (port: 4)
B477 2014-06-16 06:40:36 111 INFORMATIONAL Host link up. (port: 4, speed: 10 Gbps)
B478 2014-06-16 06:41:26 310 INFORMATIONAL Discovery and initialization of enclosure data was completed following a rescan.
B479 2014-06-16 06:41:31 195 INFORMATIONAL Auto-write-through trigger event: partner processor is up.
B480 2014-06-16 06:41:31 73 INFORMATIONAL Heartbeat was detected from the partner controller. This indicates that the partner controller is operational.
B481 2014-06-16 06:41:31 72 INFORMATIONAL Recovery was initiated for controller A.
B482 2014-06-16 06:41:49 19 INFORMATIONAL A rescan-bus operation was done. (number of disks that were found: 24, number of enclosures that were found: 1) (rescan reason: initiated by internal logic, rescan reason code: 27)
B483 2014-06-16 06:41:53 19 INFORMATIONAL A rescan-bus operation was done. (number of disks that were found: 24, number of enclosures that were found: 1) (rescan reason: initiated by internal logic, rescan reason code: 6)
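For anyone who wants to sift a saved event log like this rather than eyeball it, here is a minimal Python (3.9+) sketch that parses one entry per line and keeps the entries that are not INFORMATIONAL. The column layout it assumes (entry ID, date, time, event code, severity, message) is inferred from the excerpt above, not taken from HP documentation, and the function names are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Assumed layout of one saved event-log entry, inferred from the excerpt above:
# <entry id> <date> <time> <event code> <severity> <message>
EVENT_RE = re.compile(
    r"^(?P<entry>B\d+)\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<code>\d+)\s+"
    r"(?P<severity>[A-Z]+)\s+"
    r"(?P<message>.*)$"
)


@dataclass
class Event:
    entry: str
    timestamp: datetime
    code: int
    severity: str
    message: str


def parse_events(text: str) -> list[Event]:
    """Parse one-entry-per-line event log text; lines that do not match are skipped."""
    events = []
    for line in text.splitlines():
        m = EVENT_RE.match(line.strip())
        if m is None:
            continue
        events.append(Event(
            entry=m["entry"],
            timestamp=datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S"),
            code=int(m["code"]),
            severity=m["severity"],
            message=m["message"],
        ))
    return events


def notable(events: list[Event]) -> list[Event]:
    """Keep only entries whose severity is not INFORMATIONAL (e.g. WARNING, ERROR)."""
    return [e for e in events if e.severity != "INFORMATIONAL"]
```

Run against the excerpt above, the non-INFORMATIONAL entries are B444 (ERROR) and B446, B474 and B476 (WARNING). Subtracting the timestamps of B444 and B445 gives 1:57:12 between the critical error and the cold boot.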
In controller B's kernel log, I noticed a lot of these:
Jun 16 05:29:17 (none) user.warn kernel: MCMC: error status (0xdc) - Memory write failed. (Inter MC link message(0x17))
Looking further, it appears that NETDEV WATCHDOG reported errors on this interface (mcmc). I don't know whether this is a LAN-type interface the two controllers use to communicate with each other.
Jun 16 04:37:55 (none) user.warn kernel: MCMC: error status (0xda) - Unexpected failure. (Inter MC link message(0x17))
Jun 16 04:37:56 (none) user.warn kernel: ------------[ cut here ]------------
Jun 16 04:37:56 (none) user.warn kernel: WARNING: at net/sched/sch_generic.c:256 dev_watchdog+0x15c/0x24c()
Jun 16 04:37:56 (none) user.info kernel: NETDEV WATCHDOG: mcmc (): transmit queue 0 timed out
Jun 16 04:37:56 (none) user.warn kernel: Modules linked in: mcmclink g_serial ocores_udc mcfulink mooseproc mcscbridge msgdrv
Jun 16 04:37:56 (none) user.warn kernel: [<c0014428>] (unwind_backtrace+0x0/0xec) from [<c02a6c50>] (dump_stack+0x20/0x24)
Jun 16 04:37:56 (none) user.warn kernel: [<c02a6c50>] (dump_stack+0x20/0x24) from [<c001bf60>] (warn_slowpath_common+0x5c/0x74)
Jun 16 04:37:56 (none) user.warn kernel: [<c001bf60>] (warn_slowpath_common+0x5c/0x74) from [<c001c034>] (warn_slowpath_fmt+0x40/0x48)
Jun 16 04:37:56 (none) user.warn kernel: [<c001c034>] (warn_slowpath_fmt+0x40/0x48) from [<c022a1e4>] (dev_watchdog+0x15c/0x24c)
Jun 16 04:37:56 (none) user.warn kernel: [<c022a1e4>] (dev_watchdog+0x15c/0x24c) from [<c00281b8>] (run_timer_softirq+0x1d0/0x2dc)
Jun 16 04:37:56 (none) user.warn kernel: [<c00281b8>] (run_timer_softirq+0x1d0/0x2dc) from [<c0021cfc>] (__do_softirq+0xd8/0x1c0)
Jun 16 04:37:56 (none) user.warn kernel: [<c0021cfc>] (__do_softirq+0xd8/0x1c0) from [<c002219c>] (irq_exit+0x50/0x5c)
Jun 16 04:37:56 (none) user.warn kernel: [<c002219c>] (irq_exit+0x50/0x5c) from [<c000f750>] (handle_IRQ+0x84/0xa4)
Jun 16 04:37:56 (none) user.warn kernel: [<c000f750>] (handle_IRQ+0x84/0xa4) from [<c00086b8>] (asm_do_IRQ+0x18/0x1c)
Jun 16 04:37:56 (none) user.warn kernel: [<c00086b8>] (asm_do_IRQ+0x18/0x1c) from [<c000e394>] (__irq_svc+0x34/0x80)
Jun 16 04:37:56 (none) user.warn kernel: Exception
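It looks like a kernel panic, although strictly it is a kernel WARNING trace: the network device watchdog reports that the mcmc transmit queue timed out.
Controller A's kernel log was also filled with: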
Jun 16 05:38:09 (none) user.warn kernel: MCMC: error status (0xdc) - Memory write failed. (Inter MC link message(0x17))
Jun 16 05:38:14 (none) user.warn kernel: MCMC: error status (0xdc) - Memory write failed. (Inter MC link message(0x17))
Jun 16 05:38:15 (none) user.warn kernel: MCMC: error status (0xdc) - Memory write failed. (Inter MC link message(0x17))
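To get a feel for how many of these inter-MC link errors there were, and when, a rough Python sketch can tally them by status code and by minute. It assumes the kernel log has been exported as plain text with one message per line in the format quoted above; the regex and the function name are illustrative, not part of any HP tool.

```python
import re
from collections import Counter

# Pattern for the controller kernel messages quoted above (syslog-style timestamp,
# then "MCMC: error status (0x..) - <description> (Inter MC link ...").
MCMC_RE = re.compile(
    r"^(?P<minute>\w{3}\s+\d{1,2} \d{2}:\d{2}):\d{2} .*"
    r"MCMC: error status \((?P<status>0x[0-9a-fA-F]+)\) - (?P<desc>.+?) \(Inter MC link"
)


def summarize_mcmc_errors(lines):
    """Return two Counters: hits per (status, description) and hits per minute."""
    by_status = Counter()
    by_minute = Counter()
    for line in lines:
        m = MCMC_RE.match(line)
        if m:
            by_status[(m["status"], m["desc"])] += 1
            by_minute[m["minute"]] += 1
    return by_status, by_minute
```

Lines that are not MCMC errors (such as the NETDEV WATCHDOG message) are simply ignored, so the whole kernel log can be fed in.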
About 15 minutes after I plugged it back in, the amber health LED went out, the system was back online, and everything was good.
I called HP to see if they had any input. They said they don't know what caused this; however, they had notes describing the
Critical Error: OSMEnterDebugger p1: 0x03259E6, p2: 0x0325E43, p3: 0x03268AD, p4: 0x0326DCB CThr: IcMsgMon, DbgRegNum=255
as a firmware-related issue. They mentioned there are internal notes on other SANs (P2000, 2012i), but none for the MSA 2040.
They offered to replace controller A; however, this unit is BRAND new, so I don't want to swap in a "repaired" controller. They also mentioned the compact flash may not be working, based on the error in the logs.
However, I think the compact flash error was triggered because I plugged the unit back in right after unplugging it, while it was probably still running on the supercapacitor to write the cache to the compact flash (later on it was able to successfully write it back to disk, I think).
Does anyone have any input? I was told that normally they could open an internal note for investigation of the
Critical Error: OSMEnterDebugger p1: 0x03259E6, p2: 0x0325E43, p3: 0x03268AD, p4: 0x0326DCB CThr: IcMsgMon, DbgRegNum=255
error; however, new firmware was released 4 days ago, and since I'm running the second-newest firmware, they can't. Keep in mind that the latest firmware only resolves two issues, both completely unrelated.
Any help would be appreciated. It's been back online for 6+ hours now with no issues. I'm just shocked that both controllers wigged out and there was no fallback.
I'm thinking it's firmware-related, but I'm not sure.
06-16-2014 12:30 PM
Re: HP MSA 2040 - Dual Controller iSCSI - Disappeared. Critical Error: OSMEnterDebugger
Here's a screenshot of the logs; please find it attached.
Sorry, I couldn't copy and paste from the web interface for some reason.
06-17-2014 05:58 AM - edited 06-17-2014 05:59 AM
Re: HP MSA 2040 - Dual Controller iSCSI - Disappeared. Critical Error: OSMEnterDebugger
Digging deeper into the controller kernel logs, I'm betting this entry set it off:
Jun 16 04:38:08 (none) user.warn kernel: MCMC: error status (0xda) - Unexpected failure. (Inter MC link message(0x17))
The kernel logs on controller A were flooded with these when the unit went offline. I'm assuming this is the network link over the backplane that the controllers use to talk to each other.
The kernel log on controller B showed the same (tons of these):
Jun 16 04:37:58 (none) user.warn kernel: MCMC: error status (0xda) - Unexpected failure. (Inter MC link message(0x17))
Jun 16 04:39:57 (none) user.warn kernel: MCMC: error status (0xdc) - Memory write failed. (Inter MC link message(0x17))
And it all started with:
Jun 16 04:37:55 (none) user.warn kernel: MCMC: error status (0xda) - Unexpected failure. (Inter MC link message(0x17))
Jun 16 04:37:56 (none) user.warn kernel: ------------[ cut here ]------------
Jun 16 04:37:56 (none) user.warn kernel: WARNING: at net/sched/sch_generic.c:256 dev_watchdog+0x15c/0x24c()
Jun 16 04:37:56 (none) user.info kernel: NETDEV WATCHDOG: mcmc (): transmit queue 0 timed out
Jun 16 04:37:56 (none) user.warn kernel: Modules linked in: mcmclink g_serial ocores_udc mcfulink mooseproc mcscbridge msgdrv
Jun 16 04:37:56 (none) user.warn kernel: [<c0014428>] (unwind_backtrace+0x0/0xec) from [<c02a6c50>] (dump_stack+0x20/0x24)
Jun 16 04:37:56 (none) user.warn kernel: [<c02a6c50>] (dump_stack+0x20/0x24) from [<c001bf60>] (warn_slowpath_common+0x5c/0x74)
Jun 16 04:37:56 (none) user.warn kernel: [<c001bf60>] (warn_slowpath_common+0x5c/0x74) from [<c001c034>] (warn_slowpath_fmt+0x40/0x48)
Jun 16 04:37:56 (none) user.warn kernel: [<c001c034>] (warn_slowpath_fmt+0x40/0x48) from [<c022a1e4>] (dev_watchdog+0x15c/0x24c)
Jun 16 04:37:56 (none) user.warn kernel: [<c022a1e4>] (dev_watchdog+0x15c/0x24c) from [<c00281b8>] (run_timer_softirq+0x1d0/0x2dc)
Jun 16 04:37:56 (none) user.warn kernel: [<c00281b8>] (run_timer_softirq+0x1d0/0x2dc) from [<c0021cfc>] (__do_softirq+0xd8/0x1c0)
Jun 16 04:37:56 (none) user.warn kernel: [<c0021cfc>] (__do_softirq+0xd8/0x1c0) from [<c002219c>] (irq_exit+0x50/0x5c)
Jun 16 04:37:56 (none) user.warn kernel: [<c002219c>] (irq_exit+0x50/0x5c) from [<c000f750>] (handle_IRQ+0x84/0xa4)
Jun 16 04:37:56 (none) user.warn kernel: [<c000f750>] (handle_IRQ+0x84/0xa4) from [<c00086b8>] (asm_do_IRQ+0x18/0x1c)
Jun 16 04:37:56 (none) user.warn kernel: [<c00086b8>] (asm_do_IRQ+0x18/0x1c) from [<c000e394>] (__irq_svc+0x34/0x80)
Jun 16 04:37:56 (none) user.warn kernel: Exception stack(0xc03e7f50 to 0xc03e7f98)
Jun 16 04:37:56 (none) user.warn kernel: 7f40: 00000000 0005317f 0005217f 60000013
Jun 16 04:37:56 (none) user.warn kernel: 7f60: c03e86c8 00000000 c07400e0 c03eb1fc 49804000 41069265 49bd4dc4 c03e7fa4
Jun 16 04:37:56 (none) user.warn kernel: 7f80: 600000d3 c03e7f98 c000f8ec c000f8f8 60000013 ffffffff
Jun 16 04:37:56 (none) user.warn kernel: [<c000e394>] (__irq_svc+0x34/0x80) from [<c000f8f8>] (default_idle+0x3c/0x40)
Jun 16 04:37:56 (none) user.warn kernel: [<c000f8f8>] (default_idle+0x3c/0x40) from [<c000fad0>] (cpu_idle+0x60/0xb4)
Jun 16 04:37:56 (none) user.warn kernel: [<c000fad0>] (cpu_idle+0x60/0xb4) from [<c02a42c4>] (rest_init+0x68/0x80)
Jun 16 04:37:56 (none) user.warn kernel: [<c02a42c4>] (rest_init+0x68/0x80) from [<c03be7a0>] (start_kernel+0x2a8/0x300)
Jun 16 04:37:56 (none) user.warn kernel: ---[ end trace ec0622d06186d082 ]---
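To check the ordering argued above (first an MCMC error, then the watchdog timeout and its trace), a small Python sketch can record when each marker first appears in a kernel log. The marker strings are copied from the log lines quoted in this thread; the function name is hypothetical. Because these syslog timestamps carry no year, the sketch relies on log order rather than sorting timestamps.

```python
import re

# Marker substrings taken from the kernel messages quoted in this thread.
MARKERS = {
    "mcmc_error": "MCMC: error status",
    "watchdog_timeout": "NETDEV WATCHDOG",
    "trace_end": "---[ end trace",
}
TS_RE = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2})")


def first_seen(lines, markers=MARKERS):
    """Map each marker name to the timestamp of the first line containing it.

    Dicts preserve insertion order, so the keys come out in the order the
    markers were first seen in the log.
    """
    found = {}
    for line in lines:
        ts = TS_RE.match(line)
        if ts is None:
            continue
        for name, needle in markers.items():
            if name not in found and needle in line:
                found[name] = ts.group(1)
    return found
```

On controller B's excerpt this yields the MCMC error at 04:37:55, followed by the watchdog timeout and end of trace at 04:37:56.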
07-24-2014 10:23 AM
Re: HP MSA 2040 - Dual Controller iSCSI - Disappeared. Critical Error: OSMEnterDebugger
Hi Stephen,
Like you, I have a brand-new MSA (controller firmware revision: GLS105R04-01), and I was able to crash the controller. I'm in the process of opening a ticket with HP about this. I'm curious to know what you find out, and I'll post back to your thread when I learn more from my ticket. :)
Curiously, I crashed controller A as well, but it was also the controller I was running a benchmark on.
I'm going to go through your posts and my logs more carefully a bit later and follow up.
Thanks,
Doug
07-24-2014 10:48 AM
Re: HP MSA 2040 - Dual Controller iSCSI - Disappeared. Critical Error: OSMEnterDebugger
Hmm,
Shortly after this, I updated to the latest firmware, and I haven't encountered the issue again. Keep in mind I haven't run any benchmarks on the unit since. It's been fairly stable, and I occasionally restart the individual controllers.
HP kept in touch with me about this case; however, I didn't want to replace any hardware since my unit is brand new. Eventually they closed the case since the issue didn't recur.
I'd bet money this is firmware-related. Hopefully, if enough people complain, they will investigate the issue and release a fix.
10-17-2014 10:51 AM
Re: HP MSA 2040 - Dual Controller iSCSI - Disappeared. Critical Error: OSMEnterDebugger
For anyone else searching this post, here is some more useful information:
We've also experienced the same issue: an MSA2040 using iSCSI, directly attached to c7000 Virtual Connect modules. Under heavy network traffic to/from the MSA2040, it simply disappears from VMware and none of the hosts can see it.
We updated the MSA2040 to the latest firmware and added drivers to ESXi for the network adapters in each blade, and still no dice. From reading around, it could be a firmware issue on the network adapter in each blade, or an issue with ESXi 5.5. We'll post back here once we have a solution. Some other posts about similar issues with the P2000 suggest it might be an ESXi 5.5 issue, and you (we) might have to roll back to 5.1 for compatibility. Bummer.
10-17-2014 11:02 AM
Re: HP MSA 2040 - Dual Controller iSCSI - Disappeared. Critical Error: OSMEnterDebugger
Just an update on my situation: I updated the MSA 2040 to the latest firmware shortly after the incident, and since then I've also applied a few ESXi updates.
Since then, this issue has not occurred again for me. If anything happens on my side, I'll post back.
10-31-2016 06:07 AM - edited 10-31-2016 06:15 AM
Re: HP MSA 2040 - Dual Controller iSCSI - Disappeared. Critical Error: OSMEnterDebugger
Hello, has anybody solved this problem?
I have this error, and the controller does not boot:
WARNING Killed partner controller. (reason: Non volatile device flush or restore failure)
10-31-2016 06:34 AM
Re: HP MSA 2040 - Dual Controller iSCSI - Disappeared. Critical Error: OSMEnterDebugger
Hello AnieBuhr,
Could you clarify whether you have the problem described in the initial post, or a new, different problem?
If it is the same problem described in the initial post, please make sure the firmware on your MSA2040 SAN is up to date (the firmware update resolved my issues).
If your issue is different from the one in the initial post, please contact HPE support; they will be able to help diagnose your issue if your MSA2040 is under warranty or covered by an HPE Care Pack.