Community Home > Storage > Midrange and Enterprise Storage > StoreVirtual Storage > P4300 brings store down after 1 disk fails
01-23-2012 01:38 AM
P4300 brings store down after 1 disk fails
Hello, first-time poster here with a strange issue.
Last week one disk in a RAID 5 set on one node (P4300) went to status degraded/failed; normally not a major problem as long as the disk is replaced ASAP.
But about 30 minutes after the disk failed, these entries appeared in the manager info log:
DBD_EVENT:POST:type=STORE_LATENCY_STATUS_EXCESSIVE [latency='61.175',threshold='60.000']
DBD_EVENT:POST:type=STORE_LATENCY_STATUS_NORMAL
DBD_EVENT:POST:type=STORE_LATENCY_STATUS_EXCESSIVE [latency='60.461',threshold='60.000']
DBD_MANAGER_HEARTBEAT:bringing store down after 25.567 secs (nheartbeat_failure=0)
DBD_MANAGER_HEARTBEAT:bringing store down after 25.576 secs (nheartbeat_failure=0)
The store was brought down, with all hell breaking loose after that: servers going down, etc.
Around that time, or a little after, the dbd_store reported that it was blocked for about 120 seconds.
After a short time the store came back online: offline --> degraded --> ready.
Something tells me this is not normal behaviour. We logged a case with HP support, obviously, but I was wondering if anybody has seen this issue before?
I should mention that this node is still on SAN/iQ 9.0... but even so...
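Reading the log excerpt, the manager appears to apply two checks: a latency status that flips to EXCESSIVE above a 60-second threshold, and a heartbeat timeout after which the store is brought down. A minimal sketch of that decision logic, with the caveat that the function names and the 25-second heartbeat timeout are assumptions inferred from the log, not actual SAN/iQ internals:

```python
# Hypothetical sketch of the watchdog behaviour suggested by the log above.
# Names and the heartbeat timeout are assumptions, not real SAN/iQ code.

LATENCY_THRESHOLD_SECS = 60.0  # matches threshold='60.000' in the log


def classify_latency(latency_secs: float) -> str:
    """Mirror the STORE_LATENCY_STATUS events from the manager info log."""
    if latency_secs > LATENCY_THRESHOLD_SECS:
        return "STORE_LATENCY_STATUS_EXCESSIVE"
    return "STORE_LATENCY_STATUS_NORMAL"


def should_bring_store_down(heartbeat_age_secs: float,
                            heartbeat_timeout_secs: float = 25.0) -> bool:
    """Take the store offline once heartbeats have gone unanswered too long.

    The 25s default is a guess based on the 'bringing store down after
    25.567 secs' entries in the log.
    """
    return heartbeat_age_secs > heartbeat_timeout_secs


# The log shows latencies of 61.175 and 60.461 against the 60.0 threshold,
# and the store being brought down after 25.567 secs:
assert classify_latency(61.175) == "STORE_LATENCY_STATUS_EXCESSIVE"
assert classify_latency(59.0) == "STORE_LATENCY_STATUS_NORMAL"
assert should_bring_store_down(25.567) is True
```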
01-23-2012 06:47 AM
Re: P4300 brings store down after 1 disk fails
Do you have a Failover Manager for that particular management group? We had several disk failures over the past few months, and right after two or three of them the affected node was offline for a minute or two. We have a setup with four nodes and a dedicated Failover Manager, so all volumes remained available. It seems the RAID controller sometimes takes a while to deal with a failed disk and stops responding to requests in time, so the SAN/iQ software takes the node offline due to excessive latency.
If you only have two nodes in your setup and no Failover Manager to provide quorum, the volumes will be unavailable for a short period of time. The same of course applies to any configuration with a single node.
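The quorum point above comes down to simple majority arithmetic. A hedged sketch (the function is illustrative, not part of any SAN/iQ API) of why a two-node group without a Failover Manager goes dark when one node stalls, while adding a FOM keeps volumes online:

```python
# Illustrative quorum check: volumes stay online only while a strict
# majority of managers in the management group is reachable.

def has_quorum(total_managers: int, reachable_managers: int) -> bool:
    """Return True if a strict majority of managers is still reachable."""
    return reachable_managers > total_managers // 2


# Two storage nodes, no FOM: one node's RAID controller stalls,
# leaving 1 of 2 managers -> no majority, volumes go offline.
assert has_quorum(total_managers=2, reachable_managers=1) is False

# Two nodes plus a dedicated FOM: losing one node still leaves
# 2 of 3 managers -> quorum holds, volumes stay available.
assert has_quorum(total_managers=3, reachable_managers=2) is True
```

A single-node group is the degenerate case: the only manager is also the one that stalls, so there is nothing left to hold quorum.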
01-23-2012 06:56 AM
Re: P4300 brings store down after 1 disk fails
Hi,
That's what I figured. I failed to mention that this customer has only one node in the MG, so there is no failover at all.
I didn't expect/know that there would be downtime for the store, though; that's a little strange for a storage node. Does that mean they should never be bought in a single-node setup?
There is a second node on order, but even then, as you mentioned, they definitely need a FOM. They were planning to use the new node in solo mode as well...
I'm glad you witnessed this behaviour as well... Are your nodes running 9.0 or 9.5?
Thanks again
01-23-2012 10:29 AM
Re: P4300 brings store down after 1 disk fails
We are running 9.0. I can only assume that this behaviour favours setups with more than one node, where there is no real danger in losing a node for a short period. Because Network RAID spreads all accesses over all nodes in a management group, a node with high latency will affect all volumes and all sessions, so the software decides to take the node offline to avoid clogging up the request queue. A sensible choice for setups with at least two nodes, fatal for setups without failover.