HPE EVA Storage
07-14-2008 06:26 AM
SQL Cluster nodes on MSA1000 locking
I have 4 servers Fibre Channel-connected to a dual-controller MSA1000 with 1 expansion shelf. Two of the servers are nodes of an Oracle RAC cluster and two are nodes of a Microsoft SQL Server 2005 cluster. Twice in 2 weeks the SQL database has become unresponsive over the weekend. Once onsite, it looked like a physical drive in the primary shelf of the MSA had gone bad: the drive's amber fault LED was lit. That physical drive is a member of a 3-drive RAID 5 array that is split into 2 logical drives, one of which serves as the Microsoft cluster quorum drive.
The node that had been the active node in the SQL cluster before the problem was completely unresponsive, even from the console. The passive node was responsive, but I could not get into Cluster Administrator.
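Next time this happens I plan to query the cluster from a command prompt with cluster.exe, which sometimes answers even when the Cluster Administrator GUI will not. These are standard Windows Server 2003 cluster status commands, nothing MSA-specific:

    rem list node, group, and resource states for the local cluster
    cluster node /status
    cluster group /status
    cluster resource /status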
I also received an error when trying to get into the Array Configuration Utility, so I was not able to view the array status from there.
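If the GUI keeps failing, I may fall back to the command-line version of the Array Configuration Utility. Assuming the hpacucli package is installed on the node (I have not verified how it enumerates an MSA1000), the basic status query would be something like:

    rem overall status of every controller hpacucli can see
    hpacucli ctrl all show status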
The Oracle RAC system continued to function normally, other than returning an error when I tried to access the Array Configuration Utility from one of the Oracle nodes.
We opted to shut all 4 nodes down and then reboot the MSA1000 itself. Each time, it took 2 to 3 reboots of the MSA for it to come up clean: we saw messages stating there was bad firmware on one of the drives in the shelf, plus prompts asking whether or not we wanted to enable certain volumes. After the 2nd or 3rd reboot the MSA came up clean with no errors.
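Since the boot messages complained about bad drive firmware, I also plan to connect to the MSA1000's serial CLI during the next incident and capture the state there. From memory (I have not double-checked these against the CLI reference), the relevant commands are:

    show version
    show disks
    show units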
With the MSA back up, the drive that had been showing as bad no longer displayed the amber fault light. We brought up the primary SQL node that had been active when the problem occurred and were able to get into the Array Configuration Utility. A different array, a 2-disk RAID 1, was showing as rebuilding, and the 2 logical drives in the array containing the suspect disk displayed an odd circle icon. I have not tracked down the exact meaning of that icon yet, but I assume it is related to this error/status message showing in the ACU: “#771: The current array controller had valid data stored in its battery backed write cache the last time it was reset or was powered up. This indicates that the system may not have been shutdown gracefully. The array controller has automatically written, or attempted to write, this data to the drives. This message will continue to be displayed until the next reset or power cycle of the array controller.”
I am not exactly sure where my problem lies. Do I have a controller issue, or is it a combination of a bad drive and the Microsoft cluster software? Either way, I would like to replace the drive that initially showed as bad, even though it is now reporting healthy. Is there a way in the HP utility to gracefully eject a drive from a RAID 5 array so it can be replaced, or do I have to pull it while hot to simulate a failure and then replace it?
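Before pulling anything, my plan is to dump the detailed configuration so I can see exactly how the controller currently reports that drive (state, firmware revision, and so on). Again assuming hpacucli can see the MSA1000 from one of the nodes:

    rem full array / logical drive / physical drive detail
    hpacucli ctrl all show config detail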
Any assistance is greatly appreciated.
Patrick G.