08-30-2011 12:52 PM
Cache-1 Corrupt?
Had an "interesting" experience with our P4000 cluster this weekend.
We had an extended power outage which meant that one of our P4000 sites lost power so the node died.
The remaining node + FOM did their job and kept quorum, however, when power came back and the lost node booted, it showed in the CMC with a red X and a "Cache-1 Corrupt" status.
Running a diagnostic on that node showed that the cache module itself passed, but the cache status check failed because of the "Corrupt" state.
When I got someone from L2 on the phone, they explained that forcing the node online could corrupt the cluster, because the corrupt cache content might be flushed to storage. Their plan was to ship out a replacement cache module, which they did; they then did a "node exchange" in the CMC and the volumes restriped overnight.
I'd like to get a bit better understanding of what actually happened, because whilst the cluster carried on running, which is great, I'm a little concerned/puzzled that corruption in the cache could render an entire node unusable.
Simply put, if replacing the cache module removed the potential for corrupt cache data to be flushed to disk, why isn't there some option to just discard the contents of the existing cache module?
Thanks in advance.
Paul
08-31-2011 01:29 AM
Re: Cache-1 Corrupt?
Is the RAM and the battery on the same module or would it be possible to just disconnect the battery to the controller and thus clear the cache?
08-31-2011 02:13 AM
Re: Cache-1 Corrupt?
> just disconnect the battery to the controller and thus clear the cache?
Oh, please ... !!! Don't try to be 'creative' !!
That is another way to cause data corruption.
If you are concerned about the integrity of your data:
You MUST NEVER simply ignore/skip lost cache data!!
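To illustrate the point above, here is a toy model of a write-back cache. It is only a sketch under stated assumptions: the class and method names (`WriteBackCache`, `write`, `flush`, `discard`) are hypothetical and do not represent how the P4000 controller or SAN/iQ actually works. It shows why discarding unflushed cache contents is risky: the host has already been told those writes succeeded.

```python
# Toy model of a write-back cache. All names here are hypothetical;
# this is NOT the P4000 controller's actual logic.
class WriteBackCache:
    def __init__(self):
        self.disk = {}    # blocks that have reached stable storage
        self.dirty = {}   # acknowledged writes still held only in cache

    def write(self, block, data):
        # The host gets an acknowledgement as soon as the data is cached.
        self.dirty[block] = data
        return "ack"

    def flush(self):
        # Normal path: dirty blocks are written out to disk.
        self.disk.update(self.dirty)
        self.dirty.clear()

    def discard(self):
        # "Just clear the cache": acknowledged writes vanish silently.
        lost = sorted(self.dirty)
        self.dirty.clear()
        return lost


cache = WriteBackCache()
cache.write("journal", "txn 42 committed")
cache.flush()
cache.write("data", "txn 43 payload")        # acknowledged, not yet on disk
cache.write("journal", "txn 43 committed")   # acknowledged, not yet on disk
lost_blocks = cache.discard()
print(lost_blocks)   # blocks the host believes are written but are not
print(cache.disk)    # disk still reflects the older state
```

In this model the disk is left at an older state than the host believes it is in, which is the kind of silent inconsistency being warned about here.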
08-31-2011 02:31 AM
Re: Cache-1 Corrupt?
Uwe, I agree with you not to ignore it, but it's a fair question: if LeftHand support say to replace the cache module, how am I better off than simply disconnecting the battery and flushing the existing cache module?
(I wouldn't just do that, but I'm asking about the theory).
08-31-2011 08:06 PM
Re: Cache-1 Corrupt?
I am very surprised that support shipped you a battery and cache. This is a known issue and can be fixed by a patch or by going into a PuTTY session. "Cache corrupt" is a very different error from "Cache faulty"; to determine whether the cache really is bad and needs to be replaced, an hpadu.zip file would need to be analyzed. If you have this error, just log a call with support and mention the patch.
09-01-2011 01:13 AM
Re: Cache-1 Corrupt?
When they checked, mine apparently has that patch installed already.
09-01-2011 02:47 AM
Re: Cache-1 Corrupt?
Out of interest, how long was your node powered down for, Paul?
09-01-2011 02:56 AM
Re: Cache-1 Corrupt?
Only around 10 minutes or so.
09-01-2011 07:35 AM
Re: Cache-1 Corrupt?
You will need to install patch 10096; this patch is rolled up in patch 20020.
It resolves a situation where, under certain conditions, on reboots after an upgrade to 9.0, SAN/iQ 9.0.00 will appear to detect and report an erroneous controller cache discard event long after the original condition has been addressed and corrected and no longer exists.
09-01-2011 09:55 AM
Re: Cache-1 Corrupt?
Apparently not though, from the L2 response:
"Upon reviewing the logs this unit already has PS02 installed, so 10096 will be unable to resolve the issue."