08-30-2011 06:24 AM
replacing a bad disk on SC10 array / FC60 (help plz)
Hi All,
Kindly help me with the following issue.
One of our disks (4:0) on the SC10 array went bad, and after replacing it the disk's orange LED on the front panel is still on. In addition, the amdsp output shows the following (as you can also see in the attachment):
Disk State = REPLACED instead of OPTIMAL, and the hot spare activity field shows "2:4 is sparing 4:0". What is the reason for this behavior, and how can it be fixed? Also, why is the sparing activity always present, and why didn't the rebuild start automatically?
Note that we already know the battery is in critical status and controller B is in BAD status; we are waiting for spare parts to replace them.
Thanks in advance for your replies
08-30-2011 07:42 AM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
Note that controller B is the owning controller for the LUN in which the affected disks reside.
-----------------------------------------
LUN  Status   Capacity  Ctrl  RAID  Segment  Disks
---  -------  --------  ----  ----  -------  -----
0    OPTIMAL  67.7 GB   A     5     16       1:0 3:0 5:0 1:1 3:1
1    OPTIMAL  136.7 GB  B     5     16       2:0 4:0 6:0
2    OPTIMAL  136.7 GB  A     5     16       2:1 4:1 6:1
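To make the ownership point concrete, here is a minimal Python sketch that parses the three LUN rows quoted above and reports which LUN contains disk 4:0 and which controller owns it. The `Lun` class and both helper functions are hypothetical names introduced only for this illustration, and the parser assumes the status is a single word (as it is in this listing); real amdsp output may need more careful handling.

```python
from dataclasses import dataclass

# LUN rows copied from the listing quoted in this post.
LUN_TABLE = """\
0 OPTIMAL 67.7 GB A 5 16 1:0 3:0 5:0 1:1 3:1
1 OPTIMAL 136.7 GB B 5 16 2:0 4:0 6:0
2 OPTIMAL 136.7 GB A 5 16 2:1 4:1 6:1
"""


@dataclass
class Lun:
    """One row of the LUN listing (hypothetical helper type)."""
    number: int
    status: str
    capacity_gb: float
    controller: str
    raid_level: int
    segment: int
    disks: list


def parse_luns(text):
    """Parse LUN rows; assumes a one-word status and a 'GB' unit field."""
    luns = []
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, separators, and blank lines.
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        # Field order: LUN, status, capacity, unit, ctrl, RAID, segment, disks...
        luns.append(Lun(
            number=int(fields[0]),
            status=fields[1],
            capacity_gb=float(fields[2]),
            controller=fields[4],
            raid_level=int(fields[5]),
            segment=int(fields[6]),
            disks=fields[7:],
        ))
    return luns


def luns_using_disk(luns, disk):
    """Return every LUN whose disk list contains the given channel:ID."""
    return [lun for lun in luns if disk in lun.disks]


for lun in luns_using_disk(parse_luns(LUN_TABLE), "4:0"):
    print(f"disk 4:0 is in LUN {lun.number}, owned by controller {lun.controller}")
# prints: disk 4:0 is in LUN 1, owned by controller B
```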
Since controller B is not functioning correctly, I would wait until it's working before doing anything with the disks. Its unknown status could be the reason the rebuild did not occur. Right now the disk is being spared, so the LUN still has redundancy. Performing tasks like trying to force a rebuild could cause more problems.
-Bob
08-30-2011 08:00 AM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
Thanks a lot for your cooperation.
In addition, is there a way to temporarily move the LUN containing the failed disk to controller A? If so, could you please advise?
And once we receive the controller, what steps should be performed before replacing it? Since the LUN containing the replaced disk is on controller B, which is not in a GOOD state, I assume some steps are required before replacing the controller.
Once more, many thanks for your precious help
08-30-2011 08:38 AM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
OK, so first, I implore you to make sure you have a good backup of your data. These FC60s can be a bit touchy to work on. Unfortunately, somewhat simple problems like this can quickly lead to a DEAD LUN. OK, so you are warned!
You can transfer ownership of LUNs from one controller to another. Use the amcfg command:
amcfg -M <LUN> -c <CtrlID> <ArrayID>
For example, to set the ownership of LUN 1 to controller A on the array with ID "000800A0B809500A":
# amcfg -M 1 -c A 000800A0B809500A
Replacing an FC60 controller is typically straightforward and should be done "hot"; that is, with power on.
- Remove the original controller.
- Check to be sure the replacement controller has the same DIMM configuration as the original.
- Install the replacement.
The replacement should sync up with firmware and become active. Check the state with the amdsp -c or amdsp -a command.
You can then move the LUN ownership back as desired.
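As a rough planning aid, the sketch below builds (but never executes) the amcfg command lines for moving LUNs off a controller before the swap and back again afterwards. It uses only the `amcfg -M <LUN> -c <CtrlID> <ArrayID>` form quoted above; the function name, the `owners` mapping (taken from the LUN listing earlier in the thread), and the reuse of the example array ID are assumptions for illustration, not output from a real array.

```python
def plan_ownership_moves(lun_owners, from_ctrl, to_ctrl, array_id):
    """Return (evacuate, restore) lists of amcfg command strings.

    lun_owners maps LUN number -> owning controller letter.
    Nothing is executed; the strings are meant to be reviewed by a person.
    """
    moved = sorted(lun for lun, owner in lun_owners.items() if owner == from_ctrl)
    evacuate = [f"amcfg -M {lun} -c {to_ctrl} {array_id}" for lun in moved]
    restore = [f"amcfg -M {lun} -c {from_ctrl} {array_id}" for lun in moved]
    return evacuate, restore


# Ownership as shown in the LUN listing earlier in the thread (assumed current).
owners = {0: "A", 1: "B", 2: "A"}
ARRAY_ID = "000800A0B809500A"  # example ID from the post, not necessarily your array

evacuate, restore = plan_ownership_moves(owners, "B", "A", ARRAY_ID)
print("Before replacing controller B:")
for cmd in evacuate:
    print("  " + cmd)
print("After the replacement is active:")
for cmd in restore:
    print("  " + cmd)
```

Keeping the plan as printed text rather than running it automatically fits the caution above: each command can be checked against the current amdsp output before it is typed.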
Good luck!
-Bob
08-30-2011 09:02 AM - edited 08-30-2011 03:14 PM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
One more thing, please: while reading the FC60 Advanced User's Guide, I saw the ammgr command.
Would ammgr -c AA <ArrayID> change the status of controller B in this case? Please give me your opinion, as I do not want to type any command that could have negative consequences.
Many Thanks
08-31-2011 04:37 AM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
Hi All,
I wanted to be 100% sure before concluding that controller B is defective, so I shut down the whole platform and swapped the physical locations of controllers A and B.
The results I got can be found in the attachment.
Any suggestions?
08-31-2011 08:13 AM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
Well, I would not have advised what you did. Shutting down these arrays is really a last resort, especially with known problems. I suppose it is lucky that your LUNs are still accessible.
Basically now you have:
- A bad array controller, though maybe something more, as the problem stayed with location B. The catch is that the controllers were swapped while the array was offline, so the array does not see a controller replacement. It could simply be that the array controllers are now very confused.
- A bad BCC controller in one of the SC10 enclosures.
- The two bad disks are both on channel 4. This could simply be a coincidence, or a failure due to the BCC controller.
A good thing is that you were able to get that LUN rebuild kicked off. I would wait for that to complete before doing anything else.
The controller is what I would focus on first: try replacing it and see if the status recovers. If that succeeds, continue with replacing the BCC. If the drives are still marked as bad, then look to replace them as well.
If you cannot replace the hardware, you can try the following to fail and unfail the controller:
- Transfer ownership of LUN 2 to controller A.
- Attempt to "fail" the controller: amutil -C b <arrayID>
- If the command fails, then you will need to actually replace it. If this works, unfail it as follows: amutil -c b <arrayID>
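The decision flow in the steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the command strings are copied from this thread and not verified against the FC60 documentation, the function and outcome names are hypothetical, and the `run` callable is injected so the example performs a dry run instead of touching an array. What to do if the ownership transfer or the unfail step fails is not covered in the post; stopping, or treating it as a replacement case, is an assumption here.

```python
from typing import Callable


def fail_unfail_controller(array_id: str, run: Callable[[str], bool]) -> str:
    """Walk the fail/unfail steps; run(cmd) returns True if the command succeeded."""
    # Step 1: move LUN 2 to controller A (amcfg form quoted earlier in the thread).
    if not run(f"amcfg -M 2 -c A {array_id}"):
        return "transfer-failed"  # assumption: stop here, the post does not cover this
    # Step 2: try to fail controller B. Note the capital -C.
    if not run(f"amutil -C b {array_id}"):
        return "replace-controller"  # per the post: the hardware must be replaced
    # Step 3: unfail controller B. Note the lowercase -c.
    if not run(f"amutil -c b {array_id}"):
        return "replace-controller"  # assumption: treat a failed unfail the same way
    return "unfailed"


def dry_run(cmd: str) -> bool:
    """Print the command instead of executing it, and pretend it succeeded."""
    print("would run:", cmd)
    return True


print(fail_unfail_controller("<arrayID>", dry_run))
```

The only difference between the fail and unfail commands is the case of the option letter, so it is worth reading each line twice before typing it on a live array.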
Oh, and the ammgr -c AA command simply sets the array controllers to an Active/Active state. This would only need to be done if one controller was in a "Passive" state. I don't think "Unknown" would qualify for this.
-Bob
08-31-2011 09:15 AM - edited 08-31-2011 09:17 AM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
Hi Bob, and once again thanks for your precious help.
Just to let you know that after restarting the whole platform, and once the rebuild had completed, the orange LED on the recently replaced disk went off and the disk's LED turned green. It is really strange. Could you please explain what happened? I would really like a logical explanation for all this.
Thanks in advance
08-31-2011 09:42 AM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
Well, I would say that the BCC controller was problematic and/or the problematic array controller was affecting the ability of the LUN to rebuild automatically. By resetting the array, the controller was able to start the rebuild process that was "hung".
As I stated, these arrays can be a bit tricky to work with.
-Bob
09-01-2011 09:51 AM - edited 09-01-2011 10:09 AM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
Thanks Bob once again.
I attached the latest output, taken on 1 September 2011. As you will see in the amdsp -a output, the state of both disks 4:0 and 4:1 is "NO RESPONSE", even though the green LED is on for both disks.
Any suggestions?