08-30-2011 06:24 AM
replacing a bad disk on SC10 array / FC60 (help plz)
Hi All,
Kindly help me with the following issue.
One of our disks (4:0) on the SC10 array went bad, and after replacing it the disk's orange LED on the front panel is still on. In addition, the amdsp output shows the following (as you can also see in the attachment):
Disk State = REPLACED instead of "OPTIMAL", and the hot spare activity field shows "2:4 is sparing 4:0". What is the reason for this behavior, and how can it be fixed? Also, why is there always sparing activity, and why didn't the rebuild start automatically?
Note that we already know the battery is in critical status and controller B is in BAD status; we are waiting for spare parts in order to replace them.
Thanks in advance for your replies
08-30-2011 07:42 AM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
Note that controller B is the owning controller for the LUN in which the affected disks reside.
LUN  Status   Capacity  Ctrl  RAID  Segment  Disks
---  -------  --------  ----  ----  -------  -------------------
0    OPTIMAL  67.7 GB   A     5     16       1:0 3:0 5:0 1:1 3:1
1    OPTIMAL  136.7 GB  B     5     16       2:0 4:0 6:0
2    OPTIMAL  136.7 GB  A     5     16       2:1 4:1 6:1
Since controller B is not functioning correctly, I would wait until it's working before doing anything with the disks. Its unknown status could be the reason the rebuild did not occur. Right now the disk is being spared, so the LUN still has redundancy. Performing tasks like trying to force a rebuild could cause more problems.
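To double-check which LUN a given disk belongs to, and which controller owns that LUN, the rows of the LUN table can be parsed with a few lines of Python. This is only a sketch: the sample rows are copied from the table in this thread, the function names are made up for illustration, and the parser assumes the status is a single word (such as OPTIMAL) and that the capacity is a number followed by a unit.

```python
# Sample rows copied from the LUN table quoted in this thread.
LUN_TABLE = """\
0 OPTIMAL 67.7 GB A 5 16 1:0 3:0 5:0 1:1 3:1
1 OPTIMAL 136.7 GB B 5 16 2:0 4:0 6:0
2 OPTIMAL 136.7 GB A 5 16 2:1 4:1 6:1
"""


def parse_lun_rows(text):
    """Parse rows laid out as: LUN Status Capacity Unit Ctrl RAID Segment Disks...

    Assumption: the status is one word; a multi-word status would shift the columns.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or not fields[0].isdigit():
            continue  # skip header, ruler, and blank lines
        rows.append({
            "lun": int(fields[0]),
            "status": fields[1],
            "capacity": f"{fields[2]} {fields[3]}",
            "ctrl": fields[4],
            "raid": int(fields[5]),
            "segment": int(fields[6]),
            "disks": fields[7:],
        })
    return rows


def luns_containing(rows, disk):
    """Return the parsed rows whose disk list includes the given channel:ID."""
    return [row for row in rows if disk in row["disks"]]


for row in luns_containing(parse_lun_rows(LUN_TABLE), "4:0"):
    print(f"disk 4:0 is in LUN {row['lun']} (owner: controller {row['ctrl']})")
```

With the rows above, disk 4:0 maps to LUN 1, which is owned by controller B, while disk 4:1 sits in LUN 2 on controller A.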
-Bob
08-30-2011 08:00 AM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
Thanks a lot for your cooperation.
In addition, is there a way to temporarily move the LUN containing the failed disk to controller A? If so, could you please advise?
And once we receive the controller, what steps should be performed before replacing it? Since the LUN containing the replaced disk is on controller B, which is not in a GOOD state, I assume some steps are required before replacing the controller.
Once more, many thanks for your precious help.
08-30-2011 08:38 AM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
OK, so first, I implore you to make sure you have a good backup of your data. These FC60s can be a bit touchy when working on them. Unfortunately, somewhat simple problems like this can quickly lead to a DEAD LUN. OK, so you are warned!
You can transfer ownership of LUNs from one controller to another. Use the amcfg command:
amcfg -M <LUN> -c <CtrlID> <ArrayID>
For example: To set the ownership of LUN 1 to controller A on array with ID "000800A0B809500A":
# amcfg -M 1 -c A 000800A0B809500A
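If you script this, it can help to build the command line from validated parts before running anything. The sketch below only assembles the argument list in the form shown above (amcfg -M <LUN> -c <CtrlID> <ArrayID>) and prints it; it does not execute anything. The helper name is hypothetical, and restricting the controller ID to "A" or "B" is an assumption based on the example.

```python
def amcfg_move_lun(lun, ctrl_id, array_id):
    """Build the argument list for: amcfg -M <LUN> -c <CtrlID> <ArrayID>.

    Assumption: controller IDs are the uppercase letters A and B,
    as in the example from this thread.
    """
    if not isinstance(lun, int) or lun < 0:
        raise ValueError(f"LUN must be a non-negative integer, got {lun!r}")
    if ctrl_id not in ("A", "B"):
        raise ValueError(f"controller ID must be 'A' or 'B', got {ctrl_id!r}")
    if not array_id:
        raise ValueError("array ID must not be empty")
    return ["amcfg", "-M", str(lun), "-c", ctrl_id, array_id]


# Dry run only: print the command for review instead of executing it.
cmd = amcfg_move_lun(1, "A", "000800A0B809500A")
print(" ".join(cmd))
```

Printing the command first gives you a chance to review the LUN number and array ID before typing or running it on a touchy array.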
Replacing an FC60 controller is typically straightforward and should be done "hot", that is, with power on.
- Remove the original controller.
- Check to be sure the replacement controller has the same DIMM configuration as the original.
- Install the replacement.
The replacement should sync up with firmware and become active. Check its state with the amdsp -c or amdsp -a command.
You can then move the LUN ownership back as desired.
Good luck!
-Bob
08-30-2011 09:02 AM - edited 08-30-2011 03:14 PM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
One more thing, please: while reading the FC60 Advanced User's Guide, I saw the ammgr command.
Would ammgr -c AA <ArrayID> make any change to the controller B status in this case? Please give me your opinion, as I do not want to type any command that could have negative consequences.
Many thanks
08-31-2011 04:37 AM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
Hi All,
I wanted to be 100% sure before concluding that controller B is defective, so I shut down the whole platform and swapped the physical locations of the two controllers, A and B.
The results I got can be found in the attachment.
Any suggestions?
08-31-2011 08:13 AM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
Well, I would not have advised what you did. Shutting down these arrays is really the last thing to do, especially with known problems. I suppose it is lucky that your LUNs are still accessible.
Basically now you have:
- A bad array controller, though possibly something more, since the problem stayed with location B. Because the controllers were swapped while the array was offline, the array does not see a controller replacement; it could simply be that the array controllers are now very confused.
- A bad BCC controller in one of the SC10 enclosures.
- Two bad disks, both on channel 4. This could simply be a coincidence, or a failure due to the BCC controller.
A good thing is that you were able to get that LUN rebuild kicked off. I would wait for that to complete before doing anything else.
The controller is what I would focus on first: try replacing it and see if the status recovers. If that succeeds, continue with replacing the BCC. If the drives are still marked as bad, then look to replace them as well.
If you cannot replace the hardware, you can try the following to fail and unfail the controller:
- Transfer ownership of LUN 2 to controller A.
- Attempt to "fail" the controller: amutil -C b <arrayID>
- If the command fails, then you will need to actually replace it. If this works, unfail it as follows: amutil -c b <arrayID>
Oh, and the ammgr -c AA command simply sets the array controllers to an Active / Active status. This would only need to be done if one controller was in a "Passive" state. I don't think "Unknown" would qualify for this.
-Bob
08-31-2011 09:15 AM - edited 08-31-2011 09:17 AM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
Hi Bob, and once again thanks for your precious help.
Just to let you know: after restarting the whole platform and after the rebuild completed, the orange LED on the recently replaced disk went off and the disk's LED turned green, which is really strange. Could you please explain what happened? I would really like a logical explanation for all this.
Thanks in advance
08-31-2011 09:42 AM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
Well, I would say that the BCC controller was problematic and/or the problematic array controller was affecting the ability of the LUN to automatically perform the rebuild. By resetting the array, the controller was able to start the rebuild process that was "hung".
As I stated, these arrays can be a bit tricky to work with.
-Bob
09-01-2011 09:51 AM - edited 09-01-2011 10:09 AM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
Thanks Bob once again.
I attached the latest output, received on 1st September 2011. As you will see in the output of amdsp -a, the state for both disks 4:0 and 4:1 is "NO RESPONSE" even though the green LED is on for both disks.
Any suggestions?
09-01-2011 11:58 AM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
Both drives are on the same channel. Are these two disks in the same enclosure as the BCC that is marked as "Unknown"? Are there any other drives in this enclosure on this same bus? It could be that the "bad" BCC is causing the disks to report as bad/unknown. One of them is being spared, but the other (4:0) is not, yet the LUN it belongs to is still marked as OPTIMAL. Strange. I'd work to get the BCC and the array controller working before messing with the disk modules at this time.
-Bob
09-04-2011 04:51 AM - edited 09-05-2011 06:28 AM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
Hello Bob,
How can I find out whether these 2 disks (4:0 and 4:1) are in the same enclosure as the BCC controller that is marked as "unknown"? These 2 disks are located in the SC10 disk array (bay 2); however, isn't the BCC controller meant to be located in the controller B enclosure? Note that both controllers A and B are meant to be configured for redundancy (connected to 3x SC10 disk arrays and 2 servers: server 1 is the application server and server 2 is the database server).
So, in your opinion, should I first replace the BCC and the controller?
Thanks in advance
09-07-2011 01:50 PM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
> how can i find if these 2 disks (4:0 and 4:1) are in the same enclosure with the BCC controller that is marked as "unknown"?
According to the amdsp output, the bad BCC is in disk enclosure "2", although the thumbwheel setting referenced there is 1:
Firmware Revision = HP04

Information for Disk System 2 (USSD00098870), Controller A:
  SCSI Channel = 3
  Thumbwheel Setting = 1
  Controller Status = GOOD
  Vendor ID = HP
  Product ID = A5294A
  BCC Serial Number = USSD00098870
  Firmware Revision = HP04

Information for Disk System 2 (USSD00098870), Controller -1:
  SCSI Channel = 0
  Thumbwheel Setting = -1
  Controller Status = UNKNOWN
  Vendor ID = NO_VENDOR
  Product ID = NO_MODEL
  BCC Serial Number = NO_SER_NUM
  Firmware Revision = NO_FWREV
From your first amdsp attachment you can see that disk 4:0 is in enclosure 1, slot 1 and disk 4:1 is in enclosure 1, slot 3. Both of these disks are affected by the faulted BCC "B". Replace that to get your disks back (hopefully).
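If you want to scan a long amdsp listing for BCC entries that are not reporting GOOD, something like the following Python sketch could help. The sample text is taken from the excerpt quoted in this post; the one-field-per-line layout is an assumption (the exact layout of the real output is not shown here), and the function names are made up for illustration.

```python
import re

# Excerpt quoted in this thread, laid out one field per line (assumed layout).
AMDSP_EXCERPT = """\
Information for Disk System 2 (USSD00098870), Controller A:
  SCSI Channel = 3
  Thumbwheel Setting = 1
  Controller Status = GOOD
  Vendor ID = HP
  Product ID = A5294A
  BCC Serial Number = USSD00098870
  Firmware Revision = HP04
Information for Disk System 2 (USSD00098870), Controller -1:
  SCSI Channel = 0
  Thumbwheel Setting = -1
  Controller Status = UNKNOWN
  Vendor ID = NO_VENDOR
  Product ID = NO_MODEL
  BCC Serial Number = NO_SER_NUM
  Firmware Revision = NO_FWREV
"""

HEADER = re.compile(
    r"Information for Disk System (\d+) \((\w+)\), Controller ([^:\s]+):"
)


def parse_bcc_blocks(text):
    """Group 'key = value' lines under each 'Information for Disk System' header."""
    blocks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            current = {
                "disk_system": int(match.group(1)),
                "serial": match.group(2),
                "controller": match.group(3),
                "fields": {},
            }
            blocks.append(current)
        elif current is not None and " = " in line:
            key, value = line.split(" = ", 1)
            current["fields"][key.strip()] = value.strip()
    return blocks


def suspect_bccs(blocks):
    """Return the blocks whose Controller Status is anything other than GOOD."""
    return [b for b in blocks if b["fields"].get("Controller Status") != "GOOD"]


for bcc in suspect_bccs(parse_bcc_blocks(AMDSP_EXCERPT)):
    status = bcc["fields"].get("Controller Status")
    print(f"disk system {bcc['disk_system']}, controller {bcc['controller']}: {status}")
```

On the excerpt above this flags only the second entry (controller "-1", status UNKNOWN), which is the BCC to look at.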
> however the BCC controller isn't meant to be located in the controller B enclosure?
Not sure what you mean, but array controllers and BCC (bus control card) are different. There is an A and B for each.
> So, in your opinion, first i should replace the BCC & controller ?
Replace the BCC to hopefully get all of your drives in good order. Then focus on the array controller as previously discussed.
That's my suggestion anyway!
-Bob
12-18-2011 01:25 AM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
Hello,
We are still having issues on this topic. We replaced both BCCs on disk system 2 and then replaced the controller, with no results. We then backed up all data and syswiped the array, and now when trying to create a LUN on controller B we receive the following output:
> amcfg -force -L B:1 -d 2:0,4:0,6:0 -r 5 -s 16 Array1
Error in command execution, "RMT_AM60ERRORSTATUS_MSG"
AM60ERR : ERR_COMMAND_FAILED
AM60ERR QUAL : CREATE_LUN
MODULE_CODE_ID : SUBSYSTEM
COMMAND STATE : A SCSI error occurred
ERROR NUMBER : 2
Sense Key = 0x05: "ILLEGAL REQUEST"
Additional Sense Code = 0x91
Additional Sense Code Qual = 0x03
Decoded SCSI Sense:
Illegal Operation for Current Disk State
amcfg: Error in command execution
Isn't syswipe intended to delete all disk data and configuration?
Any help is much appreciated
12-19-2011 07:18 AM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
It looks like there is still a hardware issue somewhere.
Does amdsp -a report any improvements since the controller replacements you referenced?
-Bob
12-19-2011 12:16 PM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
No improvements, even after replacing the controller.
These FC60s are too weird. I troubleshot everything (SCSI cables, ports, and so on),
then by itself, after some time, the array showed no problems and everything went OK.
However, nobody knows why this happened. Too weird!
12-19-2011 12:54 PM
Re: replacing a bad disk on SC10 array / FC60 (help plz)
This is probably the line that says it all:
> These FC60 are too weird,
Very true!
-bob