MSA Storage

SOLVED
CA1044123
Occasional Contributor

Autospare vs spareset (hot spare)

Please tell me:
1. Is it recommended to configure both AUTOSPARE and a SPARESET on an HSxxx controller?
2. If not, is there any supporting documentation on this?
3. If yes, which one takes precedence?
3 replies
Uwe Zessin
Honored Contributor
Solution

Re: Autospare vs spareset (hot spare)

OK. Here is some background information on how this works:

You create at least one storageset with a RAID level that provides redundancy (e.g. a mirrorset). This set must have a replacement policy defined (POLICY=BEST_PERFORMANCE or POLICY=BEST_FIT).

You assign disks that are large enough to replace a member to the SPARESET (ADD SPARESET DISK61500).

Now, when a disk in the mirrorset fails, the controller automatically selects a disk from the SPARESET according to the mirrorset's replacement policy. AUTOSPARE is not involved in this process.
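To make the policy-driven selection more concrete, here is a minimal Python model of it. It is a sketch under stated assumptions, not the controller's actual algorithm: the `Disk` fields, the `select_spare` function and the exact tie-breaking rules are all hypothetical. It assumes a spare must be at least as large as the set's base member size, that BEST_FIT prefers the smallest such spare, and that BEST_PERFORMANCE prefers a spare on a port (bus) not used by the surviving members. Note that AUTOSPARE does not appear anywhere in it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Disk:
    name: str         # controller disk name, e.g. "DISK61500"
    capacity_gb: int  # usable capacity (hypothetical field)
    port: int         # SCSI port (bus) the disk is attached to


def select_spare(spares: List[Disk], survivors: List[Disk],
                 base_size_gb: int, policy: Optional[str]) -> Optional[Disk]:
    """Simplified, hypothetical model of spare selection.

    Assumed rules: a spare must be at least base_size_gb; BEST_FIT prefers the
    smallest such spare, BEST_PERFORMANCE prefers a spare on a port that no
    surviving member uses. policy=None models NOPOLICY (no automatic replacement).
    """
    if policy is None:
        return None
    candidates = [d for d in spares if d.capacity_gb >= base_size_gb]
    if not candidates:
        return None
    used_ports = {m.port for m in survivors}
    if policy == "BEST_FIT":
        # closest size first, then prefer an unused port as a tie-breaker
        key = lambda d: (d.capacity_gb, d.port in used_ports)
    elif policy == "BEST_PERFORMANCE":
        # unused port first, then the closest size
        key = lambda d: (d.port in used_ports, d.capacity_gb)
    else:
        raise ValueError(f"unknown policy: {policy}")
    return min(candidates, key=key)
```

With spares of 36 GB on port 6, 18 GB on port 2 and 72 GB on port 1, and a surviving 18 GB member on port 2, this model picks the 18 GB disk under BEST_FIT and the 36 GB disk on port 6 under BEST_PERFORMANCE.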

If you check, you will see that the bad disk (slot) is assigned to the FAILEDSET. 'SHOW FAILEDSET' will show you that the AUTOSPARE switch is a characteristic of the FAILEDSET.

Naturally, you want to replace the bad disk. You pull it and put a good disk into the bay.

This is the moment when the AUTOSPARE switch comes into play. The controller is notified of the insertion and checks the AUTOSPARE switch. If it is set to NOAUTOSPARE, the controller does nothing and it is up to you to decide what to do.

If the switch is set to AUTOSPARE (SET FAILEDSET AUTOSPARE), the controller first checks whether the disk contains any metadata (i.e. whether it has previously been used in an HSx storage system).

If there _is_ metadata on it, the controller, again, does nothing: you might have inserted the wrong disk, and the controller will not overwrite your data. If the controller cannot find any metadata, it automatically removes the slot from the FAILEDSET and puts the disk into the SPARESET.
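The decision made on disk insertion can be summarized in a few lines. This is a minimal sketch that models the FAILEDSET and SPARESET as Python sets of disk names; the function name `on_disk_inserted` and its parameters are hypothetical, and the check that the slot must be in the FAILEDSET is an assumption based on AUTOSPARE being a FAILEDSET characteristic.

```python
from typing import Set


def on_disk_inserted(slot: str, failedset: Set[str], spareset: Set[str],
                     autospare: bool, has_metadata: bool) -> bool:
    """Hypothetical model of the AUTOSPARE check when a disk is plugged in.

    Returns True if the disk was moved into the SPARESET, False if the
    controller left everything alone (the operator has to decide).
    """
    if slot not in failedset:
        return False  # assumption: AUTOSPARE only concerns FAILEDSET slots
    if not autospare:
        return False  # NOAUTOSPARE: controller does nothing
    if has_metadata:
        return False  # disk was used in an HSx system before: never overwrite it
    failedset.remove(slot)  # slot leaves the FAILEDSET...
    spareset.add(slot)      # ...and the disk becomes a spare
    return True
```

Only the last branch changes any state, which matches the cautious behaviour described above: whenever there is doubt, the controller leaves the decision to you.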

I don't like that approach, so I keep AUTOSPARE disabled. With AUTOSPARE, the new disk simply becomes a spare, and if another disk failed, it would be used for the next replacement. I want the old spare disk to become a spare disk again, and that must be done manually. For mirrorsets that is rather easy:

- delete the slot from the FAILEDSET

- add the replacement disk to the mirrorset as well, so that you get a 3-member mirrorset

- wait until the mirrorset is normalized

- reduce the old 'spare disk' from the mirrorset

- put that disk back into the SPARESET

Now your configuration matches your documentation again, and your original layout has been restored, too.
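The procedure can be checked by replaying it on plain sets of disk names. This is only a model, not controller commands; `restore_mirrorset` is a hypothetical helper, and the disk names in the example are made up. Starting from a mirrorset whose failed member was replaced by a spare, the steps put the new disk back in the original slot's role and return the old spare to the SPARESET.

```python
from typing import Set


def restore_mirrorset(mirrorset: Set[str], failedset: Set[str], spareset: Set[str],
                      replaced_slot: str, old_spare: str) -> None:
    """Replay the manual restore steps on sets of disk names (a model only)."""
    # delete the slot from the FAILEDSET
    failedset.remove(replaced_slot)
    # add the replacement disk: the mirrorset temporarily has one extra member
    mirrorset.add(replaced_slot)
    # (on the controller: wait here until the mirrorset is normalized)
    # reduce the old spare disk out of the mirrorset
    mirrorset.remove(old_spare)
    # put that disk back into the SPARESET
    spareset.add(old_spare)
```

For example, if the mirrorset was {DISK10000, DISK20000}, DISK20000 failed and the spare DISK30000 took its place, calling `restore_mirrorset(mirror, failed, spares, "DISK20000", "DISK30000")` after inserting a new disk leaves the mirrorset as {DISK10000, DISK20000} and DISK30000 in the SPARESET again.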

Also, spare replacement might lead to a situation where all members of a storageset are located on the same physical bus; a failure of that bus would take out the entire set. That would have to be undone manually, too.
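A quick way to spot this situation is to derive the bus from each member's disk name. This sketch assumes the HSG80-style naming scheme DISKpttll (p = port/bus, tt = target ID, ll = LUN), so DISK61500 would be port 6; verify that against your controller before relying on it. The helper names `port_of` and `all_on_one_bus` are hypothetical.

```python
import re
from typing import Iterable

# Assumed naming scheme DISKpttll: p = port (bus), tt = target ID, ll = LUN
DISK_NAME = re.compile(r"^DISK(\d)(\d{2})(\d{2})$")


def port_of(disk_name: str) -> int:
    """Extract the port (bus) number from a disk name, under the assumption above."""
    m = DISK_NAME.match(disk_name)
    if m is None:
        raise ValueError(f"unexpected disk name: {disk_name!r}")
    return int(m.group(1))


def all_on_one_bus(members: Iterable[str]) -> bool:
    """True if every member of the storageset sits on the same port (bus)."""
    ports = {port_of(name) for name in members}
    return len(ports) == 1
```

A mirrorset of DISK10000 and DISK10100 would be flagged (both on port 1), while DISK10000 and DISK30000 would not.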

2) I believe this is somewhere in the ACS documentation, although I can't point you to a specific manual right away; I would have to look it up myself first.

3) It is my understanding that there is no precedence. They are independent mechanisms that work during different stages of a disk failure.
RTB
New Member

Re: Autospare vs spareset (hot spare)

Uwe,

You covered the replacement of a failed mirrorset drive; could you do the same for a failed raidset drive?

As you mentioned, the idea is that the disk layout should be restored to the pre-failure configuration...

Thanks!
Uwe Zessin
Honored Contributor

Re: Autospare vs spareset (hot spare)

Hello RTB,
unfortunately, a mirrorset cannot be used as a member of a raidset, so the 3-member mirrorset trick does not work here. You have two choices:

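1) Take the former spare disk out of the raidset and replace it manually (r1 is the raidset, diskS the former spare disk, diskM the new disk in the failed slot, and POLICY is set back to whatever it was before):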
> SET r1 NOPOLICY
> SET r1 REMOVE=diskS
> SET r1 REPLACE=diskM
> SET r1 POLICY=?
> DELETE FAILEDSET diskS
> ADD SPARESET diskS

Of course, that means another raidset reconstruction, but at least you can do it online.
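If you have to do this for several raidsets, the command sequence of option 1 can be generated from the names involved. This sketch only produces the same CLI lines listed above as text; it does not talk to the controller. `raidset_restore_commands` is a hypothetical helper, and the `policy` parameter stands for the raidset's previous replacement policy (the `?` in the list above).

```python
from typing import List

VALID_POLICIES = {"BEST_PERFORMANCE", "BEST_FIT"}


def raidset_restore_commands(raidset: str, old_spare: str, new_disk: str,
                             policy: str) -> List[str]:
    """Return the option 1 CLI lines, in order, for the given names."""
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    return [
        f"SET {raidset} NOPOLICY",            # presumably keeps the controller from pulling a spare
        f"SET {raidset} REMOVE={old_spare}",  # take the former spare out of the raidset
        f"SET {raidset} REPLACE={new_disk}",  # rebuild onto the new disk
        f"SET {raidset} POLICY={policy}",     # restore the previous replacement policy
        f"DELETE FAILEDSET {old_spare}",      # the removed disk ends up in the FAILEDSET
        f"ADD SPARESET {old_spare}",          # make it a spare again
    ]
```

For example, `print("\n".join(raidset_restore_commands("R1", "DISK30000", "DISK20000", "BEST_FIT")))` prints the six lines to paste into the CLI session.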

2) Stop access to the unit, delete all objects top-down, move the disk and re-add the objects. Remember not to use INITIALIZE on the raidset! Skipping INITIALIZE keeps the data and the 128-bit LUN ID (important for Tru64 UNIX!) intact.

This means that your data is unavailable for a period of time.