HPE EVA Storage

Compaq MA8000 and Cluster

 
Paul_338
Occasional Contributor

Compaq MA8000 and Cluster

I am new to cluster and SAN technologies and would like some advice here. Currently we have a Compaq cluster using 2 DL380 G1 servers with Windows 2000 AS, configured as a cluster. We also created 5 raidsets and several LUNs on a Compaq MA8000 (HSG80 with ACS V8.6) with a multiple-bus failover setting for the shares and the quorum resource. We are running out of space and plan to replace all the existing small disks with 84 142GB hard drives (6 shelves * 14 drives per shelf).

My questions are:

1. What is the recommended plan?
2. Do we need to delete all of the disk configuration (disks, raidsets, units) except the connection name settings (so they can be reused)?
3. Because this involves the cluster, do we need to move the quorum drive or recreate it?
4. Creating new LUNs also means new disk signatures. What is the procedure to correct the problem if the cluster doesn't recognize the old signatures?
3. We have several resources, such as file shares, that use the SAN storage. Do we need to delete and recreate them? Or can we just stop the cluster service on both nodes and reuse them?
4. After creating the LUNs, how are they recognized by the servers? How can I set them up so that they can be used as disk resources?

Thanks for help
4 REPLIES
Mike Naime
Honored Contributor

Re: Compaq MA8000 and Cluster

0.) In order to support the 146GB disk drives, you MUST upgrade your firmware. Your options are: a.) upgrade to 8.6?-13 http://h18006.www1.hp.com/products/storageworks/softwaredrivers/acs/index.html
b.) purchase the 8.7 firmware cards. If you are planning to replace ALL of the drives, I would recommend that you upgrade to the newer ACS while you are at it.

1.) Make good backups first! Verify that you can read your backups since you will probably be restoring your data from tape.

The "plan" depends on how much time you want to mess with this, and how much downtime you can afford.

Swapping out one LUN at a time, and copying/moving data from the old LUNs to the new LUNs as you transition, will require more of your time but less overall downtime.

2.) Since you said you are a newbie, I will assume that you know nothing about configuring and programming the HSG controllers at the CLI level. Yes: in order to replace a LUN, you must delete the unit, delete the storage set, and delete the disks that made up the storage set. Then you can remove the spindles/drives without setting off any alarms. This is best done at the CLI prompt of the storage controller, not with some GUI interface that only works half the time at half the speed.

del Dxxx (delete the unit)
del Rxxx (delete the storage set)
del diskxxx00 (delete the disk; repeat for each disk until finished)
Now you can pull those disks.

Note: unless you have shut down all of your servers, you need to wait about 30 seconds between removing drives if you do not quiesce the bus first. (The quiesce button is on the other side of the cabinet/row; I don't bother.) HP recommends 1 minute in their published maintenance procedures. I just wait a few seconds after I see drive activity resume, to give it a chance to catch up. Also, rotate buses/shelves: do not remove all of the drives from the same shelf at once. Remove one from each shelf to distribute the SCSI interrupts across all the buses before you repeat.


3a.) Only if you really want to replace ALL of the smaller drives, including your quorum disk. I would leave the quorum drive alone. You have no need of a larger quorum disk; it would be wasted space from what I can tell.

4a.) Your new LUNs will be blank, since you must run DILX to initialize each LUN for WINxxxx to use it. Therefore you must write a new disk signature there. Deal with it!

3b.) no comment... not my area.

4b.) Same as when they were set up the first time:
run config (scan for new drives)
add {raid/mirror} {storage set name} {list of drives} (adds a storageset)
init {storageset}
add unit Dxxx {storageset} dis=all (if using SSP)
set Dxxx id=nnn
show {storageset} (check normalization progress)
set Dxxx ena={connection list}

Example of adding a 3 drive raidset:
add raid r123 disk10800 disk20800 disk30800
init r123
add unit d123 r123 dis=all
set d123 id=123 (This is your LUN id!)
RUN DILX (Prep the LUN) CAUTION!!!!
set d123
ena=server1b1,server1t1,server1b2,server1t2 -- this is Selective Storage Presentation

Notes:
DILX can and will interrupt your LUN presentation to your other servers. It is best used as a DOWNTIME procedure. You only need to run it for the minimum time (1 minute) to prep your LUN.

If you write data to the LUN before it has finished normalizing the disks, you can crash your system! A 700GB LUN may take overnight (10+ hours) to finish the normalization process, depending on other HSG activity.
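As a rough planning aid, you can extrapolate from that figure. This is just back-of-the-envelope arithmetic: the ~70 GB/hour rate below is derived purely from the "700GB in 10+ hours" example above, not a published HSG80 specification, so treat the results as order-of-magnitude estimates.

```python
# Rough normalization-time estimate for a new LUN.
# ASSUMPTION: ~70 GB/hour effective rate, derived only from the
# "700GB LUN may take 10+ hours" figure above; actual speed
# depends on other HSG controller activity.

def normalization_hours(lun_gb, gb_per_hour=70):
    """Estimated hours for the HSG to normalize a LUN of lun_gb gigabytes."""
    return lun_gb / gb_per_hour

print(normalization_hours(700))  # -> 10.0
print(normalization_hours(350))  # -> 5.0
```

Useful mainly for deciding whether a given raidset rebuild fits in an overnight window or needs a full weekend.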



Based on the above, here is what I would do if I could get downtime for a weekend.

Before the downtime prepare for it!
Connect a laptop to the CLI port of the storage controller and capture this information into a flat file. (LOG)
SHOW UNIT
SHOW ID
SHOW UNIT FULL
SHOW STORAGE FULL

This shows you where you started from! This will also allow you to make a text file of all the commands that you want/need to run in order to accomplish your task. Do this in advance.


Make 2 backups of everything! Verify that you can restore/read the backups. (A lot of people never do this!)

When you are ready. Shutdown your servers!
(Friday night? or Xmas Eve?)

Copy and paste into the CLI window the commands that you created for the following.
Delete all of the units except your quorum disk.
Delete all of the storage sets (and spareset disks) except the quorum disk.
Delete all of the disks(Spindles) that you need to replace.

If you did not buy the new ACS cards, patch the current ACS cards now.

SHUTDOWN OTHER
SHUTDOWN THIS
(This shuts down the other controller, then this one.)

Remove all of the old disk drives that need to be swapped out. Replace them with the newer drives.

While holding down the reset buttons, remove and replace the ACS firmware cards (if you bought them).

After the controllers reboot:

RUN CONFIG (Be patient, it may take 5 minutes to scan if you replaced ALL of the drives.)

Copy-and-paste time again, to rebuild your configuration. The HSG paste buffer is kinda small, so you may need to paste only 3-4 lines at a time if they are really long.
add raid r1 disk.........
add raid r2 disk........
add .......
init ....
init....
...
add unit......
...
SET UNIT....
(Don't forget your spareset drives!)
Add spare disk1xx00
add spare disk2xx00
add spare disk3xx00
add spare disk4xx00
add spare disk5xx00
add spare disk6xx00

RUN DILX - choose the units you just made.

You can boot your servers now and verify that they can see the LUNS, but you really should not write any data to them until after the normalization process is done in the morning.

Go home and sleep!

In the morning (Saturday), start restoring tapes! Hopefully you have multiple tape units and are finished by Monday.



Alternative plan: this may take less downtime and hopefully no restores from tape, but may take several weeks to accomplish.

Same setup steps of getting the storage configuration info, and making backups.


Shut down your servers. (Paranoid here, but not absolutely required.)

Either Patch your ACS or replace the firmware cards.

Boot your servers and verify that all LUNS are still visible.

OS - Copy all data from the first LUN (LUN #1) that you want to replace to other disk locations if possible, or send it to tape. (Should already be done.)

OS - Dismount LUN #1.

HSG - Delete unit, raid, disks. Remove and Replace disks. Re-create LUN (See #4B above)
Wait for normalization....

OS - Identify the new LUN. (Are reboots still required for new LUNs on the Windows cluster?)
Copy all data back to the new LUN.

OS - copy data from LUN #2 to LUN #1. (If space permits, do the same for LUN #3.)

HSG - Replace LUN.

(Repeat as necessary)


As you can see, the total downtime may be less for the second procedure, but it requires more reboots/interruptions than if you bite the bullet and do it all at once.


Question?

Did someone at Compaq/HP tell you that you could use the 146GB disks without upgrading your firmware, or is this a bright idea your manager had without examining all of the technical details? Someone should have informed you of the ACS requirements of the 146GB disks.

Good Luck. It's a lot of work either way.

If it's a 24x7x365 shop, tell them to buy a new controller with the 146 GB disks already installed. It will save them most of the downtime!

VMS SAN mechanic
Stephen Kebbell
Honored Contributor

Re: Compaq MA8000 and Cluster

Something else about using 146GB disks:
The HSG80 has a Maximum Storage-Set size of 1TB. With 146GB disks you can easily exceed this 1TB limit. You should take this into account when planning your storage sets.
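To make that concrete, here is a quick back-of-the-envelope check of how many 146GB members fit in one storage set. The numbers are assumptions for illustration: the 1TB limit is treated as 1024GB and the drives as a nominal 146GB each, so real usable capacities will differ slightly.

```python
# How many 146GB members fit in one storage set under the HSG80's
# 1TB storage-set limit? ASSUMPTIONS: 1TB treated as 1024GB, and
# nominal 146GB per drive; real formatted capacities differ.

LIMIT_GB = 1024
MEMBER_GB = 146

def max_members(raid5=False, limit_gb=LIMIT_GB, member_gb=MEMBER_GB):
    """Largest member count whose usable capacity stays within the limit.

    A RAID-5 set yields (n - 1) members' worth of usable capacity
    (one member's worth goes to parity), so it can hold one more
    member than a plain stripeset of the same usable size.
    """
    data_members = limit_gb // member_gb  # 1024 // 146 = 7
    return data_members + (1 if raid5 else 0)

print(max_members())             # stripeset: 7 members (~1022GB)
print(max_members(raid5=True))   # RAID-5:    8 members (~1022GB usable)
```

In other words, under these assumptions an 8-member RAID-5 of 146GB drives already sits right at the limit, so the 84 new drives will have to be split across quite a few storage sets.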

Regards,

Stephen
Doug de Werd
HPE Pro

Re: Compaq MA8000 and Cluster

Mike provided some excellent technical details in his previous response, but here is a little more - bottom line is that it's a lot of work either way, and you may just be better off backing up your data and re-doing everything from scratch.

1. What is the recommended plan? First, make a solid, reliable backup before any changes are made. Second, if the same storage enclosure (HSG80) will be used, try the upgrade on 1 of the 5 raidsets first. Using Cluster Administrator, you will need to take the first set of cluster shares offline while you work on the first raidset. You won't be able to delete the "physical disk" resource from Cluster Administrator because of the dependencies of the file shares. After a new raidset is presented to the cluster, you should modify the dependencies of the file shares to point to the new disk, and then delete the old disk resources in Cluster Administrator.

2. Do we need to delete all disk configuration (disks, raidsets, units) except connection name settings (reuse)? Yes. If the same set of HBAs will be used, there should not be any new connections, but you should delete only 1 raidset at a time during this upgrade, if possible.

3. Because this involves the cluster, do we need to move the quorum drive or recreate it?
Once the first of the new drives is created, you could use Cluster Administrator to temporarily move the quorum resource to another drive until a dedicated raidset (mirrorset) can be created for the quorum resource, or you could simply leave the quorum drive as-is until the end.

4. Creating new LUNs also means new disk signatures. What is the procedure to correct the problem if the cluster doesn't recognize the old signatures?
There definitely will be new disk signatures; you should make a backup of the disk signatures prior to making any changes. Microsoft provides a "Resource Kit" utility called "dumpcfg" which allows an administrator to capture and set disk signatures in case of problems with clusters or other resources. This utility can be obtained from the W2K Resource Kit or Microsoft's website.

3. We have several resources, such as file shares, that use the SAN storage. Do we need to delete and recreate them? Can we just stop the cluster service on both nodes and reuse them?
The resources may need to be deleted and then recreated using the new LUNs; Cluster Administrator will not let you delete the "physical disk" resource as long as there are dependencies on it. The new LUNs should be created and presented to the cluster group before attempting to delete the file shares. You may find you can avoid deleting them if you can modify the dependencies of the file shares.

4. After creating the LUNs, how are they recognized by the servers? How can I set them up so that they can be used as disk resources?
Create the new LUNs much the same way the originals were. You can use the CLI to create the raidsets and present the units to the same (HBA) connections. After the LUNs are created and presented to the cluster nodes, you can use Cluster Administrator to create new disk resources in the specific cluster groups for the newly presented raidsets. Keep in mind that it might be a good idea to present the newly created disks to one cluster node at a time: have only one node see the drives, format the drives, and add them to the cluster via Cluster Administrator before presenting the disks to the alternate node.
I am an HPE employee
Craig Howe
Occasional Advisor

Re: Compaq MA8000 and Cluster

There have been some excellent points made. It's all about planning and choosing the best approach based on business needs (permitted downtime, etc.).

Gather Information:
Check software/firmware versions on everything: HBAs, servers, HSG80s, fabric switches, etc.

Document settings, disk signatures, controller settings, etc.

Check the configuration of the current MA8000 - is it what you want after the upgrade? E.g., is the system in SCSI-2 mode when you would like to move to SCSI-3 mode? Be careful with these changes! Are the UPS settings correct (DATACENTER_WIDE, NOUPS)? These changes will require controller restarts.

Draw up plan

I'm guessing you won't be able to do this all at one outage because of the size. Maybe target one storageset at a time?

Backup tape vs. prepped Robocopy:
You could prep the new disks over time for a storageset and then "robocopy /mir /copyall" (and log) the data over to the new storageset after hours. This brings over all the data as of that day. Then, on the day of changeover, do a full (verified) backup, and only another Robocopy "sync" is required to pick up the changes made since, giving you an exact copy. If you are interested in this approach, go for the latest .NET version of Robocopy and check all the switches. I would recommend /mir to mirror, /copyall to copy all ownership, permissions, etc., and the log switch. You can then "find" in Notepad for "retry" or "fail" to see if any data didn't come over, fix ownership/permissions, and rerun the script.
This obviously can take time, but it is most useful for large amounts of data, as backing up and then pushing the data off tape during the outage can take a lot of time!
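If the logs get large, the "find retry or fail in Notepad" check can be automated with a small script along these lines. This is only a sketch: the keyword list follows the search suggested above, but the exact wording in Robocopy logs varies by version, so adjust the keywords (and the sample log, which is made up) to match your output.

```python
# Scan a Robocopy log for lines hinting at files that did not copy,
# automating the "find 'retry' or 'fail' in Notepad" check above.
# ASSUMPTION: simple case-insensitive keyword matching; exact log
# wording varies by Robocopy version, so tune the keywords.

def find_problem_lines(log_text, keywords=("retry", "fail")):
    """Return stripped log lines containing any keyword (case-insensitive)."""
    hits = []
    for line in log_text.splitlines():
        lowered = line.lower()
        if any(k in lowered for k in keywords):
            hits.append(line.strip())
    return hits

# Made-up log excerpt for illustration:
sample = (
    "New File          1024  report.doc\n"
    "Waiting 30 seconds... Retrying...\n"
    "100%  ok.txt\n"
)
print(find_problem_lines(sample))  # -> ['Waiting 30 seconds... Retrying...']
```

An empty result after the final "sync" run is a reasonable sanity check that nothing was left behind before you cut over the cluster resources.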

There would be some prep work involved in getting the storageset up as Mike described, and some work to get the new cluster group up with the new drive. But this means that after the "sync" there is an exact copy of the data on the new storageset with permissions, ownership, etc. all in place. The resources can then be brought offline in the cluster, have the new disk dependency added and the old one removed, and then be moved into the new group. The new group would then be renamed once the old group has had all its resources moved over and has been made redundant.

If you are looking at moving your quorum, you can create a new folder on another shared disk as a "temp quorum drive". At the top of Cluster Administrator, right-click the cluster name and go to Properties. There you can change the quorum to point to this new folder on a separate disk. Once you have the new disk created, you can move it back. Alternatively, look at TechNet - there are MS documents in the Knowledge Base that talk about moving the quorum drive or using the -fixquorum switch.