08-26-2011 04:31 PM
Alphastation XP1000 with Mylex inconsistent state after accidental powerdown of disk cabinets
I have a strange Mylex/SRM disk status error/inconsistency on Tru64 V4.0f with an Alphastation XP1000 and Mylex RAID controller.
By accident we powered down the wrong disk cabinet. Since then the impacted system does not reboot any more (AdvFS reports I/O errors when mounting).
It appears that the SRM console and the Mylex controller have a different view on the state of the disks.
The SRM "show device" command says: "Failed" for all of the domains.
When I initially ran the ra200rcu RAID utility, the states of the 6 disks were FLD-FLD-FLD-OPT-FLD-FLD.
After I attempted to run the "Tools -> Make Optimal" command as described in the rcu_ug.pdf User guide, the system replied with: "There are no drives connected to the adapter OR the state of the connected drives does not support this option."
Still the status of the disks in the Mylex configuration utility changed...
The new ra200rcu state of the disks is now: RDY-UNF-RDY-RDY-RDY-RDY.
The first 4 disks are in RAID-5; the last 2 are JBOD.
The disk with the UNF status has a "yellow alarm led".
When I power on, I see a long list of messages like:
waiting for dra.0.0.12.0 to poll...
Then I start the ARC console:
Via the AlphaBIOS advanced menu we can run the Mylex controller configuration program. I need to use the -o flag to force the utility to continue scanning. Otherwise the utility stops at the first failed device.
Both the SRM console and the ra200rcu utility take a long time to synchronize with the RAID controller due to the dra-polling problem.
Run utility: a: ra200rcu -o
It appears that the SRM console and the ra200rcu utility do not agree and do not report the same status of the disks.
The net result is that the system does not boot any more, because the system does not find its disks to boot from and the application and data file system cannot be properly mounted.
How could we reset or synchronize the status of the disks between the Mylex controller and the SRM console? I have already tried to power down the CPU and the disk cabinets and to power them on in the reverse sequence!
But this did not change the situation.
Any ideas are welcome. I can provide more information when required. Thanks.
09-02-2011 01:46 PM
Additional screen shots
I have taken a photo before and after the Mylex ra200rcu Tools -> Make Optimal operation (2 first photos).
Disk state before Make Optimal: FLD-FLD-FLD-OPT-FLD-FLD
Disk state after Make Optimal: RDY-UNF-RDY-RDY-RDY-RDY
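To make the change easier to read per disk, here is a minimal Python sketch that lines up the two state strings reported above. The slot numbering (0 to 5, left to right) and the helper name `compare_states` are my own assumptions for illustration; they are not something ra200rcu prints.

```python
# States copied from the ra200rcu screens described above.
BEFORE = "FLD-FLD-FLD-OPT-FLD-FLD"
AFTER = "RDY-UNF-RDY-RDY-RDY-RDY"


def compare_states(before, after):
    """Return (slot, old_state, new_state) for every disk position.

    Slots are numbered left to right starting at 0 (an assumption).
    """
    old = before.split("-")
    new = after.split("-")
    if len(old) != len(new):
        raise ValueError("state strings describe a different number of disks")
    return [(slot, o, n) for slot, (o, n) in enumerate(zip(old, new))]


changes = compare_states(BEFORE, AFTER)
for slot, old_state, new_state in changes:
    # Slots 0-3 are the RAID-5 members, slots 4-5 the JBOD disks (per the post).
    role = "RAID-5" if slot < 4 else "JBOD"
    print(f"slot {slot} ({role}): {old_state} -> {new_state}")
```

Under that numbering, the only disk that did not end up RDY is the second one, which is a member of the RAID-5 set.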
After the Make Optimal operation, I get the following message from the ra200rcu utility: "There are no drives connected to the adaptor OR the state of the connected drives does not support this option."
The 3rd photo shows the SRM SHOW DEVICE output.
According to the SRM console all disks are in Failed state, while according to the RAID controller 5 disks are RDY and one is UNF.
What is the UNF state?
The first 4 disks are in RAID-5; so if there are 3 good disks, I can recover the 4th one.
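The reasoning behind "3 good disks can recover the 4th" can be sketched in Python as below. This is only an illustration of the general XOR-parity idea behind RAID-5, assuming one stripe with three data blocks and one parity block of equal size. It says nothing about how the Mylex firmware lays out or rebuilds data, and the function names and sample bytes are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Rebuild the one missing block of a stripe from the survivors.

    Because parity = d0 ^ d1 ^ d2, XOR-ing any three of the four
    blocks (data or parity) yields the fourth.
    """
    return xor_blocks(surviving_blocks)


# Hypothetical stripe: three data blocks plus their parity block.
d0 = b"\x01\x02\x03\x04"
d1 = b"\x10\x20\x30\x40"
d2 = b"\xaa\xbb\xcc\xdd"
parity = xor_blocks([d0, d1, d2])

# Pretend the disk holding d1 is the failed one and rebuild it.
recovered_d1 = rebuild_missing([d0, d2, parity])
```

This only works if the three surviving members still hold consistent data, which is exactly what is uncertain here after the forced state change.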
The 2 remaining JBOD disks are RDY on the RAID controller, but are Failed on the SRM console.
My conclusion is that the SRM console and the Mylex controller do not agree on status.
I have already tried to power cycle the system and the disks (in the right sequence!), but this does not help.
Is there any SRM or ra200rcu command to synchronize state between the Mylex and the SRM console?
Any further ideas? Thanks!
09-08-2011 03:21 PM
Could a restore of ra200rcu config backup solve the problem?
Are there any SRM commands to reconnect the disks? Or to first remove the disks from the console?
What if I power on the system once with the disks powered off, and then again with the disks powered on?
Could I restore the Mylex RAID controller config backup with the ra200rcu diskette utility?
Is there a risk of losing or erasing data on the disks in this case?