
Re: DS20E Boot Problem

 
Rob Buxton
Honored Contributor

DS20E Boot Problem

I have a DS20E running VMS 7.2-2, using an RA200 Array accessing an external Disk Array.
VMS etc. was built, and then the Server was shut down for a month or two before relocation to our DR Site.
Along the way the external array was dropped.

On attempting to boot it finds and loads the VMS Base Image. Then reports the following for the DRA Array;
Drives=0, optimal=4294967295, degraded=1, failed=0
Then I get:
SMP-I-SECMSG CPU #1 Missing P01>>>START
SMP-I-CPUTRN CPU #1 has joined active set

Then it hangs. There is no further disk activity.
A >>>SHOW CONFIG does list the DRA Devices that I had configured prior to the move.

Tried things like MIN Startup and setting Expected Votes to 0, but this hang seems to me to be much earlier in the Boot Process.

It's been quite a while since I did any serious VMS stuff. And I cannot easily provide additional info as the Server is about 20K away.
Any pointers as to where to look would be appreciated.
Any suggestions as to the next areas for trouble-shooting welcomed.

Alas this Server is not on maintenance as we're not expected to have VMS here for much longer. So getting an engineer in is not our first choice option!
8 REPLIES
John Gillings
Honored Contributor

Re: DS20E Boot Problem

Hi Rob,

I'd suggest you boot from a CD and get the $$$ DCL interface. Try mounting the disks and running ANALYZE/DISK.

From your description, I'd guess that the drop has done something nasty to the RAID array or some of the drives. Got a good backup? ;*)
A crucible of informative mistakes
Rob Buxton
Honored Contributor

Re: DS20E Boot Problem

John,
Thanks, I'll give that a go.
Backups? what are they?
It's pretty much a DR Server with the OS on it.
I do have some backups of the changed material.
John H. Reinhardt
Frequent Advisor

Re: DS20E Boot Problem

On attempting to boot it finds and loads the VMS Base Image. Then reports the following for the DRA Array;
Drives=0, optimal=4294967295, degraded=1, failed=0


This is bad. "Drives=0" means the RAID controller has no configured raidsets or JBOD disks. "Optimal=4294967295" means you are foobar: Optimal should indicate how many raidsets the controller has that are in perfect shape, i.e. with no failed drives. "Degraded=1" means the controller found one set with a failed drive. "failed=0" means that the controller did not find any raidsets that are unusable due to excessive disk failures.

You may have noticed that this seems inconsistent. The controller is highly confused.
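
To make the inconsistency concrete, here is a minimal Python sketch that parses the status line quoted above and flags any count that is larger than the Drives total. The function names and the sanity rule are illustrative assumptions only; this is not how the driver or controller checks anything.

```python
import re

def parse_status(line):
    """Parse 'Drives=0, optimal=..., ...' into a dict of lower-cased names to ints."""
    return {name.lower(): int(value) for name, value in re.findall(r"(\w+)=(\d+)", line)}

def inconsistencies(status):
    """Return the fields whose count exceeds the reported Drives total (assumed rule)."""
    return [name for name in ("optimal", "degraded", "failed")
            if status[name] > status["drives"]]

line = "Drives=0, optimal=4294967295, degraded=1, failed=0"
status = parse_status(line)
print(inconsistencies(status))  # ['optimal', 'degraded']
```

With zero configured sets, both the optimal and degraded counts fail the check, which is the contradiction described above.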

What kind of drive array is it? If it's a Storageworks (or even another brand) array with hot-swappable drives then you could try having whoever is on-site re-seat each drive just in case the fall knocked them loose. For safety you should probably have the on-site personnel shut it all down and do one drive at a time, to make sure the drives do not get put back in the wrong position. That would scramble any raidset and you would have to re-create them and restore from backup. Re-seating any circuit modules in the storage array would also be a good idea.

If neither this nor John Gillings' suggestion works then hopefully it was the shipping company that dropped the array and maybe you can get the insurance to pay for an engineer?
Volker Halle
Honored Contributor

Re: DS20E Boot Problem

Rob,

although - as others have already indicated - this very much looks like a system disk access problem on your RAID controller, you may still collect more information from the OpenVMS boot by setting additional boot flags (DBG_INIT and USER_MSGS)

>>> B -FL ,30000

You need to capture those messages with a laptop or PC.
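
For anyone wondering where the 30000 comes from: the flags value is hexadecimal, and the two named flags are single bits that are OR-ed together. A small Python sketch of the arithmetic, using the bit values as I understand them from the OpenVMS Alpha boot flag documentation (treat them as an assumption and check against your version's manual):

```python
# Bit values for OpenVMS Alpha boot flags (hexadecimal), assumed from the documentation.
DBG_INIT = 0x10000   # verbose messages during early boot
USER_MSGS = 0x20000  # additional descriptive boot messages

flags = DBG_INIT | USER_MSGS
print(format(flags, "X"))  # 30000
```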

Volker.
Rob Buxton
Honored Contributor

Re: DS20E Boot Problem

John H. R.
Yes, that Drives=0 did bother me. Show config does list the DRA Devices I had set up. So something is definitely confused.

First off I'll try and source a new Shelf (it is Storageworks). We also have some spare disks.

The consensus does appear to support the view that it is the array. The bit that confused me was that it did seem to find the Bootstrap and at least recognised it had VMS.

Volker,
Thanks, I'll remember that tip.
Volker Halle
Honored Contributor

Re: DS20E Boot Problem

Rob,

during boot, DRDRIVER queries the MYLEX DAC 960 (SWXCR) controller with a SCSI inquiry command.

It should first report:

%DRA, Firmware Vx.x

Then the line about the status of the drives. Note that 4294967295 is actually -1 printed as an unsigned 32-bit number; optimal is calculated as drives - degraded - failed, which here gives 0 - 1 - 0 = -1.
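
The arithmetic can be checked with a short Python sketch. It assumes the optimal count is derived as drives minus degraded minus failed, which matches the values reported in this thread, and that the result is printed as an unsigned 32-bit number. The helper names are made up for illustration.

```python
def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

def to_unsigned32(value):
    """Reinterpret a (possibly negative) integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF

# Values from the boot message: Drives=0, degraded=1, failed=0
drives, degraded, failed = 0, 1, 0
optimal = drives - degraded - failed   # assumed derivation

print(to_signed32(4294967295))  # -1
print(to_unsigned32(optimal))   # 4294967295
```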

I would first try to find the configuration floppy and check the controller configuration from console level.

Volker.
Rob Buxton
Honored Contributor

Re: DS20E Boot Problem

Thanks, I'll try that first. I hope to get back out to the site today armed with boot CDs, Array Controller floppy disks etc.

I also need to try and determine exactly what kind of shelf it is so it can be replaced. I'm down to the Storageworks 4354 or 4254.
Andy Bustamante
Honored Contributor

Re: DS20E Boot Problem


With this controller, it's important to power up the disks and wait for them to spin up before you power on the Alpha.

Quick fix, power everything down, reseat the physical disks. Power up the storage and wait. Power on the Alpha.

Sometimes the controller may decide that degraded stays degraded. The config utility for the controller is on the VMS firmware CD.

Andy
If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net