Operating System - Tru64 Unix

Re: AlphaServer 4100 disk problems

 
Hein van den Heuvel
Honored Contributor

Re: AlphaServer 4100 disk problems

> What I find strange is that when I pull the root disk off the HSZ50 and put it onto the other SCSI controller (NCR), it boots, but if I have the HSZ50 switched on, it crashes at boot time. I get:

That sounds like faulty SCSI cabling: over- or under-terminated, or (and you would not be the first one) a bent pin in one of the connectors. Do a careful visual inspection of the connectors.

The KZPSA can be internally terminated. Check!
The HSZ50 is not terminated. The cable is (or should be) a BN21K-x. On the controller side you need a "Y" to be able to add a terminator. That "Y" can be a block (the H885-AA tri-link) or a cable (the BN21W). The other side of the Y should have an H879-AA terminator, or a short jumper cable to the second Y block on the second controller, where the bus would then be terminated.

> I found a copy of hszterm in the root folder of the root drive. It wasn't chmod +x so I did that, but I get:

That sounds suspect. A half hack job. But anyway, don't bother. First you must be successful in talking to the HSZ on its control port. Get that little cable or anything vaguely resembling it. I'll attach a pinout picture just in case.

Report back once you have coerced the HSZ into talking with a terminal or terminal emulator (HyperTerminal or whatever).

Good luck,
Hein.


Hein van den Heuvel
Honored Contributor

Re: AlphaServer 4100 disk problems

Ah, reply collision. Oh well, so you have the console cable already. But my picture is nicer :-). Btw... if you have a terminal server like a DECserver90, or even the way old 100/200 'concentrator', then they work nicely for talking to the HSZ from home!

Hein.
John Appleby_1
Advisor

Re: AlphaServer 4100 disk problems

Hi,

OK, it was moaning about invalid cache so I reset it as per the manual:

CLEAR_ERRORS THIS_CONTROLLER INVALID_CACHE DESTROY_UNFLUSHED_DATA

CLEAR_ERRORS OTHER_CONTROLLER INVALID_CACHE DESTROY_UNFLUSHED_DATA

I moved things around and so I have a problem connecting shelves 5/6 but I figure it should be able to work with just shelf 4, which I'm guessing has / and /usr etc.

bot_hsz50>>show this full
Controller:
HSZ50-AX ZG63200517 Firmware V50Z-2, Hardware A01
Configured for dual-redundancy with ZG70223045
In dual-redundant configuration
SCSI address 6
Time: NOT SET
Host port:
SCSI target(s) (0, 1, 2, 3), Preferred target(s) (2, 3)
TRANSFER_RATE_REQUESTED = 10MHZ
Cache:
32 megabyte write cache, version 3
Cache is GOOD
No unflushed data in cache
CACHE_FLUSH_TIMER = 5 (seconds)
CACHE_UPS
Host Functionality Mode = D
Licensing information:
RAID (RAID Option) is ENABLED, license key is VALID
WBCA (Writeback Cache Option) is ENABLED, license key is VALID
MIRR (Disk Mirroring Option) is ENABLED, license key is VALID
Extended information:
Terminal speed 9600 baud, eight bit, no parity, 1 stop bit
Operation control: 00000004 Security state code: 56869
Configuration backup disabled
Shelf 4 fixed
Cache battery charge is low
bot_hsz50>>show disk
Name      Type  Port  Targ  Lun  Used by
------------------------------------------------------------------------------

DISK400   disk     4     0    0  R2
DISK410   disk     4     1    0  R1
DISK420   disk     4     2    0  R1
DISK430   disk     4     3    0  R1
DISK440   disk     4     4    0  R2
DISK450   disk     4     5    0  M1
DISK500   disk     5     0    0  R2
DISK510   disk     5     1    0  R2
DISK520   disk     5     2    0  R1
DISK530   disk     5     3    0  R1
DISK540   disk     5     4    0  R2
DISK550   disk     5     5    0  FAILEDSET
DISK600   disk     6     0    0  R2
DISK610   disk     6     1    0  SPARESET
DISK620   disk     6     2    0  R1
DISK630   disk     6     3    0  R1
DISK640   disk     6     4    0  R2
DISK650   disk     6     5    0  FAILEDSET
Shelf 4 fixed
Cache battery charge is low
bot_hsz50>>show mirror
Name Storageset Uses Used by
------------------------------------------------------------------------------

M1 mirrorset DISK450 D100
Shelf 4 fixed
Cache battery charge is low
bot_hsz50>>show unit
LUN Uses
--------------------------------------------------------------

D100 M1
D200 R1
D300 R2
Shelf 4 fixed
Cache battery charge is low
bot_hsz50>>

show dev on the console now shows dkc100/200/300, but when I try to boot from the default dkc100 I get:

failed to open dkc100.1.0.4.1

Retrying, type ^C to abort...

Any ideas? Sounds like I'm getting there :)

Regards,

John
John Appleby_1
Advisor

Re: AlphaServer 4100 disk problems

Hi Hein,

Thanks a lot for the reply.

On the host side I have a BN21W which connects to the host and has 2 short cables. On one end of it is a terminator and on the other end is a BN21K which goes to the HSZ50.

The HSZ50 has 2 H885-AAs (one for each controller). The top connector is connected to the BN21K. The bottom of the top H885-AA is connected to a BN21L-0B, which also connects to the top of the 2nd HSZ50. The very bottom has an unmarked terminator.

This sounds like it matches what you say, so I don't think it's a problem. I'll double check all connectors and then reboot the server to see what happens.
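
For what it's worth, the chain described above can be written down and sanity-checked with a few lines of Python. This is only a sketch of the general SCSI rule that a bus is terminated at its two physical ends and nowhere else. The function name and the way the chain is modelled are made up for illustration, and it assumes the KZPSA's internal termination is disabled, which still has to be checked on the real card.

```python
# Hypothetical model: the bus is an ordered list of (name, terminated) pairs,
# walked from one physical end of the cable run to the other.
def termination_problems(chain):
    """Return a list of human-readable termination problems (empty if none)."""
    problems = []
    last = len(chain) - 1
    for index, (name, terminated) in enumerate(chain):
        at_end = index in (0, last)
        if at_end and not terminated:
            problems.append(f"{name}: end of bus is not terminated")
        if not at_end and terminated:
            problems.append(f"{name}: terminated in the middle of the bus")
    return problems


# The chain as described in this post. The False on the KZPSA is an
# ASSUMPTION (internal termination switched off), not something verified.
johns_bus = [
    ("terminator on BN21W leg", True),
    ("KZPSA on BN21W", False),
    ("top HSZ50 on H885-AA", False),
    ("bottom HSZ50 on H885-AA", False),
    ("unmarked terminator", True),
]

print(termination_problems(johns_bus) or "termination looks consistent")
```

If the KZPSA turned out to be internally terminated as well, the same check would flag it as a terminator in the middle of the bus, which is the over-termination case Hein mentions.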

Regards,

John
John Appleby_1
Advisor

Re: AlphaServer 4100 disk problems

Hi,

Cables all look good. I think the problem might relate to this:

top_hsz50>>show units full
LUN Uses
--------------------------------------------------------------

D100 M1
Switches:
RUN NOWRITE_PROTECT READ_CACHE
WRITEBACK_CACHE
MAXIMUM_CACHED_TRANSFER_SIZE = 32
State:
INOPERATIVE
Unit has lost data
PREFERRED_PATH = THIS_CONTROLLER
WRITE_PROTECT - DATA SAFETY
Size: 4109470 blocks
D200 R1
Switches:
RUN NOWRITE_PROTECT READ_CACHE
WRITEBACK_CACHE
MAXIMUM_CACHED_TRANSFER_SIZE = 32
State:
UNKNOWN
Unit has lost data
PREFERRED_PATH = OTHER_CONTROLLER
Misconfigured:
disk DISK520 at PTL 5 2 0 --
No device installed, please see user guide
Misconfigured:
disk DISK620 at PTL 6 2 0 --
No device installed, please see user guide
Misconfigured:
disk DISK530 at PTL 5 3 0 --
No device installed, please see user guide
Misconfigured:
disk DISK630 at PTL 6 3 0 --
No device installed, please see user guide
Size: NOT YET KNOWN
D300 R2
Switches:
RUN NOWRITE_PROTECT READ_CACHE
WRITEBACK_CACHE
MAXIMUM_CACHED_TRANSFER_SIZE = 32
State:
UNKNOWN
Unit has lost data
PREFERRED_PATH = OTHER_CONTROLLER
Misconfigured:
disk DISK500 at PTL 5 0 0 --
No device installed, please see user guide
Misconfigured:
disk DISK600 at PTL 6 0 0 --
No device installed, please see user guide
Misconfigured:
disk DISK540 at PTL 5 4 0 --
No device installed, please see user guide
Misconfigured:
disk DISK640 at PTL 6 4 0 --
No device installed, please see user guide
Misconfigured:
disk DISK510 at PTL 5 1 0 --
No device installed, please see user guide
Size: NOT YET KNOWN
Cache battery charge is low

D200/300 relate to the other trays, I think, so I'm not surprised they don't work, but D100 should work. There's presumably some way I can bring it back online, but I can't find it in the manual just yet...
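
The "other trays" guess can be checked against the SHOW DISK listing from my earlier post. Below is a rough Python sketch that groups the disks by the container named in the "Used by" column and lists which ports (shelves) each container touches. The function name is made up, and the rows are simply copied from that listing.

```python
from collections import defaultdict

# Rows copied from the earlier SHOW DISK output:
# Name, Type, Port, Targ, Lun, Used by
SHOW_DISK = """\
DISK400 disk 4 0 0 R2
DISK410 disk 4 1 0 R1
DISK420 disk 4 2 0 R1
DISK430 disk 4 3 0 R1
DISK440 disk 4 4 0 R2
DISK450 disk 4 5 0 M1
DISK500 disk 5 0 0 R2
DISK510 disk 5 1 0 R2
DISK520 disk 5 2 0 R1
DISK530 disk 5 3 0 R1
DISK540 disk 5 4 0 R2
DISK550 disk 5 5 0 FAILEDSET
DISK600 disk 6 0 0 R2
DISK610 disk 6 1 0 SPARESET
DISK620 disk 6 2 0 R1
DISK630 disk 6 3 0 R1
DISK640 disk 6 4 0 R2
DISK650 disk 6 5 0 FAILEDSET
"""


def ports_by_container(listing):
    """Map each 'Used by' container to the sorted list of ports it uses."""
    ports = defaultdict(set)
    for line in listing.splitlines():
        fields = line.split()
        # Skip headers, separators and status lines; keep 6-column disk rows.
        if len(fields) != 6 or fields[1] != "disk":
            continue
        _name, _type, port, _targ, _lun, used_by = fields
        ports[used_by].add(int(port))
    return {container: sorted(found) for container, found in ports.items()}


for container, found in sorted(ports_by_container(SHOW_DISK).items()):
    print(container, found)
```

By that listing only M1 (and so D100) lives entirely on port 4; R1 and R2 both have members on ports 4, 5 and 6.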

Regards,

John
John Appleby_1
Advisor

Re: AlphaServer 4100 disk problems

Hi,

I've juggled things around so all the shelves are installed:

top_hsz50>>show units full
LUN Uses
--------------------------------------------------------------

D100 M1
Switches:
RUN NOWRITE_PROTECT READ_CACHE
WRITEBACK_CACHE
MAXIMUM_CACHED_TRANSFER_SIZE = 32
State:
INOPERATIVE
Unit has lost data
PREFERRED_PATH = THIS_CONTROLLER
WRITE_PROTECT - DATA SAFETY
Size: 4109470 blocks
D200 R1
Switches:
RUN NOWRITE_PROTECT READ_CACHE
WRITEBACK_CACHE
MAXIMUM_CACHED_TRANSFER_SIZE = 32
State:
INOPERATIVE
Unit has lost data
PREFERRED_PATH = OTHER_CONTROLLER
WRITE_PROTECT - DATA SAFETY
Size: 50255880 blocks
D300 R2
Switches:
RUN NOWRITE_PROTECT READ_CACHE
WRITEBACK_CACHE
MAXIMUM_CACHED_TRANSFER_SIZE = 32
State:
INOPERATIVE
Unit has lost data
PREFERRED_PATH = OTHER_CONTROLLER
WRITE_PROTECT - DATA SAFETY
Size: 50255880 blocks
Shelf 5 has a bad power supply or fan
Cache battery charge is low

top_hsz50>>show storagesets
Name       Storageset  Uses     Used by
------------------------------------------------------------------------------

M1         mirrorset   DISK450  D100

R1         raidset     DISK410  D200
                       DISK420
                       DISK430
                       DISK520
                       DISK530
                       DISK620
                       DISK630

R2         raidset     DISK400  D300
                       DISK440
                       DISK500
                       DISK510
                       DISK540
                       DISK600
                       DISK640

SPARESET   spareset    DISK610

FAILEDSET  failedset   DISK550
                       DISK650

Sounds like I have a couple of bad disks, but I'm still not sure how to clear the INOPERATIVE state. Sorry for the flurry of messages!
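
As a sanity check on the sizes reported above, here is a small Python sketch that converts the block counts to gigabytes and works out the per-member size of the raidsets. It rests on two assumptions that the output does not state: that a block is 512 bytes, and that each 7-member raidset gives up one member's worth of space to parity (RAID 5 style). The function names are made up.

```python
BLOCK_BYTES = 512  # ASSUMPTION: the controller reports 512-byte blocks


def blocks_to_gb(blocks, block_bytes=BLOCK_BYTES):
    """Convert a block count to decimal gigabytes (10**9 bytes)."""
    return blocks * block_bytes / 1e9


def parity_member_blocks(unit_blocks, members):
    """Split a unit across (members - 1) data members.

    ASSUMPTION: one member's worth of capacity goes to parity.
    Returns (blocks per member, leftover blocks).
    """
    return divmod(unit_blocks, members - 1)


print(round(blocks_to_gb(4109470), 2))    # D100 (mirrorset M1)
print(round(blocks_to_gb(50255880), 2))   # D200 / D300 (raidsets R1 / R2)
print(parity_member_blocks(50255880, 7))  # per-member blocks, remainder
```

Under those assumptions D100 is about 2.1 GB and D200/D300 about 25.7 GB each, and the raidset size divides evenly across six data members, which is at least consistent with seven equal disks per raidset.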

John
John Appleby_1
Advisor

Re: AlphaServer 4100 disk problems

Hi,

I've reset the rest of the errors and all seems to be working now :))

Thanks very much to everyone that helped, it's really been appreciated.

Regards,

John
Jeff Wolfe_1
Frequent Advisor

Re: AlphaServer 4100 disk problems

John,

The disks may not be bad. They may just have been put into a failedset state when you moved the disks around. Just delete the failedset.

delete failedset

You can then init the disk, create a unit and see what the OS sees.

John Appleby_1
Advisor

Re: AlphaServer 4100 disk problems

Hi,

Thanks for that - looks like you are correct. The damned thing has taken a bigger disk from the spare pool to fill the mirror with as well!

Is there any way short of pulling out the disk which has taken its place (610, it was 650) to easily change a mirror pair?

Regards,

John
Jeff Wolfe_1
Frequent Advisor

Re: AlphaServer 4100 disk problems

real easy.

All operations are done with the set command.
set ...

First set the mirror set so it won't autoreplace a drive:

set nopolicy

Then remove the unwanted disk:

set remove=

The removed disk will go into a failedset; you can delete the failedset later.

Then insert the new member:

set replace=

I'm doing this from memory, but I'm pretty sure the verbs are remove and replace. You can enter the command:

set ?

to get some command-specific help.

Mohamed  K Ahmed
Trusted Contributor

Re: AlphaServer 4100 disk problems

John,

When disks fail during operation and you can see that they are OK hardware-wise but the units are still inoperative, that is because the units have lost data. You have to issue the following command, and then reconstruction will start:

clear unit lost_data

Always remember this command when you have replaced a failed disk with a new one but the unit status is still UNKNOWN or INOPERATIVE.
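
To make that concrete for the units in this thread, here is a rough Python sketch that scans a SHOW UNITS FULL listing for "Unit has lost data" and prints one command per affected unit. The excerpt is trimmed from the output John posted. The spelled-out form CLEAR_ERRORS unit LOST_DATA is my assumption of the full syntax behind the short form above (it follows the pattern of the CLEAR_ERRORS commands John used for the cache), so check it against the CLI help before relying on it. The script only builds strings and does not talk to the controller.

```python
import re

# Trimmed excerpt of the SHOW UNITS FULL output posted earlier in the thread.
SHOW_UNITS_EXCERPT = """\
D100 M1
State:
INOPERATIVE
Unit has lost data
D200 R1
State:
INOPERATIVE
Unit has lost data
D300 R2
State:
INOPERATIVE
Unit has lost data
"""

# A unit header line looks like "D100 M1": unit name, then its container.
UNIT_HEADER = re.compile(r"^(D\d+)\s+\S+$")


def lost_data_commands(listing):
    """Return one CLEAR_ERRORS line per unit that reports lost data."""
    commands = []
    current_unit = None
    for line in listing.splitlines():
        text = line.strip()
        header = UNIT_HEADER.match(text)
        if header:
            current_unit = header.group(1)
        elif text == "Unit has lost data" and current_unit is not None:
            # ASSUMED full syntax; the post above uses the short form.
            commands.append(f"CLEAR_ERRORS {current_unit} LOST_DATA")
    return commands


for command in lost_data_commands(SHOW_UNITS_EXCERPT):
    print(command)
```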

Mohamed
Uwe Zessin
Honored Contributor

Re: AlphaServer 4100 disk problems

You can also use:
> REDUCE diskname

The good thing is that the disk is not dropped into the failedset. The bad thing (in this specific situation) is that it also decrements the membership count, but that can be fixed:

> SET mirrorset NOPOLICY
> SET mirrorset MEMBERSHIP=2

and then you proceed with the REPLACE operation like Jeff has already written.

Just remember to apply a replacement policy later on, or no spare disk will kick in if you lose a disk from that mirrorset.

> SET mirrorset POLICY=BEST_PERFORMANCE
.
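
Putting Jeff's and my variants side by side, here is a small Python sketch that just builds the command sequence as text so it can be reviewed before typing it at the CLI. The function name is made up. The names M1, DISK610 and DISK650 come from John's posts, and the sketch assumes the wanted disk has already been taken out of the failedset. It does not talk to the controller.

```python
def swap_mirror_member(mirrorset, unwanted, wanted, use_reduce=False,
                       policy="BEST_PERFORMANCE", membership=2):
    """Return the CLI lines to swap one mirrorset member for another.

    use_reduce=False follows the NOPOLICY / REMOVE / REPLACE route;
    use_reduce=True follows the REDUCE route, which keeps the old disk
    out of the failedset but needs the membership count restored.
    """
    if use_reduce:
        steps = [
            f"REDUCE {unwanted}",
            f"SET {mirrorset} NOPOLICY",
            f"SET {mirrorset} MEMBERSHIP={membership}",
        ]
    else:
        steps = [
            f"SET {mirrorset} NOPOLICY",
            f"SET {mirrorset} REMOVE={unwanted}",
        ]
    steps.append(f"SET {mirrorset} REPLACE={wanted}")
    # Re-apply a replacement policy so a spare can kick in again later.
    steps.append(f"SET {mirrorset} POLICY={policy}")
    return steps


for line in swap_mirror_member("M1", "DISK610", "DISK650", use_reduce=True):
    print(line)
```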
John Appleby_1
Advisor

Re: AlphaServer 4100 disk problems

Hi,

Yeah that seems to have worked a treat. I've reset the time and the errors and the CLI, reset the performance and it seems to be running just nice.

The cache batteries appear to be dead, but that shouldn't be too hard to fix, they just seem like 2x 2V 5Ah jobs that can be bought from electrical places.

Thanks a lot for the help everyone has given me; I need to brush up on my Tru64 knowledge, but that's life.

Regards,

John
Jeff Wolfe_1
Frequent Advisor

Re: AlphaServer 4100 disk problems

With regard to the batteries, if the HSZ thinks they are dead, it will disable the writeback cache feature. If you have the HSZ connected to a UPS power source, then you can set the controller into UPS power mode and it will use the writeback regardless of the battery state. I believe the command is :

set this_controller cache_ups


Good Luck!
Uwe Zessin
Honored Contributor

Re: AlphaServer 4100 disk problems

> SET THIS_CONTROLLER CACHE_POLICY=?
> RESTART OTHER_CONTROLLER
> RESTART THIS_CONTROLLER

* CACHE_POLICY=A makes inoperative all RAIDsets and mirrorsets until the battery has been recharged.

* CACHE_POLICY=B allows write-through access to all RAIDsets and mirrorsets for 10 hours, during which time the battery should be recharging. If the batteries do not recharge within 10 hours, the RAIDsets and mirrorsets become inoperative.

-----
The 'cache_ups' is used on more modern controllers.
.