Problem with replacing faulty HSG80

Vladimir Fabecic
Honored Contributor

Problem with replacing faulty HSG80

One HSG80 controller (the top one) failed. I tried to replace it using the standard procedure. (I have done it four times before, so I know how to do it.)
Everything went OK until I had to copy the configuration to the other controller.
After "SET MULTIBUS_FAILOVER COPY=THIS" the other controller reboots (which is normal) but still reports "Controllers misconfigured".
The software version is V86S-13 (I installed the patches before SET MULTIBUS_FAILOVER).
Is there a problem with that version, or did I just get a bad controller from HP?
Another interesting thing: I replaced the cache batteries, but this controller still complains that BATTERY LIFETIME HAS EXPIRED.
Very strange!
Here is some relevant output:

HSG80 Top> show this
Controller:
HSG80 ZG90208460 Software V86S-13, Hardware E08
NODE_ID = 5000-1FE1-0013-9A60
ALLOCATION_CLASS = 1
SCSI_VERSION = SCSI-3
Configured for MULTIBUS_FAILOVER with ZG12300273
Controllers misconfigured -- configuration mismatch, a
SET MULTIBUS_FAILOVER COPY= is required to re-synchronize
controllers
Device Port SCSI address 7
Time: 21-AUG-2006 17:01:33
Command Console LUN is lun 0 (IDENTIFIER = 98)
Host PORT_1:
Reported PORT_ID = 5000-1FE1-0013-9A63
PORT_1_TOPOLOGY = FABRIC (standby)
Host PORT_2:
Reported PORT_ID = 5000-1FE1-0013-9A64
PORT_2_TOPOLOGY = FABRIC (standby)
NOREMOTE_COPY
Cache:
256 megabyte write cache, version 0022
Cache is GOOD
No unflushed data in cache
CACHE_FLUSH_TIMER = 40 (seconds)
Mirrored Cache:
256 megabyte write cache, version 0022
Cache is GOOD
No unflushed data in cache
Battery:
NOUPS
DANGER: BATTERY LIFETIME HAS EXPIRED, REPLACE BATTERY NOW!
Controllers misconfigured. Type SHOW THIS_CONTROLLER
HSG80 Top>
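
For anyone working from saved capture files, the warning lines in the output above can be picked out automatically. This is only a minimal Python sketch, not an HP tool: the patterns are copied from this one capture, and the labels and the function name `scan_capture` are made up for illustration.

```python
import re

# Status lines worth flagging in captured SHOW THIS_CONTROLLER output.
# The patterns come from the capture in this thread; the labels are hypothetical.
PATTERNS = {
    "misconfigured": re.compile(r"Controllers misconfigured", re.IGNORECASE),
    "battery_expired": re.compile(r"BATTERY LIFETIME HAS EXPIRED"),
    "port_standby": re.compile(r"PORT_\d_TOPOLOGY\s*=\s*\w+\s*\(standby\)"),
}

def scan_capture(text):
    """Return a sorted list of labels whose pattern appears in the capture text."""
    return sorted(label for label, rx in PATTERNS.items() if rx.search(text))

sample = """\
Controllers misconfigured -- configuration mismatch, a
PORT_1_TOPOLOGY = FABRIC (standby)
DANGER: BATTERY LIFETIME HAS EXPIRED, REPLACE BATTERY NOW!
"""
print(scan_capture(sample))  # ['battery_expired', 'misconfigured', 'port_standby']
```

Running it on a capture taken after each COPY attempt would show at a glance whether any of the three conditions has cleared.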



FMU> show MANUFACTURING_FAILURE_INFORMATION all

Manufacturing Failure Information Entry: 1.
Event reported by: DAEMON Diagnostic
Instance Code: 82052002 Description:
An unrecoverable error was detected during execution of the HOST PORT
Subsystem Test. The system will not be able to communicate with the host.
Reporting Component: 130.(82) Description:
Subsystem Built-In Self Tests (BIST)
Reporting component's event number: 5.(05)
Event Threshold: 2.(02) Classification:
HARD. Failure of a component that affects controller performance or
precludes access to a device connected to the controller is indicated.
Header type: 00 Header flags: 00
Test entity number: 19 Test number Demand/Failure: 08 Command: 01
Error Code: 0308 Return Code: 0008 Address of Error: 180003C8
Expected Error Data: 80000200 Actual Error Data: 00001000
Extra Status(1): 00000000 Extra Status(2): 00000000 Extra Status(3): 00000000
Total Number of Occurrences: 27.
Event Power On Timestamps:
First Occurrence: 0. Years, 3. Days, 1. Hours, 5. Minutes, 11. Seconds
Last Occurrence: 0. Years, 3. Days, 2. Hours, 38. Minutes, 49. Seconds
Software Version In Use On Event Occurrence:
First Occurrence: V86S
Last Occurrence: V86S
***Manufacturing Failure Information Entry 2. unused; translation terminated***
***Manufacturing Failure Information Entry 3. unused; translation terminated***
***Manufacturing Failure Information Entry 4. unused; translation terminated***
***Manufacturing Failure Information Entry 5. unused; translation terminated***
***Manufacturing Failure Information Entry 6. unused; translation terminated***
***Manufacturing Failure Information Entry 7. unused; translation terminated***
***Manufacturing Failure Information Entry 8. unused; translation terminated***
***Manufacturing Failure Information Entry 9. unused; translation terminated***
***Manufacturing Failure Information Entry 10. unused; translation terminated***
***Manufacturing Failure Information Entry 11. unused; translation terminated***
***Manufacturing Failure Information Entry 12. unused; translation terminated***
***Manufacturing Failure Information Entry 13. unused; translation terminated***
***Manufacturing Failure Information Entry 14. unused; translation terminated***
***Manufacturing Failure Information Entry 15. unused; translation terminated***
***Manufacturing Failure Information Entry 16. unused; translation terminated***
FMU>
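
The power-on timestamps in entry 1 show how tightly the 27 host port test failures are clustered. A rough Python sketch of the arithmetic follows; the helper name `power_on_seconds` is hypothetical, and counting a year as 365 days is an assumption that does not matter here because the year field is 0.

```python
import re

STAMP = re.compile(
    r"(\d+)\. Years, (\d+)\. Days, (\d+)\. Hours, (\d+)\. Minutes, (\d+)\. Seconds"
)

def power_on_seconds(line):
    """Convert an FMU power-on timestamp line to seconds (year assumed to be 365 days)."""
    m = STAMP.search(line)
    if m is None:
        raise ValueError("no power-on timestamp in: " + line)
    years, days, hours, minutes, seconds = (int(g) for g in m.groups())
    return (((years * 365 + days) * 24 + hours) * 60 + minutes) * 60 + seconds

first = power_on_seconds(
    "First Occurrence: 0. Years, 3. Days, 1. Hours, 5. Minutes, 11. Seconds")
last = power_on_seconds(
    "Last Occurrence: 0. Years, 3. Days, 2. Hours, 38. Minutes, 49. Seconds")
occurrences = 27

span = last - first
print(span)                              # 5618 seconds, about 94 minutes
print(round(span / (occurrences - 1)))   # 216 seconds between failures, if evenly spaced
```

So all 27 occurrences were logged within roughly an hour and a half of power-on time. The even spacing is only an assumption, since FMU records just the first and last occurrence.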
In vino veritas, in VMS cluster
5 REPLIES
RIchard Beresford_1
Occasional Advisor

Re: Problem with replacing faulty HSG80

Did you hold down button 6 to clear the configuration of the new controller that you inserted? Try a SET NOFAILOVER, then clear any cache errors at the controller level and the unit level, and then try SET MULTIBUS_FAILOVER COPY=THIS again. The command for clearing controller cache errors is
CLEAR_ERRORS THIS_CONTROLLER INVALID_CACHE NODESTROY_UNFLUSHED_DATA

See how that goes.

Cheers

Richard
Vladimir Fabecic
Honored Contributor

Re: Problem with replacing faulty HSG80

Hello Richard,
I did what you said, but there were several problems.
First I got a bad spare controller and asked for a new one. The second controller behaved almost like the first.
I just could not get the controller pair to work in MULTIBUS_FAILOVER mode.
The problem was that the working controller was also bad. So in the end I replaced both controllers and it worked.
Very strange to have two bad controllers, but that was the cause of the problem.
Thanks for your help anyway.
In vino veritas, in VMS cluster
Uwe Zessin
Honored Contributor

Re: Problem with replacing faulty HSG80

> Another interesting thing:
> I replaced cache batteries but this controller still complains that
> BATTERY LIFETIME HAS EXPIRED.

Did you use FRUTIL for the replacement?
.
Vladimir Fabecic
Honored Contributor

Re: Problem with replacing faulty HSG80

Hello Uwe,
Yes, I did use FRUTIL to replace the batteries.
If you are interested in this case, I can send you the capture files and give more details (but later).
The controller complained about the batteries after SET MULTIBUS_FAILOVER COPY=THIS. Before this command (and while patching) the controller did not complain.
I think the reason was:
Last Failure Code: 01942088
(reside on the PCI Data or Address Line (PDAL) bus)
In vino veritas, in VMS cluster
Vladimir Fabecic
Honored Contributor

Re: Problem with replacing faulty HSG80

See above.
In vino veritas, in VMS cluster