HSG80 - problem determination

Rafal Niesiobedzki
Frequent Advisor

Hello,
There is a problem with a Compaq HSG80 controller.
Can somebody tell me what I have to do in this situation?

########### LOGS: ###################
HSG80_B> show this_controller
%CER--HSG80_B> --16-APR-2007 15:38:39-- Invalid cache -- CLI command set-
reduced. Type SHOW THIS_CONTROLLER. Please see product documentation to-
determine corrective action
HSG80_B> show this_con
Controller:
HSG80 ZG10707640 Software V86F-13, Hardware E12
NODE_ID = 5000-1FE1-0014-1A30
ALLOCATION_CLASS = 0
SCSI_VERSION = SCSI-2
Configured for MULTIBUS_FAILOVER with ZG13802078
In dual-redundant configuration
Device Port SCSI address 6
Time: 16-APR-2007 15:38:40
Command Console LUN is disabled
Host PORT_1:
Reported PORT_ID = 5000-1FE1-0014-1A31
PORT_1_TOPOLOGY = FABRIC (standby)
Host PORT_2:
Reported PORT_ID = 5000-1FE1-0014-1A32
PORT_2_TOPOLOGY = FABRIC (standby)
NOREMOTE_COPY
Cache:
256 megabyte write cache, version 0022
Cache is INVALID. Cache containing unflushed data
has been removed from this controller
Unknown unflushed data in cache
CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
Mirrored Cache:
256 megabyte write cache, version 0022
Cache is INVALID. Cache containing unflushed data
has been removed from this controller
No unflushed data in cache
Battery:
NOUPS
DANGER: BATTERY LIFETIME HAS EXPIRED, REPLACE BATTERY NOW!
This controller has an invalid cache module
Cache battery is near its end of life, it should be replaced SOON. Run frutil-
to replace.
Cache battery charge is low
Mirror cache battery charge is low
Invalid cache -- CLI command set reduced. Type SHOW THIS_CONTROLLER. Please-
see product documentation to determine corrective action
################################################
Best Regards
Rafal N.
rich pattison
Trusted Contributor

Re: HSG80 - problem determination

OK, looking at the output, you have only shown one controller (HSG80_B). If this is a dual-controller configuration, check the status of the other controller to see whether it also reports "invalid cache". This condition is often caused when the HSG is switched off without a controlled shutdown (HSG> SHUTDOWN THIS), but in your case the output suggests that cache containing unflushed data was removed from the controller. Either way, there is some possibility of data corruption on the attached disk unit(s).
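To compare both controllers, one option is to save the SHOW THIS_CONTROLLER output from each (for example from a logged console session) and scan it for the fault lines that appear in the log above. This is a minimal Python sketch assuming the output is available as plain text; the function names are hypothetical, and the patterns cover only the phrases quoted in this thread, not every message ACS can print.

```python
import re

# Phrases copied from the SHOW THIS_CONTROLLER output quoted in this thread.
# Assumption: this is NOT an exhaustive list of ACS fault messages.
FAULT_PATTERNS = {
    "invalid_cache": re.compile(r"Cache is INVALID"),
    "battery_expired": re.compile(r"BATTERY LIFETIME HAS EXPIRED"),
    "battery_low": re.compile(r"battery charge is low"),
}

# Matches the identification line, e.g. "HSG80 ZG10707640 Software V86F-13, Hardware E12".
IDENT_LINE = re.compile(r"HSG80\s+(\S+)\s+Software\s+(\S+),")


def summarize_controller(show_output: str) -> dict:
    """Report serial number, ACS version and known fault flags found in one controller's output."""
    ident = IDENT_LINE.search(show_output)
    summary = {
        "serial": ident.group(1) if ident else None,
        "software": ident.group(2) if ident else None,
    }
    for name, pattern in FAULT_PATTERNS.items():
        summary[name] = bool(pattern.search(show_output))
    return summary


def controllers_with_invalid_cache(outputs: dict) -> list:
    """Given {controller_name: saved SHOW output}, return the names that report an invalid cache."""
    return sorted(name for name, text in outputs.items()
                  if summarize_controller(text)["invalid_cache"])
```

Fed the HSG80_B output above, summarize_controller would flag invalid_cache, battery_expired and battery_low. Running it on the partner controller's output (ZG13802078, not shown in the thread) would show whether the clear command is needed there too.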

To clear the fault condition, type:

HSG> clear this invalid_cache destroy_unflushed_data

This clears the invalid-cache condition and discards any unflushed data (you may need to do this on both controllers). You also need to check each unit for "lost data", which often accompanies the "invalid cache" condition. For each unit with lost data, type:

HSG> Clear D1 lost_data (then D2, D3, D4, etc.)
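The order of operations in this reply (clear the invalid cache on each affected controller first, then clear lost_data unit by unit) can be written out as a small helper that produces a checklist to type at the console. This is a sketch only: it does not connect to the controller, the command text is copied from the reply, the function name is hypothetical, and it assumes unit names follow the D<number> pattern used in the thread.

```python
import re

# Assumption: units are named D<number>, as in the thread (D1, D2, ...).
UNIT_NAME = re.compile(r"D(\d+)")


def recovery_checklist(invalid_cache_controllers, units_with_lost_data):
    """Return the CLI lines, in the order suggested above, as plain strings.

    invalid_cache_controllers: prompt names such as "HSG80_B" that report an invalid cache.
    units_with_lost_data: unit names such as "D1" that report lost data.
    """
    checklist = []
    # Step 1: the clear command refers to "this" controller, so it is typed
    # at each affected controller's own prompt.
    for controller in invalid_cache_controllers:
        checklist.append(f"{controller}> clear this invalid_cache destroy_unflushed_data")

    # Step 2: clear lost_data unit by unit, in numeric order (D1, D2, ..., D10).
    def unit_number(unit):
        match = UNIT_NAME.fullmatch(unit)
        if match is None:
            raise ValueError(f"unexpected unit name: {unit!r}")
        return int(match.group(1))

    for unit in sorted(units_with_lost_data, key=unit_number):
        checklist.append(f"clear {unit} lost_data")
    return checklist
```

Sorting numerically rather than alphabetically keeps D10 after D2. The filesystem or database check that follows cannot be done from the HSG CLI, so it is deliberately not part of the checklist.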

As I said, if write operations were in progress when the error occurred, you could have some data corruption. The only way to find out is to run a filesystem or database check on the units from the host (you can't do this from the HSG itself).

A couple of other things: your firmware (ACS) version is low (V86F); it should really be at V8.7 or V8.8. Also, the batteries are shown as needing replacement.

Rich
Mark...
Honored Contributor

Re: HSG80 - problem determination

Hi,

I agree 100% with the above noter.

It is also worth checking for unwriteable_data. In your case it is more likely to be lost_data, but if it is not, try this:

RETRY_ERRORS UNWRITEABLE_DATA D1
If this works, your data should be OK. It may take a little while to complete, and you should still check your data integrity once it has. If it doesn't work, use:
CLEAR_ERRORS D1 UNWRITEABLE_DATA
The problem with this is that you may lose data, but if the retry command does not work, this is what you will have to do to make the unit presentable again.
NOTE: if you have unwriteable data, make sure you use the RETRY command first.
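The "retry first, clear only if the retry fails" rule can be expressed as a small Python sketch. `send` is a hypothetical stand-in for however you deliver a command to the console and learn whether it succeeded; it is assumed to return only after the retry has finished. The command spellings are copied from the post, not checked against the ACS reference.

```python
from typing import Callable, List, Tuple


def resolve_unwriteable(unit: str, send: Callable[[str], bool]) -> Tuple[str, List[str]]:
    """Retry the unwriteable data first; clear it only if the retry does not succeed.

    send: hypothetical callback that issues one CLI line and returns True on success
          (assumed to block until the command has completed).
    Returns the outcome and the commands that were issued, in order.
    """
    issued = []
    retry = f"RETRY_ERRORS UNWRITEABLE_DATA {unit}"
    issued.append(retry)
    if send(retry):
        # The cached writes were flushed to the unit; data should be OK,
        # but integrity still needs checking from the OS.
        return "retried", issued

    # Retry failed: clearing makes the unit presentable again but may lose data.
    clear = f"CLEAR_ERRORS {unit} UNWRITEABLE_DATA"
    issued.append(clear)
    send(clear)
    return "cleared", issued
```

Because the clear command is only reachable after a failed retry, running the function can never skip the retry step.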

Either way, as the above noter explained, some or all of the units will not be presented until:
1: you have first sorted out the cache issue [one or both controllers]
2: you have cleared lost_data or unwriteable_data on each unit concerned
3: then check your files from the OS for data integrity

You should look into replacing your batteries ASAP...

Mark...
if you have nothing useful to say, say nothing...
rich pattison
Trusted Contributor

Re: HSG80 - problem determination

Hi Mark
I think you'll find that if a storageset or disk drive fails before cached data has been written to it, the controller reports an unwriteable data error. In this case, though, we know the cache is invalid, so that can't be the case unless the cache and the unit both failed at the same time.

Unwriteable data would still be held in cache, and a retry (flush) might be possible if the unit is back online and the cache is still valid.

Bottom line: LOST_DATA is a cache problem,
UNWRITEABLE_DATA is a unit/storageset problem.
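This distinction, together with the point that a retry (flush) is only possible while the unit is back online and the cache is still valid, can be captured as a lookup plus one check. The function names are hypothetical, and the rule encoded is exactly the one stated in this post, nothing more.

```python
# Rule of thumb from the post above: which part of the subsystem each condition points to.
LIKELY_CAUSE = {
    "LOST_DATA": "cache problem",
    "UNWRITEABLE_DATA": "unit/storageset problem",
}


def likely_cause(condition: str) -> str:
    """Map a unit error condition to its likely cause; unknown conditions are not guessed."""
    return LIKELY_CAUSE.get(condition.strip().upper(), "unknown")


def retry_possible(condition: str, cache_valid: bool, unit_online: bool) -> bool:
    """A retry (flush) only makes sense for unwriteable data still held in a valid
    cache, destined for a unit that is back online."""
    return condition.strip().upper() == "UNWRITEABLE_DATA" and cache_valid and unit_online
```

For the controller in this thread, whose cache is reported INVALID, retry_possible is False for every unit, which is why LOST_DATA is the expected condition here rather than a retryable UNWRITEABLE_DATA.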

Rich