HSG80 on x86_64 host Device not ready error

 
Osvaldo Giusti
Occasional Advisor

Hi there,
I'm trying to set up an HSG80 Array Controller on a Linux x86_64 host. The host is running CentOS 5.2 with kernel 2.6.18-92.el5.

The host is equipped with a PCI-X QLogic ISP2312-based Fibre Channel card. The host and controller ports are connected to the same Fibre Channel switch, but controller port 2 is disabled.

I can see the disk device on the host, but when I try to access it I get "Device not ready". For the moment I'm trying to use a simple unit with a single disk.
Any help is truly appreciated.

More information about the problem follows:

Output from CLI SHOW THIS_CONTROLLER
====================================
Controller:
HSG80 ZG14802442 Software V87P-1, Hardware E16
NODE_ID = 5000-1FE1-0014-FF40
ALLOCATION_CLASS = 0
SCSI_VERSION = SCSI-3
Not configured for dual-redundancy
Device Port SCSI address 7
Time: NOT SET
Command Console LUN is lun 0 (IDENTIFIER = 1)
Host Connection Table is NOT locked
Smart Error Eject Disabled
Host PORT_1:
Reported PORT_ID = 5000-1FE1-0014-FF41
PORT_1_TOPOLOGY = FABRIC (fabric up)
Address = 011100
Host PORT_2:
Reported PORT_ID = 5000-1FE1-0014-FF42
PORT_2_TOPOLOGY = OFFLINE (offline)
NOREMOTE_COPY
Cache:
256 megabyte write cache, version 0022
Cache is FAILED
Unknown unflushed data in cache
CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
Battery:
NOUPS
DANGER: BATTERY BAD, REPLACE BATTERY NOW!
Previous controller operation terminated by depression of (//) RESET button.
Shelf 1 has a bad power supply or fan
Shelf 2 has a bad power supply or fan
Shelf 5 has a bad power supply or fan
Shelf 6 has a bad power supply or fan
Cache module failed diagnostic testing of memory pages
Write-back caching is disabled
Other controller not responding - RESET signal asserted
Cache battery declared failed due to not becoming fully charged in 10 hours

Output from CLI SHOW UNITS
==========================
LUN        Uses         Used by
------------------------------------------------------------------------------

D1         DISK30000
Previous controller operation terminated by depression of (//) RESET button.
...

OUTPUT OF CLI SHOW D1
=====================
LUN        Uses         Used by
------------------------------------------------------------------------------

D1         DISK30000
LUN ID: 6000-1FE1-0014-FF40-0009-1480-2442-009A
NOIDENTIFIER
Switches:
RUN NOWRITE_PROTECT NOREAD_CACHE
NOREADAHEAD_CACHE NOWRITEBACK_CACHE
MAX_READ_CACHED_TRANSFER_SIZE = 32
MAX_WRITE_CACHED_TRANSFER_SIZE = 32
Access:
!NEWCON20
State:
AVAILABLE
Host Based Logging NOT Specified
Size: 286679457 blocks
Geometry (C/H/S): ( 49840 / 8 / 719 )
Previous controller operation terminated by depression of (//) RESET button.
...
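
As a quick sanity check on the figures above, the reported size can be converted to bytes and compared with the geometry. This is only a sketch: the 512-byte block size is an assumption (the SHOW output does not state it), and the function names are made up for illustration.

```python
# Sanity-check the unit size reported by SHOW D1.
# Assumption: one block is 512 bytes; the CLI output does not say so.
BLOCK_SIZE = 512


def size_in_bytes(blocks: int, block_size: int = BLOCK_SIZE) -> int:
    """Convert a block count into bytes."""
    return blocks * block_size


def chs_capacity(cylinders: int, heads: int, sectors: int) -> int:
    """Number of blocks addressable by a C/H/S geometry."""
    return cylinders * heads * sectors


blocks = 286679457                              # "Size:" line of SHOW D1
geometry_blocks = chs_capacity(49840, 8, 719)   # "Geometry (C/H/S)" line

print(size_in_bytes(blocks))                    # 146779881984
print(round(size_in_bytes(blocks) / 10**9, 1))  # 146.8
print(geometry_blocks - blocks)                 # 223
```

Under that assumption the unit is roughly 146.8 GB, and the geometry covers the block count with 223 blocks to spare, so the controller-side numbers look self-consistent.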

Output of CLI SHOW DISK30000
============================
Name         Type    Port  Targ  Lun   Used by
------------------------------------------------------------------------------

DISK30000    disk    3     0     0     D1
COMPAQ BD14685A26 HPB7
Switches:
NOTRANSPORTABLE
TRANSFER_RATE_REQUESTED = 20MHZ (synchronous 20.00 MHZ negotiated)
Size: 286679457 blocks
Previous controller operation terminated by depression of (//) RESET button.
...

Output of CLI SHOW !NEWCON20
============================
Connection                                                          Unit
Name         Operating system   Controller  Port  Address  Status   Offset

!NEWCON20    SUN                THIS        1     011500   OL this  0
HOST_ID=2000-00E0-8B18-CAE5 ADAPTER_ID=2100-00E0-8B18-CAE5
Previous controller operation terminated by depression of (//) RESET button.
...
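
To confirm that connection !NEWCON20 really belongs to the QLogic HBA in this host, the IDs shown above can be compared with the names the Linux FC transport class exposes. The sketch below only converts the HSG80 notation into the 0x-prefixed hex form. The sysfs location mentioned in the comment is an assumption based on the host number (scsi6) in dmesg, the reading of ADAPTER_ID as the port name and HOST_ID as the node name is likewise an assumption, and the helper name is invented.

```python
def hsg80_id_to_sysfs(wwn: str) -> str:
    """Turn '2100-00E0-8B18-CAE5' into '0x210000e08b18cae5'."""
    digits = wwn.replace("-", "").lower()
    if len(digits) != 16 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a 64-bit WWN: {wwn!r}")
    return "0x" + digits


adapter_id = "2100-00E0-8B18-CAE5"   # ADAPTER_ID from SHOW !NEWCON20
host_id = "2000-00E0-8B18-CAE5"      # HOST_ID from SHOW !NEWCON20

# Assumed location of the values to compare against on the Linux side:
#   /sys/class/fc_host/host6/port_name  and  /sys/class/fc_host/host6/node_name
print(hsg80_id_to_sysfs(adapter_id))  # 0x210000e08b18cae5
print(hsg80_id_to_sysfs(host_id))     # 0x200000e08b18cae5
```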

Output of dmesg
===============
QLogic Fibre Channel HBA Driver
PCI: Enabling device 0000:03:01.0 (0150 -> 0153)
ACPI: PCI Interrupt 0000:03:01.0[A] -> GSI 28 (level, low) -> IRQ 185
qla2xxx 0000:03:01.0: Found an ISP2312, irq 185, iobase 0xffffc2000001c000
qla2xxx 0000:03:01.0: Configuring PCI space...
qla2xxx 0000:03:01.0: Configure NVRAM parameters...
qla2xxx 0000:03:01.0: Verifying loaded RISC code...
qla2xxx 0000:03:01.0: Allocated (412 KB) for firmware dump...
scsi6 : qla2xxx
qla2xxx 0000:03:01.0:
QLogic Fibre Channel HBA Driver: 8.02.00-k5-rhel5.2-04
QLogic QLA2340 - 133MHz PCI-X to 2Gb FC, Single Channel
ISP2312: PCI-X (133 MHz) @ 0000:03:01.0 hdma+, host#=6, fw=3.03.26 IPX
qla2xxx 0000:03:01.0: LOOP UP detected (1 Gbps).
Vendor: DEC Model: HSG80CCL Rev: V87P
Type: RAID ANSI SCSI revision: 02
scsi 6:0:0:0: Attached scsi generic sg2 type 12
Vendor: DEC Model: HSG80 Rev: V87P
Type: Direct-Access ANSI SCSI revision: 02
sdc : READ CAPACITY failed.
sdc : status=1, message=00, host=0, driver=08
sd: Current: sense key: Not Ready
Add. Sense: Logical unit not ready, initializing command required

sdc: test WP failed, assume Write Enabled
sdc: asking for cache data failed
sdc: assuming drive cache: write through
sdc : READ CAPACITY failed.
sdc : status=1, message=00, host=0, driver=08
sd: Current: sense key: Not Ready
Add. Sense: Logical unit not ready, initializing command required

sdc: test WP failed, assume Write Enabled
sdc: asking for cache data failed
sdc: assuming drive cache: write through
sdc:<6>sd 6:0:0:1: Device not ready: <6>: Current: sense key: Not Ready
Add. Sense: Logical unit not ready, initializing command required

end_request: I/O error, dev sdc, sector 0
printk: 168 messages suppressed.
Buffer I/O error on device sdc, logical block 0
sd 6:0:0:1: Device not ready: <6>: Current: sense key: Not Ready
Add. Sense: Logical unit not ready, initializing command required

end_request: I/O error, dev sdc, sector 0
...
Dev sdc: unable to read RDB block 0
unable to read partition table
sd 6:0:0:1: Attached scsi disk sdc
sd 6:0:0:1: Attached scsi generic sg3 type 0
...
sd 6:0:0:1: Device not ready: <6>: Current: sense key: Not Ready
Add. Sense: Logical unit not ready, initializing command required

end_request: I/O error, dev sdc, sector 2097024
...
end_request: I/O error, dev sdc, sector 2097144
...
end_request: I/O error, dev sdc, sector 2097088
...
end_request: I/O error, dev sdc, sector 2097136
...

Output of lsscsi
================
[0:0:0:0] disk COMPAQ BD14685A26 HPB7 /dev/sda
[0:0:1:0] disk SEAGATE ST3146807LC 0007 /dev/sdb
[6:0:0:0] storage DEC HSG80CCL V87P -
[6:0:0:1] disk DEC HSG80 V87P /dev/sdc
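
The dmesg excerpt repeats the same sense data many times, so it can help to collapse it into the distinct conditions reported. The following is a minimal sketch, assuming the log format shown above; the function name and the small lookup table are illustrative, and the table holds only the one ASC/ASCQ pair that appears in this log (04h/02h in the standard SCSI additional sense code list).

```python
import re

# ASC/ASCQ values for the additional sense texts seen in the log.
KNOWN_SENSE = {
    "Logical unit not ready, initializing command required": (0x04, 0x02),
}

SENSE_KEY_RE = re.compile(r"sense key: (.+)$")
ADD_SENSE_RE = re.compile(r"^Add\. Sense: (.+)$")


def summarize(log: str) -> set:
    """Return the distinct (sense key, additional sense) pairs in the log."""
    found = set()
    key = None
    for line in log.splitlines():
        line = line.strip()
        m = SENSE_KEY_RE.search(line)
        if m:
            key = m.group(1)
            continue
        m = ADD_SENSE_RE.match(line)
        if m and key is not None:
            found.add((key, m.group(1)))
            key = None
    return found


sample = """\
sdc : READ CAPACITY failed.
sdc : status=1, message=00, host=0, driver=08
sd: Current: sense key: Not Ready
Add. Sense: Logical unit not ready, initializing command required

sd 6:0:0:1: Device not ready: <6>: Current: sense key: Not Ready
Add. Sense: Logical unit not ready, initializing command required
"""

for key, text in sorted(summarize(sample)):
    asc, ascq = KNOWN_SENSE.get(text, (None, None))
    code = f"{asc:02X}h/{ascq:02X}h" if asc is not None else "unknown"
    print(f"{key}: {text} (ASC/ASCQ {code})")
```

Run against the full log, this should show a single pair, which suggests the host is seeing one consistent condition from the unit rather than a mix of different errors.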
Uwe Zessin
Honored Contributor

Re: HSG80 on x86_64 host Device not ready error

It is possible that the system has so many errors ("Cache is FAILED") that it attempts to protect the data on the disks by not allowing read/write I/Os.

You can get rid of old error messages that no longer apply with the command "clear_errors cli".
Osvaldo Giusti
Occasional Advisor

Re: HSG80 on x86_64 host Device not ready error

The clear_errors cli command got rid of the error messages, but I still get the "Device not ready" error and cannot use the unit.

I don't think that the controller is preventing host access because of the cache errors, since I disabled all the cache features.

When caching was enabled the unit state was INOPERATIVE, but when I disabled the caching features by setting the unit to NOWRITE_PROTECT, NOREAD_CACHE, NOREADAHEAD_CACHE and NOWRITEBACK_CACHE, its state became AVAILABLE.
Rob Leadbeater
Honored Contributor

Re: HSG80 on x86_64 host Device not ready error

Hi,

> since I disabled all the cache features

If I recall correctly, you *must* have good batteries in an HSG80 in order for it to function correctly.

Can you post the output of "show this" now that you've cleared the errors, and "show units full".

Cheers,

Rob
Rob Leadbeater
Honored Contributor

Re: HSG80 on x86_64 host Device not ready error

Another couple of thoughts...

I'm not sure 64-bit Linux is supported on an HSG80.

Your version of ACS looks to be rather old. If you can't get hold of 8.8, then you should probably look to patch your current version.

Cheers,

Rob
Osvaldo Giusti
Occasional Advisor

Re: HSG80 on x86_64 host Device not ready error

Output of show this
===================
Controller:
HSG80 ZG14802442 Software V87P-1, Hardware E16
NODE_ID = 5000-1FE1-0014-FF40
ALLOCATION_CLASS = 0
SCSI_VERSION = SCSI-3
Not configured for dual-redundancy
Device Port SCSI address 7
Time: NOT SET
Command Console LUN is lun 0 (IDENTIFIER = 1)
Host Connection Table is NOT locked
Smart Error Eject Disabled
Host PORT_1:
Reported PORT_ID = 5000-1FE1-0014-FF41
PORT_1_TOPOLOGY = FABRIC (fabric up)
Address = 011100
Host PORT_2:
Reported PORT_ID = 5000-1FE1-0014-FF42
PORT_2_TOPOLOGY = OFFLINE (offline)
NOREMOTE_COPY
Cache:
256 megabyte write cache, version 0022
Cache is FAILED
Unknown unflushed data in cache
CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
Battery:
NOUPS
DANGER: BATTERY BAD, REPLACE BATTERY NOW!

Output of show units full
=========================
LUN        Uses         Used by
------------------------------------------------------------------------------

D1         DISK30000
LUN ID: 6000-1FE1-0014-FF40-0009-1480-2442-009A
NOIDENTIFIER
Switches:
RUN NOWRITE_PROTECT NOREAD_CACHE
NOREADAHEAD_CACHE NOWRITEBACK_CACHE
MAX_READ_CACHED_TRANSFER_SIZE = 32
MAX_WRITE_CACHED_TRANSFER_SIZE = 32
Access:
!NEWCON20
State:
AVAILABLE
Host Based Logging NOT Specified
Size: 286679457 blocks
Geometry (C/H/S): ( 49840 / 8 / 719 )
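
When the same SHOW UNITS FULL output is captured repeatedly, the unit state can be pulled out with a few lines of script. This is a rough sketch, assuming the flattened text layout pasted above (a "State:" line followed by the state on the next non-empty line) and unit names of the form D followed by digits; the function name is made up.

```python
def unit_states(show_units_full: str) -> dict:
    """Map unit name -> state from pasted SHOW UNITS FULL text."""
    states = {}
    unit = None
    lines = [line.strip() for line in show_units_full.splitlines()]
    for i, line in enumerate(lines):
        parts = line.split()
        # Assumed: a unit row starts with a name such as D1 or D12.
        if parts and parts[0][0] == "D" and parts[0][1:].isdigit():
            unit = parts[0]
        elif line == "State:" and unit is not None:
            # The state itself is on the next non-empty line.
            for nxt in lines[i + 1:]:
                if nxt:
                    states[unit] = nxt
                    break
    return states


sample = """\
LUN Uses Used by
------------------------------------------------------------------------------

D1 DISK30000
LUN ID: 6000-1FE1-0014-FF40-0009-1480-2442-009A
Switches:
RUN NOWRITE_PROTECT NOREAD_CACHE
NOREADAHEAD_CACHE NOWRITEBACK_CACHE
Access:
!NEWCON20
State:
AVAILABLE
Host Based Logging NOT Specified
"""

states = unit_states(sample)
not_online = {u: s for u, s in states.items() if not s.startswith("ONLINE")}
print(not_online)   # {'D1': 'AVAILABLE'}
```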
Rob Leadbeater
Honored Contributor

Re: HSG80 on x86_64 host Device not ready error

Hi,

If you check the ACS CLI manual here:

http://bizsupport2.austin.hp.com/bc/docs/support/SupportManual/c00596728/c00596728.pdf

You'll see that the unit status should be ONLINE, not AVAILABLE.

Change your batteries.

Cheers,

Rob
Osvaldo Giusti
Occasional Advisor

Re: HSG80 on x86_64 host Device not ready error

The whole system is running on a UPS, so I don't need cache batteries. I therefore used the command:
SET THIS_CONTROLLER UPS=DATACENTER_WIDE
to get rid of the battery error.
I still have the unit "available" rather than "online" as desired. Is there any chance of getting it to work with these settings?
Current situation follows:

Output of show this
===================
Controller:
HSG80 ZG14802442 Software V87P-1, Hardware E16
NODE_ID = 5000-1FE1-0014-FF40
ALLOCATION_CLASS = 0
SCSI_VERSION = SCSI-3
Not configured for dual-redundancy
Device Port SCSI address 7
Time: NOT SET
Command Console LUN is lun 0 (IDENTIFIER = 1)
Host Connection Table is NOT locked
Smart Error Eject Disabled
Host PORT_1:
Reported PORT_ID = 5000-1FE1-0014-FF41
PORT_1_TOPOLOGY = FABRIC (fabric up)
Address = 011100
Host PORT_2:
Reported PORT_ID = 5000-1FE1-0014-FF42
PORT_2_TOPOLOGY = OFFLINE (offline)
NOREMOTE_COPY
Cache:
256 megabyte write cache, version 0022
Cache is FAILED
Unknown unflushed data in cache
CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
Battery:
UPS = DATACENTER_WIDE

Output of show units full
=========================
LUN        Uses         Used by
------------------------------------------------------------------------------

D1         DISK30000
LUN ID: 6000-1FE1-0014-FF40-0009-1480-2442-009A
IDENTIFIER = 0
Switches:
RUN NOWRITE_PROTECT NOREAD_CACHE
NOREADAHEAD_CACHE NOWRITEBACK_CACHE
MAX_READ_CACHED_TRANSFER_SIZE = 32
MAX_WRITE_CACHED_TRANSFER_SIZE = 32
Access:
!NEWCON20
State:
AVAILABLE
Host Based Logging NOT Specified
Size: 286679457 blocks
Geometry (C/H/S): ( 49840 / 8 / 719 )
Rob Leadbeater
Honored Contributor

Re: HSG80 on x86_64 host Device not ready error

Hi,

Even with UPS=DATACENTER_WIDE I'm pretty certain that you need good batteries.

You might be able to fool the controllers into thinking that you've replaced the batteries by running frutil.

run frutil

Follow the prompts as if you're going to swap the batteries, but don't actually do it.

Cheers,

Rob
Osvaldo Giusti
Occasional Advisor

Re: HSG80 on x86_64 host Device not ready error

I used frutil to fool the controller, but the unit is still available and not online, maybe because of the "Cache is FAILED" error. I tried to get rid of it by using:

CLEAR THIS INVALID_CACHE DESTROY_UNFLUSHED_DATA

but I still have the following situation.
Thank you all so much for your help.

Output of show this
===================
Controller:
HSG80 ZG14802442 Software V87P-1, Hardware E16
NODE_ID = 5000-1FE1-0014-FF40
ALLOCATION_CLASS = 0
SCSI_VERSION = SCSI-3
Configured for MULTIBUS_FAILOVER with ZG94213655
In dual-redundant configuration
Device Port SCSI address 7
Time: 13-APR-2009 10:13:45
Command Console LUN is lun 0 (IDENTIFIER = 1)
Host Connection Table is NOT locked
Smart Error Eject Disabled
Host PORT_1:
Reported PORT_ID = 5000-1FE1-0014-FF43
PORT_1_TOPOLOGY = FABRIC (fabric up)
Address = 011100
Host PORT_2:
Reported PORT_ID = 5000-1FE1-0014-FF44
PORT_2_TOPOLOGY = OFFLINE (offline)
NOREMOTE_COPY
Cache:
256 megabyte write cache, version 0022
Cache is FAILED
Unknown unflushed data in cache
CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
Battery:
UPS = DATACENTER_WIDE

Output of show units full
=========================
LUN        Uses         Used by
------------------------------------------------------------------------------

D1         DISK30000
LUN ID: 6000-1FE1-0014-FF40-0009-1480-2442-009A
IDENTIFIER = 0
Switches:
RUN NOWRITE_PROTECT NOREAD_CACHE
NOREADAHEAD_CACHE NOWRITEBACK_CACHE
MAX_READ_CACHED_TRANSFER_SIZE = 32
MAX_WRITE_CACHED_TRANSFER_SIZE = 32
Access:
!NEWCON20
State:
AVAILABLE
NOPREFERRED_PATH
Host Based Logging NOT Specified
Size: 286679457 blocks
Geometry (C/H/S): ( 49840 / 8 / 719 )
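
Because the thread contains several captures of the same SHOW THIS output, a line diff makes it easier to spot what changed between two of them. The sketch below uses a few lines copied from the earlier and the latest capture; the variable and function names are illustrative, and it only reports differences without interpreting them.

```python
import difflib

# A few lines from an earlier SHOW THIS capture in this thread.
before = """\
NODE_ID = 5000-1FE1-0014-FF40
Not configured for dual-redundancy
Time: NOT SET
Reported PORT_ID = 5000-1FE1-0014-FF41
Reported PORT_ID = 5000-1FE1-0014-FF42
Cache is FAILED
""".splitlines()

# The corresponding lines from the latest capture.
after = """\
NODE_ID = 5000-1FE1-0014-FF40
Configured for MULTIBUS_FAILOVER with ZG94213655
In dual-redundant configuration
Time: 13-APR-2009 10:13:45
Reported PORT_ID = 5000-1FE1-0014-FF43
Reported PORT_ID = 5000-1FE1-0014-FF44
Cache is FAILED
""".splitlines()


def changed_lines(old, new):
    """Return only the removed ('- ') and added ('+ ') lines of an ndiff."""
    return [d for d in difflib.ndiff(old, new) if d.startswith(("- ", "+ "))]


for line in changed_lines(before, after):
    print(line)
```

On these samples the diff reports the failover configuration, the time and the reported port IDs as changed, while "Cache is FAILED" is identical in both captures.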