
SOLVED
Kevin Bingham_1
Regular Visitor

rx2660 with failing disk

Hi, 

I am after some advice. I have an aging rx2660 machine running HP-UX 11.31 that we've used on and off for s/w porting. It was switched off for about 4 years because we did not need to build any s/w on it during that time. Recently we restarted it to do a build, and all went well with that.

The machine has 2 x 72GB HDs installed, and it seems one is failing, evidenced by the clicking noise it began to make a few days ago. When we bought the machine it was set up as a bare-bones build machine, and it appears the failing HD was never added to vg00, or any other VG for that matter; it was (and still is) unused. I have browsed various conversations on here that discuss replacing bad disks (mostly dealing with setups that DID use the spare disk, so not 100% relevant), and I hope I have gleaned the right approach for replacing the bad disk.
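(For reference, the quick checks that convinced me the disk is outside LVM went along these lines; the device name is from my box, adjust to suit:)

    # list every PV that LVM knows about -- the suspect disk should be absent
    strings /etc/lvmtab
    # querying LVM about the disk directly should fail if it was never a PV
    pvdisplay /dev/disk/disk2
    # and confirm vg00 only contains the good disk
    vgdisplay -v vg00 | grep "PV Name"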

First the evidence:

In the Syslog I get:

{root} /homeroot> more /var/adm/syslog/syslog.log
.
. last few lines
.
Nov 27 07:10:24 hp3 EMS [1956]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/storage/events/disks/default/64000_0xfa00_0x0" (Threshold: >= " 3")
   Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 128188418 -r /storage/events/disks/default/64000_0xfa00_0x0 -n 128188417 -a
Nov 27 07:10:25 hp3 EMS [1956]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/storage/events/disks/default/64000_0xfa00_0x0" (Threshold: >= " 3")
   Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 128188418 -r /storage/events/disks/default/64000_0xfa00_0x0 -n 128188418 -a
Nov 27 07:10:29 hp3 CIM Indication[1522]: Indication (default format): PerceivedSeverity = 7, EventID = 13, ProviderName = DiskIndicationProvider
Nov 27 10:18:02 hp3 EMS [1956]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/storage/events/disks/default/64000_0xfa00_0x0" (Threshold: >= " 3")
   Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 128188418 -r /storage/events/disks/default/64000_0xfa00_0x0 -n 128188419 -a
Nov 27 10:18:02 hp3 EMS [1956]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/storage/events/disks/default/64000_0xfa00_0x0" (Threshold: >= " 3")
   Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 128188418 -r /storage/events/disks/default/64000_0xfa00_0x0 -n 128188420 -a
Nov 27 10:18:03 hp3 CIM Indication[1522]: Indication (default format): PerceivedSeverity = 7, EventID = 13, ProviderName = DiskIndicationProvider
Nov 27 12:32:12 hp3 su: + ta apps-root
Nov 27 12:48:09 hp3 EMS [1956]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/storage/events/disks/default/64000_0xfa00_0x0" (Threshold: >= " 3")
   Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 128188418 -r /storage/events/disks/default/64000_0xfa00_0x0 -n 128188421 -a
Nov 27 12:48:10 hp3 EMS [1956]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/storage/events/disks/default/64000_0xfa00_0x0" (Threshold: >= " 3")
   Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 128188418 -r /storage/events/disks/default/64000_0xfa00_0x0 -n 128188422 -a
Nov 27 12:48:10 hp3 CIM Indication[1522]: Indication (default format): PerceivedSeverity = 7, EventID = 13, ProviderName = DiskIndicationProvider
Nov 27 14:03:40 hp3 EMS [1956]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/storage/events/disks/default/64000_0xfa00_0x0" (Threshold: >= " 3")
   Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 128188418 -r /storage/events/disks/default/64000_0xfa00_0x0 -n 128188423 -a
Nov 27 14:03:40 hp3 EMS [1956]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/storage/events/disks/default/64000_0xfa00_0x0" (Threshold: >= " 3")
   Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 128188418 -r /storage/events/disks/default/64000_0xfa00_0x0 -n 128188424 -a
Nov 27 14:03:41 hp3 CIM Indication[1522]: Indication (default format): PerceivedSeverity = 7, EventID = 13, ProviderName = DiskIndicationProvider

which leads me to:

{root} /homeroot> /opt/resmon/bin/resdata -R 128188418 -r /storage/events/disks/default/64000_0xfa00_0x0 -n 128188418 -a

ARCHIVED MONITOR DATA:

Event Time..........: Mon Nov 27 07:10:24 2017
Severity............: CRITICAL
Monitor.............: disk_em
Event #.............: 13
System..............: hp3

Summary:
     Disk at hardware path 64000/0xfa00/0x0 : I/O request failed.


Description of Error:

     As part of the polling functionality, the monitor periodically requests
     data from the device. The monitor's I/O request failed in this case. The
     monitor was requesting data for Inquiry command.

Probable Cause / Recommended Action:

     The monitor could not finish the requested I/O operation to the device.
     Check /etc/opt/resmon/log/api.log file for an entry logged by
     tl_scsi_dev_io request.

Additional Event Data:
     System IP Address...: 19.150.50.193
     Event Id............: 0x5a1bba6000000002
     Monitor Version.....: B.01.01
     Event Class.........: I/O
     Client Configuration File...........:
     /var/stm/config/tools/monitor/wbem_default_disk_em.clcfg
     Client Configuration File Version...: A.01.00
          Qualification criteria met.
               Number of events..: 1
     Associated OS error log entry id(s):
          None
     Additional System Data:
          System Model Number.............: ia64 hp server rx2660
          OS Version......................: B.11.31
          STM Version.....................: D.06.00
          EMS Version.....................: A.04.20.31.03
     Latest information on this event:
          http://docs.hp.com/hpux/content/hardware/ems/disk_em.htm#13

v-v-v-v-v-v-v-v-v-v-v-v-v    D  E  T  A  I  L  S    v-v-v-v-v-v-v-v-v-v-v-v-v



Component Data:
     Physical Device Path...: 64000/0xfa00/0x0
     Device Class...........: Disk
     Inquiry Vendor ID......: HP
     Inquiry Product ID.....: DH0072FAQRD
     Firmware Version.......: HPDC
     Serial Number..........: Confidential Info Erased

Product/Device Identification Information:

     Logger ID.........: disc30; sdisk
     Product Identifier: Disk
     Product Qualifier.: HP      DH0072FAQRD
     SCSI Target ID....: 0x00
     SCSI LUN..........: 0x00

SCSI Command Data Block:

     Command Data Block Contents:
          0x0000: 12 00 00 00   FF 00

     Command Data Block Fields (6-byte fmt):
          Command Operation Code...(0x12)..: INQUIRY
          Logical Unit Number..............: 0
          EVPD Bit.........................: 0
          Page Code........................: 0 (0x00)
          Allocation Length................: 255 (0xFF)

SCSI Sense Data: (not present in log record)

=============================================================================================================================

{root} /homeroot> ioscan -m lun
Class     I  Lun H/W Path  Driver  S/W State   H/W Type     Health  Description
======================================================================
disk      2  64000/0xfa00/0x0   esdisk  CLAIMED     DEVICE       online  HP      DH0072FAQRD
             0/1/1/0.0x5000c5000bec5495.0x0
                      /dev/disk/disk2   /dev/rdisk/disk2
disk      3  64000/0xfa00/0x1   esdisk  CLAIMED     DEVICE       online  HP      DH0072FAQRD
             0/1/1/0.0x5000c5000bec5609.0x0
                      /dev/disk/disk3      /dev/disk/disk3_p2   /dev/rdisk/disk3     /dev/rdisk/disk3_p2
                      /dev/disk/disk3_p1   /dev/disk/disk3_p3   /dev/rdisk/disk3_p1  /dev/rdisk/disk3_p3
disk      5  64000/0xfa00/0x2   esdisk  CLAIMED     DEVICE       online  TEAC    DVD-ROM DW-224EV
             64000/0x0/0x0.0x0.0x0
                      /dev/disk/disk5   /dev/rdisk/disk5

==============================================================================================================================
{root} /homeroot>  sasmgr get_info -D /dev/sasd1 -q vpd
Vital Product Data Information
------------------------------
Product Description                 : PCI-X Serial Attached SCSI
Part Number                         : AB419-60001
Engineering Date Code               : A-4842
Serial Number                       : SCAN_READ_SN
Misc. Information                   : PW=15W PCI-X 66MHz Core IO
Manufacturing Date                  : 4645
Manufacturing ID                    : N/A
Checksum                            : 0x4c
EFI Version                         : 03.05.01.00
HBA Firmware Version                : 01.23.42.00
Asset Tag                           : NA
{root} /homeroot>

==============================================================================================================================
{root} /homeroot> ioscan -funC disk
Class     I  H/W Path     Driver S/W State   H/W Type     Description
=====================================================================
disk      0  0/1/1/0.0.0.0.0  sdisk   CLAIMED     DEVICE       HP      DH0072FAQRD
                         /dev/dsk/c0t0d0   /dev/rdsk/c0t0d0
disk      1  0/1/1/0.0.0.1.0  sdisk   CLAIMED     DEVICE       HP      DH0072FAQRD
                         /dev/dsk/c0t1d0     /dev/dsk/c0t1d0s2   /dev/rdsk/c0t1d0    /dev/rdsk/c0t1d0s2
                         /dev/dsk/c0t1d0s1   /dev/dsk/c0t1d0s3   /dev/rdsk/c0t1d0s1  /dev/rdsk/c0t1d0s3
disk      4  255/1/0.0.0  sdisk   CLAIMED     DEVICE       TEAC    DVD-ROM DW-224EV
                         /dev/dsk/c1t0d0   /dev/rdsk/c1t0d0
{root} /homeroot>
{root} /homeroot>
{root} /homeroot> sasmgr get_info -D /dev/sasd1 -q target=all

Mon Nov 27 15:30:41 2017

Target SAS Address                                 : 0x5000c5000bec5495
Target Health                                      : ONLINE
IPort SAS Address                                  : 0x500600000001c277
Previous IPort SAS Address                         : 0x0
Target Type                                        : SCSI Device
Target Topology                                    : DIRECT
Protocol Capability of Target                      : SSP
Target Slot                                        : 0x1
Target Enclosure ID                                : 0x1
Target Enclosure Type                              : Direct Attached SGPIO

Target SAS Address                                 : 0x5000c5000bec5609
Target Health                                      : ONLINE
IPort SAS Address                                  : 0x500600000001c276
Previous IPort SAS Address                         : 0x0
Target Type                                        : SCSI Device
Target Topology                                    : DIRECT
Protocol Capability of Target                      : SSP
Target Slot                                        : 0x2
Target Enclosure ID                                : 0x1
Target Enclosure Type                              : Direct Attached SGPIO

*********************************************************************
*****              HBA Specific information                     *****
*********************************************************************
Information for target (0x5000c5000bec5495)
Target State                                       : READY

Information for target (0x5000c5000bec5609)
Target State                                       : READY

{root} /homeroot> sasmgr get_info -D /dev/sasd1 -q raid

Mon Nov 27 15:30:51 2017

---------- PHYSICAL DRIVES ----------
LUN dsf              SAS Address          Enclosure    Bay      Size(MB)

/dev/rdsk/c0t0d0     0x5000c5000bec5495     1            1      70007
/dev/rdsk/c0t1d0     0x5000c5000bec5609     1            2      70007

{root} /homeroot> sasmgr get_info -D /dev/sasd1 -q lun=all
LUN dsf              Hardware Path                  SAS Address
------------------------------------------------------------------
/dev/rdsk/c0t0d0     0/1/1/0.0.0.0.0                0x5000c5000bec5495
/dev/rdsk/c0t1d0     0/1/1/0.0.0.1.0                0x5000c5000bec5609
{root} /homeroot>

So, now on to my proposed solution (here's where your tips/comments are most welcome):

  1. Since /dev/disk/disk2 seems to be completely unused by the system, and since I have sourced 2 x 146GB hot-swap drives (I ordered 2 on the basis that the other drive might fail soon too), I should not have to worry about vgreduce, unmounting disk2, etc.
  2. So, can I simply plug the 2 x 146GB HDs into spare bays and run the following:
    ioscan -m lun
  3. then use "smh" to create a new vg01, specifying one disk as "master" and the other as a mirror
  4. then use "drd" to clone the current 72GB boot disk to the new vg01 disks
  5. finally, use drd to switch/swap the boot disk(s) to the vg01 disks and then reboot (rough command sketch below)
  6. when all has been tested, I would then plan to remove the 2 x 72GB disks somehow (tips on this welcome, my reading has not progressed this far yet)
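In command form, I imagine the DRD part (steps 4-5) looking roughly like this; untested, and the target device name is a guess until the new disks show up in ioscan:

    # preview only -- sanity-check that DRD accepts the target disk
    /opt/drd/bin/drd clone -p -v -t /dev/disk/disk6
    # do the actual clone of the active vg00 boot disk
    /opt/drd/bin/drd clone -v -x overwrite=true -t /dev/disk/disk6
    # make the clone the boot disk and reboot onto it
    /opt/drd/bin/drd activate -x reboot=true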

Looking forward to some detailed responses  ;-)

K

Kevin Bingham_1
Regular Visitor

Re: rx2660 with failing disk

Update:

Here's my progress so far.

  1. I plugged in the 2 new HDs and booted the machine
  2. I ran "ioscan -funC disk" to check that the disks were recognised correctly
  3. I then used "drd clone -v -x overwrite=true -t /dev/disk/disk6" to clone the existing 72GB boot disk to a new 146GB disk
  4. I used "drd activate" to swap the boot devices
  5. I rebooted the system and verified that the new disk was used for booting (checks below)
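(The post-reboot verification was along these lines, nothing exotic:

    setboot                    # confirm the primary boot path points at the new disk
    /opt/drd/bin/drd status    # "Booted Disk" should report the clone
)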

 

So, now to the removal of the failed disk: I need to remove it from the system configuration and then pull the physical disk... Question: do I need to re-order the disks in the bays (since they are SCSI), or can I simply remove it and replace it with one of the bay "fillers" that I removed from the slots where the new drives went in?

I am in two minds as to whether to remove the 72GB drive that is still working, or to keep it as a bootable spare in the system. I am also considering whether to make the new 146GB boot disk the primary boot disk and then regularly re-clone it to the 2nd 146GB disk for redundancy/backup purposes. Any advice on these choices? What benefit would mirroring the new 146GB disk give vs an automated weekly/monthly re-clone exercise?

For reference, this is my current status:

{root} /homeroot> ioscan -funC disk
Class     I  H/W Path     Driver S/W State   H/W Type     Description
=====================================================================
disk      0  0/1/1/0.0.0.0.0  sdisk   CLAIMED     DEVICE       HP      DH0072FAQRD
                         /dev/dsk/c0t0d0   /dev/rdsk/c0t0d0
disk      1  0/1/1/0.0.0.1.0  sdisk   CLAIMED     DEVICE       HP      DH0072FAQRD
                         /dev/dsk/c0t1d0     /dev/dsk/c0t1d0s2   /dev/rdsk/c0t1d0    /dev/rdsk/c0t1d0s2
                         /dev/dsk/c0t1d0s1   /dev/dsk/c0t1d0s3   /dev/rdsk/c0t1d0s1  /dev/rdsk/c0t1d0s3
disk      6  0/1/1/0.0.0.2.0  sdisk   CLAIMED     DEVICE       HP      DG146BB976
                         /dev/dsk/c0t2d0     /dev/dsk/c0t2d0s2   /dev/rdsk/c0t2d0    /dev/rdsk/c0t2d0s2
                         /dev/dsk/c0t2d0s1   /dev/dsk/c0t2d0s3   /dev/rdsk/c0t2d0s1  /dev/rdsk/c0t2d0s3
disk      7  0/1/1/0.0.0.3.0  sdisk   CLAIMED     DEVICE       HP      DG146BB976
                         /dev/dsk/c0t3d0   /dev/rdsk/c0t3d0
disk      4  255/1/0.0.0  sdisk   CLAIMED     DEVICE       TEAC    DVD-ROM DW-224EV
                         /dev/dsk/c1t0d0   /dev/rdsk/c1t0d0
{root} /homeroot> drd status

======= 12/04/17 10:42:56 GMT BEGIN Displaying DRD Clone Image Information (user=root) (jobid=hp3)

* Clone Disk: /dev/disk/disk8
* Clone EFI Partition: AUTO file present, Boot loader present
* Clone Rehost Status: SYSINFO.TXT not present
* Clone Creation Date: 12/01/17 11:17:40 GMT
* Clone Mirror Disk: None
* Mirror EFI Partition: None
* Original Disk: /dev/disk/disk3
* Original EFI Partition: AUTO file present, Boot loader present
* Original Rehost Status: SYSINFO.TXT not present
* Booted Disk: Clone Disk (/dev/disk/disk8)
* Activated Disk: Clone Disk (/dev/disk/disk8)

======= 12/04/17 10:43:06 GMT END Displaying DRD Clone Image Information succeeded. (user=root) (jobid=hp3)

Torsten.
Acclaimed Contributor

Re: rx2660 with failing disk

Since the disk configuration depends on the hardware path (i.e. on the slot), a re-order would break your config.

But if you want a mirror of the disk, you should first check whether mirroring (MirrorDisk/UX) is available in your OS. If yes, just create an LVM mirror of the disk. If mirroring is not available, consider creating a hardware RAID from the 2 new disks.

Since all data on the disks will be lost when the RAID is created, plan on either a backup/restore or a DRD clone onto the hardware-mirrored drive.
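A minimal sketch of the LVM route on an Integrity box, assuming MirrorDisk/UX is installed (device names are examples only, substitute your own):

    # partition description file: 3 partitions (EFI, HP-UX, HP service)
    echo "3
    EFI 500MB
    HPUX 100%
    HPSP 400MB" > /tmp/idf
    idisk -wf /tmp/idf /dev/rdisk/disk9          # partition the new disk
    insf -eC disk                                # create the _p1/_p2/_p3 device files
    pvcreate -B /dev/rdisk/disk9_p2              # make the HP-UX partition a bootable PV
    mkboot -e -l /dev/rdisk/disk9                # copy the EFI boot utilities
    mkboot -a "boot vmunix -lq" /dev/rdisk/disk9 # AUTO file: boot without quorum
    vgextend vg00 /dev/disk/disk9_p2
    for lv in /dev/vg00/lvol*                    # mirror every LV in vg00
    do
        lvextend -m 1 $lv /dev/disk/disk9_p2
    done
    lvlnboot -R                                  # refresh boot/root/swap/dump info
    setboot -a <new_disk_hw_path>                # register the alternate boot path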


Hope this helps!
Regards
Torsten.

Kevin Bingham_1
Regular Visitor
Solution

Re: rx2660 with failing disk

Ok, so I think I have finished this task. For the record, here's what I have done:

  1. Identified the failing disk as "disk2" (bay 1 of the hot-swap bays on the front of the machine)
  2. Ordered a pair of 146GB disks
  3. Made a full system backup using
    "fbackup -f /mount/nasdrive/directory -i /"
  4. Stopped the system
  5. Removed two blanks and installed the 2 146GB disks in hot-swap bays 3 and 4
  6. Started the system
  7. Ran
    "ioscan -fNnkC disk" to see that the new disks were recognised
  8. Ran
    "drd clone -p -v -t /dev/disk/disk8" (preview mode) to check that DRD is an option
  9. Ran
    "drd clone -v -x overwrite=true -t /dev/disk/disk8" to create a full cloned copy of the single boot disk that was previously active in vg00
  10. Ran
    "drd activate -x reboot=true" to swap to the new boot disk (disk8)
  11. Rebooted
  12. Verified the boot disk:
    "setboot -v" shows disk8-related results
  13. Made a clone of the new boot disk on the 2nd new 146GB disk:
    "drd clone -v -x overwrite=true -t /dev/disk/disk9"
  14. Shut down the system
  15. Removed the failed HD from drive bay 1
  16. Moved the "disk9" disk from bay 4 to bay 1 and inserted a blanking plate in bay 4
  17. Booted the system, checking console logs as it booted; no issues
  18. Re-ran
    "ioscan -fNnkC disk" to see that the disk swap was recognised, now showing bay1=disk9, bay2=old_boot_disk, bay3=disk8
  19. Ran
    "drd status", which shows disk8 as the primary boot device and disk9 as the clone
  20. Ran
    "smh" to verify that all disks still installed are in good health
  21. Wrote a script to be scheduled by cron to redo the clone action regularly (sketch below)
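
For anyone wanting to do the same, the re-clone script is roughly this (a minimal sketch; the log path and target disk are from my setup, adjust to yours):

    #!/sbin/sh
    # /usr/local/bin/drd_reclone.sh -- re-clone the running system to the
    # spare 146GB disk; meant to be run from cron, output appended to a log
    LOG=/var/adm/drd_reclone.log
    {
        echo "==== drd re-clone started: $(date) ===="
        /opt/drd/bin/drd clone -v -x overwrite=true -t /dev/disk/disk9
        echo "==== finished with status $?: $(date) ===="
    } >> $LOG 2>&1

plus a root crontab entry like "0 2 * * 0 /usr/local/bin/drd_reclone.sh" to run it every Sunday at 02:00.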

The end...