rx2660 with failing disk
11-30-2017 07:53 AM - last edited on 02-06-2018 10:38 PM by Parvez_Admin
Hi,
I am after some advice. I have an aging rx2660 machine running HP-UX 11.31 that we've used on and off for s/w porting. It was switched off for a period of about 4 yrs because we did not need to build any s/w on it for that duration. Recently we restarted it to do a build and all went well with that.
The machine has 2 x 72Gb HDs installed and it seems one is failing, evidenced by the clicking noise it began making a few days ago. Now, when we bought the machine it was set up as a bare-bones build machine, and it seems the failing HD was never added to vg00, or any other vg for that matter; it was (and still is) unused. I have browsed various conversations on here that discuss replacing bad disks etc. (mostly dealing with setups that DID use the spare disk, so not 100% relevant), and I hope I have gleaned the right approach to follow for replacing that bad disk.
First the evidence:
In the Syslog I get:
{root} /homeroot> more /var/adm/syslog/syslog.log
. . last few lines .
Nov 27 07:10:24 hp3 EMS [1956]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/storage/events/disks/default/64000_0xfa00_0x0" (Threshold: >= " 3")
Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 128188418 -r /storage/events/disks/default/64000_0xfa00_0x0 -n 128188417 -a
Nov 27 07:10:25 hp3 EMS [1956]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/storage/events/disks/default/64000_0xfa00_0x0" (Threshold: >= " 3")
Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 128188418 -r /storage/events/disks/default/64000_0xfa00_0x0 -n 128188418 -a
Nov 27 07:10:29 hp3 CIM Indication[1522]: Indication (default format):PerceivedSeverity = 7, EventID = 13, ProviderName = DiskIndicationProvider
Nov 27 10:18:02 hp3 EMS [1956]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/storage/events/disks/default/64000_0xfa00_0x0" (Threshold: >= " 3")
Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 128188418 -r /storage/events/disks/default/64000_0xfa00_0x0 -n 128188419 -a
Nov 27 10:18:02 hp3 EMS [1956]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/storage/events/disks/default/64000_0xfa00_0x0" (Threshold: >= " 3")
Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 128188418 -r /storage/events/disks/default/64000_0xfa00_0x0 -n 128188420 -a
Nov 27 10:18:03 hp3 CIM Indication[1522]: Indication (default format):PerceivedSeverity = 7, EventID = 13, ProviderName = DiskIndicationProvider
Nov 27 12:32:12 hp3 su: + ta apps-root
Nov 27 12:48:09 hp3 EMS [1956]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/storage/events/disks/default/64000_0xfa00_0x0" (Threshold: >= " 3")
Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 128188418 -r /storage/events/disks/default/64000_0xfa00_0x0 -n 128188421 -a
Nov 27 12:48:10 hp3 EMS [1956]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/storage/events/disks/default/64000_0xfa00_0x0" (Threshold: >= " 3")
Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 128188418 -r /storage/events/disks/default/64000_0xfa00_0x0 -n 128188422 -a
Nov 27 12:48:10 hp3 CIM Indication[1522]: Indication (default format):PerceivedSeverity = 7, EventID = 13, ProviderName = DiskIndicationProvider
Nov 27 14:03:40 hp3 EMS [1956]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/storage/events/disks/default/64000_0xfa00_0x0" (Threshold: >= " 3")
Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 128188418 -r /storage/events/disks/default/64000_0xfa00_0x0 -n 128188423 -a
Nov 27 14:03:40 hp3 EMS [1956]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/storage/events/disks/default/64000_0xfa00_0x0" (Threshold: >= " 3")
Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 128188418 -r /storage/events/disks/default/64000_0xfa00_0x0 -n 128188424 -a
Nov 27 14:03:41 hp3 CIM Indication[1522]: Indication (default format):PerceivedSeverity = 7, EventID = 13, ProviderName = DiskIndicationProvider
which leads me to:
{root} /homeroot> /opt/resmon/bin/resdata -R 128188418 -r /storage/events/disks/default/64000_0xfa00_0x0 -n 128188418 -a
ARCHIVED MONITOR DATA:
Event Time..........: Mon Nov 27 07:10:24 2017
Severity............: CRITICAL
Monitor.............: disk_em
Event #.............: 13
System..............: hp3
Summary:
Disk at hardware path 64000/0xfa00/0x0 : I/O request failed.
Description of Error:
As part of the polling functionality, the monitor periodically requests data from the device. The monitor's I/O request failed in this case. The monitor was requesting data for Inquiry command.
Probable Cause / Recommended Action:
The monitor could not finish the requested I/O operation to the device. Check /etc/opt/resmon/log/api.log file for an entry logged by tl_scsi_dev_io request.
Additional Event Data:
System IP Address...: 19.150.50.193
Event Id............: 0x5a1bba6000000002
Monitor Version.....: B.01.01
Event Class.........: I/O
Client Configuration File...........: /var/stm/config/tools/monitor/wbem_default_disk_em.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s): None
Additional System Data:
System Model Number.............: ia64 hp server rx2660
OS Version......................: B.11.31
STM Version.....................: D.06.00
EMS Version.....................: A.04.20.31.03
Latest information on this event: http://docs.hp.com/hpux/content/hardware/ems/disk_em.htm#13
v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v
Component Data:
Physical Device Path...: 64000/0xfa00/0x0
Device Class...........: Disk
Inquiry Vendor ID......: HP
Inquiry Product ID.....: DH0072FAQRD
Firmware Version.......: HPDC
Serial Number..........: Confidential Info Erased
Product/Device Identification Information:
Logger ID.........: disc30; sdisk
Product Identifier: Disk
Product Qualifier.: HP DH0072FAQRD
SCSI Target ID....: 0x00
SCSI LUN..........: 0x00
SCSI Command Data Block:
Command Data Block Contents:
0x0000: 12 00 00 00 FF 00
Command Data Block Fields (6-byte fmt):
Command Operation Code...(0x12)..: INQUIRY
Logical Unit Number..............: 0
EVPD Bit.........................: 0
Page Code........................: 0 (0x00)
Allocation Length................: 255 (0xFF)
SCSI Sense Data: (not present in log record)
==============================================================================
{root} /homeroot> ioscan -m lun
Class   I  Lun H/W Path          Driver  S/W State  H/W Type  Health  Description
======================================================================
disk    2  64000/0xfa00/0x0      esdisk  CLAIMED    DEVICE    online  HP DH0072FAQRD
           0/1/1/0.0x5000c5000bec5495.0x0
           /dev/disk/disk2   /dev/rdisk/disk2
disk    3  64000/0xfa00/0x1      esdisk  CLAIMED    DEVICE    online  HP DH0072FAQRD
           0/1/1/0.0x5000c5000bec5609.0x0
           /dev/disk/disk3      /dev/disk/disk3_p2   /dev/rdisk/disk3     /dev/rdisk/disk3_p2
           /dev/disk/disk3_p1   /dev/disk/disk3_p3   /dev/rdisk/disk3_p1  /dev/rdisk/disk3_p3
disk    5  64000/0xfa00/0x2      esdisk  CLAIMED    DEVICE    online  TEAC DVD-ROM DW-224EV
           64000/0x0/0x0.0x0.0x0
           /dev/disk/disk5   /dev/rdisk/disk5
==============================================================================
{root} /homeroot> sasmgr get_info -D /dev/sasd1 -q vpd
Vital Product Data Information
------------------------------
Product Description    : PCI-X Serial Attached SCSI
Part Number            : AB419-60001
Engineering Date Code  : A-4842
Serial Number          : SCAN_READ_SN
Misc. Information      : PW=15W PCI-X 66MHz Core IO
Manufacturing Date     : 4645
Manufacturing ID       : N/A
Checksum               : 0x4c
EFI Version            : 03.05.01.00
HBA Firmware Version   : 01.23.42.00
Asset Tag              : NA
==============================================================================
{root} /homeroot> ioscan -funC disk
Class   I  H/W Path         Driver  S/W State  H/W Type  Description
=====================================================================
disk    0  0/1/1/0.0.0.0.0  sdisk   CLAIMED    DEVICE    HP DH0072FAQRD
           /dev/dsk/c0t0d0   /dev/rdsk/c0t0d0
disk    1  0/1/1/0.0.0.1.0  sdisk   CLAIMED    DEVICE    HP DH0072FAQRD
           /dev/dsk/c0t1d0     /dev/dsk/c0t1d0s2   /dev/rdsk/c0t1d0    /dev/rdsk/c0t1d0s2
           /dev/dsk/c0t1d0s1   /dev/dsk/c0t1d0s3   /dev/rdsk/c0t1d0s1  /dev/rdsk/c0t1d0s3
disk    4  255/1/0.0.0      sdisk   CLAIMED    DEVICE    TEAC DVD-ROM DW-224EV
           /dev/dsk/c1t0d0   /dev/rdsk/c1t0d0
{root} /homeroot> sasmgr get_info -D /dev/sasd1 -q target=all
Mon Nov 27 15:30:41 2017
Target SAS Address            : 0x5000c5000bec5495
Target Health                 : ONLINE
IPort SAS Address             : 0x500600000001c277
Previous IPort SAS Address    : 0x0
Target Type                   : SCSI Device
Target Topology               : DIRECT
Protocol Capability of Target : SSP
Target Slot                   : 0x1
Target Enclosure ID           : 0x1
Target Enclosure Type         : Direct Attached SGPIO
Target SAS Address            : 0x5000c5000bec5609
Target Health                 : ONLINE
IPort SAS Address             : 0x500600000001c276
Previous IPort SAS Address    : 0x0
Target Type                   : SCSI Device
Target Topology               : DIRECT
Protocol Capability of Target : SSP
Target Slot                   : 0x2
Target Enclosure ID           : 0x1
Target Enclosure Type         : Direct Attached SGPIO
*********************************************************************
*****                HBA Specific information                   *****
*********************************************************************
Information for target (0x5000c5000bec5495)
Target State : READY
Information for target (0x5000c5000bec5609)
Target State : READY
{root} /homeroot> sasmgr get_info -D /dev/sasd1 -q raid
Mon Nov 27 15:30:51 2017
---------- PHYSICAL DRIVES ----------
LUN dsf            SAS Address         Enclosure  Bay  Size(MB)
/dev/rdsk/c0t0d0   0x5000c5000bec5495  1          1    70007
/dev/rdsk/c0t1d0   0x5000c5000bec5609  1          2    70007
{root} /homeroot> sasmgr get_info -D /dev/sasd1 -q lun=all
LUN dsf            Hardware Path    SAS Address
------------------------------------------------------------------
/dev/rdsk/c0t0d0   0/1/1/0.0.0.0.0  0x5000c5000bec5495
/dev/rdsk/c0t1d0   0/1/1/0.0.0.1.0  0x5000c5000bec5609
{root} /homeroot>
So, now on to my proposed solution (here's where your tips/comments are most welcome):
- Since /dev/disk/disk2 seems to be completely unused by the system, and since I have sourced 2 x 146Gb hot-swap drives (I ordered 2 on the basis that the other drive might fail soon), I should not have to worry about vgreduce, unmounting disk2, etc.
- So, can I simply plug the 2 x 146Gb HDs into spare bays and run the following: "ioscan -m lun"
- then use "smh" to create a new vg01, specifying one disk as "master" and the other as a mirror
- then use "drd" to clone the current 72Gb boot disk to the new vg01 disks
- finally use drd to switch/swap the boot disk(s) to the vg01 disks and then reboot
- when all has been tested, I would then plan to remove the 2 x 72Gb disks somehow (tips on this welcome, my reading has not progressed this far yet)
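As a sanity check before pulling the disk, it may be worth confirming LVM really doesn't claim it. A minimal sketch along these lines - assuming, as on HP-UX, that `pvdisplay` fails for a device that is not a physical volume in any active VG:

```shell
#!/bin/sh
# Sketch: confirm a disk is NOT an LVM physical volume before pulling it.
# If pvdisplay cannot report on the device, no active VG claims it.
check_disk_unused() {
    d=$1                                   # e.g. /dev/disk/disk2
    if pvdisplay "$d" >/dev/null 2>&1; then
        echo "$d IS an LVM physical volume - do not just pull it"
        return 1
    fi
    echo "$d is not a physical volume in any active VG"
}
```

Usage: `check_disk_unused /dev/disk/disk2` (cross-checking the device name against `strings /etc/lvmtab` wouldn't hurt either).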
Looking forward to some detailed responses ;-)
K
Solved!
12-04-2017 02:10 AM - edited 12-04-2017 08:01 AM
Re: rx2660 with failing disk
Update:
Here's my progress so far.
- I plugged in the 2 new HDs and booted the machine
- I ran "ioscan -funC disk" to check that the disks were recognised correctly
- I then used "drd clone -v -x overwrite=true -t /dev/disk/disk6" to clone the existing 72Gb boot disk to a new 146Gb disk
- Used "drd activate" to swap the boot devices
- reboot the system and verify that the new disk has been used for booting
So, now to the removal of the failed disk: I need to remove the disk from the system configuration and then remove the physical disk... Question: do I need to re-order the disks in the bays (since they are SCSI), or can I simply remove it and replace it with a bay "filler" that I removed from the spaces where the new drives went in?
I am in two minds as to whether to remove the 72Gb drive that is still working, or to keep it as a bootable spare in the system. Also, I am considering whether or not to make the new 146Gb boot disk the primary boot disk, and then regularly re-clone it to the 2nd 146Gb disk for redundancy/backup purposes. Any advice on these choices? What benefit would mirroring the new 146Gb disk give vs an automated weekly/monthly re-clone exercise?
For reference, this is my current status
{root} /homeroot> ioscan -funC disk
Class I H/W Path Driver S/W State H/W Type Description
=====================================================================
disk 0 0/1/1/0.0.0.0.0 sdisk CLAIMED DEVICE HP DH0072FAQRD
/dev/dsk/c0t0d0 /dev/rdsk/c0t0d0
disk 1 0/1/1/0.0.0.1.0 sdisk CLAIMED DEVICE HP DH0072FAQRD
/dev/dsk/c0t1d0 /dev/dsk/c0t1d0s2 /dev/rdsk/c0t1d0 /dev/rdsk/c0t1d0s2
/dev/dsk/c0t1d0s1 /dev/dsk/c0t1d0s3 /dev/rdsk/c0t1d0s1 /dev/rdsk/c0t1d0s3
disk 6 0/1/1/0.0.0.2.0 sdisk CLAIMED DEVICE HP DG146BB976
/dev/dsk/c0t2d0 /dev/dsk/c0t2d0s2 /dev/rdsk/c0t2d0 /dev/rdsk/c0t2d0s2
/dev/dsk/c0t2d0s1 /dev/dsk/c0t2d0s3 /dev/rdsk/c0t2d0s1 /dev/rdsk/c0t2d0s3
disk 7 0/1/1/0.0.0.3.0 sdisk CLAIMED DEVICE HP DG146BB976
/dev/dsk/c0t3d0 /dev/rdsk/c0t3d0
disk 4 255/1/0.0.0 sdisk CLAIMED DEVICE TEAC DVD-ROM DW-224EV
/dev/dsk/c1t0d0 /dev/rdsk/c1t0d0
{root} /homeroot> man drvcfg
No manual entry for drvcfg.
{root} /homeroot> drd status
======= 12/04/17 10:42:56 GMT BEGIN Displaying DRD Clone Image Information (user=root) (jobid=hp3)
* Clone Disk: /dev/disk/disk8
* Clone EFI Partition: AUTO file present, Boot loader present
* Clone Rehost Status: SYSINFO.TXT not present
* Clone Creation Date: 12/01/17 11:17:40 GMT
* Clone Mirror Disk: None
* Mirror EFI Partition: None
* Original Disk: /dev/disk/disk3
* Original EFI Partition: AUTO file present, Boot loader present
* Original Rehost Status: SYSINFO.TXT not present
* Booted Disk: Clone Disk (/dev/disk/disk8)
* Activated Disk: Clone Disk (/dev/disk/disk8)
======= 12/04/17 10:43:06 GMT END Displaying DRD Clone Image Information succeeded. (user=root) (jobid=hp3)
12-04-2017 10:35 AM
Re: rx2660 with failing disk
Since the disk configuration depends on the hardware path (i.e. on the slot), a re-order would break your config.
But if you want a mirror of the disk, you should first check whether mirroring is available in your OS. If yes, just create an LVM mirror of the disk. If mirroring is not available, consider creating a hardware RAID from the 2 new disks.
Since all data on them will be lost, consider either a backup/restore or a DRD clone to the hardware-mirrored drive.
Hope this helps!
Regards
Torsten.
__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.
__________________________________________________
No support by private messages. Please ask the forum!
If you feel this was helpful please click the KUDOS! thumb below!
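For the LVM-mirror route, the classic MirrorDisk/UX sequence on an 11.31 Integrity box looks roughly like the sketch below. This is a hedged outline, not a tested recipe: the lvol list (lvol1..lvol8), the idisk partition file, and the disk name are assumptions for this thread's hardware - `vgdisplay -v vg00` lists the real lvols. The `run` wrapper just prints each command when DRYRUN=1, so the sequence can be reviewed before anything is touched.

```shell
#!/bin/sh
# Hedged sketch: mirror vg00 onto a second disk with MirrorDisk/UX
# (HP-UX 11.31, Integrity). Set DRYRUN=1 to only print the commands.
run() { if [ "${DRYRUN:-0}" = "1" ]; then echo "$@"; else "$@"; fi; }

mirror_vg00() {
    d=$1                                             # e.g. disk9 (persistent DSF name)
    run idisk -wf /tmp/partition_file /dev/rdisk/$d  # EFI partition layout (file contents assumed)
    run insf -eC disk                                # recreate the device files
    run pvcreate -B /dev/rdisk/${d}_p2               # make the HP-UX partition a bootable PV
    run vgextend vg00 /dev/disk/${d}_p2
    run mkboot -e -l /dev/rdisk/$d                   # install the EFI boot loader
    # One lvextend per logical volume; lvol1..lvol8 is an assumption:
    for lv in lvol1 lvol2 lvol3 lvol4 lvol5 lvol6 lvol7 lvol8; do
        run lvextend -m 1 /dev/vg00/$lv /dev/disk/${d}_p2
    done
    run lvlnboot -R vg00                             # refresh boot/root/swap information
}
```

E.g. `DRYRUN=1 mirror_vg00 disk9` prints the commands for review; the alternate boot path would then be registered with `setboot`.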
12-05-2017 03:53 AM
Solution
Ok, so I think I have finished this task. For the record, here's what I have done:
- Identified the failing disk as "disk2" (bay 1 in the hot-swap bays on the front of the machine)
- Ordered a pair of 146Gb disks
- Made a full system backup using "fbackup -f /mount/nasdrive/directory -i /"
- Stopped the system
- Removed two blanks and installed the 2 x 146Gb disks in "hot-swap" bays 3 and 4
- Started the system
- Ran "ioscan -fNnkC disk" to see that the new disks were recognised
- Ran "drd clone -p -v -t /dev/disk/disk8" (preview) to check that DRD is an option
- Ran "drd clone -v -x overwrite=true -t /dev/disk/disk8" to create a full cloned copy of the single boot disk that was previously active in vg00
- Ran "drd activate -x reboot=true" to swap to the new boot disk (disk8) and reboot
- Verified the boot disk: "setboot -v" shows disk8-related results
- Made a clone of the new boot disk on the 2nd new 146Gb disk: "drd clone -v -x overwrite=true -t /dev/disk/disk9"
- Shut down the system
- Removed the failed HD from drive bay 1
- Moved the "disk9" disk from bay 4 to bay 1 and inserted a blanking plate in bay 4
- Booted the system, checking console logs as it booted; no issues
- Re-ran "ioscan -fNnkC disk" to see that the disk swap was recognised, now showing bay1=disk9, bay2=old_boot_disk, bay3=disk8
- "drd status" shows disk8 is the primary boot device and disk9 is the clone
- Ran "smh" to verify that all disks still installed are in good health
- Wrote a script to be scheduled by cron to redo the clone action regularly
The end...
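The cron'd re-clone in the last step could be as simple as the wrapper below. A sketch only: the script path, log location, and schedule are assumptions, and `drd` is taken to live under `/opt/drd/bin` as it normally does; `DRD`, `TARGET` and `LOG` can be overridden from the environment.

```shell
#!/bin/sh
# Hypothetical weekly re-clone of the active boot disk to the spare 146Gb disk.
# Example crontab entry (Sundays at 02:00):
#   0 2 * * 0 /usr/local/sbin/drd_reclone.sh
DRD=${DRD:-/opt/drd/bin/drd}
TARGET=${TARGET:-/dev/disk/disk9}       # the spare disk from this thread
LOG=${LOG:-/var/adm/drd_reclone.log}

reclone() {
    echo "==== drd re-clone started: `date`" >> "$LOG"
    if "$DRD" clone -v -x overwrite=true -t "$TARGET" >> "$LOG" 2>&1; then
        echo "==== drd re-clone OK" >> "$LOG"
    else
        echo "==== drd re-clone FAILED - check $LOG" >> "$LOG"
        return 1
    fi
}

# Only attempt the clone when DRD is actually installed:
[ -x "$DRD" ] && reclone || :
```

Note that a DRD re-clone only protects the boot disk as of the last run, unlike an LVM mirror which is always current; that trade-off is exactly the mirroring-vs-reclone question discussed above.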