Dynamic Root Disk (DRD) and NO_HW devices

SOLVED
Mikko Niskanen_1
Frequent Advisor

Dynamic Root Disk (DRD) and NO_HW devices

Aye, to whom it may concern:

After installing DRD, I ran into problems with removed volumes on an EVA4k SAN box, i.e. the ones that show as NO_HW in ioscan output. It seems that DRD does not like those on either 11.23 or 11.31.

Facts:
- two rx4640s: host A on HP-UX 11.23 with the Dec 2008 patch bundle, host B on 11.31 with the Sep 2009 patch bundle
- both hosts access shared EVA4k SAN disk through a standalone FC fabric (2 x 2/16 switches)
- both have 2 x internal 72GB disks, mirrored with MirrorDisk/UX
- latest SD-UX and DRD as of 1 Dec 2009 (today) on both hosts
- created 72GB vdisk 1 with LUN=4 on the EVA to act as clone destination: presented it to host A, ran ioscan and insf, then unpresented it, changed the LUN to 5, re-presented it, and re-ran ioscan and insf
- created 72GB vdisk 2 with LUN=4 and did the same for host B: present, ioscan, insf, unpresent, change LUN to 5, re-present, re-ioscan, re-insf

Goal:
- have a DRD image of the system disks of both host A and host B on the EVA4k SAN box to act as a last-resort backup

Situation:
- both hosts see the disk with a new LUN, but both also have a disk device with NO_HW status.
- what will DRD say?

Results:
- on host A (11.23 box), running "drd clone -p -vv -t /dev/dsk/" prints nothing at all and just seems to freeze, so you can't tell whether it is waiting for the NO_HW baddies to go away! /var/opt/drd/drd.log contains nothing except "ERROR: Exiting due to signal SIGINT." after breaking out with CTRL-C.
- on host B (11.31 box), DRD has the decency to crash, so that you at least know it failed royally:
root@policy /data1/sys/HP/HP-UX-11.31/Downloaded # drd clone -p -v -t /dev/disk/

======= 11/30/09 17:03:57 EET BEGIN Clone System Image Preview (user=root)
(jobid=)

* Reading Current System Information
* Selecting System Image To Clone
* Converting legacy DSF "/dev/dsk/c2t1d0" to "/dev/disk/disk15"
* Converting legacy DSF "/dev/dsk/c3t0d0" to "/dev/disk/disk16"
* Selecting Target Disk
(0) 0x60000000c9b82200 _ZN5OSAPI10stackTraceEv + 0x2e0 at lib/OSAPI_basic.cc:870 [/opt/drd/lib/libcommon.1]
(1) 0x60000000c9bec0e0 _ZN10SMDprivate17FatalEventHandler12handleSignalEi + 0x120 at logmgmt/FatalEventHandler.cc:59 [/opt/drd/lib/libcommon.1]
(2) 0x60000000c9bebd90 _ZN10SMDprivate13SignalHandler10dispatcherEi + 0x190 at logmgmt/SignalHandler.cc:55 [/opt/drd/lib/libcommon.1]
(3) 0xe00000012081ef80 ---- Signal 11 (SIGSEGV) delivered ----
(4) 0x60000000c96805d0 _ZN10SMDprivate13DiskInventory17areDSFsEquivalentERKSsS2_ + 0x7f0 at ../common/include/Debug.h:317 [/opt/drd/lib/libsyscore.1]
(5) 0x60000000c969ff90 _ZN10SMDprivate12SystemConfig18removeVdisksExceptEP11disk_configSs + 0x180 [/opt/drd/lib/libsyscore.1]
(6) 0x60000000c9693bd0 _ZN10SMDprivate12SystemConfig17copyRootDiskGroupERKSsS2_i + 0x14f0 [/opt/drd/lib/libsyscore.1]
(7) 0x60000000c968f7e0 _ZN10SMDprivate9VolumeMgr19createNewSaveConfigERKSsPNS_12SystemConfigERSs + 0x710 at system/VolumeMgr.cc:432 [/opt/drd/lib/libsyscore.1]
(8) 0x60000000c988d2d0 _ZN10SMDprivate17TargetSystemImage12updateConfigERSsS1_ + 0xb60 at ../common/include/Debug.h:317 [/opt/drd/lib/libsyscore.1]
(9) 0x60000000c98aa810 _ZN10SMDprivate14SystemImageMgr15initTargetImageEPNS_12SystemConfigESsSsPN3SMD15DRDRegistryDataE + 0x480 at ../common/include/Debug.h:317 [/opt/drd/lib/libsyscore.1]
(10) 0x60000000c98a7ff0 _ZN10SMDprivate14SystemImageMgr17chooseTargetDisksEPNS_12SystemConfigEPN3SMD15DRDRegistryDataE + 0x1040 at ../common/include/Debug.h:317 [/opt/drd/lib/libsyscore.1]
(11) 0x60000000c99301c0 _ZN10SMDprivate10SysCoreAPI17chooseTargetDisksEPN3SMD15DRDRegistryDataE + 0x260 at ../common/include/Debug.h:317 [/opt/drd/lib/libsyscore.1]
(12) 0x60000000c941df80 _ZN10SMDprivate19CloneDRDSysUserTask22doChooseTgtDisksActionEv + 0x220 at /build/sandboxes/090801_2100/src/drd/../common/include/Debug.h:317 [/opt/drd/lib/libdrd.1]
(13) 0x60000000c9413490 _ZN10SMDprivate19CloneDRDSysUserTask12stateMachineEv + 0x3a10 at usertasks/CloneDRDSysUserTask_stateMachine.cc:434 [/opt/drd/lib/libdrd.1]
(14) 0x60000000c9d836d0 _ZN10SMDprivate8UserTask7runTaskEv + 0x200 at usertasks/UserTask.cc:294 [/opt/drd/lib/libcommon.1]
(15) 0x60000000c9c32bc0 _ZN10SMDprivate4Task3runEv + 0x2d0 [/opt/drd/lib/libcommon.1]
(16) 0x60000000c9c2e040 _ZN3SMD13SMDJobManager3runESs + 0x4d0 at jobtask/SMDJobManager.cc:145 [/opt/drd/lib/libcommon.1]
(17) 0x60000000c9a49c40 _ZN3SMD3CLI3runEPSt4listIPNS_8TaskDataESaIS3_EE + 0xba0 [/opt/drd/lib/libcommon.1]
(18) 0x000000000400c270 _Z7smdmainiPPc + 0x860 at main.cc:252 [/opt/drd/bin/drd]
(19) 0x60000000c9ca71f0 _ZN3SMD7SMDinit7smdinitEPFiiPPcEiS2_ + 0x7f0 at main/SMDinit.cc:291 [/opt/drd/lib/libcommon.1]
(20) 0x000000000400b710 main + 0x30 [/opt/drd/bin/drd]
(21) 0x60000000c0030c90 main_opd_entry + 0x50 [/usr/lib/hpux32/dld.so]

Now, I know someone will want to know why on earth anyone would do something as stupid as having NO_HW devices. I'd say "it just happened, okay"...

You can kludge around this by creating vdisks with the LUN numbers of the NO_HW devices (LUN=4 here; I used a 1GB size) and presenting those to hosts A and B, thereby getting around the need to reboot the boxes in order to get rid of the NO_HW devices.
4 REPLIES
kevin_m
Valued Contributor
Solution

Re: Dynamic Root Disk (DRD) and NO_HW devices

If you want to remove entries for disks that are no longer present (and won't be again in the future), run 'rmsf -H ' for each NO_HW disk. It will remove the device special files as well, hence the warning about verifying that the disk is really gone.
- Kevin
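
For readers who want to apply the advice above to several devices at once, here is a minimal sketch, not a tested procedure. It assumes the usual column layout of "ioscan -f" output (hardware path in the third field, software state in the fifth); the function names list_no_hw_paths and print_rmsf_commands are made up for illustration. It only prints candidate rmsf commands and never runs them, so each path can be checked by hand first.

```shell
#!/bin/sh
# Sketch only: the field positions are an assumption based on the usual
# "ioscan -f" layout:  Class  I  H/W-Path  Driver  S/W-State  H/W-Type  Description

# Read ioscan -f style output on stdin and print the hardware path of
# every entry whose software state is NO_HW.
list_no_hw_paths() {
    awk '$5 == "NO_HW" { print $3 }'
}

# Dry run: print one "rmsf -H <hw_path>" line per NO_HW entry.
# Nothing is executed; review the list before running any of it.
print_rmsf_commands() {
    list_no_hw_paths | while read -r hw_path; do
        printf 'rmsf -H %s\n' "$hw_path"
    done
}

# Intended use on the HP-UX host (left as a comment, not executed here):
#   ioscan -fnC disk | print_rmsf_commands
```

Keeping this as a dry run is deliberate: rmsf removes the device special files too, so it is worth confirming that every listed path really belongs to a LUN that is gone for good.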
Judy Wathen
Advisor

Re: Dynamic Root Disk (DRD) and NO_HW devices

Hello Mikko,
Thank you for your interest in DRD. I'm sorry you are seeing this problem.
The stack trace defect occurs in the attempt to find the missing DSF (presumably for the unpresented LUN). We have fixed this bug in the next release (B.11.31.A.3.5), which will be delivered in the March 2010 media release. It will probably also be delivered on the DRD web site shortly before that time.
With the fix, drd completes successfully when a device file for something other than the clone target or mirror is missing.
The problem report tracking this problem is
QXCR1000950254: Disks missing device files trigger stack trace in drd clone
I have just made it customer visible, but it may take a little time for that change to take effect.

We first encountered the problem on an HPVM where virtual disks had been added but the device files had not yet been created. In that case, running insf bypassed the problem. In your case of real SAN LUNs, I think the bypass you suggest is the best approach until the new release of DRD is available.
We will re-run this test on 11.23 to see if we can reproduce the hang you experienced. It is possible that the hardware inventory was hung on the missing LUN.
Thanks,
Judy

Mikko Niskanen_1
Frequent Advisor

Re: Dynamic Root Disk (DRD) and NO_HW devices

Aye,

Kevin, I could swear that I tried to rmsf those files without success.

Unfortunately I couldn't re-check on 11.31 as I had already rebooted that box, but at least on 11.23 it _did_ remove the NO_HW disks. Now, the 1GB EVA vdisk "_drd_kludge" can go south...

Thanks also to Judy, good to hear it's already under investigation!

I think this pretty much sums up this thread.
Mikko Niskanen_1
Frequent Advisor

Re: Dynamic Root Disk (DRD) and NO_HW devices

Closing thread.