AdvFS Domain panicked / Failed Disk?
01-27-2009 03:05 AM
OS: V5.1B rev.2650 alpha
Hello all,
I have a problem with one of the old systems in house (I'm new to Tru64).
The system boots up properly. All the mount points are accessible and "df" output looks normal. But after a couple of hours I can no longer access any of the mount points under core_dmn, and when I try to 'cd' to one of them I get the following error message:
# ksh: /u01/cet: permission denied
When I reboot, I have access to core_dmn again, but it becomes inaccessible after an hour or so.
I suspect the cause is a failed disk, dsk10c, but I am not sure. How can I verify this?
Could it also be a controller problem? How can I properly identify the source of the problem?
PS: The attached logs cover the period since boot.
Thanks in advance...
CET
----------------------------------------------
Below is a short overview of the system messages...
Jan 26 16:36:56 ds20-1 vmunix: AdvFS Domain Panic; Domain core_dmn Id 0x452b5421.000d53e6
Jan 26 16:36:56 ds20-1 vmunix: An AdvFS domain panic has occurred due to either a metadata write error or an internal inconsistency. This domain is being rendered inaccessible.
Jan 26 16:36:56 ds20-1 vmunix: Domain#Fileset: core_dmn#cet
Jan 26 16:36:56 ds20-1 vmunix: Mounted on: /u01/cet
Jan 26 16:36:56 ds20-1 vmunix: Volume: /dev/disk/dsk10c
Jan 26 16:36:56 ds20-1 vmunix: I/O error appears to be due to a hardware problem.
Jan 26 16:36:56 ds20-1 vmunix: Check the binary error log for details.
-----------------------------------------------
# df -h
Filesystem          Size   Used  Available  Capacity  Mounted on
root_domain#root    977M   185M       786M       20%  /
/proc                  0      0          0      100%  /proc
usr_domain#usr       19G  3927M        15G       21%  /usr
var_domain#var       10G   869M      9866M        9%  /var
usr_domain#tmp       19G    25K        15G        1%  /cluster/members/member0/tmp
core_dmn#rsn        204G    30G        62G       33%  /u01/rsn
core_dmn#cet        204G    46G        62G       43%  /u01/cet
core_dmn#cettest    204G    34G        62G       36%  /u01/cettest
core_dmn#cettest2   204G    30G        62G       33%  /u01/cettest2
core_dmn#u02        204G  2220M        62G        4%  /u02
core_dmn#u03        204G    16K        62G        1%  /u03
core_dmn#u04        204G  2032K        62G        1%  /u04
core_dmn#u05        204G  9660K        62G        1%  /u05
eva#backup          343G    64G       279G       19%  /backup
eva#test            343G    16K       279G        1%  /u01/rsn1
-----------------------------------------------
# hwmgr view hierarchy
24: scsi_bus scsi2
60: disk bus-2-targ-0-lun-0 dsk0
94: disk bus-2-targ-1-lun-0 dsk9
61: disk bus-2-targ-2-lun-0 dsk1
62: disk bus-2-targ-3-lun-0 dsk2
97: disk bus-2-targ-4-lun-0 dsk12
64: disk bus-2-targ-5-lun-0 dsk4
95: disk bus-2-targ-6-lun-0 dsk10
-----------------------------------------------
# /etc/fdmns/core_dmn> ls
dsk10c dsk12c dsk1c dsk2c dsk4c dsk9c
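As a rough cross-check, the two listings above can be combined to see which SCSI bus each core_dmn volume sits on. The sketch below is Python and is not something run on the Tru64 host; it only parses the text pasted in this post. The helper name `volumes_by_bus` is made up for illustration, and it assumes each name under /etc/fdmns/core_dmn is a disk name followed by a single partition letter.

```python
import re

# Text copied from the `hwmgr view hierarchy` output pasted in this post.
HIERARCHY = """\
24: scsi_bus scsi2
60: disk bus-2-targ-0-lun-0 dsk0
94: disk bus-2-targ-1-lun-0 dsk9
61: disk bus-2-targ-2-lun-0 dsk1
62: disk bus-2-targ-3-lun-0 dsk2
97: disk bus-2-targ-4-lun-0 dsk12
64: disk bus-2-targ-5-lun-0 dsk4
95: disk bus-2-targ-6-lun-0 dsk10
"""

# Volume names copied from the `ls` of /etc/fdmns/core_dmn in this post.
DOMAIN_VOLUMES = "dsk10c dsk12c dsk1c dsk2c dsk4c dsk9c".split()


def volumes_by_bus(hierarchy, volumes):
    """Map each domain volume to the (bus, target) of its underlying disk."""
    location = {}
    for line in hierarchy.splitlines():
        m = re.search(r"disk\s+bus-(\d+)-targ-(\d+)-lun-\d+\s+(dsk\d+)", line)
        if m:
            location[m.group(3)] = (int(m.group(1)), int(m.group(2)))
    # Assumption: the last character of a volume name is the partition letter.
    return {vol: location.get(vol[:-1]) for vol in volumes}


placement = volumes_by_bus(HIERARCHY, DOMAIN_VOLUMES)
for vol, loc in placement.items():
    print(vol, loc)
```

With the pasted data, all six volumes resolve to bus 2, the same bus that logs the resets, so a fault anywhere on that bus (cabling, termination, the controller, or one misbehaving disk) could plausibly affect the whole domain and not only dsk10.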
-----------------------------------------------
### The last 50 messages I received are below:
# cat /var/adm/messages | tail -50
Jan 26 15:48:20 ds20-1 vmunix: scsi2: HTH intr. on bus 2, SBCL = 0x26
Jan 26 15:48:23 ds20-1 vmunix: scsi2: SCSI Bus was reset
Jan 26 15:49:31 ds20-1 vmunix: scsi2: HTH intr. on bus 2, SBCL = 0x26
Jan 26 15:49:34 ds20-1 vmunix: scsi2: SCSI Bus was reset
Jan 26 16:30:31 ds20-1 vmunix: scsi2: SCSI Bus was reset
Jan 26 16:31:48 ds20-1 vmunix: scsi2: HTH intr. on bus 2, SBCL = 0x2e
Jan 26 16:32:23 ds20-1 vmunix: scsi2: SCSI Bus was reset
Jan 26 16:33:39 ds20-1 vmunix: scsi2: SCSI Bus was reset
Jan 26 16:35:36 ds20-1 vmunix: scsi2: SCSI Bus was reset
Jan 26 16:36:49 ds20-1 vmunix: scsi2: SCSI Bus was reset
Jan 26 16:36:56 ds20-1 vmunix: AdvFS I/O error:
Jan 26 16:36:56 ds20-1 vmunix: Volume: /dev/disk/dsk10c
Jan 26 16:36:56 ds20-1 vmunix: Tag: 0xffffffd8.0000
Jan 26 16:36:56 ds20-1 vmunix: Page: 4142
Jan 26 16:36:56 ds20-1 vmunix: Block: 28520144
Jan 26 16:36:56 ds20-1 vmunix: Block count: 32
Jan 26 16:36:56 ds20-1 vmunix: Type of operation: Write
Jan 26 16:36:56 ds20-1 vmunix: Error: 5 (see /usr/include/errno.h)
Jan 26 16:36:56 ds20-1 vmunix: EEI: 0x6200 (Advfs cannot retry this)
Jan 26 16:36:56 ds20-1 vmunix: AdvFS initiated retries: 0
Jan 26 16:36:56 ds20-1 vmunix: Total AdvFS retries on this volume: 0
Jan 26 16:36:56 ds20-1 vmunix: I/O error appears to be due to a hardware problem.
Jan 26 16:36:56 ds20-1 vmunix: Check the binary error log for details.
Jan 26 16:36:56 ds20-1 vmunix:
Jan 26 16:36:56 ds20-1 vmunix: bs_osf_complete: metadata write failed
Jan 26 16:36:56 ds20-1 vmunix: AdvFS Domain Panic; Domain core_dmn Id 0x452b5421.000d53e6
Jan 26 16:36:56 ds20-1 vmunix: An AdvFS domain panic has occurred due to either a metadata write error or an internal inconsistency. This domain is being rendered inaccessible.
Jan 26 16:36:56 ds20-1 vmunix: Please refer to guidelines in AdvFS Guide to File System Administration regarding what steps to take to recover this domain.
Jan 26 16:36:56 ds20-1 vmunix: Domain panic appears to be due to a hardware problem
Jan 26 16:36:56 ds20-1 vmunix: Check the binary error log for more information.
Jan 26 16:36:56 ds20-1 vmunix: AdvFS I/O error:
Jan 26 16:36:56 ds20-1 vmunix: Domain#Fileset: core_dmn#cet
Jan 26 16:36:56 ds20-1 vmunix: Mounted on: /u01/cet
Jan 26 16:36:56 ds20-1 vmunix: Volume: /dev/disk/dsk10c
Jan 26 16:36:56 ds20-1 vmunix: Tag: 0x0000ea0f.8006
Jan 26 16:36:56 ds20-1 vmunix: Page: 3
Jan 26 16:36:56 ds20-1 vmunix: Block: 70050624
Jan 26 16:36:56 ds20-1 vmunix: Block count: 16
Jan 26 16:36:56 ds20-1 vmunix: Type of operation: Read
Jan 26 16:36:56 ds20-1 vmunix: Error: 5 (see /usr/include/errno.h)
Jan 26 16:36:56 ds20-1 vmunix: EEI: 0x6200 (Advfs cannot retry this)
Jan 26 16:36:56 ds20-1 vmunix: AdvFS initiated retries: 0
Jan 26 16:36:56 ds20-1 vmunix: Total AdvFS retries on this volume: 0
Jan 26 16:36:56 ds20-1 vmunix: I/O error appears to be due to a hardware problem.
Jan 26 16:36:56 ds20-1 vmunix: Check the binary error log for details.
Jan 26 16:36:56 ds20-1 vmunix: To obtain the name of the file on which
Jan 26 16:36:56 ds20-1 vmunix: the error occurred, type the command:
Jan 26 16:36:56 ds20-1 vmunix: /sbin/advfs/tag2name /u01/cet/.tags/59919
Jan 26 16:37:57 ds20-1 vmunix: scsi2: HTH intr. on bus 2, SBCL = 0x2e
Jan 26 16:37:59 ds20-1 vmunix: scsi2: SCSI Bus was reset
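One way to approach the "disk or controller?" question is to tally the log by bus and by volume. The following is a minimal Python sketch, assuming the message layout seen in the excerpt above (an "AdvFS I/O error:" line followed by "Volume:", "Type of operation:" and "Error:" lines). The function name `summarize` and the shortened `SAMPLE` text are illustrative; the sample lines are copied from the log above.

```python
import errno
import re
from collections import Counter

# A few lines copied from the log above; feed the full file in practice.
SAMPLE = """\
Jan 26 16:36:49 ds20-1 vmunix: scsi2: SCSI Bus was reset
Jan 26 16:36:56 ds20-1 vmunix: AdvFS I/O error:
Jan 26 16:36:56 ds20-1 vmunix: Volume: /dev/disk/dsk10c
Jan 26 16:36:56 ds20-1 vmunix: Type of operation: Write
Jan 26 16:36:56 ds20-1 vmunix: Error: 5 (see /usr/include/errno.h)
Jan 26 16:36:56 ds20-1 vmunix: AdvFS I/O error:
Jan 26 16:36:56 ds20-1 vmunix: Volume: /dev/disk/dsk10c
Jan 26 16:36:56 ds20-1 vmunix: Type of operation: Read
Jan 26 16:36:56 ds20-1 vmunix: Error: 5 (see /usr/include/errno.h)
Jan 26 16:37:59 ds20-1 vmunix: scsi2: SCSI Bus was reset
"""


def summarize(log_text):
    """Count bus resets per bus and AdvFS I/O errors per (volume, op, errno)."""
    resets = Counter()
    io_errors = Counter()
    volume = operation = None
    in_error = False  # True while inside an "AdvFS I/O error:" record
    for line in log_text.splitlines():
        msg = line.split("vmunix:", 1)[-1].strip()
        m = re.match(r"(scsi\d+): SCSI Bus was reset", msg)
        if m:
            resets[m.group(1)] += 1
        elif msg == "AdvFS I/O error:":
            in_error, volume, operation = True, None, None
        elif in_error and msg.startswith("Volume:"):
            volume = msg.split(":", 1)[1].strip()
        elif in_error and msg.startswith("Type of operation:"):
            operation = msg.split(":", 1)[1].strip()
        elif in_error and msg.startswith("Error:"):
            code = int(msg.split()[1])
            # Translate the numeric errno (5 in this log) to its symbolic name.
            name = errno.errorcode.get(code, str(code))
            io_errors[(volume, operation, name)] += 1
            in_error = False
    return resets, io_errors


resets, io_errors = summarize(SAMPLE)
print(dict(resets))
print(dict(io_errors))
```

Error 5 maps to EIO, a generic I/O error. If a run over the full /var/adm/messages shows every I/O error naming the same volume, that leans toward one disk; errors spread over several volumes on scsi2 would lean toward the bus or controller. Either way, the binary error log that the messages point to remains the more detailed source.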
01-27-2009 04:36 AM
Re: AdvFS Domain panicked / Failed Disk?
My suggestion: first track down and fix the hardware problem(s). Then run fixfdmn on core_dmn to ensure there is no metadata corruption within the domain. But don't run fixfdmn until the hardware problem is resolved.
Martin
01-27-2009 07:16 AM
Re: AdvFS Domain panicked / Failed Disk?
As far as I can see, this system doesn't have a RAID controller (hwmgr v h output is attached).
So that means I will lose any data on dsk10.
- Will I be able to reboot the system after replacing the disk?
- Is there anything else I need to do at the OS level after physically replacing the disk (e.g. hwmgr -scan scsi ...)?
- Is it possible to recover or rebuild the data on core_dmn (this may be a basic AdvFS question)?
Sorry for so many questions...
Regards,
CET
01-27-2009 11:47 AM
Re: AdvFS Domain panicked / Failed Disk?
Do you know what the underlying hardware configuration is?
It would be useful to see the output of
# hwmgr show scsi
and/or
# hwmgr show scsi -full
If it is a standalone disk, then you're probably going to have to start recovering from your backups...
It might also be useful to see the output of
# ls -lR /etc/fdmns
and
# showfdmn core_dmn
to see how the domain is laid out.
Cheers,
Rob
01-27-2009 07:41 PM
Solution
If you correct the hardware issue but fear that the disk could fail at any time, you can run 'fixfdmn' on the domain, then 'addvol' a new disk (of similar size) and 'rmvol' the suspect disk to remove it from the AdvFS domain (all data would be intact at the end of this operation).
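The order of the three steps matters, so here is a small Python sketch that only prints them as a dry-run plan; nothing is executed. The function name `replacement_plan` and the new volume name `/dev/disk/dsk13c` are hypothetical, and the bare `command special domain` form is an assumption about the basic invocation. Check the fixfdmn, addvol and rmvol reference pages for options and for which filesets must be mounted or unmounted at each step.

```python
def replacement_plan(domain, failing_volume, new_volume):
    """Return the three steps from the reply above, in order, as plain strings."""
    return [
        # 1. Check and repair domain metadata once the hardware is healthy.
        f"fixfdmn {domain}",
        # 2. Add the replacement volume so the domain has room to migrate data.
        f"addvol {new_volume} {domain}",
        # 3. Remove the suspect volume; its data moves to the remaining volumes.
        f"rmvol {failing_volume} {domain}",
    ]


# /dev/disk/dsk13c is a made-up name for the replacement disk.
plan = replacement_plan("core_dmn", "/dev/disk/dsk10c", "/dev/disk/dsk13c")
for step in plan:
    print(step)
```

Adding the new volume before removing the old one is the point of the ordering: rmvol needs somewhere in the domain to put the data it migrates off dsk10c.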
01-28-2009 11:59 PM
Re: AdvFS Domain panicked / Failed Disk?
I have more or less solved the problem:
- The salvage command recovered all the files except 1 out of 25,000.
Now I'll try to identify the missing file, because the log file doesn't give its name.
Then I'll remove the domain, create a new one, and move the recovered files back to their original locations.
CET
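On the missing file name: the log in the first post does not print a path for the failed read, but it does print a tag (0x0000ea0f.8006) and suggests running /sbin/advfs/tag2name /u01/cet/.tags/59919. The Python sketch below shows the arithmetic linking the two, assuming the part before the dot is the tag number in hexadecimal and the part after it is a sequence number. That reading is inferred from this log, not taken from AdvFS documentation, and the helper name `tags_path` is hypothetical. Whether this tag is the one file salvage could not recover is also an assumption.

```python
def tags_path(mount_point, tag):
    """Build the .tags path for an AdvFS tag printed as 0xTTTTTTTT.SSSS."""
    tag_hex = tag.split(".", 1)[0]   # keep the part before the dot
    tag_number = int(tag_hex, 16)    # hexadecimal tag -> decimal number
    return f"{mount_point}/.tags/{tag_number}"


# Mount point and tag copied from the read error in the first post.
path = tags_path("/u01/cet", "0x0000ea0f.8006")
print("/sbin/advfs/tag2name", path)
```

The result matches the path the kernel message itself suggested, so the same conversion could be applied to any other file tags that show up in the log.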