Operating System - Tru64 Unix
06-23-2009 10:49 AM
recovering from disk failure RAID 0 TruCluster LSM/AdvFS
We have an old two-node TruCluster (it used to be four nodes, but part of the hardware died two years ago). Each node has a local data store of nine disks in RAID 0: three disks in the ES45 and six in an external shelf. For reasons no one could ever explain to me, these were set up as 9-disk LSM volumes, each in its own disk group; each volume was then used to create an AdvFS file domain, and one AdvFS fileset was created in each of those domains. All was fine for several years.
While I was away, a disk failed on one of those LSM volumes, and without asking anyone, someone tried to recover using a procedure we had that worked "some of the time". Unfortunately, the person did not keep track of which commands they issued, and they also rebooted the cluster several times. At this point I can tell they did replace the failed disk and ran commands like the following, but I cannot sort out what they did before replacing the disk:
# hwmgr -v d
# hwmgr -show scsi
# volprint -ht -g hwste01_dg
# umount /hwste01_data1
(they claim the fileset was already unmounted)
# voldg -g hwste01_dg -k rmdisk dsk7
# voldisk rm disk7
# hwmgr -scan scsi
# hwmgr -view device
# dsfmgr -m dsk77 dsk7
That did not work, so they tried
# hwmgr -delete scsi -did 77
(that was the failed disk's HWID, but they said that command failed), so they ran
# hwmgr -delete component -id 77
# dsfmgr -m dsk77 dsk7
and this worked
# dsfmgr -vI
# disklabel -r dsk7
disk is unlabelled
# disklabel -wrn dsk7c
They tried to use voldg to add the new disk, but they aren't sure exactly what they ran.
They then read back over the procedure and realized they had not run
# rmfdmn hwste01_data1_domain
so they tried it, but never got a confirmation prompt, not even after several hours.
At this point it seems they also did some fiddling in /etc/fdmns, such as removing the lock file for the domain.
The storage was volatile, so all I am trying to do at this point is get back to a working 9-disk volume and, if necessary, remake the domain and fileset and restore the directory structure.
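To illustrate the end state I'm after, this is roughly how I would expect the rebuild to go once the replacement disk is usable. This is only a sketch, not something I have run on this cluster: the disk-media name hwste01_dg07 and the mount point are my assumptions, and since the volume was RAID 0 there is no data to resynchronize anyway.

```shell
# Write an LSM private region on the replacement disk
voldisksetup -i dsk7

# Bring it back into the disk group (media name hwste01_dg07 is a guess)
voldg -g hwste01_dg adddisk hwste01_dg07=dsk7

# Start the volume and run any recovery in the background
volrecover -g hwste01_dg -sb

# With the 9-disk volume whole again, remake the domain and fileset
mkfdmn /dev/vol/hwste01_dg/hwste01_vol01 hwste01_data1_domain
mkfset hwste01_data1_domain hwste01_data1
mount hwste01_data1_domain#hwste01_data1 /hwste01_data1

# then restore the directory structure from backup (vrestore)
```

I suspect mkfdmn will complain while the old /etc/fdmns/hwste01_data1_domain entry is still present, so that may need to be cleaned up first.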
The expected directory /etc/fdmns/hwste01_data1_domain exists and contains a symbolic link to /dev/vol/hwste01_dg/hwste01_vol01, and "volprint -g hwste01_dg_vh" looks good to me: it shows the plex and volume as ACTIVE. Unfortunately, the command
# showfdmn hwste01_data1_domain
neither gives an error nor returns.
The command
# /sbin/advfs/advscan -g hwste01_dg
lists information and says the domain was created Jun 10 00:39:44 2003 (which is the original setup date), but it reports a Lastmount of Jun 22 13:09:34 2009.
Suggestions on how to proceed?