Operating System - HP-UX
06-18-2010 01:54 AM
Failing disk in vg00, but stale ext in its mirror
Hi,
I have a failed disk in vg00 (mirrored):
(dmesg excerpt)
LVM: Failed to automatically resync PV 1f004000 error: 5
SCSI: First party detected bus hang -- lbolt: 75942186, bus: 0
lbp->state: 5020
lbp->offset: f0
lbp->uPhysScript: 280000
From most recent interrupt:
ISTAT: 21, SIST0: 00, SIST1: 00, DSTAT: 84, DSPS: 00000010
lsp: 0000000000000000
lbp->owner: 0000000043522d00
bp->b_dev: 1f004000
scb->io_id: 22623e
scb->cdb: 28 00 00 c9 82 c0 00 02 00 00
lbolt_at_timeout: 75941886, lbolt_at_start: 75941886
lsp->state: 10d
scratch_lsp: 0000000043522d00
Pre-DSP script dump [0000000044012030]:
78347400 0000000a 78350800 00000000
0e000004 00280540 80000000 00000000
Script dump [0000000044012050]:
870b0000 002802d8 98080000 00000005
721a0000 00000000 98080000 00000001
SCSI: Resetting SCSI -- lbolt: 75942286, bus: 0
SCSI: Reset detected -- lbolt: 75942286, bus: 0
From event.log:
Summary:
Disk at hardware path 10/0.4.0 : Media failure
...and a dd from the disk hangs (it can neither finish nor be killed):
HP:/#ps -ef|grep dd
7:19 dd if=/dev/rdsk/c0t4d0 of=/dev/null bs=1024
diskinfo does work on the failing disk, though:
HP:/#diskinfo -v /dev/rdsk/c0t4d0
SCSI describe of /dev/rdsk/c0t4d0:
vendor: SEAGATE
product id: ST19171W
type: direct access
size: 8886762 Kbytes
bytes per sector: 512
rev level: HP06
blocks per disk: 17773524
ISO version: 0
ECMA version: 0
ANSI version: 2
removable media: no
response format: 2
(Additional inquiry bytes: (32)41 etc
However, the stale extents reside on the other disk of the mirror:
HP:/root#lvdisplay -v /dev/vg00/lvol8
--- Logical volumes ---
LV Name /dev/vg00/lvol8
VG Name /dev/vg00
LV Permission read/write
LV Status available/stale
Mirror copies 1
Consistency Recovery MWC
Schedule parallel
LV Size (Mbytes) 1376
Current LE 344
Allocated PE 688
Stripes 0
Stripe Size (Kbytes) 0
Bad block on
Allocation strict
IO Timeout (Seconds) default
--- Distribution of logical volume ---
PV Name LE on PV PE on PV
/dev/dsk/c0t5d0 344 344
/dev/dsk/c0t4d0 344 344
--- Logical extents ---
LE PV1 PE1 Status 1 PV2 PE2 Status 2
00000 /dev/dsk/c0t5d0 01824 current /dev/dsk/c0t4d0 01511 current
00001 /dev/dsk/c0t5d0 01825 current /dev/dsk/c0t4d0 01512 current
00002 /dev/dsk/c0t5d0 01826 current /dev/dsk/c0t4d0 01513 current
00003 /dev/dsk/c0t5d0 01827 current /dev/dsk/c0t4d0 01514 current
00004 /dev/dsk/c0t5d0 01828 current /dev/dsk/c0t4d0 01515 current
00005 /dev/dsk/c0t5d0 01829 current /dev/dsk/c0t4d0 01516 current
......
00100 /dev/dsk/c0t5d0 01924 stale /dev/dsk/c0t4d0 01611 current
00101 /dev/dsk/c0t5d0 01925 stale /dev/dsk/c0t4d0 01612 current
00102 /dev/dsk/c0t5d0 01926 stale /dev/dsk/c0t4d0 01613 current
00103 /dev/dsk/c0t5d0 01927 stale /dev/dsk/c0t4d0 01614 current
00104 /dev/dsk/c0t5d0 01928 stale /dev/dsk/c0t4d0 01615 current
00105 /dev/dsk/c0t5d0 01929 stale /dev/dsk/c0t4d0 01616 current
.....etc
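As a side note, the per-PV stale counts in a long lvdisplay -v listing like the one above can be tallied with a short awk pass over the extent lines. This is just a sketch: the field positions assume the seven-column extent format shown above, and the sample here is a small hypothetical excerpt standing in for the full listing.

```shell
# Tally stale extents per mirror copy from lvdisplay -v extent lines.
# Assumed field layout: LE  PV1  PE1  Status1  PV2  PE2  Status2
sample='00000 /dev/dsk/c0t5d0 01824 current /dev/dsk/c0t4d0 01511 current
00100 /dev/dsk/c0t5d0 01924 stale /dev/dsk/c0t4d0 01611 current
00101 /dev/dsk/c0t5d0 01925 stale /dev/dsk/c0t4d0 01612 current'

printf '%s\n' "$sample" | awk '
  NF == 7 && $4 == "stale" { s1++ }   # stale copies on PV1 (c0t5d0)
  NF == 7 && $7 == "stale" { s2++ }   # stale copies on PV2 (c0t4d0)
  END { printf "PV1 stale: %d, PV2 stale: %d\n", s1 + 0, s2 + 0 }'
# prints: PV1 stale: 2, PV2 stale: 0
```

In real use you would pipe `lvdisplay -v /dev/vg00/lvol8` into the same awk; the `NF == 7` guard skips the header and summary lines.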
If I reduce the failing disk out of the mirror, the stale extents remain on the "good" disk; if I reduce the "good" disk instead, I am left with the "bad" disk. How can I solve this problem?
(It is a test system and I have an Ignite tape, so it is not critical)
Regards,
"When you look into an abyss, the abyss also looks into you"
3 REPLIES
06-18-2010 03:02 AM
Re: Failing disk in vg00, but stale ext in its mirror
Your problem with the stale extents is irritating, but I've seen and solved this before.
A bigger issue is: is c0t5d0 bootable?
And are your first lvols (stand, root and swap) at the same physical extent numbers on both disks? The output you show has a difference in PE numbers!
I would suggest: reboot the server from the alternate disk (c0t5d0), but only after making sure your backup is up to date. Then remove c0t4d0 from vg00 and replace it with a working new disk.
And then mirror in numerical order, not alphabetical order (lvol2 needs to be mirrored before lvol11!)
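In outline, the replacement suggested above is the usual vg00 mirror-disk swap with MirrorDisk/UX. The following is a hedged sketch, not taken from this thread: device names follow the thread, but the exact lvol list (lvol1 through lvol8) and the no-quorum boot string are assumptions to adapt to the actual system, and on PA-RISC the mkboot/lvlnboot steps are needed for the new disk to be bootable.

```shell
# Hedged sketch: replace the failed mirror half (c0t4d0) after booting
# from the surviving disk. Adapt the lvol list to your system.

# 1. Break the mirrors down to the surviving disk, then drop the failed PV.
for lv in /dev/vg00/lvol1 /dev/vg00/lvol2 /dev/vg00/lvol3 \
          /dev/vg00/lvol4 /dev/vg00/lvol5 /dev/vg00/lvol6 \
          /dev/vg00/lvol7 /dev/vg00/lvol8
do
    lvreduce -m 0 "$lv" /dev/dsk/c0t4d0
done
vgreduce /dev/vg00 /dev/dsk/c0t4d0

# 2. Prepare the replacement disk as a boot PV and re-add it to vg00.
pvcreate -fB /dev/rdsk/c0t4d0
vgextend /dev/vg00 /dev/dsk/c0t4d0
mkboot /dev/rdsk/c0t4d0
mkboot -a "hpux -lq" /dev/rdsk/c0t4d0   # allow booting without quorum

# 3. Re-mirror in numerical order so stand, root and swap land first.
for lv in /dev/vg00/lvol1 /dev/vg00/lvol2 /dev/vg00/lvol3 \
          /dev/vg00/lvol4 /dev/vg00/lvol5 /dev/vg00/lvol6 \
          /dev/vg00/lvol7 /dev/vg00/lvol8
do
    lvextend -m 1 "$lv" /dev/dsk/c0t4d0
done
lvlnboot -R /dev/vg00                    # refresh boot/root/swap/dump data
```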
Every problem has at least one solution. Only some solutions are harder to find.
06-18-2010 04:40 AM
Re: Failing disk in vg00, but stale ext in its mirror
The difference in PE numbers is because I booted from the alternate disk after splitting the lvols with lvsplit, and then ran lvmerge using the "b" lvols as the source. I split lvol3 last, hence the difference.
I have shut down the system and removed the c0t4d0 disk, and now it drops me to the bcheckrc prompt, because lvol8 (/var) has I/O errors in its metadata and "fsck -o full" does not work.
It seems that both copies of the "stale" extents were actually stale, and now it is not possible to fix the file system. Am I right?
Before rebooting the system, I tried lvreduce on the mirror and lvsplit, both unsuccessfully. Was there a way to solve the problem before rebooting?
I think now the only way is to restore from Ignite, or to newfs /var and find a way to restore its contents from the Ignite tape.
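If only /var is damaged, one hedged possibility along the lines above (assuming lvol8 is /var on VxFS, and a default make_tape_recovery tape whose tar-format archive sits after the boot LIF as the second tape file; verify both assumptions before trusting this) would be:

```shell
# Hedged sketch: rebuild /var and restore it from an Ignite recovery tape.
# ASSUMPTIONS: lvol8 is /var (VxFS); the archive is tar format at tape
# file #1 on /dev/rmt/0m. Verify before running -- newfs is destructive.
newfs -F vxfs /dev/vg00/rlvol8     # wipes whatever is left on lvol8
mount /dev/vg00/lvol8 /var
mt -t /devv/rmt/0mn fsf 1          # no-rewind device: skip the boot LIF
cd / && tar -xvf /dev/rmt/0m ./var # extract only the /var tree
```

(Device file names and the archive position on the tape are assumptions, not something stated in this thread.)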
Regards,
06-18-2010 07:16 AM
Re: Failing disk in vg00, but stale ext in its mirror
Hi,
In my opinion the fault was actually on c0t5d0, and c0t4d0 was in good condition.
c0t4d0 couldn't sync up with c0t5d0, hence the errors you noticed in syslog.
In my view, the proper procedure would have been:
Reboot the box and try booting from the primary boot disk (without quorum); if that does not work, boot from the alternate disk, again without quorum. This way you could have found out which disk was the culprit. Reducing the LV was not an option here.
You could have tried a dd on c0t5d0 as well. You can also check for read and write errors on these disks from cstm.
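The read test and cstm check suggested here might look like the following hedged sketch; the cstm one-liner uses the tool's selclass/infolog interface, and the exact qualifier string should be confirmed with cstm's own help on your system.

```shell
# Non-destructive read test of the other mirror half (read-only dd).
dd if=/dev/rdsk/c0t5d0 of=/dev/null bs=1024k count=1024

# Pull the disk's internal error log via the Support Tool Manager.
echo "selclass qualifier c0t5d0;info;wait;infolog" | cstm
```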
Best Regards,
Prashanth
The opinions expressed above are the personal opinions of the authors, not of Hewlett Packard Enterprise. By using this site, you accept the Terms of Use and Rules of Participation.
© Copyright 2024 Hewlett Packard Enterprise Development LP