09-09-2009 09:48 AM
PVLinks not found and then recovered
We run both HP-UX and Linux (RHEL 5.x). A disk died on the SAN and was rebuilt onto the hot spare. HP-UX reported the PVLink failures and then recovered about 3 minutes later; everything was fine with the filesystems, LVs, VGs, etc. The Linux systems died: they reported the errors, never recovered, and had to be rebooted.
So, any ideas where I can look, and what I can change on the Linux side, so that it won't die if this happens again? There must be a reason it didn't recover while the HP-UX systems did. Maybe a timeout or buffer in dm-multipath... or somewhere else?
Thanks,
syslog - HP-UX:
Sep 3 07:13:59 server vmunix: LVM: VG 64 0x500000: PVLink 31 0x060200 Failed! The PV is not accessible.
Sep 3 07:13:59 server vmunix: LVM: VG 64 0x500000: PVLink 31 0x140200 Failed! The PV is not accessible.
Sep 3 07:14:00 server vmunix: LVM: VG 64 0x100000: PVLink 31 0x0e0100 Failed! The PV is not accessible.
Sep 3 07:14:00 server vmunix: LVM: VG 64 0x100000: PVLink 31 0x0c0100 Failed! The PV is not accessible.
Sep 3 07:14:02 server vmunix: LVM: VG 64 0x600000: PVLink 31 0x100300 Failed! The PV is not accessible.
Sep 3 07:14:02 server vmunix: LVM: VG 64 0x600000: PVLink 31 0x0a0300 Failed! The PV is not accessible.
Sep 3 07:15:34 server vmunix: LVM: VG 64 0x500000: PVLink 31 0x060200 Recovered.
Sep 3 07:15:34 server vmunix: LVM: VG 64 0x500000: PVLink 31 0x140200 Recovered.
Sep 3 07:15:34 server vmunix: LVM: VG 64 0x100000: PVLink 31 0x0e0100 Recovered.
Sep 3 07:15:34 server vmunix: LVM: VG 64 0x100000: PVLink 31 0x0c0100 Recovered.
Sep 3 07:15:37 server vmunix: LVM: VG 64 0x600000: PVLink 31 0x100300 Recovered.
Sep 3 07:15:37 server vmunix: LVM: VG 64 0x600000: PVLink 31 0x0a0300 Recovered.
Messages file - linux:
Sep 3 07:14:47 linux kernel: SCSI error : <1 0 3 1> return code = 0x20000
Sep 3 07:14:47 linux kernel: end_request: I/O error, dev sdac, sector 173360919
Sep 3 07:14:47 linux kernel: device-mapper: dm-multipath: Failing path 65:192.
Sep 3 07:14:47 linux kernel: SCSI error : <1 0 3 1> return code = 0x20000
Sep 3 07:14:47 linux kernel: end_request: I/O error, dev sdac, sector 173360959
Sep 3 07:14:47 linux kernel: end_request: I/O error, dev sdac, sector 173360967
Sep 3 07:14:47 linux kernel: SCSI error : <1 0 3 1> return code = 0x20000
Sep 3 07:14:47 linux kernel: end_request: I/O error, dev sdac, sector 173360999
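(For reference, in the Linux SCSI midlayer the host byte sits in bits 16-23 of the return code, so this value decodes as below; a sketch based on the stock kernel definitions.)

# 0x20000 >> 16 = 0x02 = DID_BUS_BUSY: the HBA reported the bus busy and
# the midlayer failed the request upward, which is what makes
# dm-multipath fail the path (65:192).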
Linux dm-multipath config:
defaults {
        user_friendly_names     yes
        multipath_tool          "/sbin/multipath -v0"
        udev_dir                /dev
        polling_interval        10
        selector                "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout            /bin/true
        default_features        "0"
        path_checker            readsector0
        rr_min_io               100
        rr_weight               priorities
        failback                immediate
        no_path_retry           fail
}
devnode_blacklist {
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "^hd[a-z]"
        devnode "^cciss!c[0-9]d[0-9]*"
}
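For what it's worth, the "no_path_retry fail" line above tells dm-multipath to fail I/O immediately once every path to a LUN is down, whereas HP-UX LVM (as the later syslog shows) queues the I/O until the PV returns. A minimal sketch of the change that would let Linux ride out a short outage like this one; the retry count here is an assumption you should size to your workload (retries happen once per polling_interval, so 18 x 10 s covers roughly 3 minutes):

defaults {
        polling_interval  10
        # Queue and retry for up to 18 polling intervals (~180 s) before
        # failing I/O. "queue" would queue forever like HP-UX LVM, at the
        # risk of processes hanging indefinitely if the paths never return.
        no_path_retry     18
}

After editing, reload the daemon (e.g. "service multipathd reload" on RHEL 5) so the new setting takes effect.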
09-09-2009 02:40 PM
Re: PVLinks not found and then recovered
That may be the reason the HP-UX LVM recovered after the crash.
regards,
ivan
09-09-2009 03:16 PM
Re: PVLinks not found and then recovered
What kind of array was this disk a member of? If a disk dies, the hot spare kicks in, and the array rebuilds it, there should be no I/O interruption and no failures on the server side.
> PVLink 31 0x060200 Failed! The PV is not accessible. ...
> PVLink 31 0x060200 Recovered.
Unless you left out more pv-link errors from the syslog, this is NOT a multipath error. This is simply a path that failed and then came back.
As for why the HP-UX server survived and the Linux systems did not, it may have to do with the I/O demand on each server.
When a disk like this fails and there is NO I/O from the server, there will be NO hung processes; the server can carry on when the disk comes back.
If many processes hung because they were doing I/O when the disk failed, then most likely those processes will never come back when the disk does, and the server needs rebooting.
As I said earlier, you need to dig deeper into what happened when the disk failed, which LUNs or arrays were affected, and how they impacted the servers.
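For reference, a quick way to compare path state on the two platforms (a sketch; the VG name is a placeholder):

# Linux: show each multipath map and the live state of its paths
multipath -ll

# HP-UX: list the PVs in a volume group, including alternate PV links
vgdisplay -v /dev/vg01

# HP-UX: rescan and list disk hardware paths
ioscan -fnC disk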
09-09-2009 05:40 PM
Re: PVLinks not found and then recovered
Please paste or attach the output of this report:

cstm <<-EOF
runutil logtool
rs
EOF
Thanks
09-10-2009 04:05 AM
Re: PVLinks not found and then recovered
Sep 3 07:13:29 server vmunix: LVM: WARNING: VG 64 0x500000: LV 1: Some I/O requests to this LV are waiting
Sep 3 07:13:29 server vmunix: indefinitely for an unavailable PV. These requests will be queued until
Sep 3 07:13:29 server vmunix: the PV becomes available (or a timeout is specified for the LV).
Sep 3 07:13:30 server vmunix: LVM: WARNING: VG 64 0x100000: LV 3: Some I/O requests to this LV are waiting
Sep 3 07:13:30 server vmunix: LVM: WARNING: VG 64 0x100000: LV 1: Some I/O requests to this LV are waiting
Sep 3 07:13:31 server vmunix: LVM: WARNING: VG 64 0x500000: LV 2: Some I/O requests to this LV are waiting
Sep 3 07:13:32 server vmunix: LVM: WARNING: VG 64 0x600000: LV 1: Some I/O requests to this LV are waiting
Sep 3 07:13:37 server vmunix: LVM: WARNING: VG 64 0x500000: LV 3: Some I/O requests to this LV are waiting
Sep 3 07:13:37 server vmunix:
Sep 3 07:13:37 server vmunix: SCSI: Read error -- dev: b 31 0x060200, errno: 126, resid: 1024,
Sep 3 07:13:37 server vmunix: indefinitely for an unavailable PV. These requests will be queued until
Sep 3 07:13:37 server above message repeats 5 times
Sep 3 07:13:37 server vmunix: blkno: 8, sectno: 16, offset: 8192, bcount: 1024.
Sep 3 07:13:37 server vmunix: the PV becomes available (or a timeout is specified for the LV).
Sep 3 07:13:37 server above message repeats 5 times
Sep 3 07:13:37 server vmunix: LVM: WARNING: VG 64 0x600000: LV 2: Some I/O requests to this LV are waiting
Sep 3 07:13:37 server vmunix: indefinitely for an unavailable PV. These requests will be queued until
Sep 3 07:13:37 server vmunix: the PV becomes available (or a timeout is specified for the LV).
Sep 3 07:13:37 server vmunix: LVM: WARNING: VG 64 0x600000: LV 3: Some I/O requests to this LV are waiting
Sep 3 07:13:37 server vmunix:
Sep 3 07:13:37 server vmunix: SCSI: Read error -- dev: b 31 0x100300, errno: 126, resid: 1024,
Sep 3 07:13:37 server vmunix: indefinitely for an unavailable PV. These requests will be queued until
Sep 3 07:13:37 server vmunix: blkno: 8, sectno: 16, offset: 8192, bcount: 1024.
Sep 3 07:13:37 server vmunix: the PV becomes available (or a timeout is specified for the LV).
Sep 3 07:13:41 server vmunix: LVM: WARNING: VG 64 0x100000: LV 2: Some I/O requests to this LV are waiting
Sep 3 07:13:41 server vmunix: indefinitely for an unavailable PV. These requests will be queued until
Sep 3 07:13:41 server vmunix: the PV becomes available (or a timeout is specified for the LV).
Sep 3 07:13:51 server vmunix:
Sep 3 07:13:51 server vmunix: SCSI: Write error -- dev: b 31 0x0e0100, errno: 126, resid: 8192,
Sep 3 07:13:51 server vmunix: blkno: 2103624, sectno: 4207248, offset: 2154110976, bcount: 8192.
Sep 3 07:13:51 server vmunix: SCSI: Read error -- dev: b 31 0x0e0100, errno: 126, resid: 8192,
Sep 3 07:13:51 server vmunix: blkno: 27807704, sectno: 55615408, offset: 28475088896, bcount: 8192.
Sep 3 07:13:51 server vmunix: SCSI: Async write error -- dev: b 31 0x0e0100, errno: 126, resid: 1024,
Sep 3 07:13:51 server vmunix: blkno: 1183517, sectno: 2367034, offset: 1211921408, bcount: 1024.
Sep 3 07:13:51 server vmunix: SCSI: Write error -- dev: b 31 0x0e0100, errno: 126, resid: 1024,
Sep 3 07:13:51 server vmunix: blkno: 32286088, sectno: 64572176, offset: 33060954112, bcount: 1024.
Sep 3 07:13:51 server vmunix: LVM: WARNING: VG 64 0x100000: LV 4: Some I/O requests to this LV are waiting
Sep 3 07:13:51 server vmunix:
Sep 3 07:13:51 server above message repeats 3 times
Sep 3 07:13:51 server vmunix: indefinitely for an unavailable PV. These requests will be queued until
Sep 3 07:13:51 server vmunix: the PV becomes available (or a timeout is specified for the LV).
Sep 3 07:13:51 server vmunix:
Sep 3 07:13:51 server vmunix: SCSI: Async write error -- dev: b 31 0x0e0100, errno: 126, resid: 8192,
Sep 3 07:13:51 server vmunix: blkno: 5521728, sectno: 11043456, offset: 5654249472, bcount: 8192.
Sep 3 07:13:51 server vmunix: blkno: 5233480, sectno: 10466960, offset: 5359083520, bcount: 8192.
Sep 3 07:13:51 server vmunix: blkno: 2773608, sectno: 5547216, offset: 2840174592, bcount: 8192.
Sep 3 07:13:51 server vmunix: blkno: 2327616, sectno: 4655232, offset: 2383478784, bcount: 8192.
Sep 3 07:13:51 server vmunix: blkno: 2327656, sectno: 4655312, offset: 2383519744, bcount: 8192.
Sep 3 07:13:51 server vmunix: blkno: 2170960, sectno: 4341920, offset: 2223063040, bcount: 8192.
Sep 3 07:13:51 server vmunix: blkno: 2170968, sectno: 4341936, offset: 2223071232, bcount: 8192.
Sep 3 07:13:51 server vmunix: blkno: 1147664, sectno: 2295328, offset: 1175207936, bcount: 8192.
Sep 3 07:13:51 server vmunix: blkno: 1147696, sectno: 2295392, offset: 1175240704, bcount: 8192.
Sep 3 07:13:51 server vmunix: blkno: 676432, sectno: 1352864, offset: 692666368, bcount: 8192.
Sep 3 07:13:51 server vmunix: blkno: 677440, sectno: 1354880, offset: 693698560, bcount: 8192.
Sep 3 07:13:51 server vmunix: blkno: 228592, sectno: 457184, offset: 234078208, bcount: 8192.
Sep 3 07:13:51 server vmunix: blkno: 250832, sectno: 501664, offset: 256851968, bcount: 8192.
Sep 3 07:13:51 server vmunix: blkno: 249256, sectno: 498512, offset: 255238144, bcount: 8192.
Sep 3 07:13:51 server vmunix: blkno: 249240, sectno: 498480, offset: 255221760, bcount: 8192.
Sep 3 07:13:51 server vmunix: blkno: 171600, sectno: 343200, offset: 175718400, bcount: 8192.
Sep 3 07:13:51 server vmunix: blkno: 171632, sectno: 343264, offset: 175751168, bcount: 8192.
Sep 3 07:13:51 server vmunix: SCSI: Read error -- dev: b 31 0x0e0100, errno: 126, resid: 1024,
Sep 3 07:13:51 server vmunix: SCSI: Async write error -- dev: b 31 0x0e0100, errno: 126, resid: 8192,
Sep 3 07:13:51 server above message repeats 16 times
Sep 3 07:13:51 server vmunix: blkno: 8, sectno: 16, offset: 8192, bcount: 1024.
Sep 3 07:13:51 server vmunix: SCSI: Async write error -- dev: b 31 0x0e0100, errno: 126, resid: 8192,
Sep 3 07:13:51 server vmunix: blkno: 53792, sectno: 107584, offset: 55083008, bcount: 8192.
Sep 3 07:13:51 server vmunix: blkno: 18088, sectno: 36176, offset: 18522112, bcount: 8192.
Sep 3 07:13:51 server vmunix: SCSI: Write error -- dev: b 31 0x0e0100, errno: 126, resid: 1024,
Sep 3 07:13:51 server vmunix: SCSI: Async write error -- dev: b 31 0x0e0100, errno: 126, resid: 8192,
Sep 3 07:13:51 server vmunix: blkno: 1416, sectno: 2832, offset: 1449984, bcount: 1024.
Sep 3 07:13:51 server vmunix: SCSI: Async write error -- dev: b 31 0x0e0100, errno: 126, resid: 8192,
Sep 3 07:13:51 server vmunix: blkno: 27690664, sectno: 55381328, offset: 28355239936, bcount: 8192.
Sep 3 07:13:59 server vmunix: LVM: VG 64 0x500000: Lost quorum.
Sep 3 07:13:51 server vmunix:
Sep 3 07:13:59 server above message repeats 21 times
Sep 3 07:13:59 server vmunix: This may block configuration changes and I/Os. In order to reestablish quorum at least 1 of the following PVs (represented by current link) must become available:
Sep 3 07:13:59 server vmunix: <31 0x060200>
Sep 3 07:13:59 server vmunix: LVM: VG 64 0x500000: PVLink 31 0x060200 Failed! The PV is not accessible.
Sep 3 07:13:59 server vmunix: LVM: VG 64 0x500000: PVLink 31 0x140200 Failed! The PV is not accessible.
Sep 3 07:14:00 server vmunix: LVM: VG 64 0x100000: Lost quorum.
Sep 3 07:14:00 server vmunix: This may block configuration changes and I/Os. In order to reestablish quorum at least 1 of the following PVs (represented by current link) must become available:
Sep 3 07:14:00 server vmunix: <31 0x0e0100>
Sep 3 07:14:00 server vmunix: LVM: VG 64 0x100000: PVLink 31 0x0e0100 Failed! The PV is not accessible.
Sep 3 07:14:00 server vmunix: LVM: VG 64 0x100000: PVLink 31 0x0c0100 Failed! The PV is not accessible.
Sep 3 07:14:02 server vmunix: LVM: VG 64 0x600000: Lost quorum.
Sep 3 07:14:02 server vmunix: This may block configuration changes and I/Os. In order to reestablish quorum at least 1 of the following PVs (represented by current link) must become available:
Sep 3 07:14:02 server vmunix: <31 0x100300>
Sep 3 07:14:02 server vmunix: LVM: VG 64 0x600000: PVLink 31 0x100300 Failed! The PV is not accessible.
Sep 3 07:14:02 server vmunix: LVM: VG 64 0x600000: PVLink 31 0x0a0300 Failed! The PV is not accessible.
Sep 3 07:14:06 server inetd[19871]: registrar/tcp: Connection from localhost (127.0.0.1) at Thu Sep 3 07:14:06 2009
Sep 3 07:15:28 server inetd[19938]: registrar/tcp: Connection from server.tc.gc.ca (10.10.10.20) at Thu Sep 3 07:15:28 2009
Sep 3 07:15:34 server vmunix: LVM: VG 64 0x500000: Reestablished quorum.
Sep 3 07:15:34 server vmunix: LVM: VG 64 0x500000: PVLink 31 0x060200 Recovered.
Sep 3 07:15:34 server vmunix: LVM: VG 64 0x500000: PVLink 31 0x140200 Recovered.
Sep 3 07:15:34 server vmunix: LVM: VG 64 0x100000: Reestablished quorum.
Sep 3 07:15:34 server vmunix: LVM: VG 64 0x100000: PVLink 31 0x0e0100 Recovered.
Sep 3 07:15:34 server vmunix: LVM: VG 64 0x100000: PVLink 31 0x0c0100 Recovered.
Sep 3 07:15:37 server vmunix: LVM: NOTICE: VG 64 0x500000: LV 1: All I/O requests to this LV that were
Sep 3 07:15:37 server vmunix: waiting indefinitely for an unavailable PV have now completed.
Sep 3 07:15:37 server vmunix: LVM: NOTICE: VG 64 0x500000: LV 2: All I/O requests to this LV that were
Sep 3 07:15:37 server vmunix: LVM: NOTICE: VG 64 0x500000: LV 3: All I/O requests to this LV that were
Sep 3 07:15:37 server vmunix: LVM: NOTICE: VG 64 0x100000: LV 2: All I/O requests to this LV that were
Sep 3 07:15:37 server vmunix: LVM: NOTICE: VG 64 0x100000: LV 4: All I/O requests to this LV that were
Sep 3 07:15:37 server vmunix: LVM: NOTICE: VG 64 0x100000: LV 1: All I/O requests to this LV that were
Sep 3 07:15:37 server vmunix: LVM: NOTICE: VG 64 0x100000: LV 3: All I/O requests to this LV that were
Sep 3 07:15:37 server vmunix: LVM: VG 64 0x600000: Reestablished quorum.
Sep 3 07:15:37 server vmunix: LVM: VG 64 0x600000: PVLink 31 0x100300 Recovered.
Sep 3 07:15:37 server vmunix: LVM: VG 64 0x600000: PVLink 31 0x0a0300 Recovered.
09-10-2009 07:22 AM
Re: PVLinks not found and then recovered
Does this tell you something about HP-UX?
There are more failures here than just a single disk in the array. Either whole LUNs (whole array groups) failed in the disk array, or BOTH of your fibre switches or the entire disk array rebooted. I see multiple LVM volume groups missing disks.
Also, there is NO evidence of PV-links or any other form of path failover here. Either you don't have path redundancy, or both SAN switches rebooted at the same time, or the array went down. The amount of downtime, about 2 minutes (at least that's what you show), suggests there may have been a SAN or array controller reboot.
You need to check your volume groups (VGs) and identify all the LUNs in them, along with their paths and path redundancy.
Also check the event logs of your SAN switches and disk array(s).
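A sketch of where to look on the fabric side, assuming Brocade switches (other vendors' CLIs differ):

# On each SAN switch: dump the event log and the per-port error counters
errdump
porterrshow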
09-10-2009 07:26 AM
Re: PVLinks not found and then recovered
The rebuild onto the hot spare should have been completely transparent to our systems, but it wasn't, and it caused downtime on a few of them. So in the event this happens again, we would like a possible fix for the Linux systems, i.e. a larger buffer, longer timeouts, etc.
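On the HP-UX side the equivalent knob is the LV I/O timeout that the syslog hint "(or a timeout is specified for the LV)" refers to; a sketch, with placeholder VG/LV names:

# HP-UX: fail I/O after 60 s instead of queueing forever on a lost PV
lvchange -t 60 /dev/vg01/lvol1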
09-10-2009 10:03 AM
Re: PVLinks not found and then recovered
A PVLink failure occurs when there is a loss of connectivity; the local FC HBA, a switch, or the link at the storage end can cause it.
I suspect the storage guys... :(
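One quick check of the local FC link on HP-UX (a sketch; /dev/td0 is a placeholder for your HBA's device file, see ioscan -fnC fc):

# Show the HBA's link state, topology and N_Port ID
fcmsutil /dev/td0

# Cumulative link error statistics
fcmsutil /dev/td0 stat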