02-27-2005 07:34 PM
serviceguard failover problem
We had a failure on the production server which caused a failover to the node/array in the second data centre. However, the node in the second centre hung while starting the package. The following errors were in the package control log when it tried to activate the first of the 9 Serviceguard-managed volume groups:
Feb 27 10:02:55 - "hostname": Activating volume group vg01 with exclusive option
vgchange: Warning: Couldn't attach to the volume group physical volume "/dev/dsk/c9t0d0":
The path of the physical volume refers to a device that does not exist, or is not configured into the kernel.
02-27-2005 07:49 PM
Re: serviceguard failover problem
02-27-2005 07:54 PM
Re: serviceguard failover problem
Use vgcfgrestore on the second node.
Also check for a loose connection to the disk (use ioscan -fnC disk).
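A rough sketch of that recovery on the failover node, assuming vg01 and c9t0d0 from the log above and the default /etc/lvmconf backup location (adjust the names to suit):
# ioscan -fnC disk                               (rescan; the disk should show CLAIMED, not NO_HW)
# insf -e -C disk                                (recreate any missing device files)
# vgcfgrestore -n /dev/vg01 /dev/rdsk/c9t0d0     (restore the LVM headers from the /etc/lvmconf backup)
# vgchange -a e vg01                             (retry the exclusive activation)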
02-27-2005 07:54 PM
Re: serviceguard failover problem
Is there any way you can restore the cluster to the working node and re-vgexport ALL the VGs? You can then re-import the LVM configuration on the failover node.
Keith
02-27-2005 08:06 PM
Re: serviceguard failover problem
When you say check the LVM config, what do you mean? The LVM conf files, or something else?
When you say check the disk availability, do you mean ioscans and diskinfo? I'm not sure I can do diskinfo on the disk in question unless the disks are activated on that node.
I did run a script written by Dietmar Konermann that uses the VGDA to identify the device names from each node, and this is what it shows:
***** LVM-VG: 0161901557-0965999312
2 backup:c8t0d0 0161901557-0971456414 0/2/0/0.8.0.4.0.0.0 HP/A5277A (0x01/vg01/0161901557-0965999312)
  backup:c9t0d0 0161901557-0971456414 0/6/0/0.8.0.5.0.0.0 HP/A5277A (0x01/vg01/0161901557-0965999312)
  prod:c6t0d0   0161901557-0971456414 0/2/0/0.8.0.4.0.0.0 HP/A5277A (0x01/vg01/0161901557-0965999312)
  prod:c9t0d0   0161901557-0971456414 0/6/0/0.8.0.5.0.0.0 HP/A5277A (0x01/vg01/0161901557-0965999312)
As you can see, both prod and backup share the same device ID for one of the routes to the disk. Is that normal?
02-27-2005 09:10 PM
Re: serviceguard failover problem
By an LVM configuration issue I mean that the VG configuration is not properly replicated across the two nodes.
Normally what we do is:
On node 1:
1. Activate the VG and create a map file with "vgexport -s -v -p -m mapfilename vgname".
2. Identify the PVs used by the VG on node 1 and confirm which device files seen from node 2 (e.g. PV1 and PV2) belong to that VG.
3. On node 2:
# mkdir /dev/vgxx
# mknod /dev/vgxx/group c 64 0xNN0000     (major 64; choose a minor number unique on that node)
4. Copy the map file to node 2, then import it: "vgimport -s -v -m mapfile vgxx PV1 PV2".
5. Deactivate the VG on node 1; you can then activate it on node 2.
The VG won't activate if the map file is not correct. This basically copies the entire VG structure from node 1 to node 2.
See man vgexport and vgimport.
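For example, a minimal sketch of that sequence for vg01 (the /tmp/vg01.map path and the 0x010000 minor number are only illustrative values):
On node 1 (VG active):
# vgexport -s -v -p -m /tmp/vg01.map vg01       (preview mode: writes the map file, leaves the VG intact)
# rcp /tmp/vg01.map node2:/tmp/vg01.map         (copy the map file to node 2)
On node 2:
# mkdir /dev/vg01
# mknod /dev/vg01/group c 64 0x010000           (minor number must be unique on node 2)
# vgimport -s -v -m /tmp/vg01.map vg01          (-s scans the disks for the matching VGID)
Then deactivate on node 1 and test activation on node 2:
# vgchange -a n vg01                            (node 1)
# vgchange -a e vg01                            (node 2, exclusive mode as Serviceguard expects)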
Hope that helps.
Regards,
02-27-2005 09:23 PM
Re: serviceguard failover problem
What is strange is that both servers share the same device ID to this disk. Is that right?
Also, when the backup server couldn't see the array in the main data centre (and should therefore have started the package from its local array), I couldn't do a diskinfo on the disk c9t0d0. However, once the link was re-established I could do a diskinfo from the backup server to that disk device!
02-27-2005 09:30 PM
Re: serviceguard failover problem
Do an ioscan to see what hardware is showing as NO_HW, etc.
Also check that your package control scripts activate the VGs in exclusive mode with quorum disabled; with the layout you have, you will not meet VG quorum requirements if contact is lost with the other side.
Also confirm where this disk lies, i.e. whether it is local or remote to the node in question, and check your syslogs for any other data.
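A quick sketch of those checks (vg01 is just the example VG from this thread):
# ioscan -fnC disk | grep NO_HW                 (list any disks the kernel has lost contact with)
# vgchange -a e -q n vg01                       (exclusive activation with quorum disabled, as the package script should do)
# grep -i lvm /var/adm/syslog/syslog.log        (check syslog for related LVM/disk messages)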