Kenneth Platz
Esteemed Contributor

cmquerycl fails: Software caused connection abort

Hello everyone,

In our customer's infinite wisdom, they have directed us to move one node of a Serviceguard cluster to a different piece of hardware (same model, same card layout, but different I/O paths. grr), and I am attempting to generate a new cluster configuration. When I attempt to do so, I get:

[/etc/cmcluster] root@zybachh #cmquerycl -v -C SZP-cluster.ascii -n zybachh -n zulchh
Warning: Unable to determine local domain name for zybachh
Looking for other clusters ... Done
Gathering storage information
Found 43 devices on node zybachh
Found 101 devices on node zulchh
Analysis of 144 devices should take approximately 11 seconds
0%----Error reading device /dev/dsk/c0t10d0s1 0x8
Error reading device /dev/dsk/c0t10d0s2 0x8
Error reading device /dev/dsk/c0t10d0s3 0x8
10%----20%----30%----40%----50%----60%----70%----80%-Error reading device /dev/dsk/c1t10d0s4 0x8
Error reading device /dev/dsk/c1t10d0s5 0x8
Error reading device /dev/dsk/c1t10d0s6 0x8
Error reading device /dev/dsk/c1t10d0s7 0x8
---90%----100%
Found 5 volume groups on node zybachh
Found 5 volume groups on node zulchh
Analysis of 10 volume groups should take approximately 1 seconds
0%----10%----20%----30%----40%----50%Unable to receive device query message from zybachh: Software caused connection abort
----60%Note: Disks were discovered which are not in use by either LVM or VxVM.
Use pvcreate(1M) to initialize a disk for LVM or,
use vxdiskadm(1M) to initialize a disk for VxVM.
Volume group /dev/vgSZPlog is configured differently on node zybachh than on node zulchh
Volume group /dev/vgSZPlog is configured differently on node zulchh than on node zybachh
Volume group /dev/vgSZPother is configured differently on node zybachh than on node zulchh
Volume group /dev/vgSZPother is configured differently on node zulchh than on node zybachh
Volume group /dev/vgSZP is configured differently on node zybachh than on node zulchh
Volume group /dev/vgSZP is configured differently on node zulchh than on node zybachh
Gathering network information
Beginning network probing
Completed network probing
Failed to gather configuration information.

The systems in question are both HP-UX 11.23, running Serviceguard A.11.18 (patched to PHSS_40988). I've run cmscancl, and all the network connectivity seems to be correct, and remsh works to all systems as root, and I've even added in the cmclnodelist file.

Please help.
I think, therefore I am... I think!
melvyn burnard
Honored Contributor

Re: cmquerycl fails: Software caused connection abort

So, having moved the hardware and now having new I/O paths, have you first exported then imported the relevant shared VGs?
There seem to be an awful lot of errors, especially referring to strange devices:
Error reading device /dev/dsk/c1t10d0s5 0x8
Error reading device /dev/dsk/c1t10d0s6 0x8
Error reading device /dev/dsk/c1t10d0s7 0x8

Also, have you changed anything else at all, such as networking information?

I would do a few tests.
First, do the cmquerycl on one node while logged on to that node, check this works ok.
Do the same test on the second node.
Then try to do the cmquerycl for the two nodes, first from one node, then the other and see if there are any differences/pointers etc.
But first, make sure your disk configuration is correct, as per my first point.
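
The test sequence above could be laid out roughly as follows. This is only a sketch: the node names are taken from the thread, the output file names under /tmp are hypothetical, and the commands are printed rather than executed so the plan can be reviewed first. Each single-node command is meant to be run while logged on to that node, and the combined command from each node in turn.

```shell
#!/bin/sh
# Sketch: print the cmquerycl test sequence (dry run only).
# Assumption: output file names under /tmp are made up for illustration.

run_step() {
    # Print the command instead of running it.
    # Remove "echo" here to actually execute each step.
    echo "$@"
}

plan_cmquerycl_tests() {
    # Steps 1 and 2: query each node on its own.
    for node in "$@"; do
        run_step cmquerycl -v -n "$node" -C "/tmp/test-$node.ascii"
    done

    # Step 3: query all nodes together (repeat from each node).
    all=""
    for node in "$@"; do
        all="$all -n $node"
    done
    # $all is left unquoted on purpose so it splits into separate arguments.
    run_step cmquerycl -v $all -C /tmp/test-all.ascii
}

plan_cmquerycl_tests zybachh zulchh
```

Comparing the output of the three runs should show whether the failure follows one node or only appears when both are queried together.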
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Kenneth Platz
Esteemed Contributor

Re: cmquerycl fails: Software caused connection abort

Melvyn,

We regenerated the ioconfig files and exported/imported the volume groups. I am attempting to use cmquerycl to create an ASCII file in order to update the network layout.

The error message I am concerned about is "Unable to receive device query message from zybachh: Software caused connection abort". The thing is -- I am running the command from zybachh, and the zybachh node *never moved*. The OTHER node moved.
I think, therefore I am... I think!
Kenneth Platz
Esteemed Contributor

Re: cmquerycl fails: Software caused connection abort

Taking the changed hardware out of the picture, I'm just trying to regen a 1 node cluster, and I get:

[/etc/cmcluster] root@zybachh #cmquerycl -v -n zybachh -C SZPtestcluster.ascii
Looking for other clusters ... Done
Gathering storage information
Found 43 devices on node zybachh
Analysis of 43 devices should take approximately 9 seconds
0%----10%----20%----30%----40%----50%----60%----70%----80%----90%----100%
Unable to receive device query message from zybachh: Software caused connection abort
Found 5 volume groups on node zybachh
Analysis of 5 volume groups should take approximately 1 seconds
0%----10%----20%Note: Disks were discovered which are not in use by either LVM or VxVM.
Use pvcreate(1M) to initialize a disk for LVM or,
use vxdiskadm(1M) to initialize a disk for VxVM.
Gathering network information
Beginning network probing
Completed network probing
Failed to gather configuration information.


How do I fix this?
I think, therefore I am... I think!
John Bigg
Esteemed Contributor

Re: cmquerycl fails: Software caused connection abort

Sounds like a cmclconfd daemon is failing. What do you see in syslog for this time? Is there a cmclconfd core file in the / filesystem?
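
A quick way to run both checks might look like the sketch below. It assumes the usual HP-UX syslog location (/var/adm/syslog/syslog.log) and a core file at /core; both are parameters of a hypothetical helper function, so they can be pointed elsewhere.

```shell
#!/bin/sh
# Sketch: look for evidence that cmclconfd crashed.
# Assumptions: default HP-UX syslog path and a core file at /core.
# check_cmclconfd is a hypothetical helper, not a Serviceguard command.

check_cmclconfd() {
    syslog=${1:-/var/adm/syslog/syslog.log}
    corefile=${2:-/core}

    # Show the most recent syslog lines that mention the daemon.
    if [ -r "$syslog" ]; then
        grep cmclconfd "$syslog" | tail -n 20
    else
        echo "cannot read $syslog"
    fi

    # Report whether a core file exists and, if possible, what produced it.
    if [ -f "$corefile" ]; then
        echo "core file found: $corefile"
        if command -v file >/dev/null 2>&1; then
            file "$corefile"
        fi
    else
        echo "no core file at $corefile"
    fi
}

check_cmclconfd "$@"
```

Running it right after a failed cmquerycl makes it easier to match the syslog timestamps and the core file's modification time to that run.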
Kenneth Platz
Esteemed Contributor

Re: cmquerycl fails: Software caused connection abort

[/root] root@zybachh #file /core
/core: ELF-32 core file - IA64 from 'cmclconfd' - received SIGFPE


Yep, there's a core in / from cmclconfd... any suggestions how to keep it from coredumping?
I think, therefore I am... I think!
Dennis Handly
Acclaimed Contributor

Re: cmquerycl fails: Software caused connection abort

>there's a core in / from cmclconfd. any suggestions how to keep it from coredumping?

To fix the problem or to keep it from taking up space when it divides by 0?

For the latter, you can do mkdir core.
Kenneth Platz
Esteemed Contributor

Re: cmquerycl fails: Software caused connection abort

I'd much prefer that it not divide by zero and generate the core file in the first place...
I think, therefore I am... I think!
Dennis Handly
Acclaimed Contributor

Re: cmquerycl fails: Software caused connection abort

>I'd much prefer that it not divide by zero and generate the core file in the first place.

For support questions like this, you'll need to contact the Response Center.