Problem with cmquerycl

Frederic Sevestre
Honored Contributor

Hi,

I had a problem when I tried to add a node to a running cluster.
The command cmquerycl -n hangs.

Using debug mode, I found the following lines in syslog:

Oct 7 16:29:32 supedi05 cmclconfd[4725]: Received tcp msg from IP 127.0.0.1.
Oct 7 16:29:32 supedi05 cmclconfd[4725]: Querying local network interfaces for node supedi05
Oct 7 16:29:32 supedi05 cmclconfd[4725]: DLPI found PPA 0 at 1/0/1/0/0/4/0 (0x00306e38b27c)
Oct 7 16:29:32 supedi05 cmclconfd[4725]: DLPI found PPA 1 at 1/0/6/0/0 (0x00306e37d625)
Oct 7 16:29:32 supedi05 cmclconfd[4725]: DLPI found PPA 2 at 1/0/14/0/0 (0x00306e37f615)
Oct 7 16:29:32 supedi05 cmclconfd[4725]: DLPI found PPA 12 at 1/0/12/0/0 (0x0000000000000000000000000000000000000000)
Oct 7 16:29:32 supedi05 cmclconfd[4725]: DLPI found PPA 11 at 1/0/12/0/0 (0x0000000000000000000000000000000000000000)
Oct 7 16:29:32 supedi05 cmclconfd[4725]: DLPI found altogether 5 PPA's

and nothing more.
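The debug lines above can be checked mechanically to see whether cmclconfd finished enumerating PPAs before it stopped. The following is a minimal Python sketch; the helper name dlpi_inventory is hypothetical, and it assumes each syslog entry sits on a single line with the message text exactly as shown above.

```python
import re

# Debug lines written by cmclconfd while enumerating DLPI PPAs, e.g.
# "... cmclconfd[4725]: DLPI found PPA 0 at 1/0/1/0/0/4/0 (0x00306e38b27c)"
PPA_RE = re.compile(r"cmclconfd\[(\d+)\]: DLPI found PPA (\d+) at (\S+)")
TOTAL_RE = re.compile(r"cmclconfd\[(\d+)\]: DLPI found altogether (\d+) PPA's")


def dlpi_inventory(syslog_lines):
    """Hypothetical helper: {pid: ([(ppa, hw_path), ...], announced_total or None)}."""
    found = {}
    totals = {}
    for line in syslog_lines:
        m = PPA_RE.search(line)
        if m:
            pid, ppa, hw_path = m.groups()
            found.setdefault(pid, []).append((int(ppa), hw_path))
            continue
        m = TOTAL_RE.search(line)
        if m:
            totals[m.group(1)] = int(m.group(2))
    return {pid: (ppas, totals.get(pid)) for pid, ppas in found.items()}
```

For PID 4725 above, five PPAs are listed and the announced total is 5, so the enumeration itself completed and the hang happened in a later step.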

My server is an rp7410. Here is the lanscan output:
Hardware       Station         Crd  Hdw    Net-Interface  NM  MAC    HP-DLPI  DLPI
Path           Address         In#  State  NamePPA        ID  Type   Support  Mjr#
1/0/1/0/0/4/0  0x00306E38B27C  0    UP     lan0 snap0     1   ETHER  Yes      119
1/0/6/0/0      0x00306E37D625  1    UP     lan1 snap1     2   ETHER  Yes      119
1/0/14/0/0     0x00306E37F615  2    UP     lan2 snap2     3   ETHER  Yes      119
1/0/12/0/0     0x000000000000  0    UP     ixe12          5   X25    No       100
1/0/12/0/0     0x000000000000  0    UP     ixe11          6   X25    No       100

lan0 is used for heartbeat, lan1 for data + heartbeat, and lan2 is standby.
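To see which of these interfaces claim HP-DLPI support, each lanscan row can be split into fields. This is a sketch under the assumption that the columns are whitespace-separated as printed above; parse_lanscan is a hypothetical name. Because the Net-Interface NamePPA column holds one or two names (lan0 snap0 vs. ixe12), the fixed columns are read from both ends of the row.

```python
def parse_lanscan(text):
    """Hypothetical helper: one dict per lanscan data row."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # Data rows start with a hardware path (e.g. 1/0/6/0/0); header rows do not.
        if len(tokens) < 9 or "/" not in tokens[0]:
            continue
        hw_path, station, crd, state = tokens[:4]
        nm_id, mac_type, dlpi_support, major = tokens[-4:]
        rows.append({
            "hw_path": hw_path,
            "station": station,
            "crd_in": int(crd),
            "state": state,
            "names": tokens[4:-4],  # e.g. ["lan0", "snap0"] or ["ixe12"]
            "nm_id": int(nm_id),
            "mac_type": mac_type,
            "hp_dlpi": dlpi_support == "Yes",
            "dlpi_major": int(major),
        })
    return rows
```

Applied to the table above, the rows with hp_dlpi set are lan0, lan1 and lan2; the two X.25 ixe interfaces report No.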

Thanks for your help.
Frédéric



Crime doesn't pay...does that mean that my job is a crime ?
Ashwani Kashyap

cmquerycl produces detailed information about networks and bridged networks. It also suggests which LANs to use for heartbeat and standby, and gives a lot of information about LVM.

Since none of this appears in your cmquerycl output, I guess there is a problem either with the network or with the LVM configuration.

First use the -k option of cmquerycl to skip disk probing; it then won't gather detailed LVM information.

Then use the -w option of cmquerycl to narrow down any network configuration problems.

See man cmquerycl for more information.
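Since the symptom here is a hang rather than an error message, it can help to run the query under a timeout instead of waiting indefinitely. A minimal sketch, assuming Python 3.7 or later; run_with_timeout is a hypothetical helper, not part of ServiceGuard.

```python
import subprocess


def run_with_timeout(argv, seconds):
    """Run argv; return (returncode, stdout), or None if it is still running after `seconds`."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the child at this point.
        return None
    return done.returncode, done.stdout
```

For example, run_with_timeout(["cmquerycl", "-v", "-k", "-w", "none", "-n", "supedi05"], 120) returns None if the query is still hanging after two minutes. Killing cmquerycl this way may not terminate the cmclconfd helper working on its behalf, so that process still has to be inspected separately.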
Frederic Sevestre

Thanks for your answer.
I already tried -k and -w none, and cmquerycl still hangs.

Any idea?

Frédéric
Ashwani Kashyap

Interesting, Frederic. What version of ServiceGuard (SG) are you using? There might be some patching issues. Try to query your node using SAM and see what happens; SAM gives more verbose output.
Frederic Sevestre

ServiceGuard version: A.11.13
Patch: PHSS_27087 1.0 MC/ServiceGuard and SG-OPS Edition A.11.13

I will try with SAM.

Thanks again,
Frédéric

Frederic Sevestre

Hi again,

SAM gives no more information. I have the same problem: SAM hangs while "gathering information", and I found the same messages in syslog.

Any idea ?

Frédéric
Dietmar Konermann

Each time you issue cmquerycl a cmclconfd helper process is started to gather the desired information.

You should have a closer look at that cmclconfd process your cmquerycl is waiting for...

- What is its priority (ps -el)? 148 usually means disk I/O.
- What does tusc tell you when you attach to it?
- Which files does it have open (lsof or glance)?
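The priority check can be scripted. This sketch assumes ps -el output whose header line names the PID and PRI columns and whose last column is the command, with no empty fields; the sample layout F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME COMD is an assumption about HP-UX. The two priority notes are only the ones given in this thread, and cmclconfd_priorities is a hypothetical name.

```python
# Only the two values discussed in this thread; not a complete list of HP-UX priorities.
PRIORITY_NOTES = {
    148: "usually disk I/O",
    154: "general (signalable) user priority; disk I/O can be ruled out",
}


def cmclconfd_priorities(ps_el_text):
    """Hypothetical helper: [(pid, pri, note)] for each cmclconfd line of `ps -el` output."""
    lines = ps_el_text.strip().splitlines()
    header = lines[0].split()
    pid_col, pri_col = header.index("PID"), header.index("PRI")
    results = []
    for line in lines[1:]:
        # Assumes no empty columns; the last column (command) may contain spaces.
        fields = line.split(None, len(header) - 1)
        if len(fields) == len(header) and fields[-1].startswith("cmclconfd"):
            pri = int(fields[pri_col])
            results.append((int(fields[pid_col]), pri, PRIORITY_NOTES.get(pri, "not discussed here")))
    return results
```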

Regards...
Dietmar.
"Logic is the beginning of wisdom; not the end." -- Spock (Star Trek VI: The Undiscovered Country)
Frederic Sevestre

The priority of the hung cmclconfd is 154, and the wait reason is STREAMS.

I will try to install tusc on my server.

Frédéric
Dietmar Konermann

Hi, Frederic!

154 is a general (signalable) user priority, so we can rule out disk I/O related problems.

I just had a look at the debug output of a "clean" cmquerycl/cmclconfd run. The next messages should be something like:

Adding ppa 0 type 4 at 0/0/0/0 for lan0
...

I will have a closer look at what should happen here.
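Whether a given cmclconfd got past PPA enumeration can be checked by looking for the "Adding ppa" messages quoted from the clean run above. A minimal sketch; last_step is a hypothetical helper, and it assumes the message prefix is exactly "Adding ppa" as in that excerpt.

```python
import re

# "<date> <host> cmclconfd[<pid>]: <message>"
MSG_RE = re.compile(r"cmclconfd\[(\d+)\]: (.*)$")


def last_step(syslog_lines, pid):
    """Hypothetical helper: (last message logged by this cmclconfd PID, saw 'Adding ppa'?)."""
    last, adding_seen = None, False
    for line in syslog_lines:
        m = MSG_RE.search(line)
        if m and m.group(1) == str(pid):
            last = m.group(2)
            adding_seen = adding_seen or last.startswith("Adding ppa")
    return last, adding_seen
```

For PID 4725 in the original log, the last message is "DLPI found altogether 5 PPA's" and no "Adding ppa" line follows, which pins the hang to the step right after enumeration.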


Regards...
Dietmar.
Dietmar Konermann

Hi, again!

It looks like cmclconfd checks for APA trunks at this stage... maybe APA is installed on your system?

If yes, then it should be either properly patched or (if unused) deinstalled completely...

Just a guess...
Dietmar.
Frederic Sevestre

Hi Dietmar,

APA is not installed on my server.

Any idea ?

Thanks for your help
Frédéric
Dietmar Konermann

Hi!

Try to get tusc results, e.g. by changing the hacl-cfg entries in /etc/inetd.conf as follows (with tusc in /tmp):

hacl-cfg dgram udp wait root /tmp/tusc /tmp/tusc -f -T "" -r all -w all -e -A -o /tmp/cmclconfd-p.trc /usr/lbin/cmclconfd -p
hacl-cfg stream tcp nowait root /tmp/tusc /tmp/tusc -f -T "" -r all -w all -e -A -p /tmp/cmclconfd-c.trc /usr/lbin/cmclconfd -c

Regards...
Dietmar.
Frederic Sevestre

Hi again,

I used tusc, but now I get the following error:

supedi05:/tmp>cmquerycl -v -n supedi05

Begin checking the nodes...
Gathering configuration information ....... Done
Warning: Unknown message version: 91
Error: Unable to determine device configuration: failed to receive device query reply from node supedi05
Warning: Unknown message version: 91
Error: Unable to determine lvm configuration: failed to receive lvm query reply from node supedi05
Warning: Unknown message version: 91
Failed to gather configuration information.

The cmclconfd -p trace file seems to be OK (it exits at the end), but there is no cmclconfd -c trace file.

Any idea ?

Thanks again for your help.
Frédéric
Dietmar Konermann

Oops, my fault... in the second line, replace the tusc -p option with -o.
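With that correction, both tusc-wrapped entries use -o to name the trace file. The sketch below only rebuilds the two hacl-cfg lines from the options used in this thread; tusc_inetd_line is a hypothetical helper, and /tmp/tusc is assumed to be where tusc was copied. The tusc path appears twice because an inetd.conf entry gives the server program followed by its full argument list, starting with argv[0].

```python
TUSC = "/tmp/tusc"
CMCLCONFD = "/usr/lbin/cmclconfd"


def tusc_inetd_line(sock_type, proto, wait_flag, mode):
    """Hypothetical helper: hacl-cfg inetd.conf entry running `cmclconfd <mode>` under tusc."""
    trace_file = f"/tmp/cmclconfd{mode}.trc"  # mode "-p" -> /tmp/cmclconfd-p.trc
    # Same tusc options as in the thread; -o (not -p) names the trace output file.
    tusc_opts = ["-f", "-T", '""', "-r", "all", "-w", "all", "-e", "-A", "-o", trace_file]
    fields = ["hacl-cfg", sock_type, proto, wait_flag, "root",
              TUSC, TUSC, *tusc_opts, CMCLCONFD, mode]
    return " ".join(fields)
```

tusc_inetd_line("dgram", "udp", "wait", "-p") and tusc_inetd_line("stream", "tcp", "nowait", "-c") give the two entries. After editing, inetd has to reread its configuration (on HP-UX typically inetd -c), and the original entries should be restored once the traces are collected.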
Frederic Sevestre

Here are the last lines of the cmclconfd -c trace file:

"" lseek(8, 4294967282, SEEK_CUR) ........................ = 42
"" close(8) .............................................. = 0
"" open("/dev/dlpi", O_RDWR, 0) .......................... = 8
"" putmsg(8, 0x40001fb0, NULL, NULL) ..................... = 0
"" poll(0x7f7f0b8c, 1, 0) ................................ = 1
"" getmsg(8, 0x7f7f0a40, 0x40001fa0, 0x7f7f0b88) ......... = 0
"" close(8) .............................................. = 0
"" open("/dev/dlpi", O_RDWR, 03) ......................... = 8
"" putmsg(8, 0x40001fb0, NULL, NULL) ..................... = 0
"" poll(0x7f7f0c8c, 1, 0) ................................ = 1
"" getmsg(8, 0x40001fb0, 0x40001fa0, 0x7f7f0c88) ......... = 0
"" ioctl(8, I_STR, 0x7f7f0b88) ........................... [sleeping]
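The interesting detail is which descriptor the sleeping call is using. The sketch below (find_sleeping_calls is hypothetical; it assumes the tusc line format shown above, including the optional "" prefix produced by -T "") remembers which path each file descriptor was opened on and reports every call still marked [sleeping].

```python
import re

# One tusc line: optional "" prefix (from -T ""), syscall(args), dots, then "= ret" or "[state]".
CALL_RE = re.compile(
    r'^(?:"[^"]*"\s+)?(?P<name>\w+)\((?P<args>.*)\)\s*\.*\s*'
    r'(?:=\s*(?P<ret>-?\d+)|\[(?P<state>\w+)\])'
)


def find_sleeping_calls(trace_lines):
    """Return (syscall, path or first argument) for each call still marked [sleeping]."""
    open_fds = {}  # fd (as text) -> path passed to open()
    sleeping = []
    for line in trace_lines:
        m = CALL_RE.match(line.strip())
        if not m:
            continue
        name, args, ret = m["name"], m["args"], m["ret"]
        first_arg = args.split(",")[0].strip()
        if m["state"] == "sleeping":
            sleeping.append((name, open_fds.get(first_arg, first_arg)))
        elif name == "open" and ret is not None and int(ret) >= 0:
            open_fds[ret] = first_arg.strip('"')
        elif name == "close":
            open_fds.pop(first_arg, None)
    return sleeping
```

On the excerpt above it reports the ioctl() on /dev/dlpi, since fd 8 was last opened on that device.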


Does that help?

Regards,
Frédéric
Dietmar Konermann

Please send me the complete traces. You can get my email address by joining first name and surname with "." and appending @hp.com :)
Dietmar Konermann

... and I would like to see a kernel stack trace of the hanging cmclconfd.

Example (for HP-UX < 11.11; the PID of cmclconfd is 1234 here):

# /usr/contrib/bin/q4pxdb /stand/vmunix
# /usr/contrib/bin/q4 /stand/vmunix /dev/mem
q4>load struct proc from proc max nproc
q4>keep p_pid==1234
q4>trace pile

Dietmar Konermann

Hi, Frederic!

The problem looks really weird, and I'm afraid I cannot tell the reason for sure. Nevertheless, after scanning through the source and through similar known problems, I think we should do the following:

- install the latest STREAMS, ARPA, btlan and X.25 patches
(if not already installed)
- double-check that APA is not in the kernel
(what /stand/vmunix | grep -i apa)
- set the kernel tunable STRMSGSZ to 0

This would be the patch list incl. dependencies:

System: 800, HPUX: 11.11

PHNE_23465 s700_800 11.11 100BT unified driver cumulative patch
PHKL_25233 s700_800 11.11 select(2) and poll(2) hang
PHNE_25388 s700_800 11.11 LAN product cumulative patch
PHKL_25729 s700_800 11.11 signals,threads enhancement,Psets Enablement
PHNE_26728 s700_800 11.11 Cumulative STREAMS Patch
PHNE_27063 s700_800 11.11 cumulative ARPA Transport patch
PHKL_27091 s700_800 11.11 Core PM, vPar, Psets Cumulative, slpq1 perf
PHKL_27094 s700_800 11.11 Psets Enablement Patch, slpq1 perf
PHKL_27096 s700_800 11.11 VxVM,EMC,Psets&vPar,slpq1,earlyKRS
PHKL_27317 s700_800 11.11 detach; NOSTOP, Abort; Psets; slpq1 perf
PHNE_27403 s700_800 11.11 J2793B X.25 SX25-HPerf/SYNC-WAN
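Checking a patch list like this by hand is error-prone. A sketch, assuming the installed list is plain text (for example swlist -l product output) in which patches appear under their PHxx_nnnnn names; missing_patches is hypothetical. It does not know about superseding patches, so a newer patch that replaces one of these would still be reported as missing.

```python
import re

# Patch IDs such as PHNE_23465 or PHKL_25233.
PATCH_RE = re.compile(r"\bPH[A-Z]{2}_\d+\b")


def missing_patches(required_text, installed_text):
    """Hypothetical helper: required patch IDs not mentioned anywhere in installed_text."""
    required = set(PATCH_RE.findall(required_text))
    installed = set(PATCH_RE.findall(installed_text))
    # Superseding patches are NOT recognised: a newer patch replacing one of these
    # still leaves the older ID in the result.
    return sorted(required - installed)
```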



Regards...


Dietmar Konermann.
Frederic Sevestre

I checked everything:

- APA is not installed on the server
- The patches are ok.

In addition, I tried without the X.25 cards and without lan0: in both cases the command hung.

Regards,
Frédéric
Frederic Sevestre

Hi again,

Everything is working fine now !!!

I installed a new release of GigEther-01:

GigEther-01 B.11.11.07 PCI GigEther;Supptd HW=A6794A/A6825A/A6847A

It contains a new driver for the A6794A Gigabit Ethernet interface.

Thanks a lot everybody.
Dietmar, I really appreciate your help.

Regards,
Frédéric


Dietmar Konermann

Hi, Frederic!

That's good news! So it really was a driver issue... interesting. Could you please send me the output of ioscan -f and tell me which gelan driver version you were using before? I would like to create a document for the TKB.

Dietmar.
Dietmar Konermann

I created document UXSGKBRC00010750 for reference.

DocId: UXSGKBRC00010750 Updated: 10/8/02 8:20:00 AM

PROBLEM
When trying to use cmquerycl on a node with Gigabit Ethernet interfaces, the
command may simply hang, leaving a hung cmclconfd process behind.

cmquerycl -T 5 logs debug output similar to the following in syslog.log:

Oct 7 16:29:32 node1 cmclconfd[4725]: Querying local network interfaces ...
Oct 7 16:29:32 node1 cmclconfd[4725]: DLPI found PPA 0 at 1/0/1/0/0/4/0 ...
Oct 7 16:29:32 node1 cmclconfd[4725]: DLPI found PPA 1 at 1/0/6/0/0 ...
Oct 7 16:29:32 node1 cmclconfd[4725]: DLPI found PPA 2 at 1/0/14/0/0 ...
Oct 7 16:29:32 node1 cmclconfd[4725]: DLPI found altogether 3 PPA's
[hang]

Tracing the system calls of cmclconfd (e.g. using tusc) reveals that it is
sleeping in an ioctl() on /dev/dlpi:

...
open("/dev/dlpi", O_RDWR, 03) ......................... = 8
putmsg(8, 0x40001fb0, NULL, NULL) ..................... = 0
poll(0x7f7f0c8c, 1, 0) ................................ = 1
getmsg(8, 0x40001fb0, 0x40001fa0, 0x7f7f0c88) ......... = 0
ioctl(8, I_STR, 0x7f7f0b88) ........................... [sleeping]

CONFIGURATION
HP-UX 11.11
MC/ServiceGuard
A6794A interface
PCI GigEther driver B.11.11.01

RESOLUTION
The solution is to upgrade the Gigabit Ethernet driver to the latest revision;
in this case, B.11.11.07 solved the problem. The software can be downloaded from
http://software.hp.com.

GigEther-01 B.11.11.07 PCI GigEther;Supptd HW=A6794A/A6825A/A6847A
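To tell whether an installed driver predates the fixed revision, the revision strings can be compared field by field. A sketch, assuming revisions follow the letter-plus-dotted-numbers form shown here (B.11.11.01, B.11.11.07); this is not HP's official comparison logic, and all helper names are hypothetical.

```python
def parse_revision(rev):
    """Split a revision such as 'B.11.11.07' into ('B', (11, 11, 7))."""
    letter, *numbers = rev.split(".")
    return letter, tuple(int(n) for n in numbers)


def needs_upgrade(installed, fixed):
    """True if `installed` is numerically older than `fixed` (same leading letter required)."""
    i_letter, i_nums = parse_revision(installed)
    f_letter, f_nums = parse_revision(fixed)
    if i_letter != f_letter:
        raise ValueError(f"cannot compare {installed!r} with {fixed!r}")
    return i_nums < f_nums


def revision_from_swlist(line):
    """Return the revision field from a line like 'GigEther-01 B.11.11.07 PCI GigEther;...'."""
    return line.split()[1]
```

With the revisions from this case, needs_upgrade("B.11.11.01", "B.11.11.07") is True, while the B.11.11.07 line above yields False.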
