Operating System - HP-UX

Craig Johnson_1
Regular Advisor

cmquerycl gives "permission denied" on certain nodes

Let me say right up front that two of us have quadruple-checked /etc/hosts, nsswitch.conf, resolv.conf, and DNS. This is a HUGE mixed-O/S (11.23 and 11.31) cluster (yes, it is supported, and I don't plan to keep the 11.23 nodes around for long). The 11.23 servers have SG 19 and are patched.

I cloned a bunch of servers from an 11.31 box running SG 19. Yesterday I tried to run cmquerycl against just one of the new nodes and got the "permission denied" error. I realized I had forgotten to clear the old junk out of /etc/cmcluster that came over from the server I cloned, so I did that and tried one more time. Still got the error, so I went home.

Tried again this AM and it worked. ??? Noticed in the syslog that cmproxyd had gone from complaining about 127.0.0.1 to "Ready". ??

None of the other nodes, which still had the old cluster info, worked today. So I cleaned those up and hope they work tomorrow.

What am I dealing with here? Any ideas?

Another troubling thing I noticed is that in the syslogs you'll see cmclconfd on the "new" node complaining about resolving the querying node name, but it's like it's trying to resolve the node name to every NIC on the box. We have unique names for each NIC unique to each IP address. For example LAN0=hostname, LAN2=hostname-h1, LAN4=hostname-h2. But that's nothing new.

I'm probably missing something simple (I hope).
12 replies
Shibin_2
Honored Contributor

Re: cmquerycl gives "permission denied" on certain nodes

Create /etc/cmcluster/cmclnodelist and add a line for each node in the form:

<node_name> root

One node per line.
Regards
Shibin
Emil Velez
Honored Contributor

Re: cmquerycl gives "permission denied" on certain nodes

might try

* root

in cmclnodelist
John Bigg
Esteemed Contributor
Solution

Re: cmquerycl gives "permission denied" on certain nodes

You say:

We have unique names for each NIC unique to each IP address. For example LAN0=hostname, LAN2=hostname-h1, LAN4=hostname-h2. But that's nothing new.

And although it may be nothing new, it is wrong and is the probable cause of your problem. The documentation states that every NIC's primary IP address must resolve to the hostname or have the hostname as an alias. Without this, your commands may fail when a message arrives from an unrecognised interface.

The commands send messages over all interfaces (even those not included in the cluster's configuration file as stationary or heartbeat), and cmclconfd uses the resolution of the sending IP address to determine which node the message is from. If the resolved name does not match the hostname, the request is rejected with permission denied. This is why it sometimes appears to work and other times does not: it depends on which address gets used first.

So, what you need here is

LAN0 hostname
LAN2 hostname-h1 hostname
LAN4 hostname-h2 hostname

This misconfiguration is the biggest cause of permission problems within Serviceguard clusters.
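To make the failure mode concrete, here is a minimal Python model of the check described above. It is not Serviceguard code: the `accepts()` and `first_message_result()` helpers are hypothetical, the addresses are placeholders from the 192.0.2.0/24 documentation range, and it assumes (as a simplification of the reply) that the first message processed decides the outcome and that a message is accepted only if its source address lists the node's hostname as its name or an alias.

```python
# Illustrative model only, not Serviceguard source code.
# Hypothetical hosts tables: IP -> [official name, aliases...], using the
# per-NIC naming from the original post. Addresses are placeholders.
CURRENT_HOSTS = {
    "192.0.2.10": ["hostname"],       # LAN0
    "192.0.2.12": ["hostname-h1"],    # LAN2, hostname missing
    "192.0.2.14": ["hostname-h2"],    # LAN4, hostname missing
}

FIXED_HOSTS = {
    "192.0.2.10": ["hostname"],
    "192.0.2.12": ["hostname-h1", "hostname"],
    "192.0.2.14": ["hostname-h2", "hostname"],
}


def accepts(hosts, source_ip, node_name):
    """True if source_ip resolves to node_name or has it as an alias.

    Assumed simplification: unknown addresses, or addresses whose names
    omit node_name, are rejected.
    """
    return node_name in hosts.get(source_ip, [])


def first_message_result(hosts, arrival_order, node_name):
    """Assume whichever interface's message is handled first decides."""
    first_ip = arrival_order[0]
    return "ok" if accepts(hosts, first_ip, node_name) else "permission denied"


if __name__ == "__main__":
    for order in (["192.0.2.10", "192.0.2.12"], ["192.0.2.12", "192.0.2.10"]):
        print(order[0],
              first_message_result(CURRENT_HOSTS, order, "hostname"),
              first_message_result(FIXED_HOSTS, order, "hostname"))
```

With the current naming, the result flips depending on which address arrives first, which matches the "sometimes works" symptom; with the hostname added as an alias on every address, every order is accepted.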
Craig Johnson_1
Regular Advisor

Re: cmquerycl gives "permission denied" on certain nodes

Thanks John, that is very good information. The only thing I'm wondering is whether this is a new requirement in SG 19. We've run with this exact setup for many years with earlier versions of SG without issue. We use /etc/hosts, and this is a typical entry:

10.20.194.226 a300sua6 a300sua6.company.com a300sua6-g0
10.20.37.226 a300sua6-g1
169.254.1.226 a300sua6-f1
169.254.2.226 a300sua6-f2
10.20.224.226 a300sua6-wc


Also, I noticed that once you update cmclnodelist and populate it with all the new nodenames, and then copy it out to the new nodes, they don't seem to read it right away. There seems to be a delay of anywhere from a few minutes to a few hours, then you see the message in the log "cmproxyd[1234]: Ready". After that they start to work and not reject the query request.

???
Craig Johnson_1
Regular Advisor

Re: cmquerycl gives "permission denied" on certain nodes

Forgot to mention that I got the second of the new nodes to work by updating cmclnodelist and pushing it around the cluster. That was the basis for the last paragraph of my previous post.

Also, I'm concerned a bit by John's response. If you set an alias of the hostname to every interface it's going to confuse name resolution. When I do an nslookup hostname I need it to return the primary hostname and IP, not potentially one of the heartbeats.

So is the solution instead to put every interface name that is part of the cluster in the cmclnodelist file? So you have something like:

hostname1 root
hostname1-g1 root
hostname1-g2 root

etc.?
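If you did go the route proposed above, the file can be generated rather than hand-edited. This is a hypothetical Python helper, not a Serviceguard tool: it assumes the "<name> <user>" one-entry-per-line format shown in this thread and simply emits one line per interface name per node, skipping duplicates. As noted in the replies that follow, cmclnodelist only matters before a node is part of a configured cluster.

```python
# Hypothetical helper: builds cmclnodelist text with one "<name> <user>"
# line per interface name, following the format shown in this thread.
from typing import Dict, List


def build_cmclnodelist(nodes: Dict[str, List[str]], user: str = "root") -> str:
    """nodes maps each node's hostname to its extra interface names."""
    lines: List[str] = []
    seen = set()
    for node, names in nodes.items():
        for name in [node, *names]:
            if name not in seen:  # avoid repeating a name listed twice
                seen.add(name)
                lines.append(f"{name} {user}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_cmclnodelist({"hostname1": ["hostname1-g1", "hostname1-g2"]}), end="")
```

For the example node above this prints the same three lines as the proposal (`hostname1 root`, `hostname1-g1 root`, `hostname1-g2 root`).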
John Bigg
Esteemed Contributor

Re: cmquerycl gives "permission denied" on certain nodes

Firstly, this is not a new requirement for Serviceguard 11.19, but cmproxyd is new. However, I'm pretty sure cmproxyd is used only by the cmgetpkgenv command, so I would not expect it to affect cmquerycl or other commands at all. I used tusc to trace cmproxyd during a cmquerycl, and it never returned from select(), which confirms it is not used.

The "Ready" message is logged by cmproxyd only when the daemon starts; you can confirm this by checking the associated PID. It is normally started by the RC scripts and stays running all the time, so if you see this message it implies the daemon is being killed or is failing. cmclconfd restarts it automatically if it is not running when required. To test this, you could kill the daemon and run cmquerycl to see whether it restarts (it won't). I expect it would be restarted only if you ran "cmgetpkgenv ".

Also, please note that cmclnodelist is not used once the cluster is configured. Once a cluster binary file (cmclconfig) exists, cmclnodelist is ignored entirely: it can be deleted, and any changes you make to it will not be referenced. Try deleting it on a running cluster and you will find commands still work. After the cluster is configured, access is controlled by knowing the nodes in the cluster and by the access control policy parameters, i.e. USER_NAME, USER_HOST and USER_ROLE, configured in the cluster ASCII file and hence in the binary config file cmclconfig. Obviously, if you are adding a new node to a cluster, that node will need a cmclnodelist until it is part of the cluster.

Even if you have run without the correct configuration for many years without trouble, that does not rule it out as the cause. It comes down to timing: something may have changed the timing so that you are more susceptible now. Even unrelated changes in 11.19 could affect timing and expose a problem that was hidden before.

I have to question your comment, "If you set an alias of the hostname to every interface it's going to confuse name resolution." Why do you think this?

When you lookup a hostname, the first match in the hosts file will be taken. So, if you have:

10.20.194.226 a300sua6 a300sua6.company.com a300sua6-g0
10.20.37.226 a300sua6-g1 a300sua6
169.254.1.226 a300sua6-f1 a300sua6
169.254.2.226 a300sua6-f2 a300sua6
10.20.224.226 a300sua6-wc a300sua6

which is what Serviceguard requires, a lookup of a300sua6 returns only the first match, i.e. 10.20.194.226. No confusion there: the heartbeat IPs will never be returned. I have heard many people say this will cause trouble, but when questioned, no one has ever presented me with a scenario where it actually proves to be a problem.
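The first-match behaviour can be checked with a small Python model of files-based resolution, using the corrected entries above. This is a simplified sketch, not the HP-UX resolver: the `parse_hosts()`, `lookup_name()` and `names_for_ip()` helpers are hypothetical, and it assumes /etc/hosts is consulted (per nsswitch.conf) with no DNS or caching involved.

```python
# Simplified model of /etc/hosts lookups: the first matching line wins.
# Assumes files-based resolution only (no DNS, no nscd-style caching).
HOSTS_TEXT = """\
10.20.194.226 a300sua6 a300sua6.company.com a300sua6-g0
10.20.37.226 a300sua6-g1 a300sua6
169.254.1.226 a300sua6-f1 a300sua6
169.254.2.226 a300sua6-f2 a300sua6
10.20.224.226 a300sua6-wc a300sua6
"""


def parse_hosts(text):
    """Return [(ip, [official_name, aliases...]), ...] in file order."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        ip, *names = line.split()
        entries.append((ip, names))
    return entries


def lookup_name(entries, name):
    """Forward lookup: IP of the first line listing name, or None."""
    for ip, names in entries:
        if name in names:
            return ip
    return None


def names_for_ip(entries, ip):
    """Reverse lookup: names on the first line for this IP."""
    for entry_ip, names in entries:
        if entry_ip == ip:
            return names
    return []


if __name__ == "__main__":
    entries = parse_hosts(HOSTS_TEXT)
    print(lookup_name(entries, "a300sua6"))
    for ip, _ in entries:
        print(ip, "a300sua6" in names_for_ip(entries, ip))
```

The forward lookup of a300sua6 yields 10.20.194.226 even though every line carries the alias, while each address still reverse-resolves to a name list that includes a300sua6.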

You suggest adding all hostnames to cmclnodelist; you could also simply list the IP addresses instead.

But again, this only works before the cluster is configured. After that, cmclnodelist is not used, and an address that does not resolve to the hostname (or have it as an alias) will be rejected.

Once all your nodes are in the cluster, you must have the hostname on every IP address to prevent command failures, and the contents of cmclnodelist on any node are irrelevant.
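A quick way to find the addresses that break this rule is to audit the hosts file. The Python sketch below is a hypothetical helper, not a Serviceguard utility, and it relies on an assumption specific to this site's naming: any name equal to the node name or starting with "<node>-" or "<node>." is taken to belong to that node. It reports each such address whose names do not include the bare hostname, using the original and corrected entries from this thread.

```python
# Hypothetical audit: list a node's addresses whose /etc/hosts names omit
# the bare hostname. Assumes the site convention that the node's interface
# names are "<node>", "<node>-<suffix>" or "<node>.<domain>".
ORIGINAL = """\
10.20.194.226 a300sua6 a300sua6.company.com a300sua6-g0
10.20.37.226 a300sua6-g1
169.254.1.226 a300sua6-f1
169.254.2.226 a300sua6-f2
10.20.224.226 a300sua6-wc
"""

FIXED = """\
10.20.194.226 a300sua6 a300sua6.company.com a300sua6-g0
10.20.37.226 a300sua6-g1 a300sua6
169.254.1.226 a300sua6-f1 a300sua6
169.254.2.226 a300sua6-f2 a300sua6
10.20.224.226 a300sua6-wc a300sua6
"""


def missing_hostname_alias(hosts_text, node):
    """Return IPs that appear to belong to node but lack node as a name."""
    problems = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments and blanks
        if not line:
            continue
        ip, *names = line.split()
        belongs = any(
            n == node or n.startswith(node + "-") or n.startswith(node + ".")
            for n in names
        )
        if belongs and node not in names:
            problems.append(ip)
    return problems


if __name__ == "__main__":
    print("original:", missing_hostname_alias(ORIGINAL, "a300sua6"))
    print("fixed:", missing_hostname_alias(FIXED, "a300sua6"))
```

Against the original entries it flags every address except the primary one; against the corrected entries it reports nothing.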

Craig Johnson_1
Regular Advisor

Re: cmquerycl gives "permission denied" on certain nodes

Fair enough, but it sure seems odd that we've never seen this behavior prior to SG 19.

I still say there's a timer or something that causes active SG daemons to reread the local copy of cmclnodelist. It's also possible our daily scripts are restarting something.

Remember these are clones of a pre-existing SG cluster node, so they came with junk on them, including the old cmclnodelist file from the other cluster.

I put the new file out there and wait overnight and voila, cmquerycl works.
John Bigg
Esteemed Contributor

Re: cmquerycl gives "permission denied" on certain nodes

Maybe it is cmclconfd caching the cmclnodelist info. Try killing the cmclconfd -p daemon.
Craig Johnson_1
Regular Advisor

Re: cmquerycl gives "permission denied" on certain nodes

I don't see cmclconfd running on any of the nodes. I don't think that will run until after the cluster has been established.