syslog error

Kranti Mahmud · ‎08-08-2010

Hi All,

The syslog on our main billing and rating servers (model: rp8440, OS: HP-UX 11.11) is showing some errors, which are attached here. The two servers run as an active-active cluster, and the cluster status looks OK: cmviewcl -v shows normal output. There is also no operational impact right now.

Can anyone explain the reason for this error message and how to resolve it?

Rgds-Kranti

Dont look BACK as U will miss something INFRONT!

Matti_Kurkela · ‎08-08-2010

The error message on both nodes is:
cmcld: Unable to connect to IP address 192.168.99.X and port 5300
cmcld: The error status reported is: Can't assign requested address

So it's some sort of network connectivity problem.
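
To see how often each address is affected, the log can be summarized with a short script. This is only a sketch: it assumes the log has been copied to a machine with Python 3, and the sample lines (timestamps, host name prefix, the .2 address) are made up to resemble the message quoted above. The function name failed_targets is hypothetical.

```python
import re

# Matches the cmcld message quoted above. Whatever precedes "cmcld"
# (timestamp, host name, PID) is assumed and simply ignored.
CONNECT_ERR = re.compile(
    r"cmcld.*Unable to connect to IP address (\d+\.\d+\.\d+\.\d+) and port (\d+)"
)

def failed_targets(lines):
    """Count cmcld connection failures per (address, port) pair."""
    counts = {}
    for line in lines:
        match = CONNECT_ERR.search(line)
        if match:
            key = (match.group(1), int(match.group(2)))
            counts[key] = counts.get(key, 0) + 1
    return counts

# Illustrative lines only, not taken from the attached log.
sample = [
    "Aug  8 10:00:01 tmbill cmcld: Unable to connect to IP address 192.168.99.2 and port 5300",
    "Aug  8 10:00:01 tmbill cmcld: The error status reported is: Can't assign requested address",
    "Aug  8 10:05:01 tmbill cmcld: Unable to connect to IP address 192.168.99.2 and port 5300",
]

for (addr, port), n in sorted(failed_targets(sample).items()):
    print(f"{addr}:{port} failed {n} time(s)")
```

For a real log, pass an open file object instead of the sample list, for example the copy of /var/adm/syslog/syslog.log from each node.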

Your nslookups indicate that the 192.168.99.X addresses are named _hb. In Serviceguard context, usually hb = HeartBeat.

So, this might mean there is some sort of problem in your heartbeat network connection. But since your cluster is working normally, there must be some alternate network path that is allowing the heartbeats to pass.

(You apparently have a well-built cluster with redundant heartbeat connections: your cluster is currently handling a fault in one heartbeat network path with no loss of production functionality at all!)

I guess the alternate heartbeat path is probably your production network. If that's true, the impact would become apparent if there was a problem in the production network. In that case, the cluster would lose its remaining heartbeat path and the nodes would go to split-brain prevention mode (dire emergency): both nodes would try to access the cluster lock, and the first node to get it would remain functional and start running all the packages. The other node would execute a TOC, and could not rejoin the cluster until the network problem was fixed.

When all the heartbeat paths are OK, a failure in the production network should not cause any TOCs: the cluster would simply move the packages to the node that has a working connection to the production network, if possible.

You should now find out what's wrong with your heartbeat connections. The tmbill node should have one network interface with the IP address 192.168.99.1, and the tmrate node should have one network interface with the IP address 192.168.99.2. These network interfaces should be completely separate from your production network connections.

If there is no network interface with those addresses assigned, someone has probably forgotten to add them to the /etc/rc.config.d/netconf file when setting up the cluster.

Look for network interfaces with no address assigned, and check your network cabling documentation to find out which interface it *should* be on each node. If the cabling documentation indicates there are two interfaces connected together with a crossover cable, those are probably the interfaces you're looking for.

If the heartbeat interfaces' IP addresses have been assigned properly, verify the link state ("lanadmin -g mibstats <PPA>", where <PPA> is the number of the interface, and check the Operative state). If there is no link, there is probably a cable failure (or a switch failure, if a switch is used instead of a crossover cable).

MK

Kranti Mahmud · ‎08-09-2010

Hi MK,

Thanks for the detailed explanation. I also found some more errors, which are attached here. Please check them and advise. From the error messages, I suspect there might be a disconnected or bad cable.

Please check and correct me if I'm wrong.

Rgds-Kranti

Dont look BACK as U will miss something INFRONT!

DeafFrog · ‎08-10-2010

Hi Kranti,

Do a linkloop test. Note that linkloop works between NICs configured with IPs in the same subnet.

Regards,

FrogIsDeaf

Matti_Kurkela · ‎08-10-2010

The cmscancl errors only say that you cannot remsh to the other node. If these systems have been secured, this may be intentional, so it tells us nothing useful about the network problem.

On the other hand, the netfmt listings clearly indicate that both nodes have detected a link loss at about the same times. This definitely looks like a bad connection, and it seems to have been failing intermittently for quite a long time.

If this is simply a crossover cable, it might be an incompletely plugged-in or damaged connector, or a damaged cable.

If the cables from both nodes are connected to a network switch, then the fact that both nodes are detecting the loss of link at about the same times suggests the switch might be failing: it might lose power or crash repeatedly.
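
The "same times on both nodes" observation can also be checked mechanically once the link-loss timestamps have been pulled out of each node's listing. The sketch below assumes that extraction has already been done by hand or by another script; the timestamps, the 5-second tolerance, and the function name shared_link_losses are all illustrative, not taken from the attachments.

```python
from datetime import datetime, timedelta

def shared_link_losses(node_a, node_b, tolerance=timedelta(seconds=5)):
    """Pair each link-loss time on node A with node B losses within tolerance."""
    return [(a, b) for a in node_a for b in node_b if abs(a - b) <= tolerance]

# Made-up timestamps standing in for the link-down events in each node's log.
tmbill_losses = [datetime(2010, 8, 9, 3, 15, 2), datetime(2010, 8, 9, 7, 41, 30)]
tmrate_losses = [datetime(2010, 8, 9, 3, 15, 4), datetime(2010, 8, 9, 9, 0, 0)]

matches = shared_link_losses(tmbill_losses, tmrate_losses)
print(f"{len(matches)} of {len(tmbill_losses)} losses on tmbill were also seen on tmrate")
```

If nearly every loss on one node has a partner on the other, the fault is in something the two interfaces share (the cable or the switch) rather than in a single NIC. The comparison only makes sense if the clocks on both nodes are reasonably in sync.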

The hardware path for the problematic interface is 0/0/0/1/0 on both nodes. According to the I/O path tables in the User Service Guide for rp8440, that's the SYS LAN connector on Core I/O 0. It's probably named "lan0", but you can verify it by using the "lanscan" command to see the list of interface names vs. hardware paths.
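
If you save the lanscan output to a file, the path-to-name lookup can be scripted. This is a rough sketch that assumes the usual lanscan layout (two header lines, then one row per interface starting with the hardware path and containing a lanN name). The sample rows, including the station addresses and the second hardware path, are invented for illustration, and interfaces_by_path is a hypothetical helper name.

```python
import re

# A hardware path is assumed to be digits separated by slashes, e.g. 0/0/0/1/0.
HW_PATH = re.compile(r"^\d+(/\d+)*$")

def interfaces_by_path(lanscan_text):
    """Map hardware path -> interface name (lanN) from lanscan-style text."""
    mapping = {}
    for line in lanscan_text.splitlines():
        fields = line.split()
        if not fields or not HW_PATH.match(fields[0]):
            continue  # header or blank line
        names = [f for f in fields[1:] if re.fullmatch(r"lan\d+", f)]
        if names:
            mapping[fields[0]] = names[0]
    return mapping

# Invented sample in the assumed layout; real output will differ.
SAMPLE = """\
Hardware Station        Crd Hdw   Net-Interface  NM  MAC       HP-DLPI DLPI
Path     Address        In# State NamePPA        ID  Type      Support Mjr#
0/0/0/1/0 0x00306E000001 0   UP    lan0 snap0     1   ETHER     Yes     119
1/0/0/1/0 0x00306E000002 1   UP    lan1 snap1     2   ETHER     Yes     119
"""

print(interfaces_by_path(SAMPLE).get("0/0/0/1/0"))
```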

MK

Basheer_2 · ‎08-10-2010

Hello Mahmud,

These are not your production network addresses, since they start with 192. Addresses like these are most often configured for heartbeats. Check the local cables between the nodes.
