inconsistent ping and linkloop

Steve Lewis · ‎09-05-2005

I am having trouble pinging between 3 servers on the same segment.
dev can ping cnv and miguel;
cnv can ping dev but not miguel;
miguel can ping dev but not cnv;

All 3 are plugged into the same CISCO 4006 1000-baseSX fibre switch.

Every port is set to autonegotiate. There are no speed problems between the machines that can talk to each other.

The connectivity problems are between miguel and various other machines on the same segment. miguel sees most of them but not all. Every other machine sees everything else, except that some of them, e.g. cnv, cannot see miguel.

[root]cnv:/> lanadmin -x 1
Speed = 1000 Full-Duplex.
Autonegotiation = On.
[root]cnv:/> linkloop -i 1 0x00306EF5D673
Link connectivity to LAN station: 0x00306EF5D673
error: get_msg2 getmsg failed, errno = 4
-- FAILED
frames sent : 1
frames received correctly : 0
reads that timed out : 1

[root]miguel:/> lanadmin -x 1
Speed = 1000 Full-Duplex.
Autonegotiation = On.
[root]miguel:/> linkloop -i 1 0x00306E047115
Link connectivity to LAN station: 0x00306E047115
error: get_msg2 getmsg failed, errno = 4
-- FAILED
frames sent : 1
frames received correctly : 0
reads that timed out : 1
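
For anyone collecting these results from several hosts, here is a minimal sketch that condenses saved linkloop output to one line per test. It assumes the output format shown above; the function name linkloop_summary is made up for illustration, and it only parses text on stdin, it does not run linkloop itself.

```shell
# Hypothetical helper: summarise one linkloop run read from stdin.
# Prints "<target MAC> FAILED" when the "-- FAILED" marker is present.
# Exit status: 0 = no failure marker, 1 = failed, 2 = no linkloop output.
linkloop_summary() {
    awk '
        /Link connectivity to LAN station:/ { mac = $NF }
        /-- FAILED/                          { failed = 1 }
        END {
            if (mac == "") { print "no linkloop output found"; exit 2 }
            print mac, (failed ? "FAILED" : "no failure reported")
            exit failed
        }'
}

# Example: the output captured on cnv above.
linkloop_summary <<'EOF' || echo "layer-2 test did not succeed"
Link connectivity to LAN station: 0x00306EF5D673
error: get_msg2 getmsg failed, errno = 4
-- FAILED
frames sent : 1
frames received correctly : 0
reads that timed out : 1
EOF
```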

These are the lanscan outputs:
Miguel:
4/0/2/1/0 0x00306EF5D673 1 UP lan1 snap1 6 ETHER Yes 119

Cnv:
0/1/0/0 0x00306E047115 1 UP lan1 snap1 2 ETHER Yes 119

Errno 4 is this:

#define EINTR 4 /* interrupted system call */

The fact that connectivity fails at the linkloop level means that IP and routing do not come into it.
So everything can see at least one other server on the same segment, but I don't understand the inconsistency. Especially since when I ping the broadcast address from miguel, some of these previously quiet servers suddenly respond, even though I cannot ping or linkloop them directly.
Any ideas?

Steven E. Protter · ‎09-05-2005

1) Physical inspection. Look for damaged fiber. Even the slightest damage can cause this to happen.

2) Run mstm/cstm/xstm to make sure the hardware is up. This could be a card getting ready to go south.

3) fcmsutil to see if there are fiber errors on the card.

4) Check software integrity with:
swverify \*

5) Verify that the switch works correctly with the Cisco utilities (don't ask me what they are; use the ? command).

If you have gathered that I'm on a fishing expedition, you are exactly right. I'd consider calling in hardware support for at least a telephone consult.

SEP

Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com

Bill Hassell · ‎09-05-2005

linkloop sends a very low-level (link layer) test frame, which is not passed through routers. If the servers are truly on the same subnet, I would check your switch. Or simply connect two of the problem machines with a crossover cable to confirm that linkloop does indeed work.

As far as ping is concerned, be sure to use IP addresses, not hostnames during troubleshooting. You don't want DNS config errors to create problems at this level. Also, check that your routers can be pinged. If not, the network admins may have disabled ping response from routers (a somewhat common practice) and HP-UX has a dead router detection process. To see if this has happened:

ndd -get /dev/ip ip_ire_status | grep -e IRE_GATEWAY -e flag

If you see the gateway has become disabled, you must turn off this feature (or have your network admins turn ping responses back on again). You can check the detection setting using:

ndd -get /dev/ip ip_ire_gw_probe

If the value = 1, then the feature is enabled.

To turn off this feature, edit the file /etc/rc.config.d/nddconf and add the following entry at the end of the file, using the next unused index (0 is shown here):

TRANSPORT_NAME[0]=ip
NDD_NAME[0]=ip_ire_gw_probe
NDD_VALUE[0]=0
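
If the file already has entries, index 0 is taken. A minimal sketch for finding the next unused index, assuming the NDD_NAME[n]= line format shown above; the function name next_ndd_index is made up for illustration, and commented-out entries are ignored.

```shell
# Hypothetical helper: print the next unused index in an nddconf-style file.
# Only uncommented NDD_NAME[n]= lines are counted.
next_ndd_index() {
    awk '
        /^[ \t]*NDD_NAME\[[0-9]+\]=/ {
            s = index($0, "[")                      # position of "["
            e = index($0, "]")                      # position of "]"
            i = substr($0, s + 1, e - s - 1) + 0    # the number between them
            if (i >= next_i) next_i = i + 1
        }
        END { print next_i + 0 }' "$1"
}

# Usage (prints 0 for a file with no active entries):
#   next_ndd_index /etc/rc.config.d/nddconf
```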

Now tell ndd to set the value and check it again:

ndd -c
ndd -get /dev/ip ip_ire_gw_probe

It should now be 0. NOTE: for 11.11, there was an annoying bug where the -c option did not work correctly. Be sure to get the latest patch for ndd.

Once the setting is 0, it will stay that way through a reboot.

Bill Hassell, sysadmin

Steve Lewis · ‎09-07-2005

The problem has now been solved by our network team.

Apparently the CISCO gigabit fibre switch had something called a 'channel group' configured on the 2 ports that were recently plugged in. This meant that the two ports were being treated as a single aggregate link, which (when connected to 2 different servers) caused a variety of bizarre shut-outs.

When the channel group was removed all communication was restored.
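
For anyone wondering why only some machines were shut out, here is a toy model. It is not the switch's real algorithm: it assumes the switch picks one of the two member links from the XOR of the source and destination MAC addresses, which is one common style of load balancing. The two MACs for miguel and cnv come from the lanscan output earlier in the thread; the third MAC and the choice of which member link miguel sits on are invented for illustration.

```shell
# Toy model only: choose a member link of a two-port channel group from
# the XOR of source and destination MAC. Real switches use a configurable
# hash; this is an assumption for illustration.
member_link() {
    echo $(( ($1 ^ $2) % 2 ))
}

MIGUEL=0x00306EF5D673      # from the lanscan output in this thread
CNV=0x00306E047115         # from the lanscan output in this thread
OTHER=0x00306E0000A2       # made-up MAC for a third sender

MIGUEL_LINK=1              # assumption: miguel is wired to member link 1

for src in "$CNV" "$OTHER"; do
    link=$(member_link "$src" "$MIGUEL")
    if [ "$link" -eq "$MIGUEL_LINK" ]; then
        echo "$src -> miguel: link $link, delivered"
    else
        echo "$src -> miguel: link $link, sent to the wrong server"
    fi
done
```

In this model the switch believes both ports lead to the same device, so a frame for miguel that hashes to the other member link goes to the wrong server and is dropped. Whether a given sender gets through depends only on its address, which is consistent with some machines seeing miguel and others not.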

Thanks to Steve and Bill for their time.
