
Re: ssh/connection issue

 
J. Maestre
Honored Contributor

ssh/connection issue

Hi everyone,

We have a 6-node SUSE 9 cluster that we virtualized on VMware ESX a while ago, and while the cluster itself seems to work fine, we are experiencing a weird issue with external connections.

The thing is, we can ssh from one node of the cluster to another just fine, but if we ssh from outside the vlan (through the same virtual interface), the session freezes as soon as we run any command that returns more than a few lines of text. A simple ls on a directory with more than, say, 10 files gets stuck (and the same thing happens with sftp).

I'm starting to think that the problem might lie somewhere else and not on the SUSE machines, but then again the vCenter box is in the same vlan and we can connect to it just fine.

Any ideas? I can't think of any SUSE misconfiguration I might have missed that would make the connection fail that way from outside the vlan but not from inside (firewalls have been disabled, by the way), which is why I'm asking. It'd be nice if I could at the very least rule out the OS side of things.

Thanks in advance.
7 REPLIES
Steven E. Protter
Exalted Contributor

Re: ssh/connection issue

Shalom,

Please re-run your test with ssh -vvv

Then post the errors here.

My crystal ball is broken, and without the actual error message I can't do much. -vvv works for sftp and scp as well.

This is very verbose.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Ivan Ferreira
Honored Contributor

Re: ssh/connection issue

>>> It'd be nice if I could at the very least discard the OS side of things.

If it works from another host on the same vlan, then the problem is probably not related to the OS.

By the way, can you describe your "vlans"?

I would run tcpdump to capture the packets on the server and the client, for example:

server> tcpdump -i <interface> -n 'port 22 and host <client_ip>'

client> tcpdump -i <interface> -n 'port 22 and host <server_ip>'

Then start an ssh session. This can tell us whether the packets are leaving the server and not reaching the client, or whether they never leave the server at all.
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
J. Maestre
Honored Contributor

Re: ssh/connection issue

Sorry about the lack of information. Right now I have to connect through a vpn with a not very versatile windows box, and getting useful data is a bit awkward.

I've attached the output from tcpdump on the server (10.10.1.5 being the client). It's exactly the part where I run an ls (I only had plink available, and the -v didn't show anything).

I'll try to get more info as soon as I can.
Ivan Ferreira
Honored Contributor

Re: ssh/connection issue

In your output, for example, you can see that the server is sending ack to seq 156 while the client keeps sending ack to seq 321, but the server ignores it.

You can see the message "IP bad-len 0" in your output.
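To get a feel for how often that message shows up, one option is to count the matching lines in a saved text capture. This is only a rough sketch: count_bad_len is a hypothetical helper name, and it assumes the tcpdump output was redirected to a plain text file.

```shell
# Count the lines in a saved tcpdump text log that mention "bad-len".
# $1 is the path to the log file (an assumption: tcpdump output saved as text).
count_bad_len() {
    grep -c 'bad-len' "$1"
}

# Example call (hypothetical file name):
#   count_bad_len server-capture.txt
```

A large count on the server side only, with few or none on the client, would fit the offload theory, though it does not prove it.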

Searching for that message turns up some information about TCP Segmentation Offload (TSO):

http://www.network-builders.com/tcp-segmentation-offload-t54157.html
http://seer.entsupport.symantec.com/docs/294308.htm
https://bugzilla.redhat.com/show_bug.cgi?id=519535

I would try disabling TSO.
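A minimal sketch of how that could be done with ethtool, assuming the interface is called eth0 (a placeholder) and that the virtual NIC driver lets you toggle the setting. The run wrapper is a hypothetical helper that only prints the commands unless RUN=1 is set, since they need root.

```shell
# Placeholder interface name; substitute the NIC that carries the ssh traffic.
IFACE="${IFACE:-eth0}"

# Print each command instead of running it, unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

run ethtool -k "$IFACE"          # show the current offload settings
run ethtool -K "$IFACE" tso off  # turn TCP segmentation offload off
run ethtool -k "$IFACE"          # confirm the change
```

A change made this way does not survive a reboot, so it is a cheap test: if the ssh freezes stop, the setting can be made permanent afterwards.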

Cheers.
Ivan Ferreira
Honored Contributor

Re: ssh/connection issue

Also, please provide the output of netstat -ni, and check for errors/collisions on your ESX server.
J. Maestre
Honored Contributor

Re: ssh/connection issue

On a quick side note, I'm not sure how to make sense of this, but it works fine after adding another virtual interface and bonding the two together.
Tim Nelson
Honored Contributor

Re: ssh/connection issue

If your ESX server interfaces are bonded across multiple switches, that may be worth investigating.

Not all switches are capable of channel groups (or EtherChannel) across switches.

Bonding defaults to round-robin, which may also behave badly across multiple switches. Try bonding mode=1 (i.e. active-backup, failover).
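To see which mode a Linux bond is actually using, the kernel's bonding status file can be read. A small sketch, assuming the bond is named bond0 (a placeholder); bond_mode is a hypothetical helper name.

```shell
# Print the text after "Bonding Mode:" from a bonding status file.
# Defaults to /proc/net/bonding/bond0; pass another path for a different bond.
bond_mode() {
    sed -n 's/^Bonding Mode: //p' "${1:-/proc/net/bonding/bond0}"
}

# Example call:
#   bond_mode
```

Round-robin is reported as "load balancing (round-robin)" and mode=1 as "fault-tolerance (active-backup)".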

Try removing the bonding and configuring just one interface; if your problem goes away, then it is the bonding config.

I have also heard that beaconing can cause problems, so that may be worth researching.