janc
Why Can't I Connect to Grex?   May 26 19:10 UTC 2007

I have an older computer and a newer computer both running Linux.  They both
sit on my desk, and connect to the same router.  However, I can't connect to
Grex from the new computer.  I can ping grex just fine, but ssh, http, https,
and telnet seem to fail.  Attempts to connect just hang.  I can login with
ftp, but as soon as I attempt to do anything that sends a bigger hunk of
data, it hangs and eventually the connection times out.  I haven't noticed
any computers other than Grex that the new computer can't connect to.  
The old computer connects to Grex just fine.  The newer computer is a
64-bit system, for what it's worth.

I'm guessing that it has something to do with fragmentation, but even if that
is true, I don't know what to do about it.

Any suggestions?
26 responses total.
cross
response 1 of 26: May 26 23:59 UTC 2007

I suspect there might be some subtle bug in Grex's TCP stack that could be
causing these problems, or perhaps it's a bug in our firewall configuration.
I'd like to upgrade grex to the latest version of OpenBSD to see if that fixes
these sorts of problems (that we're hearing about more and more frequently,
but not at an alarming rate), as well as some of the mail problems we've been
experiencing.
mcnally
response 2 of 26: May 27 00:39 UTC 2007

 Offhand, I wonder whether this behavior has anything to do with the problems
 that some mail servers (e.g. U of M's ITD) have establishing connections to
 Grex's mail server.
janc
response 3 of 26: May 27 02:25 UTC 2007

FWIW, I just booted my new system into Windows (which I pretty much use
exclusively for testing if things work in Windows these days), and it
works in Windows.

This suggests to me that there ought to be some kind of setting I can
change on the Linux side that would make it work too.

But I don't even know how to begin to look.
mcnally
response 4 of 26: May 27 02:50 UTC 2007

 I'd start by googling the exact error message you get, plus "OpenBSD" plus
 (whatever version of Linux you are running at home.)

 If it's a well-known problem my guess is it'll have made it into a support
 forum somewhere.
cross
response 5 of 26: May 27 03:57 UTC 2007

Regarding #2: I've begun to strongly suspect that that is, in fact, the case.
vivekm1234
response 6 of 26: May 28 11:43 UTC 2007

Could you post the kernel and distro details? "fudge" had a similar problem
some time back (thread #622 oldunix); however, no one mentioned a fix, so I
don't suppose that thread would be of any value.
janc
response 7 of 26: Jun 24 16:07 UTC 2007

I don't get an error message.  The connection for http and ssh just
hangs.  If I try to telnet to Grex, it claims to have connected, but I
never get a login prompt.  If I hit keys they just echo back to me,
returns echoing as ^M and so forth.  I can't find any messages in log
files that seem to relate.  Dunno if any messages appear in Grex's log
files.

I'm running openSUSE 10.2.  The kernel version is 2.6.18.2-34-default. 
The processor is a 64-bit AMD processor, and this is a 64-bit version of
SUSE.
janc
response 8 of 26: Jun 24 16:21 UTC 2007

I tried watching the logs on Grex while I tried to connect via ssh.  Nothing
seems to get logged when I connect, but when I disconnect, /var/log/authlog
says 

    Jun 24 12:13:40 grex sshd[21001]: Connection closed by 66.167.211.109

Since this doesn't seem to be an exceptionally common message in authlog,
I assume it means that the connection was broken while we were still trying
to exchange startup data for the ssh connection.
janc
response 9 of 26: Jun 24 16:25 UTC 2007

Here's an attempt at an ftp connection:

   % ftp grex.org
   Connected to grex.org.
   220 grex.cyberspace.org FTP server (Version 6.6/OpenBSD) ready.
   Name (grex.org:jan): janc
   331 Password required for janc.
   Password:
   230 User janc logged in.
   Remote system type is UNIX.
   Using binary mode to transfer files.
   ftp> ls

   421 Service not available, remote server timed out. Connection closed
   ftp>

Looks like some data was exchanged, but not much.
janc
response 10 of 26: Jun 24 16:30 UTC 2007

During an ftp login like the one above, Grex's xferlog says:

   Jun 24 12:26:14 grex ftpd[10154]: connection from h-66-167-211-109.sfldmidn.dynamic.covad.net
   Jun 24 12:26:42 grex ftpd[25363]: FTP LOGIN FROM h-66-167-211-109.sfldmidn.dynamic.covad.net as janc

The password I enter is successfully sent over, and I get confirmation that
it is correct (or not), but can't actually seem to do much of anything.
janc
response 11 of 26: Jun 24 16:33 UTC 2007

If I wait on an ssh connection for long enough it eventually times out. 
Grex's authlog file says

  Jun 24 12:23:32 grex sshd[11442]: fatal: Timeout before authentication for 66.167.211.109
  Jun 24 12:23:32 grex sshd[4689]: fatal: Timeout before authentication for 66.167.211.109

On my end, I get

  Read from socket failed: Connection reset by peer

I think the Grex end timed out well before my end timed out, but both took
a while.
mcnally
response 12 of 26: Jun 24 16:41 UTC 2007

 Though it might be difficult to distinguish what is happening, perhaps a
 trace from a sniffer like ethereal/wireshark might help determine what's
 going on.  The divergence between the unsuccessful connection attempt and
 the successful one will almost certainly occur very early in the 
 conversation.
janc
response 13 of 26: Jun 24 16:49 UTC 2007

I grepped through all the logs for my IP address and didn't find any other hints
about what might be going on.  There are some successful http requests
logged, but those are from one of the other computers in my house.


Here's the output from "ssh -v -l janc grex.org":

  % ssh -v -l janc grex.org
  OpenSSH_4.4p1, OpenSSL 0.9.8d 28 Sep 2006
  debug1: Reading configuration data /etc/ssh/ssh_config
  debug1: Applying options for *
  debug1: Connecting to grex.org [216.86.77.194] port 22.
  debug1: Connection established.
  debug1: identity file /home/jan/.ssh/identity type -1
  debug1: identity file /home/jan/.ssh/id_rsa type -1
  debug1: identity file /home/jan/.ssh/id_dsa type -1
  debug1: Remote protocol version 1.99, remote software version OpenSSH_4.2
  debug1: match: OpenSSH_4.2 pat OpenSSH*
  debug1: Enabling compatibility mode for protocol 2.0
  debug1: Local version string SSH-2.0-OpenSSH_4.4
  debug1: SSH2_MSG_KEXINIT sent

And then it hangs.

If I telnet to port 22 (the ssh port) I get a "SSH-1.99-OpenSSH_4.2" message.
I don't know enough about the SSH protocol to know how to respond, but at least
a bit seems to be working.
janc
response 14 of 26: Jun 24 17:22 UTC 2007

I think I have sniffer software on my Linux box, which I used once
before with limited success.  I'll have to give that a try.
cross
response 15 of 26: Jun 24 19:09 UTC 2007

Even seeing what packets get sent via, e.g., tcpdump might be very useful.
mcnally
response 16 of 26: Jun 24 19:13 UTC 2007

 I've put a packet capture from a machine which connects successfully
 (via ssh, running on Ubuntu Linux 6.10) in my home directory as 
 ~mcnally/ssh_capture.pcap, in case Jan wants something to use for 
 comparison purposes.  Ethereal or wireshark will happily open it
 (or etherpeek or just about any decent modern sniffer.)

 It's a capture of me doing "ssh janc@cyberspace.org" from my machine
 (why janc?  I figured it'd be easier to compare similar connection
 attempts.)  I killed the ssh process once I got a "Password: " prompt,
 under the theory that whatever's happening to people who can't connect
 seems to be happening prior to that point.
janc
response 17 of 26: Jun 24 21:59 UTC 2007

Since it's simple, I started with the tcpdump.  Here's a 'tcpdump -v' from
when I did 'ssh -v -l janc grex.org' up until the time when it was firmly hung.
"flounder.home" is my computer.

17:53:50.815541 IP (tos 0x0, ttl  64, id 63187, offset 0, flags [DF], proto:
TCP (6), length: 60) flounder.home.14926 > grex.cyberspace.org.ssh: S, cksum
0x42df (correct), 3372245805:3372245805(0) win 5840 <mss 1460,sackOK,timestamp
150705662 0,nop,wscale 7>

17:53:50.883599 IP (tos 0x0, ttl  56, id 33799, offset 0, flags [none], proto:
TCP (6), length: 64) grex.cyberspace.org.ssh > flounder.home.14926: S, cksum
0xc768 (correct), 3071614353:3071614353(0) ack 3372245806 win 16384 <mss
1452,nop,nop,sackOK,nop,wscale 0,nop,nop,timestamp 3040392798 150705662>

17:53:50.883669 IP (tos 0x0, ttl  64, id 63188, offset 0, flags [DF], proto:
TCP (6), length: 52) flounder.home.14926 > grex.cyberspace.org.ssh: ., cksum
0x47ed (correct), ack 1 win 46 <nop,nop,timestamp 150705679 3040392798>

17:53:50.958296 IP (tos 0x0, ttl  56, id 35720, offset 0, flags [none], proto:
TCP (6), length: 73) grex.cyberspace.org.ssh > flounder.home.14926: P, cksum
0x07de (correct), 1:22(21) ack 1 win 17280 <nop,nop,timestamp 3040392799
150705679>

17:53:50.958504 IP (tos 0x0, ttl  64, id 63189, offset 0, flags [DF], proto:
TCP (6), length: 52) flounder.home.14926 > grex.cyberspace.org.ssh: ., cksum
0x47c5 (correct), ack 22 win 46 <nop,nop,timestamp 150705697 3040392799>

17:53:50.958714 IP (tos 0x0, ttl  64, id 63190, offset 0, flags [DF], proto:
TCP (6), length: 72) flounder.home.14926 > grex.cyberspace.org.ssh: P, cksum
0x9103 (correct), 1:21(20) ack 22 win 46 <nop,nop,timestamp 150705697
3040392799>

17:53:51.208457 IP (tos 0x0, ttl  56, id 38997, offset 0, flags [none], proto:
TCP (6), length: 52) grex.cyberspace.org.ssh > flounder.home.14926: ., cksum
0x019f (correct), ack 21 win 17280 <nop,nop,timestamp 3040392799 150705697>

17:53:51.208590 IP (tos 0x0, ttl  64, id 63191, offset 0, flags [DF], proto:
TCP (6), length: 804) flounder.home.14926 > grex.cyberspace.org.ssh: P
21:773(752) ack 22 win 46 <nop,nop,timestamp 150705760 3040392799>

17:53:51.477530 IP (tos 0x0, ttl  56, id 39564, offset 0, flags [none], proto:
TCP (6), length: 52) grex.cyberspace.org.ssh > flounder.home.14926: ., cksum
0xfe6e (correct), ack 773 win 17280 <nop,nop,timestamp 3040392800 150705760>
janc
response 18 of 26: Jun 24 22:03 UTC 2007

If I telnet to Grex, it connects, but I never get a password prompt.  It
just echoes back what I type.  Each time I type a character, 'tcpdump -v'
shows:

18:01:10.010262 IP (tos 0x10, ttl  64, id 36246, offset 0, flags [DF], proto:
TCP (6), length: 53) flounder.home.14468 > grex.cyberspace.org.telnet: P,
cksum 0x8870 (correct), 160:161(1) ack 67 win 46 <nop,nop,timestamp 150815453
2542462986>
18:01:10.245447 IP (tos 0x0, ttl  56, id 29727, offset 0, flags [DF], proto:
TCP (6), length: 52) grex.cyberspace.org.telnet > flounder.home.14468: .,
cksum 0x9bfd (correct), ack 161 win 17280 <nop,nop,timestamp 2542463137
150815453>

which looks like the character being sent and echoed back.
arthurp
response 19 of 26: Jun 24 22:50 UTC 2007

I have this problem sometimes from one computer running fc5+. 
Everything else in the house works.  I did a trace back a while and it
looked similar to Jan's.  Then I quit having the problem so I didn't
pursue it further.  That was quite some time ago.  Several months.  The
fix coincided with some staff activity, so I figured it was fixed.

For me it only affected SSL connections, but then I never connect to
grex without SSL even for backtalk.  I doubt I even checked to see if
insecure protocols worked.  At the time I decided that it was something
wonky with SSL/DF/grex as I don't have any other troubles at this end.
janc
response 20 of 26: Jun 25 15:24 UTC 2007

OK, I have a clue:

I do have wireshark on my computer.  I downloaded Mike's trace of his ssh
connection, and captured one myself.  I hardly needed to do the comparison
because there was a fairly obvious problem in mine.

After the DNS lookup of Grex, we see the following perfectly fine packets:

1  flounder -> grex     SYN
2  grex -> flounder     SYN
3  flounder -> grex     ACK
4  grex -> flounder     Server Protocol: SSH-1.99-OpenSSH_4.2
5  flounder -> grex     ACK
6  flounder -> grex     Client Protocol: SSH-2.0-OpenSSH_4.4

Then I get a weirdness.  Here's the full ASCII dump of the seventh packet
from wireshark:

No.     Time        Source                Destination           Protocol Info
      7 0.402649    216.86.77.194         192.168.2.4           TCP      [TCP Previous segment lost] ssh > 28702 [ACK] Seq=726 Ack=21 Win=17280 Len=0 TSV=1661941788 TSER=150966646

Frame 7 (66 bytes on wire, 66 bytes captured)
Ethernet II, Src: Cisco-Li_a5:65:48 (00:16:b6:a5:65:48), Dst: Micro-St_de:f5:34 (00:13:d3:de:f5:34)
Internet Protocol, Src: 216.86.77.194 (216.86.77.194), Dst: 192.168.2.4 (192.168.2.4)
Transmission Control Protocol, Src Port: ssh (22), Dst Port: 28702 (28702), Seq: 726, Ack: 21, Len: 0

I'm not totally sure what this is, but the "TCP Previous segment lost" bit
on a packet sent from grex to flounder doesn't sound good to me.

In Mike's dump, at this point there was a "Server: Key Exchange Init" packet
sent from grex to flounder (number 19 in his dump).

Things keep going beyond this point though

8  flounder -> grex     Client Key Exchange Init
9  grex -> flounder     ACK

And then we hang.  In McNally's dump, there was an ACK sent back to Grex
after the server key exchange init packet, but that never happened with
my computer because the server key init packet got mangled.  So the connection
hangs with my computer waiting for the server key init, and grex hanging
waiting for the ACK on the server key init packet it sent.

Looking through the packet sizes:

        Number   Size (me)     Size (mcnally)
          1         74              74
          2         78              78
          3         66              66
          4         87              87
          5         66              66
          6         86             106
          7         66             770       <----

So it really looks like the first time Grex tries to send a larger packet,
we lose most of it.

I'm seeing many similar "Previous segment lost" packets when I try to make
other kinds of connections to Grex.  I'm pretty sure these things are the
problem, but I haven't a clue what causes them.
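
The size comparison above can be redone mechanically.  This is only a rough
sketch: the frame sizes are copied from the table, the helper name and the
100-byte threshold are made up for illustration, and the "22" is the next
sequence number after Grex's 21-byte banner as seen in the tcpdump trace in
response 17 (a different connection attempt, so treat it as an assumption).

```python
# Frame sizes in bytes, copied from the table in this response.
mine = [74, 78, 66, 87, 66, 86, 66]
mcnally = [74, 78, 66, 87, 66, 106, 770]


def first_large_gap(a, b, threshold=100):
    """Return the 1-based number of the first frame whose sizes differ
    by more than `threshold` bytes, or None if there is no such frame."""
    for number, (x, y) in enumerate(zip(a, b), start=1):
        if abs(x - y) > threshold:
            return number
    return None


# A data-free ACK is 66 bytes on the wire in both captures (frames 3 and 5).
EMPTY_FRAME = 66

# Wireshark shows Grex's seventh packet at Seq=726, while the banner ended
# just before sequence number 22 (assumed from the tcpdump in response 17).
missing_payload = 726 - 22

print(first_large_gap(mine, mcnally))
print(missing_payload, missing_payload + EMPTY_FRAME)
```

If the assumption holds, the gap in sequence numbers is the same size as the
770-byte frame in mcnally's capture, which would mean that one whole segment
never showed up at my end rather than part of it.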
janc
response 21 of 26: Jun 25 15:48 UTC 2007

I don't know a lot about fragmentation.  I'm a bit surprised that a 770 byte
packet got fragmented at all.  I think MTUs are usually larger than that.
But something must be "unusual" about my computer, or more computers would
have trouble connecting to Grex.  Is the MTU discovery working right?

Don't really know either whether the packet was fragmented by Grex's computer
or whether it was sent with a "may fragment" flag and fragmented by something
further down the line.
janc
response 22 of 26: Jun 27 14:08 UTC 2007

So, I did "ifconfig" on both Grex and my computer, and both have MTU set at
1500.  So why is a packet of size 770 being fragmented?

I did a 'tracepath grex.org' on my computer (this does path MTU discovery)
and got:

 1:  flounder.home (192.168.2.4)                            0.205ms pmtu 1492
 1:  router (192.168.2.1)                                 asymm 106   0.544ms
 2:  h-72-245-37-1.sfldmidn.dynamic.covad.net (72.245.37.1) asymm  1  95.463ms
 3:  192.168.17.101 (192.168.17.101)                      asymm  2  87.163ms
 4:  ge-6-12-133.car2.Detroit1.Level3.net (166.90.203.1)  asymm  3  83.990ms
 5:  ae-11-11.car1.Detroit1.Level3.net (4.69.133.245)     asymm  4  80.903ms
 6:  ae-8-8.ebr2.Chicago1.Level3.net (4.69.133.242)       asymm  5  90.831ms
 7:  ae-2-54.bbr2.Chicago1.Level3.net (4.68.101.97)        83.206ms
 8:  so-0-1-0.mp2.Detroit1.Level3.net (64.159.0.198)      asymm 10  86.625ms
 9:  so-10-0.hsa1.Detroit1.Level3.net (4.68.115.2)        asymm  8  83.993ms
10:  unknown.Level3.net (63.209.134.18)                    81.766ms
11:  tnmi-170-200-54-69.ip.telnetww.com (69.54.200.170)   asymm 10  80.879ms
12:  no reply
13:  no reply
14:  ypsi-sfld.provide.net (216.86.64.2)                  asymm 13  87.456ms
15:  grex.cyberspace.org (216.86.77.194)                  asymm 14 150.206ms reached
     Resume: pmtu 1492 hops 15 back 14

This shows a path MTU of 1492, which is pretty much what you'd expect, and
which doesn't explain anything.  Maybe a tracepath to me from Grex would be
more informative, but Grex doesn't have tracepath on it and I'm not convinced
that it is worth the trouble to build.
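
For what it's worth, the arithmetic behind "doesn't explain anything" can be
written down.  This is a sketch under stated assumptions: the sizes in
response 20 are Ethernet frame sizes (the SYN is 74 bytes there and IP length
60 in the tcpdump in response 17, so 14 bytes of Ethernet header), and the MSS
calculation assumes plain 20-byte IPv4 and TCP headers.  The function names
are hypothetical.

```python
ETHERNET_HEADER = 14  # bytes, Ethernet II without a VLAN tag
IP_HEADER = 20        # bytes, IPv4 without options
TCP_HEADER = 20       # bytes, TCP without options


def ip_length(frame_size):
    """IP datagram length for a given Ethernet frame size."""
    return frame_size - ETHERNET_HEADER


def needs_fragmentation(frame_size, path_mtu):
    """True if the IP datagram inside the frame exceeds the path MTU."""
    return ip_length(frame_size) > path_mtu


def mss_for_mtu(mtu):
    """Largest TCP payload that fits in one unfragmented datagram."""
    return mtu - IP_HEADER - TCP_HEADER


print(ip_length(770))                   # IP length of the 770-byte frame
print(needs_fragmentation(770, 1492))   # would it need fragmenting?
print(mss_for_mtu(1492), mss_for_mtu(1500))
```

On these numbers the 770-byte frame is well under the 1492-byte path MTU, so
fragmentation by itself would not be expected.  The MSS for a 1492-byte MTU
also comes out as 1452, the value in Grex's SYN-ACK in response 17.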
mcnally
response 23 of 26: Jun 27 17:18 UTC 2007

This is way out in left field, but what kind of device are you using for NAT
on your end, Jan?  And how much imposition would it be to hang your machine
directly on your incoming connection for a moment and give ssh a try, just to
eliminate your NAT as a possible cause of the problem?
cross
response 24 of 26: Jun 27 17:30 UTC 2007

Joe Gelinas observed something similar with Linux at umich talking to Grex;
it had to do with TCP window optimizations in the Linux 2.6 kernel, I think.
I wonder if this is related....
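
One possible reading of the "TCP window optimizations" remark, offered as a
guess rather than a diagnosis: in the tcpdump trace in response 17, flounder's
SYN carries "wscale 7" and its later packets advertise "win 46", while Grex's
SYN-ACK carries "wscale 0".  With TCP window scaling, the window field on
non-SYN packets is shifted left by the sender's scale factor.  The sketch
below only does that arithmetic; the function name is made up, and nothing in
this thread confirms that anything on the path mishandles the option.

```python
def effective_window(raw_window, shift):
    """Receive window in bytes for a raw 16-bit window field and a
    window-scale shift count (the 'wscale' value from the SYN)."""
    return raw_window << shift


# flounder advertises win 46 after offering wscale 7 in its SYN.
print(effective_window(46, 7))

# The same field as it would be read if the scale factor were ignored
# or dropped somewhere along the way (hypothetical scenario).
print(effective_window(46, 0))

# Grex uses wscale 0, so its win 17280 is taken at face value.
print(effective_window(17280, 0))
```

If the scale factor were lost in transit, Grex would believe it could send
only 46 bytes at a time, which might be worth ruling out alongside the NAT
test suggested in response 23.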

- Backtalk version 1.3.30 - Copyright 1996-2006, Jan Wolter and Steve Weiss