|
|
| Author |
Message |
janc
|
|
Why Can't I Connect to Grex?
|
May 26 19:10 UTC 2007 |
I have an older computer and a newer computer both running Linux. They both
sit on my desk, and connect to the same router. However, I can't connect to
Grex from the new computer. I can ping grex just fine, but ssh, http, https,
and telnet seem to fail. Attempts to connect just hang. I can login with
ftp, but as soon as I attempt to do anything that sends a bigger hunk of
data, it hangs and eventually the connection times out. I haven't noticed
any computers other than Grex that the new computer can't connect to.
The old computer connects to Grex just fine. The newer computer is a
64-bit system, for what it's worth.
I'm guessing that it has something to do with fragmentation, but even if that
is true, I don't know what to do about it.
Any suggestions?
|
| 26 responses total. |
cross
|
|
response 1 of 26:
|
May 26 23:59 UTC 2007 |
I suspect there might be some subtle bug in Grex's TCP stack that could be
causing these problems, or perhaps it's a bug in our firewall configuration.
I'd like to upgrade grex to the latest version of OpenBSD to see if that fixes
these sorts of problems (that we're hearing about more and more frequently,
but not at an alarming rate), as well as some of the mail problems we've been
experiencing.
|
mcnally
|
|
response 2 of 26:
|
May 27 00:39 UTC 2007 |
Offhand, I wonder whether this behavior has anything to do with the problems
that some mail servers (e.g. U of M's ITD) have establishing connections to
Grex's mail server.
|
janc
|
|
response 3 of 26:
|
May 27 02:25 UTC 2007 |
FWIW, I just booted my new system into Windows (which I pretty much use
exclusively for testing if things work in Windows these days), and it
works in Windows.
This suggests to me that there ought to be some kind of setting I can
change on the Linux side that would make it work too.
But I don't even know how to begin to look.
|
mcnally
|
|
response 4 of 26:
|
May 27 02:50 UTC 2007 |
I'd start by googling the exact error message you get, plus "OpenBSD" plus
(whatever version of Linux you are running at home.)
If it's a well-known problem my guess is it'll have made it into a support
forum somewhere.
|
cross
|
|
response 5 of 26:
|
May 27 03:57 UTC 2007 |
Regarding #2: I've begun to strongly suspect that that is, in fact, the case.
|
vivekm1234
|
|
response 6 of 26:
|
May 28 11:43 UTC 2007 |
Could you post the kernel and distro details? "fudge" had a similar problem
some time back (thread #622 oldunix); however, no one mentioned a fix, so I don't
suppose that thread would be of any value.
|
janc
|
|
response 7 of 26:
|
Jun 24 16:07 UTC 2007 |
I don't get an error message. The connection for http and ssh just
hangs. If I try to telnet to Grex, it claims to have connected, but I
never get a login prompt. If I hit keys they just echo back to me,
returns echoing as ^M and so forth. I can't find any messages in log
files that seem to relate. Dunno if any messages appear in Grex's log
files.
I'm running openSUSE 10.2. The kernel version is 2.6.18.2-34-default.
The processor is a 64-bit AMD processor, and this is a 64-bit version of
SUSE.
|
janc
|
|
response 8 of 26:
|
Jun 24 16:21 UTC 2007 |
I tried watching the logs on Grex while I tried to connect via ssh. Nothing
seems to get logged when I connect, but when I disconnect, /var/log/authlog
says
Jun 24 12:13:40 grex sshd[21001]: Connection closed by 66.167.211.109
Since this doesn't seem to be an exceptionally common message in authlog,
I assume it means that the connection was broken while we were still trying
to exchange startup data for the ssh connection.
|
janc
|
|
response 9 of 26:
|
Jun 24 16:25 UTC 2007 |
Here's an attempt at an ftp connection:
% ftp grex.org
Connected to grex.org.
220 grex.cyberspace.org FTP server (Version 6.6/OpenBSD) ready.
Name (grex.org:jan): janc
331 Password required for janc.
Password:
230 User janc logged in.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> ls
421 Service not available, remote server timed out. Connection closed
ftp>
Looks like some data was exchanged, but not much.
|
janc
|
|
response 10 of 26:
|
Jun 24 16:30 UTC 2007 |
During an ftp login like the one above, Grex's xferlog says:
Jun 24 12:26:14 grex ftpd[10154]: connection from h-66-167-211-109.sfldmidn.dynamic.covad.net
Jun 24 12:26:42 grex ftpd[25363]: FTP LOGIN FROM h-66-167-211-109.sfldmidn.dynamic.covad.net as janc
The password I enter is successfully sent over, and I get confirmation that
it is correct (or not), but can't actually seem to do much of anything.
|
janc
|
|
response 11 of 26:
|
Jun 24 16:33 UTC 2007 |
If I wait on an ssh connection for long enough it eventually times out.
Grex's authlog file says
Jun 24 12:23:32 grex sshd[11442]: fatal: Timeout before authentication for 66.167.211.109
Jun 24 12:23:32 grex sshd[4689]: fatal: Timeout before authentication for 66.167.211.109
On my end, I get
Read from socket failed: Connection reset by peer
I think the Grex end timed out well before my end timed out, but both took
a while.
|
mcnally
|
|
response 12 of 26:
|
Jun 24 16:41 UTC 2007 |
Though it might be difficult to distinguish what is happening, perhaps a
trace from a sniffer like ethereal/wireshark might help determine what's
going on. The divergence between the unsuccessful connection attempt and
the successful one will almost certainly occur very early in the
conversation.
|
janc
|
|
response 13 of 26:
|
Jun 24 16:49 UTC 2007 |
I grepped through all the logs for my IP address and didn't find any other hints
about what might be going on. There are some successful http requests
logged, but those are from one of the other computers in my house.
Here's the output from "ssh -v -l janc grex.org":
% ssh -v -l janc grex.org
OpenSSH_4.4p1, OpenSSL 0.9.8d 28 Sep 2006
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: Connecting to grex.org [216.86.77.194] port 22.
debug1: Connection established.
debug1: identity file /home/jan/.ssh/identity type -1
debug1: identity file /home/jan/.ssh/id_rsa type -1
debug1: identity file /home/jan/.ssh/id_dsa type -1
debug1: Remote protocol version 1.99, remote software version OpenSSH_4.2
debug1: match: OpenSSH_4.2 pat OpenSSH*
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_4.4
debug1: SSH2_MSG_KEXINIT sent
And then it hangs.
If I telnet to port 22 (the ssh port) I get a "SSH-1.99-OpenSSH_4.2" message.
I don't know enough about the SSH protocol to know how to respond, but at least
a bit seems to be working.
|
janc
|
|
response 14 of 26:
|
Jun 24 17:22 UTC 2007 |
I think I have sniffer software on my Linux box, which I used once
before with limited success. I'll have to give that a try.
|
cross
|
|
response 15 of 26:
|
Jun 24 19:09 UTC 2007 |
Even seeing what packets get sent via, e.g., tcpdump might be very useful.
|
mcnally
|
|
response 16 of 26:
|
Jun 24 19:13 UTC 2007 |
I've put a packet capture from a machine which connects successfully
(via ssh, running on Ubuntu Linux 6.10) in my home directory as
~mcnally/ssh_capture.pcap, in case Jan wants something to use for
comparison purposes. Ethereal or wireshark will happily open it
(or etherpeek or just about any decent modern sniffer.)
It's a capture of me doing "ssh janc@cyberspace.org" from my machine
(why janc? I figured it'd be easier to compare similar connection
attempts.) I killed the ssh process once I got a "Password: " prompt,
under the theory that whatever's happening to people who can't connect
seems to be happening prior to that point.
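
For anyone who wants to compare captures without a GUI sniffer, here is a rough sketch (an editorial addition, not something posted in the original thread) that lists the frame sizes in a capture using only Python's standard library. It assumes the file is in the classic libpcap format rather than pcapng, and the function name `pcap_frame_sizes` is made up for this example.

```python
import struct

def pcap_frame_sizes(data):
    """Return the on-the-wire size of every frame in a classic pcap file.

    `data` is the raw file content as bytes. Assumes the classic libpcap
    format (24-byte global header, 16-byte per-record header), not pcapng.
    """
    magic = data[:4]
    # The magic number is written in the byte order of the capturing host.
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"   # little-endian (microsecond or nanosecond variant)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"   # big-endian
    else:
        raise ValueError("not a classic pcap file")

    sizes = []
    offset = 24  # skip the global header
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_frac, captured length, original length.
        _sec, _frac, incl_len, orig_len = struct.unpack_from(
            endian + "IIII", data, offset)
        sizes.append(orig_len)
        offset += 16 + incl_len  # jump over the captured bytes
    return sizes
```

Something like `pcap_frame_sizes(open("ssh_capture.pcap", "rb").read())` should then give one size per frame, which makes it easy to line up a working trace against a failing one.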
|
janc
|
|
response 17 of 26:
|
Jun 24 21:59 UTC 2007 |
Since it's simple, I started with the tcpdump. Here's a 'tcpdump -v' from
when I did 'ssh -v -l janc grex.org' up until the time when it was firmly hung.
"flounder.home" is my computer.
17:53:50.815541 IP (tos 0x0, ttl 64, id 63187, offset 0, flags [DF], proto:
TCP (6), length: 60) flounder.home.14926 > grex.cyberspace.org.ssh: S, cksum
0x42df (correct), 3372245805:3372245805(0) win 5840 <mss 1460,sackOK,timestamp
150705662 0,nop,wscale 7>
17:53:50.883599 IP (tos 0x0, ttl 56, id 33799, offset 0, flags [none], proto:
TCP (6), length: 64) grex.cyberspace.org.ssh > flounder.home.14926: S, cksum
0xc768 (correct), 3071614353:3071614353(0) ack 3372245806 win 16384 <mss
1452,nop,nop,sackOK,nop,wscale 0,nop,nop,timestamp 3040392798 150705662>
17:53:50.883669 IP (tos 0x0, ttl 64, id 63188, offset 0, flags [DF], proto:
TCP (6), length: 52) flounder.home.14926 > grex.cyberspace.org.ssh: ., cksum
0x47ed (correct), ack 1 win 46 <nop,nop,timestamp 150705679 3040392798>
17:53:50.958296 IP (tos 0x0, ttl 56, id 35720, offset 0, flags [none], proto:
TCP (6), length: 73) grex.cyberspace.org.ssh > flounder.home.14926: P, cksum
0x07de (correct), 1:22(21) ack 1 win 17280 <nop,nop,timestamp 3040392799
150705679>
17:53:50.958504 IP (tos 0x0, ttl 64, id 63189, offset 0, flags [DF], proto:
TCP (6), length: 52) flounder.home.14926 > grex.cyberspace.org.ssh: ., cksum
0x47c5 (correct), ack 22 win 46 <nop,nop,timestamp 150705697 3040392799>
17:53:50.958714 IP (tos 0x0, ttl 64, id 63190, offset 0, flags [DF], proto:
TCP (6), length: 72) flounder.home.14926 > grex.cyberspace.org.ssh: P, cksum
0x9103 (correct), 1:21(20) ack 22 win 46 <nop,nop,timestamp 150705697
3040392799>
17:53:51.208457 IP (tos 0x0, ttl 56, id 38997, offset 0, flags [none], proto:
TCP (6), length: 52) grex.cyberspace.org.ssh > flounder.home.14926: ., cksum
0x019f (correct), ack 21 win 17280 <nop,nop,timestamp 3040392799 150705697>
17:53:51.208590 IP (tos 0x0, ttl 64, id 63191, offset 0, flags [DF], proto:
TCP (6), length: 804) flounder.home.14926 > grex.cyberspace.org.ssh: P
21:773(752) ack 22 win 46 <nop,nop,timestamp 150705760 3040392799>
17:53:51.477530 IP (tos 0x0, ttl 56, id 39564, offset 0, flags [none], proto:
TCP (6), length: 52) grex.cyberspace.org.ssh > flounder.home.14926: ., cksum
0xfe6e (correct), ack 773 win 17280 <nop,nop,timestamp 3040392800 150705760>
|
janc
|
|
response 18 of 26:
|
Jun 24 22:03 UTC 2007 |
If I telnet to Grex, it connects, but I never get a password prompt. It
just echoes back what I type. Each time I type a character, 'tcpdump -v'
shows:
18:01:10.010262 IP (tos 0x10, ttl 64, id 36246, offset 0, flags [DF], proto:
TCP (6), length: 53) flounder.home.14468 > grex.cyberspace.org.telnet: P,
cksum 0x8870 (correct), 160:161(1) ack 67 win 46 <nop,nop,timestamp 150815453
2542462986>
18:01:10.245447 IP (tos 0x0, ttl 56, id 29727, offset 0, flags [DF], proto:
TCP (6), length: 52) grex.cyberspace.org.telnet > flounder.home.14468: .,
cksum 0x9bfd (correct), ack 161 win 17280 <nop,nop,timestamp 2542463137
150815453>
which looks like the character being sent and echoed back.
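
As a rough check on the two packets above (an editorial sketch, not part of the original post), the payload size can be worked out from the IP total length that tcpdump prints. The header sizes are assumptions: a 20-byte IPv4 header without options, and a 20-byte TCP header plus the 12 bytes taken up by the `nop,nop,timestamp` options shown in the dump.

```python
IPV4_HEADER = 20        # assumed: no IP options
TCP_BASE_HEADER = 20
TCP_OPTIONS = 12        # nop + nop + 10-byte timestamp option, as in the dump

def tcp_payload_bytes(ip_total_length):
    """Return the TCP payload size for a packet of the given IP total length."""
    return ip_total_length - IPV4_HEADER - TCP_BASE_HEADER - TCP_OPTIONS

keystroke = tcp_payload_bytes(53)   # flounder -> grex, "length: 53"
reply = tcp_payload_bytes(52)       # grex -> flounder, "length: 52"
print(keystroke, reply)
```

Under those assumptions the packet from flounder carries one byte and the reply carries none, which would make the reply a bare ACK. If so, the echo seen on screen may be coming from the local telnet client rather than from Grex.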
|
arthurp
|
|
response 19 of 26:
|
Jun 24 22:50 UTC 2007 |
I have this problem sometimes from one computer running fc5+.
Everything else in the house works. I did a trace back a while and it
looked similar to Jan's. Then I quit having the problem so I didn't
pursue it further. That was quite some time ago. Several months. The
fix coincided with some staff activity, so I figured it was fixed.
For me it only affected SSL connections, but then I never connect to
grex without SSL even for backtalk. I doubt I even checked to see if
insecure protocols worked. At the time I decided that it was something
wonky with SSL/DF/grex as I don't have any other troubles at this end.
|
janc
|
|
response 20 of 26:
|
Jun 25 15:24 UTC 2007 |
OK, I have a clue:
I do have wireshark on my computer. I downloaded Mike's trace of his ssh
connection, and captured one myself. I hardly needed to do the comparison
because there was a fairly obvious problem in mine.
After the DNS lookup of Grex, we see the following perfectly fine packets:
1 flounder -> grex SYN
2 grex -> flounder SYN, ACK
3 flounder -> grex ACK
4 grex -> flounder Server Protocol: SSH-1.99-OpenSSH_4.2
5 flounder -> grex ACK
6 flounder -> grex Client Protocol: SSH-2.0-OpenSSH_4.4
Then I get a weirdness. Here's the full ascii dump about the seventh packet
from wireshark:
No. Time     Source        Destination Protocol Info
7   0.402649 216.86.77.194 192.168.2.4 TCP      [TCP Previous segment lost] ssh > 28702 [ACK] Seq=726 Ack=21 Win=17280 Len=0 TSV=1661941788 TSER=150966646
Frame 7 (66 bytes on wire, 66 bytes captured)
Ethernet II, Src: Cisco-Li_a5:65:48 (00:16:b6:a5:65:48), Dst: Micro-St_de:f5:34 (00:13:d3:de:f5:34)
Internet Protocol, Src: 216.86.77.194 (216.86.77.194), Dst: 192.168.2.4 (192.168.2.4)
Transmission Control Protocol, Src Port: ssh (22), Dst Port: 28702 (28702), Seq: 726, Ack: 21, Len: 0
I'm not totally sure what this is, but the "TCP Previous segment lost" bit
on a packet sent from grex to flounder doesn't sound good to me.
In Mike's dump, at this point there was a "Server: Key Exchange Init" packet
sent from grex to flounder (number 19 in his dump).
Things keep going beyond this point though
8 flounder -> grex Client Key Exchange Init
9 grex -> flounder ACK
And then we hang. In McNally's dump, there was an ACK sent back to Grex
after the server key exchange init packet, but that never happened with
my computer because the server key init packet got mangled. So the connection
hangs with my computer waiting for the server key init, and grex hanging
waiting for the ACK on the server key init packet it sent.
Looking through the packet sizes:
Number  Size (me)  Size (mcnally)
1       74         74
2       78         78
3       66         66
4       87         87
5       66         66
6       86         106
7       66         770  <----
So it really looks like the first time Grex tries to send a larger packet,
we lose most of it.
I'm seeing many similar "Previous segment lost" packets when I try to make
other kinds of connections to Grex. I'm pretty sure these things are the
problem, but I haven't a clue what causes them.
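
One way to read the "Previous segment lost" note (a sketch added for illustration, not part of the original analysis): Wireshark shows it when a segment arrives with a sequence number higher than the next one expected. The numbers below come from the traces in this thread; the 21-byte banner length is taken from the tcpdump output in response 17 (`1:22(21)`), and the variable names are invented for this example.

```python
BANNER_END_SEQ = 22          # next byte expected after the 21-byte banner
PACKET7_SEQ = 726            # relative Seq of the flagged ACK in the failing trace
ACK_ONLY_FRAME = 66          # frame size of a packet with no payload
MCNALLY_PACKET7_FRAME = 770  # size of packet 7 in the working trace

# Bytes Grex sent that flounder never saw before packet 7 arrived.
missing_bytes = PACKET7_SEQ - BANNER_END_SEQ

# Frame size those bytes would have had if sent as a single segment.
expected_frame = ACK_ONLY_FRAME + missing_bytes

print(missing_bytes, expected_frame, expected_frame == MCNALLY_PACKET7_FRAME)
```

If that reading is right, the gap is exactly the size of the key exchange segment in the working trace. That would suggest the whole segment went missing on the way to flounder, rather than arriving in pieces.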
|
janc
|
|
response 21 of 26:
|
Jun 25 15:48 UTC 2007 |
I don't know a lot about fragmentation. I'm a bit surprised that a 770 byte
packet got fragmented at all. I think MTUs are usually larger than that.
But something must be "unusual" about my computer, or more computers would
have trouble connecting to Grex. Is the MTU discovery working right?
Don't really know either whether the packet was fragmented by Grex's computer
or whether it was sent with a "may fragment" flag and fragmented by something
further down the line.
|
janc
|
|
response 22 of 26:
|
Jun 27 14:08 UTC 2007 |
So, I did "ifconfig" on both Grex and my computer, and both have MTU set at
1500. So why is a packet of size 770 being fragmented?
I did a 'tracepath grex.org' on my computer (this does path MTU discovery)
and got:
1: flounder.home (192.168.2.4) 0.205ms pmtu 1492
1: router (192.168.2.1) asymm 106 0.544ms
2: h-72-245-37-1.sfldmidn.dynamic.covad.net (72.245.37.1) asymm 1 95.463ms
3: 192.168.17.101 (192.168.17.101) asymm 2 87.163ms
4: ge-6-12-133.car2.Detroit1.Level3.net (166.90.203.1) asymm 3 83.990ms
5: ae-11-11.car1.Detroit1.Level3.net (4.69.133.245) asymm 4 80.903ms
6: ae-8-8.ebr2.Chicago1.Level3.net (4.69.133.242) asymm 5 90.831ms
7: ae-2-54.bbr2.Chicago1.Level3.net (4.68.101.97) 83.206ms
8: so-0-1-0.mp2.Detroit1.Level3.net (64.159.0.198) asymm 10 86.625ms
9: so-10-0.hsa1.Detroit1.Level3.net (4.68.115.2) asymm 8 83.993ms
10: unknown.Level3.net (63.209.134.18) 81.766ms
11: tnmi-170-200-54-69.ip.telnetww.com (69.54.200.170) asymm 10 80.879ms
12: no reply
13: no reply
14: ypsi-sfld.provide.net (216.86.64.2) asymm 13 87.456ms
15: grex.cyberspace.org (216.86.77.194) asymm 14 150.206ms
reached
Resume: pmtu 1492 hops 15 back 14
This shows a path MTU of 1492, which is pretty much what you'd expect, and
which doesn't explain anything. Maybe a tracepath to me from Grex would be
more informative, but Grex doesn't have tracepath on it and I'm not convinced
that it is worth the trouble to build.
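
For reference, the relationship between an MTU and the largest TCP segment that fits inside it is simple arithmetic. This is an illustrative sketch, not something from the original post; it assumes IPv4 and TCP headers of 20 bytes each with no options, and the function name is invented for the example.

```python
def max_segment_size(mtu, ip_header=20, tcp_header=20):
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - ip_header - tcp_header

print(max_segment_size(1500))  # 1460
print(max_segment_size(1492))  # 1452
```

These values happen to match the `mss 1460` that flounder advertised and the `mss 1452` in Grex's reply in the tcpdump trace of response 17. A 1492-byte MTU is what a PPPoE link normally gives, though nothing in this thread confirms that PPPoE is involved or that it has anything to do with the hang.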
|
mcnally
|
|
response 23 of 26:
|
Jun 27 17:18 UTC 2007 |
This is way out in left field, but what kind of device are you using for NAT
on your end, Jan? And how much imposition would it be to hang your machine
directly on your incoming connection for a moment and give ssh a try, just to
eliminate your NAT as a possible cause of the problem?
|
cross
|
|
response 24 of 26:
|
Jun 27 17:30 UTC 2007 |
Joe Gelinas observed something similar with Linux at umich talking to Grex;
something about TCP window optimizations in the Linux 2.6 kernel or something.
I wonder if this is related....
|