|
Grex > Helpers > #134: Grex System Problems - Summer 2004 |
|
| Author |
Message |
| 25 new of 286 responses total. |
gelinas
|
|
response 225 of 286:
|
Sep 14 03:59 UTC 2004 |
Based on comments above, I took a look at /etc/resolv.conf, commented
out the loop-back address and restarted named. Now to see if it actually
makes a difference.
|
marcvh
|
|
response 226 of 286:
|
Sep 14 05:04 UTC 2004 |
So far it does not seem to make a difference.
|
mcnally
|
|
response 227 of 286:
|
Sep 14 06:50 UTC 2004 |
re #222:
> Sounds suspiciously like it's waiting for a dead nameserver to time out.
A not unreasonable guess but that doesn't seem to be the problem.
Assuming a reasonable caching policy on Grex it wouldn't have fit
the pattern anyway, as it takes a long time to get the login prompt
every time one tries, not just the first time (after which, presumably
the value would be stored in the local cache.)
re #225:
> Based on comments above, I took a look at /etc/resolv.conf, commented
> out the loop-back address and restarted named. Now to see if it actually
> makes a difference.
Doesn't seem to have. (And I doubt that restarting named was necessary.)
|
dpc
|
|
response 228 of 286:
|
Sep 14 14:05 UTC 2004 |
It took a minute and 19 seconds to get a prompt just now. I really
hope this situation can be fixed soon.
|
aruba
|
|
response 229 of 286:
|
Sep 14 14:29 UTC 2004 |
Still a long pause for me.
|
gregb
|
|
response 230 of 286:
|
Sep 14 17:07 UTC 2004 |
This is one time I'm glad I use Backtalk.
|
marcvh
|
|
response 231 of 286:
|
Sep 14 17:13 UTC 2004 |
Re #227, caching failed lookups is not universal. Some systems only
cache success, not failure.
|
jor
|
|
response 232 of 286:
|
Sep 14 23:19 UTC 2004 |
I just measured the pause at about 78 seconds.
The temperature is 74. Winds are calm and
my disposition is sunny.
|
tod
|
|
response 233 of 286:
|
Sep 14 23:30 UTC 2004 |
Partly cloudy
hayz buddhist messiah thetan
|
mcnally
|
|
response 234 of 286:
|
Sep 15 07:55 UTC 2004 |
re #231: I was assuming a resolv.conf with more than one nameserver
listed (which I think is how we started) and also assuming only one
broken nameserver such that the resolver query would block until the
first (broken) nameserver timed out and then succeed upon querying
the second server.
I suppose it still could be a nameserver issue but it seems to be
something else. I wonder what's special about 78-79 seconds? Several
people have measured the same delay time at this point (and one or two
report longer times.)
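To illustrate how resolver timeouts could add up to a figure in this range, here is a minimal sketch. It assumes the classic BIND-style resolver backoff (a 5-second base timeout, 4 tries, the timeout doubling on each try and divided by the number of listed nameservers after the first try) and that none of the listed servers answer. Whether Grex's resolver uses these defaults is not established in this item, and the function names are made up for the example.

```python
def resolver_wait_schedule(nservers, base_timeout=5, tries=4):
    """Per-server wait (seconds) on each try, under the assumed backoff rule."""
    schedule = []
    for attempt in range(tries):
        per_server = base_timeout << attempt      # 5, 10, 20, 40 ...
        if attempt > 0:
            per_server //= nservers               # shared among servers after the first try
        schedule.append(max(per_server, 1))       # never wait less than 1 second
    return schedule


def worst_case_total(nservers, base_timeout=5, tries=4):
    """Total wait (seconds) if every listed server stays silent on every try."""
    schedule = resolver_wait_schedule(nservers, base_timeout, tries)
    return sum(wait * nservers for wait in schedule)


for n in (1, 2, 3):
    print(n, resolver_wait_schedule(n), worst_case_total(n))
# 1 [5, 10, 20, 40] 75
# 2 [5, 5, 10, 20] 80
# 3 [5, 3, 6, 13] 81
```

Under those assumptions the worst case comes to 75, 80, or 81 seconds for one, two, or three unresponsive nameservers, which is in the same neighborhood as the measured 78-79 seconds. The case described above, one broken server followed by a working one, would cost only the first 5-second wait per lookup under the same assumptions. This shows how such a number could arise, not that it is the cause here.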
|
jor
|
|
response 235 of 286:
|
Sep 15 12:36 UTC 2004 |
But I get the same delay via dialin.
|
davel
|
|
response 236 of 286:
|
Sep 15 13:16 UTC 2004 |
Dialin is really a telnet connection, from the termserver.
|
naftee
|
|
response 237 of 286:
|
Sep 15 16:39 UTC 2004 |
i just aspirated on some apple juice :(
|
mcnally
|
|
response 238 of 286:
|
Sep 15 23:41 UTC 2004 |
The mysterious 79 second delay also seems to affect outgoing mail.
For instance, if I start a clock ticking and then send mail to
my Gmail account it hangs for about 79 seconds (there's that number
again!) before connecting:
grex% ./tick.csh; sleep 2; echo "starting mail test now"; date \
| Mail -v -s "test mail" mmcnally@gmailDOTcom
[1] 17068
Wed Sep 15 19:33:02 EDT 2004
starting mail test now
Wed Sep 15 19:33:04 EDT 2004
mmcnally@gmail.com... Connecting to gsmtp171.google.com. (smtp)...
Wed Sep 15 19:33:10 EDT 2004
Wed Sep 15 19:33:13 EDT 2004
Wed Sep 15 19:33:16 EDT 2004
Wed Sep 15 19:33:18 EDT 2004
Wed Sep 15 19:33:20 EDT 2004
Wed Sep 15 19:33:21 EDT 2004
Wed Sep 15 19:33:22 EDT 2004
Wed Sep 15 19:33:24 EDT 2004
Wed Sep 15 19:33:25 EDT 2004
Wed Sep 15 19:33:26 EDT 2004
Wed Sep 15 19:33:28 EDT 2004
Wed Sep 15 19:33:29 EDT 2004
Wed Sep 15 19:33:30 EDT 2004
Wed Sep 15 19:33:32 EDT 2004
Wed Sep 15 19:33:33 EDT 2004
Wed Sep 15 19:33:34 EDT 2004
Wed Sep 15 19:33:36 EDT 2004
Wed Sep 15 19:33:37 EDT 2004
Wed Sep 15 19:33:39 EDT 2004
Wed Sep 15 19:33:41 EDT 2004
Wed Sep 15 19:33:44 EDT 2004
Wed Sep 15 19:33:45 EDT 2004
Wed Sep 15 19:33:47 EDT 2004
Wed Sep 15 19:33:48 EDT 2004
Wed Sep 15 19:33:50 EDT 2004
Wed Sep 15 19:33:51 EDT 2004
Wed Sep 15 19:33:53 EDT 2004
Wed Sep 15 19:33:55 EDT 2004
Wed Sep 15 19:33:57 EDT 2004
Wed Sep 15 19:33:58 EDT 2004
Wed Sep 15 19:34:00 EDT 2004
Wed Sep 15 19:34:01 EDT 2004
Wed Sep 15 19:34:02 EDT 2004
Wed Sep 15 19:34:04 EDT 2004
Wed Sep 15 19:34:06 EDT 2004
Wed Sep 15 19:34:07 EDT 2004
Wed Sep 15 19:34:09 EDT 2004
Wed Sep 15 19:34:10 EDT 2004
Wed Sep 15 19:34:12 EDT 2004
Wed Sep 15 19:34:14 EDT 2004
Wed Sep 15 19:34:15 EDT 2004
Wed Sep 15 19:34:17 EDT 2004
Wed Sep 15 19:34:18 EDT 2004
Wed Sep 15 19:34:19 EDT 2004
Wed Sep 15 19:34:21 EDT 2004
Wed Sep 15 19:34:22 EDT 2004
220 mx.gmail.com ESMTP 80si57756rnb
>>> HELO grex.cyberspace.org
250 mx.gmail.com at your service
>>> MAIL From:<mcnally@cyberspace.org>
250 OK
>>> RCPT To:<mmcnally@gmail.com>
250 OK
>>> DATA
354 Go ahead
>>> .
Wed Sep 15 19:34:24 EDT 2004
250 OK 1095291185
mmcnally@gmail.com... Sent (OK 1095291185)
Closing connection to gsmtp171.google.com.
>>> QUIT
221 mx.gmail.com closing connection
grex%
Wed Sep 15 19:34:25 EDT 2004
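For reference, the stall in the transcript above can be worked out from the tick lines. The sketch below takes the last tick printed before the "Connecting to" line and the last tick printed before the 220 banner; the helper name is hypothetical, and since the ticks arrive 1-3 seconds apart the result is only approximate.

```python
from datetime import datetime


def parse_tick(stamp):
    """Parse a date(1)-style line such as 'Wed Sep 15 19:33:04 EDT 2004'."""
    weekday, month, day, clock, zone, year = stamp.split()
    # The zone token is dropped: both stamps are EDT, so the difference is unaffected.
    return datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")


last_tick_before_connecting = "Wed Sep 15 19:33:04 EDT 2004"
last_tick_before_banner = "Wed Sep 15 19:34:22 EDT 2004"

stall = parse_tick(last_tick_before_banner) - parse_tick(last_tick_before_connecting)
print(int(stall.total_seconds()))  # prints 78
```

That is roughly 78 seconds, in line with the 78-79 second login delays reported earlier in this item.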
|
gelinas
|
|
response 239 of 286:
|
Sep 16 03:03 UTC 2004 |
Odd. The same thing happens with traceroute, to an IP address.
|
marcvh
|
|
response 240 of 286:
|
Sep 16 04:00 UTC 2004 |
How about traceroute -n to an IP address?
|
gelinas
|
|
response 241 of 286:
|
Sep 16 04:29 UTC 2004 |
Same delay. FWIW, the delay does NOT appear between hops, just on the
start-up.
|
mcnally
|
|
response 242 of 286:
|
Sep 16 06:36 UTC 2004 |
Are there any outgoing IP applications that are not affected?
|
remmers
|
|
response 243 of 286:
|
Sep 16 12:18 UTC 2004 |
Outgoing telnet isn't affected.
|
mcnally
|
|
response 244 of 286:
|
Sep 16 17:00 UTC 2004 |
Hmmm..
What do the affected programs have in common?
What is the significance of the common ~79 second delay?
What changed during the reconstruction of the boot drive
that might explain this?
Any ideas, anyone?
It really does not appear to be name-service related, as both
forward and reverse lookups seem to finish in normal amounts
of time using nslookup, which presumably uses the same underlying
resolver calls.
What else might have some timeout that's taking place?
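One way to narrow this down is to time each stage separately rather than the whole command, since nslookup talks to the nameserver directly and may not follow the same lookup path as other programs. The following is a rough sketch only: the stage list is a guess at what an outgoing connection does, the function names are invented for the example, and nothing here is known to match what sendmail or traceroute do on Grex.

```python
import socket
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (elapsed_seconds, result); an OSError is returned, not raised."""
    start = time.monotonic()
    try:
        result = fn(*args, **kwargs)
    except OSError as exc:          # covers lookup failures and connect timeouts
        result = exc
    return time.monotonic() - start, result


def time_stages(host, port, connect_timeout=120):
    """Time a forward lookup, a lookup of the local hostname, and a TCP connect."""
    stages = [
        ("forward lookup of remote host", socket.getaddrinfo, (host, port)),
        ("lookup of local hostname", socket.gethostbyname, (socket.gethostname(),)),
        ("TCP connect", socket.create_connection, ((host, port), connect_timeout)),
    ]
    report = []
    for label, fn, args in stages:
        elapsed, result = timed(fn, *args)
        if isinstance(result, socket.socket):
            result.close()          # only the timing matters, not the connection
        report.append((label, round(elapsed, 1), isinstance(result, Exception)))
    return report


# Example call (target taken from the mail test above):
#   for row in time_stages("gsmtp171.google.com", 25):
#       print(row)
```

Each row gives the stage, the seconds it took, and whether it failed. If one stage alone accounts for the ~79 seconds, that points at where the timeout lives.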
|
naftee
|
|
response 245 of 286:
|
Sep 16 19:30 UTC 2004 |
It's faster to use fronttalk from M-net than to try to SSH into GreX and use
BBS.
|
drew
|
|
response 246 of 286:
|
Sep 18 04:59 UTC 2004 |
Ftp keeps timing out.
psftp gets me just past the Password prompt, then it says something to the
effect of sftp-server: command not found, could not connect.
|
krj
|
|
response 247 of 286:
|
Sep 19 04:31 UTC 2004 |
While using party tonight, I kept getting disconnected; eventually the
network connection appeared to go down solidly. I dialed in direct
to enter this. (It takes about 79 (?) seconds to log in on a direct
dial, too.)
|
charcat
|
|
response 248 of 286:
|
Sep 19 06:33 UTC 2004 |
|
krj
|
|
response 249 of 286:
|
Sep 19 17:14 UTC 2004 |
My telnet connections continue to be unstable, though I was able
to connect long enough to enter this...
|