Grex Helpers Conference

Item 113: Grex System Problems Item

Entered by i on Tue Sep 24 23:27:36 2002:

44 new of 248 responses total.


#205 of 248 by carson on Wed Dec 11 22:03:24 2002:

(both "tmobile" and "admin9" have existing accounts on Grex.  I find
the "last login" dates amusing, for obvious reasons.)


#206 of 248 by aruba on Wed Dec 11 22:25:42 2002:

Not obvious to me, I'm afraid.


#207 of 248 by carson on Thu Dec 12 04:23:24 2002:

(the "last login" dates are when the accounts sent the spam that put
Grex on SpamCop's blacklist, i.e., they've likely only been used once.)


#208 of 248 by gull on Thu Dec 12 14:11:58 2002:

The report wouldn't have appeared to come from carson.  Spamcop
deliberately munges the addresses, to prevent spammers from retaliating
or reselling them.


#209 of 248 by aruba on Thu Dec 12 15:24:36 2002:

Re #207: Ah.  Thanks.


#210 of 248 by dpc on Thu Dec 12 16:38:55 2002:

Over the past month or so, e-mail to me from non-Grex
sources is often *seriously* delayed.

This morning's collection has an e-mail sent to me from
umich.edu on Dec. 10 at 3:06 p.m. which did not arrive
at Grex until Dec. 12 (today) at 4:27 a.m.

My wife and I frequently get duplicate e-mails from
the same sender(s).  She has a Comcast account.  
Her e-mails arrive almost immediately after they
are sent.  Mine are delayed for up to a day!

What is happening here?


#211 of 248 by remmers on Thu Dec 12 16:46:52 2002:

Marcus talked about that problem a bit at last night's board meeting.
He's not sure but thinks it might be a (possibly inadvertent) form of
denial-of-service attack.  He's investigating.  I'll let him explain
more fully if he wishes.


#212 of 248 by gull on Thu Dec 12 20:46:18 2002:

Re #211: Thanks.  I don't know about other people here, but mostly I just
wanted to know that the problem was being taken seriously and investigated.
:)


#213 of 248 by dpc on Fri Dec 13 14:56:01 2002:

Thanx!  The problem is quite annoying.  An e-mail was sent to
me from aadl.org (the Library) on Wednesday, Dec. 11 at 5:36 p.m.
and I didn't receive it until today, Dec. 13, at 3:32 a.m.


#214 of 248 by dang on Sat Dec 14 00:46:07 2002:

It's causing havoc on the Grex staff list as staffers answer questions
multiple times because the answers cross each other in flight by hours.


#215 of 248 by mdw on Sat Dec 14 11:29:52 2002:

I think I better understand why mail got slow.  We apparently have at
least 3 different spammers trying to connect to grex, but failing to do
so.  Presumably there's a firewall, possibly at their end, perhaps in
the middle, that drops packets from grex to the spammers.  The half-open
connections occupy slots in the input queue, and there are only a
limited number of such connections that are permitted to exist.  After
those slots fill up, other connection attempts are silently ignored.
Since the 3 spammers have at least 20 IP addresses between them, they're
doing a pretty good job of keeping those slots full.  And, since we
never actually get any connections from them, let alone mail, they don't
appear in our logs at all.  This is certainly going to be "interesting"
to fix.  So far, a few simple things I tried haven't been useful, but
there are certainly any number of not so simple fixes we can do.

I suppose we should be thankful that we don't get the spam.
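
A minimal sketch of the mechanism described in #215, not Grex's actual
kernel code.  The class name, the method names, and the figure of 20
slots are assumptions made for illustration; the thread does not say
how many slots Grex's queue really has.

```python
class ListenQueue:
    """Toy model of a bounded queue of half-open connections."""

    def __init__(self, slots):
        self.slots = slots       # assumed limit on half-open connections
        self.half_open = set()   # peers we answered, still waiting for their ACK
        self.established = []    # peers that completed the handshake
        self.dropped = 0         # attempts silently ignored for lack of a slot

    def syn(self, peer):
        """A new connection attempt arrives."""
        if len(self.half_open) >= self.slots:
            self.dropped += 1    # no reply is sent; the sender just times out
            return False
        self.half_open.add(peer)
        return True

    def ack(self, peer):
        """The peer completes the handshake and frees its slot."""
        if peer in self.half_open:
            self.half_open.remove(peer)
            self.established.append(peer)
            return True
        return False


queue = ListenQueue(slots=20)

# Twenty senders whose return path is blocked: our replies never reach
# them, so they never call ack() and their slots stay occupied.
for n in range(20):
    queue.syn(f"unreachable-{n}")

# A legitimate mail host now finds no free slot.
accepted = queue.syn("legit-mail-host")
```

In this model nothing ever reaches `established`, which matches the
observation that the stuck senders never show up in the mail logs.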


#216 of 248 by mcnally on Sat Dec 14 13:43:49 2002:

  Are we running any sort of IP filtering SW that can simply block those
  IP addresses completely or don't we have the horsepower for that?


#217 of 248 by keesan on Sat Dec 14 16:11:17 2002:

Can you simply write to these spammers, point out to them that the mail is
not getting through to grex anyway but they are disabling our system, and ask
them to 'delist' us (stop sending things in our direction)?  


#218 of 248 by dpc on Sat Dec 14 17:19:04 2002:

Thanks for keeping us informed, Marcus.  What a problem!


#219 of 248 by cross on Sat Dec 14 21:20:47 2002:

This response has been erased.



#220 of 248 by mcnally on Sat Dec 14 22:58:51 2002:

  It's been a long time since I've dealt with something like this but it
  seems at first glance that a simple package like tcp_wrappers should be
  fine for this; I don't think you need full bpf-style flexibility.


#221 of 248 by dang on Sun Dec 15 01:04:02 2002:

We can completely block their IP ranges, and we routinely put such
blocks in place.  However, there appear to be quite a few legit users
who connect from those ranges, so we'd prefer not to block them entirely
if possible.  Besides, if Marcus can fix this problem, then this won't
happen in the future.  I'd say blocking the range is a last resort.


#222 of 248 by carson on Sun Dec 15 01:05:37 2002:

(I'd say blocking the range is a first, temporary measure, but that's
a discussion for another conference.)


#223 of 248 by keesan on Sun Dec 15 01:13:34 2002:

What is an IP range?  Can you block just specific IP numbers rather than a
range, and if legitimate users are also using those IP numbers can the ISP
be asked to cancel just those accounts or is this a free mail service where
they keep signing up for new accounts?  Or am I confused?


#224 of 248 by mdw on Sun Dec 15 01:32:04 2002:

Tcp wrappers can't help - they work at the user level; this problem
isn't even visible at the user level, so there's no way to use it to
reject connections.  The DSL router has a limited capability of
filtering packets, but we don't administer it, so that's not useful to
us.  obsd pf would be quite capable of that, but we're not on openbsd
yet, and an obsd pf firewall of some sort would also work, but that's
quite a bit of work.  It may be worth doing this anyways, but it's not
going to happen this weekend.  My current plan is to try to write
something that can "sniff" the wire, detect when grex tries to complete
a connect to a set of bad IP ranges, and send packets to grex to cause
the connections to be aborted.  I tried to do just this last night, but
I'm afraid lack of sleep was catching up fast with me and I didn't
succeed in getting this to work.  I'm going to try to debug this
separately outside of grex where I can better figure out what's
happening, and hope to have better luck making this work with grex
tomorrow.
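
On the question in #223 of what an IP range is, here is a minimal
sketch of just the matching step from the plan in #224 (deciding
whether an address falls inside a set of bad ranges).  It does not
sniff the wire or send any packets.  The ranges and the function name
are hypothetical; the addresses come from blocks reserved for
documentation, not from the real spammers.

```python
import ipaddress

# Hypothetical "bad" ranges.  A /24 covers 256 addresses; a /26 covers 64.
BAD_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.64/26"),
]


def in_bad_range(address):
    """Return True if the address falls inside any listed range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BAD_RANGES)
```

Blocking a single address would be the same check with a one-address
range (a /32), which is why the trade-off in #221 comes down to how
many legitimate users share the listed ranges.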


#225 of 248 by gelinas on Sun Dec 15 03:33:50 2002:

(#217 assumes that spammers *care* that they are inconveniencing others.  If
they did, they'd not send the junk in the first place.  So why bother asking
them to stop?)


#226 of 248 by mcnally on Sun Dec 15 05:13:20 2002:

  No doubt my understanding of the problem is completely deficient but
  nothing said so far illustrates to me why this problem can't be dealt
  with by using tcp wrappers to block these IP addresses from connecting
  to just the SMTP port.

  I'm really not understanding what Marcus is saying in the first sentence
  of 224.  My recollection concerning TCP wrappers is that the major reason
  to run the package is to allow selective blocking of incoming connections
  on certain ports.  I'll admit it's been a long time, though, maybe I'm
  thinking of another package entirely.


#227 of 248 by gull on Sun Dec 15 17:26:18 2002:

Re #225: They might care they're wasting time trying to connect to a machine
they can't get through to.

Re #226: I got the impression these were half-open connections -- a syn
flood, in effect, filling up the state table.  In that case TCP wrappers
wouldn't help because it doesn't kick in until the connection is
established.  Did I read wrong?


#228 of 248 by krj on Sun Dec 15 19:01:37 2002:

I have another report of lost mail.  I sent my sister e-mail on 
Wednesday; she tells me by phone today that she received the mail
promptly and replied, but the reply hasn't shown up on Grex after
about three days.


#229 of 248 by cross on Sun Dec 15 19:11:43 2002:

This response has been erased.



#230 of 248 by gelinas on Mon Dec 16 01:01:10 2002:

(Three days is the usual "first notification" limit, so she may have gotten
her message back by now, or at least a warning that it has not yet been
delivered.)


#231 of 248 by gelinas on Mon Dec 16 01:21:26 2002:

Re #227: It's bulk mail; the individual recipient, or even the individual
machine, isn't important.  *IF* the hung connections were inconveniencing
the senders, they *might* be interested in fixing it.  But if they were
being inconvenienced, they'd have noticed, no?  They haven't fixed it,
so they (must not | probably don't) care.


#232 of 248 by mdw on Mon Dec 16 02:32:42 2002:

My guess is that the bulk mailers were in the "process" of being
disconnected; they still had the physical IP connection and could send
packets, but were using addresses that were no longer "routable", so it
was no longer possible to send packets back to them.

The tcp wrappers come with a library and it's possible to compile other
programs (such as sshd) to use the same basic blocking code.  It
wouldn't be useful for sendmail, however, as there are way too many mail
gateways that, when a connection fails for any reason, immediately retry
without even waiting.  So, in fact, you actually want to accept the
connection, wait until they try to send mail, and *then* deliver a 5xx
error code.  That *should* clear the queue entry out at their end and
avoid the retry.  Grex's sendmail isn't great at doing this, but it does
have support and we use it routinely against people we recognize as
spammers.  Last week, we rejected 1904 pieces of mail using this
facility.  The 5xx error also contains a URL, and 11 people read the
file referenced by the URL.
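
A minimal sketch of the "accept, then refuse at MAIL" idea from #232.
This is a toy reply function, not sendmail's configuration or Grex's
actual setup; the blocked address, the URL, the reply wording, and the
function name are all hypothetical.

```python
# Hypothetical blocklist and explanation page, for illustration only.
BLOCKED_SENDERS = {"192.0.2.17"}
INFO_URL = "http://example.org/why-blocked"


def smtp_reply(client_ip, command):
    """Pick a reply for one SMTP command from an already-connected client."""
    verb = command.strip().split(" ", 1)[0].upper()
    if verb in ("HELO", "EHLO"):
        # The connection and greeting are accepted even for blocked senders.
        return "250 Hello"
    if verb == "MAIL":
        if client_ip in BLOCKED_SENDERS:
            # A 5xx code is a permanent failure, so a well-behaved sender
            # should drop the message from its queue instead of retrying.
            return f"550 Mail refused, see {INFO_URL}"
        return "250 Sender ok"
    if verb == "QUIT":
        return "221 Bye"
    return "502 Command not implemented"
```

The point of the design is the ordering: refusing the connection
outright looks like a transient failure and invites an immediate
retry, while a 5xx after MAIL is an explicit permanent refusal.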


#233 of 248 by dang on Mon Dec 16 02:36:39 2002:

SYN floods require kernel level protection.  All modern kernels have
such protections built in.  Unfortunately, Grex is not yet running a
modern kernel.  Hence our current problem.


#234 of 248 by mcnally on Mon Dec 16 04:36:58 2002:

  Ahh..  Ok, now I get it.


#235 of 248 by jazz on Mon Dec 16 14:22:36 2002:

        Conventional kernel-level SYN flood protection wouldn't help you here,
anyways, since it isn't a "flood" in the sense that there aren't many
connection attempts being made.


#236 of 248 by davel on Mon Dec 16 20:56:52 2002:

For whatever it's worth:  I send (from off site) several messages a day to
a couple of people on Grex.  From the system where I send them, I see them
hanging in the queue for long periods.  Usually the error message given is
  Deferred: Connection timed out with grex.cyberspace.org.
but sometimes it is
  Deferred: Interrupted system call

Given what's been said, I assume that it times out because there are too many
of these half-open connections.  But I have to wonder if the "Interrupted
system call" error doesn't occur because I somehow get a half-open connection.
At any rate, I thought I'd mention this message.


#237 of 248 by dang on Tue Dec 17 02:42:38 2002:

resp:235 That depends on how the particular kernel detects SYN floods. 
I would classify "several" half-open connections at a time from the
same IP as a SYN flood.  How you define "several" depends on the system
and its load, I suppose.

Many kernels have tunable SYN timeouts, which could be useful here. 
Time them out quicker.  (Incidentally, my Linux 2.4 box has a max of
1024 backlogged SYNs.)


#238 of 248 by mdw on Tue Dec 17 06:02:45 2002:

The half-open connections are typically from several *different* IP
addresses.  So, anything that thinks a syn flood comes from one IP
address isn't going to recognize this.  Another technique is just to
randomly close out several connections in the middle of the queue - that
would work better here, but would probably also randomly trash real
connections that were too slow to complete.

"Interrupted system call" isn't something grex can create directly.
This would result from an interrupt caused by some event on the remote
end (such as alarm(2)) that caused a system call to be aborted.
Probably that means something was too slow at grex's end, although that
seems difficult to believe, tcp & standard smtp timeouts are pretty big.
Perhaps it means something quite different.
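
A minimal sketch of the random-drop technique mentioned in the first
paragraph of #238, assuming the half-open queue can be modeled as a
plain list.  The function name and parameters are hypothetical; a real
kernel would do this inside its TCP stack.

```python
import random


def admit_syn(half_open, peer, slots, rng=random):
    """Admit a new half-open connection, evicting a random one if full.

    Returns the evicted peer, or None if a slot was already free.
    """
    evicted = None
    if len(half_open) >= slots:
        # Any entry may be chosen, including a slow but legitimate one.
        evicted = rng.choice(half_open)
        half_open.remove(evicted)
    half_open.append(peer)
    return evicted
```

Because the victim is chosen at random, it does not matter how many
different IP addresses the stuck entries come from, but a slow real
connection is as likely to be evicted as a dead one.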


#239 of 248 by russ on Thu Dec 19 03:28:42 2002:

I just got a piece of mail, delivered 2-something Wednesday morning.
It had been sent at 6:08 AM on Monday. 

I got another piece of mail, delivered 3:30 PM Wednesday afternoon.
It had been sent about 7 PM *last Saturday night*.

This mail-refusal problem is way out of hand.  Something Must Be Done.


#240 of 248 by gelinas on Thu Dec 19 03:47:14 2002:

Russ, can you look at the Received: lines and tell where it got hung up?


#241 of 248 by jep on Thu Dec 19 14:22:39 2002:

Grex is running really slowly.  The load average is down to around 6 right
now, but was up to 19+.


#242 of 248 by keesan on Thu Dec 19 15:29:58 2002:

Still running slowly, on and off.  I waited close to a minute to open my
mailbox with Pine (100 plus messages, I admit).  BBS is okay.


#243 of 248 by russ on Thu Dec 19 23:50:15 2002:

(trying again after crash during upload)

The hangups are getting to Grex; what else?  Here are the headers,
with some info abridged.

Received: from FOREIGN.SYSTEM (FOREIGN.SYSTEM [NNN.NNN.NNN.NN])
        by grex.cyberspace.org (8.6.13/8.6.12) with SMTP id CAA17563
        for <russ@cyberspace.org>; Wed, 18 Dec 2002 02:41:57 -0500
Received: (qmail 13305 invoked by uid 0); 16 Dec 2002 11:13:00 -0000
Received: from unknown (HELO OTHER.SYSTEM) (unknown)
  by unknown with SMTP; 16 Dec 2002 11:13:00 -0000
Received: from OTHER.SYSTEM (IDENT:25@localhost [127.0.0.1])
        by OTHER.SYSTEM (8.12.6/8.12.6) with ESMTP id gBGBAn6k019152;
        Mon, 16 Dec 2002 06:10:50 -0500
Received: (from majordomo@localhost)
        by OTHER.SYSTEM (8.12.6/8.12.6/Submit) id gBGB8n3o018402;
        Mon, 16 Dec 2002 06:08:49 -0500
Date: Mon, 16 Dec 2002 06:08:49 -0500

From USER@SOMESITE.COM Wed Dec 18 15:31:51 2002
Received: from ashd1-2.relay.mail.uu.net
        (ashd1-2.relay.mail.uu.net [199.171.54.246])
        by grex.cyberspace.org (8.6.13/8.6.12) with ESMTP id PAA05636
        for <russ@cyberspace.org>; Wed, 18 Dec 2002 15:31:49 -0500
Received: from smtp.SOMESITE.COM by mr1.ash.ops.us.uu.net with SMTP
        (peer crosschecked as: SMTP.SOMESITE.COM [NN.NNN.NNN.NNN])
        id QQntgq19875
        for <russ@cyberspace.org>; Sun, 15 Dec 2002 00:11:46 GMT
Received: from Connect2 Message Router by smtp.SOMESITE.COM
        via Connect2-SMTP 4.32; Sat, 14 Dec 2002 19:17:31 -0500
Date: Sat, 14 Dec 2002 19:06:00 -0500

As you can see the first message took more than 44 hours to make the
last hop to Grex; the second took 6 minutes to get to uu.net (note
the time change from EST to GMT, and that the clock at the message
router is fast), and then it took until Wednesday afternoon to get
to Grex (over 92 hours).

That's pretty pathetic.  You could get a message round-trip to
Pioneer 11 in about half that time.
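
The delay arithmetic in #243 can be checked with a short sketch that
uses the timestamps from the Received: lines above.  The helper names
are made up for illustration, and treating qmail's "-0000" stamp as
UTC is an assumption.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def parse_stamp(text):
    """Parse a mail-header date into a timezone-aware datetime."""
    stamp = parsedate_to_datetime(text)
    # A "-0000" offset is returned without timezone info; assume UTC.
    if stamp.tzinfo is None:
        stamp = stamp.replace(tzinfo=timezone.utc)
    return stamp


def hop_hours(sent, received):
    """Hours elapsed between two header timestamps."""
    delta = parse_stamp(received) - parse_stamp(sent)
    return delta.total_seconds() / 3600


# First message: qmail hand-off, then arrival at Grex.
first = hop_hours("16 Dec 2002 11:13:00 -0000",
                  "Wed, 18 Dec 2002 02:41:57 -0500")

# Second message: arrival at uu.net, then arrival at Grex.
second = hop_hours("Sun, 15 Dec 2002 00:11:46 GMT",
                   "Wed, 18 Dec 2002 15:31:49 -0500")
```

Both values agree with the figures quoted above: a little over 44
hours for the first message and a little over 92 for the second.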


#244 of 248 by gull on Fri Dec 20 00:07:11 2002:

Reminds me of sending stuff via packet radio.  If it arrived faster than
snail mail, it was a good day.


#245 of 248 by gelinas on Fri Dec 20 00:39:29 2002:

Thanks, Russ.


#246 of 248 by krj on Fri Dec 20 18:44:24 2002:

The RoadRunner ISP has decided to block mail from Grex because
they think Grex is offering an open proxy server.
 
----------
From daemon Fri Dec 20 02:27:07 2002
Received: from localhost (localhost)
        by grex.cyberspace.org (8.6.13/8.6.12) with internal id CAA19965;
        Fri, 20 Dec 2002 02:27:06 -0500
Date: Fri, 20 Dec 2002 02:27:06 -0500
From: Mail Delivery Subsystem <MAILER-DAEMON@cyberspace.org>
Subject: Returned mail: Remote protocol error
Message-Id: <200212200727.CAA19965@grex.cyberspace.org>
To: krj@cyberspace.org
MIME-Version: 1.0
Content-Type: multipart/mixed;
        boundary="CAA19965.1040369226/grex.cyberspace.org"
Status: R

This is a MIME-encapsulated message

--CAA19965.1040369226/grex.cyberspace.org

The original message was received at Fri, 20 Dec 2002 02:27:01 -0500
from krj@localhost

   ----- The following addresses had delivery problems -----
XXXXXX@austin.rr.com  (unrecoverable error)

   ----- Transcript of session follows -----
... while talking to txmx01.mgw.rr.com.:
>>> MAIL From:<krj@cyberspace.org>
<<< 550 5.7.1 Mail Refused - 216.93.104.34 - See http://security.rr.com/mail_blocks.htm#proxy
... while talking to txmx02.mgw.rr.com.:
>>> MAIL From:<krj@cyberspace.org>
<<< 550 5.7.1 Mail Refused - 216.93.104.34 - See http://security.rr.com/mail_blocks.htm#proxy
----------

... and so on down a whole list of mail servers.


#247 of 248 by tpryan on Sat Dec 21 20:15:42 2002:

(remmers will now probably remind us how he used to send packets
by carrier pigeon, and his ISP number was 3).


#248 of 248 by tsty on Sun Dec 22 08:05:32 2002:

2

