Author Message
25 new of 106 responses total.
mdw
response 25 of 106: Dec 2 01:01 UTC 1994

I'm still investigating this, so I may learn more, but so far, it looks
like sendmail is doing a very good job of not pushing the load average
up, at the expense of queueing a lot of mail for later delivery.  At any
rate, in testing, I've had it both deliver mail, and queue it, more or
less at random depending upon the load average when it tried to deliver
the mail.

I still need to do some more investigating - among other things, I want
to convince myself sendmail is processing the queue "often enough", and
I want to convince myself that the extra "I can't deliver your message
right now" messages aren't spawning extra copies of themselves
endlessly.  I suppose another possibility is that we've been mail-bombed
and haven't figured it out yet.

If it's not something trivial like one of these problems, then it may be
time to dig deeper into the queuing code & make some harsher tradeoffs.
Perhaps it should reject outside connections sooner, or perhaps it
should tolerate a higher load average.
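
The tradeoff described above can be sketched as a two-threshold policy.
This is only a minimal illustration, assuming one load average at which
mail is queued instead of delivered and a higher one at which new
connections are refused (sendmail exposes both as configuration
options).  The function name and the numbers 8 and 12 are illustrative
assumptions, not Grex's actual settings.

```python
def mail_action(load_avg, queue_la=8.0, refuse_la=12.0):
    """Return what a load-aware mailer would do at this load average.

    queue_la and refuse_la are hypothetical thresholds for illustration.
    """
    if load_avg >= refuse_la:
        return "refuse"    # turn away new SMTP connections
    if load_avg >= queue_la:
        return "queue"     # accept the message, defer delivery to a queue run
    return "deliver"       # load is low enough to deliver immediately


for la in (2.5, 9.0, 15.0):
    print(la, mail_action(la))
```

Lowering refuse_la corresponds to "reject outside connections sooner";
raising queue_la corresponds to "tolerate a higher load average".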
davel
response 26 of 106: Dec 2 01:12 UTC 1994

I got a message from the moderator-equivalent of one mailing list telling
me that I'd been dropped because messages to me were bouncing (to everyone
on the list, I expect).  I've re-subscribed to that one a few minutes
ago, & hope it's not going to happen *again* now.  The other list is still
silent, so I expect that the same thing has happened.
robh
response 27 of 106: Dec 2 02:09 UTC 1994

Of all the mailing lists I'm on, it looks like only one hasn't
sent me any mail since the link came back up.  That's a lot better
than I had expected.
steve
response 28 of 106: Dec 2 02:35 UTC 1994

   The real problem here is that thousands of pieces of mail are
waiting to get into Grex's /var/spool/mail--we have something like 2200
pieces queued up now, so the total number processed is greater.

   We know that sendmail is working, and can deal with things no matter
how large the load is.  What we're seeing now are "tuning" issues, such
as how often sendmail should try and deal with the queue, when it
should reject messages, etc.  When I first talked to Marcus tonight
about all this, my suggestion was to change how often sendmail deals
with the queue.  It is currently set at every 17 minutes, and I wanted to
see it ramped down to something like 5.  Marcus didn't think that was a
good idea, because of the way that sendmail has to read the queue in
order to figure out what to do with it.  Over dinner I was thinking about
all this, and he is right--which goes to show that sendmail operation
isn't easy, and what might look reasonable on the surface might not
really be such a good idea.

   So I'm sure that marcus will tame the beast, but this does point
out that sendmail isn't perfect yet, and some amount of tweaking
is needed to make for optimal operation here.
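
A rough way to see why a shorter queue interval can hurt, assuming each
queue run has to read the control file of every queued message once
(that per-run cost is an assumption made for this sketch, not a
measurement from Grex):

```python
def control_file_reads_per_hour(queued, interval_minutes):
    """Estimate control-file reads per hour for a given queue-run interval.

    Assumes every queue run reads each queued message's control file once.
    """
    runs_per_hour = 60 / interval_minutes
    return queued * runs_per_hour


for interval in (17, 5):
    reads = control_file_reads_per_hour(2200, interval)
    print(f"every {interval} min: about {round(reads)} reads/hour")
```

Under that assumption, going from 17 minutes to 5 more than triples the
work spent just looking at the queue, before any mail is delivered.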
marcvh
response 29 of 106: Dec 2 04:19 UTC 1994

If a down host causes mailing list mail to bounce to the entire
list, either the MTA doing the bouncing or, more likely, the list
relay is broken.
carson
response 30 of 106: Dec 2 04:22 UTC 1994

would it make sense to take Grex "down" for a couple of hours
to catch up with mail? when I say "down", I mean locking it
out to logins, not to SMTP links.
nephi
response 31 of 106: Dec 2 04:46 UTC 1994

Grex down again?  NOOOOOOOOOOOOOOOOO!!!
mdw
response 32 of 106: Dec 2 04:53 UTC 1994

It might make sense to do something of the sort if there is no other
solution; but I think the first step is to better understand what it's
doing & why.  The queue has definitely gotten a lot smaller now; so it
may well be that sendmail is doing exactly what we hoped it would:
deliver the mail a bit more slowly without killing the system.  When the
queue was large, I found that about 10% of the queue was consumed by
mail destined for only 2 users; I thought initially it might be some
sort of error message chain, but I think now the problem is that each of
those users had subscribed to some sort of high traffic mailing list -
something like 100 messages a day?...
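
The kind of tally described above can be sketched as follows.  How the
recipient list is extracted from the real queue is not shown here; the
user names and counts are made up purely for illustration.

```python
from collections import Counter


def top_recipients(recipients, n=2):
    """Return (user, count, share_of_queue) for the n busiest recipients."""
    counts = Counter(recipients)
    total = sum(counts.values())
    return [(user, count, count / total)
            for user, count in counts.most_common(n)]


# Hypothetical queue: two heavy mailing-list subscribers plus 90 others.
queue = ["alice"] * 6 + ["bob"] * 4 + [f"user{i}" for i in range(90)]
for user, count, share in top_recipients(queue):
    print(f"{user}: {count} messages ({share:.0%} of the queue)")
```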

Another problem seems to have to do with DNS lookups; a fair number of
messages had been queued because of temporary DNS failures.  That very
probably is a tuning issue; sendmail is expecting to do DNS lookups on a
fast ethernet connected machine, not over a slow congested serial line;
there may be configuration parameters in named, or resolver library
settings, that should be changed a bit for sendmail.
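
A minimal sketch of the distinction at work here, assuming a mailer
that queues a message when a name lookup fails temporarily and bounces
it when the failure is permanent.  The function names are hypothetical,
and treating every non-temporary error as permanent is a
simplification.

```python
import socket


def classify_lookup_error(exc):
    """Map a resolver error to a mail-handling decision."""
    if exc.errno == socket.EAI_AGAIN:
        return "queue"     # temporary failure (e.g. timeout): retry later
    return "bounce"        # simplification: anything else is permanent


def lookup_or_classify(host):
    """Try to resolve host for SMTP; say what to do with the message."""
    try:
        socket.getaddrinfo(host, 25)
        return "deliver"
    except socket.gaierror as exc:
        return classify_lookup_error(exc)
```

On a slow, congested link more lookups time out, so more mail lands in
the "queue" branch even though the destination host is fine.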
scg
response 33 of 106: Dec 2 05:06 UTC 1994

If we could schedule the downtime for catching up on mail to be between
four and five am, or something like that, I'm guessing it wouldn't be
nearly as harshly received as if it were prime time downtime.  Can
downtime for this sort of thing just be programmed ahead of time, or does
there have to be a human standing over the machine to do it?
steve
response 34 of 106: Dec 2 08:36 UTC 1994

   Well, Marcus and I have been staring at sendmail this last hour,
and we think we know an interesting problem here.
   It seems that we have 2000+ pieces of mail queued up for delivery
because once sendmail goes into its "deliver the local queued mail"
mode, it can only deliver somewhere around 63 pieces before it dies.
Since sendmail goes into the deliver mode only once every 17 minutes,
this means that about 200 pieces of mail an hour are normally delivered.
   We've been manually running sendmail to plow through the queue this
morning, so we should have it cleaned out by Friday morning or so.
   Why/what is the problem?  Well, it appears that the shadow code (the
stuff that deals with passwords) doesn't close the file "/etc/group",
which contains various information about what groups people are in.
Since this data is used in sendmail, after sendmail has dealt with about
63 pieces of mail, the number of "file descriptors" (that are associated
with the opening of a file) is exhausted, and sendmail can't run anymore.
   So, Marcus is currently looking at the shadow code to see if he
can't tame that.  As soon as I finish this I'm going to start up another
queue run.
   It is interesting to see how different subsystems on Grex interact
with each other...
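
The throughput figures above can be checked with a little arithmetic.
This sketch uses only the numbers quoted in this response (63 pieces
per run, one run every 17 minutes, roughly 2000 queued) and ignores
mail that arrives while the queue is draining.

```python
import math


def deliveries_per_hour(per_run, interval_minutes):
    """Pieces delivered per hour if each run handles per_run pieces."""
    return per_run * 60 / interval_minutes


def hours_to_drain(queued, per_run, interval_minutes):
    """Hours to empty the queue, ignoring newly arriving mail."""
    runs = math.ceil(queued / per_run)
    return runs * interval_minutes / 60


print(round(deliveries_per_hour(63, 17)))    # a bit over 200 per hour
print(round(hours_to_drain(2000, 63, 17), 1))
```

That works out to roughly nine hours of scheduled runs for 2000 pieces,
which is why running the queue by hand makes such a difference.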
steve
response 35 of 106: Dec 2 09:36 UTC 1994

   Since that last response about 208 pieces have been processed, and
Marcus has a fix for the shadow code, which should be welded into
sendmail hopefully today.
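
A toy model of the descriptor leak, for anyone curious.  It does not
touch real files; DescriptorTable, deliver_leaky and deliver_fixed are
hypothetical names, and the limit of 64 descriptors is an assumption
chosen to be close to the "about 63" reported above.  A real process
also holds a few descriptors for other purposes, which this sketch
ignores.

```python
import errno


class DescriptorTable:
    """Toy model of a per-process file descriptor table."""

    def __init__(self, limit):
        self.limit = limit
        self.in_use = 0

    def open(self):
        if self.in_use >= self.limit:
            raise OSError(errno.EMFILE, "Too many open files")
        self.in_use += 1

    def close(self):
        self.in_use -= 1


def deliver_leaky(table):
    table.open()          # group file opened for the lookup, never closed


def deliver_fixed(table):
    table.open()
    try:
        pass              # the group lookup would happen here
    finally:
        table.close()     # the fix: always release the descriptor


def run_queue(deliver, messages, limit=64):
    """Deliver until the queue is empty or descriptors run out."""
    table = DescriptorTable(limit)
    delivered = 0
    for _ in range(messages):
        try:
            deliver(table)
        except OSError:
            break         # out of descriptors: this queue run dies
        delivered += 1
    return delivered


print(run_queue(deliver_leaky, 2000), run_queue(deliver_fixed, 2000))
```

With the leak, each queue run stalls after a fixed number of
deliveries no matter how much mail is waiting; with the close in
place, the run gets through the whole queue.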
nephi
response 36 of 106: Dec 2 09:59 UTC 1994

Well, I think that all the mail must have been delivered now....
The load average is below one again and Grex seems to be faster than
I have ever experienced.  Good work, guys!!!
davel
response 37 of 106: Dec 2 10:53 UTC 1994

(I was only speculating that bounces went to the whole list; I don't know it
for sure.  But things like automatic acknowledgements & vacation replies
turn up with depressing regularity.  I'm told that this is unavoidable.
The list runs under VMS, & apparently no one involved really understands how
to configure it.)
steve
response 38 of 106: Dec 2 14:58 UTC 1994

  Given the complexity of mail systems, I can see why.

  Well, there is a new version of sendmail on Grex now, one that seems
to not run out of file descriptors any more.  Using that, we were able
to get the queue down to all the items that had already had their first
chance at getting to the outside world, but couldn't, because of server
problems elsewhere on the net.  We had about 800 pieces of that kind
of mail last time I looked.
  But nearly all the incoming mail has been delivered.  Of course,
since the mail has been delivered, /var/spool/mail is chock full, so
it would help for people to clean out their mailboxes.
  (Bugs, like matter, can neither be created nor destroyed.  They may
only be moved about, from subsystem to subsystem...)
rcurl
response 39 of 106: Dec 2 15:38 UTC 1994

I'm sure they reproduce.
popcorn
response 40 of 106: Dec 2 16:50 UTC 1994

(Grex's vacation program is bright enough not to send vacation messages
to mailing list mail.)

As I mentioned in the System Problems item:
1) "newmail" generally doesn't announce arriving messages
2) Mail I sent yesterday or several hours ago today to groups like
"staff" or "cfadm" hasn't yet arrived in my mailbox.  Interestingly,
I received a reply from someone's "vacation" mailer to a message I sent
to him and to cfadm, even though I haven't yet received the copy I sent
to cfadm.
3) Sendmail sent some "your mail could not be delivered in 4 hours"
messages to a friend of mine who sent mail from *off-site*.  It's like
it received her message and then queued it somewhere on Grex.
popcorn
response 41 of 106: Dec 2 17:11 UTC 1994

Another sendmail observation:  E-mail from offsite seems to be reaching
me instantly, now, even though I still haven't received that on-site
stuff I mentioned.
popcorn
response 42 of 106: Dec 2 17:26 UTC 1994

I ran "mailq" (which I'd never used before) and noticed that there's
outgoing mail queued up from as far back as the 28th.

Also, many outgoing messages had name server timeout errors.

Many entries said "(no control file)".

And I noticed a lot of mail that was neither from nor to anybody at
cyberspace.org.  I'm guessing that's due to people's .forward files.
A lot of these files were huge!  <sigh>
popcorn
response 43 of 106: Dec 2 18:49 UTC 1994

Odd...  I'm getting responses to messages that I sent to a user and to
staff, even though I haven't yet received my staff copy.
popcorn
response 44 of 106: Dec 2 19:26 UTC 1994

New theory: maybe sendmail doesn't send a copy of a message to the
person who sent it, if that person is on the list that the mail is to?
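
For what it's worth, sendmail does have a "send to me too" option that
controls whether the sender is kept when an alias they belong to is
expanded.  The sketch below only illustrates that behaviour; the
expand_alias function and the membership of "staff" are hypothetical,
not taken from Grex's configuration.

```python
def expand_alias(aliases, alias, sender, me_too=False):
    """Expand a mail alias, dropping the sender unless me_too is set."""
    members = aliases[alias]
    if me_too:
        return list(members)
    return [member for member in members if member != sender]


# Hypothetical alias membership, for illustration only.
aliases = {"staff": ["popcorn", "mdw", "steve"]}

print(expand_alias(aliases, "staff", "popcorn"))
print(expand_alias(aliases, "staff", "popcorn", me_too=True))
```

With me_too off, the sender never sees their own copy, which would
match replies turning up before the "staff copy" does.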
tsty
response 45 of 106: Dec 2 20:25 UTC 1994

ok, thankxx -- it does make sense to shut off the incoming deluge
every now and then. Glad to know why and how. 
  
Thnkxx to mdw for the effort(s) for sure. 
jep
response 46 of 106: Dec 3 01:06 UTC 1994

        This work seems amazing to me.  It gives me a headache to think of
the effort and expertise which were obviously required.  All I can do --
all I'm going to do, anyway -- is say "thanks" and probably join the rest
of Grex in getting so used to the changes that I won't even be aware of
them in a day or two.  The reward of the system administrator; at best, no
one notices what you've done.
steve
response 47 of 106: Dec 3 01:35 UTC 1994

   That's right: it's a negative feedback system in most professional
places.  At least here on Grex there are lots of people that realize
the work that goes into things.  That isn't the case usually!
   It's been interesting to see sendmail run.  I've been learning
sendmaileese in the last day, and perhaps within a few weeks I can
advance beyond apprentice in "sendmail control".
mdw
response 48 of 106: Dec 3 02:01 UTC 1994

Sendmail may well be smart enough to strip the sender out of mailing
lists.  I seem to remember a configuration setting somewhere to toggle
that - is that something people would like to see flipped the other way?

I don't know how newmail figures out when you've gotten new mail, but
it's certainly possible the mechanism is broken.  The current mail
delivery mechanism should be sending datagrams to comsat, which will
then notify users depending on their "biff" setting.  Perhaps smail &
newmail had a different mechanism...

I'd like to see the weird 4-hour message that was delivered
to the off-site user.
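
For reference, the comsat notification mentioned above is a one-line
datagram of the form user@mailbox-offset sent to the "biff" UDP
service.  The sketch below assumes the conventional biff port of 512
and a comsat listening on the local host; the function names are
hypothetical and this is not the code the delivery agent on Grex
actually uses.

```python
import socket


def build_biff_notice(user, mailbox_offset):
    """Build the one-line comsat message: user@mailbox-offset."""
    return f"{user}@{mailbox_offset}".encode("ascii")


def notify_comsat(user, mailbox_offset, host="127.0.0.1", port=512):
    """Send a new-mail notice to comsat (host and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(build_biff_notice(user, mailbox_offset), (host, port))


print(build_biff_notice("popcorn", 1024))
```

If newmail relies on some other mechanism than these datagrams, that
would explain why it stays quiet when mail arrives.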
steve
response 49 of 106: Dec 3 03:05 UTC 1994

   Gads, sendmail is *fast*.  When the load average is down,
sendmail delivers local mail in under five seconds.  I've seen
this several times now, so I'm starting to actually believe it. ;-)

   Right now, we have about 280 items in the queue, all of which
have tried getting off Grex (at least) once, but couldn't because
of problems with other mail systems at other sites.  The really cool
thing is, we can easily see what's where, and why it's still here.

   I'm beginning to really like sendmail.  Having an experienced
person like marcus to get sendmail up, running, and tamed helps
fantastically.

- Backtalk version 1.3.30 - Copyright 1996-2006, Jan Wolter and Steve Weiss