sidhe
response 50 of 270: Apr 11 22:19 UTC 1995

        I don't know about any of you, but I used to rely on the swiftness
of e-mail as my only hope of contacting someone online who has their
write perms set to off, WHILE they're still on.
        It sounds as if this is now not a likely means of quick contact.
I'm with mdw.
remmers
response 51 of 270: Apr 11 23:18 UTC 1995

Right -- I think that is a common use of mail.  Are we sure that the
benefit of queuing mail is great enough to justify removing this feature?
steve
response 52 of 270: Apr 12 02:57 UTC 1995

   No, not completely.

   Greg did this as a test, which was certainly valid.  I think, but
haven't seen or dug up numbers to prove, that we're basically getting
a little performance pickup by doing this, at the expense of having
hundreds of pieces of mail in the queue at busy times (like, as I write
this).

   I too like the immediacy of mail.  If there isn't a real improvement
on queued mode we should probably go back to 'immediate' mode (sorry,
I forget the correct term).
mwarner
response 53 of 270: Apr 12 02:59 UTC 1995

If this has become a sort of quicky survey, I'd have to say "let the mail
flow".  The juggling doesn't *seem* to resolve fundamental load issues,
but introduces a load of uncertainty that didn't exist before.  I liked
the idea in principle.
steve
response 54 of 270: Apr 12 04:06 UTC 1995

  Heh.  The fundamental load issue is that Grex runs with 50 people
on every night now.
lilmo
response 55 of 270: Apr 12 05:43 UTC 1995

steve, I think "background" is the word you are looking for...  would
someone technical comment on #48, and could we distinguish between local 
and non-local e-mail?  Say, deliver mail to grexers at grex in the
background, and queue the rest?

Just trying to brainstorm, here...
carl
response 56 of 270: Apr 12 10:10 UTC 1995

I'm with Mike in 53.

nephi
response 57 of 270: Apr 12 10:39 UTC 1995

I liked mail much better before the test, as well.
popcorn
response 58 of 270: Apr 12 12:06 UTC 1995

If this is a quicky survey, my response is: The new style of mail delivery
is working fine for me.  The old style was fine too.
I very slightly prefer the new style because it batches up the mounds of
messages and dumps them into my mailbox all at once, with breathers in
between, rather than continually dribbling it in.
mdw
response 59 of 270: Apr 12 22:20 UTC 1995

It would be possible to redesign the delivery algorithm (using the rules
suggested in #48 or #55) but it would be a lot of work.  There's no
point in doing any of that work without objective numbers and
appropriate usage allowances - i.e., scientific measurements such that the
effects of different mail algorithms can be quantified, analyzed, and if
necessary, dissected.

There are also some alternatives worth considering: for instance, if CPU
usage is a major concern, it would be better to sink the effort into
upgrading the CPU.  One of the major components of CPU load with mail is
not sendmail, but pine.  Indeed, so far as that goes, it would make
sense to put some serious research into why "pine is big/slow".
robh
response 60 of 270: Apr 12 22:34 UTC 1995

My, that's interesting.  Is Elm a comparable resource hog?
How about mh?
popcorn
response 61 of 270: Apr 12 22:51 UTC 1995

Pine leads them all.  Greg did some measurements with more details;
maybe we can get him to post them here....
davel
response 62 of 270: Apr 13 01:26 UTC 1995

Please do, Greg, if you've got them on line somewhere.

(I always *knew* pine was evil ... )
sidhe
response 63 of 270: Apr 13 01:44 UTC 1995

        Pine isn't evil.. just SLOW.
jweiss
response 64 of 270: Apr 13 06:11 UTC 1995

In response to #47.  Sendmail 8.6.12 has actually been out for a while,
so I assume you mean it will be coming shortly to grex.  In fact it only
has a few small patches, which I would not expect to affect delivery speed.
It would appear that grex has not yet taken 8.6.11, however, as I recall
that has only a few more patches than 8.6.12.  They both appear to be trying
to fix small bugs introduced in 8.6.10 which closed several security holes.
srw
response 65 of 270: Apr 13 07:12 UTC 1995

From info I heard at the Grex staff meeting, it seems that 8.6.11
fixes some security holes, as Jonathon says. It apparently introduced
some new bugs, which are what 8.6.12 fixes. So by jumping from 10 to 12
we hope to get the holes fixed without adding any bugs.

We are not expecting performance changes from this upgrade.

From responses here in coop regarding queued delivery, we will be
ending that experiment and going back to non-queued delivery.
No long-term performance changes expected there either.

Nothing short of a Sparc CPU or a separate box for mail is really
going to help much at this point.
popcorn
response 66 of 270: Apr 13 12:30 UTC 1995

Here are the statistics Greg put together, reprinted without his permission.




Item #205: Where have the cycles gone? Long time passing........
Greg Cronau (gregc) Mon, Mar 20, 1995 (07:04) - 129 lines


 
 
  ...sucked up by users every one. When will they ever learn.......
 
 
 We've been doing a lot of speculating lately about "just *what* is slowing
 this machine down?". Most of it has been gut level guesses from most of
 us. I decided to get some hard data. I used Marcus's acctcom.x program to
 get a dump of the accounting file (pacct) for 03-16-95. Nothing special about
 that date, it's just when I decided to do it. That file is restarted at
 00:15 each night, and I started the dump a little before midnight, so I got
 almost a full 24 hours worth of data.
 
 Here are the relevant statistics:
 
     Start of sampled data: 03-16-95 00:15:08
       End of sampled data: 03-16-95 23:54:17
 Duration of sample period: 85149 seconds (0.986 days)
 
 Total number of programs run: 77410
    Total unique programs run: 225
 
        Total CPU seconds utilized: 61518.4
 % of total available CPU utilized: 72.2%
 
 Top 50 programs accounted for 95.1% of usage.
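
The duration and utilization figures above follow directly from the two timestamps and the CPU total. A minimal sketch in Python, assuming a single CPU (so available CPU seconds equal wall-clock seconds) and reading the two-digit year as 1995; the variable names are illustrative and not part of the acctcom.x tooling:

```python
from datetime import datetime

# Timestamps and CPU total copied from the statistics above.
FMT = "%m-%d-%y %H:%M:%S"
start = datetime.strptime("03-16-95 00:15:08", FMT)
end = datetime.strptime("03-16-95 23:54:17", FMT)
total_cpu_seconds = 61518.4

# Wall-clock length of the sample window.
duration_seconds = (end - start).total_seconds()
duration_days = duration_seconds / 86400

# Assumption: one CPU, so available CPU time equals wall-clock time.
utilization_pct = 100 * total_cpu_seconds / duration_seconds

print(f"{duration_seconds:.0f} seconds ({duration_days:.3f} days)")
print(f"{utilization_pct:.1f}% of available CPU utilized")
```

The same division applied to any row of the table below (CPU seconds over 61518.4) gives that row's percent-of-total column.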
 
 Breakdown of top 50 programs sorted by CPU usage:
 
                 Number of        Total CPU         Percent of
 Command         Times Run        Seconds Used      Total CPU Used
 =======================================================================
 in.telne          1763            8684.62            14.12
 sendmail         12368            6167.86            10.03
 pine [Note 3]      378            4272.12             6.94
 bbs                658            4095.39             6.66
 find [Note 1]       17            3652.81             5.94
 party              553            3024.89             4.92
 du   [Note 2]     9612            2973.37             4.83
 finger            1001            2875.54             4.67
 csh               1950            2199.50             3.58
 sh                9213            1645.30             2.67
 w                 2014            1594.95             2.59
 mail.loc          2862            1473.34             2.39
 less              3031            1279.85             2.08
 elm  [Note 3]      498             942.88             1.53
 tcsh               532             798.03             1.30
 gzip                10             774.70             1.26
 lynx_              150             702.95             1.14
 mail               538             689.44             1.12
 cat               2587             658.33             1.07
 ls                 885             610.05             0.99
 login              317             605.86             0.98
 tset              1433             556.23             0.90
 mesg              2669             553.38             0.90
 stty              3225             527.02             0.86
 more              1656             513.59             0.83
 bash               155             468.33             0.76
 pico.rea           196             465.30             0.76
 ps                 172             452.86             0.74
 clear             2435             419.64             0.68
 echo              2648             410.78             0.67
 sort                25             403.96             0.66
 rm                1764             354.77             0.58
 write              418             319.59             0.52
 ntalk              188             300.66             0.49
 vi                 211             287.66             0.47
 who                838             271.88             0.44
 in.ident           224             271.28             0.44
 df                 363             252.70             0.41
 perl                28             200.94             0.33
 last                28             188.14             0.31
 nroff               23             180.36             0.29
 in.finge           327             174.49             0.28
 sleep             1016             170.97             0.28
 tty                797             170.37             0.28
 egrep              193             157.04             0.26
 screen              16             154.84             0.25
 cc1                 16             149.36             0.24
 ftpd                58             140.43             0.23
 newuser             55             136.76             0.22
 grep               293             126.97             0.21
 
 
 Note 1:
     Even though "find" was only run 17 times, one of those runs was the
 single largest use of cpu time during this period. It ran for a total
 duration of 12872 seconds and consumed 2827.81 seconds of CPU. This
 undoubtedly is the nightly run of the updatedb that updates the database
 used by the "locate" program. We are going to have to consider some way
 to scale this thing back.
 
 Note 2:
     9612 runs of du?!? Either someone is obsessed with disk space or it's
 some kind of cron job. When I went back to the log file, I found that most
 of these occurred as a "storm" starting around 5:33am. They were all run as
 root. We changed the root passwd recently and I forgot the damn thing, so I
 can't check root's crontab. I suspect this is being caused by some cron job
 that does a du on each user's home dir. This is wasteful; we could do the
 same thing with "du -s /home/*".
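
One way to collect per-user totals from a single process, rather than launching one du per home directory, is sketched below in Python. This is a guess at what the cron job is after, not a description of it: `usage_per_home` is a hypothetical helper, and it sums apparent file sizes rather than the allocated disk blocks that du reports, so its numbers would differ somewhat from du's.

```python
import os

def usage_per_home(root):
    """Return {home_dir_name: total_bytes}, computed in one process.

    Assumption: sums apparent file sizes (st_size); du itself reports
    allocated blocks, so the totals are only comparable, not identical.
    """
    totals = {}
    with os.scandir(root) as entries:
        # Only real directories directly under root count as "homes".
        homes = [e for e in entries if e.is_dir(follow_symlinks=False)]
    for home in homes:
        total = 0
        # os.walk does not descend into symlinked directories by default.
        for dirpath, _dirnames, filenames in os.walk(home.path):
            for name in filenames:
                try:
                    # lstat so a symlink is sized as a link, not its target.
                    total += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    pass  # file vanished or is unreadable; skip it
        totals[home.name] = total
    return totals
```

Calling `usage_per_home("/home")` would walk every home directory once; the design point is simply that the process start-up cost is paid once instead of thousands of times.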
 
 Note 3:
     Elm was run 32% more often than pine and yet pine used over 4.5 times
 as much cpu resources as elm. What is pine doing???
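
The comparison in Note 3 can be checked, and extended to a per-run average, from the two table rows. A small sketch in Python using only the figures above; the variable names are illustrative, and the per-run average says nothing about how long each session lasted or how large the mailboxes were:

```python
# Run counts and CPU seconds copied from the table above.
pine_runs, pine_cpu = 378, 4272.12
elm_runs, elm_cpu = 498, 942.88

runs_ratio = elm_runs / pine_runs      # how much more often elm was run
cpu_ratio = pine_cpu / elm_cpu         # total CPU: pine relative to elm
pine_per_run = pine_cpu / pine_runs    # average CPU seconds per pine run
elm_per_run = elm_cpu / elm_runs       # average CPU seconds per elm run
per_run_ratio = pine_per_run / elm_per_run

print(f"elm ran {100 * (runs_ratio - 1):.0f}% more often than pine")
print(f"pine used {cpu_ratio:.2f}x the total CPU of elm")
print(f"per run: pine {pine_per_run:.1f}s, elm {elm_per_run:.1f}s "
      f"({per_run_ratio:.1f}x)")
```

On a per-invocation basis the gap is wider than the totals suggest: roughly 11.3 CPU seconds per pine run against about 1.9 for elm, a factor of about six.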
 
 More notes:
     There is some data missing here. The data I have only shows jobs that
 ended during the sample period. Any job that was started, but didn't
 complete until after the end of the logfile, didn't get recorded. Also
 any job that was running *before* the log was started and continued to
 run through the whole period and never ended, was also never recorded.
 We have many background processes that are started at boot and never
 exit. The named, ntpd, httpd, etc. fit in this class. So it's likely
 that the "% of total available CPU utilized" figure is really several
 percent higher.
 
     This is only half the problem. This information does not consider
 disk usage. A program that really hammers the disks like "find" will
 seem to slow the system down more than a program like "gzip" which does
 most of its work in memory compressing data.
 
     The telnet daemon is our biggest cpu user. Not entirely surprising.
 I wonder if there is a faster telnet daemon we can install?
 
     If you add up the amount of time used by sendmail, pine, elm, mail,
 mail.local, and pico.real, you get 23% of the total usage. If you factor
 in the amounts from vi, more, less, and the various shells that are used
 while people read and write mail, it's a safe statement to say that at
 least 25% of our total processing is related to mail.
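
The mail estimate above can be reproduced from the percent column of the table. A short sketch in Python; the grouping of programs into "directly mail" and "partly mail" follows the paragraph above, and the fraction computed at the end is an illustration of what the 25% figure implies, not a measured value:

```python
# Percent-of-total-CPU figures copied from the table above.
mail_programs = {
    "sendmail": 10.03,
    "pine": 6.94,
    "mail.loc": 2.39,   # mail.local, truncated in the accounting output
    "elm": 1.53,
    "mail": 1.12,
    "pico.rea": 0.76,   # pico.real, truncated in the accounting output
}
direct_share = sum(mail_programs.values())

# Programs that are only partly used for mail: editor, pagers, shells.
partly_mail = {
    "vi": 0.47, "more": 0.83, "less": 2.08,
    "csh": 3.58, "sh": 2.67, "tcsh": 1.30, "bash": 0.76,
}
partly_share = sum(partly_mail.values())

# Fraction of the partly-mail time that must be mail-related
# for the overall mail share to reach 25%.
needed_fraction = (25.0 - direct_share) / partly_share

print(f"directly mail-related: {direct_share:.2f}%")
print(f"editor/pager/shell time: {partly_share:.2f}%")
print(f"fraction needed to reach 25%: {needed_fraction:.2f}")
```

The six mail programs alone come to 22.77%, which rounds to the 23% quoted; the "at least 25%" claim then holds if about a fifth of the editor, pager, and shell time is spent on mail.
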
adbarr
response 67 of 270: Apr 14 01:19 UTC 1995

I should not be in this pool, but --

Grex needs a cpu with much faster speed - regardless of
whether an ISDN line is plugged in?  Or will that (ISDN)
help drain the load, somehow?  Is it both, or one, or
neither? Trying to understand priorities. Thanks.
robh
response 68 of 270: Apr 14 02:13 UTC 1995

I'm shocked to see that trn_real isn't on there, I thought
that even with the few people who use it, it would still be a
horrible resource hog.  Or is that under the domain
of the telnet daemon?
ajax
response 69 of 270: Apr 14 02:35 UTC 1995

  Arnold, as I understand it, both a faster CPU and ISDN would help
speed Grex up, but if you're dialed in directly to one of the modems,
then most of the slowness you experience is due just to the slow
CPU, not to the Internet link.  ISDN would speed up the Internet
link, but wouldn't directly impact the system load.
steve
response 70 of 270: Apr 14 04:35 UTC 1995

   Yes Arnold, we need a faster CPU.  The Sun-4/200 SPARC CPU card
we now have, at 2.5 (or 3?) times faster, will let us deal with what
we currently do, at an almost reasonable rate. ;-)

   But, as we get news up, and then maybe (probably) increase the
size of our Internet pipe, the CPU factor will fall back behind
again, and we'll be slooow, only with maybe 70 users on instead of
40 for a reasonably slow day.

   In the end, it's safe to say that Grex is probably never going
to have enough Internet bandwidth, and will only sometimes have
enough CPU for things.  Disk and memory we can grow as we jump
from platform to platform.
srw
response 71 of 270: Apr 14 07:10 UTC 1995

Yes, Grex needs both CPU and Link upgrades.
We have approved the CPU upgrade, but cannot afford a commercial
ISDN link upgrade based on our current membership level.
In fact, we'd have to double to afford it, and while that is possible,
I would say that it's more likely that we need to find a cheaper way to
obtain a better link.

Are the two connected? Yes. A better link will attract more users,
and that will bog the CPU down more. We will have to decide 
whether it makes sense to cap the number of incoming connections
to prevent this effect.

When Grex was a lot smaller, we decided to avoid this at all costs,
but I think we may be ready to reconsider it.


As a general rule, all load levels have a tendency to increase
until they are jbt (Just Barely Tolerable). This is not a phenomenon
limited to Grex, btw. It is the end result of human nature being
embedded in a giant negative feedback loop.
tsty
response 72 of 270: Apr 14 09:02 UTC 1995

Thank you for #66. Real information helps real persons.
davel
response 73 of 270: Apr 14 10:54 UTC 1995

The OBVIOUS solution, based on those stats, is to depermit inbound telnet,
right?
popcorn
response 74 of 270: Apr 14 14:48 UTC 1995

<chuckle>

- Backtalk version 1.3.30 - Copyright 1996-2006, Jan Wolter and Steve Weiss