response 50 of 270: sidhe, Apr 11 22:19 UTC 1995
I don't know about any of you, but I used to rely on the swiftness
of e-mail as my only hope of contacting someone online who
has their write perms set to off, WHILE they're still on.
It sounds as if this is now not a likely means of quick contact.
I'm with mdw.

response 51 of 270: remmers, Apr 11 23:18 UTC 1995
Right -- I think that is a common use of mail. Are we sure that the
benefit of queuing mail is great enough to justify removing this feature?

response 52 of 270: steve, Apr 12 02:57 UTC 1995
No, not completely.
Greg did this as a test, which was certainly valid. I think, but
haven't seen or dug up numbers to prove, that we're basically getting
a little performance pickup by doing this, at the expense of having
hundreds of pieces of mail in the queue at busy times (like, as I write
this).
I too like the immediacy of mail. If there isn't a real improvement
on queued mode we should probably go back to 'immediate' mode (sorry,
I forget the correct term).

response 53 of 270: mwarner, Apr 12 02:59 UTC 1995
If this has become a sort of quicky survey, I'd have to say "let the mail
flow". The juggling doesn't *seem* to resolve fundamental load issues,
but introduces a load of uncertainty that didn't exist before. I liked
the idea in principle.

response 54 of 270: steve, Apr 12 04:06 UTC 1995
Heh. The fundamental load issue is that Grex runs with 50 people
on every night now.

response 55 of 270: lilmo, Apr 12 05:43 UTC 1995
steve, I think "background" is the word you are looking for... would
someone technical comment on #48, and could we distinguish between local
and non-local e-mail? Say, deliver mail to grexers at grex in the
background, and queue the rest?
Just trying to brainstorm, here...
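lilmo's local/non-local split could be expressed as a per-recipient rule. The Python below is only a conceptual sketch, not sendmail configuration: the function `choose_delivery_mode`, the mode labels, and the domain `grex.example` are hypothetical names chosen for illustration.

```python
from __future__ import annotations


def choose_delivery_mode(recipient: str, local_domains: set[str]) -> str:
    """Hypothetical rule: deliver local mail now, queue everything else.

    Returns "background" for recipients on this host and "queue" for
    recipients elsewhere.
    """
    # A bare login name with no "@" is addressed to a local user.
    if "@" not in recipient:
        return "background"
    # Compare the domain part case-insensitively against our own domains.
    domain = recipient.rsplit("@", 1)[1].lower()
    return "background" if domain in local_domains else "queue"


LOCAL_DOMAINS = {"grex.example"}  # hypothetical local domain
for rcpt in ["remmers", "steve@grex.example", "someone@example.org"]:
    print(rcpt, "->", choose_delivery_mode(rcpt, LOCAL_DOMAINS))
```

A real version would also have to decide what to do with a single message addressed to both local and remote recipients.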

response 56 of 270: carl, Apr 12 10:10 UTC 1995
I'm with Mike in 53.

response 57 of 270: nephi, Apr 12 10:39 UTC 1995
I liked mail much better before the test, as well.

response 58 of 270: popcorn, Apr 12 12:06 UTC 1995
If this is a quicky survey, my response is: The new style of mail delivery
is working fine for me. The old style was fine too.
I very slightly prefer the new style because it batches up the mounds of
messages and dumps them into my mailbox all at once, with breathers in
between, rather than continually dribbling it in.

response 59 of 270: mdw, Apr 12 22:20 UTC 1995
It would be possible to redesign the delivery algorithm (using the rules
suggested in #48 or #55) but it would be a lot of work. There's no
point in doing any of that work without objective numbers and
appropriate usage allowances, i.e., scientific measurements such that the
effects of different mail algorithms can be quantified, analyzed, and if
necessary, dissected.
There are also some alternatives worth considering: for instance, if CPU
usage is a major concern, it would be better to sink the effort into
upgrading the CPU. One of the major components of CPU load with mail is
not sendmail, but pine. Indeed, so far as that goes, it would make
sense to put some serious research into why "pine is big/slow".

response 60 of 270: robh, Apr 12 22:34 UTC 1995
My, that's interesting. Is Elm a comparable resource hog?
How about mh?

response 61 of 270: popcorn, Apr 12 22:51 UTC 1995
Pine leads them all. Greg did some measurements with more details;
maybe we can get him to post them here....

response 62 of 270: davel, Apr 13 01:26 UTC 1995
Please do, Greg, if you've got them on line somewhere.
(I always *knew* pine was evil ... )

response 63 of 270: sidhe, Apr 13 01:44 UTC 1995
Pine isn't evil.. just SLOW.

response 64 of 270: jweiss, Apr 13 06:11 UTC 1995
In response to #47. Sendmail 8.6.12 has actually been out for a while,
so I assume you mean it will be coming to grex shortly. In fact it only
has a few small patches, which I would not expect to affect delivery speed.
It would appear that grex has not yet taken 8.6.11, however, as I recall
that has only a few more patches than 8.6.12. They both appear to be trying
to fix small bugs introduced in 8.6.10 which closed several security holes.

response 65 of 270: srw, Apr 13 07:12 UTC 1995
From info I heard at the Grex staff meeting, it seems that 8.6.11
fixes some security holes, as Jonathon says. It apparently introduced
some new bugs, which are what 8.6.12 fixes. So by jumping from 10 to 12
we hope to get the holes fixed without adding any bugs.
We are not expecting performance changes from this upgrade.
From responses here in coop regarding queued delivery, we will be
ending that experiment and going back to non-queued delivery.
No long-term performance changes expected there either.
Nothing short of a Sparc CPU or a separate box for mail is really
going to help much at this point.

response 66 of 270: popcorn, Apr 13 12:30 UTC 1995
Here are the statistics Greg put together, reprinted without his permission.
Item #205: Where have the cycles gone? Long time passing........
Greg Cronau (gregc) Mon, Mar 20, 1995 (07:04) - 129 lines
...sucked up by users every one. When will they ever learn.......
We've been doing a lot of speculating lately about "just *what* is slowing
this machine down?". Most of it has been gut-level guesses from most of
us. I decided to get some hard data. I used Marcus's acctcom.x program to
get a dump of the accounting file (pacct) for 03-16-95. Nothing special about
that date, it's just when I decided to do it. That file is restarted at
00:15 each night, and I started the dump a little before midnight, so I got
almost a full 24 hours' worth of data.
Here are the relevant statistics:
Start of sampled data: 03-16-95 00:15:08
End of sampled data: 03-16-95 23:54:17
Duration of sample period: 85149 seconds (0.986 days)
Total number of programs run: 77410
Total unique programs run: 225
Total CPU seconds utilized: 61518.4
% of total available CPU utilized: 72.2%
Top 50 programs accounted for 95.1% of usage.
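The duration and utilization figures follow directly from the two timestamps and the CPU total. A quick Python check, assuming a single CPU so that the CPU time available equals the wall-clock length of the sample (that assumption is ours, not stated in the post):

```python
from datetime import datetime

# Figures copied from the summary above.
start = datetime(1995, 3, 16, 0, 15, 8)
end = datetime(1995, 3, 16, 23, 54, 17)
cpu_seconds = 61518.4

duration = (end - start).total_seconds()  # wall-clock seconds sampled
days = duration / 86400                   # fraction of a 24-hour day
# Assumes one CPU: available CPU time == wall-clock time.
utilization = cpu_seconds / duration

print(f"{duration:.0f} s, {days:.3f} days, {utilization:.1%} CPU")
```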
Breakdown of top 50 programs sorted by CPU usage:
                Number of     Total CPU      Percent of
Command         Times Run  Seconds Used  Total CPU Used
=======================================================
in.telne             1763       8684.62           14.12
sendmail            12368       6167.86           10.03
pine [Note 3]         378       4272.12            6.94
bbs                   658       4095.39            6.66
find [Note 1]          17       3652.81            5.94
party                 553       3024.89            4.92
du [Note 2]          9612       2973.37            4.83
finger               1001       2875.54            4.67
csh                  1950       2199.50            3.58
sh                   9213       1645.30            2.67
w                    2014       1594.95            2.59
mail.loc             2862       1473.34            2.39
less                 3031       1279.85            2.08
elm [Note 3]          498        942.88            1.53
tcsh                  532        798.03            1.30
gzip                   10        774.70            1.26
lynx_                 150        702.95            1.14
mail                  538        689.44            1.12
cat                  2587        658.33            1.07
ls                    885        610.05            0.99
login                 317        605.86            0.98
tset                 1433        556.23            0.90
mesg                 2669        553.38            0.90
stty                 3225        527.02            0.86
more                 1656        513.59            0.83
bash                  155        468.33            0.76
pico.rea              196        465.30            0.76
ps                    172        452.86            0.74
clear                2435        419.64            0.68
echo                 2648        410.78            0.67
sort                   25        403.96            0.66
rm                   1764        354.77            0.58
write                 418        319.59            0.52
ntalk                 188        300.66            0.49
vi                    211        287.66            0.47
who                   838        271.88            0.44
in.ident              224        271.28            0.44
df                    363        252.70            0.41
perl                   28        200.94            0.33
last                   28        188.14            0.31
nroff                  23        180.36            0.29
in.finge              327        174.49            0.28
sleep                1016        170.97            0.28
tty                   797        170.37            0.28
egrep                 193        157.04            0.26
screen                 16        154.84            0.25
cc1                    16        149.36            0.24
ftpd                   58        140.43            0.23
newuser                55        136.76            0.22
grep                  293        126.97            0.21
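A table like this is a group-by over per-process accounting records: count runs per command, sum their CPU seconds, rank, and express each total as a share of the grand total. The sketch below assumes the dump has already been parsed into `(command, cpu_seconds)` pairs; the acctcom.x output format isn't shown in the post, so the record shape, the `summarize` name, and the toy data are hypothetical. Like the original, it can only count processes that exited during the sample.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable

Record = tuple[str, float]  # (command name, CPU seconds): hypothetical shape


def summarize(records: Iterable[Record], top_n: int = 50):
    """Group records by command and rank commands by total CPU.

    Returns (rows, top_share): each row is
    (command, times_run, cpu_seconds, percent_of_total), and top_share
    is the combined percentage of the rows returned.
    """
    runs: dict[str, int] = defaultdict(int)
    cpu: dict[str, float] = defaultdict(float)
    for command, seconds in records:
        runs[command] += 1
        cpu[command] += seconds
    total = sum(cpu.values())
    if total == 0:
        return [], 0.0
    ranked = sorted(cpu, key=cpu.get, reverse=True)[:top_n]
    rows = [(c, runs[c], cpu[c], 100.0 * cpu[c] / total) for c in ranked]
    return rows, sum(row[3] for row in rows)


# Toy records for illustration only; not Grex data.
SAMPLE = [("sendmail", 0.5), ("pine", 11.0), ("sendmail", 0.6), ("elm", 2.0)]
rows, share = summarize(SAMPLE, top_n=2)
for cmd, n, secs, pct in rows:
    print(f"{cmd:<12}{n:>6}{secs:>10.2f}{pct:>8.2f}")
print(f"top {len(rows)} share: {share:.1f}%")
```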
Note 1:
Even though "find" was only run 17 times, one of those runs was the
single largest use of cpu time during this period. It ran for a total
duration of 12872 seconds and consumed 2827.81 seconds of CPU. This
is undoubtedly the nightly run of updatedb, which updates the database
used by the "locate" program. We are going to have to consider some way
to scale this thing back.
Note 2:
9612 runs of du?!? Either someone is obsessed with disk space or it's
some kind of cron job. When I went back to the log file, I found that most
of these occurred as a "storm" starting around 5:33am. They were all run as
root. We changed the root passwd recently and I forgot the damn thing, so I
can't check root's crontab. I suspect this is being caused by some cron job
that does a du on each user's home dir. This is wasteful; we could do the
same thing with a single "du -s /home/*".
Note 3:
Elm was run 32% more often than pine, and yet pine used over 4.5 times
as much CPU time as elm. What is pine doing???
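Dividing each total by its run count makes the gap sharper: per invocation, pine cost roughly six times as much CPU as elm in this sample. The arithmetic, using only the table's figures:

```python
# Times run and total CPU seconds, copied from the table above.
pine_runs, pine_cpu = 378, 4272.12
elm_runs, elm_cpu = 498, 942.88

more_runs = elm_runs / pine_runs - 1  # how much more often elm was run
cpu_ratio = pine_cpu / elm_cpu        # pine's total CPU relative to elm's
pine_per_run = pine_cpu / pine_runs   # average CPU seconds per pine run
elm_per_run = elm_cpu / elm_runs      # average CPU seconds per elm run
per_run_ratio = pine_per_run / elm_per_run

print(f"elm run {more_runs:.0%} more often than pine")
print(f"pine used {cpu_ratio:.2f}x elm's total CPU")
print(f"per run: pine {pine_per_run:.2f} s, elm {elm_per_run:.2f} s "
      f"({per_run_ratio:.1f}x)")
```

Averages like these can't distinguish a few very long pine sessions from uniformly heavy ones; the accounting totals alone don't say which.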
More notes:
There is some data missing here. The data I have only shows jobs that
ended during the sample period. Any job that was started, but didn't
complete until after the end of the logfile, didn't get recorded. Also
any job that was running *before* the log was started and continued to
run through the whole period and never ended, was also never recorded.
We have many background processes that are started at boot and never
exit; named, ntpd, httpd, etc. fit in this class. So the "% of total
available CPU utilized" figure is probably several percent higher.
This is only half the problem. This information does not consider
disk I/O. A program that really hammers the disks, like "find", will
seem to slow the system down more than a program like "gzip", which does
most of its work in memory compressing data.
The telnet daemon is our biggest cpu user. Not entirely surprising.
I wonder if there is a faster telnet daemon we can install?
If you add up the amount of time used by sendmail, pine, elm, mail,
mail.local, and pico.real, you get 23% of the total usage. If you factor
in the amounts from vi, more, less, and the various shells that are used
while people read and write mail, it's safe to say that at
least 25% of our total processing is related to mail.
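The 23% figure can be reproduced from the table. The table's command names appear truncated to eight characters, so mail.local and pico.real show up as mail.loc and pico.rea:

```python
# CPU seconds for the mail-related commands named above, copied from the table.
mail_cpu = {
    "sendmail": 6167.86,
    "pine": 4272.12,
    "elm": 942.88,
    "mail": 689.44,
    "mail.loc": 1473.34,  # mail.local
    "pico.rea": 465.30,   # pico.real
}
total_cpu = 61518.4  # total CPU seconds for the sample day

mail_total = sum(mail_cpu.values())
share = mail_total / total_cpu
print(f"{mail_total:.2f} CPU seconds = {share:.1%} of total")
```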

response 67 of 270: adbarr, Apr 14 01:19 UTC 1995
I should not be in this pool, but --
Grex needs cpu with much faster speed - regardless of
whether an ISDN line is plugged in? Or will that (ISDN)
help drain the load, somehow? Is it both, or one, or
neither? Trying to understand priorities. Thanks.

response 68 of 270: robh, Apr 14 02:13 UTC 1995
I'm shocked to see that trn_real isn't on there, I thought
that even with the few people who use it, it would still be a
horrible resource hog. Or is that under the domain
of the telnet daemon?

response 69 of 270: ajax, Apr 14 02:35 UTC 1995
Arnold, as I understand it, both a faster CPU and ISDN would help
speed Grex up, but if you're dialed in directly through the modems,
then most of the slowness you experience is due just to the slow
CPU, not to the Internet link. ISDN would speed up the Internet
link, but wouldn't directly impact the system load.

response 70 of 270: steve, Apr 14 04:35 UTC 1995
Yes Arnold, we need a faster CPU. The Sun-4/200 SPARC CPU card
we now have, at 2.5 (or 3?) times faster, will let us deal with what
we currently do at an almost reasonable rate. ;-)
But, as we get news up, and then maybe (probably) increase the
size of our Internet pipe, the CPU factor will fall back behind
again, and we'll be slooow, only with maybe 70 users on instead of
40 for a reasonably slow day.
In the end, it's safe to say that Grex is probably never going
to have enough Internet bandwidth, and will only sometimes have
enough CPU for things. Disk and memory we can grow as we jump
from platform to platform.

response 71 of 270: srw, Apr 14 07:10 UTC 1995
Yes, Grex needs both CPU and Link upgrades.
We have approved the CPU upgrade, but cannot afford a commercial
ISDN link upgrade based on our current membership level.
In fact, we'd have to double to afford it, and while that is possible,
I would say that it's more likely that we need to find a cheaper way to
obtain a better link.
Are the two connected? Yes. A better link will attract more users,
and that will bog the CPU down more. We will have to decide
whether it makes sense to cap the number of incoming connections
to prevent this effect.
When Grex was a lot smaller, we decided to avoid this at all costs,
but I think we may be ready to reconsider it.
As a general rule, all load levels have a tendency to increase
until they are jbt (Just Barely Tolerable). This is not a phenomenon
limited to Grex, btw. It is the end result of human nature being
embedded in a giant negative feedback loop.

response 72 of 270: tsty, Apr 14 09:02 UTC 1995
Thank you for #66. Real information helps real perns.

response 73 of 270: davel, Apr 14 10:54 UTC 1995
The OBVIOUS solution, based on those stats, is to depermit inbound telnet,
right?

response 74 of 270: popcorn, Apr 14 14:48 UTC 1995
<chuckle>