|
Grex > Coop7 > #68: A Few Partly-Baked Thoughts About Where We Are and Where We're Headed (115 Lines!)
|
| Author |
Message |
| 25 new of 137 responses total. |
rcurl
|
|
response 25 of 137:
|
Jul 6 17:01 UTC 1995 |
That is the classic dilemma of the all-volunteer non-profit: the step
to hire their first employees is fraught with difficulty and danger,
but it is eventually inevitable if the non-profit continues to grow.
Some hire just a secretary to do office work (membership records, etc),
but that is not where Grex's problems lie. Others hire an Executive
Director whose *job* it is to raise the money to pay his or her own
salary (as well as to make sure that everything else the organization
is supposed to do gets done, which means nourishing volunteers). In
some instances, non-profits have made a sufficiently good case that they
have attracted foundation support (grant) for, say, a five-year
development program, which pays for the initial staff, but with the
expectation that the non-profit will be self-sufficient when the grant
runs out (a model for this is the new Southwest Michigan Land Conservancy).
This route is based upon an expectation of being able to develop a LOT
of community support.
My impression is that as things are run right now, Grex is not ready
for such a step. What is lacking is a clear image - and a development
plan - for what Grex wants to be in five years. Implied within such
a development plan would be the near 100%, enthusiastic backing of the
members of Grex, especially the "doers". I certainly see a good number
of enthusiastic doers, but not a strong common vision and goal. But
then, isn't that precisely what Dan is asking about?
|
adbarr
|
|
response 26 of 137:
|
Jul 7 00:58 UTC 1995 |
Yes. Now. You have no idea how strong you are. I agree with
Rane, except for one thing -- you are ready for the next step.
rcurl - srw - Steve - popcorn - danr have the "image" - now translate!
|
srw
|
|
response 27 of 137:
|
Jul 7 04:33 UTC 1995 |
There is, alas, some truth in Jan's point that the Grex staff is
operating inefficiently in cobbling together dumpster divings.
The staff would like to have better material, but whenever it
comes time to spend money on stuff, the staff mostly argues against
making the transition away from Sun computers toward intel.
I am neutral on this question.
I personally dislike the Intel architecture immensely, but you
just can't beat it for price/performance. More and more systems
are going on-line running Linux, or BSDI, on Pentiums. They are
doing fine, too, because Linux has come a long way from what it
used to be, and BSDI is a fine OS if you can afford the
unlimited user license, which is what we'd need.
Otoh, any change at this point will put us on a different architecture,
forcing recompiles of everything. By going to the Sun-4, we have entered
an architecture family at the bottom, and there is a lot of upward
mobility, especially as faster hardware gets pushed off loading docks.
|
mju
|
|
response 28 of 137:
|
Jul 7 16:46 UTC 1995 |
I think moving to an Intel platform would kill reliability.
I've yet to see an Intel box (one that doesn't cost as much as
a mid-range SPARC, at least) that is as reliable as the Suns.
|
tsty
|
|
response 29 of 137:
|
Jul 7 18:54 UTC 1995 |
besides, Grex is populated with Sun experts (imo) and losing that
expertise (regardless of overlapping expertise) is not in the
best interests of the Grex community, imo.
rcurl's #25 is perceptive, as is adbarr's #26. I +think+ that Grex
is currently somewhere in between the two, moving toward #26 but I
don't know how fast.
|
remmers
|
|
response 30 of 137:
|
Jul 7 19:27 UTC 1995 |
Re #28: What reliability? :-/
|
rcurl
|
|
response 31 of 137:
|
Jul 7 19:34 UTC 1995 |
It's open season on reliability.
|
mdw
|
|
response 32 of 137:
|
Jul 7 23:03 UTC 1995 |
I have particular and serious reservations about using pentium machines
for a "login" server. I believe it's a virtual certainty that current
or future versions of the pentium chip will be found to have serious
holes in the protection logic, similar to the floating point bug.
Unlike the floating point bug, I believe Intel will be able to justify
ignoring these bugs, due to the "small" part of their market that would
be in fact affected by such a bug. In fact, the only people who are
likely to be seriously affected are systems such as grex, and that's
such a small part of Intel's market, it's not even worth talking about.
I do see plenty of uses for the intel architecture in other things grex
is doing, such as our router. But I very much do not believe it's
appropriate as our *login* machine.
Fortunately, there are a number of other architectures that do make
plenty of sense. The sparc architecture is one such. The newest
machines are certainly quite competitive with the intel machines, both in
terms of price, and performance. A large variety of older machines are
available in the used market. The very oldest machines are even
compatible with the very cheap memory that we have currently invested
heavily in, for our current sun-3 CPU. All of these machines are
software-wise, compatible, so, we only really need bite the
"incompatible software upgrade" bullet once. This is exactly what the
sun-4 we're looking into represents - that's the next logical step on
our upgrade path, and it puts us into the first step of using the sparc
architecture. If we care to spend real $$$, we can then make a massive
jump onto a much newer machine in this series, and having made this
step, that step will be *much* easier.
|
jep
|
|
response 33 of 137:
|
Jul 7 23:56 UTC 1995 |
I don't want to start an M-Net vs. Grex war, or even an Intel vs.
everyone else war, but Marcus, M-Net running on a 486/33 has had far fewer
security problems than Grex has had. Maybe there's a bug in the 486 or
Pentium that causes a potential security problem, and some hacker out
there who knows about it and who is lurking in wait for Grex to get onto
an Intel architecture machine so he can exploit it, but it sounds
farfetched, and seems to me you are worrying about a bugbear.
I've seen you write of such a possibility several times over the last
year. I know you know much more than I about computers and especially
Unix. I wonder if you know of an actual bug with an application that's
so serious that Grex cannot afford to take a chance with a 486 or Pentium,
despite the fact that it would run at least 20 times faster than what Grex
has now, work with all hardware you are likely to need or want, and have a
lot fewer problems, generally speaking, than Grex has now.
M-Net has, right now, working: 20 14.4K modems (using compression and
error correction) running on very cheap internal serial boards, 1 2 GB and
1 3 GB SCSI hard drives, 48 MB RAM, an ethernet card, and BSDI 1.1. In 3
years we have had 0 root breakins. It is uncommon for the load average to
go above 2.5, and not especially noticeable to me, running most
applications, whether it is running at 0.5 or 2.5. There were 64 users
logged on the last time I was on there, and the load average was around
1.5. This, I think, ain't bad.
|
mdw
|
|
response 34 of 137:
|
Jul 9 08:08 UTC 1995 |
Without a detailed comparison of m-net vs. grex system crashes and
security incidents, I think it would be quite impossible to support any
such claim that either m-net or grex have had "more" incidents.
Certainly, I'd feel quite uncomfortable making such a claim, because I'm
not privy to any such information at m-net.
M-net has been lucky in that it's always enjoyed better access to its
hardware. Until recently, grex has been significantly handicapped in
access to its hardware, and that's really hurt us in quite a number of
ways. The problems we have today, with the router, and with the disk
drive, are really problems we've inherited from that era. I know I've
heard stories about an m-net upgrade to a faster CPU, that didn't work.
That's where good access really wins big.
So far as the intel architecture goes, ok, let's dig in. When arbornet
originally went live, they used a 186 based Unix system. The 186
doesn't have hardware protection, which means any would-be dullard can
readily write a program to crash the machine. Sure enough, they did,
and as an initial result, the people running arbornet limited access to
the C compiler & other facilities, and in the long run, they shifted to
a series of slower 68010 & 68000 based systems. Performance is only one
leg of a triangle that also includes reliability and cost, and all 3
corners matter.
When the 286 came out, I knew friends who were working with the chip -
and I remember that, for a while, they were practically dealing with the
"chip du jour", many different masks, with a succession of problems.
Actually, there was one bug that Intel didn't fix until the pentium,
that allowed you to enable paging but stay in real mode. One company
even introduced a product based on this "bug" in the 386 & 486.
That's all in the past, of course, and what really matters is the
future. Unfortunately, the pentium is not exactly encouraging in this
regard. With all past chip generations, intel has generally been quite
open about their chip design - if you pay them for all the manuals, and
read carefully, you can expect to know as much as anybody about what the
chip will do. With the pentium, there is an "infamous" appendix H that
documents extra instructions and features of the pentium chip. You only
get appendix H if intel likes you, and you sign a secret non-disclosure
agreement with intel. Now, probably, most of that appendix is concerned
with instruction set timing, and it may only be that intel was so
embarrassed about that part of their documentation that they decided to
only show it to their most trusted customers. But, since the rest of us
can't look at it, there's no way for us to know if there's some special
bug or feature of the protection logic that's crucial to an air-tight
design.
Intel can only document their intended features. Unfortunately, with
the pentium floating point bug, we have plenty of evidence how intel
might treat a crucial flaw in the protection logic. This is a flaw that
is fairly rare and would not likely appear unless you sought it out, but
it's also a flaw that can be readily reproduced in one line of code from
any number of high level mass market applications. Even so, intel has
always tried to underplay the seriousness of this flaw, and it was only
in the face of a massive public relations disaster that intel relented.
With a similar flaw in the protection logic, it's quite clear, intel is
not likely to do anything at all.
At various points, I've played around with successive generations of
intel based Unix systems, to see how rugged they are. The 286 is really
the first to have any credibility at all; indeed, its protection logic
is basically quite similar to the pentium today. When I tried some
mildly exotic things, I was able to readily crash a 286, by generating
some unusual kinds of faults that the kernel didn't know how to handle
properly. I'm not greatly surprised by that, the intel documentation is
not especially clear on fault handling, and the chip is complex enough
that there are a large number of odd exceptions that can happen. I've
tried similar things on more recent chips and operating systems, with
similar results.
Fairly recently, I had an opportunity to see the source code to an old
version of BSDI. I was quite fascinated to see a way, deep inside the
guts of the kernel, that would allow any application to gain access to
the I/O space of the machine. That would certainly allow a malicious
intruder to crash the machine. With enough cleverness, I think an
intruder could use that to break root.
Now, I'll grant you, I'm probably a lot smarter than the average
cracker. I think, too, the number of public access sites that actually
allow access to the compiler and encourage people to play around with
the Unix shell and such, is vanishingly small. So I'm not really
surprised that m-net hasn't had any obvious problems with holes in the
pentium chip, or in BSDI. Yet. But I also think that some of these
crackers are fairly bright, and have a *lot* of time on their hands. I
don't think anybody living in 1985 could have predicted the rise of
computer viruses. I'm not sure what we can expect to see in the next
decade as more and more software takes advantage of memory protection.
But I think it would be foolish to assume there will never be any
incidents involving memory protection.
I think M-net has been quite lucky in having as few problems as you
claim, and can be justifiably proud of that. I also think there is
every chance that the horrible things I've discussed above will not be a
problem for m-net. I don't think it's a certainty however. I don't
think grex's present architectural path has served it that badly - we
may be behind on CPU, but in fact, we're ahead on RAM; and really, the
largest bottleneck that faces both grex & m-net is not CPU but network
bandwidth. I think there's an advantage to be had in terms of having
grex & m-net do *different* things (if nothing else, it means a bug some
cracker finds in one system may not be present in the other). And
I'm pretty confident that we'll be able to shortly overcome some of the
historical bugaboos we've been having and become quite a bit more
reliable.
One of the things I'm interested in seeing grex do is explore the
possibility of putting up a number of different machines and offloading
quite a bit of processing onto those machines. Eventually, that might
take the form of something like this:
+----- 3 file servers
|
+----- db server
|
+----- dns, mail hub
|
+----- news server
|
+----- 8 login servers
|
+----- 2 routers, & connections to outside world
|
+----- 4 terminal servers, and local terminals
This would be a far larger system than either m-net or grex today. But
there is a limit to how far you can expand a single system, and past
that point, you really need more than one machine. There *is* a certain
penalty to distributing services, so it makes sense to pick a solution
that scales well, and to have enough extra machines to make up for the
penalties of distributing service. This is kind of a miniature version
of the sort of computing environment that exists today at CMU, or UofM,
and so, in a sense, I look at this as the natural direction of things.
Today, it's clearly much too expensive, but computing hardware is only
going to get cheaper, and grex & m-net both show every sign of growing.
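
The layout sketched above can also be written down as data, which makes it easy to count machines. This is only an illustration: the role names and counts come from the diagram, while the dictionary and the helper functions are hypothetical.

```python
# Hypothetical encoding of the proposed multi-machine layout.
# Role names and counts are taken from the diagram; the structure and
# helper names are assumptions made for illustration only.
PROPOSED_LAYOUT = {
    "file server": 3,
    "db server": 1,
    "dns, mail hub": 1,
    "news server": 1,
    "login server": 8,
    "router": 2,
    "terminal server": 4,
}


def total_machines(layout):
    """Return the number of machines across all roles."""
    return sum(layout.values())


def share_of_role(layout, role):
    """Return the fraction of all machines assigned to one role."""
    return layout[role] / total_machines(layout)


if __name__ == "__main__":
    print(total_machines(PROPOSED_LAYOUT))
    print(share_of_role(PROPOSED_LAYOUT, "login server"))
```

By this count the proposal comes to 20 machines, 8 of them login servers, which gives a feel for why it is "much too expensive" today.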
|
gregc
|
|
response 35 of 137:
|
Jul 9 08:23 UTC 1995 |
Interesting news Marcus: In this week's EE Times there is an article about
a fellow in Europe that essentially "reverse engineered" the information
for the Pentium's "Appendix H". He designed programs and test cases and
watched the registers to deduce what was going on. Most of the people in
the know who saw the document agree that it's pretty close to accurate.
Apparently Intel built performance monitoring registers into the Pentium.
Using those it's possible to dynamically monitor cache hits/misses, pipeline
flushes, real instructions per second, etc. Internal timing info. The document
is available on the Internet. Apparently Intel has announced that they are
considering making Appendix H available to the public now in a month or 2.
|
aaron
|
|
response 36 of 137:
|
Jul 9 08:54 UTC 1995 |
re #34: I wouldn't have expected that M-Net would have experienced any
problems with "holes" in the Pentium chip.
|
mdw
|
|
response 37 of 137:
|
Jul 9 10:56 UTC 1995 |
Well, yes, good point - considering they're running a 486. :-)
Ah yes, performance monitoring - dangerous stuff - who knows what might
happen if it fell into the wrong hands? It's good to know that there
wasn't any really scary protection stuff there. However, it does make a
nice illustration that somebody, somewhere, probably *is* crazy enough
to figure out all the nooks and crannies of the hardware.
|
jep
|
|
response 38 of 137:
|
Jul 10 01:52 UTC 1995 |
If you remember the I/O bug in enough detail, Marcus, I'd appreciate
it very much if you'd e-mail information about it to me, or better yet,
to jfk@arbornet.org.
I appreciate your insights, and also your concerns. I'm not aware of
the details of Grex's security problems, the causes or the solutions. If
having better access to the hardware will help a lot in preventing those
problems, that's great, of course. I agree that it's a good thing that
Grex and M-Net are not subject to all of the same security problems -- the
question for Grex has to be, is it a good enough thing to make it
worthwhile for Grex to remain on the hardware that it's on? Are those
security advantages enough that it's worth not being able to use your
modems at their maximum speed, or to have files disappearing off Grex's
hard drive all of the time because the hardware cannot support the hard
drives that Grex is using?
I'm not a security expert, not on your level in any case, but I'm a
cheeky enough guy that I can still question how the staff of Grex has
evaluated its security concerns, as a knowledgeable user, anyway. I
would think security should be measured in part by its effect on
functionality. Is Grex so much more secure than it would be on an Intel
based machine that it's worth the heavy cost in performance and
reliability?
By the way, I don't agree 100% that M-Net is limited greatly by its
slow Internet connection. For one thing, we're getting an ISDN connection
soon, we expect. But for another, our emphasis is on local service, not
global service. We've been offered direct access to CICnet's T3. The
prevailing opinion is that this would be bad for M-Net because it would
change the way people would use M-Net very drastically. I don't know the
end of this story yet. I don't even know what I want to do yet. My point
is only that a really fast Internet connection isn't an unmitigated
blessing.
|
mju
|
|
response 39 of 137:
|
Jul 10 02:46 UTC 1995 |
There is more involved in moving to an Intel machine than just making
the decision to do it. Such a move would cost around $2000,
I think, judging from the cost figures we came up with the last time
we looked at the Intel platform. In addition to buying a 486 or
Pentium motherboard, we would also have to replace all of our RAM.
Getting only 32MB of RAM for the new system would cost $1200;
to go up to the 128MB of RAM we have on Grex now would cost $4800,
or about ten times as much as a new motherboard would cost. We could
still use our existing disks, tape drive, and modems, of course,
so we wouldn't be completely starting over. But we would have to get
a multiport serial board, something that would probably cost
$400-$600. The Sun4 architecture is also much more scalable than
the Intel architecture -- after we move to the Sun-4/360, if we wanted
to replace Grex with a SPARCstation 20, we could do so simply by
buying the hardware and rebuilding the kernel. All of our binaries
would continue working just like they did on the 4/360.
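
A rough sketch of the arithmetic behind those figures, for anyone who wants to plug in other numbers. The dollar amounts are the ones quoted above; the per-megabyte rate is derived from them rather than quoted, and the motherboard is left out because no price for it is given.

```python
# Rough tally of the Intel-move estimate. RAM and serial board prices are
# the figures quoted in the post; the per-MB rate is derived from them.
# The motherboard is deliberately excluded (no price is stated).
RAM_COST_32MB = 1200             # dollars
SERIAL_BOARD_RANGE = (400, 600)  # dollars, low and high estimate


def cost_per_mb(total_dollars, megabytes):
    """Derive a per-megabyte price from a quoted total."""
    return total_dollars / megabytes


def ram_cost(megabytes, rate_per_mb):
    """Price a given amount of RAM at a per-megabyte rate."""
    return megabytes * rate_per_mb


def upgrade_range(ram_dollars, serial_range=SERIAL_BOARD_RANGE):
    """Return (low, high) cost of RAM plus serial board."""
    low, high = serial_range
    return (ram_dollars + low, ram_dollars + high)


if __name__ == "__main__":
    rate = cost_per_mb(RAM_COST_32MB, 32)
    print(rate)
    print(ram_cost(128, rate))
    print(upgrade_range(RAM_COST_32MB))
```

At the implied $37.50 per megabyte, 128MB does come to $4800, so the two RAM figures are consistent with each other.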
|
popcorn
|
|
response 40 of 137:
|
Jul 10 12:49 UTC 1995 |
I've heard that BSDi makes better use of memory than SunOS, so we could
probably get by on something closer to your 32 megabyte estimate, rather
than the full 128 megs.
|
gregc
|
|
response 41 of 137:
|
Jul 10 15:08 UTC 1995 |
Hardly. If anything, SunOS has one of the better memory management schemes
around. But even if not, there is absolutely *nothing* BSDI could do to
make 32meg of memory perform as well as 128meg.
|
marcvh
|
|
response 42 of 137:
|
Jul 10 17:38 UTC 1995 |
I find it unlikely that static linking would produce efficient use of
memory, though I'm told the CISC/RISC distinction helps some.
|
ajax
|
|
response 43 of 137:
|
Jul 10 19:49 UTC 1995 |
Regarding prices in #39, you can get 30-pin 70ns parity SIMMs for
$25/meg or less, so 32 megs would be more like $800 (M-Net seems to
do okay with 48MB with a few more users than Grex), and used 16-port
Digiboards for the PC go for more like $350 (you're probably on
target for new Digiboards though).
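
To see what the $25/meg figure means at different sizes, here is a small sketch. The rate and the used Digiboard price are the ones given above; the list of memory sizes is just an assumption picked from amounts mentioned elsewhere in this item.

```python
# Price RAM at the $25/MB street price quoted above, then add the used
# 16-port Digiboard. The memory sizes tried are illustrative choices.
SIMM_RATE_PER_MB = 25      # dollars per megabyte, "or less"
USED_DIGIBOARD = 350       # dollars


def simm_cost(megabytes, rate=SIMM_RATE_PER_MB):
    """Return the RAM cost in dollars at a flat per-megabyte rate."""
    return megabytes * rate


def ram_plus_serial(megabytes):
    """Return RAM cost plus one used serial board."""
    return simm_cost(megabytes) + USED_DIGIBOARD


if __name__ == "__main__":
    for size_mb in (32, 48, 128):
        print(size_mb, simm_cost(size_mb), ram_plus_serial(size_mb))
```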
|
chelsea
|
|
response 44 of 137:
|
Jul 10 22:12 UTC 1995 |
M-net made a very good decision when they moved to a new 486 a few
years ago. They probably didn't even anticipate back then that
within a few years they'd be down to a very precious few techies
capable of futzing with an older, non-standard setup. Yet that's
where they are. Lucky ducks.
I would think Grex could learn something from their experience.
That maybe right along with the money and reliability we must
assume that our volunteer Sun staff might not always be so
generous with their time. Would it be easier to find qualified
staff on any other platform? Would buying new save us from needing
as much technical expertise?
In my humble opinion, our weakest link at this point is not
community support, supply vs. demand, or even available money.
It's the very small group of folks who essentially keep the old
girl going, in their spare time, as best they can. Grex needs
to move beyond what these few can do.
|
mdw
|
|
response 45 of 137:
|
Jul 11 06:27 UTC 1995 |
BSDI is the commercial form of 386bsd; the non-commercial forms are
freebsd & netbsd. From what I've seen of the memory management in
successive versions of freebsd & netbsd, I'd have to say it's definitely
not of the same quality as that in SunOS, and the usual rule of thumb
I've seen in both is "get enough memory to avoid paging". I'd guess the
BSDI folks have had to, and have put a lot of work into making the
paging work properly. I very much doubt however, that it is
significantly better than that of SunOS, which has, after all, had a lot
more flight time.
Freebsd has supported shared libraries for at least a year now. I
believe the present versions of netbsd & bsdi do as well.
I definitely would not expect 32 megs to do for more than a very small
number of users. There are too many modern applications, such as pine,
most of the gnu utility set, and so forth, that think memory is "free".
It is definitely the case that the majority of grex users have for some
time come in via the internet. If that's not true on m-net, then that
indicates a pretty fundamental difference in the users.
|
gregc
|
|
response 46 of 137:
|
Jul 11 13:40 UTC 1995 |
Just a minor correction Marcus. BSDI was originally called BSD/386. It
began life from the BSD4.3reno source code. At the same time, the Jolitzes
also began work on 386Bsd from the same sources. NetBSD and FreeBSD are
descendants of 386BSD, but are very different from BSD/386. They are
the product of 2 completely different development groups.
NetBSD currently supports Sun style Dynamic linking. BSDI announced that
it would support shared libraries in BSDI 2.0(the 4.4lite version), but when
it arrived, it turned out that they only implemented *static* linked shared
libraries. A technique that I find a PITA to use.
|
ajax
|
|
response 47 of 137:
|
Jul 11 14:08 UTC 1995 |
I don't know about the theoretical issues, but in real life,
M-Net supported quite a few users (40+?) on 32MB before things
bogged down, and it now handles 60ish users on 48MB well, while
Grex bogs down at 60ish users with 64MB (yes, *LOTS* of factors
in play here; my point is 32MB or 48MB go pretty far with BSDI).
Security and reliability issues also seem more theoretical than
real life when you compare M-Net's Intel/BSDI to Grex's Sun.
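
The comparison above boils down to memory per user at the point where each system is working well or bogging down. A quick sketch using only the user counts and RAM sizes given in this response (rough figures, and as noted, lots of other factors are in play):

```python
# Memory available per logged-in user, using the approximate figures
# from the response above. Labels are descriptive only.
OBSERVATIONS = [
    ("M-Net, 32MB, before bogging down", 32, 40),
    ("M-Net, 48MB, handling load well", 48, 60),
    ("Grex, 64MB, bogging down", 64, 60),
]


def mb_per_user(ram_mb, users):
    """Return megabytes of RAM per logged-in user."""
    return ram_mb / users


if __name__ == "__main__":
    for label, ram_mb, users in OBSERVATIONS:
        print(label, round(mb_per_user(ram_mb, users), 2))
```

On these numbers M-Net runs at about 0.8MB per user while Grex bogs down with slightly over 1MB per user, which is the gap the argument rests on.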
|
mdw
|
|
response 48 of 137:
|
Jul 13 10:30 UTC 1995 |
I'm not sure when BSDI & Jolitz split off; but it certainly wasn't
4.3bsd. It may have been the "networking 2" tape; or it may have been
at some other point. If the two did split off that early; they
certainly borrowed pretty freely. Jolitz is identified by name in the
BSDI 1.x kernel source; and the vm subsystem in both bsdi & bsdj is not
the 4.3bsd vm code at all, but one derived from mach, and adapted for
use in bsd by folks at the University of Utah. Clearly, there are
differences - and especially in the device drivers you can tell that
bsdi branched off well before bsdj, and netbsd & freebsd in turn are
rapidly diverging from their common ancestor bsdj.
The most important difference between freebsd/bsdi/bsdj/netbsd/whatever
is not the actual code, or its history, but the price, support, &
development goals of the software. Netbsd has been ported to many
platforms, but it was not intended for heavy production use and isn't
particularly robust. The freebsd people have concentrated on the 386
platform, and their product is easier to install and more robust. Both
products are free, and you get no support other than what you can do
yourself. Bsdi costs money, but you get far better support. The bsdi
folks are willing to take responsibility to make their product work for
the customer, not something either the freebsd or netbsd folks are
prepared to do. M-net had money, users, and a burning need to replace
their suddenly defunct hardware, but a comparatively small technical
base. Bsdi makes perfect sense for them. Grex started off with quite a
few technically savvy people, hardware loans and donations, but little
money. I believe SunOS has done us well.
Rob mentioned "security/reliability"; they seem theoretical until you
have to actually deal with them. I've seen some fiendishly clever
crackers; I've spent plenty of time making things more reliable. One of
the things I've learned is, if you're having problems, you analyze the
problem, and you attack the root causes. If you don't attack the root
causes, trying to make massive changes in things has every chance of not
fixing the root cause, and is very likely to introduce new and even more
serious problems. In this case, the root cause is not sunos; if you
look around the country, you will find numerous universities, colleges,
and freenets that have found sunos is perfectly capable of delivering
massive amounts of computing power in a thrifty and reliable manner.
With grex, you're seeing trailing edge technology. Here's the leading
edge technology:
a SPARCstation 20
solaris 2.4
dual processors.
64 M ram.
I'm not sure of the cost -- $10k? perhaps 20k? This is a machine that
will readily deliver high quality computing (ethernet speeds) to 50-100
users. Or, if we downscale our expectation to grex speeds, probably
200-500 users. Solaris has some significant differences from sunos.
The important thing, though, is that it supports SMP, which means the
dual processors of this configuration will cream a pentium in terms of
throughput. Solaris has been less reliable than it ought to be. With
2.4, however, it seems they're finally getting things right. However,
that's one of the prices you pay for using leading edge technology, you
get to debug it. If you want to be less aggressive: back down to one
processor, & sunos. It's still a highly competitive machine, and it's
using extremely well debugged software. This is a well tested, well
respected configuration, and really, basically the standard in the
industry. Speaking of "well" tested- actually, the well has been using
solaris for some time now with no problems. This is the state of the
art; and it's really not that far out of our reach financially.
|
janc
|
|
response 49 of 137:
|
Jul 14 22:02 UTC 1995 |
If you were a fiendishly clever cracker, why ever would you be trying to
crack M-Net or Grex? We don't store any big secrets here. In any case,
fairly dopey crackers manage to bust into M-Net and Grex regularly enough,
so super-geniuses taking advantage of subtle architecture flaws is probably
not a dominant concern.
|