janc
Next Grex
Jun 6 15:30 UTC 2001
I'm going to link this item between garage and coop because, although I think
it is basically a policy issue, the discussion is going to be deeply
technical.
I'm thinking it may be time to start a project to build the next Grex system.
I'm thinking it should have the following key characteristics:
(1) New Operating System.
Grex has run on SunOS 4.1 since its inception. It has been
remarkable for its stability and we have a lot of work invested
in writing code for it. Moving to a new operating system is going
to take a lot of porting. But SunOS limits us in many ways:
- Can't handle disk partitions bigger than 2 Gig.
- Can only have 64,000 user accounts.
- Poor support for multiple processors.
- The disk quota system doesn't perform well.
- Source code not available.
- Doesn't support many machines more modern than the 4/670.
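The first two limits look like fixed-width integer arithmetic. As a sketch, assuming (not confirmed anywhere in this item) that SunOS 4.1 keeps file offsets in a signed 32-bit integer and user IDs in an unsigned 16-bit integer, the ceilings work out like this:

```python
# Sketch only: assumes SunOS 4.1 stores file offsets as signed 32-bit
# integers and user IDs as unsigned 16-bit integers. Both widths are
# assumptions for illustration, not verified against SunOS headers.

OFFSET_BITS = 32  # assumed width of a signed file offset
UID_BITS = 16     # assumed width of a user ID


def max_signed(bits: int) -> int:
    """Largest value a signed two's-complement integer of `bits` bits holds."""
    return 2 ** (bits - 1) - 1


def distinct_unsigned(bits: int) -> int:
    """Number of distinct values an unsigned integer of `bits` bits holds."""
    return 2 ** bits


max_offset = max_signed(OFFSET_BITS)     # highest addressable byte offset
uid_count = distinct_unsigned(UID_BITS)  # every possible user ID

print(f"largest byte offset: {max_offset:,} (~{max_offset / 2**30:.2f} GiB)")
print(f"distinct user IDs:   {uid_count:,}")
```

This prints 2,147,483,647 bytes (about 2 GiB) and 65,536 user IDs. 65,536 is 64K, which is presumably where the "64,000 accounts" figure comes from.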
(2) New Location.
I think we should be looking toward moving Grex into a colo situation
and getting rid of the pumpkin. There are a lot of issues we'd need
to resolve to do this, so I don't think it is a viable short range
solution for our current net connectivity worries, but I do think
it is the right way to go in the longer range. The next grex should
be designed and configured to run in a colo environment (though it
may end up still living in the pumpkin for a while).
(3) New Computer.
I actually think the software limitations of SunOS are bigger than
the hardware limitations of the 4/670. However, newer operating systems
are unlikely to support the 4/670, and the 4/670 is physically poorly
suited for moving to a colo location (yes, we could get it into a rack
mount, but it would be a lot of work and the result would still be a
great hairy beast).
The leading contenders for the nextGrex operating system are:
Solaris: This is Sun's successor to SunOS. It differs a lot. If we
stay on Sun hardware, this is probably the least painful option.
It had a stinky reputation when it was new (which was why Grex
started out on SunOS instead of Solaris), but that has changed
and I think it is very dependable these days. The biggest problem
is that it is not open source. Kernel patches could be harder to
do than for SunOS (which was also not open source, but which was
so close to BSD Unix, which is publicly available, that Marcus
could make kernel patches work anyway).
OpenBSD: Of the open source Unixes, OpenBSD has by far the best reputation
for security. There is a release for 32-bit sparcs, but it seems
like it might be substantially less stable than the x86 release.
There is no release for the 64-bit sparcs (UltraSparc). Does not
support multiprocessor intel systems. Dunno about Suns. Generally
the range of hardware supported is not spectacular.
NetBSD: Lacks the strong reputation for security that OpenBSD has, but has
support for a broader range of hardware, including Sparc and
UltraSparc. Support of multiprocessors unclear.
Linux: Wider range of hardware, lower reputation for security. Does
support multiprocessor Sparc and UltraSparc systems.
Part of the issue with the open systems is the sheer number of people using them
and working on them (these mostly correlate in open source). More usage means
faster development and support for wider ranges of hardware. It also means
more security holes being created, reported, and fixed. The sparc versions
are indicative of this. UltraLinux can be bought from Redhat or Mandrake or
several other vendors in a shrinkwrapped box. OpenBSD/sparc is more of an
experimenter's toy.
I think the viable options are:
Sun hardware / Solaris
Only if Marcus can handle kernel mods needed. This is the most stable
operating system to use on sun hardware.
Sun hardware / Linux
My impression is that the other open source distributions' Sparc
versions have such a small user base that if we encountered problems
we'd find inadequate support. We'd have to do a very careful job
of securing the system, and would have to keep very much on top of
all new bug announcements. I think only UltraLinux has a development
community large enough to have a decent chance of attaining the high
level of stability that we need. It may be that using a Sparc Linux
instead of an Intel Linux would be unusual enough to confuse some of
the more stupid hackers.
Pentium hardware / OpenBSD
If we are going to build a pentium system, might as well build one that
OpenBSD will work on.
I'd mostly prefer Sun hardware for its general high quality, but Pentium
boxes have a performance advantage.
To fit in colo space, I think we want to keep a fairly compact box, as much
as possible all in one box. Get rid of all the full-height 2Gig disks and
go with a smaller number of larger disks.
One thing I wonder about with colo - do we have to go there to do backups?
If we go to colo, we get rid of almost everything Cyberspace Communications
currently owns. Very little of the stuff currently in the pumpkin would be
relevant to the new machine. It should almost all be sold, given away, or
trashed.
250 responses total.
keesan
response 1 of 250:
Jun 6 16:26 UTC 2001
Does it make sense to choose a system that requires a Marcus to operate?
jp2
response 2 of 250:
Jun 6 17:06 UTC 2001
This response has been erased.
jp2
response 3 of 250:
Jun 6 17:09 UTC 2001
This response has been erased.
devnull
response 4 of 250:
Jun 6 19:16 UTC 2001
Saying that openbsd has a better security reputation than netbsd is also
probably unfair. There have been cases of security bugs getting fixed
in netbsd before they get fixed in openbsd, I believe. It is accurate
to say that the openbsd folks run around saying that they're more secure,
but that's about as much as I'm convinced is actually true.
NetBSD has started running multiprocessor on several architectures within
the last couple months. It seems highly likely to me that NetBSD will be
running multiprocessor on whatever hardware grex would care to use by the
time grex actually gets around to changing hardware/operating systems.
Having a system that could run without Marcus is certainly desirable, but
is there anyone who isn't Marcus who's offering to do the needed kernel
patches for any operating system?
I'm also really unclear on why there's an assumption that hardware and OS
need to be chosen together. NetBSD certainly does an excellent job of
making all hardware look the same, other than performance differences; I'm
not sure about the others.
I run netbsd on both sparc and x86, and it works quite well for me. I haven't
used it on any sparcs newer than the sparcstation 5, though.
devnull
response 5 of 250:
Jun 6 19:27 UTC 2001
NetBSD appears to support the 4/600, which I assume is somewhat different
than the 4/670, but I see no mention of the 4/670, either in the list of
what NetBSD does support, or in the list of what it doesn't support.
Another question in my mind is whether grex should just buy a modern SCSI
disk at some point. On the most recent hard drive I installed in
fencepost.gnu.org (a 36GB drive), I read 100 megabytes off the front of
the drive as an initial test, and it finished in a bit under four seconds
(in single user mode). It took about 25 minutes or so to read every block off
the disk to verify it, before I started copying data to it.
I know that Marcus believes that having multiple disks is good for performance,
but I suspect that a single modern disk will give you better performance
than a half dozen 10 year old disks.
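The two timings above can be turned into rough sequential read rates. A quick check, taking "a bit under four seconds" as 4 s, "about 25 minutes" as 25 min, and 1 GB as 1000 MB (all approximations of the quoted figures, not new measurements):

```python
def throughput_mb_per_s(megabytes: float, seconds: float) -> float:
    """Average sequential read rate in MB/s."""
    return megabytes / seconds


# Figures quoted above, rounded as described in the lead-in.
front_read = throughput_mb_per_s(100, 4)             # first 100 MB in ~4 s
full_read = throughput_mb_per_s(36 * 1000, 25 * 60)  # whole 36 GB in ~25 min

print(f"first 100 MB: {front_read:.1f} MB/s")  # 25.0 MB/s
print(f"whole 36 GB:  {full_read:.1f} MB/s")   # 24.0 MB/s
```

The two estimates agree to within a few percent, so the quick 100 MB test looks like a fair indication of the drive's sustained sequential rate.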
other
response 6 of 250:
Jun 6 19:56 UTC 2001
So long as backups are frequent and regular, we shouldn't have too much
problem with HD consolidation.
blaise
response 7 of 250:
Jun 6 21:51 UTC 2001
What is there about FreeBSD that makes it less worthy of consideration than
either NetBSD or OpenBSD? It seems to be more widely available/used.
gull
response 8 of 250:
Jun 6 23:13 UTC 2001
I wondered that, too. It doesn't run on Sun hardware, but I think it has
multiprocessor support on Intel hardware. I don't think it's as mature or
complete as the Linux multiprocessor support, however.
The main reason OpenBSD is considered more secure, I think, is that they're
the only *BSD OS to have actually completed a full security audit. However,
NetBSD generally benefits rapidly from OpenBSD's efforts; the code is enough
alike that bugfixes for one are quickly migrated to the other.
One has to be careful, when considering NetBSD, not to lump all the ports
together. Some of them are very good, but it runs on so many systems that
there are some pretty lousy implementations out there, too.
kaplan
response 9 of 250:
Jun 7 00:06 UTC 2001
Sun's web site doesn't exactly say that Solaris 8 is open source, but
they do say you can get the source from them. They don't come out and
say if it's OK for a non-profit organization to modify the kernel. Can
anyone else make more sense of their web site than I can?
http://www.sun.com/software/solaris/source/faq.html
mdw
response 10 of 250:
Jun 7 01:31 UTC 2001
I think the 4/600 is merely a way to say the "4/6xx" series including
the 670 and 690. The 670 & 690 look the same to the software and it's
the same CPU card (which is really where *all* the smarts live); it's
only the physical box that's different. I'm not sure how much this
matters; the 6xx is *old* now, and one of the limiting factors with 6xx
is that to get equal performance to what we have, we need to support MP.
With a modern fast uniprocessor, this goes away.
I think it's wrong to go to "colo" at this point, and I think it's very
wrong to design a hardware decision around it. Perhaps that discussion
should go in a separate item.
One thing missing from this is any discussion of a distributed
environment. A distributed environment would give us the ability to use
lower cost hardware, and more redundancy and expandability. Towards
that end, we should be investigating kerberos (core technology for
distributed authentication), and I believe AFS (distributed file
system). Properly used, many parts of a distributed environment need
not be as secure and robust as a monolithic uniprocessor machine. Linux
on a cheap pentium might make a dandy AFS fileserver, for instance -
most of the things that make it less secure shouldn't be turned on for
such a machine.
For hardware, some important considerations are: cost, performance,
reliability, and expansion. Low end PC hardware looks cheap. There's a
reason. By the time you price out reasonable memory, reasonable box,
SCSI, etc., much of the price advantage goes away. With memory, the 2
big issues are: enough? And ECC or at least parity. For disk, IDE
really only works for a 1-2 spindle system, with modest performance
needs (ie, one user). I'm sure I don't need to reiterate my concerns re
Intel. I wish I could point to a reasonable bare-bones PPC motherboard;
there are indeed not as many choices as I would like.
Historically, what we've relied on for the bulk of our hardware
acquisitions is strategic donations. The advantage is of course
incredible bargains. Not many places could use Sun-3/260's in weird
boxes, but we were certainly appreciative. This makes "planning" a bit
more of a chancy event.
I definitely don't think something should be designed with a dependency
on ME. We don't have that with SunOS today; it was actually Steve Weiss
that coded the TCP block code. Solaris, Linux, and the various *bsd's
are all available in source, so in a sense I should become less
critical.
I don't know that we want to get that much involved in a security
dispute of openbsd vs. netbsd vs. freebsd. Historically, netbsd came
first and was supposed to be more of a "research" effort; freebsd came
next and was intended to be more stable on x86 hardware; openbsd came
last and was intended to correct perceived security shortcomings in the
first two. At the moment, openbsd *sounds* like it best matches our
needs (ie, "security"), and while grex staff has historically
experimented with all of these, openbsd has worked the best for us.
We'd certainly be interested in evidence that either of the other two, or some
other solution, might better meet our needs.
One fact bears mentioning: in a sense, we have *more* rigorous security
needs than I think any of these groups recognize. We fully expect to
have hostile users with full shell access that we can't kick off the
system. Nearly all computer environments elsewhere have trusted *users*
and untrusted *outsiders* and can make it a legitimate goal to deny
shell access to hostile users. We also have some serious scaling
issues; 20000+ active users, with 8.5 new accounts per hour (plus 4.3
pwchanges/day). This means we have a rather large and constantly
changing password file. This isn't typical of even most universities,
which tend to have various batch processes (and not self-serve web
tools) to handle account administration. What this all means is we
shouldn't expect *any* solution to work off the shelf without changes
for us. We should expect changes, and we should expect some of those
changes will be hard and challenging, and we shouldn't expect any real
outside help (except source availability) to solve those problems.
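To put the account-creation rate in perspective, here is the arithmetic. It assumes the 8.5 accounts/hour average holds around the clock all year, that no accounts are ever removed, and that each signup and each password change means one update to the password file; all three are simplifying assumptions for illustration. The 64,000 ceiling is the SunOS figure from the original post.

```python
# Rough projection from the averages quoted above (see assumptions in
# the lead-in): constant signup rate, no accounts removed.

NEW_ACCOUNTS_PER_HOUR = 8.5
PW_CHANGES_PER_DAY = 4.3
ACCOUNT_CEILING = 64_000  # SunOS limit cited in the original post

new_per_day = NEW_ACCOUNTS_PER_HOUR * 24
new_per_year = new_per_day * 365

# Assumption: each signup and each password change is one
# password-file update.
pwfile_updates_per_day = new_per_day + PW_CHANGES_PER_DAY

# Days until the ceiling would be hit if nothing were ever reaped.
days_to_ceiling = ACCOUNT_CEILING / new_per_day

print(f"new accounts per day:  {new_per_day:.0f}")
print(f"new accounts per year: {new_per_year:,.0f}")
print(f"password-file updates per day: {pwfile_updates_per_day:.1f}")
print(f"days to reach {ACCOUNT_CEILING:,} with no reaping: {days_to_ceiling:.0f}")
```

That comes to 204 signups a day and about 74,460 a year. At that pace a password file that never dropped anyone would pass the 64,000-account ceiling in roughly 314 days.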
janc
response 11 of 250:
Jun 7 03:37 UTC 2001
My assessments of the quality of various unix versions is not based on
a huge amount of experience. I'm quite likely wrong on all of them.
FreeBSD isn't on the list because I forgot about it. I'd forgotten that
Solaris source is available, though I had some discussions some years
back with a Sun rep who thought donating a copy of source to Grex might
be a possibility.
However, though I admit that I'm no expert, you'll need to do more to convince
me that OpenBSD/sparc is stable enough for us than tell me that it is Theo's
main server. It would help if Theo had anything like the number of users we
have, including a similar number of users doing really nasty things. A lot of
operating systems that work fine for light usage would flounder under Grex's
load.
There are some things we are entirely dependent on Marcus for. A Picospan
version that runs on the new system would definitely require Marcus. Kernel
blocks probably wouldn't, if we had a source license. The queuing telnet
daemon would be challenging for anyone but Marcus to port, I think.
I think the colo issue does tie into this. I think there are only two
compelling reasons for replacing our current hardware (1) wanting to move
to an OS that doesn't work on the 4/670, and (2) wanting to colo. And
the first reason isn't all that compelling, because most of the operating
systems mentioned here would work on the 4/670. There would obviously be
performance advantages to moving to newer hardware, but right now I don't
see any signs that Grex has an immediate need for better performance.
We pay about $120 a month in rent and electricity. Freeing up that money
might buy us a colo space with better connectivity. Doing a deal like the
one M-Net has for dialup lines has distinct advantages - the dialups are
local to a larger area. While perhaps discussing colo in a separate item
would be better, I do think that it relates strongly to the hardware choices
we make.
devnull
response 12 of 250:
Jun 7 04:49 UTC 2001
Re #10: Yes, it would be nice if you could get a decent non-apple
PowerPC motherboard cheap.
I've gotten the distinct impression that freebsd is not as stable as
netbsd. There was a period of at least several months within the last
few years when their -current branch ate its filesystem. Yes, it was
the ``development'' tree, but I'm not aware that netbsd has had
similar problems, and indeed netbsd was *very* conservative about testing
the softdeps code. (It took a while before they'd tested it enough
that they even pulled it up into the -current branch.) I gather that
there are different attitudes in the two groups wrt stability of the
-current branch; while NetBSD doesn't promise that -current is
reliable, the developers are also generally supposed to not be
checking in destabilizing changes on the -current branch.
I'm unconvinced that openbsd exists to correct shortcomings in freebsd
and netbsd. I thought the real reason it exists is to deal with
certain personality conflicts.
I also seem to recall hearing about security bugs that have been
discovered that are present in versions of openbsd that have allegedly
gone through a full security audit. And again, the netbsd and openbsd
groups tend to share bugfixes to some extent anyway.
(Of course, I know a number of the netbsd developers, which certainly
affects my perception of things.)
I'm a bit unclear on why a distributed environment is perceived as a
huge win. A distributed environment is very clearly a bigger
maintenance hassle in my experience. The FSF used to have a group of
machines that shared filesystems via NFS; that's now all collapsed
into fencepost.gnu.org, and we just don't bother with NFS. The
machine runs an ftp server for alpha.gnu.org, it handles all the mail
for gnu.org (more or less; we have a secondary MX that handles some of
the simple forwarding cases when fencepost is down, but if fencepost
is up, it handles all the mail); it handles mailing list archives; it
handles running emacs for people who want to read their mail by
logging into an FSF machine and running emacs; it probably does a
bunch of other random stuff I'm forgetting; and the machine basically
sits around and twiddles its thumbs every day. (Load average is about
.09 right now.) Someone installed a non-SMP kernel, and I complained,
but I'm not really entirely convinced that there's much point in
bothering to run an SMP kernel on the machine.
On the other hand, I'm not convinced that we can use a single machine
for ftp.gnu.org and get rid of the annoying problem that we have to
give a substantial number of users a message that the server is busy
rather than letting them connect.
mdw
response 13 of 250:
Jun 7 04:54 UTC 2001
There are lots of reasons to move to newer hardware: support for larger
drives; smaller footprint; takes less electricity; generates less heat;
more reliability; capacity to support more users; having source to the
OS; etc. None of these are pressing reasons to get rid of the 670
tomorrow. But, just as we moved from the sun-2, to the sun-3, to the
sun-4, to the 670, eventually we should move on to something else. We
don't want to stay on the 670 so long that it becomes a museum piece or
a strait-jacket - as m-net did with their Altos.
How many of the people in favour of colo remember the hassle when grex
was in Ken's warehouse, and how many think that hassle was acceptable?
spooked
response 14 of 250:
Jun 7 06:55 UTC 2001
I'm happy to house a server in Aussieland - and, I reckon distributing
hardware slowly and with caution is a good idea - we live in an
increasingly distributed world, and today Grex's staff is not solely
residing in Ann Arbor.
i
response 15 of 250:
Jun 7 13:41 UTC 2001
I'd guess that 'most everybody who supports colo means colo-with-real-
24x7-access.
We use FreeBSD on several servers here at work. We don't install the
bleeding-edge beta versions; we cherry-pick versions that we see lots
of people saying favorable things about. I can't recall if we've ever
had a FreeBSD server go down for software reasons.
Unfriendly users with shell access seem to be the motivation for the
jail features of FreeBSD, but we've no experience with such situations.
Any chance that someone with serious installed bandwidth is close enough
to the pumpkin to run wire?
keesan
response 16 of 250:
Jun 7 14:03 UTC 2001
Don't forget $30/month insurance at the present location.
janc
response 17 of 250:
Jun 7 17:29 UTC 2001
I'm not wildly thrilled about Marcus's Kerberos/AFS distributed system scheme.
The two things that worry me are (1) the only staff person who knows much
of anything about Kerberos/AFS is Marcus. Sure, we could all learn, but
mostly there is a lot to be said for using software that lots of people know
reasonably well. And (2) AFS file permissions are weird. In fact, they don't
exist; there are only directory permissions. A lot of software designed to
work on normal Unix file systems will need some adaptation to work on AFS.
Picospan is one example (it encodes the frozen/retired status of an item in
its file permissions, so some substantial redesign would be needed). So I
think going to AFS would be a lot of work for staff, and an ongoing source
of confusion to users.
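To see why a permissions-based flag is fragile, here is a minimal sketch of that kind of encoding. It is hypothetical: it assumes an item counts as frozen when its file has no write bits set, which illustrates the idea and is not Picospan's actual scheme. If, as described above, AFS governs access through directory permissions rather than per-file ones, a flag stored this way no longer tracks who can actually write the item.

```python
import os
import stat
import tempfile

# Hypothetical encoding, for illustration only: an item file with no write
# bits set is treated as "frozen". This is not Picospan's documented scheme.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def is_frozen(path: str) -> bool:
    """True if no write bit is set in the file's mode."""
    return (os.stat(path).st_mode & WRITE_BITS) == 0


def freeze(path: str) -> None:
    """Clear every write bit, keeping the other permission bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~WRITE_BITS)


def thaw(path: str) -> None:
    """Give the owner write permission back."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IWUSR)


states = []
with tempfile.TemporaryDirectory() as workdir:
    item = os.path.join(workdir, "item42")  # hypothetical item file name
    with open(item, "w") as f:
        f.write("item text\n")
    os.chmod(item, 0o644)
    states.append(is_frozen(item))  # open item
    freeze(item)
    states.append(is_frozen(item))  # frozen item
    thaw(item)
    states.append(is_frozen(item))  # reopened item

print(states)  # [False, True, False]
```

The state lives entirely in the mode bits, so any filesystem that ignores or reinterprets those bits silently changes what "frozen" means.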
gelinas
response 18 of 250:
Jun 7 18:29 UTC 2001
One option is to use local disk for things like Picospan and distributed
disk for user directories.
mdw
response 19 of 250:
Jun 7 19:00 UTC 2001
PicoSpan would need to be replaced. Yup, AFS file permissions are
different. There are actually some neat features - for instance, much
better group support, and ACLs. There are some Unix file permissions
that just don't translate well to AFS - "execute-only" - how do you
enforce that in a network environment? The SUID bit is another thing
that just does not translate - and that's why PicoSpan would not make
sense.
I'm not the only one that knows AFS - Steve has experience with it as
well. Also, here in A^2, we have a lot of potential experience in that
umich and engine both run AFS cells - so it's not really as alien as all
that.
Here are some of the potential advantages of AFS: ability to have a
separate web server & "login" machine see the same file space. Ability
to have more than one "login" machine. Reasonably secure without need
to use firewalls or smoke & mirrors. Potential ability to access data
that is not stored locally. Open source with support for several
popular OS choices. Can scale up dramatically in size.
jp2
response 20 of 250:
Jun 7 19:38 UTC 2001
This response has been erased.
gull
response 21 of 250:
Jun 8 00:55 UTC 2001
My experiences with NetBSD/sparc were not all that good. It was harder to
install than OpenBSD and seemed buggier and not very mature. You'll meet
some people who relentlessly plug NetBSD for "serious" projects, but I've
noticed that almost all of them just happen to be NetBSD developers.
It also amazes me that NetBSD and OpenBSD have managed to go all these years
without developing a version of 'fdisk' that doesn't require a pocket
calculator, but that's a minor issue.
richard
response 22 of 250:
Jun 8 01:50 UTC 2001
If the idea is to co-locate, why not co-locate with mnet? Use this as
the opportunity to link the systems: build a new grex and a new mnet
at the same time, with different boards of directors but the same staff,
and figure out the best ways to complement each other. Or maybe mnet and
grex could remain separate, but the new "project" would be co-sponsored
by both in the interests of attracting users from both boards.
spooked
response 23 of 250:
Jun 8 01:55 UTC 2001
Richard, BAD BAD BAD idea!
other
response 24 of 250:
Jun 8 03:08 UTC 2001
It's not an idea. It is random blathering. Or should I say
blithering...?