Grex Oldcoop Conference

Item 11: Next Grex Hardware

Entered by janc on Mon Feb 17 17:16:40 2003:

At Saturday's meeting, STeve Andre proposed that Grex purchase hardware for
the next Grex system now, and that the remaining development work be done on
that system.  Most people seemed to be willing to buy that idea, so there was
quite a bit of discussion of what hardware to get.  I want to move that
discussion on-line.

First, universal agreement was reached on using an x86 system, not a SPARC.
A number of people strongly prefer an AMD Athlon over an Intel Pentium, and
nobody really objects to this, so we are likely going that way.

There is a lot of concern over quality.  I believe that in recent years the
PC marketplace has shifted from competition based on performance, to
competition based on price.  It used to be that new desktop machines held
price steady at a bit over $1000, while the performance steadily improved.
But lately the prices have been falling (while performance has still steadily
improved).  This has placed substantial pressure on all manufacturers to cut
cost where they can - power supplies and cases have been getting crappier,
mechanical components of drives have gotten less reliable, and so forth.

The feeling was that this trend had impacted a lot of companies that used to
produce good stuff.  Dell's servers, for example, aren't as solid as they used
to be (though they are more powerful).  The best approach to acquiring a good
new computer was to carefully buy separate components and integrate it
ourselves.  STeve Andre is likely to take the lead on this, though there are
other staff members with plentiful experience building systems (Dan
Gryniewicz, for one).

STeve brought to the meeting a draft suggestion for a system.  He is still
working on refining this.  His suggestion was:

   Athlon XP 2800 (I think this is 2.2 GHz) - about $400

   Motherboard - STeve wants to buy two, keeping one as spare.  I don't think
   a particular model was discussed.  About $145 each.

   RAM - buy lots.  It's cheap.  Say 1.5G for $270 or so.

   Case/Power Supply.  STeve likes Antec.  About $250.

   Misc parts, fans, etc.  STeve wants lots of cooling.  About $100.

   NIC - STeve likes Intel.  100 mbit.  $33

   SCSI controller.  Ultra 160 at least, ultra 320 if possible.  About $200.

   SCSI drives, two 18G ibm.  About $142.

   CD rom, floppy, this and that maybe $250.

Adding up to around $2000.  STeve also included in his list a monitor and
keyboard, but Dan says he can probably donate these.  He also suggested an
80G IDE drive for about $100.  This has lower performance and reliability
than the SCSI drives, but is fine for stashing non-critical or rarely used
data.  With this, and various additional slough factors, we were mostly
talking about something in the $2500 range.
547 responses total.

#1 of 547 by scott on Mon Feb 17 17:50:55 2003:

The spare motherboard is so that we can have two *identical* motherboards -
often the "same" motherboard a month later will have some minor revisions
which can cause problems with existing software configurations - I've noticed
this as well as STeve.


#2 of 547 by janc on Mon Feb 17 20:00:42 2003:

You know, if we ordered two motherboards from the same vendor at the same
time, I wouldn't be too amazed if we received two that were *not* the same.
It's probably worth specifying when we order them that we want identical
twins.


#3 of 547 by dang on Tue Feb 18 19:16:47 2003:

Incidentally, I'm not sure which Antec in particular it was that Steve wanted,
but you can get an Antec (SX1040BII) case with 400 watt power supply at CompUSA
for $120. I have this case, and it's a wonderful case. It's a full tower, and
easily fits my dual-athlon setup in it. It has good cooling (4 80mm case fan
slots, comes with two fans, I have three), is solid, and the power-supply has
been like a rock. I can understand if we're not sure 400 watts is enough.

Monitor and keyboard are not an issue. I have several of each I can donate.

As to motherboards, we might want to consider 64-bit/66Mhz PCI, as that will
give us much better performance out of our SCSI, especially if we get Ultra 320.


#4 of 547 by janc on Sun Feb 23 16:30:55 2003:

Thanks Dan.


#5 of 547 by cmcgee on Wed Feb 26 00:51:23 2003:

Now linked to Coop as Item 176; Garage 147


#6 of 547 by aruba on Wed Feb 26 04:58:51 2003:

I suspect the board will start the process of buying the hardware for the
new Grex at the meeting on Thursday.  So if people have strong opinions on
what items we should buy, they should speak up soon.


#7 of 547 by cross on Wed Feb 26 13:39:39 2003:

One thing I would suggest is a rack-mount case.  While it's a little
more expensive, it's also probably a little more rugged and can easily
be fit into a colocation facility if, at some point in the future,
that becomes desirable.

I would suggest that, as part of this, grex either move out of the
pumpkin, or try to do as much as possible to make it a more habitable
place for the grex machines.  In particular, the descriptions I've
heard make it sound like it's just too hot during the summer.  I suspect
that has a lot more to do with grex's system reliability problems than
any concerns of component quality or load.


#8 of 547 by scott on Wed Feb 26 13:53:17 2003:

Grex has had extremely few hardware problems in the Pumpkin, though.


#9 of 547 by gull on Wed Feb 26 14:15:11 2003:

Nearly all new hardware supports internal temperature monitoring.  If
OpenBSD supports this, we could monitor the CPU core and case
temperatures and see if they really are reaching unreasonable levels. 
That would, in my opinion, be a much better indication than the ambient
room temperature.
I know Linux supports reading most sensor chips via the 'sensors'
package, but I don't know if OpenBSD has support for any of this yet.
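
As a sketch of the kind of check this would enable (the sample readings,
label names, and the 60 C alert threshold here are hypothetical, not
Grex's actual sensors or their output format):

```python
import re

# Match lines like "CPU Temp: +48.0 C" in lm_sensors-style text output
# (a hypothetical sample format, not any particular tool's exact output).
TEMP_RE = re.compile(r"^(?P<label>[\w .-]+):\s*\+?(?P<temp>-?\d+(\.\d+)?)\s*(°C|C)\b")

def over_threshold(sensors_text, limit_c=60.0):
    """Return (label, temp) pairs whose reading exceeds limit_c."""
    hot = []
    for line in sensors_text.splitlines():
        m = TEMP_RE.match(line.strip())
        if m:
            temp = float(m.group("temp"))
            if temp > limit_c:
                hot.append((m.group("label").strip(), temp))
    return hot

sample = """CPU Temp: +71.5 C
Case Temp: +38.0 C
Disk0 Temp: +44.0 C"""
print(over_threshold(sample))  # -> [('CPU Temp', 71.5)]
```

Logging these pairs periodically would show whether the core and case
really are reaching unreasonable levels, as opposed to guessing from the
ambient room temperature.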

I agree a rack-mount case would be a good idea, but I don't feel too
strongly about it because it would be relatively easy to shift the
hardware into a rack-mount case later.

What brand of motherboard are you thinking of using?  Abit has had a lot
of problems with defective capacitors lately and maybe should be
avoided.  I'm not sure who makes the best AMD boards right now.


#10 of 547 by keesan on Wed Feb 26 14:30:21 2003:

Jim asks if grex would be interested in his basement once he gets it
insulated, and the house rewired.  He could have a separate entrance
accessible at all hours.  Might be a few years though.


#11 of 547 by mdw on Wed Feb 26 15:31:38 2003:

Grex's hardware reliability problems in the past few years have mainly
been:
 (1) DSL line flakiness.  Almost certainly not heat related.
 (2) random power weirdness.  Almost certainly not heat related.
     (Unless you count all air conditioners in the state of Michigan.)
 (3) random disk failures.  These probably are heat related.
 (4) weird modem problems.  Wide range of potential causes.

The only one of these we can control is (3).  *However* -- we've gone to
some effort to secure the best cooling we can given our environment.
Some of this has included the use of extra large enclosures and a fair
amount of extra room.  In a colocation, we'd have much less
space--smaller enclosures, less room, etc. Right there our improvements
go out the window.  No doubt things are much better in NJ, but here in
SE Mich, it's not hard to collect interesting tales of various
colocation heating and cooling disasters.  Backups are important - and
we definitely want to maintain our current advantage in terms of making
removable tape backups; this isn't just for disk failures, but also
covers floods and fires (both known risks in the local colocation
market) and vandals (a special and unique risk we also have to deal
with, which makes mirrored disks, normally a useful backup strategy,
much less attractive to us.)  When you measure up cost, cooling, and
backup convenience, the pumpkin suddenly starts looking a lot less bad.

I said that our disk disasters probably are heat related.  I suppose I
should expand on that.  We've had several failures.  We used to have
lousy disk enclosures.  Eventually, we resorted to using box fans.  It
was noisy and crude, but worked.  We eventually got better disk
enclosures.  Those have been basically adequate.  We have been luckier
in our failures than perhaps we deserve -- our failures have generally
given us notice, often show up during backups, so we've generally been
able to simply restore that last backup.  Some have shown up as heat
sensitivity - letting the disk cool often eliminates the errors (at
least long enough for that last backup).  In at least one memorable
case, the completely dead disk turned out to simply be packed with dust
-- cleaning it thoroughly resulted in proper operation, although we got
nervous and replaced it before it had a chance to turn traitor on us.
I'd like to think our luck is mostly due to backups, observation, and
paranoia.  But, to the extent that heat has played a factor, it may have
actually worked in our favour, although I'd hardly recommend it as a
good strategy.

Keep in mind that we're running mostly used disks of elderly vintage,
and basically running them until they give up the ghost.  This strategy is
guaranteed to eventually produce 100% mortality -- but it may
paradoxically produce more reliable storage meanwhile than constantly
purchasing new disks even though most of those won't fail before being
replaced.  Perhaps this just shows that you can prove anything you want
with statistics.

I'll be the first to admit the pumpkin is far from perfect, but even so,
I'd have to say that in terms of dealing with disk disasters, it still
comes out way ahead of what we could manage for colocation deals.  If
you're looking for that huge advantage that's bound to convince us to
move to colocation, this isn't it.  The certain convenience in terms of
access and space is known to us.


#12 of 547 by scott on Wed Feb 26 15:38:55 2003:

Sindi - we really want Grex to be in a neutral property, instead of someone's
house.


#13 of 547 by jmsaul on Wed Feb 26 15:58:19 2003:

Besides, it would take Jim years to get set up to draw the copper wire for
the cabling.  And we don't want to run Grex on a refurbished 486, powered by
a bicycle generator.


#14 of 547 by cross on Wed Feb 26 16:27:25 2003:

Regarding #11; Well, you mention several things that disturb me.  Notably,
dust and heat conditions in the pumpkin.  If you're going to stay there,
I suggest you make an effort to mitigate those to whatever extent is
possible.  Perhaps that means putting in a wall-mount A/C unit, or a
bigger one if necessary; perhaps it means putting in a humidifier to keep
down on dust; perhaps it means going over the whole room with a dust mop;
perhaps it means throwing out old yellowing sheets of paper that have no
further importance; it almost certainly means going with a new, server
grade case with adequate cooling.  Perhaps it also means something else.
I don't know, but it strikes me, and has been stated by several others,
that grex could do a bit better to make sure the conditions in the
pumpkin don't kill your new servers.

I have no idea what the colocation facilities in New Jersey are like,
since I live in New York City.  Here, our colo facilities are, umm,
quite different from the way you describe your options.  That's fine,
but if putting in a wall-mount A/C unit and giving the pumpkin a thorough
cleaning and removing a bunch of garbage from it will help improve grex's
chances of not having a disk failure, I'd say go for it.  In fact, that's
all I'm saying.


#15 of 547 by jep on Wed Feb 26 17:00:47 2003:

How is the new system going to be financed?  Might it make some sense 
to look at how much money is going to be available?  Is Grex just going 
to write a check for the amount of the new computer?

I don't see a tape drive listed.

The computer I just ordered can have 2 GB installed.  Whatever Grex 
gets, it'd seem to me to make sense to max out the RAM.


#16 of 547 by scott on Wed Feb 26 17:04:24 2003:

Indoor dust comes from people - the Pumpkin is quite dust-free, actually. 
My guess is that the bulk of the dust in that drive came from its previous
life.


#17 of 547 by keesan on Wed Feb 26 17:21:56 2003:

The pumpkin does not have windows.  The owners might not appreciate a hole
in the wall made by grex for an air conditioner.


#18 of 547 by aruba on Wed Feb 26 17:47:56 2003:

Right, I think a wall-mount AC unit is not an option in the Pumpkin.

Re #15: We plan to have a fundraiser to help pay for the hardware.


#19 of 547 by mary on Wed Feb 26 18:12:44 2003:

Ideally it would have been nice to have a fundraiser and buy hardware
based on the money raised plus what we have already set aside for upgraded
hardware.  But instead what we have is a bit of a time crunch.  Staff has
time to put this together, nowish, but a big chunk of the work needs to be
done before May. 

So instead of fundraiser first, purchase later, we are going to make a
leap of faith that the users will want this badly enough to donate what
they can, and get the project started.  Do folks think this is a
reasonable thing to do? 



#20 of 547 by keesan on Wed Feb 26 18:35:26 2003:

Wasn't there already a fundraiser for the last grex hardware, which ended up
getting donated instead, plus a $1024 donation for new hardware that has not
been spent yet?


#21 of 547 by gull on Wed Feb 26 18:42:17 2003:

Re #11: I'd also add that modern disks tend to run cooler.  I bet the
Pumpkin will be considerably cooler when Grex's old hardware is retired.

Good airflow should definitely be a consideration when picking a case,
of course.  Thanks to the overclocker market, you can now get cases with
truly awe-inspiring numbers of fans.  Since noise isn't much of a
consideration where Grex is, we should take advantage of that.

Re #15: I'd guess our current tape drive will work with the new system.
 If I remember right it's an external SCSI drive.  These are quite
standardized; it'll just be a matter of the right cable, most likely.


#22 of 547 by aruba on Wed Feb 26 19:17:52 2003:

Re #20: In 1998, we had a fundraiser for spare parts for the current Grex
machine.  Then most of the spare parts we needed were donated to Grex, so we
asked everyone who had donated what to do with what they had sent in - some
of it was refunded or converted into membership dues, the rest was converted
into miscellaneous donations.

The $1024 which is currently in the infrastructure fund came from a single
user in 2001.  Its purpose is indeed to upgrade Grex's hardware, so the goal
of a fundraiser would be to add to that fund.


#23 of 547 by jep on Wed Feb 26 19:31:27 2003:

Why is there a deadline of May, and what has to be accomplished by 
then?  Is the goal or plan to get Grex actively on the PC machine by 
then?

Why not start the fundraising plan now?  I bet we could get at least 
some idea how much money will be available by the time of the next 
Board meeting, if people are asked for pledges.  If there's a lot more 
(or less) money coming in for the upgrade than what's expected, it 
might affect what would be purchased.


#24 of 547 by aruba on Wed Feb 26 19:37:37 2003:

The board meeting is tomorrow.  I decided to wait until then to give people
a little time to discuss what hardware we'd like to buy.  I expect to start
the fundraiser on Friday.


#25 of 547 by jep on Thu Feb 27 17:16:40 2003:

Oops.  I thought you meant next month's Board meeting.

I'm not objecting to any deadlines, by the way, just asking for further 
information about them.


#26 of 547 by dpc on Thu Feb 27 18:40:25 2003:

I agree that we should buy the system as soon as we can, since
we have the cash on hand, and then ask users to help pay for it
through fundraising.


#27 of 547 by cross on Thu Feb 27 18:53:03 2003:

I think buying hardware now is wise.


#28 of 547 by steve on Fri Feb 28 00:31:29 2003:

   Here is the second cut of the list.  Sorry I'm so
late.  

             Approximate cost of a $2000 i386 Athlon box
                                         --STeve 2/27/03

   Here is the second cut of Grex's future hardware.
Quality always wins out over price.  Given the state
of the world of computers at the moment, spending
extra makes sense.  The amounts listed here are pretty
accurate, but things change.

Here is a short description of each item.

CPU - We want an Athlon over a Pentium.  There are lots
of reasons for this; the easiest is that they simply
perform better.  Given the cost differences, getting
the fastest one possible is reasonable.

Motherboard - We want two motherboards.  Why? If the
motherboard fails we'll have an exact duplicate, and
be able to get back online *knowing* that we have
the same exact motherboard.  Several times now, I have
been bitten by "small" changes in motherboards, ie
small revs on the artwork, etc that have made some
noticeable difference in the operation of the computer.
For $145 extra it's an excellent investment.  The other
parts, ram, disk controller, etc aren't nearly as 
persnickety; we can get more of those on a demand
basis.

Ram - Ram is *cheap*.  Having hordes of ram means that
we won't run out, and can work on some optimizations
like keeping binaries in ram disks, etc.

Box/power supply - THIS IS IMPORTANT.  The power supply
is one of these overlooked things which all too often
is of dismal quality.  Antec goes against the grain
and still builds good power supplies.  Spending half
this amount for a cheaper box is definitely possible
but is not reasonable.

Misc case fans - Heat is the killer of systems.  I want
to be able to get several extra fans and over-engineer
the machine.  It's cheap.

NIC - OpenBSD likes the intel nics.  I've had excellent
results with them.  This is a 100Mbit card.  If we go to
1G, we'll have to do some research on this, and spend
more money.  Good 1G cards cost a lot more than garden
variety ones do, so I'm assuming 100Mbit at this time.

SCSI card - We want SCSI over IDE for our main disks.
The card listed is an Ultra 160 speed card.  We want to
try and go with an Ultra 320 system if possible.  This
is the biggest unknown for me at this time.  The card
listed is Ultra 160, not a slouch.

Floppy/CD-rom - standard items; I like Sony stuff.

SCSI disk - I like IBM disks; when kept cool they are
excellent disks.  These are Ultra 320 disks.

IDE disk - Perhaps not needed.  This would be very 
useful for file storage, such as /usr/local/src and
other things that we don't use often.

Monitor - I am really beginning to hate most of the
monitors out there.  The viewsonic is OK and not too
pricy.

Keyboard - generic keyboard, perhaps found for less.

Misc items - Those things like cable wraps, cables,
etc.

Item                 Price Source Comments
CPU - Athlon XP 2500 $180  Newegg
motherboard          $145  Newegg
motherboard          $145  Newegg
RAM  512m $89ea      $300  Crucial 3 512M, at $100ea
antec box/ps         $250  Antec
misc case fans, etc  $100  Antec
NIC Intel 100mbit    $ 33  Newegg
SCSI card adap 29160 $194  Newegg
floppy (sony)        $ 18  Newegg
CD-rom Sony, 48x     $ 51  Newegg
SCSI disk ibm 18G    $142  Newegg Ultra 160 15K rpm!
IDE disk IBM 80G     $ 98  Newegg
misc items           $150  misc

sub-total           $1948
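
As a quick arithmetic check on the table (the $142 gap is presumably the
second 18G SCSI disk from the original two-disk proposal, which the table
lists only once):

```python
# Sum the line items from STeve's table as printed.
prices = [180, 145, 145, 300, 250, 100, 33, 194, 18, 51, 142, 98, 150]
print(sum(prices))        # -> 1806, not the printed $1948

# Adding a second $142 SCSI disk (two were proposed in the opening
# item) reconciles the listed sub-total exactly.
print(sum(prices) + 142)  # -> 1948
```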





#29 of 547 by cross on Fri Feb 28 03:48:11 2003:

One question about the disk subsystem; why not go with a SCSI RAID
card, and do hardware RAID?  OpenBSD supports quite a few RAID controllers
these days, and a RAID 5 setup would be a good investment.


#30 of 547 by rheaumea on Fri Feb 28 06:24:19 2003:

I would recommend Tyan or Asus boards I have always had very good luke
with them in the past.


#31 of 547 by rheaumea on Fri Feb 28 06:24:38 2003:

hahah luke=luck


#32 of 547 by mdw on Fri Feb 28 10:10:41 2003:

The pumpkin doesn't have any exterior walls.  Cutting a hole in the
wall is only going to get us into somebody else's space.

I don't think we've had any other drives die from dust.  So Scott may
very likely be right that this was mostly dust from a previous life.

While the space isn't ideal, I think it's far from the deathtrap some
people here seem to assume.


#33 of 547 by cross on Fri Feb 28 16:14:27 2003:

One can only go with what one's been told.  Having never seen the space
myself, I can't really judge except by what others have said.  From what
others have said, it runs too hot during the summer.  Perhaps newer
hardware will fix that; I really don't know.


#34 of 547 by mdw on Fri Feb 28 16:48:11 2003:

I think we have pictures up on the web somewhere.


#35 of 547 by remmers on Fri Feb 28 17:14:15 2003:

http://www.wwnet.net/~janc/grextech/pumpkin/


#36 of 547 by cross on Fri Feb 28 20:05:53 2003:

I wasn't aware that those pictures conveyed the sense of ambient heat in
the room.


#37 of 547 by jep on Fri Feb 28 20:18:44 2003:

re resp:28: STeve, you only listed 1 SCSI drive.  Doesn't Grex want two?

$51 for a CD ROM sounds like a lot to me.  I'll bet any number of us 
have one or more around that Grex can have for free.  Same with floppy 
drives.  Is there any disadvantage to getting such items donated?  
Neither would be used often.

You didn't list a price for a monitor.  Isn't that another item Grex 
could easily get donated?  I'm assuming few people use the console for 
anything other than booting the computer.  I wouldn't expect Grex would 
even require a color monitor, if a mono VGA one can be had any more.  
My point is that this is one more place Grex doesn't need to spend any 
money.

No arguments on such "invisible" parts as fans, power supply and 
memory; the things no one will ever notice unless there are problems.  
It is not possible to have too much or too good of such things.


#38 of 547 by cross on Fri Feb 28 20:21:12 2003:

Hey, while you're at it, why not get a PCI multi-port serial I/O card
and ditch the terminal server?  They're only a couple of hundred dollars.


#39 of 547 by drew on Fri Feb 28 21:37:09 2003:

Re cooling: Is there anything *below* the pumpkin? ISTR it's in the basement.
Maybe the floor can be used as a heat sink, if one is really needed.

Definitely get RAM, double whatever you planned to get. And make sure the
motherboards can handle as much as possible. A 32 bit processor should be able
to directly address 4 gigabytes.


#40 of 547 by keesan on Sat Mar 1 00:14:52 2003:

Steve likes new parts and he is putting in the actual time.  We offered TWO
40X CD-ROM drives and we have several high-quality floppy drives which he
turned down.  I would not try to argue with his personal preferences
considering he is the one going to be doing the work.  Make him happy,
it might get the new grex built sooner.  


#41 of 547 by russ on Sat Mar 1 01:19:28 2003:

Something to budget for might be some flanges to attach to equipment
fan outlets, and flexible tubing (dryer hose) to duct exhaust air
from equipment to whatever passage carries air out of the Pumpkin.

Probably wouldn't be more than about fifty bucks.


#42 of 547 by keesan on Sat Mar 1 01:22:24 2003:

Again this would mean a hole in someone else's wall.  If grex is surrounded
by air conditioned offices, it might be sufficient just to reduce the amount
of heat generated.


#43 of 547 by steve on Sat Mar 1 03:07:04 2003:

   Heating is not going to be a major issue.  The Sun-4/670 monster
consumes several hundred watts of power and uses 35w SCSI disks.
The new system will be at least 100w less than that (more like 200w
I'll bet) and uses disks which will eat less than 10w of power.
The Pumpkin isn't great, but it's certainly no worse than the
Dungeon was, and actually better than Ken's warehouse was.  With
fans at the side of the case we'll be OK.  The heat that the new
hardware won't generate will leave the room cooler, too.  I
think we need to worry over the power more than heat, and we
have the Leibert UPS for that.  All in all I'm pretty happy with
the physical condition of the pumpkin.

   As for random hardware pieces, I'm willing to use things like
donated monitors and keyboards, because if they fail the system
won't be affected.  Things like CD drives are a different matter.
We need to get good NEW equipment here.  Moving onto a new
platform is a major effort for Grex staff, and I simply do not
want to skimp on anything.  For one thing, PC equipment is now
just about commodity stuff.  It isn't that expensive, and while
finding high quality stuff can be a little frustrating at times,
it's out there.  The second and more important reason to be
picky about things is to try and prevent every kind of hardware
failure that we can.  Getting used equipment is more of a gamble
and more prone to failures.  I could supply Grex with everything
but an Athlon and motherboard, but Grex really doesn't want to 
go the cheap route.  We're going to do this the right way.

   The CD drive might not be a little-used item, either.  There
are schemes that we will investigate where we may boot off 
the CD, and having that fail would not be good.

   So it isn't a case of making me happy, but building a
system that is as reliable as possible.  These days this means
building it yourself, using the best possible parts.  Grex
has never used inferior equipment, and it shows.  Our Sun-4/670
is a marvelous beast.  I can only hope that our i386 platform
will be as reliable as that has been.


#44 of 547 by jep on Sat Mar 1 04:34:59 2003:

But STeve, getting things working is what *makes* you happy!  I'd 
assumed the CD ROM would only be used for installation and then never 
used again.  I don't mind Grex spending money on anything that will 
make any difference.  I understand your goals, and support them.

This is going to be a single CPU?  No firewalls or mail servers?  I 
know little about either of those things, but am curious as to why.


#45 of 547 by keesan on Sat Mar 1 16:07:08 2003:

If there were TWO floppy drives and one failed you would have a backup.  Are
two currently good used ones as good as or better than one new one?
I have never had a floppy drive fail and I use mine constantly.  But we have
been given machines with bad drives.  


#46 of 547 by cross on Sat Mar 1 16:41:23 2003:

Regarding #45; This isn't 1987 anymore; grex isn't going to store data
on floppies.

A floppy drive is about 10 bucks; a CD-ROM about 30.  That's not going
to break the bank, so why bother worrying about it?


#47 of 547 by keesan on Sat Mar 1 17:49:23 2003:

Steve's figures were actually about double that.  But I am seriously
suggesting that two used drives would give more reliability than one new one
- put them both in the machine and you have a backup that does not require
opening up the case.  Same for CD-ROM drives, build with two of them.  Or does
this complicate the software?


#48 of 547 by steve on Sat Mar 1 18:57:19 2003:

   Sindi, you aren't getting what I'm going for here.  I want the
next hardware to be as reliable as I can make it.  I *don't* want
to have a backup for something because I think I'll need it; we
will have a backup for the motherboard because of its unique nature
in the list of hardware.  I want to get equipment which is new, that
we can break/burn in, and have the highest factor of confidence in.

   If we start using used parts here and there, we might be OK.  It's
certainly possible that nothing would go wrong, etc.  But if I can
get new parts for a piece of Grex which increases its reliability
then its the right thing to do.

   You'll notice that I omitted the keyboard and monitor from the
list of things to buy.  This is because if either fails, they won't
affect the running of Grex immediately.  We can boot the system up
without a monitor (indeed, most of my OpenBSD machines have
had a monitor on them during the initial install and upgrades),
and we have spare keyboards in the pumpkin already.  So those
are of a nature where a failure means we have to drag something
over to Grex and use that instead.

   But other components--including a CD and floppy--are different.
Those may not be used a lot, but when they do need to be used,
there is a fairly high chance that they are really critical.  And,
if we adopt a system where Grex boots off the CD, that becomes
even more critical.

   The bottom line here is that Grex needs to run as well as it
possibly can.  We don't have a full-time staff to work on things.
Any real disaster means that someone is going to spend quality
emergency time working on the system rather than sleeping, working,
or living their life, and I'd like to see that be a minimum.  By
using the Sun-4/670 we've done about as well as possible.  We have
a commercial quality system in terms of its reliability, and I'd
like to see us keep that level of reliability on the new hardware.


#49 of 547 by steve on Sat Mar 1 19:02:28 2003:

   As for having two floppies in the unit, that just strikes me
as "wrong".  If something like a floppy, which costs so little
is seen as being untrustworthy, it makes more sense to me to buy
a good one.  Again, the idea here is to do it "right".  Grex may
be a hobby, but a large number of people see Grex as a utility.
We should not cut corners when we do something right.  The one
large advantage of the i386 platform is that most parts are
fairly cheap.  We have the possibility of doing things pretty
well for not a huge sum of money, so I want to do that.


#50 of 547 by drew on Sun Mar 2 23:56:07 2003:

Consider booting from compact flash? I *think* I've seen a 512K one at Best
Buy for not *too* much.


#51 of 547 by russ on Mon Mar 3 04:08:30 2003:

I would suggest a removable IDE bay.

IDE, because it's cheap.

Removable, because big IDE drives are downright cheap these days
and we could use the same bays for doing backups.  Just dd copies
of the other partitions to the removable, and take it home for
off-site security.
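
A minimal sketch of that scheme, driving plain dd (the file names below
stand in for raw device paths, which are site-specific; a real run would
point if=/of= at the actual partition devices):

```python
import filecmp
import subprocess

SRC = "source.img"   # stand-in for a system partition's raw device
DST = "backup.img"   # stand-in for the partition on the removable drive

# Create a stand-in "partition" of 128 KB (a real run skips this step).
with open(SRC, "wb") as f:
    f.write(bytes(range(256)) * 512)

# Raw copy, dd-style: large block size; sync,noerror pushes past bad blocks.
subprocess.run(
    ["dd", f"if={SRC}", f"of={DST}", "bs=64k", "conv=sync,noerror"],
    check=True, stderr=subprocess.DEVNULL)

# Verify the copy before carrying the drive off-site.
print("verified" if filecmp.cmp(SRC, DST, shallow=False) else "MISMATCH")
```

One caveat: conv=sync pads the final block out to the block size, so a
source that isn't a multiple of bs yields a slightly larger copy; whole
raw partitions are, which is why the byte-for-byte comparison works here.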


#52 of 547 by gull on Mon Mar 3 14:08:57 2003:

Looks good to me.

Re #50: That wouldn't be very expensive, but I don't know as it's worth
the effort to set up when you can boot off CD and there's already a
CD-ROM specified.  You also tie up an IDE device position with a
CompactFlash adapter.  Additionally, CompactFlash is pretty slow as IDE
devices go, and if I remember right putting a slow device on an IDE
cable also forces the other device on that cable to run slow.


#53 of 547 by dpc on Mon Mar 3 21:37:26 2003:

A lot of thought has gone into this list.  It looks good to me,
too.


#54 of 547 by jhudson on Mon Mar 3 22:07:18 2003:

Re 45: if two floppies, the second DOES NOT back up the first.
The only purpose for floppies on servers these days is
interruption of the boot controller, and only the first one
is bootable anyway.


#55 of 547 by keesan on Mon Mar 3 22:34:51 2003:

My CMOS has a setting to switch A: and B:, would that help?


#56 of 547 by cross on Mon Mar 3 22:36:52 2003:

No, because there's no point in doing it.  If grex's floppy drive dies
(why do they even *need* a floppy drive, by the way?  Just boot off the
CD-ROM), the correct way to fix it is to go to the store and buy a new
one, and chuck the old one in the nearest trashcan.  Not pull out bubble
gum and string and coax it into ``working.''


#57 of 547 by gull on Mon Mar 3 23:44:42 2003:

I suspect they're installing a floppy because floppy drives are really
cheap, and every once in a while it's handy to have one.  That's why we
ordered them on the machines at work.


#58 of 547 by keesan on Tue Mar 4 00:34:07 2003:

We have been quite successful in doing things the 'incorrect way' and recycling
used equipment rather than using the trashcan.  Have not needed any bubblegum.
Cleaning a floppy drive every few years is probably less time consuming than
a trip to the store.


#59 of 547 by cross on Tue Mar 4 01:55:22 2003:

Cleaning a floppy drive is less time-consuming for two people who like to
spend their time doing such things.  Taking a system with >45,000 accounts
offline for an hour to pull out the floppy drive and clean it is probably
a lot more time-consuming, since you're no longer wasting just two
people's time.


#60 of 547 by keesan on Tue Mar 4 11:58:15 2003:

Does it take an hour to clean a floppy drive?  There are cleaning diskettes
that you can run without taking out the drive.


#61 of 547 by scott on Tue Mar 4 13:59:42 2003:

Sure, but then somebody has to *buy* a cleaning diskette and make sure it will
be easily available 2 years from now when we actually need it.  


#62 of 547 by gull on Tue Mar 4 14:01:23 2003:

I think we all appreciate that you're trying to save Grex money, Sindi,
but I think the decision has already been made to use new hardware this
time.  Our system administrators are volunteers, and if they don't want
to spend their time cleaning floppy drives and otherwise trying to nurse
elderly hardware back to health I think we should respect that.



#63 of 547 by keesan on Tue Mar 4 15:55:58 2003:

I already said that I think STeve should do this however he prefers.  I
thought we were talking theory here.  I don't throw out my clothing or dishes
when they need cleaning.  
And I do have two floppy drives in all my computers, which is handy when one
needs cleaning (every 3-4 years, with heavy daily use).  


#64 of 547 by janc on Wed Mar 5 04:59:28 2003:

You make heavy daily use of your floppy drive?  I'm not sure I've ever
used the floppy drive in my current computer.


#65 of 547 by remmers on Wed Mar 5 13:04:56 2003:

Floppies are a dying medium.  The current trend is for them to be
optional rather than standard equipment.


#66 of 547 by gull on Wed Mar 5 14:41:12 2003:

I used them fairly heavily until I started networking everything
together.  Now it's rare for me to use one.  When I have to I'm struck
by how small, slow, and unreliable they are.


#67 of 547 by robh on Wed Mar 5 18:02:12 2003:

I actually still use mine a fair amount, for carrying files between
home and my two workplaces.


#68 of 547 by drew on Wed Mar 5 19:32:26 2003:

I use mine to carry small stuff back and forth to a friend's place. Anything
huge gets burned to a CD.

I did indeed see 512M compact flash cards at Best Buy - retail for $179. This
is almost as big as a CDROM, and costs not all that much more than a top o'the
line burner. Advantage: no moving parts. Possible drawback: I don't know
whether there's a hard write-protect switch.

The idea of booting from CDR or CF is that the boot media should contain only
the memory resident kernel, plus a .gz of the root filesystem to be used. It
should expand to at least a gigabyte, possibly two depending on how good the
compression is. In this manner, the system should run blindingly fast with
most of it on a Ramdisk, and hard drives only used where absolutely needed.
In addition, you get proof against root system corruption, as whatever hacks
are committed get undone with a simple reboot.

As for CF slowing down other devices, most modern boards come with two IDE
chains, and besides, the hard drives are to be Scuzzy.
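
A rough sketch of the pack/unpack half of the scheme drew describes, with
hypothetical function names; a plain directory stands in for the mounted
ramdisk, and the real kernel/boot plumbing is omitted:

```shell
#!/bin/sh
# Sketch of the compressed-root idea: at build time, pack the prepared
# root tree into a gzipped tar image; at boot time, expand it onto a
# freshly mounted ramdisk.  Names and layout here are illustrative only.

pack_root() {
    # $1 = root tree to archive, $2 = output image (e.g. root.tar.gz)
    (cd "$1" && tar cf - .) | gzip -9 > "$2"
}

unpack_root() {
    # $1 = image, $2 = target mount point (the mounted ramdisk)
    gzip -dc "$1" | (cd "$2" && tar xf -)
}
```

Since the image on the boot media is read-only, a reboot simply re-expands a
known-good root, which is exactly the hack-resistance drew describes.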


#69 of 547 by gull on Wed Mar 5 19:36:42 2003:

I don't think CF cards have a hardware write-protect switch.  At least,
I've never seen one that had one.


#70 of 547 by keesan on Wed Mar 5 19:49:35 2003:

What is grex planning to use a floppy drive for?
Most of the files that I produce are translations of under 20K and they fit
nicely on floppy disks.  I use one disk per translation agency.  My 360K disks
are 95% reliable (I lose maybe 1-2 per year).  I don't trust the higher
density ones, they are always going bad.  I move files between computers on
the 720K disks.  We have several computers at each of three locations.  This
is quicker and easier than uploading and downloading via grex or an ISP.  
Most of our DOS files fit on 720K disks.  I have a little file splitting
program that I have used on 2M files (such as a Linux distribution).  
A page of text, single spaced, is about 2K.  (page = screenful).  
I don't recall ever having a floppy drive go bad, they just get dirty and
won't read all the disks after a few years.  We have been given computers with
bad floppy drives (they will do a dir but won't read a file).  To replace them
you remove the monitor (unless it is a tower), unscrew a few screws, remove
a cover, unscrew a few more screws, unplug two cables, replace the drive, and
reverse the process.  Total time perhaps five minutes.  Before actually
replacing the drive you can plug in the new one and make sure it fixes the
problem.

In 50 Borders computers that were really heavily used for many years and were
full of large dust bunnies there were a lot of bad floppy drives.  Perhaps
the dust causes them to overheat.  It might be a good idea to vacuum out the
grex computer once a year.  You take them outside and blow the dust out.


#71 of 547 by pvn on Thu Mar 6 08:52:20 2003:

Just some random thoughts.

IBM is no longer in the drive-manufacturing biz and hasn't been for
some time now.  Their drives are sweet and stand up to lots of abuse,
and it sure wasn't for technical reasons that IBM got out of that biz.
Be careful of older 18G drives, though (sold as 'new' in third-party
channels).

Adaptec is having financial problems; I seem to recall they just
laid off a lot of people recently.  It kinda doesn't matter if you
don't expect support in the future anyway, but it kinda makes you wonder
what manufacturing shortcuts might have taken place in the recent
past.

Nothing bad to say about AMD CPUs - more integer 'pute bang for the
buck.  (We are talking about delivering text to the Internet over
a slow pipe, not massive floating point calculations).

I'd like to see some analysis of the current hardware and where its
bottlenecks are.  Seems to me that the huge theoretical performance
allowed by Adaptec/fast SCSI drives is still limited by the transfer
rate of the PCI bus.  Thus the delivery of little chunks of data that
constitute web browsing of raw ASCII text (the primary function of grex)
might be equally well served by much larger, cheaper IDE disks.  And
it's easy to get 16 IDE disks in one system.  (You were talking only
two screamer SCSI drives, right?)  Fat disk trying to send data to
slim pipe?  It doesn't matter if the PCI controller can read the
screaming disk at 160/320 in theory if the best it can deliver is 66,
and with most MBs 33.

More memory is good.  It's cheap and a whole lot faster than even the
fastest disk.

Don't just buy a spare identical motherboard which sits on a shelf
gathering dust until it is needed.  Take the next step.  Invest
the little extra in case/pws/cpu/memory and put it online at the
same time.  Look at clustering/HA/failover/distributed computing.
Seems to me there are a number of well developed applications.
Look at what you are doing.  Sure, writing a post to the disk is
atomic, but reading sure isn't.  There is no reason that reading an
item via a web page (probably the vast majority of the functionality)
couldn't be a distributed task.

Why BSD?  I used to be a BSD bigot where serious stuff should be done
on it.  But now Linux, with major players spending tons of bucks, is
where everything is happening and backported to BSD et al.  With
linux you have things like MOSIX.

Any thoughts of approaching name hardware manufacturers to donate
for the tax write-off?  I seem to recall that it has been mentioned
in the past.  Why spend yer own money in the first place?  So you don't
get the latest and greatest, but you get it for 'free'?

Just my 2-cents worth.


#72 of 547 by gull on Thu Mar 6 15:23:59 2003:

Pvn, care to elaborate how you get 16 IDE disks in one system?  At two
disks per channel you'd need eight controllers to do that.  One will be
on the motherboard, but that still leaves seven, and most machines just
don't have that many PCI slots.


#73 of 547 by jep on Thu Mar 6 20:31:01 2003:

M-Net looked into asking for donations for a server, several years 
ago.  I don't recall Grex ever doing so.

I don't think processing power will be any problem with a modern PC-
based system.  M-Net has never had problems processing the amount of 
data they required, and Grex is not that substantially different.  M-
Net is still very much usable with load averages into the 20s.  In 
fact, as a user, one doesn't even notice such loads.  I don't think we 
need to worry about distributed computing just now.

Distributing the computing load would add complexity to a process 
that's already going to be complicated for the staff.  I'd hope someday 
we can get a mail server, but let's let staff get the new computer up 
and running first.  Then they can add frills.  We don't have to have it 
all at once.


#74 of 547 by gull on Thu Mar 6 21:26:17 2003:

Re #72: Err, sorry, two will be on the motherboard, generally.  Still,
machines with six PCI slots aren't terribly common, and you'll use up a lot
of precious IRQs...


#75 of 547 by jhudson on Fri Mar 7 16:26:34 2003:

Good point. SCSI needed.


#76 of 547 by mdw on Fri Mar 7 23:25:24 2003:

Grex has had equipment fundraisers for hardware before.  Since we've
previously gone with trailing edge CPU's, previous fundraisers have been
for memory, hard disks, etc.

I don't know if IDE commonly supports overlapped seeks yet.  With only 2
devices per channel, there is of course less advantage to overlapping
seeks, but all other things being equal, a 2-disk IDE chain that can't
do overlapped seeks is going to perform less well than a 2-disk SCSI
chain which can.  With small block transfers, overlapped seeks and more
spindles per given capacity (i.e., smaller drives) may be more important
to us than transfer rates.

There are 2, 4, and 6 channel IDE controllers.  A 6 channel IDE
controller can attach up to 12 IDE disks using one PCI slot.  I've heard
people claim that some of these mega channel IDE based systems are very
fast disk machines, and that it even makes more sense to do software
RAID than hardware, on account of the CPU having so much more memory to
buffer things.  I don't know how much truth there is in all this.

One difference that is likely to be important to grex is that SCSI
drives today typically go into "server" machines.  IDE drives available
via retail channels are most commonly going into desktop machines.
There is a real split in the PC x86 world between server, desktop, and
home machines, with a corresponding descent in quality (and reduction in
reliability) between the three.  This is a recent development, so I'm
afraid Sindi won't have seen it in any of the machines she works on.
Since SCSI drives mainly go into server class machines, there is a
chance they'll be more rugged and reliable.


#77 of 547 by tonster on Sat Mar 8 02:51:25 2003:

Looking at the prices you guys are looking to pay for stuff, it really
seems like you're spending a hell of a lot more money than you need to.
 You can get a 54X CD-ROM brand new for $28 at Sky-Tech in Ann Arbor. 
And it's not some cheap knock-off drive.  Floppy drives cost no more
than $14.95 locally, and you don't have to pay shipping.  A lot of those
costs that were listed really are inflated for what you're getting.


#78 of 547 by pvn on Sat Mar 8 07:05:07 2003:

re#76:  IDE drives are currently optimized for large storage
(windoze bloat) and sequential access at high rates.  (Speaking
really generally, and it seems to me it's been years since there
was any increase in rpm (like 7200 has been around for a while now
and that is tops)).  I absolutely do not disagree that SCSI drives
have always been far superior at chunky random access.  However, the
reason IDE is not generally seen in server-class machines is
because it is not hot-swappable - it's not a problem or question of
being less reliable or of lower-quality manufacture.  Indeed, I think WD
for example has a 5-year warranty on drives, and I don't recall any SCSI
manufacturer offering more.  The major reason IDE drives are so much
cheaper is economy of scale - you sell a ton of them for every SCSI
drive.  Plus the IDE drives really are stupider, although that
advantage has gone away over time.

What you gain by using really fast SCSI drives you lose by going with
PC hardware (remember what PC stands for).  Your motherboard itself
is suboptimal as a 'server class' machine in the first place - and
I think this is an argument you know full well.  That being said,
the pure integer computes that modern PC CPUs deliver overcome
that by brute force - so what if you "share IRQs" when you can do it
so fast?  So for grex the PC motherboard is sure appropriate.

Steve is theoretically correct in that if grex were a server delivering
indexed 36G of data - such as a database - over a fast media such
as GigE then absolutely the SCSI solution is the way to go.  I'm just
not sure that is a good model for what grex actually is or does.
With a couple gig of memory I wouldn't be surprised if most browsing
of conferences weren't satisfied straight from memory, in which case disk
speed is irrelevant.  I'm also sure that the use of SCSI won't hurt,
I'm just not sure it will help as much as folk think.  Again, to your
users you are delivering ASCII content over a thin pipe.

re#72: And as mdw already pointed out, yer modern MB already
typically has 2 IDE channels (2 IRQs) for 4 drives total.  And there are
add-in PCI IDE controllers (typically sharing an IRQ).  I personally
run one 'server' that has a total of 8 IDE drives (cheap Maxtor add-on
controller).  Theoretically I could easily be serving close to a
TB of disk.  (In fact I'm running mostly a bunch of 500M drives that
I got from the trash of a firm that decided the proper method of
data destruction for old drives was to toss them about 50 feet across
a room into a dumpster.  Of the 20 or so drives I 'dived', 17 were
apparently good (data was probably intact, although I simply built
linux filesystems on them).  My big difficulty was sheet-metal-screwing
together the frames of two old 'tower' cases front to back and sawing
off the opposite sides of the cases in order to have enough
bays for all the drives - that and the y-cables... it sure don't look
too good, but it works and has now for going on 2 years at least.)

re#73:  HA/Clustering/whatever you want to call it has been around
a long time now. This is no longer rocket science.  Once set up there
really isn't that much more to do especially for something like grex
where generally the users all do the same thing they ever do, over
and over, and over again.  The advantage of spending a little more
time on the front end is that your single point of failure becomes
your upstream connection (which is a significant POF in my opinion
but one that you shouldn't bother to address - nobody dies if they
can't get logged into grex).  The other advantage is that it gives
you the ability to do rolling backups and rolling upgrades.  Unless
you are a hardware maintenance organization the concept of having
perfectly good hardware sitting gathering dust is silly.

Again, it's just my 2-cents worth and a couple minutes of typing
based on years of experience fixing problems involving systems a
little larger than grex.  And if you are gonna have an identical MB on
the shelf gathering dust, then you should make sure that you have at
least 2 identical PWSs as well.  (And no, you really don't need to
pay for server-class hardware either.  So what if grex is down for
a couple days?  You don't lose money and nobody dies.)


#79 of 547 by scg on Sun Mar 9 04:18:09 2003:

I'll put in another plea for a rack mount case.  Even if you don't want to
colocate somewhere now (a bad decision being made by refusing to look at
current information, but not really worth arguing about at this point), it
will keep that option open for the future.  More to the point, the rack Grex
already has could easily hold a rackmount PC server case, the DSL router,
modems, a spare server for development work, the keyboard and router, and have
lots of room to spare, freeing up the rest of the space in the Pumpkin for
whatever it is that people think we need the Pumpkin for.


#80 of 547 by pvn on Sun Mar 9 10:54:59 2003:

re#79:  Rack mount cases are appropriate for a lot of things; grex
isn't one of them.  First, they tend to be far less forgiving of
environment.  Second, they tend to be a lot more expensive.  Shelves
for racks holding standard PC cases are a lot cheaper and give the same
space-saving quality.  Throw out the rack and cheap plastic shelving
units perform the same function.  If grex ever has to colo, then a cheap
2U case at that time is probably better.


#81 of 547 by cross on Sun Mar 9 15:52:54 2003:

Regarding #80; what do you mean they're less forgiving of environment?
It's been my experience that rackmount cases are far more rugged than
your average tower.


#82 of 547 by scg on Sun Mar 9 23:01:55 2003:

Without a rack, rackmount cases are less convenient than mini-tower cases.
With a rack, the rackmount cases become much more convenient.  I just worry
when I see people talking about getting a really expensive full tower case
that the case will become a big limiter of future options.  Full tower cases
are fine when you've got one or two of them in a room (Grex's current
situation), but they really don't scale.


#83 of 547 by gull on Mon Mar 10 03:26:22 2003:

Re #81: Rack mount cases are very restricted inside, which means cooling
is more difficult and the ambient room temperature is much more
critical.  We have some 1U rack-mount servers at work.  They have about
five fans each, and the air that comes out of them is pretty hot.

If we ever do decide to go with colocation, the hardware could be moved
into a rack case.


#84 of 547 by scg on Mon Mar 10 04:13:18 2003:

1U rackmount cases are a special kind of beast, requiring special components
to go inside (normal PCI expansion cards don't have room to stick vertically
out of the motherboard).  1U rackmount cases certainly aren't the only kind
of rackmount case out there.


#85 of 547 by pvn on Tue Mar 11 10:05:02 2003:

Here we go, re-arranging deck chairs on the Titanic ---
If you want hardware that will replace the current system in its
current environment, then cheap commodity PC stuff is the
way to go.  My only question is whether high-end 'server class'
SCSI drives on a PC platform are the way to go, nothing more.

(Is it RU or U?  I'm not clear.)

Point being, if you don't have a reasonable temperature environment, then
rack mounted is not the way to go.  Which is meaningless drift.
I don't think anyone is suggesting spending the bucks for stupid
racks instead of PC boxes.  I merely suggest two things: first, that one
reconsider SCSI in the first place, and second, that one budget
for power supplies and have them on the shelf at least.


#86 of 547 by jared on Thu Mar 13 15:26:31 2003:

Just to give some of my experiences:
with PC hardware, you want to have your /var/mail (mail spool) and
swap on scsi disk.  The rest tends to be less relevant.

i "skipped to the end", so if backups haven't been discussed, i
suggest grabbing some cheap disk and having hot-backups available on
already spinning media in the same room.  I've found this invaluable
in my environment going across an x-over ethernet cable.

you probally want daily (or even hourly?) backups of /etc (hourly of
/etc/passwd perhaps?) in order to allow for easy recovery.  I might be
able to donate some hardware towards this.
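
A minimal sketch of the kind of periodic /etc snapshot jared suggests; the
function name, paths, and retention count below are assumptions for
illustration, not anything Grex has decided:

```shell
#!/bin/sh
# Hypothetical sketch: keep date-stamped copies of a critical file
# (e.g. /etc/passwd) in a backup directory, pruning old copies so the
# directory doesn't grow without bound.

backup_file() {
    src="$1"; destdir="$2"; keep="$3"
    mkdir -p "$destdir"
    # Copy with a timestamp suffix, e.g. passwd.20030306120000.
    cp "$src" "$destdir/$(basename "$src").$(date +%Y%m%d%H%M%S)"
    # Keep only the newest $keep copies.
    ls -1t "$destdir" | tail -n +"$((keep + 1))" | while read -r old; do
        rm -f "$destdir/$old"
    done
}
```

Run hourly from root's crontab, e.g. `0 * * * * /usr/local/sbin/etc-backup`
(a hypothetical wrapper script calling the function above).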


#87 of 547 by pvn on Sat Mar 15 07:22:28 2003:

re#86:  With a lot of RAM, why worry so about swap space?  And in grex's
case - all data at best over a thin pipe - with proper tuning even a
micro$oft OS can keep up with email over DSL or cable speeds using IDE
drives (hopefully you don't have all that much local email over even
10mbs ethernet).  With high-density IDE drives you have a lot of spare
space to do lots of backups.  And with mirroring IDE controllers or even
software RAID you have a lot of fault tolerance.  Even with neither, and
simply more big disks (JABOCD), you've still got lots of fault tolerance
if you put your mind to it.

As for RACK cases: that dog don't hunt.  If you need to cram a lot of
CPUs into a small, expensive space with climate control, then rackmounts
are surely the way to go.  Even in a modern office environment with
HVAC and cleaning services, rackmount is questionable.  I don't know what
grex's current physical environment is, but I bet it's far more 'dirty'
and with a much wider range of temperature than a modern office.  More
like 'home', and thus a cheap conventional PC case with lots of 'dead
space' is the way to go for that - big fan and lots of room for dust.


#88 of 547 by jared on Sat Mar 15 23:52:35 2003:

re #87
Virtually all modern unices swap out unused processes, and with a high
smtp and other load on the system you will see a continued need to swap
out a few processes as memory is used more efficiently for disk
buffers.

I'd rather have my shell process swapped out while I'm in bbs so that
the password file can stay cached for background smtp delivery, instead
of keeping my shell in memory and making password lookups slower.


#89 of 547 by jkd on Sun Mar 16 04:54:20 2003:

Here are a few comments from a perspective that a) is West Coast, and b) has
developed over the past 25 years of building and operating large-scale data
centers on a daily basis.

First, I live in Silicon Valley, and visit the surplus sources here basically
every weekend with a couple of friends for entertainment purposes. We've been
doing this for nearly three years now. So, every time I hear about people
making regular purchases at CompUSA, Best Buy, and similar national chains,
I *CRINGE*. However, I lived in Ky. for 10 years or so before moving out here,
so I understand how people get into that. Anyway, if Grex wants me to compare
prices that I see out here with what's available in MI or via mail-order, let
me know. I'll be happy to.

Second, out here in Silicon Valley, the .com meltdown means that, literally,
200,000 jobs EVAPORATED. This has had an enormous impact on availability and
pricing of all manner of commercial hardware. Example, I just bought what I
refer to as my new "Not H-P" machine. It's "surplus." It has NO dust in it
and is comprised of a 3.06 Ghz P4, 512MB of PC2100 memory, A Radeon 9700 Pro
graphics card, on-board Ethernet, a 120GB Maxtor IDE drive, floppy, CDRW+DVDRW
combo drive, a DVD-RAM drive, and a modem card. My Price? $1200. No Kidding.
Why do I call it "Not H-P?" Because it was made for H-P and was an overstock
item. So, H-P surplused it and forced the OEM to paste pieces of plastic on the
sides of the case where the H-P logo is normally visible. However, to anyone
who has ever seen an H-P PC, it's OBVIOUS what it is.

I saw an earlier mention of a "Liebert UPS" in this thread. I hope that means
that Grex is in possession of what used to be regularly known as a "True
Online" UPS. One where Utility AC power is converted by the UPS to DC, then
BACK to AC so that the hardware connected to the UPS is fed power that is
totally clean because it has gone through a complete AC->DC->AC conversion
and therefore has a perfect 60 Hz sinewave ALL THE TIME. No surges, no sags,
etc. The value of such a UPS design cannot be overstated. It will prevent
countless numbers of problems from ever occurring. To me, this issue is even
more important than the details of the power supply and cooling within the
case. Any dollars invested in a *real* UPS will last far longer than dollars
invested in the computer itself.

Finally, I would like to disagree on the subject of tower vs. rackmount cases.
I'd vote for a rackmount unit. It doesn't have to be 1U; as someone mentioned
earlier, that size has undesirable side-effects. But consider that with
rackmount cases, you can easily get LOAD SHARING POWER SUPPLIES! Such a case
will be equipped with two of them, and the load can be handled by only one.
Should a PS fail, the load is picked up by the surviving unit. Then you just
slide out the failing one and replace it. No rebooting, etc.

John




#90 of 547 by gull on Sun Mar 16 05:08:07 2003:

There are tower cases available with hot-swappable power supplies, as well.

It's nice, but given the reliability of power supplies and the
non-critical nature of Grex, I think it's probably unnecessary.
(Heck, Grex is often taken offline just to run backups.)
It's worth doing if it doesn't cost much more, though.  The big
disadvantage I see, other than the cost of the case, is that you
generally have to use 'special' power supplies then, instead of standard
ATX ones.


#91 of 547 by pvn on Mon Mar 17 09:18:46 2003:

'special' meaning higher price?  Like the 'special' SCSI drives (36G
total) instead of the 360G that one could get for the same price or
less?  I mean, if you are going 'commodity' PC hardware for the MB, why
not go with commodity drives?  If you want, for about the same amount of
money as the SCSI drives you could do Fibre (using an obsolete
controller, I'll grant you) and kick SCSI's butt.  You could
theoretically have 1Gbps over a medium that could theoretically deliver
such over a 10KM distance using optical fibre.  Odd thing is, even at 66MHz
64-bit PCI they all seem to be about the same when content is delivered
over yer average Internet connection....


#92 of 547 by jared on Mon Mar 17 12:55:10 2003:

Because your 'commodity' drives rely on the central processor for all disk I/O,
whereas scsi offloads that to a separate processor (on the scsi
controller).  And the performance gains from scsi are plain to see.
On any system that gets the volume of mail and users that grex does, you need
fast disk for the day-to-day operations.  swap, mail, /etc/passwd all
take quite a hit.  In my own personal mail/web/whatnot server, once I
made a recent switch to scsi from ide (with the exception of my truly mass
storage, ie: /home and /mp3 partitions ;-) ) the system performance
increased greatly.  With a userbase the size of grex's, that
type of benefit cannot be ignored.


#93 of 547 by gull on Mon Mar 17 13:36:05 2003:

Re #91: "Special" meaning proprietary to the particular case manufacturer.


#94 of 547 by mdw on Wed Mar 19 05:59:42 2003:

Re rack mount case.  I think the cooling thing is a non-issue.  There
are plenty of people who take short cuts on cooling.  A tower case with
"bad" cooling is no worse than a rack mount case with good cooling.
Any inherent disadvantage rack mount cases might have is probably going
to be cancelled out by the fact that rackmount cases go into
environments where more is expected of them.  *Neither* of these
cases--tower or rack mount--is going to have cooling anywhere near
equal to our current Sun hardware.  That's a reality we've already
accepted by going to x86 hardware.  I think for grex the real issues
are:

A rack mount case is going to be *slightly* more expensive.
        (the estimate I heard was $150 higher.)
I don't think grex is actually likely to move into a
        rack mountable space in the next 18 months
If we do move, the expense to buy another case and move our
        guts is probably the least of our "moving" expenses.
If we do rackmount, we should probably get 2U [ which would
        probably impact our rental costs slightly were we
        ever to colocate. ]

We could certainly do rackmount in the pumpkin today - we have the very
heavy sun rack mount case sitting there empty today.  It's even got
some very impressive fans of its own.  I see mostly small disadvantages
to the rackmount (slightly more expensive case, more electricity for
fans) that doesn't quite equal the potential "advantage" of moving to a
colocation space where we'd have to be rackmountable.  But frankly this
doesn't seem like a big point to me.

Perhaps we should commission John Doyle to find really cheap
rackmountable cases in SV.  If he can get cases no more expensive than
good tower cases, then I think that makes the difference insignificant
and worth going rackmountable.  The negative to buying everything
surplus is much like our past bottom-feeding habits - except the cost
is slightly more.

Jared is right that older IDE drives relied on the CPU to do
"programmed I/O".  But this is no longer true, besides which there were
also even stupider SCSI controllers that also did programmed I/O
(mostly for the scanner market, so that's thankfully all been replaced
by USB today.) Basically, SCSI and IDE have been playing leapfrog with
each other, so today's fast IDE subsystem will outperform yesterday's
best SCSI.  I don't think it's ever really been true that IDE drives
themselves took fewer components than SCSI drives.  The main win IDE used
to have is that it required fewer components *overall*; but I suspect this
is both no longer true (with ide dma) and no longer important (with the
degree of component integration we have today).  What I think matters the
most to us is the relative markets SCSI and IDE aim for; SCSI aims for
server configurations, IDE aims for personal machines.  Server
configurations are going to have greater demands for reliability and
random I/O throughput - at a price.  We're going to have to pay attention
to be sure the advantage continues to be real, and that the price remains
acceptable.  We will also have to accept that whatever we buy today
*will* be outclassed by something out next year - which will be both
faster *and* cheaper, and maybe even more reliable.


#95 of 547 by jared on Sat Mar 22 22:02:35 2003:

Marcus,

I've noticed that even modern systems using the latest (E)IDE technology
still see a considerable hit on the CPU for any disk I/O.

This is something I think is important to keep in mind for
Grex when planning for a good price-performance ratio.


#96 of 547 by lk on Tue Mar 25 04:19:59 2003:

If the rationale for SCSI is reliability, then perhaps you should also
consider mirrored IDE drives.  (If at least one is in a removable bay,
a backup could be as simple as swapping drives and letting the new
drive be rebuilt.  I haven't done this so I'm not sure about the
implementation.  The same would hold for SCSI, but it will be faster --
and more expensive.)

If drive speed is a concern, get the 15K RPM U320 SCSI drives.
(I don't believe this has been mentioned, so that might be the plan
rather than 10K rpm drives.  I'm not sure if these are available in
18 GB denominations or just 36 and above.)

Lastly, as bdh mentioned, IBM doesn't really make their own SCSI
drives any more. I'm not sure if this is true across the line, but some
recent 36 GB 15K U320 drives I installed were actually Hitachi drives.

(You should be able to get IBM 18 GB U160 10K drives for about $100.)


#97 of 547 by mdw on Wed Mar 26 08:00:53 2003:

It would be interesting to know what the CPU bottleneck is with (E)IDE
these days.  I sure haven't had the time to actually look.  It shouldn't
be DMA, so a good kernel profiler would be entertaining to run.

IBM sold *all* their hard disk stuff to Hitachi.  They've been busy
getting rid of all their magnetic storage stuff.  Given the length of
time they've been in the field, the only reason I can see for them doing
this is that they have good reason to believe magnetic storage is going to
become obsolete fairly soon.  I don't think this is of any immediate
importance to grex, but if I were investing in the stock market, I might
consider this very interesting.

I believe STeve is looking for 15K U320 SCSI.

There's at least 2 problems with mirrored IDE -- proprietary
controllers, and performance during that "rebuild".  The most common
chipset seems to be adaptec "aac" - there are linux & openbsd drivers
for this, but it's not fully functioned.  The raid management stuff in
particular loses; I'm not 100% convinced we would necessarily even know
we lost a disk -- until we lost the 2nd one and were screwed.  Regarding
performance - I think the "7/24" shop most people have in mind with RAID
includes windows of relative idleness.  If you have a truely disk
intensive load with no letup, then the rebuild never completes.
Fortunately, grex doesn't have that, but we have seen that disk
intensive things start slowing everything else enough that the load
average starts to pile up and build.  If we had to go visit the machine
in person to install a new drive, then leave it in single-user mode
during the rebuild, I'm not sure we've really gained all that much vs.
the traditional "restore from tape" model, *especially* if this is
liable to happen more often.
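The rebuild-window concern above can be made concrete with a rough back-of-the-envelope estimate.  The figures below are hypothetical, not measurements from any particular controller: the point is just that a constantly busy disk leaves the rebuild no time to finish.

```python
# Rough RAID-mirror rebuild estimate (hypothetical figures, not
# measurements of any specific controller or workload).

def rebuild_hours(disk_gb, rebuild_mb_per_s, busy_fraction):
    """Hours to rebuild a mirror if the rebuild only gets the idle
    fraction of the disk's bandwidth."""
    effective = rebuild_mb_per_s * (1.0 - busy_fraction)
    if effective <= 0:
        return float("inf")   # constant load: the rebuild never completes
    return disk_gb * 1024 / effective / 3600

# An 18 GB disk, 20 MB/s rebuild rate, machine 50% busy:
print(round(rebuild_hours(18, 20, 0.5), 2))   # prints 0.51
```

With any idle time at all the rebuild finishes in well under an hour for a disk this size; with no idle time it never does, which is the scenario described above.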

There's another issue to think about too -- our reason for doing tape
backups is *not* just hardware reliability, but also to cover the case
of vandals destroying information.  Online backups don't protect against
this - if a vandal can destroy active filesystems, he can get at the
backup just as easily - and one of the more attractive attacks he can
make is to install a trojan in the backup then destroy the active
filesystem.  So, um, ya the mirrored IDE is an interesting option, but
I'm skeptical that it makes sense for us.  Sure, if we had extra time &
money, mirrored storage could be fun, but I don't see it as really
replacing the need either for reliable hardware in the first place, or
backups to cover the case of vandals in the second.


#98 of 547 by cross on Wed Mar 26 12:28:23 2003:

Most ``24/7'' shops I've seen really are 24 hours a day.  They use disk
subsystems a lot more interesting than what you think, though.


#99 of 547 by mdw on Wed Mar 26 21:30:55 2003:

Most activity I've seen is actually centered (somehow) around human
schedules.  Even in hospitals and in the travel industry this is true.
To get something approximating 24 hours of real activity you pretty much
need some sort of global presence (or some sort of artificial
constraints that cause humans to rearrange their schedules to suit the
computer).  Despite the recent ubiquity of the internet, and the even
more recent fall of the dollar in international markets, I doubt this is
nearly as true of US business in general as Dan's experience apparently
indicates.  And, of course, silicon valley isn't necessarily designing
for Dan's world either, despite the illusion their marketing droids
cast.  If they were, there'd be a lot more discussion about the possible
performance hit while rebuilding a portion of a raid array.


#100 of 547 by keesan on Wed Mar 26 22:00:14 2003:

Many factories in places like China run 24 hours to keep costs down.


#101 of 547 by slynne on Wed Mar 26 22:12:09 2003:

Lots of factories in places like the US run 24 hours a day too. 


#102 of 547 by gull on Wed Mar 26 23:35:53 2003:

I remember one site I had a mail account on that had some kind of
external SCSI RAID storage array.  One day they had a disk fail and the
rebuild, which had to be done offline, took a week to complete.  They
were not amused.


#103 of 547 by styles on Sat Mar 29 18:35:28 2003:

#101: :)


#104 of 547 by gull on Wed Apr 2 15:35:11 2003:

Re #97: Incidentally, isn't it equally likely that the reason IBM is
getting out of the disk business is that they feel there are no major
new bit-density breakthroughs to be made?  In the past they've done well
by being on the cutting edge, but if capacities are going to plateau
soon it's not going to be very profitable for them to compete with other
companies in that area.  I also recall they have a big class-action
lawsuit against them over failures of some of their drives, and the
selloff may be a way to get out from under that.


#105 of 547 by i on Thu Apr 3 00:42:42 2003:

I'd expect that the price IBM gets for the disk business would 
fully reflect the lawsuit liability. 

The version i heard was that IBM couldn't get the kind of
(financial) returns in the beaten-down, cut-price disk market
that it could in most of its other lines of business; so IBM
decided to move its investments to where the returns were
better. 


#106 of 547 by mdw on Mon Apr 7 05:02:03 2003:

In the US, labour is expensive, and the economy is down.  I don't see an
expansion of economic activity at 3am happening in the near future.

I haven't heard of any technical barrier to higher density disk drives
in the near to medium term.  I imagine we will see a slow-down in disk
drive growth, but that will almost certainly be due to the economy and
demand affecting research and nothing more.


#107 of 547 by aruba on Wed Apr 9 00:30:47 2003:

I purchased the first component of NextGrex today - a CDRW drive from
CompUSA.


#108 of 547 by cross on Wed Apr 9 00:35:51 2003:

Hooray!  I feel like we should have a ceremony or something.


#109 of 547 by other on Wed Apr 9 01:01:46 2003:

Or a meeting!  ;)


#110 of 547 by davel on Wed Apr 9 13:42:15 2003:

I don't think Dan would be able to attend the meeting.


#111 of 547 by jep on Wed Apr 9 15:31:39 2003:

I didn't realize a CDRW was being considered.  I have no objection, but 
does Grex have something in mind for that?


#112 of 547 by carson on Wed Apr 9 16:01:13 2003:

(backups?)


#113 of 547 by other on Wed Apr 9 18:14:35 2003:

boot disks.  if you boot from a cd-r with your custom system on it, it makes
it impossible for anyone without physical access to the machine to make
unauthorized mods to the system.


#114 of 547 by aruba on Wed Apr 9 18:20:48 2003:

That's my understanding as well, from what STeve told me.


#115 of 547 by cross on Thu Apr 10 01:01:54 2003:

No, the commute is a little difficult.


#116 of 547 by aruba on Sat Apr 12 19:42:51 2003:

We now have a case to go with our CDRW drive - an Antec Plus 1080 AMG File
Server case, with 430 watt power supply, that I bought today at CompUSA,
at STeve's request.  ($159.99). 



#117 of 547 by styles on Sun Apr 13 16:44:31 2003:

#115: wimp. =D


#118 of 547 by lk on Sun Apr 13 18:01:06 2003:

Mark, we're not as convenient as the corner store, but we could probably
get you much better pricing on such components. For example, we normally
sell the 1080AMG for $140 and could do better for Grex.  I'm in town at
least once a week, so I could even deliver (though I'd rather not bring
out one component at a time).  If you have a list of stuff you need, feel
free to email me: LK @ stratcom.com


#119 of 547 by aruba on Sun Apr 13 20:28:26 2003:

Thanks Leeron.  I'll send mail.


#120 of 547 by polytarp on Mon Apr 14 11:13:10 2003:

Don't support Zionism, or I won't support Grex.


#121 of 547 by janc on Fri Apr 18 14:04:17 2003:

If we buy stuff from Leeron at a cut rate price that he could actually
sell to other people at list price plus making him waste his time
delivering it to us, then we are probably costing him more money than we
are making him, so if you think Leeron is synonymous with Zionism then
this is a stroke against Zionism, and you should send us all your money
so we can buy more stuff from Leeron.  Assuming you have any money.

Thanks Leeron.


#122 of 547 by cmcgee on Fri Apr 18 14:14:21 2003:

*ROFLOL*


#123 of 547 by keesan on Fri Apr 18 16:07:09 2003:

Leeron is extremely generous with his time, as attested by him making numerous
trips to deliver at least 50 computers etc. to us for recycling that he had
saved over the years rather than putting out in the trash.  (Some of these
have now been donated to Eritrea, in working order, for use in schools there.)
He does get to Ann Arbor pretty frequently for non-delivery reasons, though.


#124 of 547 by lk on Sat Apr 19 17:36:00 2003:

That's really cool about Eritrea. Makes it all worthwhile.

In good socialist Zionist tradition, we subsidize our clients.

If I were a religious Zionist, I'd probably be doing something dorky,
like giving away free matzah with the purchase of a computer.

So to one-up them, we'll include Israeli chocolate.
With or without nuts, your choice....


#125 of 547 by keesan on Sat Apr 19 20:15:04 2003:

We provided 9 computers for Eritrea - some from Leeron, some from Tim, and
others that we had been using but were able to replace with Leeron's pentiums
(the half of them that worked) and a few other computers from other grexers
given to us as dead.  They will be used with Linux and a word processor, 250M
or more hard drive, 16M RAM.  All 486s.  We were also able to upgrade several
friends with pentiums (thanks, Mary).  And we have more left.  Thanks Leeron.
We did not send chocolate.  

When I changed money at the border going into Greece from Macedonia, the banks
gave everyone a free Easter bread.  


#126 of 547 by aruba on Wed Apr 23 19:38:00 2003:

OK, we ordered the processor, motherboards, and SCSI controller from Leeron.
We still need the disks and memory.


#127 of 547 by aruba on Wed Apr 23 19:41:35 2003:

I hit a little snag trying to order from NewEgg - they won't ship to a P.O.
Box or to an address that's not registered with our bank.  Unfortunately the
address registered with our bank *is* our P.O. Box, and they tell me they
have no way to add another address to our account.  I wrote to NewEgg for
suggestions on how to get around the problem.

I'd handle this by putting things on my credit card and then reimbursing
myself, but then we would technically owe use tax on what we buy, whereas if
the money comes directly from Grex we don't.


#128 of 547 by aruba on Fri Apr 25 17:43:48 2003:

I spoke with Monique at NewEgg just now.  I'm not convinced she knew what
she was talking about, but when I suggested sending a check with a letter
giving an alternate shipping address, she said that would work.  So I'll do
that.  I'm waiting now to hear back from STeve on whether we should get the
memory from NewEgg now too, 'cuz I don't really want to go through the
process of sending them a check and waiting for them to deal with it twice.

Can anyone explain to me the way memory is defined these days?  Our
motherboard has this in its description:

3 x DDR DIMM PC3200/2700/2100/1600 (DDR400/333/266) non ECC SDRAM 
(Note: PC3200 Max. to 2 banks only) 

which I think tells what type of memory it will accept.  Can someone
decipher that for me?  In the memory aisle, I see stuff like this:

Crucial Micron 512MB 64x64 PC2100 DDR RAM, 184-Pin, CL=2.5-Unbuffered 2.5V,
6-Layers
CT6464Z265 Requires DDR supported Motherboard - Lifetime Warranty. Model#:
CT6464Z265  -OEM

But there are lots of other options for 512MB as well, with different
numbers and different prices, and I don't know which we want.


#129 of 547 by aruba on Fri Apr 25 18:41:12 2003:

OK, Steve says to get the memory from Crucial, not NewEgg, so I went ahead
and ordered both of our disks from NewEgg.  $212 for the 18GB SCSI, $96
for the 80GB IDE.  Total $308.


#130 of 547 by cross on Fri Apr 25 18:55:37 2003:

Did the SCSI controller get ordered, too?


#131 of 547 by aruba on Fri Apr 25 19:53:26 2003:

We ordered the processor, motherboards, and SCSI controller from Leeron. 
He's going to deliver them to me this weekend.


#132 of 547 by cross on Fri Apr 25 20:02:17 2003:

Cool....


#133 of 547 by tod on Sat Apr 26 01:11:42 2003:

This response has been erased.



#134 of 547 by mdw on Sat Apr 26 01:19:10 2003:

I hope we got something that does ECC.  STeve was having trouble
locating this, and the description above has me slightly worried.  The
PC and macintosh world think ECC is unnecessary; in the "server" world,
ECC has been pretty much universal for at least a decade.

3x = 3 times   
DDR, SDRAM = different memory bus chip interfaces
DIMM = physical package style
PC3200/ etc. == probably different PC world standards for memory;
        in this case, sounds like there's a collection of related
        standards that probably only differ in speed.
banks = in this case, probably slots.  More generally, a section
        of memory that is addressed and responds as one unit.
6-layers = number of layers in PCB - not generally important
        except as a measure of the cost and engineering in the design.
184 pin == # of conductors in a connector.  The "pins" are more
        often pads or fingers in modern designs.
unbuffered = no drivers.  Generally faster but less fan-out.
ECC = error correction code.  Generally such memory can
        fix single-bit errors and detect double-bit errors.
parity = error detection.  Can detect single-bit errors.
virtual parity = memory that lies and says it never has an error.
no parity = memory that can't detect any errors.

But I'd let STeve tell you what to get rather than spending too
much time trying to figure out what all the ciphers mean.  As long
as the computer and the memory like each other, it's not really
important whether they measure things in terms of pins and banks,
or squirrels and pints.  They'll have a different set of ciphers
next year anyways.
 
The number of memory slots has been declining in recent machine
designs - 2 or 3 slots is pretty common.  This may be an indication
of where "unbuffered" becomes important; to get more slots they'd 
probably have to add additional buffering which might slow things 
down.
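The arithmetic behind those PC/DDR labels is simple enough to check: the DDR-xxx number is millions of transfers per second (twice the bus clock, hence "double data rate"), and the PC-xxxx number is peak bandwidth in MB/s, i.e. transfers per second times 8 bytes on a 64-bit bus, rounded by marketing to the nearest hundred.  A quick sketch:

```python
# Decoding DDR memory names: PC-xxxx is peak bandwidth in MB/s
# (transfers/s x 8 bytes on a 64-bit bus); DDR-xxx is millions of
# transfers/s, twice the bus clock.  Marketing rounds to the
# nearest 100, e.g. 2667 MB/s is sold as "PC2700".

def peak_bandwidth_mb_s(ddr_rate):
    """Peak bandwidth of a 64-bit-wide DDR module, in MB/s."""
    return ddr_rate * 8

for name, ddr_rate in [("DDR200", 200), ("DDR266", 266.67),
                       ("DDR333", 333.33), ("DDR400", 400)]:
    print(f"{name} -> PC{round(peak_bandwidth_mb_s(ddr_rate), -2):.0f}")
```

This reproduces the board's advertised list: DDR400/333/266 correspond to PC3200/2700/2100 (and DDR200 to PC1600).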


#135 of 547 by aruba on Sat Apr 26 03:34:20 2003:

Thanks Marcus.  Here's the full description of the motherboard that STeve
picked out, and we ordered from Leeron:

ASUS A7V8X 1000Mb/s LAN, Firewire IEEE1394, Serial ATA DDR400 AMD
Athlon/Athlon XP/Duron Socket A, Processor Mother Board
Specifications: 
Socket A - AMD Athlon/Athlon XP/ Duron 
Chipsets: VIA KT400/8235 
FSB: 333/266/200 MHz 
3 x DDR DIMM PC3200/2700/2100/1600 (DDR400/333/266) non ECC SDRAM 
(Note: PC3200 Max. to 2 banks only) 
Serial ATA 
Firewire IEEE1394 
LAN BroadCom 1000Mbs Network card 
Ports: 1 x AGP, 6 x PCI, 6 x USB 2.0 
Realtek 6-channel CODEC 
ATX form factor


#136 of 547 by cross on Sat Apr 26 06:27:16 2003:

NON-ECC?  Hmm, why?  And why bother with firewire and all the USB
interfaces?  Man....


#137 of 547 by gull on Sat Apr 26 12:38:37 2003:

I'm a bit disappointed we didn't go with ECC, too, but it probably won't
matter.

USB and Firewire are almost impossible to avoid; they're built into most
motherboards now.  You can disable them in the BIOS if you need the
interrupts for something else.


#138 of 547 by aruba on Mon Apr 28 00:59:49 2003:

Leeron didn't receive the shipment in time to get it to me this weekend, but
if all goes well his partner will drop it off with me tomorrow.


#139 of 547 by aruba on Mon Apr 28 17:29:14 2003:

Leeron's partner should be bringing the motherboards, SCSI controller,
processor, and floppy drive by this afternoon.

I ordered the memory from Crucial.  Since our motherboard can apparently
only handle 2 of the really fast memory chips at once, I ordered 3 of the
next fastest.  Here's the description:

Module Size: 512MB
Package: 184-pin DIMM
Feature: DDR PC2700
Configuration: 64Meg x 64
DIMM Type: Unbuffered
Error Checking: Non-parity
Speed: 6ns
Voltage: 2.5V
SDRAM Timings:CL=2.5


#140 of 547 by keesan on Mon Apr 28 18:12:42 2003:

Enjoy the chocolates ;)


#141 of 547 by aruba on Mon Apr 28 18:40:33 2003:

OK, I am now in possession of
/------------------------------------------------------------\
| 1 AMD Athlon XP 2800, with a big honkin' heatsink and fan  |
| 2 Asus A7V8X motherboards with consecutive serial numbers  |
| 1 Adaptec 29160 SCSI controller card                       |
| 1 Sony floppy drive                                        |
\------------------------------------------------------------/
Courtesy of Leeron and his partner Matt.  No chocolates were included,
though I did get some bubble wrap.

I also ordered, at STeve's request, a 3-CD copy of OpenBSD 3.3.  It will
ship Thursday, which is the release date.  ($40).


#142 of 547 by aruba on Mon Apr 28 18:48:23 2003:

STeve would also like to get a special OpenBSD keyboard - does anyone have
one they could donate?  I'm not sure exactly what's special about it - I
asked STeve to explain it here.


#143 of 547 by aruba on Mon Apr 28 19:45:45 2003:

Here's what STeve says he wants:

They're USB keyboards, as in universal serial bus keyboards.
These are new, which is why I think we'll wind up getting one
ourselves, but its always possible I suppuse that we'll get one
from someone.  The older ibm at standard keyboards are just
about a buck each, used, but usb keyboards are a different
beast.  And unforunately, the most common usb keyboards
are made by Apple, and they're really pretty bad (the one
bad thing about the new macs of the last couple years).


#144 of 547 by glenda on Mon Apr 28 21:54:34 2003:

Has STeve said anything about when all of this is going to invade my house?
(He says I get to help play with it and I need to figure fitting it into my
schedule.)


#145 of 547 by steve on Mon Apr 28 22:09:39 2003:

   No, I haven't, but that's because I don't know when all the parts will have
straggled in.  I think we're pretty close now.

Re #142, 'tis USB keyboard, not OpenBSD.  I wonder what an OpenBSD keyboard
would look like?  Would it be suitable for blowfish? ;-)


#146 of 547 by aruba on Mon Apr 28 22:25:43 2003:

Ah, I see I misread the email.  Why do we need a USB keyboard?


#147 of 547 by mary on Mon Apr 28 23:05:31 2003:

STeve, do you have the time to work on it at this point?
Originally you said you'd be busy starting May 1.  If you
can get to it, great.  But if you think it will have to 
be fit in around a very busy life then I'd like to look
at maybe another volunteer, like dang, taking this part
of the project on.

It needs to go forward.


#148 of 547 by glenda on Mon Apr 28 23:18:31 2003:

That's partly why I get to play.  :-)   It will get done!


#149 of 547 by steve on Tue Apr 29 00:09:52 2003:

   I'm at the end of the academic year, and that's going to make a HUGE
difference for me.  Yes.  This will be the fun part...

   We need a USB keyboard because the asus motherboard doesn't have
much in the way of "legacy" I/O devices.  For instance, it has no
old ISA slots, only PCI slots.  This makes the board faster
since it doesn't need any glue logic to handle the old style 8MHz
286 AT-style cards.  We need a USB keyboard since it doesn't have
an old style keyboard port.


#150 of 547 by mdw on Tue Apr 29 03:13:35 2003:

There's a big push in the PC world to get rid of "legacy" 8-bit devices.
That includes serial ports, floppy drives, and of course keyboards &
mice.  Pretty much everybody (apple, sun, ibm) supports USB - this is
clearly the wave of the future, and AT keyboards will soon be about as
obsolete as appletalk keyboards are today, or 8" floppy drives 15 years
ago.  Basically, having to get a USB keyboard is a consequence of trying
to get new high end hardware.


#151 of 547 by cross on Tue Apr 29 04:08:45 2003:

Get a Happy Hacking Lite 2 keyboard with a USB interface; it's about the
best Unix keyboard I've ever used.  And doesn't have any extra and useless
keys.  And it's tiny and only takes up a real small amount of space.

http://shop.store.yahoo.com/pfuca-store/haphackeylit1.html


#152 of 547 by cross on Tue Apr 29 04:16:04 2003:

Hum.  I guess I'm kind of bummed about the RAM; what was the rationale
for not getting *any* error checking on it?  (Not even parity checking!)


#153 of 547 by aruba on Tue Apr 29 04:36:05 2003:

$69 seems pretty steep for a keyboard.  I'll take a look at CompUSA and see
what they have.

STeve will have to answer the questions about memory.  He told me to buy
from Crucial, and they don't seem to offer any memory that isn't either ECC
or non-parity.  (At least not any that will fit our system.)


#154 of 547 by steve on Tue Apr 29 04:38:24 2003:

   Finding Athlon motherboards that use ecc ram is really hard these days.
In talking with hardware vendors, ecc is going away.  The reliability 
of standard memory has improved to the point that the economics have
driven ecc away.  The good news is that ram is truly more reliable
than it was even five years ago.  That, and the fact that ecc memory
is slower and has been pushed out of the way by people who want the
fastest systems possible.  In looking at high end motherboards I was
struck by the fact that gamers now drive the market.  Most people, and
most businesses don't need that Nth increase in speed, but gamers do.
CompUSA has changed their marketing scheme to accommodate this, for
example.  I couldn't get a figure out of some managers there when
talking about their gamer market, but they did say it was "a lot".


#155 of 547 by steve on Tue Apr 29 04:39:39 2003:

   The motherboard was not designed for it, and in fact crucial doesn't
make ecc memory of that kind.



#156 of 547 by steve on Tue Apr 29 04:45:57 2003:

   The Happy Hacking keyboard looks nice, but I wonder if spending $69
for it is worth it.  Remember that this is going to be used 1) at the
very start of installing OpenBSD, and then 2) for emergencies and backups
when in the Pumpkin.  Newegg has several at $25 or less, as does CompUSA.
I want to poke at one to see what they feel like.


#157 of 547 by mary on Tue Apr 29 11:09:47 2003:

Would folks see any use to having another meeting to include board, staff
and anyone else who cares to attend, prior to the assembly? 

The next BOD meeting is for next Tuesday, May 6th, at Zing's.  The next
Plan Grex meeting is for the afternoon of Saturday, May 18th. 

My only concern with simply adding this to the agenda for Tuesday's meeting
is if the involved people (staff) would be able to make it. 



#158 of 547 by glenda on Tue Apr 29 12:36:58 2003:

Saturday, May 18th doesn't exist, the 18th is a Sunday :-)  STeve will not be
available for either as we will be in Dayton attending the Ham Radio
Convention.


#159 of 547 by gull on Tue Apr 29 13:05:23 2003:

I got an IBM USB keyboard at Staples a while back.  I don't remember how
much I paid for it, though.  It's nice, but fairly large because it has
a lot of 'extra' buttons that you wouldn't need for Grex's console.


#160 of 547 by keesan on Tue Apr 29 13:16:02 2003:

Doesn't Leeron sell USB keyboards?


#161 of 547 by steve on Tue Apr 29 13:29:45 2003:

   Mary, do you want to slow things down?  I don't want to slow them down
any more than they already have been.


#162 of 547 by mary on Tue Apr 29 13:39:19 2003:

It is indeed *Sunday*, the 18th.  

Nope, I don't want to slow this down.

I'll be frank, STeve.  Staff should be making decisions
regarding Grex, not any one person.  I don't see it as
a given that this project rests in your hands just
because you see it that way.

If the consensus among staff is that the project is best
given to you, great.  I'm looking for that consensus.
It wouldn't take a face to face for that to happen.

I'm sorry you can't make any of the planned meeting dates.


#163 of 547 by steve on Tue Apr 29 13:39:51 2003:

   I wish I'd remembered the date for Dayton a little better.  We will
be leaving for home in the early afternoon, so it's still possible to
make the next gen meeting, depending on when in the evening it is.  When
is it?


#164 of 547 by mary on Tue Apr 29 13:40:42 2003:

STeve, will you be available for the next board meeting,
this coming Tuesday evening?


#165 of 547 by steve on Tue Apr 29 13:42:32 2003:

   Well, as I said, I think we can make that.  It's important and it's
always better to leave earlier than later.  I'd like to get the
components we have and start building with what we have.

   Can I do that or are we going to have a meeting about it?  I'd
like to get them either tonight if I'm back in town early enough,
or tomorrow.


#166 of 547 by steve on Tue Apr 29 13:43:10 2003:

   Yes, I can make that.  I hope we don't delay things a week.


#167 of 547 by aruba on Tue Apr 29 20:56:25 2003:

I received mail from NewEgg, letting me know that they got our check.  So
hopefully they'll be able to ship our disks this week.  I also checked on
our memory order, and it did indeed ship today.


#168 of 547 by aruba on Wed Apr 30 14:11:17 2003:

Looks like the motherboards we got from Leeron aren't quite the right ones.
(There are a lot of different versions of this board, with different
options.  And everyone seems to have different model numbers for the
different versions, making it all very confusing.)

Leeron is looking into exchanging them right away, so this shouldn't slow us
down too much.


#169 of 547 by aruba on Wed Apr 30 19:38:26 2003:

Leeron's going to ship us new motherboards - they should arrive Friday. 
NewEgg's waiting for our check to clear, which should take 3-5 days.  Our
memory left Salt Lake City sometime yesterday, so it should be in the
midwest by now.


#170 of 547 by aruba on Wed Apr 30 20:51:47 2003:

Indeed, when I got home I found that the memory arrived this afternoon.  So
now we're just waiting on disks and replacement motherboards.


#171 of 547 by other on Thu May 1 22:10:13 2003:

Of the three places I know of that I should or want to be on the evening 
of Sunday 18 May, the one that wins is "on a plane from Denver."


#172 of 547 by aruba on Fri May 2 21:48:33 2003:

Leeron certainly is as good as his word - the replacement motherboards
arrived just now, and they appear to be just what we need.  Thanks Leeron!


#173 of 547 by aruba on Sat May 3 21:53:56 2003:

Today I bought two extra Antec fans at CompUSA, for $15.99 apiece.


#174 of 547 by aruba on Sun May 4 21:33:05 2003:

Today, janc, dang, danr and I got together and began assembling Next Grex.
All went well, and NextGrex is currently running an infinite memory test.
(Since it doesn't have any hard disks yet, there's not too much else it can
do.)

Jan's going to make up a web page describing everything we did.  But
basically, we

- Installed the CPU in the motherboard.  Since cooling is an issue, I
decided to spend $14.99 for a tube of "Arctic Silver" thermal grease, which
goes between the CPU and the heatsink.  All the web pages I looked at said
it works much better than the stuff which came already attached to the
heatsink. 

- Installed the two extra fans I bought yesterday.

- Installed the CDRW in the case.  This is very easy in our fancy case -
just screw some little plastic runners on the drive, and it slides in the
bay right from the front.

- Likewise with the floppy drive.

- Installed the port template which came with the motherboard on the back
of the case.

- Screwed the motherboard into the case.

- Put the memory in.

- Connected lots of power wires to the motherboard, drives, and all the
fans.  Connected the USB ports on the front of the case, and firewire
ports on the back.  Also the wires for the power switches, speaker, and
LEDs on the front of the case.

- Installed the SCSI card.

dang had brought a video card with him, and we put that in, plugged in a
monitor and keyboard, and booted it up.  dang set the processor speed in
the BIOS, and set the system to boot from CD.  He put in a copy of Linux
he had on CD and we successfully booted from it.  It recognized all the
hardware we had installed.

So, all in all, very successful.  Hopefully by next weekend the disks and
OS will have arrived, and we can take the next step.


#175 of 547 by cross on Sun May 4 22:11:52 2003:

Wonderful.  Sounds great!


#176 of 547 by steve on Sun May 4 23:54:20 2003:

This response has been erased.



#177 of 547 by steve on Sun May 4 23:56:42 2003:

   Is the CPU a retail or OEM unit?  If it was retail, the usage of heat sink
compound has voided the warranty.  For an OEM CPU, it isn't quite clear to me
what is what.

   There is an article that talks about this at
http://www.xtremetek.com/info/index.php?id=14&page=1



#178 of 547 by steve on Mon May 5 00:06:15 2003:

   I'm glad to hear that it booted up and is running the memory test.  Booting
from a CD at least partly proves that the ide controller works, and that the
CD works, too.


#179 of 547 by steve on Mon May 5 01:02:46 2003:

   Wasn't the ide disk purchased?  Seems one might be able to boot from
that.  But the booting from the cd should indicate that the onboard
ide controller is OK.



#180 of 547 by aruba on Mon May 5 03:27:01 2003:

The hard disks haven't arrived yet, because NewEgg needs our check to
clear before they will mail us anything. 



#181 of 547 by aruba on Mon May 5 03:39:50 2003:

Re #177: That's an interesting article.  Based on it and the other things I
read on the web, I think using the fancy thermal compound was the right
thing to do.


#182 of 547 by janc on Mon May 5 03:58:57 2003:

First draft of a web page on the system construction is at

  http://www.unixpapa.com/newgrex/

Some mediocre photos are included.  This needs more work before it becomes
a staff notes page.


#183 of 547 by aruba on Mon May 5 17:02:26 2003:

When I left home this morning, NextGrex had run 15 cycles of the memory test
(it takes about 67 minutes per cycle), with no memory errors.


#184 of 547 by tod on Mon May 5 17:11:19 2003:

This response has been erased.



#185 of 547 by mary on Mon May 5 17:16:27 2003:

A huge thank you to all involved.  This
is too cool.


#186 of 547 by aruba on Mon May 5 21:58:05 2003:

NextGrex has been running over 24 hours now.  It's completed 22 passes of
the memory test, with no errors.


#187 of 547 by aruba on Tue May 6 14:13:47 2003:

NextGrex has now completed 38 cycles of the memory test without an error. 
The room it's in is noticeably cooler than the rest of the house - I think
those 8 fans are really pushing some air around.


#188 of 547 by jhudson on Tue May 6 15:43:29 2003:

lol at the fans


#189 of 547 by steve on Tue May 6 16:01:19 2003:

   I wanted to see USB and FireWire available because the future is not
clear, but having those abilities means we can use them if we want to.
OpenBSD already has some FW support in it; I've been playing with it
and while not stable, it definitely works.  Not quite yet ready for prime
time, but that's OK, since we aren't using it.  USB 2.0 work is evolving
as well.  Unless the specifics of the motherboard have changed again (it
gets confusing looking at Asus stuff), we also have serial ATA, should
we want to go in that direction at some point.  This means we have just
about all the hardware options that exist: FireWire, USB, IDE, SCSI and
SATA, for peripherals.


#190 of 547 by aruba on Tue May 6 17:36:30 2003:

Yes, we definitely have SATA.


#191 of 547 by aruba on Tue May 6 21:30:38 2003:

I pestered NewEgg about our disks, and they have moved the process along a
bit; with luck they will ship today or tomorrow, and with a little more luck
we'll have our disks by the weekend.

I also pestered openbsd.org, which hasn't even charged our credit card yet.
They will be a little longer - they always experience backlogs around the
time of a release, and the version we ordered was released May 1st.  They
haven't gotten to our order date yet, and their ordering FAQ says they can
be as much as 10 days behind at release time.  Then, since it's coming from
Alberta, it will be a week to 10 days before we receive our CDs. :(


#192 of 547 by steve on Tue May 6 21:46:40 2003:

   But that doesn't matter -- we have what we'll install already.  Too bad
about newegg; I've never made an order by anything other than a cc; seems
they are really optimized for that and nothing else.


#193 of 547 by aruba on Wed May 7 03:04:12 2003:

OK, our disks have shipped.  We should receive them on Friday.


#194 of 547 by gull on Wed May 7 13:09:28 2003:

Re #177: That's kind of bone-headed of AMD.  I've seen evidence that the
phase change material simply isn't adequate in some situations, and not
just on overclocked CPUs, either.  A friend of mine put on a heat sink
with just the phase change material, no grease, and the CPU had
overheating problems until he went back and did it with thermal
conductive grease.


#195 of 547 by aruba on Wed May 7 16:42:09 2003:

At the board meeting last night we decided to order two more 18 Gig disks.
Since NewEgg is kind of a pain for us to deal with (though they're just fine
if you want to ship to the address on your credit card), we decided to order
from Leeron, even though his price is slightly higher.  I called Leeron and
did that, and he thinks the disks will be here by Friday.

Our disks from NewEgg are in LA, according to FedEx.


#196 of 547 by keesan on Wed May 7 18:03:46 2003:

How much did grex save overall by ordering from Leeron?  


#197 of 547 by aruba on Wed May 7 19:50:29 2003:

We saved $136 over NewEgg's price, on the stuff we ordered before.  We'll
lose a little bit of it on these disks, because NewEgg's price is lower. 
But we'll have them a lot faster, and returning them if there's a problem
will be a lot easier.


#198 of 547 by aruba on Thu May 8 23:39:08 2003:

The two disks from NewEgg arrived today - I picked them up at the FedEx
office by the airport.


#199 of 547 by aruba on Fri May 9 01:19:10 2003:

BTW, if anyone wants to see the list of what's in the new machine, go to
/----------------------------------------------------\
| http://www.cyberspace.org/~invent/item.cgi?num=256 |
\----------------------------------------------------/
That shows the data for the case, and at the bottom is a list of
everything inside.  You can click on those items for details about them.


#200 of 547 by aruba on Fri May 9 03:18:27 2003:

Looks like our SCSI controller card has 68 pins while our disks need an
80-pin SCA connector.  I wrote to Leeron to see if he can sell us some
adapters.


#201 of 547 by gull on Fri May 9 12:59:15 2003:

If Leeron doesn't have one, try www.atozcables.com.  That's where I got
mine last time I needed one.  They have them for either $20 or $28 each,
depending on whether or not you need one with termination.


#202 of 547 by aruba on Fri May 9 14:29:01 2003:

Thanks David.  Leeron says he can order some adapters for us, for about
$10 each, but they will take 7-10 days.  I understand the difference
between the cables now - the 80-pin cable (which our drive wants) includes
not only the data interface, but also power and SCSI ID setting.  (The
SCSI ID is set by the adapter via software, instead of being a jumper
setting right on the drive.)

These Seagate drives come in two versions, one with an 80-pin connection
and one with a 68-pin connection (plus power connection and SCSI ID jumper
block).  At the moment, I'm inclined to send back what we have and get the
68-pin version, so that our drives are compatible with our interface card.
Getting adapters for all the drives seems like a hack, and will make the
inside of the case more complicated than it needs to be.  (Here's a
picture of an adapter; it's got a little circuit board:
http://www.mycableshop.com/popups/SCA806850.htm)  Plus, we'd need two
types of cables. 

Unless, that is, there's an important advantage to having 80-pin drives.


#203 of 547 by dang on Fri May 9 15:55:06 2003:

I'd vote for sending them back and getting the correct drives. I've used the
adaptors, and they're usually fairly shoddy (although they *do* work). I have a
free 68-pin SCSI drive that I can temporarily donate for testing/burn-in
purposes, so that this doesn't waste any time for us.


#204 of 547 by aruba on Fri May 9 18:01:24 2003:

OK, I called NewEgg and got an RMA number to send back the 80-pin drive.  It
was going to be a pain to re-order the right drive from them, so I called
Leeron and told him to send back the two he just got for us, and in their
place get us 3 with the correct connectors.  This will cost us about $17
more per drive than going through NewEgg, but Leeron is a lot faster and
more accommodating. :)


#205 of 547 by scg on Sat May 10 19:42:22 2003:

The general rule with non-obvious changes and warranties is that you void the
warranty if you tell them you made the change.


#206 of 547 by aruba on Sat May 10 23:25:50 2003:

I turned off NextGrex last night because some big thunderstorms were
approaching Ann Arbor, and I don't have a UPS.  It had been up for over 5
days, running the memory test, with no errors.

We'll put the IDE disk in tomorrow, and test the SCSI controller with a disk
of dang's.


#207 of 547 by gull on Mon May 12 12:52:52 2003:

Sounds like the right decision, disk-wise.  SCA connectors seem to be 
mostly made for plugging hotswappable drives into backplanes.  Any other 
use of them is kind of a hack.


#208 of 547 by scott on Mon May 12 13:50:09 2003:

Ditto.  Much better to fix it now than to forever curse the adapters.


#209 of 547 by aruba on Tue May 13 04:34:49 2003:

I put the IDE disk in yesterday, and installed Windows 98 on it in order to
test out our hardware.  (Don't panic, it's only temporary.)  I had to hack
system.ini because Windows gets confused by how much memory we have, but now
everything seems fine.  I installed a driver for the ethernet chip on our
motherboard, connected the computer to the LAN in my house, and created an
internet connection through the router in the basement, and voila, here I am
talking to OldGrex from NextGrex.  Everything looks good.


#210 of 547 by polytarp on Tue May 13 04:46:35 2003:

WE SHOULD HAVE OLDGREX USABLE EVEN AFTER NEWGREX, YOU'RE SAYING?


#211 of 547 by janc on Tue May 13 13:22:36 2003:

We, I guess.  No parts from old grex will be used in newgrex.  However, I
can't, off hand, think of any use for old grex, and don't think we have any
plans to keep it running.


#212 of 547 by other on Tue May 13 13:30:49 2003:

And, before anyone asks, once the user partitions are successfully copied to
nextgrex, the disks will be destroyed to ensure the privacy of Grex's users.

As far as I'm concerned, anyone willing to cart away the current machine after
the new machine takes over (with appropriate transition period) is welcome
to it.  (Minus the user disks, of course.)


#213 of 547 by janc on Tue May 13 13:40:45 2003:

I can't imagine why we'd destroy the disks, and I can't imagine Marcus
and STeve agreeing that we don't need the old Grex anymore.


#214 of 547 by gelinas on Tue May 13 13:48:03 2003:

Sufficiently sophisticated disk-recovery tools can do some amazing things.
The only way to ensure these tools don't work is physical destruction of the
disks.  I can see an argument that nothing on grex should be that sensitive,
but we aren't talking about *my* data on grex.  As long as we retain physical
possession, there is no need to destroy the disks.


#215 of 547 by cross on Tue May 13 14:13:40 2003:

I can't imagine anyone being that interested in grex's user disks,
despite what some folks think.  I'd say scrub them and give them
away.


#216 of 547 by scott on Tue May 13 14:39:30 2003:

I can't imagine there being any real value in the old Grex hardware.


#217 of 547 by keesan on Tue May 13 15:26:22 2003:

What is it that is supposed to be kept private, the passwords?


#218 of 547 by scott on Tue May 13 16:00:16 2003:

Files in home directory, email, staff conference.


#219 of 547 by drew on Tue May 13 18:24:26 2003:

If you can get good enough random numbers, it might suffice to do a
dd if=/dev/random of=/dev/sdx.


#220 of 547 by hal9 on Tue May 13 20:25:12 2003:

`shred' (a GNU coreutils program) claims that it can prevent
recovery of erased data by successively writing several different
bit patterns over the files.  More details in the paper "Secure
Deletion of Data from Magnetic and Solid-State Memory", by Peter
Gutmann.  (http://www.cs.auckland.ac.nz/~pgut001/pubs/secure_del.html).

The only drawback is that, since it overwrites the disk several times,
it is extremely slow.  But, after the transition, I don't think time
will be a problem for oldgrex.

Also note that nothing is 100% effective, of course.  Physical destruction
is the only guaranteed way of safeguarding the disk contents.  shred's
info page goes to the extreme of saying that the /only/ 100% way is
melting the disk in *-acid.
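(For the curious, a quick sketch of what shred's multi-pass overwrite looks
like in practice, run against a throwaway file rather than a disk; the
filename is made up, and on oldgrex the target would be the raw disk device,
since shredding a plain file through a filesystem is much less reliable than
shredding the device itself:

```shell
# GNU shred demo on an ordinary file.  On oldgrex the target would be the
# raw disk device; shredding a *file* is unreliable on some filesystems,
# which is why whole-device use is the interesting case.
printf 'secret staff conference data\n' > victim.txt
shred -n 3 -z victim.txt    # 3 random passes, then a final zeroing pass
# After the -z pass the file contains only zero bytes (shred rounds the
# size up to a full block unless given -x).
```

The -z pass is there so the disk doesn't obviously look like it was
shredded; -u would additionally remove the file afterward.)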


#221 of 547 by other on Tue May 13 22:26:34 2003:

Why the heck would we KEEP the current Grex once we complete the migration 
to the next Grex?

Do we still need the first Grex?  Or the second?  So why not give it away 
to someone else who might actually put it to use?  Including Marcus or 
STeve, if they want to take it away.

And, if we're not going to use the current disks on the new system, then 
why should we keep them?  And if we're not going to keep them, then we 
damn well ought to destroy them because it is the only way to absolutely 
ensure that their contents are unrecoverable.

I don't think my comment was radical, and I DO think it was logically 
sound and consistent with both our past practices and our current 
philosophies.


#222 of 547 by keesan on Tue May 13 22:45:12 2003:

Can't you simply overwrite the entire disk with 0's?


#223 of 547 by other on Tue May 13 23:08:42 2003:

There are a lot of levels of sophistication of data recovery tools 
available, and I don't know how available products of any particular 
level are, but it is quite possible that no reasonable amount of 
overwriting with 1s, 0s and/or random ASCII values would entirely 
obliterate and render irretrievable someone's personal data on these 
disks.



#224 of 547 by styles on Tue May 13 23:11:39 2003:

/dev/zero is your friend.
dd if=/dev/zero of=/dev/whatever bs=8192 (blocksize on grex is probably lower)

there may be some concern about the disks being magnetic and the zeros not
doing enough.
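A sketch of the zero-wipe plus a read-back check, using a scratch file as a
stand-in for /dev/whatever (the filename and sizes here are made up for
illustration):

```shell
# Zero a stand-in "disk" and verify the wipe by reading it back.
# disk.img stands in for the raw device (/dev/whatever above).
dd if=/dev/urandom of=disk.img bs=8192 count=16 2>/dev/null   # dirty "disk"
dd if=/dev/zero    of=disk.img bs=8192 count=16 conv=notrunc 2>/dev/null
# Verify: compare against an equally long stream of zeros.
if head -c 131072 /dev/zero | cmp -s - disk.img; then
    echo "wipe verified"
else
    echo "wipe FAILED"
fi
```

On the real disks the target would be the raw device and the verify step
would stream the whole device through cmp; doing this to a mounted or in-use
disk would of course be a bad idea.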


#225 of 547 by gelinas on Wed May 14 00:34:23 2003:

(The question is not, "Who would be interested in the data on the disks?" 
The question is, "Who would be interested in _their_ data on the disks being
released or revealed?"  We've too many users to get ALL of them to answer that
question negatively.)


#226 of 547 by lk on Wed May 14 01:26:23 2003:

The question I'd ask: is it easier to (potentially) crack root and see the
data on the disks or to actually recover the data once reasonable precautions
are taken to erase it.  The point being that no one should ever expect that
their data on a public access system is 100% secure.

Of course, if STeve or mdw are interested in the old machine, that would
solve the problem given that the scrubbed disks would be in safe hands (for
some time to come).


#227 of 547 by cross on Wed May 14 02:01:19 2003:

I agree with #226; no one on grex has any sort of guarantee about the
safety of their data.  Indeed, grex is planning on using a password system
on next grex that inherently compromises the data of all users if someone
has managed to crack root.  Going and getting the disks from someone in
Michigan after they've been scrubbed is a lot more work than just getting
the data off the disks now or after the transition to the next grex.

I sympathize with Joe's sentiment about wanting to keep user data secure,
but it's not going to be any less secure on a scrubbed disk than it is on
grex now or in the future.


#228 of 547 by i on Wed May 14 02:03:51 2003:

With a clean room for dissecting disk drives, some millions of dollars worth
of exotic high-tech instruments, and skilled staff to match, it should be
presumed that supposedly-totally-erased data can be recovered from drives.

Anyone *that* interested in the data could get it far faster, sooner, and
cheaper in a host of other ways, starting with simple physical break-in.

Thus, it's reasonable to assume that any data on grex worthy of such
efforts has already been stolen, and giving the hypothetical hostiles an
extra copy is actually *good* tactics - they waste resources to read it.


#229 of 547 by other on Wed May 14 02:17:21 2003:

Well.  I guess I'M the one being anal about security this time.  It's a 
rotating responsibility.  Someone else take over, 'cause it looks like 
I'm done.


#230 of 547 by polytarp on Wed May 14 02:40:32 2003:

WE NEED TO STOP THE SUBVERSIVEs...   SQUIRRLEy-Group?


#231 of 547 by scg on Wed May 14 04:58:00 2003:

The first Grex is (or at least was the last time I saw it) in Marcus's
basement.  As of a couple years ago, when I was last in the Pumpkin, Grex 3
was still there.  I think Grex 2 may have been as well, but Grex 2 may have
been harvested for parts (2 and 3 were similar enough for some hardware to
be interchangeable).


#232 of 547 by cross on Wed May 14 13:08:48 2003:

Regarding #229; There's nothing wrong with being anal; but if you're
going to be anal about one thing, it's best to be anal about everything
else, as well.  For instance, not just merging the existing contents of
/etc/shadow into a Kerberos KDC for use as keys....

Security is all about tradeoffs.  If people really wanted their data to
be secure, they'd encrypt it, put it on some sort of tamper-resistant
media, enclose that in a cube of lead with two foot walls, enclose that
in a block of concrete, booby trap it so that if anyone tries to open it,
they die, and dump it into the Mariana Trench; all in secret so that
nobody knew they'd done it.  Even then, it wouldn't be totally secure.

One has to do a risk analysis, and determine whether the cost of
protecting the data from prying eyes is worth the value of the data.
If it is; great, do whatever you need to to make sure no one gets access
to it.  If not, then take some reasonable precautions, but don't lose
sleep over it.  Data from grex definitely falls in the latter category.


#233 of 547 by jep on Wed May 14 14:47:58 2003:

Oh, I'd say para 2 in resp:232 describes "total security" in real-world 
terms.  There's no way to recover anything 7 miles into the ocean.

Leeron in resp:226 and the next several comments describe my opinion 
about the need for disk security.  Grex needs to reasonably match the 
security presently given to that data.  That's all anyone has any right 
to expect.  A good formatting of those drives ought to be easily 
sufficient to keep the data as secure as it is now.

My goodness, how difficult would it be for someone to break into the 
Pumpkin right now and steal tapes, hard drives, or even all of Grex?  
Where else are backups kept?  Any of those places could be breached by 
someone with such sophisticated specialized training as we probably all 
got from our parents when taught how to use a screwdriver.  It'd be a 
lot easier to steal the data (and cheaper, and much more reliable) than 
to recover data from a formatted hard disk.


#234 of 547 by cross on Wed May 14 16:21:09 2003:

Yes, but one might throw out one's back trying to steal the current
grex.


#235 of 547 by other on Wed May 14 18:20:29 2003:

re #233:  Out of curiosity, do you actually KNOW the location of the Pumpkin?

When was the last time someone cracked root on Grex?  

What does it cost us to destroy the old disks?  What if a user who wants their
privacy doesn't know enough to know the real risks to the privacy of their
data inherent in placing it on Grex?  I'd say trashing the disks is less work
and more security than wiping them a few times, and eliminates the risk of
charges of carelessness with user data. (Whether that risk is real or
imagined.)

But I really don't care that much about it.  I don't keep my SSN and credit
card numbers on Grex...  <shrug>


#236 of 547 by cross on Wed May 14 18:52:53 2003:

Regarding #235; No, I don't.  But I'm willing to bet that someone who's
going to go to the trouble of restoring user data off of the disks
(how are they going to locate them, anyway?) does.  When was the last
time someone broke root?  Well, how do you know that anyone other than
the person or persons who did so would know?  Someone who cares enough about
Grex's data is likely to be able to find someone who could break in
without anyone knowing.  Besides, grex runs some insecure software.
The version of sendmail it runs is (last time I checked, anyway)
potentially vulnerable to some well-known holes.  If a user stored data
on grex without realizing that they had no expectation of the privacy
of that data; well, tough.  And besides, making a good faith effort
at protecting that data by scrubbing the disks is enough to avoid any
charges of negligence (which are purely hypothetical anyway).

Now, don't get me wrong.  If you want to destroy the disks; go for it.
But it's not necessary, and people should be educated about why that is.


#237 of 547 by tod on Wed May 14 19:22:37 2003:

This response has been erased.



#238 of 547 by aruba on Wed May 14 20:28:41 2003:

The SCSI disks arrived yesterday.  They have the right connectors.  Thanks
Leeron!  I'll be putting them in this week, and if I can, testing them with
Windows.


#239 of 547 by cross on Wed May 14 22:39:17 2003:

Aww....  At least test it with some variant of Unix.  :-0


#240 of 547 by aruba on Thu May 15 01:39:46 2003:

UNIX will get its chance, don't worry.


#241 of 547 by scott on Thu May 15 02:38:37 2003:

I've got plenty of Linux distros, Mark.


#242 of 547 by gelinas on Thu May 15 03:20:02 2003:

(I don't think the old disks would be vulnerable to targetted data recovery,
but they could cause unintended disclosure: someone put something they
really shouldn't have on the disk and then forgot about it.  If the disks
were sold to a user of grex, though, targetted data recovery becomes a
higher probability.  (Say, 30% instead of 15%, to pull some numbers from
the air.))


#243 of 547 by scg on Thu May 15 05:02:50 2003:

Being a pack rat, I'd be tempted to keep the data intact in case anybody
wants it for historical research in a hundred years, but that's just me.


#244 of 547 by cross on Thu May 15 12:42:59 2003:

Regarding #242; Joe, even if they scrub the entire disk?  Just curious.


#245 of 547 by gull on Thu May 15 13:08:14 2003:

I'd say the amount of time necessary to recover data from a scrubbed
Grex disk is going to be totally out of proportion to the value of any
data likely to be on those disks.  We're not talking about a situation
where you can just run 'undelete' and get it all back, this is an
expensive and time-consuming process.


#246 of 547 by jep on Thu May 15 16:23:09 2003:

re resp:234: I don't know the location of the Pumpkin, but don't 
imagine it would be difficult to find it out if I wanted to.  I might 
even send you an e-mail: 

   Hey, Eric!  Where is the Pumpkin?  Just curious.

Would you refuse to answer such a request?  If I sent it to 
staff@grex.org, shouldn't I expect to get an answer?  I don't think 
Grex is all *that* security conscious.


#247 of 547 by jhudson on Thu May 15 21:39:43 2003:

I can give you the street address if you wish.


#248 of 547 by cross on Thu May 15 21:45:48 2003:

Shh!  Don't *do* that!  The evil ones might go and steal grex.

At least it'll be easy to identify them at the hospital: they'll
have hernias.


#249 of 547 by aruba on Thu May 15 23:09:39 2003:

The address of the Pumpkin is not something Grex makes a point of
publishing.  For one thing, we don't want anyone to go to the Pumpkin (or
send mail there) if they need to contact someone about Grex.  For another
thing, well, I don't know what the other thing is.  But there's no real
reason for anyone but staff to go there. 

But, as several people have pointed out, I'm sure it wouldn't be hard to
find out if you wanted to.  I just typed the address into google and it
found someone who's listing Grex under that address.  Hmmm, we should
probably do something about that...


#250 of 547 by tod on Thu May 15 23:34:27 2003:

This response has been erased.



#251 of 547 by other on Fri May 16 00:45:42 2003:

No, no.  NORTH Huron!  (Grex moved to Ypsi...)


#252 of 547 by spooked on Fri May 16 01:24:45 2003:

*smiles*  We guard it with BIG nasty dogs - they won't get too far :)


#253 of 547 by gelinas on Fri May 16 02:17:41 2003:

Dan, yes, even if the disks were scrubbed first.

I know folks with lots of spare time on their hands.  I know folks
who have written their own disk-recovery software.  (To the best of my
knowledge, the intersection of those two sets, BTW, is the null set.)
I can see someone with the time and interest using the grex disks as an
experiment base for their own efforts.  (They'd probably settle for *any*
disk, not just grex's.)


#254 of 547 by lk on Fri May 16 04:30:32 2003:

Who's in charge of offing people who find out the address of the Pumpkin?

Mark, don't freak out, but I have your address. See that car parked
outside your house? The dark van, with the tinted windows? That's my
sister. She's Mossad so you might not be able to spot the van. Nonetheless,
since you have the new disks, it's only a matter of time before you deliver
them either to the Pumpkin or to someone else who will ultimately take them
there. Don't look over your back, she's following you. Take my word for it.
And then, when we discover the location of the pumpkin, we'll contact
G Gordon Liddy to break in and steal the tapes. Er, disks....

Actually, I got a good laugh from Walter's comment:

> it's reasonable to assume that any data on grex worthy of such
> efforts has already been stolen

So we're just discussing how wide to leave open the barn doors. (:
(Though obviously, some horses are still inside.)


#255 of 547 by aruba on Fri May 16 13:12:34 2003:

So *that's* why that woman was following me yesterday.


#256 of 547 by jhudson on Fri May 16 15:46:22 2003:

What's the matter, guys, can't figure out how I know the address
even though I am ~2000 miles away?


#257 of 547 by tod on Fri May 16 16:16:03 2003:

This response has been erased.



#258 of 547 by cross on Fri May 16 16:34:52 2003:

It depends on whether Leeron's sister is smoking or not.  Leeron,
got a picture?  Nyuk nyuk nyuk.  Beware those Israeli women, though;
though they smoke, they're heart breakers.

Regarding #253; Joe, if you know someone who can recover data from a
properly scrubbed disk, I'd almost be willing to say, give them the
disks and see if they can get anything off of them.


#259 of 547 by lk on Fri May 16 17:13:57 2003:

My dad wouldn't let us drink colored food items in the 1960s, long
before the FDA would ban them. So despite all those bad influences
outside the house, none of the kids smoke and we all detest it.
As a child, when I was offered a puff (by an "uncle" who has since died
of emphysema), I exhaled to make the tip glow (every 8-year-old is
a pyro).  Like Clinton (or not), it never occurred to me that I was
supposed to inhale and ingest the smoke....

To this day I tell my clients that, if they love their computers,
they shouldn't smoke around them.

Which brings us back to the subject (whew!). Instead of scrubbing
the computers clean, can't we smoke them dirty?


#260 of 547 by aruba on Fri May 16 17:37:23 2003:

I installed the SCSI disks today - the controller and fdisk recognized them
right away, with no problems.  I'm running surface scans now.


#261 of 547 by jhudson on Fri May 16 18:36:11 2003:

good, can't beat w98 scandisk for that (though why I don't know)


#262 of 547 by gull on Sat May 17 00:35:27 2003:

Re #253: I've written disk recovery software, too, for retrieving
*deleted* stuff.  But that's different than recovering data from a disk
that's been wiped, because the electronics in the drive are unable to
read the data at that point.  Recovery is then a matter of removing the
platters in a clean room environment and using sophisticated equipment
to analyze the magnetic patterns left on them.  I *seriously* doubt
anyone is going to go to that expense with a home disk from Grex so they
can read a bunch of archived spam mail and obsolete copies of eggdrop. ;)


#263 of 547 by aruba on Sat May 17 00:43:31 2003:

I ran surface scans of all four of our disks (3 SCSI and one IDE), and there
were no bad sectors on any of them.


#264 of 547 by gull on Sat May 17 01:15:17 2003:

Good deal.

I haven't found bad sectors on a new disk in a long time.  Modern drives
have "spare" sectors they remap to hide bad ones, so by the time you
start seeing bad sectors on a disk it's already pretty sick, and
probably has been getting worse for quite a while.


#265 of 547 by janc on Sat May 17 01:42:32 2003:

Excellent, Mark.


#266 of 547 by cross on Sat May 17 05:27:34 2003:

Hooray!  So am I correct in understanding that all the hardware has
now been acquired and installed, and we're ready to go with configuring
NextGrex?


#267 of 547 by aruba on Sat May 17 21:18:37 2003:

Yup, that's correct.  All the hardware is in one box, and everything works.


#268 of 547 by cross on Sat May 17 21:46:16 2003:

Excellent....  Eeexxxccellet.  Proceed with the next phase of
the...operation...number two.


#269 of 547 by aruba on Sat May 17 22:16:32 2003:

BTW, I know we don't need them, but it looks like our OpenBSD CDs shipped
last Tuesday.  At least our credit card was charged then - I haven't
received a confirmation email.


#270 of 547 by valerie on Sun May 18 22:41:47 2003:

This response has been erased.



#271 of 547 by jep on Mon May 19 03:04:07 2003:

re the responses to resp:246: I'm not going to embarrass anyone by 
posting the messages I got in response to my e-mail to staff, and I 
think some were humorously intended anyway.  It took me 18 minutes to 
get an e-mailed response with the address of Grex.

I was asked to not post the address.  With all due respect, I think 
it's not a meaningful request.  However, I do have a lot of respect 
for the Grex staff and since they asked, I won't post it.

Certainly, anyone wishing to obtain anything from Grex could do so 
much more easily and certainly than by trying to recover data from a 
formatted hard disk.  Eric's concerns were well-intended but not 
realistic.

If anyone really wants anything from the old Grex hard disks, enough 
to use data recovery techniques, Grex might entertain competitive 
bids.  If they gave Grex half the money they'd otherwise have to 
spend, there would never again be a financial crisis.  The staff could 
be paid, the Board could be paid, and if I were given an appropriate 
commission for coming up with the idea, I would never have another 
financial concern.


#272 of 547 by cross on Mon May 19 04:16:31 2003:

Yeah, true.  Also, Joe earlier posted some odds for getting data off
of a scrubbed disk.  He made it clear he was doing so without real
data, but I want to get those numbers closer to reality; he said
something like 15% odds of getting something good for a casual attacker,
30% for a determined attacker.  However, in both cases, the attacker
isn't expending a lot of financial resources; only time and cleverness.
No access to fancy clean-rooms or the like.  In that case, I'd give
a casual attacker something like 0.00001%, and a determined attacker
something like 0.0001%.  Note that determination gives you an order
of magnitude advantage.

Those numbers are probably conservative; real numbers are probably
a lot closer to zero.


#273 of 547 by janc on Mon May 19 04:38:02 2003:

Well, we had another Next Grex Meeting, somewhat sparsely attended - Mark
Conger, Joe Gelinas, John Remmers, Valerie Mates, Jan Wolter.  Steves Weiss
and Andre' called to say they couldn't make it.  Mark brought along the Next
Grex, which we set up temporarily and tried booting off an OpenBSD boot floppy
that John had brought along.  Mostly looked good, but although it found the
SCSI controller, it did not find the SCSI drives.  So we went about the proper
business of sitting around and talking about the computer.

After everyone left, I moved the machine down to my office, where it can be
plugged into the LAN, and tried various things.  The second thing I tried was
one of the other OpenBSD boot floppies.  There are three in the distribution.
One for standard systems, one with extra drivers for SCSI and raid and gigabit
ethernet, one for laptops.  John's was the second one, the one with SCSI
stuff.  I tried the standard one, floppy33.fs, and that found the SCSI drives
without a problem.

I have volunteered to take a first cut at partitioning the drives and
installing OpenBSD.  When I've done that, I'll get it on my LAN, open an
SSH portal in through my firewall, and advertise it to staff.

I'm currently scratching my head over partitioning options, having had somewhat
less input from others than I would like, but I'll do something plausible.
If it stinks, we'll redo it.


#274 of 547 by aruba on Mon May 19 04:44:26 2003:

Thanks Jan - enjoy those seven fans. :)


#275 of 547 by cross on Mon May 19 05:56:15 2003:

Suggestions regarding partitioning.  This is what I would do:

        (a) Use RAIDframe across all the SCSI disks.  Partition them
            thusly:

                32MB    sd[0-2]a        (The second-stage bootstrap and kernel)
                1024MB  sd[0-2]b        (Swap, striped across 3 disks)
                rest    sd[0-2]d        (Everything else)

            Set up the RAIDframe partitions on sd[0-2]d.
            Use RAID-5 with an interleave size of 64KB
            (the size of an FFS1 ``extent'').

        (b) Configure the following filesystems.  You'll have to use
            disklabel, but it's not particularly hard:

                512MB   /               raid0a
                2048MB  /usr            raid0d
                4096MB  /usr/local      raid0e
                4096MB  /var            raid0f
                4096MB  /grex           raid0g
                512MB   /tmp            raid0h
                rest    /u              raid0i
                80GB    /scratch        wd0d

            (Yes, OpenBSD supports disklabels with more than 8
            partitions on them....)

        (c) Put mail in $home/Mailbox; that does away with the need
            for a separate /var/mail partition.

        (d) Merge /suidbin into /.  / in 4.4BSD doesn't contain nearly
            as much ``non-system'' stuff as did / in 4.3BSD and prior
            versions.  It's easiest to think of it as a ``system''
            partition with a minimum of non-system related stuff in it;
            having suid tools in /, if one restricts it to system purposes,
            is just fine.  In this configuration / can remain the only
            partition that has suid binaries on it, and it can remain
            writable.  This has some advantages: (i) All the suid tools
            are available in single user mode.  (ii) It's writable, which
            means that a bug in a suid program can be quickly corrected
            by staff.  (iii) It cleanly keeps all ``system'' related
            files in one place.

        (e) Create symbolic links from /usr/src and /usr/obj to
            /var/src and /var/obj, respectively.  Also, create a
            /var/local hierarchy.  Create symbolic links from
            writable places in /usr/local to /var/local.  By doing
            this, you're able to (i) make both /usr and /usr/local
            read-only most of the time, while (ii) retaining the
            ability to keep the system sources up to date.

        (f) Move the BBS and associated files into /grex; this is
            the place for grex-specific software.  Party, the BBS,
            etc can go in there.

I'd further do the following:

        (a) Split /suidbin into /suid/bin and /suid/sbin.  This breaks
            up functionality a bit; user-software that is used by
            general users can go into /suid/bin.  Sysadmin stuff goes
            into /suid/sbin.

        (b) Create /local; put local stuff that's useful in single user
            mode in here.  Ie, Kerberos, Kerberized sudo, SSH, maybe a
            shell or something, etc.

        (c) Remove some of the goofy symlinks from /; why is /b a symlink
            to bbs?

        (d) Change the startup scripts to newfs /tmp every time the
            system boots.

The biggest changes here are putting more stuff in /, and doing away with
/var/mail.  The latter is for security and convenience; the former is
purely for convenience.

Oh, a note on the difference between sd[0-2]a and raid0a.  The OpenBSD
RAID software can get its root filesystem from raid, but cannot read the
kernel upon boot out of a RAID partition.  sd0a would be a *really*
small partition, mirrored on sd1a and sd2a as well, that contains the
second-stage bootstrap and the kernel to boot from.  A copy of the
exact same kernel would be in /; once the system started booting, it
would be transparent.  The system can be booted off of any disk, and
the loss of any single drive wouldn't impact grex much.  It could be
swapped out and the parity rebuilt while the system was operating.
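If we go this route, the RAID-5 set in (a) would be described in a config
file handed to raidctl.  Here's an untested sketch following the raidctl(8)
config-file format; the component names come from the partitioning above, and
the 64KB interleave is expressed as 128 512-byte sectors per stripe unit:

```
# raid0.conf -- sketch of a RAIDframe config for the RAID-5 set in (a).
# Untested; see raidctl(8) for the authoritative format.

START array
# numRow numCol numSpare
1 3 0

START disks
/dev/sd0d
/dev/sd1d
/dev/sd2d

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
# 128 sectors * 512 bytes = the 64KB interleave suggested above
128 1 1 5

START queue
fifo 100
```

If memory serves, you then configure with raidctl -C raid0.conf raid0, label
the components with raidctl -I (giving it a serial number of your choosing),
and build the initial parity with raidctl -iv raid0 before disklabeling raid0.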


#276 of 547 by janc on Mon May 19 12:52:46 2003:

Dan - Thanks.  Looks like I'm going to have to read up more on RAID.

One note of possible concern:  It's not clear how well our SCSI controller
is supported by OpenBSD 3.3.  We have an Adaptec 29160 controller which
apparently uses the 7899G chipset.  In the OpenBSD 3.3 file INSTALL.i386
it says the following in the list of supported hardware:

  Adaptec AIC-789[29] chips and products like the
    AHA-29160 based upon it which do 160MB/sec SCSI. [C]
    (However, the 7899G card is currently not supported with
     more than one device attached)

Well, we have more than one device on our card.  Web searches show lots of
messages from people who had problems with OpenBSD and the 29160.  For some
of them, using only one device on the controller worked fine.  These messages
generally had followups saying that these problems were fixed in later
releases.  This is a fairly popular card, and you'd expect getting it to
work would have been a priority for someone.  However, as you see, the note
saying it doesn't work with multiple devices is still in the install
document.

I'm guessing/hoping that this is just a reflection of their poor document
maintenance.  Though the i386 install document seems to me to be the single
most important document to maintain, it seems to be fraught with errors,
mainly places where it appears not to have been updated when the software
was.  In the section quoted above, for example, the [C] indicates that the
driver is not included on install floppy C.  However, it appeared not to work
on install floppy B either, and there is no [B] there.

Hmm...raid support is on floppy C, our SCSI driver support is on B.  Could
be an annoyance if we try to build a RAID system.


#277 of 547 by cross on Mon May 19 13:26:51 2003:

Sure thing, Jan.  I'm not sure what to say about the 7899G based card,
other than, ``try it and see if it works.''  I do see that a few
people on various mailing lists say things like, ``I beat the living
snot out of a 7899G with 20 drives on it and it worked just fine.''
(http://archives.neohapsis.com/archives/openbsd/2002-12/0019.html).

It seems they probably cut and pasted the supported hardware list from
the architecture specific hardware web page into the install document;
I'd suggest that you're correct in guessing it's an artifact of a less
than perfect document update process.

btw- Just because it's absent from the floppy doesn't mean it's absent
from the CD boot media.  For instance, the drivers for both the SCSI
card and RAID should be on ftp://ftp.openbsd.org/pub/OpenBSD/3.3/cdrom33.fs



#278 of 547 by janc on Mon May 19 14:00:07 2003:

OK, just starting to read about RAID.  Looks like the Promise RAID controller
on the motherboard is for IDE only (and may not be supported by OpenBSD
anyway) so we are talking about software RAID, which in the case of OpenBSD
is RAIDframe.  Apparently OpenBSD 3.1 and later do support having the
root partition mirrored on RAID.

RAIDframe supports RAID levels 0, 1, 4 and 5 and miscellaneous other things.
It's not in the generic kernel.  We'd need to rebuild with it.  There are a
huge number of options here, just beginning with the question of which RAID
level to use.

My feeling is that RAID is a sensible answer for Grex.  It can win us
performance gains and added data security in the case of a disk crash.
It wastes a lot of disk space, but we have the space to waste.  It does
not protect us from someone accidentally deleting files, so it is no
substitute for backups.

However, going the RAID route means (1) spending some time weighing which
RAID configuration (if any) is right for Grex, and (2) spending some time
getting it all set up.  Doing this right is going to require a lot of time
and a lot of staff members in the loop.  I don't want to stall bringing
the system on line for code porting and other development while we do
this.  Maybe I should do an OpenBSD install onto the IDE disk.  We can
work there, build a RAID kernel, configure RAID on the SCSI disks, then
boot off that.

This may be the best choice right now.  It gets the system to a state where
I can do what I know a lot about (software), and defers the decisions about
disk setup a little longer to give other staff time to chime in if they want.
If I'm going to implement Dan's plan, then I'll need to do it in two stages
anyway, since it doesn't look like you can get a RAID system straight off the
CD.

The minus with doing the OpenBSD install on the IDE disk is that it doesn't
give the three SCSI drives on the 29160 controller a good work-out, and their
support is questionable enough to be worth beating on.  However, I can create some
temporary partitions there, and start a program reading/writing stuff to them,
just to give them a workout.
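A trivial sketch of what such a workout script could look like (the scratch
directory and pass count are parameters; nothing here is specific to the
29160 setup):

```shell
#!/bin/sh
# Hedged sketch of a drive exerciser: repeatedly write random data to a
# scratch file, copy it, and compare checksums of the two copies.
exercise() {
    # usage: exercise DIR PASSES
    dir=$1 passes=$2 i=0
    while [ "$i" -lt "$passes" ]; do
        dd if=/dev/urandom of="$dir/scratch" bs=64k count=16 2>/dev/null
        cp "$dir/scratch" "$dir/scratch.copy"
        [ "$(cksum < "$dir/scratch")" = "$(cksum < "$dir/scratch.copy")" ] \
            || { echo "MISMATCH on pass $i"; return 1; }
        i=$((i + 1))
    done
    echo "OK: $passes passes"
}
```

One instance could be pointed at a temporary partition on each of the three
SCSI drives and left to grind.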


#279 of 547 by cross on Mon May 19 15:05:02 2003:

The last time I did an OpenBSD install into RAIDframe, I think I did
something like the following:

        (1) Installed onto a single drive.
        (2) Recompiled the kernel and tested it.
        (3) Booted single-user.
        (4) Dumped /usr, /usr/local, /var and all the
            partitions I was RAID'ing to temp space
            somewhere.
        (5) Reclaimed the space of all the partitions
            I wanted to RAID'ify into one big partition
            using disklabel.
        (6) Configured and started up RAID.
        (7) Edited the RAID set disklabel and set up my
            partitions.
        (8) Rebuilt the new RAID set's parity (which went
            surprisingly quickly).
        (9) newfs'ed the new partitions and mounted them.
        (10) restored the earlier dumps to the new, RAIDed
            partitions.

This is slightly more complex, but I think you could do something similar
to get RAID working on the SCSI disks.  Certainly, installing onto the
IDE disk gives you the maneuverability to bootstrap the SCSI drives.
Of course, that doesn't resolve the issue of deciding on an optimal
configuration.
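In command form, steps (4) through (10) might look roughly like this.  This
is a non-runnable sketch from memory: device names (sd0), partition letters
(e), and file paths are all placeholders, and the flags should be checked
against dump(8), restore(8), and raidctl(8) for the release in use:

```shell
dump -0af /altroot/usr.dump /usr        # (4) dump each partition to temp space
disklabel -E sd0                        # (5) merge the old partitions into one
raidctl -C /etc/raid0.conf raid0        # (6) force-configure the RAID set
disklabel -E raid0                      # (7) lay out partitions on the set
raidctl -i raid0                        # (8) initialize (rebuild) the parity
newfs /dev/rraid0e                      # (9) make filesystems...
mount /dev/raid0e /mnt                  #     ...and mount them
(cd /mnt && restore -rf /altroot/usr.dump)   # (10) restore the dumps
```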

Some more suggestions: Use RAID level 5.  You only have three disks;
if you had four I'd suggest 1+0, but that's out.  Anyway, RAID 5 will
give decent performance (particularly if coupled with soft updates on all
partitions), will protect against dropping a disk, and won't waste *too*
much disk space.  With only three disks, you don't have much else in the
way of choices for RAID levels.  Striping won't buy you any reliability,
RAID 4 is just dumb, and you don't have enough disk for mirroring.
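For concreteness, the space trade-off with three disks (the per-drive size
here is an assumed 36 GB, purely for illustration):

```shell
# Back-of-the-envelope usable-space figures for a three-disk array.
# ASSUMPTION: 36 GB per drive; substitute the real drive size.
disks=3
size_gb=36
echo "RAID 0 (stripe): $((disks * size_gb)) GB usable, no redundancy"
echo "RAID 1 (mirror): ${size_gb} GB usable from a two-disk mirror"
echo "RAID 5:          $(( (disks - 1) * size_gb )) GB usable, survives one dead disk"
```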

The last real question is how big to set the interleaves.  I'd say 64KB,
and the reason for that is that 4.4BSD's FFS implementation defines a
weak concept of an ``extent''; basically, it'll try to read or write up
to 64KB in a single burst from/to the disk, if it can.  A 64KB interleave
size matches up with that idea of an extent as used by the filesystem,
and should give pretty good performance.
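For reference, a RAIDframe configuration embodying that suggestion might
look like the following.  The device names and spare count are assumptions,
and the whole thing should be checked against raidctl(8); the 128 in the
layout section is 64KB expressed in 512-byte sectors:

```conf
# Hypothetical /etc/raid0.conf for raidctl(8): three disks, RAID 5,
# 64KB stripe units (128 sectors x 512 bytes).
START array
# numRow numCol numSpare
1 3 0

START disks
/dev/sd0e
/dev/sd1e
/dev/sd2e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
128 1 1 5

START queue
fifo 100
```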


#280 of 547 by cross on Mon May 19 15:06:09 2003:

Oh, PS- Didn't remmers donate an OpenBSD machine that could be used
for software porting and things of that nature, leaving time to get
the nextgrex configuration right?


#281 of 547 by janc on Mon May 19 16:09:07 2003:

Yup, he did.

Well, ran into another snag in the OpenBSD install.  OpenBSD can't find
a network interface.

During the boot up, when it is polling the PCI bus, it lists:

  Broadcom BCM5702X rev 0x02 at pci0 dev 8 function 0 not configured.

That means it sees it, but doesn't have a driver for it.  On the list of
supported hardware it says:

  # Broadcom BCM570x (a.k.a. Tigon3) based PCI adapters (bge): (A) (B) (C)

The (A) (B) (C) business means that the driver isn't on any of the install
floppies.

I think this means I need the CD to do the install.  I can't very easily do
an ftp install without a network driver.


#282 of 547 by cross on Mon May 19 18:38:15 2003:

Wait; if you burn a CD with the CD-ROM boot image on it, does that have
the driver?  You should be able to boot with it and perform an installation
from there.


#283 of 547 by cross on Mon May 19 18:39:50 2003:

(PS- to clarify: the CD boot image is different from the OpenBSD CD
distribution, and can be downloaded from the OpenBSD web site.  Given
that there's a CD burner in Nextgrex, it shouldn't be hard to do.  The
URL for the CD-ROM image is:
ftp://ftp.openbsd.org/pub/OpenBSD/3.3/cdrom33.fs)


#284 of 547 by janc on Mon May 19 19:06:01 2003:

Right.  Unfortunately, I blew away the Windows98 install Mark did on Next
Grex, so I'd need to first install something on NextGrex that can fetch
that file over the network and can burn a CD.  I have a number of different
old OS's on CDs that I could try, but none are painless.  I don't have another
computer with a CD burner.

The real live OpenBSD 3.3 CD set was shipped from Alberta on Tuesday, apparently.
It should arrive in the next few days.  There are also probably lots of people
who could make me a CD with that file on it.  Either of these two paths seems
much easier than reinstalling Windows98 on NextGrex.


#285 of 547 by aruba on Mon May 19 19:20:33 2003:

We don't know for sure if our CDs shipped Tuesday, only that our credit card
was charged then.  I sent mail to the shipping guy at openbsd.org to ask if
they really did ship.

But anyway, dang offered to make Jan a CD, and he'll bring it over tonight.


#286 of 547 by remmers on Mon May 19 22:24:33 2003:

Re #280:  My machine is still online and available to any staffer who
wants access.  It's currently running OpenBSD 3.2.  If it's going to be
used to test out software, I should upgrade it to 3.3.  If I have time
to do that in the next couple of days I will, but to be honest free time
is in somewhat short supply this week.  I'll see how it goes.  Dang
installed a CVS server on it, the idea being to use that to document
our work.  The CVS server hasn't been used yet, and nothing much else
has been done with the machine yet either, so it might not be too
unreasonable to use the OpenBSD CDs, when they arrive, to install 3.3
from scratch on my machine, then ask dang politely to re-install the
CVS server...


#287 of 547 by janc on Tue May 20 00:49:17 2003:

Turns out Valerie can burn CD's.  I should have known that.  So I've got a
working boot CD now.

I tried logging into John's machine and failed.  I should give him a call and
see what I've got wrong.


#288 of 547 by janc on Tue May 20 01:54:52 2003:

Hmmm.  Got it installed on the disk, but boot from the disk seems to be
hanging when the kernel tries to initialize the audio drivers.  I'll
investigate more later tonight and report back.


#289 of 547 by aruba on Tue May 20 03:11:07 2003:

I had a little trouble with the audio in Windows 98, actually.  It mostly
worked, but occasionally produced static when it should have been playing a
sound. I figured it was because I needed a different version of the driver,
or it needed to be reinstalled.


#290 of 547 by other on Tue May 20 03:22:54 2003:

Why are we worrying about making the audio drivers on the next Grex 
machine work?


#291 of 547 by janc on Tue May 20 03:31:17 2003:

OK, some details.  As the kernel starts up, it prints out lots of messages
describing the various devices.  When booting from the CD or floppy, it finds
the audio device, but doesn't have a driver for it (of course, since this is
an install disk and doesn't need audio), so it says:

 "VIA VT8233 AC97 Audio" rev 0x50 at pci0 dev 17 function 5 not configured

When we boot from the hard disk, it finds the device, and has a driver, but
the driver seems to fail to initialize.  It types the following, and then
hangs forever with the cursor at the end of the line:

  auvia0 at pci0 dev 17 function 5 "VIA VT8233 AC97 Audio" rev 0x50_

It should go on to finish the line by typing something like

  auvia0 at pci0 dev 17 function 5 "VIA VT8233 AC97 Audio" rev 0x50: irq 9

We never get the ": irq 9" part.  (It is IRQ 9, according to the bios).

One fix would be to build a kernel without the audio driver.  It's not like
Grex needs audio.  Any better ideas?


#292 of 547 by janc on Tue May 20 03:39:25 2003:

Eric slipped in.  I don't care very much about making them work.  Right now
they are keeping us from booting, which I do care about.  I'd slightly prefer
to know what is causing it to fail.  

We are going to have to do an OpenBSD install on this machine every year or
so.  We need to figure out how to do it smoothly.  It's worth a little effort
to find the *best* way to deal with problems, not just some workaround kludge.


#293 of 547 by cross on Tue May 20 03:42:41 2003:

Hmm; can you disable the onboard audio in the BIOS?  It sounds like
it's hanging in the probe routine; perhaps it's having difficulties
disambiguating the audio device from something else it might share
an interrupt line with?  Maybe there's a bug allocating an IRQ?
Weird.


#294 of 547 by janc on Tue May 20 03:45:39 2003:

Found this:
  http://www.netsys.com/openbsd-misc/2003/01/msg00734.html

Appears to be someone having the same problem.


#295 of 547 by janc on Tue May 20 04:26:58 2003:

The discussion of this problem in that thread didn't turn up any sensible
solutions, so I'm willing to just disable the device.  Looks like there are
two ways to do this without recompiling the kernel.

http://www.openbsd.org/cgi-bin/man.cgi?query=boot_config&sektion=8&arch=i386

To make this work, you need to tell it to boot "/bsd -c" instead of /bsd.
However, it doesn't prompt for a kernel to boot, and I don't know how to
make it do so.

http://www.openbsd.org/cgi-bin/man.cgi?query=config&sektion=8&arch=i386

To make this work, I need a reasonably running system.  Booting off the
CD and mounting the / partition under /mnt doesn't do it.  The config program
is not on the install CD, and the copy on the hard disk wants ld.so, which it
can't seem to find while booted off the CD.  I might be able to figure out
how to make this work, if I was less sleepy.

Either way, I just need to do "disable auvia" and that should kill the
audio card.
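For the record, the boot_config route would look something like this at the
console, assuming the prompt cooperates (the kernel path is an assumption;
see boot_config(8)):

```text
boot> boot wd0a:/bsd -c
UKC> disable auvia
UKC> quit
```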

I'm going to bed.


#296 of 547 by lk on Tue May 20 07:00:07 2003:

Maybe you'll dream up another solution, like disabling the audio in the
BIOS so OpenBSD will never see it in the first place...?

(I suppose it could be a jumper on the motherboard if not a BIOS option.)


#297 of 547 by janc on Tue May 20 11:24:23 2003:

Yeah Leeron!  There is a thing in the BIOS to disable the Audio Controller,
and with that turned off, we can boot.

I'd still like to know how to tell OpenBSD to boot off something else.
It sometimes shows a "boot>" prompt briefly, and if you type something
it waits for your input instead of proceeding on its own.  However, when I started
booting off the CD, typed "boot wd0a:/bsd -c" at the boot> prompt,
it went ahead and booted off the CD anyway.  Well, I'll get plenty more
chances to experiment with this.

In the true OpenBSD spirit, after all this work, it greets me by telling me
I'm an idiot:

    Don't login as root, use su

Root's the only account on the system, and that's a comma splice, you idiots.
Sorry, I have personality conflicts with OpenBSD.


#298 of 547 by cross on Tue May 20 12:49:38 2003:

Most of the world has personality conflicts with OpenBSD.  Hey, disabling
the audio device in the BIOS; I said that in #293!

Comma splices are bad, use semicolons.


#299 of 547 by cross on Tue May 20 12:51:52 2003:

So Jan, just so I can be sure I understand what's going on; a minimal
OpenBSD installation is on the IDE drive, and it's seeing all the
devices now?


#300 of 547 by janc on Tue May 20 14:45:37 2003:

Yup.  Staff has been informed, John Remmers has successfully logged in.
My next step is to write some little scripts to copy data around fiercely
on the three SCSI disks, just to increase my confidence that the controller
really is working right with multiple drives.

Yeah, you did say that didn't you?  I was way too sleepy last night.


#301 of 547 by remmers on Tue May 20 14:50:49 2003:

Yep, I logged in and created myself a "remmers" account.


#302 of 547 by janc on Tue May 20 15:42:42 2003:

Thought I'd do a survey of suid/sgid programs, many of which might have to
be moved if we do a /suidbin directory.  There are a number of them, but many
don't actually need to be SUID on Grex (the ones marked 'X' in the list
below should probably lose their suid bits or not be moved to suidbin).

SUID files:

 X  -r-sr-xr-x  1 root  bin       /sbin/ping
 X  -r-sr-xr-x  1 root  bin       /sbin/ping6
 X  -r-sr-x---  1 root  operator  /sbin/shutdown
    -r-sr-xr-x  3 root  bin       /usr/bin/chfn
    -r-sr-xr-x  3 root  bin       /usr/bin/chpass
    -r-sr-xr-x  3 root  bin       /usr/bin/chsh
 X  -r-sr-sr-x  1 root  daemon    /usr/bin/lpr
 X  -r-sr-sr-x  1 root  daemon    /usr/bin/lprm
    -r-sr-xr-x  1 root  bin       /usr/bin/passwd
 X  -r-sr-xr-x  1 root  bin       /usr/bin/rsh
    -r-sr-xr-x  1 root  bin       /usr/bin/su
    -r-sr-xr-x  1 root  bin       /usr/bin/sudo
 ?  -r-sr-xr-x  1 root  auth      /usr/libexec/auth/login_chpass
 ?  -r-sr-xr-x  1 root  auth      /usr/libexec/auth/login_krb4
 ?  -r-sr-xr-x  1 root  auth      /usr/libexec/auth/login_krb4-or-pwd
 ?  -r-sr-xr-x  1 root  auth      /usr/libexec/auth/login_krb5
 ?  -r-sr-xr-x  1 root  auth      /usr/libexec/auth/login_krb5-or-pwd
 ?  -r-sr-xr-x  1 root  auth      /usr/libexec/auth/login_lchpass
 ?  -r-sr-xr-x  1 root  auth      /usr/libexec/auth/login_passwd
    -r-sr-xr-x  1 root  bin       /usr/libexec/lockspool
    -r-sr-xr-x  1 root  bin       /usr/libexec/ssh-keysign
 ?  -r-sr-sr-x  1 root  authpf    /usr/sbin/authpf
 X  -r-sr-xr--  1 root  network   /usr/sbin/ppp
 X  -r-sr-xr--  1 root  network   /usr/sbin/pppd
 X  -r-sr-xr--  1 root  network   /usr/sbin/sliplogin
 X  -r-sr-xr-x  1 root  bin       /usr/sbin/timedc
 X  -r-sr-xr-x  1 root  bin       /usr/sbin/traceroute
 X  -r-sr-xr-x  1 root  bin       /usr/sbin/traceroute6

SGID files:

 X  -r-xr-sr-x  4 root  crontab   /usr/bin/at
 X  -r-xr-sr-x  4 root  crontab   /usr/bin/atq
 X  -r-xr-sr-x  4 root  crontab   /usr/bin/atrm
 X  -r-xr-sr-x  4 root  crontab   /usr/bin/batch
 X  -r-xr-sr-x  1 root  crontab   /usr/bin/crontab
 ?  -r-xr-sr-x  1 root  kmem      /usr/bin/fstat
    -r-xr-sr-x  1 root  auth      /usr/bin/lock
 X  -r-xr-sr-x  1 root  daemon    /usr/bin/lpq
 ?  -r-xr-sr-x  1 root  _lkm      /usr/bin/modstat
 ?  -r-xr-sr-x  1 root  kmem      /usr/bin/netstat
    -r-xr-sr-x  1 root  auth      /usr/bin/skeyaudit
    -r-xr-sr-x  1 root  auth      /usr/bin/skeyinfo
    -r-xr-sr-x  1 root  auth      /usr/bin/skeyinit
    -r-xr-sr-x  1 root  _sshagnt  /usr/bin/ssh-agent
    -r-xr-sr-x  1 root  kmem      /usr/bin/systat
    -r-xr-sr-x  1 root  kmem      /usr/bin/vmstat
    -r-xr-sr-x  1 root  tty       /usr/bin/wall
    -r-xr-sr-x  1 root  tty       /usr/bin/write
    -r-xr-sr-x  1 root  games     /usr/games/atc
    -r-xr-sr-x  1 root  games     /usr/games/battlestar
    -r-xr-sr-x  1 root  games     /usr/games/canfield
    -r-xr-sr-x  1 root  games     /usr/games/cfscores
    -r-xr-sr-x  1 root  games     /usr/games/cribbage
    -r-xr-sr-x  1 root  games     /usr/games/hack
    -r-xr-sr-x  1 root  games     /usr/games/robots
    -r-xr-sr-x  1 root  games     /usr/games/sail
    -r-xr-sr-x  1 root  games     /usr/games/snake
    -r-xr-sr-x  1 root  games     /usr/games/tetris
 ?  -r-xr-sr-x  4 root  _token    /usr/libexec/auth/login_activ
 ?  -r-xr-sr-x  4 root  _token    /usr/libexec/auth/login_crypto
 ?  -r-xr-sr-x  1 root  _radius   /usr/libexec/auth/login_radius
 ?  -r-xr-sr-x  1 root  auth      /usr/libexec/auth/login_skey
 ?  -r-xr-sr-x  4 root  _token    /usr/libexec/auth/login_snk
 ?  -r-xr-sr-x  4 root  _token    /usr/libexec/auth/login_token
    -r-xr-sr-x  1 root  smmsp     /usr/libexec/sendmail/sendmail
 X  -r-xr-sr-x  1 root  daemon    /usr/sbin/lpc
 X  -r-xr-s---  1 root  daemon    /usr/sbin/lpd
    -r-xr-sr-x  1 root  kmem      /usr/sbin/pstat


#303 of 547 by janc on Tue May 20 15:43:42 2003:

You know, this is the next Grex hardware item.  I should probably move
future comments to a "software" item.


#304 of 547 by janc on Tue May 20 15:46:23 2003:

I've got six processes busily copying files around on the SCSI drives.  No
problems at all yet.


#305 of 547 by polytarp on Tue May 20 18:14:15 2003:

You're great, janc.


#306 of 547 by cross on Tue May 20 21:31:06 2003:

Regarding #304; Cool.

Regarding #303; My suggestion is to leave most of the `normal' binaries
that aren't on / (only ping, ping6, and shutdown are) alone and put
copies in /suid/{s,}bin, then put those directories in the user PATH
before the system default directories.  As long as /usr and /usr/local
are mounted nosuid, it won't hurt anything if (mode & 06000) != 0 on
some files therein.  It also spares you from trying to figure out what
was symlinked where when the system is upgraded.

Some suggestions: there's no reason to keep pstat, vmstat, systat,
modstat, netstat, etc executable by normal users.  Definitely that's true
of fstat; that's a privacy violation waiting to happen.  ssh-keysign
can be restricted to members.  I don't see the need to strip the suid
bit off of shutdown, as that's only executable by root or users in
group operator, and there aren't likely to be many of the latter.
I would only put login_krb5 in /suid/sbin, as that's the only one
that's likely to be useful, if grex really moves to krb5.  I would
further disable the skey stuff; I doubt anyone uses or would use it.
Certainly, authpf doesn't need to be executable by normal users, either.
Wall shouldn't be available to normal users, I don't think.  If you go
with an alternate mailer like Postfix, then there's no reason to worry
about keeping sendmail sgid.  Finally, the games are only sgid to write
high score files.  Probably harmless, and I don't see why not to put
them in /suid/bin (or /suid/games, so they can be shut off easily if
there's a problem).  Also, going with Kerberos means that stock `passwd'
can be disabled.  /usr/bin/lock isn't likely to be useful if Kerberos
is used, either.

I wouldn't worry about ping and ping6 being suid since the only reason
they are is to send ICMP packets, and those would be blocked by the PF
rules preventing non-member users from sending random stuff out onto
the Internet.  It's conceivable someone could find another hole in it,
but I think it's highly unlikely.


#307 of 547 by gull on Tue May 20 22:07:21 2003:

Generally Grex doesn't let users use 'ping' at all, from what I've been
able to tell.


#308 of 547 by cross on Tue May 20 22:18:37 2003:

Yes; my point is that the PF filters will take care of that without
modifying anything in the base-system filesystem.


#309 of 547 by gull on Tue May 20 22:19:24 2003:

Ah, I see.  But if no one who doesn't have root privileges is allowed to
use it anyway, why keep the setuid bit?


#310 of 547 by cross on Tue May 20 22:21:51 2003:

So that you don't have to remember to turn it off the next time you
upgrade the system.  :-)  In general, it's more of a hassle to turn
off the setuid bit if it doesn't do anything than to ignore it.


#311 of 547 by lk on Tue May 20 22:25:09 2003:

Dan, Re#298, 293, and others:  Dreams can be like that. You never really
know who said what, if it was real...  but hey, it worked.

Nonetheless, since you've provided so much other helpful information and
I have not, I'm going to claim credit for this "fix".  After all, you
phrased it as a question. I said do it!  (:


#312 of 547 by cross on Tue May 20 22:37:25 2003:

Another thing....  Grex might have turned off ping to avoid the problem of
a malicious user using the `flood ping' -f option against another host.
This mode sends packets to a remote host as fast as it can; effectively
clogging the network link between the two.  On grex's slow connection,
this could clearly be a problem.

However, OpenBSD's version of ping checks that the real user ID is 0 (ie,
you're root) before allowing you to use the -f option for flood pinging.
Given that any program that wants to create an ICMP socket must be running
as root, and that the standard ping doesn't let joe user flood ping anymore,
perhaps it'd be acceptable to stop restricting access to ping.  Still,
someone might be able to DoS grex by sending a ping request to some big
broadcast address, so maybe it's a good idea to keep restricting it.


#313 of 547 by cross on Tue May 20 22:38:54 2003:

Hrmph, Leeron!  :-)


#314 of 547 by janc on Wed May 21 00:19:39 2003:

I have no intention of "remembering" to turn off suid bits.  I'm for
documenting it, in this case in the form of a script that does it.

I'd turn off all the suid-root bits that don't need to be on (or leave them
on a nosuid partition where the suid-bit doesn't matter).  It's hard to
imagine a security hole turning up in 'ping', but anything is possible.

I'm much less inclined to be aggressive about the sgid scripts.
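Since the point is to document the change as a script, here's a minimal
sketch of what one could look like (the argument list is illustrative; the
real list would be the X-marked entries from the survey in #302):

```shell
#!/bin/sh
# Hedged sketch of a documented "turn off the suid/sgid bits" script.
strip_suid() {
    # usage: strip_suid FILE...
    # Clears the set-uid/set-gid bits on each file, reporting each change.
    for f in "$@"; do
        if [ -e "$f" ]; then
            chmod ug-s "$f" && echo "stripped: $f"
        fi
    done
}
# e.g.: strip_suid /sbin/ping /sbin/ping6 /usr/bin/rsh
```

Rerunning it after an upgrade is then the whole procedure.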


#315 of 547 by janc on Wed May 21 00:29:57 2003:

I'm not sure you can effectively use different RAID strategies on
different partitions without having different disk sets for them, but I'm
still thinking different RAID strategies make sense for different partitions.

I think /usr is another example where data redundancy seems of less value.
If you lose /usr and restore it from a month-old backup, you are probably
fine.  Just striping seems perfectly adequate for partitions like that.

Where RAID 5 pays off mostly is in places like /var, /bbs, /home.


#316 of 547 by cross on Wed May 21 01:22:03 2003:

Regarding #314; Well, if you keep / as the only partition that honors
the suid bit, then you only have to change permissions on two binaries:
ping and ping6 (I still say ignore shutdown, since only users in group
operator can run it, anyway).

Regarding #315; The thing is that if you lose /usr, the system is
unusable; similarly with /usr/local, /, etc.  RAID isn't just about
data security, it's also about availability.


#317 of 547 by janc on Wed May 21 03:05:52 2003:

Right, but that isn't Grex's highest priority.  We aren't amazon.com, which
can't be off line for a few days without making headlines.  Heck, we currently
shut down for backups.  If a disk melts down, taking a few days to come back
up is no disaster, if we can do it without loss of data.

If I'd been at the last board meeting, I'd have argued against the third
SCSI disk.  Grex doesn't need that much disk in the near future.  But, we've
got the disk, so we might as well use it.  I think the best use may be to do
a RAID setup and win a bit better performance and a bit more data security.

Right now I think the strongest argument against using RAID on Grex is the
KISS argument.  RAID certainly has benefits, but it adds complexity, and extra
complexity is always a minus.  Using RAID means one more potentially buggy
piece of software in a critical function.  It means one more complex subsystem
staff members need to understand, administer, and reinstall on every upgrade.

I think a sound argument could be made that the benefits aren't worth the
complexity.  Skip RAID.  Divide the partitions among the disks and hope the
loads balance out approximately.  rsync critical partitions to the IDE disk
frequently.  Remember to do backups.  We don't lose much by taking that
easier path, and it is significantly simpler to install and administer.

You could do ccd on some partitions, if you want the same performance benefits
(slightly more even) at a lower complexity level.
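For the rsync route, the whole mechanism could be as small as a couple of
crontab lines.  A hypothetical sketch (paths, schedule, and the rsync
location are all placeholders; rsync itself would come from ports):

```conf
# m  h  dom mon dow  command
30  4  *   *   *     /usr/local/bin/rsync -a --delete /home/ /ide/home/
45  4  *   *   *     /usr/local/bin/rsync -a --delete /var/  /ide/var/
```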


#318 of 547 by janc on Wed May 21 03:07:52 2003:

All the reports of problems with multiple drives on our SCSI controller seemed
to describe failures that occurred very frequently.  I've had all three drives
busy reading and writing for all they are worth for a day now and have seen no
problems at all.
So I think we can probably consider that problem solved.  I'll let them grind
for a bit longer though.


#319 of 547 by cross on Wed May 21 05:12:53 2003:

Regarding #317; Well, to me, splitting up partitions is more complex.
Maybe I'm smoking my hair, but it seems a lot simpler conceptually to
think of a RAID as one giant partition that you can chunk up as you
like, and the performance issues and load balancing are yours free.
You get some modicum of resistance to failures as a side benefit.

As for reliability....  RAIDframe has been in OpenBSD for several
years now.  It seems just as solid as FFS itself or even soft updates.
Could it go wrong?  Yeah, but there could also be bugs lurking in FFS.

Complexity of configuration is pretty simple.  One or two configuration
files, and you're basically good to go.  The only really annoying thing
is that you can't directly boot from it.  But, at the end of the day,
I'm not on grex staff and am not charged with keeping it running.
It seems simpler to me, and RAID-5 everywhere seems to fit grex like
a glove (especially if it's planned on a few partitions already), but
that's just me.


#320 of 547 by janc on Wed May 21 12:57:57 2003:

To some degree I'm arguing all sides of the question to make up for the lack
of people arguing.  But the large number of Grex staff who have no opinion
on RAID is a bit worrisome.

Administrative complexity hits at three points.  First, right now - deciding
which RAID setup to use and implementing it.

Second, on each system upgrade, when we need to reinstall the kernel
customizations and config files.  Mostly we can document this and make
a step-by-step procedure that most anyone can follow.

Third, when a disk has a problem, or when we want to change the disk
configuration.  RAID can help with problems like this, but only if you know
what you are doing.  Doing the wrong thing can hose your data.

In a volunteer run system, the level of knowledge that may be on hand on
the day when a disk dies is unpredictable.  It may make sense to keep things
simple so lots of people feel like they can help.  This argument applies
equally to Kerberos.  Both confer modest benefits that I'm not sure we need,
at the cost of complexity that makes the size of the hump you have to get
over to become an effective system administrator substantially larger.
I fear they will reduce the size of the pool of potential system
administrators.

Or maybe they make the system cooler and more interesting, thus attracting
more potential system administrators.

I think I may experiment with setting up RAID on Grex2003, just to get a better
feeling for the complexity.


#321 of 547 by gull on Wed May 21 13:09:16 2003:

I'm in favor of RAID because I think it has the potential to *reduce*
the amount of staff time needed for recovery if a disk fails.  Assuming
you don't have a multiple failure, recovery is reduced to a few steps,
presumably simple ones, although I'm not familiar with RAIDframe
specifically.  Generally there's some way to tell the RAID subsystem
you're going to offline a disk (this may have been done automatically if
the disk failed), then you'd shut down the system, swap the disks, then
boot and tell the RAID subsystem to rebuild the failed disk.  During all
of this except shutdown the RAID array is generally still usable, just
in a "degraded" mode.  (It will be slower.)  There are some RAID systems
where that isn't true, but I'd hope RAIDframe would have implemented
online recovery.

I think, if we have time, the concerns about complexity could be
addressed by developing a step-by-step disk failure recovery procedure
that any staff member could follow.  It shouldn't really be any more
complex than restoring from a backup, just different.

If you're going to use RAID, I think it's best to look at the RAID array
as one big disk, and not try to spread things out with different RAID
strategies for different partitions.  That seems unnecessarily complex
to me.


#322 of 547 by cross on Wed May 21 13:34:26 2003:

Regarding #321; I concur.  A barebones recovery plan should be developed
in any event, regardless of whether RAID is used.


#323 of 547 by janc on Wed May 21 15:34:31 2003:

Thanks David.  I definitely value input on this question.

I built two new kernels.  The first is simple GENERIC minus a mess of stuff
we don't need - mostly device drivers for devices we haven't got.  The second
is the same, but turns RAID on (and does various stuff to make sure SCSI
drives don't get renumbered when one fails).

I also pushed the "maxusers" parameter from 32 to 64.  Maxusers isn't really
the maximum number of users.  It's a voodoo number that is used to estimate
sizes for all sorts of system parameters, which can be fine-tuned separately
by editing lower level definitions.  I saw various posts by people who had
set it higher than 64 and got a warning message about that.  One seemed to
have some crashes after that and thought it might be related.  However, one
of these guys got no response that is in the archive, and the other was only
told that he was an idiot.  (These OpenBSD mailing list archives are such a
valuable resource.)  So for the moment I thought I'd set it to 64.  It'll be
easy enough to fine tune it later if we have problems with that setting.

The OpenBSD FAQ discourages building new kernels without a danged good reason,
threatening lack of technical support for problems with non-generic kernels.
However, since their technical support is laughable anyway and Marcus is
guaranteed to have changes to make to the kernel, I decided we might
as well get started, even if we don't end up using RAID.

The stripped down GREX kernel is about half the size of the GENERIC kernel,
which is a plus, if not a great big one:

  -rw-r--r--  1 root  wheel  4579691 May 21 07:01 /bsd.generic
  -rwxr-xr-x  1 root  wheel  2719734 May 21 07:03 /bsd.new
  -rwxr-xr-x  1 root  wheel  3133519 May 21 06:59 /bsd.raid

It is currently running on the bsd.raid kernel, and that is the default.
I haven't, however, set up any RAID array yet.

I've also now got a draft document on kernel building.


#324 of 547 by janc on Wed May 21 17:19:13 2003:

OK, I've created a RAID array on new Grex - just for experimental purposes
at this point.  First, I sliced up the three scsi disks into two partitions
each, each disk identically:

 sd0a:  20479825 blocks  = ~10 Gig
 sd0d:  15361127 blocks  =  ~7 Gig

The sd0a, sd1a, and sd2a partitions are clustered into a RAID5 array, with
just one partition, /dev/raid1a, on it (it can be sliced into smaller
partitions).  This is mounted as /raid.  The sd0d, sd1d, and sd2d partitions
are mounted as /sd0, /sd1 and /sd2 respectively.  My idea was that if we
want to do any benchmarks, this lets us access the same disks, with or without
raid.  All four partitions are rw-all so anyone with an account can create
stuff there and look at the stats.

df looks like this:

  Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
  /dev/sd0d     7438613        1  7066682     0%    /sd0
  /dev/sd1d     7438613        1  7066682     0%    /sd1
  /dev/sd2d     7438613        1  7066682     0%    /sd2
  /dev/raid1a  19852909        1 18860263     0%    /raid

Note that the available space (18.8 Gigs) is about 61% of the disk we put
into this (30 Gigs), most of the rest being used for parity, some of the
rest being eaten by filesystem overhead of various sorts.
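
As a sanity check, that 61% can be reproduced from the partition sizes above
(the block count is 512-byte blocks from the disklabel; the arithmetic here is
mine, not anything from the RAID tools):

```python
# Each of the three 'a' partitions is 20479825 512-byte blocks.
blocks_per_partition = 20479825
kb_per_partition = blocks_per_partition * 512 // 1024   # convert to 1K-blocks

raw_kb = 3 * kb_per_partition          # total disk put into the array
avail_kb = 18860263                    # "Avail" for /dev/raid1a from df above

print(round(100 * avail_kb / raw_kb))  # -> 61 (percent)
```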


#325 of 547 by gull on Wed May 21 17:33:11 2003:

Yeah, from what I've seen a lot of OpenBSDers are a bit elitist and
don't suffer newbies gladly.  It's an unfortunate attitude.


#326 of 547 by janc on Wed May 21 17:37:38 2003:

Hmmm...I'm trying to run the bonnie benchmark
(http://www.textuality.com/bonnie) on the raid disk, but I'm not sure it
will work.  Bonnie wants me to use a file size several times larger than main
memory.  Main memory is 1.5 Gig, so I told it to use 4 times that: 6144 Meg.
But the first thing it said is: 
   File './Bonnie.28521', size: -2147483648
Uh-oh.  Someone may be using signed longs for the file size.  If that's the
case, then the biggest file size I can use is probably around 2048 Meg, which
isn't several times the size of our memory.  Well, I'll let it run and see
what happens.
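
That negative size is exactly what a signed 32-bit overflow would produce.
A quick check (my own arithmetic, not from Bonnie's source): 6144 Meg in bytes
wrapped into a 32-bit signed long gives precisely the number Bonnie printed.

```python
size_bytes = 6144 * 1024 * 1024      # the requested file size, in bytes

# Interpret the low 32 bits as signed, the way a C "long" would on i386.
wrapped = size_bytes % 2**32
if wrapped >= 2**31:
    wrapped -= 2**32

print(wrapped)   # -> -2147483648, matching Bonnie's output
```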


#327 of 547 by janc on Wed May 21 18:00:02 2003:

So, if we went with RAID, what would we do?

On the disks we'll have partitions

   sd0a   - pretty tiny.  A place to store kernels.  We'll boot from here.
   sd1a   - A copy of /dev/sd0a, so we can boot if sd0 dies
    
   sd0b   - swap partitions, one Gig each.  You can put swap on raid, but
   sd1b     it doesn't appear to be a great idea.  We'll trust OpenBSD to
   sd2b     balance swap load over the three spindles.

   sd0d   - the remainders of the disks, about 16 Gig each.
   sd1d
   sd2d

Now, sd0d, sd1d and sd2d will be clustered together into a RAID 5 array, called
raid0.  To all intents and purposes, this appears as a single big disk.  It
should come out at about 29 Gig, a more than adequate amount of space for all
of Grex's needs for a while.  Raid0 gets partitioned into all the various
partitions we need, with root on raid0a, usr on raid0d and so on.

The 80 Gig IDE disk doesn't participate in this.  We could put the boot
partition on this, but I'd want copies on two disks, so we'll need at
least some non-raid partitions on the SCSI disks anyway; let's leave
everything critical off the IDE.


#328 of 547 by cross on Wed May 21 18:38:48 2003:

Why not make sd2a a copy of sd0a as well?  It wouldn't hurt anything,
and might help, since each disk would be exactly like every other disk
in terms of how the partitions are laid out.  That makes partitioning
easy; you can keep a copy of the disklabel for one of the disks around
in a file somewhere, and just write it to a new disk with the disklabel
command if necessary.  Then, just plop the new disk in, tell RAIDframe
to rebuild it, and let it go on its merry way.


#329 of 547 by janc on Wed May 21 19:08:06 2003:

Probably would.  Actually, you don't even have to keep the layout in a file.
You can just copy it from one disk to another:

  disklabel sd0 | disklabel -R sd1 /dev/fd/0

That's how I built the current setup.


#330 of 547 by janc on Wed May 21 19:08:56 2003:

Bonnie croaked while doing some seeks.  Try it again with a smaller file to
see if that works better.


#331 of 547 by other on Wed May 21 19:53:17 2003:

I thought the IDE was mainly for a comprehensive backup of the boot 
partition plus storage for sources.


#332 of 547 by cross on Wed May 21 19:55:14 2003:

Yeah, you can do that, but if you also keep the disk label around in
another file, you can label the disk on any machine with a SCSI controller.
Does that matter?  I don't know.  It might be slightly more convenient.
It's just a nit, though; it's trivial to get a copy of the disklabel once
the machine's set up, and I doubt it would matter....


#333 of 547 by cross on Wed May 21 19:58:51 2003:

Oh yeah....  That's an idea.  Put /usr/src and /usr/obj on the IDE
drive, and then you don't have to do anything hacky with linking them
to /var as in my latest proposal.  /var can be decreased accordingly,
and more space allocated to /u.


#334 of 547 by janc on Wed May 21 20:09:47 2003:

OK, with a file size of 2000M, I get results from Bonnie.  The validity of
these results is, however, questionable, since a lot of the file may have been
in memory instead of on disk.

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU

raid 5   2000  9520  6.8  7974  1.3  5706  2.0 50932 62.5 63815 13.0 147.9  1.6
scsi     2000 53754 43.4 54106 14.1 10090  2.6 60326 70.9 61067 11.5 201.2  0.8

We have two lines of results.  The first was using the raid 5 array of three
SCSI disk.  The second was on a single plain ordinary SCSI disk.

For each test we have the speed and the % of CPU used.

There are three output tests:

  Per Char   -  file written sequentially with 2 billion calls to putc()
  Block      -  file written with block writes
  Rewrite    -  each block read, changed and rewritten

There are two input tests

  Per Char   - 2 billion calls to getc()
  Block      - block reads

And a seek test

  Seeks      - four child processes each execute 4000 seeks and reads.  After
               10% of these they change and rewrite the block.

So, on writing, RAID was 5 to 6 times slower.  Notice that the supposedly
optimum block writes were actually slower than the character writes for the
RAID.  The SCSI was twice as fast as RAID on the rewrite test.

On read the RAID array was still slower than the plain disks on the Per
Char reads, but a bit faster on the block reads.  It was substantially slower
on the seeks.

Admitting that the benchmark is seriously questionable due to the small size
of the file relative to the large size of memory, this is not at all an
impressive result.

I reran the tests and got similar results.

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU

raid 5   2000  9520  6.8  7974  1.3  5706  2.0 50932 62.5 63815 13.0 147.9  1.6
raid 5   2000  8745  6.4  7654  1.3  5717  2.2 51345 63.5 64022 14.6 150.0  1.1

scsi     2000 53754 43.4 54106 14.1 10090  2.6 60326 70.9 61067 11.5 201.2  0.8
scsi     2000 54058 43.4 54618 14.1 10129  2.8 60552 71.0 60865 11.1 203.4  0.9

I suppose the main advantage in performance is in balancing load among multiple
spindles, but this would really only be noticeable if multiple processes were
reading/writing the disk at once.  With a single process, we aren't going to
gain much.  Only in the seek test are there multiple processes, and then only
four.


#335 of 547 by cross on Wed May 21 23:48:20 2003:

Are softupdates turned on on the raid filesystem?


#336 of 547 by janc on Thu May 22 02:31:54 2003:

No.  They are not even enabled in the kernel.  From what little I understand
of it, it improves performance only with respect to metadata updates -
updating inodes when files are created or destroyed.  That wouldn't affect
these benchmarks.  I don't get a clear feeling that it is super stable yet
either.


#337 of 547 by cross on Thu May 22 03:29:00 2003:

Every write and every read is also a metadata update (mtime and atime).
Soft updates are definitely stable at this point; they're enabled by
default in FreeBSD.  OpenBSD tends to be somewhat more conservative,
though.

Gads; security be damned.  Grex would've been better off with FreeBSD.


#338 of 547 by janc on Thu May 22 05:55:06 2003:

Well, I argued that.  I have the impression softupdates are more mature in
FreeBSD than OpenBSD.  It's not really clear though.


#339 of 547 by janc on Thu May 22 06:30:14 2003:

For the heck of it, I ran eight copies of the Bonnie benchmark simultaneously
on the RAID 5 partition.  Below, A through H were started simultaneously.
The last line is just one benchmark process running.

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
A        2047  8087  5.9  5867  1.2   331  0.1  1647  2.0  1775  0.4  22.5 0.1
B        2047  1889  1.4  7770  1.4   545  0.2   890  0.9  1646  0.3  14.2 0.1
C        2047  1020  0.8  7038  1.2   417  0.1  1929  2.7  1578  0.3  19.9 0.2
D        2047  8647  6.3  7474  1.3   253  0.1  1905  2.4  4597  1.1  89.4 0.8
E        2047  3997  2.9  6946  1.2   215  0.1 23458 27.9 29250  6.4 155.5 1.4
F        2047  8314  6.2  7149  1.3   369  0.1  1333  1.6  1707  0.3  21.2 0.1
G        2047  8926  6.3  7899  1.4   512  0.2   865  1.1  1132  0.3  15.0 0.1
H        2047  4280  3.2  7861  1.3   458  0.1   954  1.2  1649  0.4  19.1 0.1

raid 5   2000  9520  6.8  7974  1.3  5706  2.0 50932 62.5 63815 13.0 147.9 1.6

They didn't stay well synchronized - you can tell that process E continued
running long after the others had finished (process scheduling doesn't seem
to be very fair).  The write speeds didn't suffer too badly from the
competition, but the read times took a terrific beating - they are mostly
around 1/25 of the speed of one process.  Note that there were probably some
write processes still running while the read processes were going.

Here's a more sensible test, a comparison against the SCSI and IDE drives,
in non-RAID configuration, with just one process running:

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
scsi     2000 53754 43.4 54106 14.1 10090  2.6 60326 70.9 61067 11.5 201.2 0.8
ide      2000 27188 21.7 27038  6.9  9634  2.6 24889 29.9 25640  5.2  99.0 0.8

Seems the SCSI is about twice as fast on most benchmarks, and about the same
on the Rewrite test.


#340 of 547 by gull on Thu May 22 13:07:15 2003:

RAID 5 is always going to be slower than a single disk, especially using
software RAID.  There's more processing overhead, and you're doing a
third more reads/writes because of the parity.  Still, I'm surprised to
see it 5 times slower.  That doesn't seem very acceptable at all.


#341 of 547 by scott on Thu May 22 13:10:09 2003:

RAID would be nice, and if we're making such a huge jump in processing power
then I don't think the performance penalty (assuming it's only 2-1 or
something less) is an issue.


#342 of 547 by janc on Thu May 22 13:50:41 2003:

I'm beginning to suspect that some of these fast read times are
coming out of buffers.  The drastic crash in read speed when I ran 8 bonnies
could be because instead of trying to buffer one 2G file in 1.5G of memory, we
were trying to buffer a total of 16G of files in 1.5G of memory.  Some of
these really fast speeds (the ones around 50M/sec) are likely being done
largely out of cache.  This makes the results pretty meaningless.

Anyway, I ran three simultaneous bonnies on a plain SCSI.  I couldn't run
8 because I didn't have a 16 Gig partition.

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
A        2047 15240 16.1 20747  5.0  2018  0.6  3882  4.5  9506  1.7 164.7 0.6
B        2047 16768 13.7 20491  5.3  3016  0.9  4543  5.3  5598  1.1  31.6 0.2
C        2047 16812 13.6 17945  4.6  2977  0.9  4145  4.9  4513  0.8  46.5 0.2

scsi     2000 53754 43.4 54106 14.1 10090  2.6 60326 70.9 61067 11.5 201.2 0.8

scsi/3   2000 17918 43.4 18035 14.1  3363  2.6 20108 70.9 20355 11.5  67.1 0.8

The last line is just the one-process SCSI values divided by three.  Notice
the write statistics for the three processes are all pretty close to one
third of the write statistics for a single process.  The reads are way lower.
Is this an artifact of buffering?  The seeks are a bit hard to tell, because
by that time the processes were pretty much out of synchronization.

The degradation in read performance is similar in magnitude to what we saw
on the raid (keeping in mind that we only have 3 processes instead of 8). 
I think there must be a buffering thing going on here.  The write statistics
are much better for the RAID - most of the 8 processes wrote much faster than
1/8 of the single process.

Note that in both cases, the single processes read faster than they write,
while the multiple processes write faster than they read.  That's just weird.


#343 of 547 by aruba on Thu May 22 13:59:04 2003:

Jan - can you fool the OS into thinking Grex has less memory than it really
does?  Or tell it not to cache disk reads?


#344 of 547 by janc on Thu May 22 14:31:43 2003:

Re #341:  We are certainly taking a huge jump in processing power, but the
disk I/O performance improvement, while good, probably isn't as spectacular.
Disk speeds just haven't been growing as fast as processor speeds, and old
Grex's disks aren't nearly as old as its processor.  So the performance jump
in disk I/O from old Grex to new Grex might not be that huge.   (Maybe I
should run some benchmarks on old Grex to compare with - will everyone please
log off?).  I expect the new Grex will have memory to spare, cpu to spare,
disk space to spare, but maybe not disk bandwidth to spare (and certainly not
net bandwidth to spare).

I think the main benefits of RAID are:

  - Availability.  If a disk dies, the system can keep running.  Performance
    degrades, but it still works.  If you have a hot spare disk, it can
    be brought on line, replacing the dead disk, without interruption in
    service.

    I do not consider this very important to Grex.  We can afford short
    downtimes in the case of disaster.

  - Data Protection.  If a disk dies, the data on the drives is not lost.

    This is important to Grex.  However, it can be achieved other ways.
    We could do daily rsync's from /var, /bbs, /home, and /etc to the IDE
    drive (or even another machine).  You might copy certain critical files
    (/etc/passwd) more frequently.  This has a performance penalty, of course.
    In the case of a crash, your backup will not be fully up to date, so there
will be some data loss, but it should be tolerable.  In the case of
    accidental (or deliberate) deletion of data, this gives you a much better
safety net than RAID, so much so that we'll want to do at least some
    of this even if we have RAID.

  - Performance.  RAID can balance the load over the drives nicely.

    Yes, but so can ccd (pretty much equivalent to RAID 0).

So this doesn't really make a strong argument for RAID.  However, there is
a bit of a flaw in the above break-down.  These three aspects are not fully
separable.  Suppose we merge our three SCSI drives into one big virtual ccd
drive and partition it up.  Load balancing over the drives should be great.
Then one SCSI drive fails.  You just lost a third of your data, scattered
randomly all over the system.  You still have the other two thirds, but
doing anything with it is going to be a nightmare.  Effectively a single
drive failure cooks all your data, instead of 1/3 of your data.  I don't think
the performance improvement given by ccd or RAID 0 is worth the increased
risk of losing the whole system.

So I think the real alternative to RAID is what I originally proposed -
simple partitions, scattered across the drives in an ad hoc manner in hopes
of balancing the load across the spindles, with rsyncs to the IDE drive
for data protection.

I'm really starting to feel that might be the best choice.  The advantages
of RAID for Grex are faint enough so that they don't quite overwhelm the
KISS factor in my estimation.


#345 of 547 by janc on Thu May 22 14:33:48 2003:

Re 343:  probably - but I'm not sure how.  I thought of just creating a
RAMDISK and letting that eat up much of the memory (I could also run the
benchmark on a ramdisk, which might be interesting), but it looks like
you need to do a lot of kernel work to bring up a ramdisk, and I'm
insufficiently motivated.


#346 of 547 by cross on Thu May 22 14:37:44 2003:

One can lower the amount of memory the kernel will use for caching by 
mucking with the kernel.  It looks like, when caching is taken out of
the picture, performance between RAID and the straight SCSI disks is
more or less on par?


#347 of 547 by janc on Thu May 22 14:48:13 2003:

Hmmm...the faq (http://www.openbsd.org/faq/faq11.html) talks about the
BUFCACHEPERCENT kernel value.  It says the default is 5%.  I haven't touched
it, so if I'm reading this right, there should be 75M or less of disk cache.
Hmmm...Linux uses all free memory as disk cache.  A much nicer setup.

Well, if that's the case then I'm not sure what makes those benchmark numbers
so goofy.


#348 of 547 by gull on Thu May 22 17:12:09 2003:

Re #344: I think that's starting to make sense, yes.  Unless it turns
out the performance hit you're seeing is an artifact of your testing
method, we may be better off going with using the disks "straight". 
Getting only 20% of the potential performance of the disk subsystem in
exchange for easier recovery on the rare occasions when we have disks
fail doesn't seem like a good tradeoff.  I'm still having trouble
believing RAIDframe is *that* inefficient, though.


#349 of 547 by cross on Thu May 22 18:20:45 2003:

So am I; it seems unreasonably slow, and it looks vaguely like the
numbers start to converge when you have many processes working at
once, which is the normal mode of operation.  I'd be interested in
seeing what a test simulating a timesharing load would be like.


#350 of 547 by lk on Fri May 23 03:37:09 2003:

One simple way to "fool" the kernel into thinking that NextGrex has less
memory is... to remove all but one memory module. Guaranteed to work. (:

You might also want to test mirroring. Might be more efficient (less CPU
utilization for striping and no extra parity data) while offering both
availability and redundancy.  The "cost" here is 50% drive overhead.
The boot disk, with the system partitions (and /tmp or was that IDE?)
could be one disk while the other pair could be mirrored.


#351 of 547 by janc on Fri May 23 13:49:36 2003:

I'm reluctant to take the machine apart for such purposes.  Anyway, I'm a
software guy.

I certainly agree that we need better benchmarks, but I'm not sure how to
obtain them.  Anyone with better ideas is welcome to suggest them.  Those of
you with accounts on the system can probably run them yourselves, as the
relevant disk partitions are permitted 777.  We really want to get some sense
of how RAID would affect a realistic multi-user load.

I tried running a benchmark with a really small file, one where you should be
getting lots of use from cache.  Here are the 50 MB and 2000 MB results.  Explain
this, if you will:

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
raid 5     50 24882 20.9 22641  3.5  3731  1.2  7555  9.6 64346 14.7 511.7 3.1
raid 5   2000  9520  6.8  7974  1.3  5706  2.0 50932 62.5 63815 13.0 147.9 1.6

The small run has much faster output, and significantly faster seek times.
The block read is about as fast as the large file (suggesting that it is
mostly reading from buffer).  But what's going on with the per char read?

Note that the sequence of the tests is:

    Per Char Output
    Rewrite Output
    Block Output
    Per Char Input
    Block Input
    Seek

So it may be that the Per Char read was from disk, but left the entire file
in cache, so the block read was then very fast.  But why wouldn't it already
be in cache after the block output?  And why would the same speed be
achieved on the Block Read with the 2000M file, which can't have all been
in cache?

I don't think I know enough about how buffering and disk I/O works in OpenBSD
to really interpret this stuff.


#352 of 547 by aruba on Fri May 23 14:06:36 2003:

The information on the Bonnie web page (http://www.textuality.com/bonnie/)
makes it sound like the tests are designed to correct for caching.  There's
some info there on how to interpret the results.


#353 of 547 by janc on Fri May 23 15:00:00 2003:

Maybe to help more people figure out what is being discussed here,
I should give a brief overview of RAID.

RAID stands for "Redundant Array of Inexpensive Disks" (the I-word
varies).  Someone wrote a paper once upon a time surveying various options for
putting a lot of small disks together, and named the variations RAID 1,
RAID 2, RAID 3, RAID 4, and RAID 5.  The RAID 0 name was coined later and
isn't really RAID.  The interesting ones are RAID 0, RAID 1 and RAID 5.
I'll also discuss RAID 4 because understanding it makes RAID 5 easier
to understand.

Suppose you needed a 100 Gig disk, and all you had was ten 10 Gig disks.
Well, you could put them all together in a box, and write a little
controller that would write the first 10 Gig to disk one, the next 10
Gig to disk two and so on.  To the computer, your box would look like
a single disk.

The performance of this disk array wouldn't be so hot though.  Most
programs access files sequentially, so as the 100 Gig file was read,
we'd first have disk one very busy, while the other nine sit idle,then
disk two would be busy, and so forth.  It'd be nice to balance the load
among the disks.

Which brings us to RAID 0 - also known as striping.  We slice the disks
into 32K chunks.  As you write a big file to the disk, the first 32K
goes to disk one, the second 32K to disk two, on through the tenth
32K chunk going to disk ten.  That completes a stripe.  The eleventh
32K chunk goes to disk one again.  This balances the load over all ten
disks, so you get better performance.  You can vary the chunk size for
different applications.
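
The chunk bookkeeping is simple round-robin arithmetic.  A toy sketch (just
illustrative, not RAIDframe's actual layout code) of how a striped array maps
a byte offset in the virtual disk to a physical disk and offset:

```python
CHUNK = 32 * 1024   # 32K stripe unit
NDISKS = 10

def locate(logical_byte):
    """Map a byte offset in the virtual disk to (disk, offset on that disk)."""
    chunk = logical_byte // CHUNK
    disk = chunk % NDISKS            # chunks rotate round-robin over the disks
    stripe = chunk // NDISKS         # which full stripe this chunk falls in
    return disk, stripe * CHUNK + logical_byte % CHUNK

# The first 32K chunk lands on disk 0; the eleventh wraps back to disk 0,
# one stripe (32K) further in.
print(locate(0))            # -> (0, 0)
print(locate(10 * CHUNK))   # -> (0, 32768)
```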

So RAID 0 gets you a large virtual disk and balances load over your
drives.  It doesn't give you any increase in reliability.  Quite the
contrary.  If a drive dies, then instead of losing a 10Gig hunk of data,
you lose lots of 32K hunks of data scattered through all your data.
This is probably harder to restore.

Load balancing over multiple spindles would be nice for Grex, but not
vital.  We don't have just a single process reading the disk sequentially.
Increasing the difficulty of reconstructing the file system after a
disk crash is too high a cost to pay for slightly better load balancing.
I think we can rule RAID 0 out as an option.

There is no Redundancy in RAID 0 (so it should be called "AID 0").
Real RAID starts with RAID 1 - also called "mirroring".  We are still
trying to make a virtual disk out of many real disks.  This time we'll
group our ten 10Gig disks into five pairs, disk 1A, 1B, 2A, 2B, etc.
Whenever we write data to disk 1A, we also write a copy of the same data
to the corresponding location on disk 1B.  The first obvious effect is
that our virtual disk only contains 50 Gig instead of 100 Gig.  But now,
if disk 1B dies, we have an up-to-the-nano-second backup copy.  We can
replace the disk 1B with a new disk, copy the contents of disk 1A onto
it, and be back up and running with no loss of data.

Ideally, in RAID 1, we'd do the writes to the two disks simultaneously,
so writing is no slower than reading.  (In software implementations of
RAID 1, this may not entirely work.)  On reads, we don't have to read
from both disks.  We just select the one that is less busy at the moment
and read from that.  So, we get decent performance and the capability
to survive a single drive failure, but at the cost of half our disk space.

I've heard of RAID 0+1, but not read much about it.  I assume it's just 
striping over the 5 pairs of mirrored disks in the example above.

RAID 4 is an attempt to get the same benefits as RAID 1, but with less
loss of disk space.  This time we call 9 of our disks "data disks" and
the other one a "parity disk".  Parity just means "even" or "odd".  The
129th bit stored on the parity disk depends on the values of the 129th
bit stored on the other nine drives.  If an odd number of those nine bits
are 1's, then a 1 is stored at that location on the parity disk.  If an
even number of them are 1's then a 0 is stored at that location on the parity
disk.  In geek terms, the content of the parity disk is just a bit-wise
exclusive-OR of the contents of all the other drives.

Suppose a drive dies.  If it was the parity drive, we can just recompute its
value from the other drives.  But what if a data drive dies?  Well, we have
all the other drives and the parity drive.  So for each bit we have something
like:

   data1 data2 data3 data4 data5 data6 data7 data8 data9 parity
     1    0     1     X     0     1     0     0     1      1

The parity bit is 1, so we originally had an odd number of 1's on the
data disk.  There are 4 ones on the surviving drives, so the bit on the
dead drive must have been 1.  (In fact the dead drive's contents are just
the bit-wise exclusive-OR of all the surviving data and parity drives, so
the reconstruction process for a dead data drive is identical to the 
reconstruction process for a dead parity drive).

So, this is cool.  We now have a virtual drive holding 90Gig of data, so
we've lost only 10% of our storage, and we can still reconstruct all the
data on any lost drive.
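
The worked example above can be checked mechanically.  A toy sketch in plain
Python (nothing RAIDframe-specific): parity is the XOR of the data drives, and
a dead drive is rebuilt by XORing everything that survives:

```python
from functools import reduce

# One bit position across the nine data drives, from the example above
# (the 'X' on drive 4 is the bit we will pretend to lose; it was a 1).
data = [1, 0, 1, 1, 0, 1, 0, 0, 1]
parity = reduce(lambda a, b: a ^ b, data)    # parity drive holds the XOR
print(parity)                                # -> 1 (odd number of 1's)

# Drive 4 dies; rebuild its bit from the survivors plus the parity drive.
survivors = data[:3] + data[4:]
rebuilt = reduce(lambda a, b: a ^ b, survivors + [parity])
print(rebuilt == data[3])                    # -> True
```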

There are some additional performance costs though.  The first problem is
the parity drive.  Every time you write data to a drive, you have to update
the data on the parity drive.  So though data writing is split over nine
drives, parity writing is all on one drive, so that drive is nine times as
busy as the other drives.  It becomes a performance bottleneck.

The solution to this problem is RAID 5 - stripe the parity data over all the
drives.  For example, the parity data for the first 32K of all the drives
would be on drive 1, the parity for the second 32K of all the drives would
be on drive 2, and so on.  So there is no one parity drive and parity is
spread over all disks.  (Note that disk reconstruction doesn't change -
you still just exclusive-OR all the other drives to reconstruct the
lost drive.)
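
The rotation itself is just round-robin.  A sketch (again illustrative, and
0-indexed where the text above counts drives from 1):

```python
NDISKS = 10

def raid5_layout(stripe):
    """For a toy RAID 5: which disk holds parity for this stripe, and which
    disks hold its data chunks."""
    p = stripe % NDISKS                           # parity rotates round-robin
    data_disks = [d for d in range(NDISKS) if d != p]
    return p, data_disks

print(raid5_layout(0)[0])    # -> 0: stripe 0's parity is on the first disk
print(raid5_layout(10)[0])   # -> 0: after ten stripes the rotation wraps
```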

There is a second performance hit in RAID 4 and 5 though.  Like RAID 1, every
write is to two drives - data to one drive and parity to another.  However,
before we can write the parity, we have to compute the parity, and that means
we need to read the corresponding data from the other eight data drives.  So
a simple write turns into 8 reads and 2 writes.  (A common optimization reads
just the old data and old parity instead - 2 reads and 2 writes - but either
way a single write costs several disk operations.)

Also, in RAID 1, we were able to improve read performance by always reading
the data from the less busy drive of the two that had the data.  In RAID 4
and 5, the data is only on one drive, so we can only read it from that drive.
However, we like to assume the striping in RAID 5 will balance the load among
the drives pretty well anyway.

There are lots of hardware RAID devices that optimize this kind of thing, but
we can't afford them.  The option we are considering is software RAID, which
is implemented in the OpenBSD kernel by a program called RAIDframe.  It's
pretty solid and rather nice.  You can set up a RAID array, possibly with
spare drives.  If a drive fails, and there is a spare on-line, it will
automatically bring the spare on line, reconstruct the lost data and proceed
without interruption of service.  If there are no spares, it'll run with a
drive short (in RAID 5, any read from the dead drive is simulated by reading
from all the others and exclusive-ORing them).  This is all terrific if you
need a server up 24x7, which Grex doesn't really need.

Note that the redundancy in RAID gives you some protection against single
disk failures (it's assumed that you do something before the second disk
dies).  It does not replace a backup.  If you accidentally delete the wrong
file, or a vandal breaks in and alters all your files, the RAID will give
you nice redundant copies of the altered files, not the original ones.
So RAID is not a substitute for backups.  It's protection against hardware
failure and that's all.

RAID 0 can give you some performance enhancements by load balancing.  The
other versions of RAID are all likely to be slower than a non-RAID setup,
especially if implemented in software.  RAID 0 doesn't cost you any disk
space.  The other versions are going to eat up some of your disk space.
In our case, since we have 3 drives, RAID 1 doesn't quite work and RAID 5
would eat up 1/3 of our disk space.


#354 of 547 by janc on Fri May 23 15:03:44 2003:

OK, that wasn't so brief.  But writing it just made me more sure that RAID
isn't right for Grex.  The problem it is primarily designed to solve isn't
an important issue for Grex.

I may do some experimenting with rsync, and see if I can get a sense of how
expensive it would be to regularly rsync to the IDE disk.


#355 of 547 by gull on Fri May 23 15:43:07 2003:

Where I work, we use rsync to keep a mirror of about 50 gigs worth of
data.  We're doing it across the Internet, via a T1, as well.  It does
cause a fair amount of disk thrashing on both ends when it figures out
what files need to be transferred (very much like doing a 'find' across
the filesystem) but overall it seems very efficient.  It's worked well
for us.  My guess is the "expense" of doing an rsync to another local
disk a couple times a day is going to be pretty low, especially since
you're not transferring over a network and so won't need to involve ssh
or compression.


#356 of 547 by cross on Fri May 23 15:51:36 2003:

I ran a benchmark last night, one of my own design.  It's nothing really
fancy or scientific; I wrote it a few years ago to try and get a feel for
how various disk subsystems and filesystem types handled a load I thought
was fairly typical of timesharing style machines.  Basically, it just
copies a bunch of 32KB files all over the place.
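cross's actual script isn't posted, but the general shape of such a
benchmark is easy to sketch: make a pile of 32 KB files and copy them
around.  A stand-in, not his code (the path and file count are arbitrary):

```shell
#!/bin/sh
# Rough stand-in for the benchmark described above: create 100 files of
# 32 KB each, then copy them en masse.  (A real run would wrap the copy
# step in time(1) and use many more files.)
DIR=/tmp/bench-demo
rm -rf "$DIR"
mkdir -p "$DIR/a" "$DIR/b"
i=0
while [ $i -lt 100 ]; do
    dd if=/dev/zero of="$DIR/a/f$i" bs=32k count=1 2>/dev/null
    i=$((i + 1))
done
cp "$DIR"/a/* "$DIR/b/"
copied=$(ls "$DIR/b" | wc -l)
echo "$copied files copied"
```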

Running on both the IDE and SCSI drives took about 4 seconds.  Running on
the RAID took around 80 seconds.

Something is wrong here; there's no reason RAIDframe should be *20 times*
slower than a `normal' filesystem, I just can't believe it's that bad.
Perhaps I'm wrong about the stripe size; maybe 64 is just too small.  Jan,
could you up it to 256 and see if that helps any?  I see at least one
post from someone who says they used an interleave size of 168 and got
decent performance, but 32 (and probably 64) was too small.


#357 of 547 by janc on Fri May 23 16:02:39 2003:

Right.  I installed rsync from the ports tree (I like the ports tree).

I then went to /sd0 (the test partition on the first scsi disk) and did
  time rsync -ax /usr .
This should copy the whole /usr partition from the IDE disk to the SCSI
disk (which is backwards from the direction we would be going) and give
me some statistics.  The /usr partition contains 664,632K of data.  The
result from 'time' was:

   12.0u 24.5s 4:28.34 13.6% 0+0k 161046+660947io 36pf+0w

So it took 4.5 minutes elapsed time, eating 13.6% of an otherwise idle CPU.
I then reran it.  In this case it should be checking the two copies against
each other, and copying over only what changed (little or nothing).  The
time result was:

  3.8u 3.5s 0:46.13 16.1% 0+0k 47000+1454io 1pf+0w

This took 46 seconds.

In real life we'd want the --delete option on the command, so files that
don't exist on the source are removed from the copy, but I didn't do it in
the test because I was paranoid about getting the arguments backward.  Even
so, we'd want our target partitions rather larger than the source partitions.
Maybe just one big target partition instead of separate ones corresponding
to the different source partitions, the whole thing readable only by root
and possibly unmounted when it isn't being updated.

Doing this a couple times a day seems a much lower-impact way to get data
redundancy than RAID.

It'd be tempting to keep two copies of some partitions, and update them on
alternate days.  Dunno if that's necessary.

This is not a substitute for real backups to tape, of course.


#358 of 547 by janc on Fri May 23 16:04:33 2003:

Dan slipped in.  I'll try reconfiguring the RAID.


#359 of 547 by janc on Fri May 23 16:52:11 2003:

OK, I reconfigured it with a 256 K stripe size.  The current config file
is in /etc/raid1.conf.  Running bonnie now.
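For the record, changing the stripe size means tearing the array down and
rebuilding it; the sequence is roughly the following (device names as on the
test box, flags per raidctl(8) -- treat this as a sketch, not a recipe):

```sh
raidctl -u raid1                  # unconfigure the array
vi /etc/raid1.conf                # change sectPerSU in the START layout section
raidctl -C /etc/raid1.conf raid1  # force-configure with the new layout
raidctl -I 2003052301 raid1       # write fresh component labels (serial is arbitrary)
raidctl -iv raid1                 # rewrite parity
newfs /dev/rraid1a                # the old filesystem no longer matches
```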


#360 of 547 by jep on Fri May 23 16:55:38 2003:

How much would a RAID controller cost?  I'm sure Jan is right that it'd 
cost too much, but if there are substantial benefits maybe the users 
would spring for some more money.

I'm not sure the benefits would be all that substantial in any case.  
We're going to be on brand new spiffy hardware and I expect that will already 
mean a big improvement in reliability.  Grex isn't unreliable even 
now.  But it seems like it'd be easier to discuss it now than after the 
new machine is in place and in use.


#361 of 547 by scg on Fri May 23 17:11:23 2003:

I want to dispute the claim that since Grex doesn't have to be up all the
time, the high availability provided by RAID isn't important.  Grex doesn't
pay anything for its staff time, but it is a scarce resource.  The difference
in staff time required to format a new disk and restore data to it, versus
just putting in a new disk and letting it happen automatically, is huge.

I too am curious about the costs of hardware RAID controllers.  It's been
years since I looked at such things, but given that they were widely
available three or four years ago, I'm surprised to hear the price hasn't
come down.


#362 of 547 by janc on Fri May 23 17:16:52 2003:

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU


raid 64  2000  9520  6.8  7974  1.3  5706  2.0 50932 62.5 63815 13.0 147.9 1.6
raid 256 2000  9483  7.1  8768  1.9  5443  2.6 56017 67.9 70599 14.3 183.1 1.3
scsi     2000 53754 43.4 54106 14.1 10090  2.6 60326 70.9 61067 11.5 201.2 0.8

OK, the second line is RAID with stripe size of 256 kiB instead of 64 kiB.
Generally things are better, but not dramatically so.  (Doing
'raidctl -sv raid1' confirms that it did get reconfigured.)

Generally, if you do a large number of small reads and writes to small files,
then a large stripe size is better, and if you read a smaller number of larger
files, a smaller stripe size is better.  Grex probably belongs on the larger
end of the spectrum.


#363 of 547 by janc on Fri May 23 17:26:02 2003:

Note that we have a hardware RAID controller on our motherboard, a "Promise"
device whose model number I've forgotten.  It works only with IDE drives and
is not supported by OpenBSD (they don't seem to think they are going to
support such things either).  So, there is a wide range of hardware RAID
controllers with different capabilities and prices.

Recovering from a disk crash certainly costs less staff time with RAID.  But
how often does it happen?  If you have a recent snapshot on another disk,
recovering from a disk crash isn't all that hard even without RAID.  Amortize
the time difference over the low frequency with which it happens, and I don't
see much weight to that argument.


#364 of 547 by jep on Fri May 23 18:45:16 2003:

I did a quick search on RAID controllers, and saw prices in the mid-
several hundreds ($300-700).  I don't know anything about what value 
would be provided by the different types.  I am not in position to 
analyze the number and effects of disk hardware failures, either.  I'm 
only asking a question.


#365 of 547 by gull on Fri May 23 20:27:34 2003:

Also, OpenBSD's hardware support is pretty limited even compared to
other open-source operating systems, so you can't buy just any RAID
controller and expect it to work.


#366 of 547 by janc on Fri May 23 20:59:37 2003:

http://www.openbsd.com/i386.html#hardware includes a list of hardware RAID
controllers supported by OpenBSD.  Not that I think we should get one.


#367 of 547 by lk on Sat May 24 03:12:14 2003:

As jep said, you can get a decent RAID controller for about $400.
OpenBSD drivers, though, are another matter.

I think Grex needs to move forward. The 2nd guessing can continue for
years, but the hardware is already in place (perhaps there should have
been more discussion earlier). Keep in mind that what we're "bickering"
over is what may (or may not) be a little bit better than the alternative.

Having said that, what about my idea?!  Have one boot disk with all the
(rarely changing) system directories on it and then configure the other
two "data" disks as RAID 1 (mirroring).  It entails 50% disk "waste",
but shouldn't have the performance hit while retaining availability
and redundancy.

After all, we live in compromising times....   (:


#368 of 547 by jep on Sat May 24 04:09:16 2003:

I didn't have the impression I was holding anything up, or that anyone 
else was, either, with the questions about RAID.  Dan has been making 
what appear to be useful suggestions -- I can conclude that, if only 
that Jan has been accepting some of them.

As for my part, I think it's clear enough to everyone here that I 
shouldn't have any input about RAID.  I've never set up a RAID system.

If there's a choice for a staffer between doing anything about the new 
system, and answering one of my questions or comments, by all means, 
work on the system.  (As if I even have to say that.)  


#369 of 547 by i on Sat May 24 12:47:47 2003:

Back in janc's "Intro to RAID":
   RAID 5 turns a disk write into 2 reads & 2 writes.  Better than what
janc suggested that grex (with 3 disks, not 10) would face, but still
not good when (i believe) grex is doing plenty of writes.  (Is it?) 
   Good hardware RAID (with dedicated hardware to do parity calculations,
lots of private cache memory to reduce disk activity, etc.) could improve
this.  But disk space is cheap enough these days to make RAID 1 the way
to go if one wants redundancy in a "lots of writes" situation.  (At least
for our size & budget.)  RAID 1 is also considerably easier to do 
"acceptably" in software, and great software RAID is obviously not a 
priority for OpenBSD. 
   If we're eager to avoid downtime, a spare hard drive's great to have. 
When a dead drive has you down or limping, there's often a huge downtime
difference between "have an identical, well-tested spare drive on hand"
and "rush to research suitable replacement models, where they might be
bought, costs, and lead times".  *Especially* since different generations
of SCSI hard drives sometimes fail to "play well together" in flaky,
intermittent ways.


#370 of 547 by cross on Sat May 24 16:03:22 2003:

Hmm.  It would appear that RAID5 performance is just unacceptably slow
with RAIDframe in OpenBSD.  Weird; I'd have thought it'd be better.  Oh
well, it's not the first time I've been wrong.

If a hardware RAID controller is $400, one would have to weigh the cost
of buying one of those versus bying another SCSI disk for $200 and using
raid 0+1 (mirroring, and striping over the mirrors).  That I am reasonably
confident would be fast.  Is it worth it for grex?  That's another matter.
I agree with scg that it is, but I'm not paying all the bills.

I disagree with Leeron that doing mirroring by itself is the way to go;
I think the price/performance ratio isn't worth it.


#371 of 547 by lk on Sat May 24 16:18:03 2003:

Sorry, jep, I didn't mean to imply that you (or others) were holding up
anything. I certainly have no idea what the implementation time frame is.
For all I know, Grex budgeted the next 3 months for such discussion
before finalizing NewGrex and putting it on-line.  (:

There's a lot of worthy discussion here and many good suggestions.
But I do know how over-discussion can become negative on a BBS, and I don't
want to see that happen here.  Not to sound like the US Patent Office
commissioner of 125 years ago, but I think all the constructive comments
about RAID, with all its pluses and minuses, have been made.  It's time to
make a decision....

These are the points I'd consider:
(Note that whether RAID is useful for Grex almost becomes a moot point)

1. We have no RAID controller
(and I'm not impressed by the list supported by OpenBSD)

2. The software RAID-5 performance rules that out.

3. Software RAID-1 remains a possibility.
(At least Walter and I think so.)


#372 of 547 by janc on Sat May 24 23:04:40 2003:

Yeah, it occurred to me a little after I wrote my introduction to RAID that
there were more efficient ways to maintain parity on writing - you can read
the parity disk, and the value you are about to overwrite, and use those to
compute the new checksum.  So 2 reads and 2 writes suffice no matter how
many disks you have in a RAID 5 array.  Walter's correction stands.
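The arithmetic behind this is just XOR, and it's small enough to demonstrate
with shell arithmetic, using single numbers as stand-ins for disk blocks:

```shell
#!/bin/sh
# Toy model of RAID 5 parity: three data "blocks" plus their XOR parity.
d0=37; d1=104; d2=211
parity=$(( d0 ^ d1 ^ d2 ))

# Small write: read the old data and old parity, XOR the old value out
# and the new value in -- 2 reads + 2 writes, however wide the array is.
new_d1=99
parity=$(( parity ^ d1 ^ new_d1 ))
d1=$new_d1

# Lose drive 1: reconstruct its block from the survivors.
echo $(( d0 ^ d2 ^ parity ))    # prints 99
```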

I'm not at all unhappy with this discussion.  I think we are still in a mode
of usefully exploring options and collecting data.  If I feel the discussion
is stagnating, I'll bring it to completion, by declaring a solution by fiat
if necessary, though I'd prefer to boil it down to a few options and get some
concensus among staff.  (If Marcus weren't out of town this month, I'd
probably call a staff meeting.  We'll need one after he's back in any case.)

I'm interested in Leeron's RAID 1 suggestion.  Two disks in RAID 1 and one
disk plain wastes 1/3 of our space, just as RAID 5 would have.  If RAID 1
performs substantially better than RAID 5, then this might be a viable option.
The performance is going to have to be pretty good to convince me that this
is better than the rsync option though.  However, I plan to rearrange two disks
into a RAID 1 array tonight, so we can benchmark that.

The other project I'm pursuing is improving my understanding of Grex's disk
usage patterns.  If you're reading this in coop, you may want to check out
Garage item 150 (I think) where I recently posted some statistics on old
Grex's disk usage.  Preliminary results seem to indicate that most of Grex's
disk usage is on the /var drive (which does not include /var/spool/mail). 
Apparently what Grex does most of is logging.  More than half the disk
activity is there, and it is almost all writes, not reads.  I want to keep
investigating this.

I don't think we are in any special hurry to get the new Grex up, but I want
to keep the process in motion, not letting it stagnate or stall.  We are not
stalled.  Things are good.


#373 of 547 by janc on Sun May 25 03:08:21 2003:

OK, I've re-arranged the disks once again.  Now /sd0 is a plain filesystem
made up of SCSI disk 0, and /raid is a RAID 1 array consisting of SCSI
drives 1 and 2.

             -------Sequential Output-------- ---Sequential Input-- --Random--
             -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine   MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU

scsi    2000 53754 43.4 54106 14.1 10090  2.6 60326 70.9 61067 11.5 201.2 0.8
raid 1  2000 16651 13.5 19368  3.4 10702  3.6 61614 73.5 68343 14.5 197.9 1.5
raid 5  2000  9520  6.8  7974  1.3  5706  2.0 50932 62.5 63815 13.0 147.9 1.6

This is definitely performing much better than RAID 5, but the writes are
still rather on the slow side.  (Though we do seem to be getting a slight
win on the READ side - looks like it is balancing reads across the two
disks well enough to get a moderate performance win over a single disk.)

Dan's multi-process benchmark might be worth trying.

Leeron's idea was to use RAID 1 for the more ephemeral partitions -
partitions where data changes rapidly, and restoring from a week-old
backup tape after a crash might be unsatisfactory.  RAID 1 would provide
a full backup of that data.

So, the RAID might have /bbs, root (mainly for /etc/passwd), /var
(current log files).  The regular disk might have /usr, /usr/local, etc.
Dunno where users would go.

The problem is, that the partitions whose contents change a lot (and are
thus more interesting to keep a real-time mirror of) also tend to have
a lot of writes.  So putting a partition like /var, which is almost
write-only, on RAID would be pretty unattractive from a performance
point of view.  So there's a bit of a paradox here - RAID's advantage
over rsync is greatest when writes are frequent, but its performance
suffers most under those circumstances.

The one partition where RAID 1 looks good to me right now is /bbs.
More reading than writing certainly happens there, but there is enough
writing so that keeping a mirror would be nice.  I guess user partitions
would be a possibility for RAID too.


#374 of 547 by cross on Sun May 25 04:03:59 2003:

Hmm, I don't know.  The more we look at the performance numbers, the
less and less impressed I am, to the point of actually being really
disappointed in RAIDframe.  It almost doesn't seem worth it.

Doing something like RAID 1+0 might be better, but would require another
disk to really be useful.

I'm not sure I think doing RAID on one partition alone is really worth
it, the rationale being that if a disk dies, without using RAID everywhere,
you have to do a lot of work to bring it back online.  Doing the same work
with one or two partitions more doesn't seem like that much of an added
incremental cost.  That doesn't solve the problem of lost data, though.
One solution to that would be to leave a tape in the tape drive all the
time, and do a nightly full backup of /bbs and the user partitions (just
overwrite the tape).  Every now and then, do a full backup of everything
on a separate tape and keep it for posterity.


#375 of 547 by cross on Sun May 25 04:15:03 2003:

FYI, I logged back into the nextgrex machine and re-ran my simple
benchmark.  The one that took 81 seconds on the RAID5 partition (using an
interleave size of 64; I didn't get a chance to try it on the one with an
interleave size of 256) took about 5 and a quarter seconds on average.
Almost a 20-fold speed increase.  Bonnie shows that performance on
a mirror is about 1/3 that of a straight disk.  With another disk,
I'd champion using RAID 1+0, as I'm guessing that would be in the same
general area performance wise as `normal' partitions, while still giving
high availability.  It'd cost another $200 to get another disk to do
it, though.


#376 of 547 by janc on Sun May 25 13:51:58 2003:

Yeah, I think later today I'll make another pass at designing a
RAID-less partition scheme.  This has all been very educational, and
RAID 1 is almost good enough to use, but I don't feel it is quite good
enough.


#377 of 547 by scg on Tue May 27 18:51:34 2003:

I should note that I'm not pushing hard for using RAID.  My impression has
been that RAID is a good thing, all other things being equal, but I don't
know enough about it to make a good choice.

What I would object to, and what it seemed to me was being advocated in some
of the earlier arguments, is designing for low availability.  There are all
sorts of things it makes sense to design for in various situations, such as
low cost, low maintenance, high performance, high availability, and so forth,
and declaring one of those to be a high priority generally involves tradeoffs
in other areas.  If cost or performance are determined to be more important
than high availability, I might agree and I certainly wouldn't argue.  What
I was objecting to was skipping RAID purely on the grounds that high
availability isn't needed, and it doesn't sound to me like that's what's
going on here anymore.


#378 of 547 by janc on Tue May 27 20:35:54 2003:

No, that certainly isn't my thinking here.  RAID costs a lot of disk space,
which we can probably afford.  RAID, at least as implemented in software under
OpenBSD, seems to have a pretty huge performance penalty.  Much bigger than
it theoretically should have.  High availability *IS* the benefit of RAID.
(RAID 0 and well-implemented RAID 1 might give you performance benefits, but
in most versions of RAID other overhead will eat any performance benefit).
I like RAID, but my feeling is that high availability isn't important enough
to Grex to justify its other costs.

I may be wrong.  It may be that this new computer is going to be so fast,
that running Grex will hardly load it, and the performance cost of RAID
wouldn't mean anything to it.  If so, we should consider moving onto RAID
in the future.  I don't think making that change later will be hard.  We
need to rebuild the system every year and a half anyway, and changing the
disks from flat disks to RAID does not have broad implications for the
rest of the system configuration.


#379 of 547 by lk on Wed May 28 04:38:45 2003:

With all due respect, check out the speed of M-Net these days (arbornet.org).
I'm not up-to-date on the hardware specs, but I'd assume it's running on
a CPU with 1/3 to 1/4 the horsepower, and slower drives.


#380 of 547 by cross on Wed May 28 06:00:12 2003:

Hmm, I use mnet...every couple of days or so.  It's usually quite
fast, but I don't believe they're using RAID.  What's more, they're
running FreeBSD, which has a different RAID implementation yet again.
Leeron, what are you referring to that one should note in terms of mnet's
performance?


#381 of 547 by janc on Wed May 28 14:42:12 2003:

I haven't been on M-Net for a while - but it also generally had fewer users
than Grex.

Generally I expect that the new Grex will be way too fast for the load the
current user base will put on it.  However, the user base may grow with better
performance.  Also we will be turning on quotas, which is going to put some
drag on the disk performance - that's a lot more important to me than RAID.
Also Grex occasionally gets hit by vandals - I just spent some time tracking
down a mailbomber who was slowing the system down badly.  How will the new
Grex perform under those conditions?  I don't know.  I think we'll need to
gain experience with the new computer before we can really decide this.
I think we can reconfigure to use RAID later if we feel the need.  I think
we could do such a reconfiguration in a day, if needed.


#382 of 547 by cross on Wed May 28 16:13:17 2003:

Sounds good to me.  Also, going to the next grex allows one to do some
things that I think will be beneficial, such as turning off the queueing
telnet daemon (the queue is almost always empty, anyway, except in like,
5% of all cases), using a new version of SSH, ditching sendmail in favor
of something like postfix, etc.


#383 of 547 by tod on Wed May 28 16:58:41 2003:

This response has been erased.



#384 of 547 by janc on Thu May 29 01:18:49 2003:

Well, I didn't get much work on next Grex done today, but I built a
respectable castle out of Lego, so the day isn't entirely a waste.


#385 of 547 by aruba on Thu May 29 03:07:02 2003:

We finally received our OpenBSD CDs today - it took them 16 days to get here
from Calgary.  Stickers were included.


#386 of 547 by spooked on Thu May 29 11:00:41 2003:

I have an inkling Marcus won't ditch sendmail as readily as you might
wish, Dan.  It seems to be one of his favourite hacking toys.


#387 of 547 by janc on Thu May 29 13:43:24 2003:

I don't know what his plans are, but I'd be surprised if he didn't seriously
consider alternatives.  I think the port to OpenBSD is going to be a bit of
a "start over" for him even if he decides to stay with sendmail, because
moving all his modifications into a current sendmail release is going to
be nearly as much work as switching to a different program.  I don't think
he's really all that fond of sendmail.


#388 of 547 by cross on Thu May 29 16:01:16 2003:

He was talking about exim recently, but I still think postfix is a better
choice.  Grex isn't for people's personal hacking toys, anyway.


#389 of 547 by gull on Thu May 29 17:05:18 2003:

I use Exim, and it's certainly easier to configure than sendmail.  It
has a good, flexible filter language, too.  It doesn't have the same
privilege separation features as Postfix, though -- it's still a
monolithic binary.


#390 of 547 by cross on Thu May 29 19:31:58 2003:

Yeah, that's one of my problems with exim.  I honestly believe that
postfix is just as powerful, can be made to do everything that grex
wants/needs, and is more secure.  I also argue that it's better documented.


#391 of 547 by mdw on Tue Jun 3 07:14:27 2003:

Major overload here.  Hm.

Regarding old hardware.  Even when we switch over, we'll want to keep
the old stuff intact for at least a bit in case of some sort of truly
disastrous problem with the new hardware.  Once we're comfortable, then
we can decommission it.  The disks, being slightly newer, may have some
slight use for other small projects.  Much of the data on them doesn't
really matter, but for mail, spool, user files, swap, and /etc, we
certainly want to scrub those before using them for other purposes or if
we decide to sell or give away any of them (even to my basement
collection).  Scrubbing them *is* going to be an easier way to ensure
reasonable security than destroying the disks.  This is because
sufficient physical destruction has its own issues.  Disassembling
things then bashing them with a hammer and using a bulk eraser may make
data recovery more difficult, but it may still leave traces of data that
could be recovered by the same sort of determined adversary that could
recover data from a "single overwrite of all 0" drive.  If that is the
level of security you want, then physical destruction would require
either a fairly good acid bath of all disk surfaces, or probably better
yet incineration of the aluminum platters.  No doubt we have pyromaniacs
who would enjoy doing this, and there are certainly services that will
do this (for a fee), but we'd probably be better off reserving these
drives for future small projects (such as offloading mail processing,
kerberos, etc.) or doing a multiple overwrite scrub procedure then
selling them on ebay.

Regarding kerberos.  Cross and I clearly have an irreconcilable difference
of opinion here.  I'm clearly not going to change his opinion, there
seems little likelihood he'll change mine, and I doubt most others share
even my level of paranoia or care all that much.  So I don't want to
waste time arguing this.  Cross (and any others who care) is welcome to
change his password just before & after switching to kerberos, which
should cover any personal concerns he may have.  Root & other passwords
will almost certainly change or be addressed by new mechanisms - this is
almost inherent in any switchover in any case.  Once we switch to k5,
barring unexpected changes to the standard, changing one's password will
likely result in a standards compliant k5 key at least potentially
useful from other machines.

Regarding mail.  Hm, I think some of this scrolled off.  Yes, we want to
keep hierarchical mail boxes.  The 4 possible mta's include exim,
postfix, current sendmail, & legacy/hacked sendmail.  Unfortunately,
mail is an area where we have significant functional requirements, which
means any stock solution out of the box will almost certainly prove
unacceptable.  One functional requirement is mailbox quotas, which at
least has simple design parameters.  Another functional requirement is
anti-spam logic, which is both controversial and important.  A final
functional requirement, unfortunately, is that this all needs to come up
in some finite amount of time.  I intend to look at exim & postfix, with
a view that one of these should be a good enough base to support the
functionality we want.  As a fallback position, I am at least somewhat
willing to consider installing the current legacy/hacked sendmail, with
the understanding that it's both temporary and very very undesirable.  I
hope to spend time coding to avoid this possibility rather than
composing lengthy responses defending whatever choices I make here.

Hm.  Surely I've said at least half of this somewhere already?  Is this
useful?


#392 of 547 by cross on Tue Jun 3 07:23:23 2003:

Yes, it's useful.

Tell me, what do you think is so unique about grex's mail setup that
a stock solution won't work?  Surely postfix+procmail+spamassassin
could handle the load grex would put on it, complete with hierarchical
directories and mailbox quotas.  Much larger sites use that combo and it
works well.  In fact, if you went with putting mail in $home/Mailbox,
you'd get hierarchical mail directories for free, and eliminate a
filesystem.
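For what it's worth, the usual glue for that combination is a few lines of
procmail, along the lines of the stock SpamAssassin recipe (the folder name
is illustrative):

```
# ~/.procmailrc -- pass each message through SpamAssassin, then file
# anything it tagged into a spam folder; everything else falls through
# to the default mailbox.
:0fw
| spamassassin

:0:
* ^X-Spam-Status: Yes
caughtspam
```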

Regarding Kerberos: I'd feel more comfortable with using your hashing
algorithm if the guarantee was made that it would disappear from
the system's Kerberos implementation no more than one year after its
introduction, or some other suitable timeframe.  There's no reason not
to agree to that.


#393 of 547 by carson on Tue Jun 3 08:10:05 2003:

(re: anti-spam logic:  I agree that a procmail/spamassassin combo would be
a good move.  nearly all of the spam that I receive at my Grex account is
sent via open relays, which both SpamAssassin and SpamCop [via reporting]
recognize and flag as such.  whatever Grex is currently using, obviously,
does not.  given Grex's culture, I understand the reluctance to outright
block mail from open relays, but I'd like to think that, with Next Grex,
we should have sufficient processing capability to flag such mail.) 



#394 of 547 by mdw on Tue Jun 3 11:59:15 2003:

Mail has enough issues that perhaps it ought to be discussed in its own
item.  At some point, I will need to come up with a list of what grex
mail currently has as "custom hacks"; that's not the same as a list of
functional specs, but might beat idle speculation that just because
there are "a lot" of solutions out there there's necessarily a set that
matches our needs.  I hope we will be able to take advantage of other
people's work as much as possible.  But I don't think there are any
guarantees that we will necessarily find exactly what we want.

Regarding procmail+spamassassin; this can't reject mail, which would be
a significant step backwards spam-wise from what we can currently do.
There are other issues regarding procmail+spamassassin (such as
enforcing mailbox quotas, running perl on every piece of mail) that I
don't find particularly attractive on a system-wide basis.  I don't have
a problem with this as a user option, but I'm much more concerned what
to do for everybody else as a default.

Regarding RBL - grex gets listed on them just often enough that there's no
way I can see us wanting to do this.  RBL would be less unpalatable when
used in conjunction with other stuff as just one more clue something
"might" be spam.  I hope that whatever we end up will have the
flexibility to allow us such options, but not a sufficient or reasonable
solution on its own.

Regarding kerberos - there's no guarantee that *the* standard will
necessarily do what grex needs, especially right off.  Just for
starters, des/des3 have inadequate etype info, preauth methods to
reinforce weak passwords is lacking, aes is not yet fully standardized,
and there is argument that the default aes string to key ought to be
computationally intensive - fine for single/user workstations, not at
all a good match for a popular timesharing system.  I very much hope the
standard evolves to a point where it fully meets the needs I think we
have for it here on grex.  For the short term, our schedule means we
probably shouldn't wait for that, and when the standard converges to our
needs is
not something we can dictate.  So, I don't think such a promise as you
ask would be in grex's best interest.

It may be worth keeping in mind that until we deploy useful kerberoized
distributed applications from grex, the ability to kerberos authenticate
to grex from elsewhere will be almost entirely only of academic
interest.  The real compatibility issue we have to sort out in the short
term is not kerberos standards compliance, but how well does it fit into
openbsd supplied interfaces and the grex environment?  We aren't even
close to worrying about distributed desktop applications, single
sign-on, or making sure user passwords never leave the desktop.


#395 of 547 by cross on Tue Jun 3 17:02:53 2003:

That raises the question: why bother with Kerberos at all, then?

I don't understand how other, much larger sites get away with stuff like
using spamassassin on all incoming mail, and otherwise working on stock
anti-spam solutions, but grex can't do it.

Regarding timeframes; well, grex has already blown its one year timeframe.
Given that, it seems most profitable to just use the BSD login API to
deal with the custom hash algorithm and skip Kerberos for a later day.


#396 of 547 by cross on Tue Jun 3 17:05:14 2003:

Oh yes, regarding the Kerberos standard; I'm fairly sure the idea of
burning a bunch of CPU time has been discredited and rejected.  If not,
it's a tunable parameter, anyway.


#397 of 547 by gull on Tue Jun 3 18:34:27 2003:

I think the problem he has with spamassassin is more the principle of it.  We
still have to accept the mail, we can't slam the door in the spammer's face. 
With the new system the extra processing penalty of having to accept the
whole message before deeming it spam is probably not as important.

I've seen some really poor spamassassin results on a few mailing lists I've
been on, but part of that may have been poor configuration.  On one list in
particular spamassassin seemed to flag every message sent from Mozilla as
spam, while missing a lot of *real* spam.  I've kind of shied away from it
for that reason.


#398 of 547 by cross on Tue Jun 3 20:04:44 2003:

I hate to break it to you guys, but 99% of the time, you're not slamming
the door in the spammer's face as it is now.


#399 of 547 by carson on Tue Jun 3 21:45:58 2003:

(my vague recollection is that Kerberos will allow Grex to offload mail
processing to another machine in the future.)


#400 of 547 by cross on Tue Jun 3 21:50:03 2003:

I don't think that'll buy much on the new machine, frankly.  btw- I've
started a new item, #156 in garage, for discussing mail issues.


#401 of 547 by gelinas on Tue Jun 3 22:40:09 2003:

(The problem we are trying to solve, if I recall correctly, is network
bandwidth, NOT CPU/disk/etc.  In that sense, the door is being closed
fairly quickly:  my understanding is that the mail is being rejected
before the SMTP 'data' command.)


#402 of 547 by cross on Wed Jun 4 02:05:34 2003:

Postfix will do that.  btw- has anyone ever measured how much bandwidth
grex is really using?


#403 of 547 by mdw on Wed Jun 4 05:53:22 2003:

Kerberos does 3 things for us: gives a more graceful way to deal with
our existing password hash data, allows us to start developing
distributed applications (such as file service, mail, conferencing,
etc.), and eventually (if standards and availability permit) may allow
us to offload client side stuff to user workstations.  The first is
immediately useful to us all on its own.  The 2nd could be useful in the
next 1-5 years.  The 3rd is "pie in the sky" for now, but not
impossible.


#404 of 547 by janc on Wed Jun 4 13:33:25 2003:

Did I misunderstand that?  The modified password hash algorithm was justified
because it would ease the transition to Kerberos.  Now Kerberos is justified
because it works better with the modified password hash algorithm?  Seems
like the case for kerberos has to rest on points two and three.


#405 of 547 by cross on Wed Jun 4 17:11:23 2003:

Again, I think it would be easier to use the BSD login API to deal with
the custom hash algorithm now, and find some other way to transition to
a standard Kerberos KDC.  Particularly if you're not considering moving
to distributed services for another 1 to 5 years.


#406 of 547 by jp2 on Wed Jun 4 20:34:42 2003:

This response has been erased.



#407 of 547 by mdw on Thu Jun 5 02:31:02 2003:

The first is the reason to do kerberos now rather than wait another 1-5
years.  The second is the reason we want to do kerberos.  The third is
the biggest but longest term and least likely win.

kerberos 5 does do md5.  But the md5 there has nothing to do with Unix
md5 password hash.  The unix md5 password hash was an early example of
"computationally expensive" hash logic -- see above.
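The "computationally expensive" trick mdw mentions is key stretching: iterate the hash many times so every password guess costs the attacker more.  A toy sketch of the idea (my own illustration, not the actual Unix md5crypt algorithm, which interleaves salt and password in a more elaborate loop):

```python
import hashlib

def stretched_hash(password: str, salt: str, rounds: int = 1000) -> str:
    """Toy key-stretching: repeatedly fold the password back into the
    digest so each verification (and each guess) costs `rounds` hashes.
    Illustrative only -- not the real md5crypt algorithm."""
    digest = (salt + password).encode()
    for _ in range(rounds):
        digest = hashlib.md5(digest + password.encode()).digest()
    return digest.hex()
```

The salt keeps identical passwords from hashing identically; the round count is the tunable cost knob the Kerberos discussion above is arguing over.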


#408 of 547 by cross on Thu Jun 5 03:22:16 2003:

It's an extremely bad idea to take the hashed passwords from /etc/shadow
and use them as keys in a Kerberos KDC.  I already advocated a method for
solving this problem cleanly and easily, but Marcus basically ignored it.
See item 134 in the Garage conference.


#409 of 547 by dang on Fri Jun 6 03:48:38 2003:

(Spamassassin can be used to reject mail in the MTA, rather than via procmail.
This is not as good as the spam rejection we do now, because the entire mail
must be on Grex to run spamassassin against it. However, it will reject
considerably more spam. It's a tradeoff.)


#410 of 547 by aruba on Fri Jun 6 04:19:37 2003:

Yeah, I gotta say, the spam-rejection we're doing now is letting a whole lot
of spam through.


#411 of 547 by cross on Fri Jun 6 05:49:08 2003:

Maybe it's best to delay this discussion until we know exactly what
the mail filters do now....


#412 of 547 by scg on Fri Jun 6 07:07:05 2003:

I should note that one of the places I receive mail through runs SpamAssassin,
and most of the spam I get scores considerably lower on the SpamAssassin
scores than most of my legitimate mail.  Of course, I never see the mail
SpamAssassin does catch, and that's probably a considerable quantity of mail.

Rejecting spam in the MTA is obnoxious.  If you're receiving the spam directly
from the sender, it probably makes a lot of sense.  In general, though, most
of the spam I receive is forwarded through other lists and aliases, and the
return addresses are generally invalid, so bouncing spam just forwards it to
the postmaster of whatever mail server was forwarding it, disguised as bounce
messages of a sort the postmaster might actually need to look at.  Spam
filters should throw the spam away silently, with the caveat that you have
to be really careful to err on the side of assuming mail is legitimate, since
the sender of legitimate mail won't know the mail isn't getting through.  In
addition, if you're tagging spam in SpamAssassin and letting procmail do the
discarding, you give individual users some control over how much is being
deleted.

My complaint about SpamAssassin letting through too much spam doesn't mean
it's any worse than Grex's current filtering.  I started automatically
discarding all my Grex mail because more spam was coming through than I could
manage.


#413 of 547 by cross on Fri Jun 6 07:33:34 2003:

A good amount of the spam I receive these days comes through grex.
I don't bother forwarding it back to grex's uce alias since most of it
is the standard type of junk one typically sees.  That is, I doubt anyone
is going to learn anything from it that isn't already common knowledge.

I will note that spamassassin now includes a Bayesian filter that can
be `trained' to recognize real spam.  Running it and letting it do the
tagging can't hurt much.

There might be something to be said about generating bounces to
postmasters who are running open mail relays, but I tend to doubt it.


#414 of 547 by mdw on Fri Jun 6 07:51:25 2003:

It's extremely difficult to collect good statistics about the
effectiveness of many anti-spam measures.  Spammers tend not to complain
when they can't send spam, so we don't know how many gave up.  Natural
selection has clearly favored the evolution of spammers who get by
grex's defenses.

The problem with bayesian filters is that they have to be trained on an
individual basis to give the good results people quote.  It's not enough
to give a filter spam, it also has to be given ham, and each person's spam
and ham are different enough that a filter trained to one person's ham
isn't going to do so well at another's ham.  A group of people with
common interests may get acceptable results even so; and in a work
environment defining "common" may even be possible.  But I doubt there's
enough commonality amongst all grex users to achieve results of any
great value.
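A toy naive-Bayes scorer (my own minimal sketch, not SpamAssassin's actual Bayes code) makes the point concrete: the score is a ratio of per-token spam frequency to ham frequency, so without a ham corpus matched to the user, the denominator is meaningless.

```python
from collections import Counter

def train(messages):
    """Count, per token, how many messages in a corpus contain it."""
    counts = Counter()
    for tokens in messages:
        counts.update(set(tokens))
    return counts

def spam_probability(tokens, spam_counts, ham_counts, n_spam, n_ham):
    """Crude per-message spamminess from per-token frequencies.
    A sketch of the idea only, with Laplace smoothing so unseen
    tokens don't zero out the score."""
    score = 1.0
    for t in set(tokens):
        p_spam = (spam_counts[t] + 1) / (n_spam + 2)
        p_ham = (ham_counts[t] + 1) / (n_ham + 2)
        score *= p_spam / p_ham
    return score / (score + 1.0)
```

Train this on one user's mail and another user's "foreign language is spam" habits simply aren't in the model, which is exactly the per-user training problem described above.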


#415 of 547 by janc on Fri Jun 6 13:14:54 2003:

Case in point:  I'd be perfectly happy with a spam filter that rejects all
mail written in any languages other than English and German.  If I trained
a Bayesian filter, that's probably part of what it would do.  However, there
are lots of other users on Grex who for some strange reason like foreign
language mail.  A suitable Bayesian filter for them would be very different
than one for me.


#416 of 547 by gull on Fri Jun 6 13:28:29 2003:

Re #412: If a postmaster's site is acting as a spam relay, they deserve
to be annoyed.

Re #414: Where I work we're currently using a Bayesian filter with a
site-wide corpus, with pretty good results.  But this is a small
company, with about 30 employees that tend to make similar decisions
about what is and isn't spam.  (The majority of our spam is, for some
reason, for porn sites and penis enlargement products, which no one
admits to wanting in their work accounts.)  We also don't bounce
messages based on the filter, just tag them for later filtering with
each user's mail client.  I seriously doubt this approach would work for
Grex, because the user base here is too diverse.

To me, it's far, far more important that legitimate mail to my Grex
account not be rejected than that spam be blocked.  For that reason I'd
oppose any heavy-handed spam filtering unless it was configurable on an
individual user basis.  I'd also oppose any spam filtering that silently
dropped messages instead of bouncing them, because it's far better if a
legitimate message bounces than if it just disappears.  People are
conditioned to assume that if an email doesn't bounce back, it arrived
intact.


#417 of 547 by cross on Fri Jun 6 14:31:13 2003:

A global installation of spamassassin can use each individual user's
preferences.  That is, it can be run by default and use the target user's
data; there doesn't have to be a system-wide immutable setting.  Also,
spamassassin doesn't automatically delete spam, it just tags it and
lets the user decide what to do with it.  I dump mine into a special
MH folder that I scan once or twice a day to pull out any false positives.
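The tag-then-sort arrangement described here would look something like this in a user's ~/.procmailrc (assuming mail has already passed through spamassassin with its default headers; the folder name is made up):

```procmail
# Anything SpamAssassin tagged goes to a holding folder for later
# review; everything else falls through to normal delivery.
:0:
* ^X-Spam-Flag: YES
caughtspam
```

Each user keeps full control: delete the recipe and tagged mail lands in the inbox like everything else.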


#418 of 547 by carson on Fri Jun 6 17:06:06 2003:

(when installing spamassassin on a system, it's a good idea to run it in
a "learning" mode before using it to block mail.  because it works using
Bayesian filters, it needs time to A) learn what to block and B) prove
that it's not dumping legitimate mail at an unacceptable rate.)

(Marcus wrote somewhere [I don't remember where off-hand] recently
indicating that, if Grex were to use an open relay blocking list, it would
occasionally reject mail to itself because Grex sometimes appears on RBL
lists.  I'm not sure why he came to that conclusion.  while I've
occasionally seen Grex on DNS blacklists, I've yet to see it on an open
relay list.  at any rate, it seems trivial [to me, anyway] that, if Grex
were to use outside blacklists, it would be on its own local whitelist.)

(Scott Vintinner wrote a document on how to set up an anti-spam gateway
using a combination of OpenBSD, Postfix, Amavisd-new, SpamAssassin, Vipul's
Razor, and DCC.  it's at http://lawmonkey.org/anti-spam.html .  he makes
some choices in implementation that we likely would not.  still, it
provides a start, and there are some useful suggestions in the document
that are valid in and of themselves.  the one question I'd have about the
gateway set-up is "how resource-intensive is it?", but I keep reminding
myself that NextGrex is more powerful than the current model and will
likely surprise us with what it can and can't handle.)


#419 of 547 by scg on Fri Jun 6 21:26:50 2003:

re 416:
        There's a rather large difference between an open relay forwarding spam
in random directions, and a well secured mail server handling mailing lists
or .forward files.

As a case in point, I used to host the Grex staff mailing lists on my mail
server.  That is, people would send mail to the staff@grex.org address or some
other less publicized aliases, and it would be forwarded to my mail server,
which would then forward it on to the individual staffers on mail servers all
over the place.  Then a couple of staff members started using spam filtering
that rejected spam in MTA, thus sending the spam back to the postmaster of
the mail server that was trying to deliver it to them, in this case me.  One
staffer had this imposed on him by his mail provider, and didn't have much
of a choice in it, while another staff member had configured it himself and
refused to fix it.  The result was that running a mailing list that sent mail
to those people was more trouble than it was worth.


#420 of 547 by dang on Tue Jun 10 18:19:19 2003:

My provider decided a while ago to start bouncing mail with SpamAssassin. For
whatever reason, this was catching a lot of mail that I wanted, and people were
complaining. I left them and went to a site that didn't block spam, as I run my
own spam filters (a combination of spamassassin with custom rules and
bogofilter). Just a datapoint.


#421 of 547 by i on Sat Jun 21 13:13:32 2003:

There was a decent item on SlashDot yesterday about noting the tuple of
sender, recipient, & IP address at the start of an SMTP session and
putting the e-mail off an hour (with "try again later" or some such) if
the tuple isn't in a kept-on-the-side database of tuples seen more than
one hour but less than 36 days ago.

This does a passable job of blocking most try-once-quickly-and-move-on
spam, but little try-again-per-the-RFC real e-mail (according to the
author).  The 1 hour and 36 days were adjustable parameters; there were
other details & some real-use-experience statistics.

This sounds like it has a number of features we're looking for in an
anti-spam tool for Grex...any thoughts?
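The scheme described above could be sketched as follows (numbers taken from the post; an in-memory dict stands in for the real kept-on-the-side database, and a real MTA would hook this in as a policy daemon):

```python
import time

GREYLIST_DELAY = 3600          # "try again later" window: one hour
GREYLIST_EXPIRY = 36 * 86400   # forget tuples not seen in 36 days

seen = {}  # (sender, recipient, client_ip) -> time of first contact

def check(sender, recipient, client_ip, now=None):
    """Return 'accept' or 'defer' for an incoming SMTP envelope.
    'defer' means issue a 4xx temporary failure; RFC-compliant
    mail servers retry, most spamware does not."""
    now = time.time() if now is None else now
    key = (sender, recipient, client_ip)
    first = seen.get(key)
    if first is None or now - first > GREYLIST_EXPIRY:
        seen[key] = now          # new (or long-forgotten) tuple
        return "defer"
    if now - first < GREYLIST_DELAY:
        return "defer"           # still inside the one-hour window
    return "accept"
```

The whole filter keys on delivery behavior rather than message content, which is why it needs no training corpus at all.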


#422 of 547 by other on Sat Jun 21 17:05:30 2003:

Whatever will help stem the flow...


#423 of 547 by gull on Mon Jun 23 14:17:11 2003:

It would tend to increase the network load.  (Each valid message that
wasn't high-traffic would require two connections and attempts instead of
one.)  Is the majority of spam really "try once and move on?"  I imagine
that's probably true of direct-to-MX spam, but probably not true of open
relay spam.


#424 of 547 by keesan on Mon Jun 23 21:06:02 2003:

I get the same spams numerous times.


#425 of 547 by cross on Mon Jun 23 21:33:17 2003:

Perhaps, but it's not clear to me that the load on the network link
is excessive right now.  It's thought to be, but no one's ever measured
it.  I'd think the latency in getting email would be more troublesome.

Spam isn't a problem that's solved by adding arbitrary delays into the
mix.


#426 of 547 by i on Tue Jun 24 00:39:06 2003:

We're close to having this ready for production use at work.  Very early
results suggest that most spam doesn't come back to retry delivery later.


#427 of 547 by russ on Tue Jun 24 03:49:02 2003:

It might be easier to deal with spam by accepting the first N pieces
of mail from a novel host, then delaying the rest; if the mail starts
showing up in the UCE bin, further mail from that host is refused for
an extended period (or perhaps permanently).  This takes care of
hijacked relays as well, while passing the occasional e-mail from an
odd host without any delays at all.
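That per-host reputation idea could be sketched roughly like this (all names and thresholds invented for illustration; the UCE-bin count would come from whatever spam-trap mechanism Grex ends up with):

```python
def decide(host_state, uce_hits, free_accepts=5, block_threshold=3):
    """Sketch of the scheme above: accept the first few messages
    from a host we've never seen, greylist the rest, and refuse
    hosts whose mail keeps landing in the UCE bin."""
    if uce_hits >= block_threshold:
        return "refuse"          # proven spam source: extended block
    if host_state["accepted"] < free_accepts:
        host_state["accepted"] += 1
        return "accept"          # occasional mail from odd hosts flows freely
    return "defer"               # busy novel host: greylist until vetted
```

The appeal is that a host sending one legitimate message a month never sees a delay at all.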

While spam may not be a big bandwidth load, it sure isn't going to
stay small unless we act; it really behooves us to frustrate spammers
if we can.

Jan, is there any way to get Backtalk to salt its pages with fake
e-mail addresses that would be picked up by spam harvesters?  That
would be one way to be certain that a host was sending spam to Grex.


#428 of 547 by gull on Tue Jun 24 13:11:57 2003:

All of these delaying tactics sound like they have the potential to
cause problems for people who are on legitimate mailing lists.


#429 of 547 by janc on Tue Jun 24 15:34:14 2003:

I think it's probable that email addresses are being harvested from Grex, but
I'm not sure of the extent of the problem.  Grex backtalk has a robots.txt file
that requests honest robots not to harvest it, which is why google searches
don't find Grex items (somewhat mixed blessing).  Obviously dishonest spammers
are likely to disregard this, and I have seen indications of robots walking
through Backtalk on Grex.  In most Backtalk interfaces, clicking on the user
name will give the user's bio page - but on Grex that is just the .plan file, and
won't contain clickable email addresses, which are probably the spammer's
favorite thing to harvest (on other systems Backtalk does have clickable email
links, an issue that I need to address).  Some people will have their
email addresses in their .plan files, but most probably don't or have email
addresses for systems other than Grex, so spam to addresses harvested from
there would go to non-Grex addresses and be hard to recognize.

A spammer might be smart enough to go through the Backtalk pages, pick up login
names and attach "@grex.org" to the end of each.  But I'd think this would be
uncommon.

I'm not really inclined to think that seeding a lot of bad addresses is
going to help enough to be worth the ugliness.  However, you are certainly
welcome to include <A HREF=mailto:uce@grex.org>send spam here</A> links
in all your HTML postings on Grex.


#430 of 547 by scg on Fri Jun 27 22:40:18 2003:

I don't doubt that delaying mail acceptance for an hour would be effective
against the current generation of spammers, but my general impression of
techniques for blocking spam by insisting on standards compliance is that
spammers are getting better and better at following standards.  That strikes
me as something which, if done commonly, would put a lot of extra load on
legitimate mail servers, and would break the ability of e-mail to be used as
a fairly instantaneous back and forth communications tool.


#431 of 547 by i on Sat Jun 28 00:22:20 2003:

More results from using this at work - over 75% of spammers do not come
back within 24 hours.  No evidence that any legit e-mail has been lost.
Looking through the tuple database, substantial spam attacks are *really*
obvious...suggesting automated means to keep 'em locked out after their
tuples age out of probation...i think we're using 5 minutes for this now.

Yes, like any anti-spam technology, this has downsides and costs for both
the infrastructure & users.  But *not* using any anti-spam technology 
also has downsides & costs - like overflowing "in" boxes of left-'cause-
there-was-too-much-spam ex-users.


#432 of 547 by gull on Mon Jun 30 15:31:19 2003:

Walter, are you using some kind of pre-made package to do this or did
you roll your own?


#433 of 547 by i on Wed Jul 2 01:08:54 2003:

We rolled our own (adding a few lines of C to our mail server software
with MySQL on the back end).  Graylisting is hardly more complex than
a bubblesort routine, and it took fewer lines of code than any bubblesort
that i can recall.

We started it in "observer, record, & report what you'd do" mode.  We've
added more simple features (whitelist, etc.) to it as they've occurred to
us.

It looks like spammers are far more impatient on retrying than legit mail
servers - we're hoping to add the rule "if graylisted e-mail comes back 
before the graylist time expires, then add a minute to the interval it 
came back within, compare that sum to the remaining graylist interval, 
and update the graylist interval to the greater of those two".  Tweaking
the numbers promises to let almost all legit e-mail avoid additional
delays from this; hopefully spam will delay itself to death (or blacklist).
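The proposed rule reads as follows (a sketch in my own words; the function names and the zero-return convention for "window already expired" are mine):

```python
def updated_interval(interval, elapsed, penalty=60):
    """Adaptive greylisting: a retry that arrives before the greylist
    interval expires extends the delay.  'interval' is the current
    greylist window, 'elapsed' is how long after first contact the
    retry came, 'penalty' is the proposed one minute."""
    if elapsed >= interval:
        return 0                       # window over; deliver normally
    remaining = interval - elapsed
    return max(remaining, elapsed + penalty)
```

A patient legitimate server that retries after the window never hits the rule; an impatient spammer hammering every few minutes keeps pushing its own delivery further out.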


#434 of 547 by jep on Fri Oct 17 16:04:15 2003:

How's the NextGrex project coming along?  I haven't seen any updates 
here since July.  Is the computer still going to be new when this is 
completed?


#435 of 547 by cross on Fri Oct 17 17:23:27 2003:

It's stalled; too many staff have too many other things going on.  I was
going to propose, and I suppose here is as good a place to do it, that
we move nextgrex from Jan's house to the pumpkin after the new version
of OpenBSD comes out.  I think that'll make it easier to test and debug
subsystems, and ultimately easier to transition over from old grex to
new grex.  Once we do that, we should set a schedule; a reasonable one,
that we can stick to, trying to anticipate people's time demands, for
switching over within, say, no more than three months.

Major subsystems I see needing configuration before we can switch are:

        1) The BBS.  Someone has to port PicoSpan (mdw?) or provide
           an alternative (YAPP?  Frontalk?)
        2) Mail.  We have to build out the mail system.
        3) set up newuser
        4) Configuration changes.

That's about it.  Everything else, we can do once we've transitioned.


#436 of 547 by remmers on Fri Oct 17 17:30:45 2003:

I'd say that's a fair summary.  Only potential drawback that I can see
to locating it in the pumpkin is if the hardware still needs a fair
amount of hands-on attention, it'd be a nuisance for somebody to run
over there all the time to tend to it.  If the hardware configuration
is pretty stable now, this isn't an issue.


#437 of 547 by jp2 on Fri Oct 17 17:47:51 2003:

This response has been erased.



#438 of 547 by mary on Fri Oct 17 18:09:53 2003:

Could we keep mail as is, on old Grex, for a while?
Maybe get it up using YAPP and Frontalk only? 

I, like everyone else, would hate to see this project 
stagnate much longer.  If that means coming up less
complete, that would be better than not at all.


#439 of 547 by aruba on Fri Oct 17 19:20:54 2003:

I agree.

Re #437: My understanding is that Grex is running a legal copy of Picospan,
though we do not have an official license.  The validity of our copy comes
from Marcus, so he needs to explain how that works.


#440 of 547 by jep on Fri Oct 17 19:58:33 2003:

Sorry for the sharp-sounding comment at the end of my question.  It 
didn't come out the way I had imagined it would.  No criticism was 
intended.

If the hardware is moved to the Pumpkin, are there people who can do 
the things which need doing?  Is the direction of the mail system 
determined closely enough that another staffer can do it, or is this 
a "Marcus-only" project?  How about newuser?  I assume Picospan is 
something only Marcus can do.

I'd hope the staff is avoiding things wherever possible that can only 
be done by one specific person, whether it's Marcus or anyone else.

Specifically... all of the world uses e-mail; Grex should be able to 
implement an e-mail system, too.  It isn't even that important.  If 
Grex didn't have e-mail at all, every Grexer would be able to get by 
anyway by using any of their dozens of other mail addresses.  It would 
be silly to find out NextGrex was being held up only because a specific 
person wasn't available to set up a customized e-mail system.

For other topics; is there agreement on what needs to be done for 
newuser?  Is someone working on that?

I'm not particularly parsimonious... but all of the hardware was 
purchased months ago, some of it at premium prices as I understand it 
to obtain stuff that's state of the art.  Computer hardware doesn't 
have much of a shelf life for being state of the art.

Grex got donations, and bought all of the stuff at a pretty quick 
pace... then nothing much seems to have happened, at least judging from 
the information I'm seeing coming out.  It made me wonder about why 
things aren't happening any more.  Is it a brief slowdown for specific 
reasons, and things will be picking up again soon?

Thanks!


#441 of 547 by jp2 on Fri Oct 17 20:02:28 2003:

This response has been erased.



#442 of 547 by davel on Sat Oct 18 00:39:37 2003:

Re 440: John, some users have no other email address whatsoever.  I happen
to live with 3 of them.  Other users have enough correspondents who email to
their Grex addresses that notifying them all would be a big problem - and
suddenly having mail start bouncing without much warning would be a much
bigger problem.  I'd say we could do without conferencing for a while about
as well as doing without email.


#443 of 547 by cross on Sat Oct 18 02:05:41 2003:

Regarding #440; We're not going live without a working mail system.
It's not rocket science to put one together, but it does take time,
and that's what everyone is lacking right now.

Regarding Remmers's comment about hardware configuration; As far as
I can tell, with the exception of the ethernet interface (which seems
flakey), it's pretty much set.  Everything left is purely software
configuration.


#444 of 547 by jep on Sat Oct 18 03:18:13 2003:

I'll accept the point about e-mail, but that's not the main point.

resp:0 brings up the idea of buying the hardware right away... 
February 17, 2003.  There was enthusiasm at that time, maybe even some 
urgency.

The first component was purchased on April 8, ordering continued 
through April, assembly began on May 4, and by May 17, aruba said it 
was all assembled, tested and working.  There was enthusiastic testing 
through May.  The final physical component arrived on May 28 -- the 
OpenBSD CDs.

And work stopped, at least as visible to interested outsiders.

It has been 4 1/2 months since May 28; approximately twice as long as 
it took to acquire, assemble and test all of the hardware.  I presume 
OpenBSD has been installed on the new hardware so that *something* can 
have been done since then.  In 4 1/2 more months, it will have been a 
year since this item was entered.

Where's the urgency now?  Or at least enthusiasm?  Shall I wait until 
February and ask again then?  Or maybe hold my horses until April?  Or 
will that have been too soon to expect results from the purchase of 
all new hardware that was fully assembled by the end of May?

If we hadn't bought anything yet, how much further behind would we be, 
compared to where we are now?  Put another way, how long do we wait 
before the hardware needs to be replaced because it's too old?

I understand very well about being too busy... I also understand it's 
not always *everyone* who's too busy.  That's not even very likely.  
There have been many group projects that never happen at all because 
everyone waits forever on one person.  It would be a real shame if 
this project is one like that.

The treasurer bemoans the lack of donations all the time.  Donations 
are declining... but a lot of people rushed right in to send money 
when they were told it was needed for the new NextGrex computer.  You 
folks on the staff gave every indication you were ready to set it up 
so we could start using it.  We all understood it would take time... 
but how much time?  It takes a *really* *long* time to finish a 
project if no one is working on it.

If that's the case, as it appears to me it is, maybe it could be 
time to look at alternatives?


#445 of 547 by cross on Sat Oct 18 04:49:16 2003:

Hey, I agree with you 100%.  It's ridiculous that it's taken this long
to get things rolling.  I think we should be able to move over to next
grex within three months, and I can think of no reason we shouldn't be
able to: this has stalled long enough.

There is some work happening, but you won't see it if you just read coop.
Most of it happens in garage, and some (to a much lesser extent) in
other places.  In particular, despite juggling kids and other time
commitments, Jan has made some stellar progress setting up facets of
nextgrex.  Regardless, we're stalled right now.

Here's my take on some of what happened.  There's been disagreement
among staff about how _best_ to proceed in certain technical areas.
But traditionally, certain staff members have maintained domains of
responsibility comprising various subsystems on grex (like how Marcus
maintains PicoSpan).  Some of this is necessary (Marcus and PicoSpan is a
good example; he's the only one with the source code), some is contrived.
Those parts have been, if not off-limits to other staffers, then at least
considered that individual staffer's responsibility and left up to them.
So, despite some staffers having more time than others, certain areas
of nextgrex remain untouched pending the staffers who have less time,
but traditional responsibility of those areas, to become available.

I don't think this is working out too well.  I think maybe we should
consider just saying, ``screw it; we need to get the new machine online.
Let's figure out the quickest way to do that and go from there.''

*THAT* said, we also have to be careful.  Grex, right now, is a real
mess in my opinion.  There are patches upon patches upon bandaids upon
kludges upon hacks upon more patches stacked up so high, it's difficult
to see over them all.  I think it's scary for newer staff (well, for
me anyway) to *do* anything because everything is so customized.  A lot
of the work Jan has been doing on nextgrex is meticulously documenting
*everything* he's done so that rebuilding grex from scratch is going to
be an easy process for a reasonably competent Unix system administrator.
This is good, and necessary, and we really do need to do this with all
the major components of the system so that, moving forward, we don't
end up in the hole we're in right now (it really is a hole).

I think we can get back on track if a couple of us agree to donate a few
hours or so a day for the next couple of weeks, to get OpenBSD 3.4 to
where we're at now with OpenBSD 3.3 (note: OpenBSD 3.4 comes out on
the first of November).  If we then put the machine in the pumpkin, we
should be good to go with getting nextgrex online in under three months.


#446 of 547 by jp2 on Sat Oct 18 14:27:11 2003:

This response has been erased.



#447 of 547 by cross on Sat Oct 18 14:46:51 2003:

I've proved the security of grex's password hash to be the same as that
of sha1, which is provably mathematically superior to MD5.

Also, isn't YAPP shareware?  I thought you had to pay a significant
amount of money for anything other than casual use?


#448 of 547 by jp2 on Sat Oct 18 15:12:03 2003:

This response has been erased.



#449 of 547 by cross on Sat Oct 18 19:06:23 2003:

Blowfish in OpenBSD is pretty slow.  I've argued many times that building
our own scrambling algorithm was a bad idea, and I certainly would have
done it differently, but it happened before I came on board.  Sorry,
them's the breaks.  Sometimes you just have to accept what you're given
and work with it as best you can: we've got anywhere from 20,000 to
40,000 users whose passwords are hashed with it.  But, we've also gotten
it working with OpenBSD (I've got it working with login.conf on nextgrex).

Jamie, I think you have a lot of good ideas on how to move grex forward,
but don't blow it by being your polemical self.  Three months *is*
a reasonable amount of time for a complete build out of nextgrex.

And I don't care how long it took Salcedo to get mnet up and running.
This isn't mnet, and we want to have an amount of downtime that's a
minimal amount longer than the time it takes to transfer the data from
oldgrex to nextgrex (five weeks just to move things over---now that's
outrageously long, in my opinion).  I don't even live anywhere *near*
Ann Arbor (otherwise, I'd volunteer to take in the hardware and crank
away on it over the next two or three weekends), and other staffers
don't have a lot of time, so it's unlikely we'll be able to halve the
amount of time I already projected.

As for YAPP, I'm all for it.  We need something up and running sooner
than later.  However, ``you-think-you-heard-someone-say-maybe'' isn't
a license agreement, and if your primary objection to PicoSpan is a
licensing issue, then trading that can of worms for a mess that is a
verbal licensing agreement for YAPP doesn't seem much better.  Now,
if I had my druthers, I'd just as soon see FrontTalk built out and
used as a replacement for both.  We're sure of the license for that;
unfortunately, that would require someone taking a lot of time to
make happen, and, as we know, time is an issue.

Look, you're preaching to the choir here.  So cut the confrontational
crap and let's figure out how to move forward.


#450 of 547 by jp2 on Sat Oct 18 19:16:35 2003:

This response has been erased.



#451 of 547 by glenda on Sat Oct 18 20:19:55 2003:

I would like to see us stay with picospan.  I left m-net when they switched
to yapp.  I tried it for a couple of weeks, hated it and never logged in
again.


#452 of 547 by jp2 on Sat Oct 18 20:29:33 2003:

This response has been erased.



#453 of 547 by jp2 on Sat Oct 18 20:39:57 2003:

This response has been erased.



#454 of 547 by jep on Sat Oct 18 21:02:07 2003:

Four and a half months is long enough to wait for one or two formerly essential
people to do what they have apparently promised. If there are holdups who
aren't going to do what Grex needs, then they need to be replaced. Otherwise,
the future of Grex is being held hostage, and those one or two people are being
treated as more important than all of the rest of us. I can understand it if no
one on the staff wants to state it that baldly. There seems to be little enough
teamwork on this project. So there it is, I'll say it. I have used YAPP for
years, and Picospan for years, btw, and have not noticed any deficiencies in
text YAPP. (WebYAPP is horrible, but Backtalk is available.) Let's put it this
way, I'd rather have YAPP and NextGrex than Picospan and the rotting remains of
obsolete Sun hardware.


#455 of 547 by cross on Sat Oct 18 21:35:17 2003:

Regarding #450; You're making all the same arguments I did.  They haven't
worked yet, but I'm willing to transition to another algorithm.  It just
takes convincing people that it's the right thing to do.  However, the
specific suggestion you make, to use the password expiration feature
to force switching to another algorithm, would take a year (and is also
something I suggested several years ago).

Regarding #451; What specifically didn't you like about it?  To me,
they seem functionally equivalent, though I confess to not using any of
the more exotic features of either.

Regarding #454; Okay, that's fair.  Btw- it's not just picospan; there are
a number of things that are awaiting the laying on of hands of one or two
people.  I confess I myself am guilty of stalling on at least one thing.

The way I look at it today, it's silly to try and do anything before
OpenBSD 3.4 comes out on the first of November.  Assuming there's going to
be a few rough edges around the release, let's say it's reasonable to
assume we can do an FTP install on the 4th or so.  From today, that gives
us about three weeks before we can *really* make any software changes.

I propose this: we start the clock for the transition to nextgrex today.
We have the period between now and the 4th of November to argue about how
to do things, and then once November hits, we have two months and one week
to get the new system online, with all of the data transferred, and the
Sun turned off.  I feel pretty confident we could make that deadline, and
if someone has a pet project they can't squeeze in before that, too bad.

Comments?


#456 of 547 by cross on Sat Oct 18 21:51:10 2003:

Regarding #448; By the way, the details of the grex password hash, as
well as the details about the configuration of many of the subsystems
on grex, have been publicly available for some time through the
``grex staff notes'' on the web.  Indeed, there's even a link to the
code that implements it (using that was how I ported it to nextgrex).

See http://www.cyberspace.org/staffnote/passwd.html, with the actual
code at: http://www.cyberspace.org/staffnote/mkp2.c.


#457 of 547 by jp2 on Sun Oct 19 00:09:14 2003:

This response has been erased.



#458 of 547 by cross on Sun Oct 19 03:06:42 2003:

Responding to #457:

(1) No.  It has its own recognizable token that doesn't match $[0-9]+$,
but it's easy enough to switch off.  /etc/master.passwd can be put
together with awk or perl.  Indeed, I already have a script to do it,
and authentication uses the standard BSD login.conf framework.
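
[Editor's sketch of the kind of conversion script described above, assuming
Linux-style passwd/shadow input; the field values, the "default" login
class, and the sample records are illustrative, not Grex's actual data.]

```python
# Merge passwd- and shadow-style records into BSD master.passwd lines.
# master.passwd has 10 fields: name:hash:uid:gid:class:change:expire:
# gecos:home:shell.  The login class ("default" here) is what ties an
# account into the login.conf framework mentioned above.

def load_shadow(lines):
    # shadow format: name:hash:lastchg:min:max:warn:...; keep just the hash.
    return {l.split(":")[0]: l.split(":")[1] for l in lines if l.strip()}

def to_master_passwd(passwd_line: str, hash_by_user: dict) -> str:
    # passwd format: name:x:uid:gid:gecos:home:shell (7 fields).
    name, _pw, uid, gid, gecos, home, shell = passwd_line.strip().split(":")
    pwhash = hash_by_user.get(name, "*")   # "*" locks accounts w/o a hash
    return ":".join([name, pwhash, uid, gid, "default", "0", "0",
                     gecos, home, shell])

# Example with invented records:
hashes = load_shadow(["alice:$grex$abc123:12000:0:99999:7:::"])
line = to_master_passwd("alice:x:1000:1000:Alice:/home/alice:/bin/sh", hashes)
```

On a real system the output would be fed to pwd_mkdb(8) rather than written
to /etc/master.passwd directly.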

There are two possible places to go from here.  (a) we move to Kerberos.
This could be done by modifying the password changing program to
automatically register principals using a standard string to key
algorithm, or by using the modified KDC Marcus wants that uses his hash
algorithm, where we'd use the existing contents of /etc/shadow as the
key database.  I think the latter is a really bad idea.  (b) keep the
same hash algorithm, using the login.conf framework to deal with it.

Okay, there are really three: (c) switch over to one of the system
standard algorithms after a suitable period of time, by modifying the
password changing program to do it.  Our customizations would disappear
after a year or so.  I favor this route.  If we want to go Kerberos,
it'd be best to go with a standard string to key algorithm, but this
would be easy using methods I've already outlined elsewhere, using
login.conf to make them transparent to the user.
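
[Editor's sketch of option (c), assuming nothing about the real hashes: a
password-changing routine that writes a standard hash on every change, while
a prefix check keeps legacy entries working until they are replaced.  The
"$std$"/"$grex$" tags, the salted SHA-256, and legacy_check() are all
invented stand-ins -- on OpenBSD the real standard algorithm would be
bcrypt selected through login.conf.]

```python
import binascii
import hashlib
import os

def legacy_check(password: str, stored: str) -> bool:
    # Placeholder for the custom Grex algorithm (the real code is mkp2.c);
    # always failing here just keeps the sketch self-contained.
    return False

def std_hash(password: str, salt=None) -> str:
    # Stand-in for the system-standard algorithm; salted SHA-256 is used
    # only to show the shape of the migration, not as a recommendation.
    salt = salt or os.urandom(8)
    digest = hashlib.sha256(salt + password.encode()).hexdigest()
    return "$std$" + binascii.hexlify(salt).decode() + "$" + digest

def check(password: str, stored: str) -> bool:
    # Legacy entries are recognized by prefix and verified the old way;
    # everything else is verified with the standard algorithm.
    if stored.startswith("$grex$"):
        return legacy_check(password, stored)
    _, _, salthex, digest = stored.split("$")
    salt = binascii.unhexlify(salthex)
    return hashlib.sha256(salt + password.encode()).hexdigest() == digest

def change_password(db: dict, user: str, new_password: str) -> None:
    # The migration point: every password change replaces whatever was
    # stored (legacy or not) with the standard hash, so the custom
    # algorithm ages out of the database over time.
    db[user] = std_hash(new_password)
```

With this shape, the custom hashes disappear at whatever rate users change
passwords, which is why a forced-expiration policy shortens the transition.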

(2) That's the equivalent of a site-dependent salt.  The idea is that if
the same algorithm is used on more than one site, the hashes shouldn't
come out to be the same.

(3) That's not even the case with conventional cryptography (the algorithm
doesn't get *easier* to crack, it just doesn't get appreciably harder).
Regardless, this isn't conventional cryptography; it's a compression based
hash algorithm.  Marcus's password scrambling algorithm is essentially
the HMAC construction applied to password hashing.  The thing is,
HMAC doesn't provide any additional security to password hashing over
simple hashing, because it's designed to solve a different problem:
authenticating messages over an insecure network, using a shared secret
key.  But with password hashing, either the key or the message
is fixed, so you don't get any additional security.
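
[Editor's toy illustration of points (2) and (3), with invented salts and a
hypothetical site constant: the site salt makes the same password hash
differently on different sites, while HMAC with the salt as key is still a
deterministic function of (salt, password), so an offline guesser tries
candidates exactly as fast as against a plain salted hash.]

```python
import hashlib
import hmac

SITE_SALT = b"grex.example"   # hypothetical per-site constant

def site_hash(password: bytes, user_salt: bytes) -> str:
    # Point (2): folding a per-site constant into the hash means the same
    # password and user salt produce different hashes on different sites,
    # so cracked hashes from one site don't transfer to another.
    return hashlib.sha256(SITE_SALT + user_salt + password).hexdigest()

def hmac_hash(password: bytes, user_salt: bytes) -> str:
    # Point (3): the HMAC construction with a fixed "key" (the salt) is
    # still just a deterministic salted hash.  HMAC's security goal --
    # authenticating messages with a shared secret -- doesn't apply when
    # one input is fixed, so this buys nothing over site_hash above.
    return hmac.new(user_salt, password, hashlib.sha256).hexdigest()
```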

(4) That would require a change in the current configuration.  But,
there's no point.  We have the situation with the password hashing
algorithm well in hand on nextgrex.  See (1).

Don't bother arguing about the custom hash algorithm.  It was a mistake
(well, sort of.  It was better than Unix crypt(3)); everyone knows that.


#459 of 547 by jep on Sun Oct 19 03:28:33 2003:

What reason is there to believe that anything will start on November 
4, when it didn't start on May 28?  Dan, are you speaking for the 
staff?


#460 of 547 by cross on Sun Oct 19 04:37:22 2003:

Regarding #459; No, I'm not speaking for all of staff.  I am, however,
speaking for myself as *part* of staff.  There is of course a difference,
though my experience tells me that timeline is reasonable if the
rest of staff commits, say, 4 or 5 hours a week to make it happen.
I'm hoping other staff members will chime in here saying either, ``no,
that's completely unreasonable, and here's why...'' or, ``Yes, I think
we could do that.''

My reason for stating that November 4 is a good starting date is that
the next version of OpenBSD *will* be released on the 1st of November,
and therefore it's reasonable to assume the 4th will be a date when most
of the major problems of the release will be worked out, and it would
be feasible to do an install.


#461 of 547 by gelinas on Sun Oct 19 05:50:35 2003:

There is a change in object file format with the next release, if I've
properly understood the discussion.  We (staff) have been discussing
when to make the switch, with most wanting to wait for the release that
includes/requires it.  So November 1 sounds like a good starting point
to me.

I expect this to be a significant part of the agenda of Wednesday's
staff meeting.


#462 of 547 by gelinas on Sun Oct 19 06:24:44 2003:

On expiring passwords:  ssh does not work well with expired passwords. 
Blanket expiration of all passwords would cause a lot of trouble.


#463 of 547 by mary on Sun Oct 19 12:05:49 2003:

I'd like to respond to the earlier thread, of how we've gotten to the
point that the move to New Grex has stalled.  Because I'm going to name
names and be pretty straightforward here, I'm going to be very clear this
is my opinion and I could be dead wrong on a lot of it. 

One of the things that makes Grex so cool is that we are the sum of
volunteer effort.  It's also a problem as there really isn't anyone to
call on the carpet when project goals slide.  And not only are we
dependent on volunteers but a very few pretty much call the shots for how
anything goes.  Partly this is out of respect for their opinions, partly
it's because if the wrong decision is made they are the ones who will have
to clean up the mess.  But there is something else working there too, that
I can't quite put into kind words, but it's working against the system as
a whole right now.  It can be seen in the difficulty our staff seems to
have working as a team. 

We are all at fault for allowing one or two people to be so central to
Grex's future.  It's my understanding that Marcus is under a lot of stress
at the moment, dealing with family issues.  Been there, done that, but I
was fortunate not to have a community of thousands breathing down my neck
to simultaneously keep their project growing.  If Marcus is holding things
up right now it's not Marcus' fault.  It's our fault for letting *any one
person* be in the position of being that necessary. 

So, where I'm going, is toward a shift in philosophy.  We need to not only
move Grex to new hardware but to a way of working that fosters teamwork,
uses software that is team customizable, and where any one person could
walk away and the rest of staff could pick up and carry on.  It's high time
this happened.

I don't think Picospan is going to fit that goal.  I'd like to take a
serious look at software that would.  It's scary, for sure.  It probably
would mean we'd not be too happy until we'd had a chance to mold it to our
specific needs.  But the point is we could mold it and the staff and users
would get to decide how.  And we'd be far far less dependent on one
person. A win/win after the agony of working through the change.

I'd also like to applaud Jan and other staff, who are working hard to
document exactly what they are doing with New Grex, and looking to
standardize as much of the hardware and software as possible.  They are
looking out for us. 



#464 of 547 by jep on Sun Oct 19 13:23:12 2003:

Mary, in my opinion, that came off as both tactful and straight.  I 
don't think I've ever managed to do both at the same time.  Nice job.


#465 of 547 by aruba on Sun Oct 19 14:29:54 2003:

I basically agree with Mary here, and like everyone I am frustrated that we
haven't moved forward.  I'd like to see some input from more staffers.


#466 of 547 by cross on Sun Oct 19 15:05:56 2003:

Well, there is another issue at play.  Replacing Picospan is fine with
me, but what are we going to replace it with?  An old version of YAPP is
available in source form, but even mnet doesn't run that, and there are
no indications that it's completely free (as in not paying for it), or
that the new version is available in source form, free, or that we have
any chance of getting a license for it without paying many thousands
of dollars.  What alternatives do we have?  Is frontalk there yet?
That's the *only* reasonable alternative if getting YAPP doesn't pan out.


#467 of 547 by jp2 on Sun Oct 19 21:17:03 2003:

This response has been erased.



#468 of 547 by cross on Sun Oct 19 22:30:49 2003:

What's the email address?  I'll send a message asking, but I'm not
convinced it's the right direction.


#469 of 547 by jp2 on Sun Oct 19 23:49:53 2003:

This response has been erased.



#470 of 547 by cross on Mon Oct 20 01:09:08 2003:

Okay, I sent an email to that address.  We'll see what comes back.


#471 of 547 by asddsa on Mon Oct 20 03:01:13 2003:

We sure will.


#472 of 547 by janc on Mon Oct 20 03:02:37 2003:

I did a lot of work on NextGrex for a while.  After a while I lost momentum.
Now I'm also short on time, having a lot of work to do.  There are still
things I ought to do, but I don't think anything is actually blocked on me.

I don't think Picospan is a problem.  Ask Marcus, and he'll deliver.  An
OpenBSD version already exists.  Installing it would require very little of
his time.  He hasn't done it because he's thinking Grex isn't coming up right
away anyway.

If you want to move away from Picospan to something we can get a source
license to, then that's a more significant project.  I'm inclined to
discourage Yapp.  I've worked with it on Spring and M-Net, and in both places
it was much flakier than Picospan.  There seems to be a bug where it deletes
hunks of the following response when a response is censored, and other
flakiness.  Spring especially seems to be riddled with mangled item
files that can't be correctly parsed any more.

Fronttalk is buggier still, but the bugs are all in the user interface.  Some
of the search things don't work right, I think.  It doesn't mung up the item
files, because that part is done by Backtalk, which is pretty solid at that
level.  I do about 75% of my conferencing on Grex with the Fronttalk running
on Grex.  (Type "ft" at the shell prompt to run it - start up is slow but
after that it's fine.  Type "help differences" to see how it differs from
Picospan.  Mostly in good ways.)

But I don't think it will be necessary to replace Picospan.

Mail, not picospan, is the biggest single blocker.  It would be good to have
Marcus involved in that, at least in a spec-writing mode.

Mostly there are a lot of smallish tasks that need doing.  Some people need
to put in some serious time.  I don't have it right now.  I'd be delighted
if someone else did.


#473 of 547 by asddsa on Mon Oct 20 03:06:09 2003:

We should officially change the name to "NextGreX".  It's a lot more
symmetrical.


#474 of 547 by cross on Mon Oct 20 03:27:09 2003:

Several points.  I'm starting to get really leery of major software
components that only one or two staffers can fix, install, whatever.
This is certainly the case with PicoSpan, and for legal reasons: it's
not even a question of technical knowledge (which could presumably
be transferred)!  Surely that's bad.  What's more, it's not at all
clear to me that we have a legal license for it.

But let me ask this: why is PicoSpan still `closed source'?  I don't
think there are too many people running it.  Would it be possible to
ask the holders of the intellectual property to release it under an
open source style license?  Or under a dual-license so that non-profits
can use it for free?  That would eliminate the problem once and
for all.

But even then, I'm not sure that's going in the right direction.
Something like frontalk, which works across machines, is in my
opinion where we should be directing our efforts.

Mail isn't a big deal.  Give me a day and I'll have it set up.  But
the time for elaborate spec writing and endless back and forth has
passed: we've used up our time trying to create a system that
satisfies everyone, and in the end, we've satisfied no one.  Let's
just start from a few basic guiding principles, a reasonable design,
and go from there.  If some aspect or other of the system isn't to
someone's liking, too bad; at least we'll have a place to start
addressing that person's concerns from.


#475 of 547 by aruba on Mon Oct 20 13:27:42 2003:

I agree wholeheartedly with the last paragraph.  I have never understood why
Grex needs to have a mail system that's much more hacked than anyone else's.
Why can't we use a (mostly) off-the-shelf solution for mail?  (I'd be happy
to be enlightened.)


#476 of 547 by remmers on Mon Oct 20 16:02:03 2003:

I agree with Jan that Picospan probably isn't the holdup here,
especially since an OpenBSD version already exists.  As one can
see by reading Dan's summary, quite a bit of work on NextGrex has 
already been done and some critical issues have been resolved; it's
mostly been reported in the Garage conference, not Coop.  I'm glad
to see that Dan doesn't think mail will be a big issue, because
that had been worrying me.

Even if NextGrex comes up initially running Picospan for conferencing,
which it probably will, I do believe that we should think in terms
of moving away from it in the long run and towards something that
is open source, non-proprietary, and is a bit more modern in its
underlying architecture (that the user doesn't see).  This rules out
YAPP, of course, which on balance I'd have to say that I don't like
as well as Picospan anyway (I've used both extensively).  The most
promising effort in this direction that I've seen is Jan's FrontTalk,
which I hope he (or somebody) puts some more effort into getting
ready for primetime.  Fronttalk is essentially a Picospan-like
front end to Backtalk, so users would continue to have a familiar
interface, but the technical hassles involved in getting text-based
conferencing to play well with web-based conferencing that one has
with both Picospan and YAPP would basically disappear.  Because of
its client-server architecture, FrontTalk can also handle distributed
conferencing, i.e. can access conferences on more than one machine.
FrontTalk is slower than Picospan, but I think that once we're on
the new machine, speed differences won't be particularly noticeable.


#477 of 547 by jp2 on Mon Oct 20 18:02:56 2003:

This response has been erased.



#478 of 547 by jep on Mon Oct 20 19:00:47 2003:

I'd like to think we aren't waiting for new software to be developed 
for NextGrex, because that would appear to me to be an indication of 
major further delays.  Why were we in such a hurry to buy the hardware 
if we're so far away from using it?

My expectation, when I saw we were buying new hardware, is that "we" 
(the staff)  knew how it was going to be used, or thought they would 
know soon and had reason to believe they could begin using it.  If we 
know we're using Picospan and that it's going to be ready in small 
amounts of time, that's terrific; it doesn't need to be worried about.  

Resp:474 implies the holdup is designing a mail system.  Folks, this is 
not the right time to be designing a new mail system!  The right time 
was before the hardware was bought.  Maybe Next^2Grex can have the 
perfect mail system, after a few years of development.  I'd like to 
respectfully request you quit screwing around with garbage like that, 
at least quit allowing it to hold up the new Grex machine, and install 
something available now.


#479 of 547 by cross on Mon Oct 20 19:07:12 2003:

Regarding #477; If you can get it done in the next three months, we'll
consider it.  We'll consider anything.  However, the priority now should
be doing whatever will take the least amount of time.

Regarding #478; That's what I said.  I plan on just building a mail system
and moving on.  If it's imperfect, we can fix it later.  But right now,
the goal *has* to be getting something reasonable up in short order.

The only thing it would be silly not to do is wait for OpenBSD 3.4 to
come out.  That's less than two weeks away, though, so it won't be a huge
deal.  On that front, we're hamstrung by a megalomaniac in Canada.


#480 of 547 by jep on Mon Oct 20 21:48:55 2003:

re resp:479: Dan, it's my impression you need buy-in from the rest of
the staff for the new mail system and that you don't have it.  I think 
I was speaking to the staff as a whole rather than to you, indicating 
that, in my opinion, we have to go ahead.

If my impressions are wrong, you have enough freedom to go ahead with 
implementing a standard mail system, have committed to doing so, and 
that that major roadblock (as described earlier in this item) is not 
going to be a hold-up... that's terrific.  Congratulations to you and 
to all of us.

But it still leaves open the question, which I raised on Friday... 
what's the next hold-up?  What's going to be done about it?

Finally, in the end... when do we, the non-staff users, get to be on 
the new hardware?  However pointedly, brusquely (or other euphemisms 
for rudely) I've asked it, I think it's a valid question to be asking, 
and I don't feel like I have received an answer yet.


#481 of 547 by cross on Mon Oct 20 22:32:29 2003:

You're right, I do need buy-in from the rest of staff.  But what I
think we need buy-in for is to just *do* things without a huge amount
of debate and public discussion.  Something needs to be done, let's just
do it instead of trying to appease everyone.  That can come later.

Like I said, I think three months is a reasonable timeline.  But again,
that's dependent on staff buying into it, and on us agreeing to just do
the work.


#482 of 547 by gull on Tue Oct 21 13:27:59 2003:

For those who are worried the new hardware will be obsolete before we
get onto it:

I don't see this as a concern.  We're not building a system to play the
latest shoot 'em up game here.  It doesn't matter if our hardware is
obsolete as long as it's fast enough to do what we need it to.  Grex has
been running on obsolete hardware for as long as I've been using it.

There are reasons to be unhappy about the delays, but fear of
obsolescence is not one of them.

It's important that we get this system up and running soon.  But it's
also vitally important that it be set up right, and documented
thoroughly.  Thorough documentation is our only way out of the "we need
to wait for person X to fix it, because only they know how that works"
syndrome that curses us right now.  That's worth spending extra time on,
and I'm willing to wait for it.


#483 of 547 by cross on Tue Oct 21 14:54:58 2003:

There's a difference between doing a thorough, but *speedy* job, and
continuing to do what we have been, which is nothing, while we make
noises about needing more time to do it right.  Remember, it doesn't
matter whether we do it right or wrong if we don't do it at all!


#484 of 547 by jep on Tue Oct 21 16:25:00 2003:

We bought brand new equipment with the idea of running Grex on hardware 
that *wasn't* several years old.  That brand new equipment will have 
passed its warranty period before NextGrex is anywhere close to being
up and running.


#485 of 547 by other on Tue Oct 21 16:32:36 2003:

The idea was to make a significant infrastructure improvement, and of the 
ways to do that, the best investment was in new hardware.  

The relative age of the Grex hardware versus the current standard has *
never* been an issue of concern (except as it relates to the ability of 
Grex to provide even the basic services for which it exists), and very 
likely never will be.


#486 of 547 by cross on Tue Oct 21 17:14:01 2003:

Now, now, let's not mince words about our position.  We *did* buy top of the line
new hardware, with the intention of moving to the newer machine sooner
than later.  I will agree it wasn't our primary goal to have a super-
computer to run grex on, but that doesn't excuse us being this slow.
We really do have to do better.

I'm going to suggest that, if staff agrees to move to nextgrex within
three months, we end this thread of discussion.  Pointing fingers and
complaining about what we didn't do isn't going to help us achieve the
goal of moving to the new machine sooner than later.  Let's not lose
sight of what's important here: getting nextgrex running.

That said, I think we have a limited amount of time available until
OpenBSD 3.4 comes out.  Let's make profitable use of that time by trying
to hash out some of the remaining technical issues before we have to
start slinging away with configuring and documenting the new machine.

Let's just get this thing done and move on to other issues.


#487 of 547 by aruba on Tue Oct 21 23:26:05 2003:

John - if it's any consolation, NextGrex has been up and running since not
long after we got the hardware.  So most warrantable problems would have
showed up by now.


#488 of 547 by jep on Fri Oct 24 03:55:59 2003:

Let me clarify my position a little bit.

It's my intention to ask questions and to get the answers.  I just 
wanted to find out why NextGrex isn't being used yet, and when it will 
be used.  I still don't know either of those things.  Dan Cross is 
certainly stepping forward and setting expectations, but I don't know 
if the staff as a whole is going to accept and meet those expectations.

It's not my intention to bash anyone.  I don't think the NextGrex 
implementation needs to be about personal feelings.  If person A isn't 
going to get the job done -- for *whatever* reasons; I haven't asked 
for the reasons and no one has volunteered them -- then if there is a 
person B, that person should be allowed to give it a shot.

There are "person B" people available, by the way.  Grex is lucky that 
it doesn't have to be dependent on one individual for its survival
and future direction.

re resp:485: I apologize for any of my comments which may seem to 
stretch a point or two.  Grex could have bought a 3.0 GHz machine in 
February, instead of a 2.2 GHz machine, so in that sense its new
hardware is not "state of the art".  And I don't care if it's state of 
the art in that sense; that wasn't my point.

It *was* new then, and is not new now.  If we had waited a year to 
buy, we'd be better off.  That's what probably should and would have 
happened, had we known there was no commitment to put Grex on the new 
hardware last winter and spring.  Having the NextGrex hardware sitting 
unused is hurting Grex.  *That* was my point.

re resp:487: Yeah, the warranty remark was a stretch.  A hardware 
warranty isn't of that much relevance.


#489 of 547 by gelinas on Fri Oct 24 04:03:39 2003:

I don't think the current state is "hurting" grex, but I do agree it's not
really helping.  And waiting a year to buy the hardware wouldn't have helped,
either.

We are setting up a checklist of things to be done.  The next step is to
install OpenBSD 3.4, which janc agreed to do.


#490 of 547 by gull on Fri Oct 24 13:22:34 2003:

Re #488:
> It *was* new then, and is not new now.  If we had waited a year to 
> buy, we'd be better off.  That's what probably should and would have 
> happened, had we known there was no commitment to put Grex on the new 
> hardware last winter and spring.

Honestly, I think this is a bit of a catch-22.  You're not going to get
people to commit time and effort until they know there's a will to spend
money on the hardware.



#491 of 547 by cmcgee on Fri Oct 24 21:40:26 2003:

I disagree that we should have waited to buy.  As I understand the process,
you have to make sure the software runs correctly on a specific hardware
configuration, not a hypothetical one.  

While we might have moved faster if we weren't a volunteer organization, or
if more people had more time, I just don't see that we should have waited on
the software.


#492 of 547 by aruba on Sat Oct 25 01:10:14 2003:

Well we tried waiting, for over a year, with the idea that we would install
the OS on two different types of hardware and then pick one.  Nothing
happened.  Last winter the consensus was that we should commit to Intel
hardware and buy what we need, because then staff would be more likely to do
something.  So we bought it.  We made some progress at first, but now we're
stalled.  I'm glad cross is trying to jumpstart the process.


#493 of 547 by asddsa on Sat Oct 25 05:46:38 2003:

Oh boy...time drags on...

re 489 There are dozens of reasons that show the current state is hurting
GreX.


#494 of 547 by cmcgee on Sat Oct 25 12:50:16 2003:

Errr, last line of #491: software=hardware.


#495 of 547 by jep on Sat Oct 25 23:26:23 2003:

I agree with moving ahead if we were ready.  I'm piqued that we spent 
the money then stopped.

I'll raise another point, too.  For years, there have been two people 
who have been most adamantly against switching from Sun to Intel 
hardware; Marcus and STeve.  These are the two people who were most 
needed to do anything new with Grex.  I am on the outside; I'm not 
involved with the process, but... these are the two people who have 
stalled the move.  Right?


#496 of 547 by glenda on Sat Oct 25 23:37:10 2003:

They have not stalled it on purpose.  Both are in the midst of other time
crunches at work.  Between volunteer work and salaried work, salaried work
wins out.  I, personally, know that if STeve spends much more time away from
home and family, there will be no home or family.  I have seen very little
of him since about 2 weeks before the big power outage because he is spending
all of his awake time fighting various incarnations of Microsoft yuckiness.

That, coupled with my insane class/work schedule means we get no 'us' time
as it is.  He is not dragging his feet on the NextGrex.  There is only so much
time available and right now it is overfull.  Believe me, he feels badly that
he doesn't have more time.  But, I have put my foot down.  I don't want him
to have another stroke because he is pushing the envelope too far.  As much
as I love Grex, it just isn't worth it.


#497 of 547 by gelinas on Sun Oct 26 00:26:18 2003:

Neither STeve nor Marcus, in my opinion, has been the roadblock; we
just haven't made as much progress as we would like.  There are things we
could have used Marcus for, but we've been making some progress despite
his not being available.


#498 of 547 by jep on Sun Oct 26 01:05:32 2003:

Glenda, I don't want to slight either STeve or Marcus.  I definitely 
don't wish either of them bad health.  They have done a lot of things 
for Grex that I appreciate.  I understand about having a life, as well.

However, if there's something that needs doing, that's waiting on you, 
that you can't do, there is nothing wrong with saying, "Someone else 
do this.  I can't."  It occurred to me that that could possibly 
resemble the situation here.


#499 of 547 by cross on Sun Oct 26 01:49:33 2003:

There's no one person who has stalled things.  Marcus had some things he
wanted to do that we've been waiting on that I don't think are realistic
at this point: moving to Kerberos being the primary one.

I'm interested in what happened at the staff meeting.  Obviously, I
can't attend them since I'm so far away, but I'm hoping some consensus
was reached on how to proceed.  Joe mentioned in party the other night
that Jan is going to install OpenBSD 3.4 on nextgrex; hopefully we can
proceed from there.


#500 of 547 by bhoward on Sun Oct 26 02:51:39 2003:

Other structural issues such as those raised by Mary (in #463) and
others need to be discussed but that really ought to be
separated from discussion on the tactics of getting nextgrex into 
production as soon as possible.

Seems to me you have X amount of work waiting to be done, Y number of
staff trusted and capable of doing this kind of work and only a subset
available to do it.

We want to promote nextgrex into production in as little time T as
possible.  How much time that is can (and has) be(en) argued but it
remains that to reduce the time you can:

1) Reduce the amount of work required for go live

Possible.  Depends on what is critical-path.  Throwing in
non-critical-path distractions like replacing Picospan as was suggested by
someone earlier does little to reduce T.  Shortcuts may be an option
in some areas, but must be weighed against security/integrity
risks and a call made whether the deferred cost in staff time (remember
short cuts need to be cleaned up) is worth it.  Personally, I'm willing
to give up some time to ensure our new system is well documented, stable
and secure.

2) Increase the number of people working on the upgrade

This is tougher.  You can't just expand the pool of trusted technical
staff overnight.  Trust is earned over time and through a track record
of delivering.

Basically, to reduce T here we either discover underutilized people
resources within the existing group of trusted or semi-trusted staff, or
possibly implement some system of delegating work to non-staff volunteers
that applies limited staff time to reviewing work done by them.  However,
there may be practical limitations on the value of mixing in more people
or farming out the work.

Moving forward, we should take a hard look at how to redistribute technical
responsibilities to reduce the pressure on individual staff and avoid
single points of failure but our options for the immediate term are
limited by the above.  From what I gather, the main tasks remaining are:

* Recompile Picospan for OpenBSD
* Set up newuser
* Configuration tuning, whatever smallish tasks janc refers to in #472.
* Mail
* Transfer the data

Mail seems to be the only significant one.  Perhaps we can sort out an
agreement from Jan and Marcus that Dan move ahead on what he would like
to do, with the understanding that he documents as he builds it out and they
make some time to review.

For the record, if there are tasks that can be delegated to non-staff,
please add my name to the roster of volunteers.

My apologies for letting this get so long.  Price of having lurked for
so long, I guess :-)


#501 of 547 by gelinas on Sun Oct 26 03:02:43 2003:

(I thought I had heard that Marcus has already compiled Picospan for OpenBSD.
He'll probably have to do it again, before the migration, but it doesn't seem
to be a show-stopper.)


#502 of 547 by gull on Sun Oct 26 03:21:01 2003:

My understanding is that OpenBSD is totally changing their binary format and
breaking backwards compatibility, so it will have to be recompiled. 
However, if he's already compiled it once, there's probably nothing
time-consuming to do.


#503 of 547 by cross on Sun Oct 26 03:54:58 2003:

Regarding #500; Well, I think there are some things we can seriously
cut down on to save time.  For instance, we don't need to build out
a complex authentication system using Kerberos right now.  The reason
for trying to use something other than PicoSpan is to cut down on time
(Marcus isn't around a lot these days), and partly because using software
we don't have the source for is going to cause problems as time goes on.
For instance, it'd be swell if someone could just compile picospan on
nextgrex and have done with it.  Unfortunately, it's just not that
easy.

As far as mail goes, I'd love to put mail on another machine, and just
use lmtp to deliver locally.  Unfortunately, I don't know that's going
to be practical.


#504 of 547 by gelinas on Sun Oct 26 04:11:55 2003:

What's the big deal with Picospan?  From what I've heard, Marcus has control
of the source.  He can put it where he wants.  Switching now doesn't gain us
anything.


#505 of 547 by cross on Sun Oct 26 04:57:05 2003:

Maybe my understanding is flawed.  It's that we don't really have a legal
license for it, Marcus sort of has a copy of the source as an artifact, and
the legalities of it all are really rather fuzzy, and he doesn't have much
control over the source at all.  If the source could be opened up, all of
my objections would evaporate, but it really seems rather more complex than
that.


#506 of 547 by remmers on Sun Oct 26 15:02:41 2003:

Re #495 - Intel vs. Sun.  Marcus and STeve are in favor of moving from
Sun to Intel hardware.  In fact, at the meeting where we decided on what
type of hardware to purchase, STeve made that recommendation, speaking
for himself and Marcus.  Subsequent to that, STeve was actively involved
in the hardware acquisition process.

If I had to speculate on the reasons for slow progress - the fact is
that several key staff members had unexpected situations come up in
their lives that limited the time they had available to work on
NextGrex.  I think everyone on staff is committed to getting the
work done and is making an honest effort to do so within the
time constraints they have to deal with.


#507 of 547 by cmcgee on Sun Oct 26 15:24:36 2003:

People who have been with Grex a long time may also be unconsciously comparing
"how long it took to do X" when staff was much younger, had fewer family
members living with them, and had jobs that were less consuming.


#508 of 547 by cross on Sun Oct 26 19:08:12 2003:

Regarding #506; I think the thing that's upsetting some folks, and
perhaps rightfully so, is this idea that there are some staff members
who are `key' at all.  In reality, that's the way it's always going
to be, but that doesn't mean the rest of us should be paralyzed by
it.


#509 of 547 by remmers on Sun Oct 26 19:26:03 2003:

Right.  But I don't think that's what's been holding things up.


#510 of 547 by cross on Sun Oct 26 21:08:30 2003:

I concur.  I just wanted to attempt to clarify.


#511 of 547 by jep on Sun Oct 26 22:00:25 2003:

A lot of things happen behind the scenes, where most of us don't know 
anything about them.  When it looks like nothing is happening, that 
can be frustrating.  We all had a lot of expectations, I think, and 
everyone cheerfully donated all the money that was asked for... then 
nothing visible happened for months.  That's fine, but I don't feel 
guilty for asking some questions now.

So, there was a staff meeting on Saturday... was there any buy-in to 
Dan's idea of moving ahead?


#512 of 547 by bhoward on Tue Oct 28 04:07:45 2003:

Keep an eye on your mailbox, folks.  My OpenBSD 3.4 CD (and t-shirt)
just arrived 30 minutes ago.  Grex's ought to be arriving any day 
now.


#513 of 547 by gelinas on Tue Oct 28 04:39:03 2003:

John, the staff meeting was on Wednesday, October 22.  The staff report in
the minutes of tonight's BoD meeting pretty much sums up the discussion.

The Next Step is installing OpenBSD 3.4.


#514 of 547 by jep on Sun Nov 9 20:09:17 2003:

The question was asked in general, has OpenBSD 3.4 been installed?


#515 of 547 by gelinas on Mon Nov 10 00:40:22 2003:

Not last I checked.


#516 of 547 by bhoward on Mon Nov 10 01:20:04 2003:

I understand from the earlier discussion that Jan and the other staff
configuring grex have been documenting the configuration in detail.

For folks like myself curious about the technical nitty-gritty, is any
of that documentation publicly available yet?


#517 of 547 by janc on Fri Dec 19 02:53:32 2003:

I haven't read all of what's above.

I should have a lighter work-load over the holidays, but the kids won't be
in school as much, so I might not have all that much time to work on Grex
either.  Still, I expect to be able to do some work.

Last night I started work on upgrading to OpenBSD 3.4.  It's up and working,
and I am about half way through the business of following the instructions
to redo the installs and configuration changes that had already been
documented.  As I've been working, I've also been updating and clarifying
the install documents.

Mostly the install documents have worked fine.  It's not just documentation.
A lot of it is custom scripts.  So setting up the /suidbin partition, moving
appropriate suid files to it and replacing the old copies with symbolic links
took about 7 minutes.  Full install and setup of party took four commands and
four minutes (most of the time to ftp the source over).  Configuring Apache
and the external authenticator took about 4 minutes too.  There are still
some glitches - my scripts to install Orville-Write seem to have failed. 
However, the goal is to be able to build a new Grex in fairly short order,
and we've made good progress toward that.
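(For the curious, the suidbin step described above could be sketched roughly
like the following.  This is a hypothetical illustration, not Grex's actual
script; the function name and paths are made up.)

```shell
# Rough sketch of the /suidbin idea: move each setuid binary onto a
# dedicated partition and leave a symlink at the old path.  NOT Grex's
# real script -- relocate_suid and the example paths are hypothetical.
relocate_suid() {
    src="$1"    # directory to scan, e.g. /usr/bin
    dst="$2"    # destination partition, e.g. /suidbin
    find "$src" -maxdepth 1 -type f -perm -4000 | while read -r f; do
        base=$(basename "$f")
        mv "$f" "$dst/$base"        # relocate the suid binary
        ln -s "$dst/$base" "$f"     # symlink so the old path still works
    done
}
```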

I don't have a good way to make these documents public right now.  It's
nothing amazingly interesting.

One bit of good news - I've done lots of reboots as I installed stuff, rebuilt
kernels, and such.  So far the ethernet interface has initialized correctly
every time.  I don't know if the ethernet driver got fixed in the 3.4 release,
or if my new router just plays better with OpenBSD, but it looks like this
issue is solved.

Right now I'm just playing catch-up to get the system back to where it was
before we upgraded to 3.4.  I hope to get a substantial amount of forward
progress done over the holidays.  I hope other staff members will too.


#518 of 547 by cross on Fri Dec 19 04:02:14 2003:

Great!  Okay, how about relocating the machine to the pumpkin?


#519 of 547 by janc on Sat Dec 20 01:39:51 2003:

For the next few weeks, I'll likely have some time to work on the thing.
I don't know what advantage moving it to the pumpkin would have, at least
during that time period.  However, if there is any strength of opinion
favoring that, I'd actually love to have it off my desk.  Its fans are
loud and it takes up scarce desk space.


#520 of 547 by cross on Sat Dec 20 03:42:41 2003:

If it's coming up on the network reliably now, the advantage is that
(a) we can test out network services other than those that you poke holes
in your firewall for, (b) it's closer to oldgrex, and (c) it's already
in place for when grex shifts to it.


#521 of 547 by gelinas on Sat Dec 20 04:08:57 2003:

All good reasons, but I'd like to see it a bit closer to being ready for use
before moving it.  I'd like to see it move early in January, earlier if
possible.


#522 of 547 by mary on Sat Dec 20 13:15:33 2003:

Thanks, Jan.


#523 of 547 by janc on Sun Dec 21 17:43:43 2003:

Actually being able to make it accessible via http and smtp and things like
that may be useful for testing.  Well, for other people.  I can access those
services just fine :).

I'll move it as soon as any staff member says they'd find it easier to do
their work if it was moved, or at the end of the first week of January, at
which point I'm booting it out of my house no matter what state it is in.


#524 of 547 by cross on Sun Dec 21 19:58:00 2003:

I think it'd be a lot easier to set up a decent mail configuration if it were
moved earlier.


#525 of 547 by janc on Sun Dec 21 20:20:13 2003:

OK.


#526 of 547 by janc on Sun Dec 21 20:22:30 2003:

However, before we move it out from behind my firewall, we need to check
that this isn't going to be a security problem.  Are there any services
we need to turn off?


#527 of 547 by jp2 on Sun Dec 21 20:48:04 2003:

This response has been erased.



#528 of 547 by remmers on Sun Dec 21 20:51:50 2003:

I would find http useful.


#529 of 547 by bhoward on Mon Dec 22 01:21:06 2003:

As a general principle, I agree with #527.

Taking a quick look at what's currently running on nextgrex, I would turn off
tcp and udp ports:
        daytime (13)
        time (37)
        auth (113)

I don't see any particular need for any of these to be running.

Leaving ssh, www, 8080, https, smtp open should be fine, with the caveat that
we may want to populate /var/www/htdocs with something closer to the real
grex html files before opening it generally.

I would turn off "submission" (587) in the sendmail cf files beneath /etc/mail
as we don't currently offer that on old grex.

finger (79) is currently off but presumably you will want to turn that on
later at some point, since we do offer that on old grex.
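(As a sketch of how that shutoff would go: on an OpenBSD box of that era,
daytime, time, and auth/ident are all started from /etc/inetd.conf, so
disabling them is a matter of commenting out the lines and HUPping inetd.
The exact fields and identd flags below vary by release -- treat this as
illustrative, not a copy-paste recipe.)

```
# /etc/inetd.conf -- comment out the small services suggested above
#daytime stream  tcp  nowait  root  internal
#daytime dgram   udp  wait    root  internal
#time    stream  tcp  nowait  root  internal
#time    dgram   udp  wait    root  internal
#auth    stream  tcp  nowait  root  /usr/libexec/identd  identd -el

# then signal inetd to reread its configuration:
#   kill -HUP $(cat /var/run/inetd.pid)
```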


#530 of 547 by gelinas on Mon Dec 22 03:41:06 2003:

auth/ident should be left open, I think.  It's one we've traditionally left
open.


#531 of 547 by janc on Mon Dec 22 04:52:33 2003:

http and https should be OK to leave open.  I've already configured those
(https with a self-issued certificate).  /var/www/htdocs is no longer the
document root.  The document root is /usr/local/www as on the traditional
Grex, and it currently contains only a place-holder index.html and some
backtalk images.  I should probably delete /var/www/htdocs, or symlink it
to /usr/local/www.
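(The symlink option could be sketched as a tiny helper like the one below.
The helper name is made up; the paths are the ones from this post.)

```shell
# replace_with_link DIR TARGET: remove the stale directory and leave a
# symlink in its place.  A hypothetical sketch of the htdocs cleanup
# mentioned above, not an actual Grex script.
replace_with_link() {
    rm -rf "$1"       # discard the stale directory tree
    ln -s "$2" "$1"   # point the old path at the real document root
}

# e.g. replace_with_link /var/www/htdocs /usr/local/www
```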

I'm not exactly sure how to schedule the move.  I'd pretty much have to do
it at night.  Wouldn't hurt to have someone else around to help.

Anyone know what IP addresses are free in the pumpkin?  I suppose it would
be safe to use the old grease IP address.


#532 of 547 by carson on Mon Dec 22 14:21:44 2003:

(Jan, if you just need physical help in moving, I can be available.)


#533 of 547 by janc on Mon Dec 22 14:49:52 2003:

Don't think I really need physical help.  It's not a heavy computer.  I don't
suppose I really need help at all.  Figuring out how to get it onto the
network, getting it configured, moving junk around to make space for it,
someone to hold door while someone else carries it...it'd be pleasanter with
two people, but it'll work with one, and the difficulty of scheduling time
in advance means one is probably the best choice.  I guess I'll tentatively
aim at moving it this evening, sometime after the kids are in bed.


#534 of 547 by gull on Mon Dec 22 16:05:41 2003:

The only security problem with ident that I'm currently aware of is it 
can be used to determine what username servers are running under.  It's 
probably worth running it on Grex because it lets other sites inform us 
of which of our many users is causing them trouble, in the event of 
abuse.


#535 of 547 by janc on Tue Dec 23 01:16:55 2003:

I'm aborting the plan to move Next Grex to the pumpkin tonight.

I just realized that I haven't got a monitor for it.  Right now it's on the
secondary inputs of my dual-input monitor.  The only monitor we have free in
the pumpkin is not a VGA monitor.  (Monochrome CGA, I think.)  To set it up in
the pumpkin I'd need to borrow the monitor and keyboard from gryps.  I could
do that, but it's not a very satisfactory solution.  I think we should let
the move wait till we have a monitor and keyboard.  The reasons to do it
now are not all that strong and a monitor should not be all that hard to find.
People all over town are paying money to get rid of them.

I think I have a spare keyboard someplace.  I'd have to dig around a bit.

I think Dan Gryniewicz (dang, if you must have a last name like that, I wish
you'd get a unique first name so I wouldn't have to type the last one all the
time) had offered the donation of a monitor.  I don't know if he's even in
town right now.


#536 of 547 by janc on Tue Dec 23 02:04:20 2003:

Steve Weiss says he has a monitor.  Maybe I'll grab that and make the move
tommorrow night.


#537 of 547 by davel on Tue Dec 23 02:28:48 2003:

We bought all the hardware & didn't get monitor & keyboard?


#538 of 547 by gelinas on Tue Dec 23 02:44:07 2003:

Right.  See responses 43 and 48 of this item:

        As for random hardware pieces, I'm willing to use things like
        donated monitors and keyboards, because if they fail the system
        won't be affected (STeve Andre, #43).

and

        You'll notice that I omitted the keyboard and monitor from the
        list of things to buy.  This is because if either fails, they
        won't affect the running of Grex immediately.  We can boot the
        system up without a monitor (indeed, most of my OpenBSD machines
        have only had a monitor on them during the initial install and
        upgrades), and we have spare keyboards in the pumpkin already.
        So those are of a nature where a failure means we have to drag
        something over to Grex and use that instead (STeve Andre, #48).


#539 of 547 by kaplan on Tue Dec 23 03:55:44 2003:

Jan,

I can lend a 15" monitor to the cause.  It's easy for me to swing by
your house on my way to or from work (before 9:00 or after 6:00)
tomorrow.  Call me to arrange a time (unless you want to hold out for a
bigger screen).


#540 of 547 by jep on Tue Dec 23 04:01:01 2003:

It's exciting to see that things are starting to move again.  Thanks 
Jan and the rest of the staff!


#541 of 547 by janc on Tue Dec 23 14:01:01 2003:

A small screen is fine.  I'll call Jeff.


#542 of 547 by gull on Tue Dec 23 14:02:46 2003:

I've got some monitors here at work that the company is going to have to
pay to get rid of anyway.  Grex would be welcome to have one.  They're
14".  The tubes are worn enough that the display contrast is a bit low
for Windows, but they'd work fine for text consoles.  Just thought I'd
offer one in case you don't come up with something better.


#543 of 547 by keesan on Tue Dec 23 16:26:52 2003:

We have 12 or 13" mono VGA monitors unless Jim threw them out. Do you need
color?  These are easier to move around.


#544 of 547 by janc on Wed Dec 24 01:35:00 2003:

Jeff dropped off his 15" today.  I'll use that.  Thanks for all the other
offers.  I can temporarily loan a keyboard.  My plan is to move it tonight.


#545 of 547 by janc on Wed Dec 24 03:27:18 2003:

OK, it's in the pumpkin.

We have a UPS issue though.  The UPS in the pumpkin is at capacity, so I
couldn't plug it into that (didn't try, actually).  So it's on wall power,
which is reputed to be none too clean.  I don't really know what to do about
this, so I'll just hope that someone else solves the problem.


#546 of 547 by gull on Fri Dec 26 14:30:36 2003:

I would recommend Grex buy an inexpensive UPS if you're going to be
running the system for any length of time.  It would cost under $200 and
could be used as a spare later if we ever have to take our main UPS down
for servicing again.

APC makes okay small UPS's.  Stay away from Cyberpower.


#547 of 547 by jesuit on Wed May 17 02:14:19 2006:

TROGG IS DAVID BLAINE

