Author Message
25 new of 286 responses total.
dpc
response 135 of 286: Mark Unseen   Aug 27 18:29 UTC 2004

I'd also like to know what happened.  This "disk disaster" ranks with
the accidental destruction of the password file (and its hand-rebuilding
by Marcus) several years ago.
janc
response 136 of 286: Mark Unseen   Aug 27 18:37 UTC 2004

Here's what I know:

  Grex's disk drive zero died.  This disk is used to boot the system, and
  contains most of the operating system (root, /usr, /usr/local) and the
  bbs data (/bbs).  It was still partially working, but not enough to
  boot the system.

  Our last tape backup was many months old.  I forgot how many, but way too
  many.

  STeve and Kip did nearly all the work restoring things.  I was on vacation
  when this started and didn't get involved till late and then only in
  limited ways.

  STeve and Kip's initial problem was that you can't build a new disk on
  a machine that you can't boot.  So their plan was to take Grex's tape
  drive to one of the machines at Kip's workplace, and build a new boot
  disk for Grex from the backup tape there.  However, they didn't have
  the latest backup tape, which (appropriately) was not stored on site.
  So they got the tape drive hooked up, and discovered that they couldn't
  read the much older tape backups that they had brought from the pumpkin.

  Later, Joe passed the latest backup tape to STeve.

  I came home from vacation, and spent a little time in the pumpkin.  Since
  I did much of the original building of the current Grex system, I
  remembered that we had a CD-drive for Grex (standard ones won't work
  with SunOS) and a 4.1.4 distribution CD which can be booted.  I hooked
  this up and figured out how to boot from the CDrom and documented this.
  Booting from the CDrom gives you an extremely limited set of tools. It
  looked to me to be too limited to actually do anything useful, plus I
  had neither the tape drive, nor the backup tape, so I left things at that.

  Kip and STeve again got together, with tape drive and backup tape.  They
  actually managed to figure out how to do a restore when booted from the
  CDrom.  (I'd still like to know how they did that.)  However, they
  discovered that most of the spare disk drives in the pumpkin were
  unusable.  Some are differential drives.  Some are too small.  The
  only viable candidates were four 4Gig Conner drives.  They tried two
  of these and found that both were defective.  (I had tried using one
  of these years ago and found it didn't work, but they didn't know that).

  When I saw their emailed report the next day, I went and searched my
  house for some other disk drives.  When we had started putting together
  the new Grex, but didn't have drives yet, I had borrowed some drives from
  the pumpkin that I thought I could use temporarily.  These were 4Gig
  Seagate drives, which had previously been used on a development system
  that we ran for a while called "grease".  I never ended up using them on
  the NextGrex project, and couldn't really remember what I'd done with them.
  I found them in my garage and returned them to the pumpkin.

  STeve and Kip did another late night session, restoring the backups of
  root, /usr, and /usr/local onto one of the Seagate disks.  Years ago,
  STeve had set up a cron process that backed up the /etc/passwd file and
  related files to various other disks periodically, so they had a current
  copy of that.  However, he had not backed up /etc/group or the system
  mail aliases, so new versions of those have to be built.  There is
  probably more work to do updating things that had changed since the last
  backup, but not that much has.
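
A periodic job like the one described above could be sketched roughly as follows. This is only an illustration, not STeve's actual cron script: the function name `backup_files`, the `DEFAULT_FILES` list, and the timestamped folder layout are all assumptions. The list simply adds the two kinds of files the post says were missed (the group file and the mail aliases, whose path here is a guess).

```python
import shutil
import time
from pathlib import Path

# Hypothetical file list: the thread only confirms /etc/passwd "and related
# files". /etc/group and the mail aliases are the ones the post says were
# missed; the aliases path is an assumption.
DEFAULT_FILES = ["/etc/passwd", "/etc/group", "/etc/aliases"]


def backup_files(files, dest_dirs, stamp=None):
    """Copy each existing file into a timestamped folder under every destination.

    Returns the list of copies written. Missing source files are skipped so
    that one absent file does not stop the rest of the run.
    """
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    written = []
    for dest in dest_dirs:
        target = Path(dest) / stamp
        target.mkdir(parents=True, exist_ok=True)
        for name in files:
            src = Path(name)
            if src.is_file():
                # copy2 also preserves modification times and permission bits
                written.append(Path(shutil.copy2(src, target / src.name)))
    return written
```

Run from cron with destinations on at least two different physical disks, something like this would keep a recent copy available even when the boot disk fails. Keeping the file list in one place makes it easier to notice when something like /etc/group is missing from it.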

  I believe they got /bbs by reading off the dead drive.  Mostly the /bbs
  partition of the old drive was still readable, but there seems to have been
  some damage.  Items 19 through 58 in oldagora (agora49) were trashed.
  I think someone said they had an offsite backup of this though, so we may
  be able to restore those.

  Overall, we took way too long to get this job done.  We repeatedly allowed
  ourselves to fall into resource deadlocks - first STeve and Kip couldn't
  do much because Joe had the backup tape, then I couldn't do much because
  STeve and Kip had the backup tape and tape drive respectively, then STeve
  and Kip couldn't do much because I had the only good disk drives.  Each
  time STeve and Kip's work got blocked, it took them some time to be able
  to get together again - they both have very busy schedules.

  I think that part of the problem is that knowledge of these old Sun systems
  is thinly scattered.  STeve and Kip are experienced system administrators,
  but I doubt either has done much SunOS work for years.  I'm not a system
  administrator at all, and my knowledge about this stuff is very spotty, but
  I did do a lot of the "recent" work on Grex, so I know more about what
  CD drives and disk drives and bootable CD's we have.  I don't think any
  one of us knows enough to be able to readily do this kind of job on our
  own.  This means that we have to work together on jobs like this, and
  that slows things down, as it is hard to coordinate among us.  Hopefully
  this will be less of an issue with the new machine and operating system
  which are better known to more people.
albaugh
response 137 of 286: Mark Unseen   Aug 27 18:51 UTC 2004

Thanks very much to everyone who helped with the restoration, in any way!

It sounds like this is the first time this has happened to grex.  It probably
shouldn't happen again.  It shouldn't happen any more frequently on Next Grex,
hopefully much less.  And if it did, Next Grex should be easier to recover
from.  Is this a correct assessment?
mary
response 138 of 286: Mark Unseen   Aug 27 22:10 UTC 2004

A huge thank you to Kip, STeve and Jan.  While the rest of us
were missing Grex you folks were spending hours of your time 
thrashing through problem after problem.  You are our heroes.
jor
response 139 of 286: Mark Unseen   Aug 28 00:01 UTC 2004

        no no, we just criticise, no gratitude.

        next we'll dock their pay.
kip
response 140 of 286: Mark Unseen   Aug 28 01:06 UTC 2004

Thanks Jan, that was a very good summary of the events.

As for the "trick" to restoring from tape while booting from the CD,
actually after booting from the CD, you have an option to install a miniroot
system (like a modern Linux rescue disk) to one of the swap partitions on
the system and then boot from that where you can then create mount points and
start to work with the filesystems and eventually restore from the tape. 

A rather nice feature, wouldn't you say?
janc
response 141 of 286: Mark Unseen   Aug 28 03:15 UTC 2004

I created a mini root, but I didn't see a mount command or a restore
command.  I thought that was rather pathetic.  Probably I was hallucinating.
That would be just too dumb to be real.
charcat
response 142 of 286: Mark Unseen   Aug 28 03:18 UTC 2004

Thanks to Kip, Steve, Jan and all others who resurrected Grex! (charcat does
the snoopy happydance!)
keesan
response 143 of 286: Mark Unseen   Aug 28 03:27 UTC 2004

Jim asks whether the next grex will have more stuff on a bootable CD.
And whether you can do backups to CD or DVD instead of tape.
gelinas
response 144 of 286: Mark Unseen   Aug 28 03:44 UTC 2004

One advantage of the new machine, which could be put to use on the current
one, is that much of the documentation, and thus the critical files, are
being stored in CVS, on a separate machine.  

Yes, backups can be done to CD instead of tape.  Backups can also be done
to separate disks.  As disk gets cheaper, many folks are finding it makes
more (economic) sense to mirror to disk than to tape or CD.
janc
response 145 of 286: Mark Unseen   Aug 28 16:31 UTC 2004

Next Grex has lots of extra disk space, part of which is currently configured
as a mirror disk.  We don't yet have a CD-R drive for the machine.  We should
probably start a discussion of backup strategies for it.

Currently I've got the disks set up so that we can always have two copies of
the OS installed.  Each time I upgrade the OS, I replace the older copy with
the new one.  Thus it should always be possible to boot Grex into either of
the last two OS versions.  Eventually I want to change over to a procedure
where we can install the next version of the OS on the alternate partitions
while Grex is running.  Theoretically it should be possible to do an OS
upgrade with almost no down time.
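
The two-copy scheme described above can be sketched as a small model. This is only an illustration of the bookkeeping, not the actual upgrade procedure: the `OsSlot` class, the slot names, and the version tuples are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class OsSlot:
    name: str       # hypothetical label for one set of OS partitions
    version: tuple  # illustrative version number, compared as a tuple


def slot_to_replace(slots):
    """Return the slot holding the older OS copy."""
    return min(slots, key=lambda s: s.version)


def install_upgrade(slots, new_version):
    """Overwrite the older copy with new_version.

    Returns the slot names ordered newest first: the preferred boot target,
    then the fallback.
    """
    if new_version <= max(s.version for s in slots):
        raise ValueError("new version must be newer than both installed copies")
    slot_to_replace(slots).version = new_version
    return [s.name for s in sorted(slots, key=lambda s: s.version, reverse=True)]


slots = [OsSlot("A", (3, 4)), OsSlot("B", (3, 5))]
boot_order = install_upgrade(slots, (3, 6))  # slot A is overwritten, B is kept
```

Because the newer existing copy is never touched, the previous version stays bootable after each upgrade. The same bookkeeping would apply if the install onto the alternate partitions happened while the system was running.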

Also, as Joe says we are putting everything needed to build a new Grex into
the off-site CVS archive.  My goal is to be able to build and configure a new
Grex, starting from a blank machine with a net connection, in under 24 hours.
So we are checking all the grex-specific code we have into the archive, all
the config files, together with scripts to fetch packages from the net, build,
configure and install them.  Of course, user data and bbs data will need
to be restored from backup.
albaugh
response 146 of 286: Mark Unseen   Aug 28 19:31 UTC 2004

The one limitation of backing up to disk is that it would still be in close
proximity to the master, so if a disaster struck the pumpkin there would be
no off-site storage to recover from.
gregb
response 147 of 286: Mark Unseen   Aug 28 21:54 UTC 2004

Why is SunOS used as opposed to Linux or BSD?
jor
response 148 of 286: Mark Unseen   Aug 28 23:51 UTC 2004

        runs on a SUN
janc
response 149 of 286: Mark Unseen   Aug 29 03:44 UTC 2004

Next Grex runs on OpenBSD.

Grex opened for business on July 18, 1991.  Linus Torvalds released the very
first version of Linux about two months later. ("Hello everybody out there
using minix - I'm doing a (free) operating system (just a hobby, won't be
big and professional like gnu) for 386(486) AT clones.")  Somehow, the
founders didn't seem to think Linux was quite ready for the job at the time.

I'm not exactly sure what the situation was with BSD in 1991, but it wasn't
an option the founders were likely to have spent an awful lot of time
thinking about either.
keesan
response 150 of 286: Mark Unseen   Aug 29 09:13 UTC 2004

So what OS did first grex use?  And what hardware?
remmers
response 151 of 286: Mark Unseen   Aug 29 14:14 UTC 2004

The hardware was a Sun 2, running SunOS (I forget which version).

Jan's right - Linux didn't exist yet, BSD wasn't easily available at low
cost, and we did have access to Sun (which was regarded as the Cadillac
of Unixes at the time).

Times have changed though, and I'm glad we're making the switch to x86
hardware and BSD.
janc
response 152 of 286: Mark Unseen   Aug 29 14:56 UTC 2004

I think the first open source version of BSD, BSD/386, was also released in
1991.  It too would have been horribly inadequate for Grex's needs.  FreeBSD,
NetBSD, and OpenBSD were all years later.

I presume that it was SunOS 4.1.3 on the Sun 2.  The differences between
that and the SunOS 4.1.4 running on this system are entirely unnoticeable.
Mostly bug fixes.

At the time, SunOS was clearly the most stable, most capable version of
Unix available in our price range (probably in any price range).  It's
still a remarkably solid piece of software.  For me the main reason to
move off it is that too many of the open source packages that we want to
use (like mysql) no longer compile on SunOS.
dpc
response 153 of 286: Mark Unseen   Aug 30 14:34 UTC 2004

Thanks to Kip, STeve and Jan!
tsty
response 154 of 286: Mark Unseen   Aug 30 17:14 UTC 2004

nice job ... we all appreciate the efforts and results -thank you
mfp
response 155 of 286: Mark Unseen   Aug 31 19:55 UTC 2004

Hi, all!  I was in Ann Arbor!  I ate at the Fleetwood!
happyboy
response 156 of 286: Mark Unseen   Aug 31 20:13 UTC 2004

i'm sorry.

i used to work there.  yuk.
tod
response 157 of 286: Mark Unseen   Aug 31 20:18 UTC 2004

I used to consume hippy hash served by a tracked up Lisa.  The coffee sucked
but they had a torlet so what the hell.
happyboy
response 158 of 286: Mark Unseen   Aug 31 20:29 UTC 2004

i remember some of the grafitti from the torlet:

"The Fleetwood makes me shit PURE WATER SHIT."

 accompanied by a childlike drawing of a screaming
person sitting on a torlet.

the cook used to pork his girlfriend in the storeroom and would
ash his ciggies in the chilipot.

bon appetit!
tod
response 159 of 286: Mark Unseen   Aug 31 20:48 UTC 2004

We put an arbornet sticker in that torlet..wonder if it's still there

- Backtalk version 1.3.30 - Copyright 1996-2006, Jan Wolter and Steve Weiss