|
Grex > Coop13 > #380: Cyberspace Communications finances for November 2006 | |
|
gelinas
|
|
response 16 of 124:
|
Dec 8 01:41 UTC 2006 |
(Just a note: that "attic" is our host's server space. The last time I
was up there, they were using less than half the available floor space.)
|
aruba
|
|
response 17 of 124:
|
Dec 11 15:49 UTC 2006 |
This response has been erased.
|
aruba
|
|
response 18 of 124:
|
Dec 12 05:47 UTC 2006 |
Dan - I guess I'm not convinced that Grex's users will see much benefit from
that $2000 investment. Grex's hardware has had a couple of glitches (which
cost a lot less than $2000 to fix), but it's been pretty stable lately. So
convince me that we'll see $2000 worth of improvement if we spend that much
money.
|
mcnally
|
|
response 19 of 124:
|
Dec 12 06:24 UTC 2006 |
> Grex's hardware had a couple glitches (which cost a lot less than $2000
> to fix)
Actually, we had months of almost daily downtime, and we *still* have
periodic problems with user home directory partitions and (much more
frequently) /var/spool/mail filling up.
|
spooked
|
|
response 20 of 124:
|
Dec 12 06:30 UTC 2006 |
Mike and Dan are accurate in their arguments and comments.
However, buying better hardware will NOT fix the problems, because good
system administration is more about active monitoring, tailoring, and
anticipating problems --- and none of those three is currently being done
sufficiently by the Grex staff.
I know this may sound harsh, but it is spot on. There really needs to be a
change in the Grex staff --- in its culture in particular --- and in its
processes.
|
aruba
|
|
response 21 of 124:
|
Dec 12 13:52 UTC 2006 |
Re #19: I agree that when we were having memory problems, that was ugly, and
if throwing money at the problem would have fixed it, it would have been a
good thing. But we haven't had that problem for the last year, since STeve
pulled the bad memory chip. So I think it's a moot point.
I suppose we could buy a bigger disk and alleviate the mail spool problem
for a while. But it would just fill up again, right? So I'm not convinced
money can solve that problem.
|
cross
|
|
response 22 of 124:
|
Dec 12 14:30 UTC 2006 |
There have been downtime periods of greater than a week on grex, largely due
to hardware (and, more frequently, software) failures. How much does that
cost grex in terms of opportunity costs? How much does it cost the staff
people who have to turn around and fix those problems?
Sure, in a direct, apples-to-apples comparison you won't see $2000 of benefit
for a $2000 investment, but that's the wrong metric. Instead, judge it based
on how much money is *saved* from things like reduced staff time commitment,
improved reliability, etc. Would the mailbox partition fill up if staff could
have devoted more time several months ago (when staff *had* time) to tweaking
the mail system rather than figuring out why grex was crashing all the time?
What was the cost to Steve for nursing a sick grex back to health in terms
of time away from his job, his family, etc? Is that worth $2000?
|
keesan
|
|
response 23 of 124:
|
Dec 12 14:46 UTC 2006 |
How long would it take to write some program that deletes any mailbox which
has not been accessed for a month after the account was opened?
|
nharmon
|
|
response 24 of 124:
|
Dec 12 14:57 UTC 2006 |
This response has been erased.
|
nharmon
|
|
response 25 of 124:
|
Dec 12 14:58 UTC 2006 |
About 10 minutes.
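Something along these lines would do it. This is only a rough sketch: it
assumes mbox-style spools under /var/spool/mail and uses the file's last
access time as the test, neither of which necessarily matches how grex is
actually set up.

```python
import os
import time

MAIL_DIR = "/var/spool/mail"   # assumed spool location
MONTH = 30 * 24 * 60 * 60      # "a month", roughly, in seconds

def stale_mailboxes(mail_dir=MAIL_DIR, now=None):
    """Return mailbox paths not accessed in the last month.

    Uses each file's atime as a proxy for "accessed"; a real version
    would also want to check the account's creation date.
    """
    now = now if now is not None else time.time()
    stale = []
    for name in os.listdir(mail_dir):
        path = os.path.join(mail_dir, name)
        if os.path.isfile(path) and now - os.stat(path).st_atime > MONTH:
            stale.append(path)
    return sorted(stale)

# A cron job would then os.remove() each path this returns.
```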
|
easlern
|
|
response 26 of 124:
|
Dec 12 15:43 UTC 2006 |
Two cents from the peanut gallery: it seems like we could prevent downtime
like we've had recently by policing accounts better. It's hard to see what
benefit there would be in giving anonymous accounts a more powerful system
to beat on the ISP with.
|
aruba
|
|
response 27 of 124:
|
Dec 13 14:34 UTC 2006 |
Re #22: Dan, I'm just not convinced that throwing money at the problem is
going to help at all. I'm not an expert on hardware, but I know that it's
not literally true that we "went cheap" when we bought the current machine.
The total initial cost of the current machine was $2,201, more than you're
proposing to spend.
It seems to me that there is some probability that any piece of hardware
will go bad, in any time interval. Grex pushes hardware pretty hard, so it
doesn't surprise me a whole lot that we lost a disk and a memory chip in the
3.5 years the machine has been running (2 years since it's been online).
|
cross
|
|
response 28 of 124:
|
Dec 13 14:53 UTC 2006 |
Well, consider the disk failure for instance: yes, you're absolutely right
that hardware components tend to fail over time, and there's not much that
can be done to prevent it. These things just wear out after a while. But,
if grex had invested in a hardware RAID solution, then losing a disk
wouldn't have necessarily brought the entire machine down. And repairing
the problem would have been about as easy as taking a spare to the colo and
yanking out the old disk and plugging in the new one. The hardware would
take care of the rest. This isn't magic; hot-swappable hardware RAID
controllers aren't hard to come by. And it would have prevented a week of
downtime. And it wouldn't have required Steve or anyone else to spend hours
and hours at the colo facility. And of course, had we used ECC memory, as
was discussed ad nauseam before buying the current hardware, the bad memory
chip just wouldn't have been an issue: the memory hardware would have
reported the fault to the operating system, which would have logged a
message, and the chip could have been replaced without a tremendous amount
of downtime (if, indeed, that was the problem at all) or people going back
and forth to the colo facility to run diagnostics. What's more, it
wouldn't have taken down the machine. Is that worth it? You tell me.
As for the cost of the current grex hardware.... Remember that the Sun 4
that it replaced cost somewhere on the order of $100,000 when new. $2,201
is pretty cheap compared to that.
I guess I don't understand why you think that this is just "throwing money
at the problem." Well, I'm not going to try and convince you. If you
don't think it's worth it, then you don't think it's worth it. But I just
consider it making wise investments.
|
slynne
|
|
response 29 of 124:
|
Dec 13 20:58 UTC 2006 |
I work with hot-swappable hard drives on servers and I have to admit
that I really do like them. Our setup has three hard drives, and we can
lose one without *any* downtime. Fixing it is pretty easy too: we
ship a hard drive to the retail location, where we have someone who is
almost completely computer illiterate install it. It is pretty cool.
|
drew
|
|
response 30 of 124:
|
Dec 14 05:01 UTC 2006 |
I was told by someone in the IT industry that RAID is only worthwhile
if your downtime costs are measured in dollars per minute. Nonetheless I
recommend installing hardware RAID anyway, for reasons given by others
in this item.
It also occurs to me that with a RAID system, producing an offsite
backup should consist mainly of pulling out one of the redundant hard
drives to take offsite, and putting an empty in its place. Much faster
and easier than babysitting a tape drive.
|
cross
|
|
response 31 of 124:
|
Dec 14 05:13 UTC 2006 |
(I'm not sure that last paragraph follows - in particular, if you do, say,
RAID 5, one disk won't necessarily give you complete information in a backup.)
|
mcnally
|
|
response 32 of 124:
|
Dec 14 09:00 UTC 2006 |
That's a terrible way to back up a RAID array, even one that's just
basic disk mirroring.
|
nharmon
|
|
response 33 of 124:
|
Dec 14 16:13 UTC 2006 |
Does Grex even need a backup system, let alone an offsite backup? It
seems to me that all Grex needs is some sort of "Recovery Kit", or a
collection of software for Grex that can be put on DVD and distributed
to staffers or maybe even given away as free OSS (assuming we used OSS).
User home directories should be the responsibility of end users. We
could recruit tech-savvy users to assist other people in backing up their
own data.
|
mcnally
|
|
response 34 of 124:
|
Dec 14 17:38 UTC 2006 |
Well, I still remember when STeve deleted all the mail on the
/var/spool/mail partition, so I'm inclined to think that Grex
ought to have a backup system. It'd also be kind of a bummer
if all the data in the conferencing system disappeared tomorrow
and couldn't be restored.
Users probably *should* back up their important data offsite,
but that process will certainly tax Grex's bandwidth if more
than a few people start to do that frequently.
|
cross
|
|
response 35 of 124:
|
Dec 14 17:48 UTC 2006 |
Email really ought to be delivered into the user's home directory, not a
separate partition. Then the mail spool area could be reallocated to more
user space. Backups of all a user's data would be pretty easy (just tar
up one directory instead of one directory and another file that the user
might not even know about). I suspect few enough people use grex
seriously that backups on an individual basis wouldn't really tax the
system's bandwidth.
Since I'm the politically incorrect firebrand right now anyway, I'll say that
the loss of /var/spool/mail was just due to poor planning. It's interesting
to note that grex's disks were repartitioned without any consensus.
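For what it's worth, the "just tar up one directory" step really is only a
few lines. A sketch, with the paths and naming scheme assumed purely for
illustration (nothing here reflects grex's actual layout):

```python
import os
import tarfile
import time

def backup_home(home_dir, dest_dir):
    """Write a timestamped gzipped tarball of one home directory.

    home_dir and dest_dir are illustrative placeholders.
    """
    user = os.path.basename(home_dir.rstrip("/"))
    stamp = time.strftime("%Y%m%d")
    dest = os.path.join(dest_dir, "%s-%s.tar.gz" % (user, stamp))
    with tarfile.open(dest, "w:gz") as tar:
        tar.add(home_dir, arcname=user)  # one directory, as described
    return dest
```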
|
aruba
|
|
response 36 of 124:
|
Dec 15 13:47 UTC 2006 |
How much would it cost to add a hardware RAID system to our current machine?
|
maus
|
|
response 37 of 124:
|
Dec 15 14:29 UTC 2006 |
Re #36: It depends on a few things. Are we talking about adding a
two-drive mirror set or a RAID that spans many drives? Do we already own
the drives? Will we use SCSI or Serial ATA or IDE? Do we need hot-plug
capabilities? Do we want it to be battery backed so it can finish
commits to discs even if the system loses power in the middle of a
commit? Does it need to support a hot spare? Will the drives be in the
server's chassis or do we also need a shelf/enclosure for the drives?
I'll try to get you a few quotes over the next few days once I have an
idea of what you need.
|
nharmon
|
|
response 38 of 124:
|
Dec 15 14:34 UTC 2006 |
Can we price out a 3TB fiberchannel SAN? :-)
|
aruba
|
|
response 39 of 124:
|
Dec 16 19:05 UTC 2006 |
Re #37: I don't know the answer to those questions, except that we currently
have a lot of SCSI disk. I want to say 3 x 18 gig, plus one more rebuilt
disk sitting on my desk.
|
maus
|
|
response 40 of 124:
|
Dec 16 20:39 UTC 2006 |
We could get comparable performance and significant capacity increase by
doing the following:
Array0:
4-port Serial ATA 3Ware Escalade RAID board
port 0: 200GByte Serial ATA drive (possibly Seagate or Maxtor)
port 1: 200GByte Serial ATA drive (possibly Seagate or Maxtor)
port 2: 200GByte Serial ATA drive (possibly Seagate or Maxtor)
port 3: 200GByte Serial ATA drive (possibly Seagate or Maxtor)
port0 + port1 as a RAID-Mirror
port2 + port3 as hot-spares
This could sustain the loss of *ANY* two drives, as long as they do not
fail simultaneously. Serial ATA is hot-pluggable, provided the
drives are in a cage with proper connectors (to assure that logic or
data is not asserted while power is off -- achieved by varying pin
length so that power uses the longest pins in the connectors).
Equipment proposed:
===================
RAID Controller: 3Ware 9550SX-4LP
Drive Enclosure: 3Ware RDC-400
Drive Cables: 3Ware Cables for 9590SE, 9550SX and 3ware Sidecar
High-Speed Drives: 4 x Seagate Barracuda ES Hard Drives
Would this setup do us for a while? If so, let me know and I will try to
get us quotes on this stack of kit. I will say that I have been
consistently pleased with 3Ware's kit, and Seagate has always been good
so long as I remember. OpenBSD 3.9 recognizes the 3Ware Escalade
automagically, and can use the array hanging off of it as a single SCSI
drive/LUN.
Also, just so you know, setting up the array on the 3Ware is easy,
provided you have console access to the server before the "boot>"
prompt.
|