jep

response 360 of 547:
May 23 16:55 UTC 2003
How much would a RAID controller cost? I'm sure Jan is right that it'd
cost too much, but if there are substantial benefits maybe the users
would spring for some more money.
I'm not sure the benefits would be all that substantial in any case.
We're moving to brand new spiffy hardware and I expect that will already
mean a big improvement in reliability. Grex isn't unreliable even
now. But it seems like it'd be easier to discuss it now than after the
new machine is in place and in use.

scg

response 361 of 547:
May 23 17:11 UTC 2003
I want to dispute the claim that since Grex doesn't have to be up all the
time, the high availability provided by RAID isn't important. Grex doesn't
pay anything for its staff time, but it is a scarce resource. The difference
in staff time required to format a new disk and restore data to it, versus
just putting in a new disk and letting it happen automatically, is huge.
I too am curious about the costs of hardware RAID controllers. It's been
years since I looked at such things, but given that they were widely
available three or four years ago, I'm surprised to hear the price hasn't
come down.

janc

response 362 of 547:
May 23 17:16 UTC 2003
-------Sequential Output-------- ---Sequential Input-- --Random--
-Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU
raid 64 2000 9520 6.8 7974 1.3 5706 2.0 50932 62.5 63815 13.0 147.9 1.6
raid 256 2000 9483 7.1 8768 1.9 5443 2.6 56017 67.9 70599 14.3 183.1 1.3
scsi 2000 53754 43.4 54106 14.1 10090 2.6 60326 70.9 61067 11.5 201.2 0.8
OK, the second line is RAID with stripe size of 256 kiB instead of 64 kiB.
Generally things are better, but not dramatically so. (Doing
'raidctl -sv raid1' confirms that it did get reconfigured.)
Generally, if you do a large number of small reads and writes to small files,
then a large stripe size is better, and if you read a smaller number of larger
files, a smaller stripe size is better. Grex probably belongs on the larger
end of the spectrum.
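For the curious, the stripe size lives in the layout section of the RAIDframe configuration file. A sketch of what such a file might look like for a 3-disk RAID 5 set; the device names and partition letters here are assumptions, not necessarily our actual setup:

```shell
# Hypothetical /etc/raid0.conf for a 3-disk RAIDframe RAID 5 set.
START array
# numRow numCol numSpare
1 3 0
START disks
/dev/sd0e
/dev/sd1e
/dev/sd2e
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
# 128 sectors x 512 bytes = 64 kiB stripe units; 512 here would give 256 kiB
128 1 1 5
START queue
fifo 100
```

After configuring with 'raidctl -c /etc/raid0.conf raid0', 'raidctl -sv raid0' should report the layout that actually took effect.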

janc

response 363 of 547:
May 23 17:26 UTC 2003
Note that we have a hardware RAID controller on our motherboard, a "Promise"
device whose model number I've forgotten. It works only with IDE drives and
is not supported by OpenBSD (they don't seem to think they are going to
support such things either). So, there is a wide range of hardware RAID
controllers with different capabilities and prices.
Recovering from a disk crash certainly costs less staff time with RAID. But
how often does it happen? If you have a recent snapshot on another disk,
recovering from a disk crash isn't all that hard even without RAID. Amortize
the time difference over the low frequency with which it happens, and I don't
see much weight to that argument.

jep

response 364 of 547:
May 23 18:45 UTC 2003
I did a quick search on RAID controllers, and saw prices in the mid-
several hundreds ($300-700). I don't know anything about what value
would be provided by the different types. I am not in position to
analyze the number and effects of disk hardware failures, either. I'm
only asking a question.

gull

response 365 of 547:
May 23 20:27 UTC 2003
Also, OpenBSD's hardware support is pretty limited even compared to
other open-source operating systems, so you can't buy just any RAID
controller and expect it to work.

janc

response 366 of 547:
May 23 20:59 UTC 2003
http://www.openbsd.com/i386.html#hardware includes a list of hardware RAID
controllers supported by OpenBSD. Not that I think we should get one.

lk

response 367 of 547:
May 24 03:12 UTC 2003
As jep said, you can get a decent RAID controller for about $400.
OpenBSD drivers, though, are another matter.
I think Grex needs to move forward. The 2nd guessing can continue for
years, but the hardware is already in place (perhaps there should have
been more discussion earlier). Keep in mind that what we're "bickering"
over is what may (or may not) be a little bit better than the alternative.
Having said that, what about my idea?! Have one boot disk with all the
(rarely changing) system directories on it and then configure the other
two "data" disks as RAID 1 (mirroring). It entails 50% disk "waste",
but shouldn't have the performance hit while retaining availability
and redundancy.
After all, we live in compromising times.... (:

jep

response 368 of 547:
May 24 04:09 UTC 2003
I didn't have the impression I was holding anything up, or that anyone
else was, either, with the questions about RAID. Dan has been making
what appear to be useful suggestions -- I can conclude that, if only
that Jan has been accepting some of them.
As for my part, I think it's clear enough to everyone here that I
shouldn't have any input about RAID. I've never set up a RAID system.
If there's a choice for a staffer between doing anything about the new
system, and answering one of my questions or comments, by all means,
work on the system. (As if I even have to say that.)

i

response 369 of 547:
May 24 12:47 UTC 2003
Back in janc's "Intro to RAID":
RAID 5 turns a disk write into 2 reads & 2 writes. Better than what
janc's intro suggested grex (with 3 disks, not 10) would face, but still
not good when (i believe) grex is doing plenty of writes. (Is it?)
Good hardware RAID (with dedicated hardware to do parity calculations,
lots of private cache memory to reduce disk activity, etc.) could improve
this. But disk space is cheap enough these days to make RAID 1 the way
to go if one wants redundancy in a "lots of writes" situation. (At least
for our size & budget.) RAID 1 is also considerably easier to do
"acceptably" in software, and great software RAID is obviously not a
priority for OpenBSD.
If we're eager to avoid downtime, a spare hard drive's great to have.
When a dead drive has you down or limping, there's often a huge downtime
difference between "have an identical, well-tested spare drive on hand"
and "rush to research suitable replacement models, where they might be
bought, costs, and lead times". *Especially* since different generations
of SCSI hard drives sometimes fail to "play well together" in flaky,
intermittent ways.

cross

response 370 of 547:
May 24 16:03 UTC 2003
Hmm. It would appear that RAID5 performance is just unacceptably slow
with RAIDframe in OpenBSD. Weird; I'd have thought it'd be better. Oh
well, it's not the first time I've been wrong.
If a hardware RAID controller is $400, one would have to weigh the cost
of buying one of those versus buying another SCSI disk for $200 and using
RAID 1+0 (mirroring, and striping over the mirrors). That I am reasonably
confident would be fast. Is it worth it for grex? That's another matter.
I agree with scg that it is, but I'm not paying all the bills.
I disagree with Leeron that doing mirroring by itself is the way to go;
I think the price/performance ratio isn't worth it.

lk

response 371 of 547:
May 24 16:18 UTC 2003
Sorry, jep, I didn't mean to imply that you (or others) were holding up
anything. I certainly have no idea what the implementation time frame is.
For all I know, Grex budgeted the next 3 months for such discussion
before finalizing NewGrex and putting it on-line. (:
There's a lot of worthy discussion here and many good suggestions.
But I do know how over-discussion can become negative on a BBS, and I don't
want to see that happen here. Not to sound like the US Patent Office
commissioner of 125 years ago, I think all the constructive comments about
RAID, with all their pluses and minuses, have been made. It's time to make a
decision....
These are the points I'd consider:
(Note that whether RAID is useful for Grex almost becomes a moot point)
1. We have no RAID controller
(and I'm not impressed by the list supported by OpenBSD)
2. The software RAID-5 performance rules that out.
3. Software RAID-1 remains a possibility.
(At least Walter and I think so.)

janc

response 372 of 547:
May 24 23:04 UTC 2003
Yeah, it occurred to me a little after I wrote my introduction to RAID that
there were more efficient ways to maintain parity on writing - you can read
the parity disk, and the value you are about to overwrite, and use those to
compute the new checksum. So 2 reads and 2 writes suffice no matter how
many disks you have in a RAID 5 array. So, Walter's correction is correct.
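The XOR arithmetic behind that is easy to sanity-check with single hypothetical bytes standing in for whole sectors (all the values below are made up):

```shell
#!/bin/sh
# RAID 5 read-modify-write sketch: one byte per member disk stands in
# for a whole sector.
D1=$(( 0x5A )); D2=$(( 0x3C )); D3=$(( 0xA1 ))   # data on three disks
P=$(( D1 ^ D2 ^ D3 ))                            # parity as first written
D2_NEW=$(( 0x0F ))                               # new contents for D2
# Read old data and old parity (2 reads), XOR in the change,
# then write new data and new parity (2 writes):
P_NEW=$(( P ^ D2 ^ D2_NEW ))
# Same answer as recomputing parity across all the data from scratch:
[ "$P_NEW" -eq $(( D1 ^ D2_NEW ^ D3 )) ] && echo "parity ok"
```

So the update never has to touch the other data disks, which is where the fixed 2-reads-2-writes cost comes from.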
I'm not at all unhappy with this discussion. I think we are still in a mode
of usefully exploring options and collecting data. If I feel the discussion
is stagnating, I'll bring it to completion, by declaring a solution by fiat
if necessary, though I'd prefer to boil it down to a few options and get some
consensus among staff. (If Marcus weren't out of town this month, I'd
probably call a staff meeting. We'll need one after he's back in any case.)
I'm interested in Leeron's RAID 1 suggestion. Two disks in RAID 1 and one
disk plain wastes 1/3 of our space, just as RAID 5 would have. If RAID 1
performs substantially better than RAID 5, then this might be a viable option.
The performance is going to have to be pretty good to convince me that this
is better than the rsync option though. However, I plan to rearrange two disks
into a RAID 1 array tonight, so we can benchmark that.
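For the record, the rearrangement amounts to something like the following with RAIDframe's raidctl; the device names and partition letters are assumptions:

```shell
# Hypothetical RAID 1 setup over SCSI disks 1 and 2 with RAIDframe.
cat > /etc/raid0.conf <<'EOF'
START array
1 2 0
START disks
/dev/sd1e
/dev/sd2e
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
128 1 1 1
START queue
fifo 100
EOF
raidctl -C /etc/raid0.conf raid0   # force the initial configuration
raidctl -I 2003052501 raid0        # stamp the component labels
raidctl -iv raid0                  # initialize, i.e. sync the mirror
newfs /dev/rraid0a
mount /dev/raid0a /raid
```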
The other project I'm pursuing is improving my understanding of Grex's disk
usage patterns. If you're reading this in coop, you may want to check out
Garage item 150 (I think) where I recently posted some statistics on old
Grex's disk usage. Preliminary results seem to indicate that most of Grex's
disk usage is on the /var drive (which does not include /var/spool/mail).
Apparently what Grex does most of is logging. More than half the disk
activity is there, and it is almost all writes, not reads. I want to keep
investigating this.
I don't think we are in any special hurry to get the new Grex up, but I want
to keep the process in motion, not letting it stagnate or stall. We are not
stalled. Things are good.
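(For comparison, the rsync option boils down to periodically copying the busy partitions onto the plain third disk. A sketch of a nightly job; the paths and mount points here are assumptions:)

```shell
#!/bin/sh
# Hypothetical nightly mirror of the rapidly-changing partitions onto
# the non-RAID disk.  -a preserves permissions/times, -H hard links,
# -x stays on one filesystem, --delete drops files removed at the source.
rsync -aHx --delete /bbs/ /sd0/mirror/bbs/
rsync -aHx --delete /var/ /sd0/mirror/var/
```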

janc

response 373 of 547:
May 25 03:08 UTC 2003
OK, I've re-arranged the disks once again. Now /sd0 is a plain filesystem
on SCSI disk 0, and /raid is a RAID 1 array consisting of SCSI
drives 1 and 2.
-------Sequential Output-------- ---Sequential Input-- --Random--
-Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU
scsi 2000 53754 43.4 54106 14.1 10090 2.6 60326 70.9 61067 11.5 201.2 0.8
raid 1 2000 16651 13.5 19368 3.4 10702 3.6 61614 73.5 68343 14.5 197.9 1.5
raid 5 2000 9520 6.8 7974 1.3 5706 2.0 50932 62.5 63815 13.0 147.9 1.6
This is definitely performing much better than RAID 5, but the writes are
still rather on the slow side. (Though we do seem to be getting a slight
win on the READ side - looks like it is balancing reads across the two
disks well enough to get a moderate performance win over a single disk.)
Dan's multi-process benchmark might be worth trying.
Leeron's idea was to use RAID 1 for the more ephemeral partitions -
partitions where data changes rapidly, and restoring from a week-old
backup tape after a crash might be unsatisfactory. RAID 1 would provide
a full backup of that data.
So, the RAID might have /bbs, root (mainly for /etc/passwd), /var
(current log files). The regular disk might have /usr, /usr/local, etc.
Dunno where users would go.
The problem is that the partitions whose contents change a lot (and are
thus more interesting to keep a real-time mirror of) also tend to have
a lot of writes. So putting a partition like /var, which is almost
write-only, on RAID would be pretty unattractive from a performance
point of view. So there's a bit of a paradox here - RAID's advantage
over rsync is greatest when writes are frequent, but its performance
suffers most under those circumstances.
The one partition where RAID 1 looks good to me right now is /bbs.
More reading than writing certainly happens there, but there is enough
writing so that keeping a mirror would be nice. I guess user partitions
would be a possibility for RAID too.

cross

response 374 of 547:
May 25 04:03 UTC 2003
Hmm, I don't know. The more we look at the performance numbers, the less
impressed I am, to the point of actually being really disappointed in
RAIDframe. It almost doesn't seem worth it.
Doing something like RAID 1+0 might be better, but would require another
disk to really be useful.
I'm not sure doing RAID on one partition alone is really worth it: if a
disk dies and you aren't using RAID everywhere, you have to do a lot of
work to bring the system back online anyway. Restoring one or two
partitions more on top of that doesn't seem like much of an incremental
cost. That doesn't solve the problem of lost data, though.
One solution to that would be to leave a tape in the tape drive all the
time, and do a nightly full backup of /bbs and the user partitions (just
overwrite the tape). Every now and then, do a full backup of everything
on a separate tape and keep it for posterity.
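The tape-in-the-drive idea could look something like this script, run nightly from cron; the device names and mount points are assumptions:

```shell
#!/bin/sh
# Hypothetical nightly backup: overwrite the resident tape with level-0
# dumps of /bbs and the user partition.  nrst0 is the non-rewinding tape
# device, so the second dump lands right after the first on the same tape.
mt -f /dev/rst0 rewind              # opening rst0 rewinds to start of tape
/sbin/dump -0au -f /dev/nrst0 /bbs  # -0 full dump, -a auto-size, -u record it
/sbin/dump -0au -f /dev/nrst0 /home
mt -f /dev/rst0 rewind              # leave the tape ready for tomorrow
```

The occasional keeper tape for posterity would be the same dumps, done by hand onto a fresh tape.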

cross

response 375 of 547:
May 25 04:15 UTC 2003
FYI, I logged back into the nextgrex machine and re-ran my simple
benchmark. The one that took 81 seconds on the RAID5 partition (using an
interleave size of 64; I didn't get a chance to try it on the one with an
interleave size of 256) took about 5 and a quarter seconds on average.
That's roughly a 15-fold speedup. Bonnie shows that performance on
a mirror is about 1/3 that of a straight disk. With another disk,
I'd champion using RAID 1+0, as I'm guessing that would be in the same
general area performance wise as `normal' partitions, while still giving
high availability. It'd cost another $200 to get another disk to do
it, though.

janc

response 376 of 547:
May 25 13:51 UTC 2003
Yeah, I think later today I'll make another pass at designing a
RAID-less partition scheme. This has all been very educational, and
RAID 1 is almost good enough to use, but I don't feel it is quite good
enough.

scg

response 377 of 547:
May 27 18:51 UTC 2003
I should note that I'm not pushing hard for using RAID. My impression has
been that RAID is a good thing, all other things being equal, but I don't
know enough about it to make a good choice.
What I would object to, and what it seemed to me was being advocated in some
of the earlier arguments, is designing for low availability. There are all
sorts of things it makes sense to design for in various situations, such as
low cost, low maintenance, high performance, high availability, and so forth,
and declaring one of those to be a high priority generally involves tradeoffs
in other areas. If cost or performance is determined to be more important
than high availability, I might agree and I certainly wouldn't argue. What I
was objecting to was skipping RAID purely on the grounds that high
availability isn't needed, and it doesn't sound to me like that's what's
going on here anymore.

janc

response 378 of 547:
May 27 20:35 UTC 2003
No, that certainly isn't my thinking here. RAID costs a lot of disk space,
which we can probably afford. RAID, at least as implemented in software under
OpenBSD, seems to have a pretty huge performance penalty. Much bigger than
it theoretically should have. High availability *IS* the benefit of RAID.
(RAID 0 and well-implemented RAID 1 might give you performance benefits, but
in most versions of RAID other overhead will eat any performance benefit).
I like RAID, but my feeling is that high availability isn't important enough
to Grex to justify its other costs.
I may be wrong. It may be that this new computer is going to be so fast,
that running Grex will hardly load it, and the performance cost of RAID
wouldn't mean anything to it. If so, we should consider moving onto RAID
in the future. I don't think making that change later will be hard. We
need to rebuild the system every year and a half anyway, and changing the
disks from flat disks to RAID does not have broad implications for the
rest of the system configuration.

lk

response 379 of 547:
May 28 04:38 UTC 2003
With all due respect, check out the speed of M-Net these days (arbornet.org).
I'm not up-to-date on the hardware specs, but I'd assume it's running on
a CPU that is 1/3rd to 1/4 the horsepower and slower drives.

cross

response 380 of 547:
May 28 06:00 UTC 2003
Hmm, I use mnet...every couple of days or so. It's usually quite
fast, but I don't believe they're using RAID. What's more, they're
running FreeBSD, which has a different RAID implementation yet again.
Leeron, what are you referring to that one should note in terms of mnet's
performance?

janc

response 381 of 547:
May 28 14:42 UTC 2003
I haven't been on M-Net for a while - but it also generally had fewer users
than Grex.
Generally I expect that the new Grex will be way too fast for the load the
current user base will put on it. However, the user base may grow with better
performance. Also we will be turning on quotas, which is going to put some
drag on the disk performance - that's a lot more important to me than RAID.
Also Grex occasionally gets hit by vandals - I just spent some time tracking
down a mailbomber who was slowing the system down badly. How will the new
Grex perform under those conditions? I don't know. I think we'll need to
gain experience with the new computer before we can really decide this.
I think we can reconfigure to use RAID later if we feel the need. I think
we could do such a reconfiguration in a day, if needed.

cross

response 382 of 547:
May 28 16:13 UTC 2003
Sounds good to me. Also, going to the next grex allows one to do some
things that I think will be beneficial, such as turning off the queueing
telnet daemon (the queue is almost always empty, anyway, except in like,
5% of all cases), using a new version of SSH, ditching sendmail in favor
of something like postfix, etc.

tod

response 383 of 547:
May 28 16:58 UTC 2003
This response has been erased.

janc

response 384 of 547:
May 29 01:18 UTC 2003
Well, I didn't get much work on next Grex done today, but I built a
respectable castle out of Lego, so the day isn't entirely a waste.