nharmon
RAID and more Disk
Feb 14 17:46 UTC 2007
From the 11 Feb 2007 staff meeting report that Jan posted in Coop 396:
There is some interest in buying a hardware RAID controller
and a number of large fast disks to use with it. This plan
would vastly increase disk space and reliability. Cost might
be around $1600, but STeve will make a better estimate.
Also see Coop 380, where the issue of finances was discussed with regard
to purchasing a RAID array.
This item is for discussion of a possible disk upgrade to a RAID array.
31 responses total.

gelinas
response 1 of 31:
Feb 16 02:27 UTC 2007
I don't remember the details of STeve's proposal. Let's hope he (or glenda)
has a moment to stop by.

steve
response 2 of 31:
Feb 18 04:15 UTC 2007
I've been looking at RAID arrays at work for a while now, and
have settled on RAID 1 (disk mirroring) using a hardware-only solution.
The ones I have now (PATA and SATA IDE) are made by Arco, now called
DupliDisk.
With a hardware-only solution the OS doesn't know what's going
on underneath, which means that it doesn't mess with it at all.
I'm thinking that one of the Arco SATA RAID cards with four
really huge disks should work for Grex for some time. 750G disks
are in the area of $340 each, so with the RAID card being about
$270, we'd be looking at about $1650 for a RAID system with
1.5T of disk.
Now, I know that 1.5T seems a preposterous amount of disk,
and I admit that I don't know why we'd want all that. But given
that the time to do the work is the rarest commodity, and that
smaller disks aren't all that much less money, I'm not sure it
makes sense to try to build a smaller system.
Sadly, the spam situation shows no signs of slowing down,
so a 100-500M mailbox limit for user accounts might be
something to think about in the future. Back when we started
Grex, the idea that the 20M mailbox limit we have now wouldn't
be nearly big enough would have seemed like madness.
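
For reference, a quick back-of-the-envelope sketch of the numbers above
(a rough Python estimate, assuming four 750G disks mirrored as RAID 1
pairs; the prices are the ballpark figures from this post, not quotes):

    # Rough cost/capacity estimate for the proposed RAID 1 setup.
    disk_size_gb = 750   # per-disk capacity
    disk_price = 340     # estimated price per 750G SATA disk
    num_disks = 4
    card_price = 270     # estimated price of the SATA RAID card

    total_cost = num_disks * disk_price + card_price
    usable_gb = (num_disks // 2) * disk_size_gb  # RAID 1: half the raw space is usable

    print(f"Total cost: ${total_cost}")               # -> $1630, i.e. "about $1650"
    print(f"Usable space: {usable_gb} GB (~1.5 TB)")  # -> 1500 GB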

maus
response 3 of 31:
Feb 18 05:32 UTC 2007
I think I can agree with that, although I am not familiar with Arco, and
so cannot vouch for them. How many ports does the Arco SATA board have?
Could we do RAID 1 + 0 or RAID 1 + concatenating and have hot spares? I
still like the 3ware boards, and they are known to work natively with
OBSD (the drivers are in the default kernel and the entire LUN shows up
as a single SCSI drive).

ball
response 4 of 31:
Feb 18 06:17 UTC 2007
RAID 1 is a Good Thing and less expensive than RAID 1+0 or 0+1
initially, which is handy if you're buying Bismarcks. What kind of slot
is the host adaptor going into, or are we looking at a board with SATA
ports for the drives and an Ultra160 interface to the Grex box? I like
cake.

cross
response 5 of 31:
Feb 18 15:15 UTC 2007
Whatever solution we go with, OpenBSD *must* support it natively in the
current, standard distribution. I'd rather have something we know works
than something that is potentially the best board ever but isn't supported by
our software; this mistake was made when we upgraded grex to the present
hardware (the Ethernet chip on the motherboard had support problems, and
there were rumors of problems with the SCSI controller) and I don't want to
see a repeat.

cross
response 6 of 31:
Feb 18 15:19 UTC 2007
PS- I'm ambivalent about RAID 1 versus RAID 0+1 or 1+0. If the hardware is
doing it, it's likely to be fast enough that we wouldn't notice the difference. That
said, the latter might give us more available space on a single, logical drive
that's 1.5TB in size with blazing performance. I can't see a downside to
that.
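
For what it's worth, here is a small sketch of the capacity trade-off
under discussion, assuming the four 750G disks proposed earlier (the total
usable space is the same either way; the difference is one large logical
volume versus two smaller ones):

    # Usable capacity with four 750G disks: plain RAID 1 pairs vs. RAID 1+0
    # (striping across two mirrored pairs).
    disk_gb = 750
    disks = 4

    raid1_volumes = [disk_gb] * (disks // 2)   # two separate 750G mirrors
    raid10_volume = (disks // 2) * disk_gb     # one 1500G striped mirror

    print("RAID 1:  ", raid1_volumes, "GB as separate volumes")  # [750, 750]
    print("RAID 1+0:", raid10_volume, "GB as a single volume")   # 1500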

steve
response 7 of 31:
Feb 19 01:38 UTC 2007
Huh? The ethernet card had zero problems; it always performed wonderfully.
The Broadcom ethernet chip was a little new at the time, but it was slow
more than anything else.
As far as support goes for RAID cards, the hardware-only solution is
perfect. The Arco card works great with PATA IDE. I have a SATA Arco card
which I'm setting up next week, but I know of a couple already running
happily under OpenBSD.

cross
response 8 of 31:
Feb 19 02:08 UTC 2007
Regarding #7; I recall that the broadcom chip caused grex to crash several
times; is my memory flawed? The ultimate solution was to install a PCI
ethernet card, if I recall correctly.
I'm looking forward to hearing how the Arco card you're setting up works with
the current stable release of OpenBSD, Steve. Please keep us informed.

cross
response 9 of 31:
Feb 19 05:07 UTC 2007
It seems that this is becoming critical.
Steve, I can't find any information about support for Arco cards on the
OpenBSD/i386 hardware listing for OpenBSD 4.0. However, the 3ware cards that
Maus mentioned before are listed as supported; what are the advantages you
see of the Arco cards over the 3ware gear?

cross
response 10 of 31:
Feb 19 05:08 UTC 2007
(Also, do you have a source for information on the OpenBSD support for the
Arco cards?)

steve
response 11 of 31:
Feb 19 19:52 UTC 2007
Nick Holland has written about them in misc@. That's how I first got
wind of them. I know the PATA IDE one works great, as I've been beating on
my new fileserver for a while now with no problems. Once I build a new
web server I can take the main one offline, put the 500G Arco SATA
array there, and test that.
I thought the Arco stuff was discussed somewhere, perhaps in the FAQ.
If it isn't, I should write something up and submit it, because the card
I've been using is great.

cross
response 12 of 31:
Feb 19 20:15 UTC 2007
Hmm; if we're talking about this:
http://archives.neohapsis.com/archives/openbsd/2003-09/2155.html
Then I'm not too impressed. It seems like we want something a bit more
`server grade.' Which card are you using, Steve?

cross
response 13 of 31:
Feb 19 20:23 UTC 2007
Perhaps a good idea would be to set out some requirements for what we need
here in a RAID storage solution. I see the following as being inviolable
requirements:
0) RAID 1 (mirroring; RAID 1+0 or 0+1 would be nice, but not necessary)
1) Will fit a standard rackmount enclosure.
2) Supported by OpenBSD in the current, standard, stable distribution.
-current support or partial support is not good enough. As of right
now, that means OpenBSD 4.0 supports it out of the box.
3) Support for hot-swappable disks, hopefully via front-panel replacement.
4) Automated rebuilds of the mirror after disk replacement.
From what I read on the OpenBSD lists, the Arco gear doesn't support some
of these features. That would make it a non-starter in my opinion.
http://monkey.org/openbsd/archive/misc/0309/msg01787.html

cross
response 14 of 31:
Feb 19 20:32 UTC 2007
Add to this:
5) Support for the RAID controller telling the operating system when a disk
is sick or dead; the Arco stuff appears to do this via polling over a serial
cable. That is, a serial cable comes out of the RAID enclosure and would be
plugged into one of the serial ports on grex and then grex would have to run
a special program to read from that serial port to get the status of the RAID.
A much better solution would be for the RAID controller to use a device driver
interface to tell the host operating system when something was going wrong
(not to mention that there might not be a lot of serial ports available for
use for this on grex).
I'm getting this from here:
http://archives.neohapsis.com/archives/openbsd/2004-08/1581.html
Hmm, the more I look at it, the less I like the Arco cards.
Maus, what are the characteristics of the 3ware cards you had proposed
earlier?
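
To make the serial-polling approach concrete, here is a minimal sketch of
the kind of "special program" described above, written in Python with
pyserial. The device path, query command, and reply format are hypothetical
placeholders; the actual Arco protocol isn't documented here.

    # Hypothetical RAID status poller reading from a serial port (pyserial).
    import time
    import serial

    PORT = "/dev/cua00"   # placeholder: whichever serial port the enclosure is cabled to
    QUERY = b"STATUS\r"   # placeholder: not the real Arco status-request command

    def poll_raid_status():
        with serial.Serial(PORT, 9600, timeout=5) as ser:
            ser.write(QUERY)
            return ser.readline().decode(errors="replace").strip()

    while True:
        status = poll_raid_status()
        if "FAIL" in status:                          # placeholder failure marker
            print("RAID reports a problem:", status)  # here you'd page someone
        time.sleep(300)                               # poll every five minutes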

steve
response 15 of 31:
Feb 19 20:56 UTC 2007
Well, we're going to get into a philosophical discussion here, but
I see the absolute #1 thing for the card as mirroring well and being
reliable. One of the reasons I got the Arco card for work was that I have
friends who deal with Windows and have been using these cards without
incident for a few years now. Hot-swap abilities and auto-rebuild kind of
scare me. You'll call me a technological Luddite for saying this, but I
like the KISS principle (keep it simple, stupid), to the extent that I'll
forgo extras if the unit basically does what it needs to do, namely deal
with data duplication. I know the history of several of these units, which
is why I made that choice. The rebuild rate on the PATA IDE card is about
53G/hour, which is about as good as the Accusys SATA version. How fast the
SATA Arco will be I can't yet say, but I don't think it will be any slower.
The Arco card can't keep up with the disks' 3Gb/s interface rate, but it is
a lot faster than PATA IDE.
I'm willing to optimize for simple operation that is reliable.
This is not to say that the Accusys cards aren't great--I just don't
know them. I do know the Arco units.
As for your requirements, we agree on 0) since it does RAID 1. As for 1),
the SATA Arco device I have next to me would fit into a 3.5in bay, so it
should fit into rackmounts. 2) is covered, in that it is pure hardware--so
OpenBSD anything.anything should work. This is part of the reason I like
this kind of card: it has no idea what's going on upstairs. It just thinks
in terms of sectors.
3) I think you're talking more money there. Given how rarely I expect to
be changing disks out, I'm willing to open up the system. Yeah, it would
be nice to pop a disk out, but I'm also thinking of costs. 4) Again, I see
a rebuild as a rare enough thing that I'm willing to take some downtime.
I've had really bad experiences with RAID systems that were supposed to be
able to rebuild, and did, mostly. As I said, I might be a Luddite on this,
but I'm pretty conservative about disk data.
However, we're in a planning stage for this, so I'm all ears
for other ideas on what we could buy.
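
A quick sense of what that 53G/hour rebuild rate would mean for the 750G
disks being discussed (a rough back-of-the-envelope estimate, assuming the
SATA version rebuilds at least as fast, as suggested above):

    # Rebuild-time estimate from the ~53G/hour figure quoted above.
    rebuild_rate_gb_per_hour = 53
    disk_size_gb = 750

    hours = disk_size_gb / rebuild_rate_gb_per_hour
    print(f"Rebuilding one 750G mirror member: ~{hours:.0f} hours")  # ~14 hours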

cross
response 16 of 31:
Feb 19 21:57 UTC 2007
My take on it is this: we need to capitalize on the resources we presently
have at our disposal (money) and compensate for those that we do not (staff
time).
We have some money right now (not gobs, mind you, but certainly enough to
move up to `server grade' hardware), but we do not have a lot of staff time
to do grunt work when disks die. As a board member, I'm not comfortable
relying on the idea that any single staff member will be around to devote
many hours to disk recovery, let alone a group. I'd rather put the money
into the better quality hardware that can handle hot-swappable disks and
auto-rebuilds than nickel-and-dime ourselves on the assumption that staff
will always be available to take up the slack we create. Given that we have
limited physical accessibility to the machine, I think we should be looking
for a solution that minimizes the amount of time we have to spend on
console, which is why I'm championing hot-swappability and auto-rebuilding.
To me, this is more KISS than something that relies on a lot of human
involvement.
OpenBSD 3.8 was touted as the release with great RAID support; I think, if
that's the case, then we really ought to take advantage of it.

steve
response 17 of 31:
Feb 19 22:18 UTC 2007
So what would you propose? And, think of OpenBSD 4.1--it will be
out in 10 weeks, officially. I'll have a 4.1 src tree the day that
4.1-current comes out, so we could move to it even faster.
So we're into the philosophical now--I can't regard anything that
does auto-rebuilding as "simple". I fear that people might think
that staff time won't be needed as much with things like RAID
technology, which I've found not to be true.
Part of this touches upon new hardware in general. I know you've
said that our getting the Antec box was a mistake. I specifically
chose that after talking with many people about reliability,
especially with the power supply. Given the budget we had, and
heat problems that others had with cheaper rackmount units, I
felt that the case was the better option. As well as paying
attention to the KISS idea, I really like lots of air
blowing on stuff, and we have four fans. The inside of that
case is pretty cool. I don't think we need to change boxes
currently. It isn't old, and the part that Grex has killed
many times over (disk) is the part that we're talking of
changing.

cross
response 18 of 31:
Feb 20 16:29 UTC 2007
Regarding #17; I propose that we do a bit more research and find an SATA RAID
controller that meets our requirements and fits in our price range. From what
I've read about the Arco controller and OpenBSD, it didn't sound much like
it met any reasonable requirements, and it sounded like there were problems
with respect to, e.g., soft booting. That's not good. Accusys and 3ware both
make interesting looking cards and I think we should look into those. User
maus proposed a nice looking SATA RAID setup in coop a while back; I think
we should look at the work he did and see if we can leverage it.
"Simple" is relative. Auto-rebuilding may not be "simple" to implement in
hardware, but then, the RAID vendors have had quite a number of years to
figure out how to do it. It's been *my* experience that if you go with a good
vendor, things just tend to `work.' Indeed, it's not hard to imagine how any
of the underlying algorithms work, and while they maybe aren't `simple'
they're not extraordinarily complex, either. I really do believe that staff
time will be reduced with a good RAID solution, and that *has* been my
experience.
As for the case, I never said that the Antec case was a mistake, or that lots
of airflow isn't good; what I said is that not going with a *rackmount* case
was a mistake. I'm all for good quality hardware, including a case and power
supply. But we need to focus our efforts. Focusing on this outdated desktop
model is a mistake: let's think server grade.

maus
response 19 of 31:
Feb 20 16:43 UTC 2007
Auto-rebuild, hot-spares and hot-pluggable discs make life much easier.
Labour time has value, and if we are wasting the labour time of very
knowledgeable, experienced people, then we are effectively wasting
money. If we neglect to spend either the money up front or the time in a
crisis, we are not serving our users (and maybe being fiduciary fuckups
for our paying members).
Think of it this way, what would be easier:
0) Getting a page that says a drive failed, pulling the drive while the
machine is running, slapping a new one (already in a hot-plug tray) into
the cage and walking away
1) Getting a page that says a drive failed, posting to the HVCN
webpage, shutting down the system nicely, un-racking and opening the
system's chassis, unscrewing the six tiny screws that hold a drive in
place, replace the drive, screwing the new drive in, closing and
re-racking the system chassis, booting to the firmware of the RAID
controller, starting a rebuild by hand, booting to single-user
maintenance mode, running fsck and then booting to multi-user mode
Which of those could you squeeze in on your lunch break or on the way
home from work? I know which one I prefer, but what would a small rodent
know besides chewing cables and eating cheese?

maus
response 20 of 31:
Feb 20 16:56 UTC 2007
I take that back, it looks like you can skip the "open the chassis and
unscrew the drive" bit, but even skipping that, rebuilding the mirror in
software and having to babysit it is a *HUGE* waste when it can be done
automagically, on the fly in firmware.

nharmon
response 21 of 31:
Feb 20 17:31 UTC 2007
I think we should focus our research into SATA RAID controllers to those
on this list:
http://www.openbsd.org/i386.html#hardware
I was about to recommend an Adaptec controller based on the tremendous
experience we've had with them. But they aren't very well supported in
OpenBSD it would seem.
Also, I don't see any Arco/Dupldisk cards on that list.

steve
response 22 of 31:
Feb 20 20:22 UTC 2007
Adaptec is not on the list of appreciated vendors in the OpenBSD
world, owing to their unwillingness to give out documentation
on things. There is a long discussion of this in misc@. But RAID
support overall has gotten far better. The Arco cards might not be
listed simply because no one ever bothered to add them. The PATA
IDE one certainly works, and I'll be using/testing the SATA version
likely the week after next when I start the rebuild of our web server.
I am really not that concerned with having to wait for the disks to
rebuild, as opposed to being able to walk away while the system
rebuilds itself. I do *NOT* want anyone to think that rebuilding
a system is something you should be able to do on your lunch
break. If we had a $100,000+ EMC disk system it might be different,
but we're talking about a small system. I'm not willing to
be cavalier about this. Grex eats a disk about every two years
and always has. Hopefully the SATA disks are going to change
that a little, but even assuming a disk disaster every 1.5 years,
I'm not going to bemoan the "extra" time it takes for the rebuild.

cross
response 23 of 31:
Feb 20 23:31 UTC 2007
I'm not willing to be cavalier about it either, Steve.
But as a board member, I'm also not willing to bet our collective farm on
staff's ability to drop whatever they're doing in their non-grex lives on a
dime and go take care of a sick disk, either. I'm also not willing to say,
``well, then grex might be down for a week or two.'' There's just no reason
to do that. We've had downtimes around a week due to dead disks. Actually,
scratch that: because a disk died and no one was available to go swap it out
and rebuild it. If we can automate that, then let's do it.
There really truly are RAID systems in our price range that can handle
automatic rebuilds just fine; we don't need a $100,000 EMC disk array. I
think we need to move to a more automated solution. Otherwise, we might as
well just buy another SCSI disk and stick with what we have, since the
downtime and effects would be comparable (except, perhaps, for lost data).
Hot spares, hot swapping, and online rebuilds. These things aren't rocket
science, or even state of the art anymore; they're also affordable.

tod
response 24 of 31:
Feb 21 00:44 UTC 2007
I agree with maus, nharmon, and especially cross: on-site staff time is not
abundant, and an automagical rebuild is the preference.
I think it's great STeve has broached the topic and found DupliDisks to work
well for his Microsoft admin friends, but Grex has had a few downtimes that
were more than just a day or two.
It would be nice to treat the financial contributors and all users to a system
with reliable uptime, without the requirement for "hands on" fixes.