From the 11 Feb 2007 staff meeting report that Jan posted in Coop 396:
There is some interest in buying a hardware RAID controller
and a number of large, fast disks to use with it. This plan
would vastly increase disk space and reliability. Cost might
be around $1600, but STeve will make a better estimate.
Also see Coop 380, where the issue of finances was discussed in regard
to purchasing a RAID array.
This item is for discussion of a possible disk upgrade to a RAID array.
31 responses total.
I don't remember the details of STeve's proposal. Let's hope he (or glenda) has a moment to stop by.
I've been looking at RAID arrays at work for a while now, and
have settled on RAID 1 (disk mirroring) using a hardware-only solution.
The ones I have now (PATA and SATA IDE) are made by Arco, now called
DupliDisk.
With a hardware-only solution the OS doesn't know what's going
on underneath, which means that it doesn't mess with it at all.
I'm thinking that one of the Arco SATA RAID cards with four
really huge disks should work for Grex for some time. 750G disks
are in the area of $340 each, so with the RAID card being about
$270, we'd be looking at about $1650 for a RAID system that had
1.5T of disk.
Now, I know that 1.5T of disk seems a preposterous amount of
disk, and I admit that I don't know why we'd want all that, but
given that the time to do the work is the scarcest
commodity, and that smaller disks aren't all that much less
money, I'm not sure it makes sense to try to build a smaller
system.
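A quick back-of-the-envelope check of those figures, assuming the four 750G disks end up as two RAID 1 mirrored pairs on one card (the layout is an assumption; the prices are the estimates quoted above):

    # Rough cost/capacity check for the proposed Arco SATA RAID setup.
    # The two-mirrored-pairs layout is an assumption, not a stated plan.
    DISK_PRICE = 340        # dollars per 750G SATA disk (quoted estimate)
    CARD_PRICE = 270        # dollars for the SATA RAID card (quoted estimate)
    NUM_DISKS = 4
    DISK_SIZE_G = 750

    total_cost = NUM_DISKS * DISK_PRICE + CARD_PRICE
    usable_g = (NUM_DISKS // 2) * DISK_SIZE_G   # RAID 1: each pair stores one copy

    print("total cost: $%d" % total_cost)        # -> $1630, close to the ~$1650 quoted
    print("usable space: %dG (~%.1fT)" % (usable_g, usable_g / 1000.0))  # -> 1500G, ~1.5T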
Sadly, the spam situation shows no signs of slowing down,
so a 100-500M mailbox limit for user accounts might be
something to think about in the future. Certainly, back when we
started Grex, the idea that the 20M mailbox limit we have now
would not be nearly big enough would have seemed like madness.
I think I can agree with that, although I am not familiar with Arco, and so cannot vouch for them. How many ports does the Arco SATA board have? Could we do RAID 1+0 or RAID 1 plus concatenation and have hot spares? I still like the 3ware boards, and they are known to work natively with OBSD (the drivers are in the default kernel and the entire LUN shows up as a single SCSI drive).
RAID 1 is a Good Thing and less expensive than RAID 1+0 or 0+1 initially, which is handy if you're buying Bismarcks. What kind of slot is the host adaptor going into, or are we looking at a board with SATA ports for the drives and an Ultra160 interface to the Grex box? I like cake.
Whatever solution we go with, OpenBSD *must* support it natively in the current, standard distribution. I'd rather have something we know works than something that is potentially the best board ever but isn't supported by our software; this mistake was made when we upgraded grex to the present hardware (the Ethernet chip on the motherboard had support problems, and there were rumors of problems with the SCSI controller), and I don't want to see a repeat.
PS- I'm ambivalent about RAID 1 versus RAID 0+1 or 1+0. If the hardware is doing it, it's likely to be sufficiently fast that we wouldn't notice the difference. That said, the latter might give us more available space on a single logical drive that's 1.5TB in size, with blazing performance. I can't see a downside to that.
Huh? The ethernet card had zero problems; it always performed wonderfully. The Broadcom ethernet chip was a little new at the time, but the issue was more slowness than anything else. As far as support goes for RAID cards, the hardware-only solution is perfect. The Arco card works great with IDE. I have a SATA Arco card which I'm setting up next week, but I know of a couple that are happily supporting OpenBSD now.
Regarding #7: I recall that the Broadcom chip caused grex to crash several times; is my memory flawed? The ultimate solution was to install a PCI ethernet card, if I recall correctly. I'm looking forward to hearing how the Arco card you're setting up works with the current stable release of OpenBSD, Steve. Please keep us informed.
It seems that this is becoming critical. Steve, I can't find any information about support for Arco cards on the OpenBSD/i386 hardware listing for OpenBSD 4.0. However, the 3ware cards that Maus mentioned before are listed as supported; what advantages do you see in the Arco cards over the 3ware gear?
(Also, do you have a source for information on the OpenBSD support for the Arco cards?)
Nick Holland has written about them in misc@. That's how I first got wind of them. I know the PATA IDE one works great, as I've been beating on my new fileserver for a while now with no problems. Once I make a new web server I can take the main one offline, stuff the 500G Arco SATA array there, and test that. I thought the Arco stuff was talked about somewhere, perhaps in the FAQ. If it isn't, I should write something up and submit it, because the card I've been using is great.
Hmm; if we're talking about this: http://archives.neohapsis.com/archives/openbsd/2003-09/2155.html Then I'm not too impressed. It seems like we want something a bit more `server grade.' Which card are you using, Steve?
Perhaps a good idea would be to set out some requirements for what we need here in a RAID storage solution. I see the following as being inviolable requirements:
0) RAID 1 (mirroring; RAID 1+0 or 0+1 would be nice, but not necessary)
1) Will fit a standard rackmount enclosure.
2) Supported by OpenBSD in the current, standard, stable distribution. -current support or partial support is not good enough. As of right now, that means OpenBSD 4.0 supports it out of the box.
3) Support for hot-swappable disks, hopefully via front-panel replacement.
4) Automated rebuilds of the mirror after disk replacement.
From what I read on the OpenBSD lists, the Arco gear doesn't support some of these features. That would make it a non-starter in my opinion. http://monkey.org/openbsd/archive/misc/0309/msg01787.html
Add to this:
5) Support for the RAID controller telling the operating system when a disk is sick or dead.
The Arco stuff appears to do this via polling over a serial cable. That is, a serial cable comes out of the RAID enclosure, gets plugged into one of the serial ports on grex, and grex then has to run a special program that reads from that serial port to get the status of the RAID. A much better solution would be for the RAID controller to use a device-driver interface to tell the host operating system when something was going wrong (not to mention that there might not be a lot of serial ports available for this on grex). I'm getting this from here: http://archives.neohapsis.com/archives/openbsd/2004-08/1581.html
Hmm, the more I look at it, the less I like the Arco cards. Maus, what are the characteristics of the 3ware cards you had proposed earlier?
Well, we're going to get into a philosophical discussion here, but I see the absolute #1 thing for the card to do as mirroring well and being reliable. One of the reasons I got the Arco card for work was that I have friends who deal with Windows who have been using these cards without incident for a few years now. Having hot-swap abilities or auto-rebuild kind of scares me. You'll call me a technological luddite for saying this, but I like the KISS principle (keep it simple, stupid), to the extent that I'll forgo extras if the unit basically does what it needs to do, namely deal with data duplication. I know the history of several of them, which is why I made that choice.
The rebuild rate on the PATA IDE card is about 53G/hour, which is about as good as the Accusys SATA version. How fast the SATA Arco will be I can't yet say, but I don't think it will be any slower. The Arco card isn't able to keep up with the disks' 3Gb/sec rate, but it is faster than PATA IDE by a large amount. I'm willing to optimize for simple, reliable operation. This is not to say that the Accusys cards aren't great--I just don't know. I do know the Arco units.
As for your requirements, we agree on 0), since it does RAID 1. The SATA Arco device I have next to me would fit into a 3.5in bay, so it should fit into rackmounts. 2) is done, in that it is pure hardware--so OpenBSD anything.anything should work. This is part of the reason I like this kind of card--it has no idea what's going on upstairs, doing things. It just thinks in terms of sectors. 3) I think you're talking more money there. For how often I see changing disks out, I'm willing to delve into the system. Yeah, it would be nice to pop a disk out, but I'm also thinking of costs. 4) Again, I see a rebuild as a rare enough thing that I'm willing to take some downtime. I've had really bad experiences with RAID systems that were supposed to be able to rebuild, and did, mostly. As I said, I might be a luddite on this, but I'm pretty conservative about disk data. However, we're in a planning stage for this, so I'm all ears for other ideas on what we could buy.
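To put that 53G/hour figure in perspective, here is a rough rebuild-time estimate for one of the proposed 750G mirrors, assuming the SATA card rebuilds at roughly the same rate as the PATA one (an assumption; the SATA card hasn't been measured yet):

    # Rough rebuild-time estimate; the rate is carried over from the PATA card.
    DISK_SIZE_G = 750
    REBUILD_RATE_G_PER_HOUR = 53

    hours = DISK_SIZE_G / REBUILD_RATE_G_PER_HOUR
    print("one 750G mirror: about %.1f hours to rebuild" % hours)   # ~14 hours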
My take on it is this: we need to capitalize on the resources we presently have at our disposal (money) and compensate for those that we do not (staff time). We have some money right now (not gobs, mind you, but certainly enough to move up to `server grade' hardware), but we do not have a lot of staff time to do grunt work when disks die. As a board member, I'm not comfortable relying on the idea that any single staff member will be around to devote many hours to disk recovery, let alone a group. I'd rather put the money into better-quality hardware that can handle hot-swappable disks and auto-rebuilds than nickel-and-dime ourselves on the assumption that staff will always be available to take up the slack we create.
Given that we have limited physical access to the machine, I think we should be looking for a solution that minimizes the amount of time we have to spend at the console, which is why I'm championing hot-swappability and auto-rebuilding. To me, this is more KISS than something that relies on a lot of human involvement. OpenBSD 3.8 was touted as the release with great RAID support; if that's the case, then we really ought to take advantage of it.
So what would you propose? And think of OpenBSD 4.1--it will officially be out in 10 weeks. I'll have a 4.1 src tree the day that 4.1-current comes out, so we could move to it even faster.
So we're into the philosophical now--I can't regard anything that does auto-rebuilding as "simple". I fear that people might think that staff time won't be needed as much with things like RAID technology, which I've found not to be true.
Part of this touches on new hardware in general. I know you've said that our getting the Antec box was a mistake. I specifically chose that after talking with many people about reliability, especially with the power supply. Given the budget we had, and the heat problems that others had with cheaper rackmount units, I felt that the case was the better option. As well as paying attention to the KISS idea, I really, really like lots of air blowing on stuff, and we have four fans. The inside of that case is pretty cool. I don't think we need to change boxes currently. It isn't old, and the part that Grex has killed many times over (disk) is the part that we're talking about changing.
Regarding #17: I propose that we do a bit more research and find an SATA RAID controller that meets our requirements and fits in our price range. From what I've read about the Arco controller and OpenBSD, it didn't sound much like it met any reasonable requirements, and it sounded like there were problems with respect to, e.g., soft booting. That's not good. Accusys and 3ware both make interesting-looking cards, and I think we should look into those. User maus proposed a nice-looking SATA RAID setup in coop a while back; I think we should look at the work he did and see if we can leverage it.
"Simple" is relative. Auto-rebuilding may not be "simple" to implement in hardware, but then, the RAID vendors have had quite a number of years to figure out how to do it. It's been *my* experience that if you go with a good vendor, things just tend to `work.' Indeed, it's not hard to imagine how any of the underlying algorithms work, and while they may not be `simple,' they're not extraordinarily complex, either. I really do believe that staff time will be reduced with a good RAID solution, and that *has* been my experience.
As for the case, I never said that the Antec case was a mistake, or that lots of airflow isn't good; what I said is that not going with a *rackmount* case was a mistake. I'm all for good-quality hardware, including a case and power supply. But we need to focus our efforts. Focusing on this outdated desktop model is a mistake: let's think server grade.
Auto-rebuild, hot spares, and hot-pluggable disks make life much easier. Labour time has value, and if we are wasting the labour time of very knowledgeable, experienced people, then we are effectively wasting money. If we neglect to spend either the money up front or the time in a crisis, we are not serving our users (and maybe being fiduciary fuckups for our paying members). Think of it this way: which would be easier?
0) Getting a page that says a drive failed, pulling the drive while the machine is running, slapping a new one (already in a hot-plug tray) into the cage, and walking away
1) Getting a page that says a drive failed, posting to the HVCN webpage, shutting down the system nicely, un-racking and opening the system's chassis, unscrewing the six tiny screws that hold a drive in place, replacing the drive, screwing the new drive in, closing and re-racking the system chassis, booting to the firmware of the RAID controller, starting a rebuild by hand, booting to single-user maintenance mode, running fsck, and then booting to multi-user mode
Which of those could you squeeze in on your lunch break or on the way home from work? I know which one I prefer, but what would a small rodent know besides chewing cables and eating cheese?
I take that back, it looks like you can skip the "open the chassis and unscrew the drive" bit, but even skipping that, rebuilding the mirror in software and having to babysit it is a *HUGE* waste when it can be done automagically, on the fly in firmware.
I think we should restrict our research into SATA RAID controllers to those on this list: http://www.openbsd.org/i386.html#hardware I was about to recommend an Adaptec controller based on the tremendous experience we've had with them, but it would seem they aren't very well supported in OpenBSD. Also, I don't see any Arco/DupliDisk cards on that list.
Adaptec is not on the list of appreciated vendors in the OpenBSD world, owing to their unwillingness to give out documentation. There is a long discussion of this in misc@. But RAID support overall has gotten far better. The Arco cards might not be listed simply because no one ever bothered to add them. The PATA IDE one certainly works, and I'll be using/testing the SATA version likely the week after next, when I start the rebuild of our web server.
I am really not that concerned with waiting for a rebuild of the disks, as opposed to being able to leave whilst the system rebuilds itself. I do *NOT* want anyone to think that rebuilding a system is something you should be able to do on one's lunch break. If we had a $100,000+ EMC disk system it might be different, but we're talking about a small system. I'm not willing to be cavalier about this. Grex eats a disk about every two years and always has. Hopefully the SATA disks are going to change that a little, but even assuming a disk disaster every 1.5 years, I'm not going to bemoan the "extra" time it takes for the rebuild.
I'm not willing to be cavalier about it either, Steve. But as a board member, I'm also not willing to bet our collective farm on staff's ability to drop whatever they're doing in their non-grex lives on a dime and go take care of a sick disk, either. I'm also not willing to say, ``well, then grex might be down for a week or two.'' There's just no reason to do that. We've had downtimes around a week due to dead disks. Actually, scratch that: because a disk died and no one was available to go swap it out and rebuild it. If we can automate that, then let's do it. There really truly are RAID systems in our price range that can handle automatic rebuilds just fine; we don't need a $100,000 EMC disk array. I think we need to move to a more automated solution. Otherwise, we might as well just buy another SCSI disk and stick with what we have, since the downtime and effects would be comparable (except, perhaps, for lost data). Hot spares, hot swapping, and online rebuilds. These things aren't rocket science, or even state of the art anymore; they're also affordable.
I agree with maus, nharmon, and especially cross: on-site staff time is not abundant, and an automagical rebuild is the preference. I think it's great STeve has broached the topic and found DupliDisks to work well for his Microsoft-admin friends, but Grex has had a few downtimes that were more than just a day or two. It would be nice to treat the financial contributors and all users to a system with reliable uptime, minus the requirement for "hands-on" fixes.
Right. Our users deserve maximal uptime. Now, I think Steve's argument is that auto-everything RAIDs often talk the talk but don't walk the walk. I don't think that's necessarily the case, if you use decent quality equipment, and that decent-quality equipment is within our price range and supported by our software. So let's just do that.
Does anyone have equipment they can test?
I know that with the 3ware Escalades, the auto rebuild works great and can go on while the machine is booted and running normally (since we don't use drives in hot-swap trays, I have not tried doing this live, but from what colleagues have told me, it is something that is just "slap it in and ignore it").
This wasn't a find in a baseboard, but old stuff I had lying around from a long-abandoned project. If you want a pair of American Megatrends Series 493 boards for grexserver, take them; they're yours. These are not new. These are not sexy. What they are is reliable and supported in OBSD (http://www.openbsd.org/cgi-bin/man.cgi?query=ami&sektion=4&apropos=0&manpath=OpenBSD+Current&arch=). This will allow us to mirror our root drive in hardware and keep it separate from our home and mail slices.
How do Escalade controllers notify you of a drive failure and let you know when the rebuild is complete?
This is where I move from definite knowledge to "I think". I know that in RHEL, SLES, Windows Server, you install a package from their webpage, which can send emails to you of all relevant events, and even has its own webpage on a high-number port that you can use to look at the status of the array and its constituent drives. I am not sure how notification works in OBSD, but I would imagine it would be similar.
From what I saw, the 3ware stuff uses SMART.
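For whatever controller we end up with, if its driver hooks into OpenBSD's bio(4) framework (the ami(4) driver mentioned above has bio support; I'm less sure about the Escalade's driver), the array status can be read with bioctl(8) and watched from cron. A rough sketch of that kind of watcher follows; the device name, recipient address, and status-string check are all assumptions to verify against real bioctl output:

    #!/usr/bin/env python
    # Sketch: poll bioctl(8) and mail staff if the RAID volume is not "Online".
    # Device name, recipient, and the exact status strings are assumptions.
    import subprocess
    import smtplib
    from email.mime.text import MIMEText

    DEVICE = "sd0"                   # hypothetical RAID volume device
    RECIPIENT = "staff@example.org"  # placeholder address

    def raid_status(device):
        """Return bioctl's status report for the given device."""
        out = subprocess.run(["bioctl", device],
                             capture_output=True, text=True, check=True)
        return out.stdout

    def main():
        report = raid_status(DEVICE)
        if "Online" not in report:   # degraded, rebuilding, offline, etc.
            msg = MIMEText(report)
            msg["Subject"] = "RAID volume %s is not Online" % DEVICE
            msg["From"] = RECIPIENT
            msg["To"] = RECIPIENT
            with smtplib.SMTP("localhost") as server:
                server.send_message(msg)

    if __name__ == "__main__":
        main()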