janc
response 354 of 547: Mark Unseen   May 23 15:03 UTC 2003

OK, that wasn't so brief.  But writing it just made me more sure that RAID
isn't right for Grex.  The problem it is primarily designed to solve isn't
an important issue for Grex.

I may do some experimenting with rsync, and see if I can get a sense of how
expensive it would be to regularly rsync to the IDE disk.
gull
response 355 of 547: Mark Unseen   May 23 15:43 UTC 2003

Where I work, we use rsync to keep a mirror of about 50 gigs worth of
data.  We're doing it across the Internet, via a T1, as well.  It does
cause a fair amount of disk thrashing on both ends when it figures out
what files need to be transferred (very much like doing a 'find' across
the filesystem) but overall it seems very efficient.  It's worked well
for us.  My guess is the "expense" of doing an rsync to another local
disk a couple times a day is going to be pretty low, especially since
you're not transferring over a network and so won't need to involve ssh
or compression.
cross
response 356 of 547: Mark Unseen   May 23 15:51 UTC 2003

I ran a benchmark last night; one of my own design.  It's nothing really
fancy or scientific; I wrote it a few years ago to try and get a feel for
how various disk subsystems and filesystem types handled a load I thought
was fairly typical of timesharing style machines.  Basically, it just
copies a bunch of 32KB files all over the place.

Running on both the IDE and SCSI drives took about 4 seconds.  Running on
the RAID took around 80 seconds.

Something is wrong here; there's no reason RAIDframe should be *20 times*
slower than a `normal' filesystem, I just can't believe it's that bad.
Perhaps I'm wrong about the stripe size; maybe 64 is just too small.  Jan,
could you up it to 256 and see if that helps any?  I see at least one
post from someone who says they used an interleave size of 168 and got
decent performance, but 32 (and probably 64) was too small.
janc
response 357 of 547: Mark Unseen   May 23 16:02 UTC 2003

Right.  I installed rsync from the ports tree (I like the ports tree).

I then went to /sd0 (the test partition on the first scsi disk) and did
  time rsync -ax /usr .
This should copy the whole /usr partition from the IDE disk to the SCSI
disk (which is backwards from the direction we would be going) and give
me some statistics.  The /usr partition contains 664,632K of data.  The
result from 'time' was:

   12.0u 24.5s 4:28.34 13.6% 0+0k 161046+660947io 36pf+0w

So it took 4.5 minutes elapsed time, eating 13.6% of an otherwise idle CPU.
I then reran it.  In this case it should be checking the two copies against
each other, and copying over only what changed (little or nothing).  The
time result was:

  3.8u 3.5s 0:46.13 16.1% 0+0k 47000+1454io 1pf+0w

This took 46 seconds.
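For scale, that first pass works out to a fairly modest copy rate.  A quick
back-of-the-envelope check (plain shell arithmetic; the figures are the ones
from the 'time' output above, with 4:28.34 rounded to 268 seconds):

```shell
#!/bin/sh
# Throughput of the initial rsync pass: 664,632 KB of /usr copied in
# about 268 seconds of elapsed time (figures from the 'time' output).
kb=664632
secs=268
echo "$(( kb / secs )) KB/sec"    # prints "2479 KB/sec"
```

That's roughly 2.4 MB/sec, nowhere near saturating either disk, which fits
the low CPU percentage.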

In real life we'd want the --delete option on the command, so files that
don't exist on the source are removed from the copy, but I didn't do it in
the test because I was paranoid about getting the arguments backward.  Even
so, we'd want our target partitions rather larger than the source partitions.
Maybe just one big target partition instead of separate ones corresponding
to the different source partitions, the whole thing readable only by root
and possibly unmounted when it isn't being updated.

Doing this a couple times a day seems a much lower-impact way to provide
data redundancy than RAID.

It'd be tempting to keep two copies of some partitions, and update them on
alternate days.  Dunno if that's necessary.

This is not a substitute for real backups to tape, of course.
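Once the arguments were trusted, the scheme above might look something like
the following root crontab fragment.  The paths, partition names, and
twice-daily schedule here are all made up for illustration; note the
trailing slashes, which make rsync copy the *contents* of each source
directory into the target:

```shell
# Hypothetical root crontab fragment -- /altdisk stands in for a big,
# root-only target partition on the IDE disk.  Twice a day, mirror the
# interesting partitions and delete files that vanished from the source.
0 4,16 * * *   rsync -ax --delete /var/ /altdisk/var/
15 4,16 * * *  rsync -ax --delete /home/ /altdisk/home/
```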
janc
response 358 of 547: Mark Unseen   May 23 16:04 UTC 2003

Dan slipped in.  I'll try reconfiguring the RAID.
janc
response 359 of 547: Mark Unseen   May 23 16:52 UTC 2003

OK, I reconfigured it with a 256 K stripe size.  The current config file
is in /etc/raid1.conf.  Running bonnie now.
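For anyone following along who hasn't seen a RAIDframe config, here is a
sketch of roughly what a file like /etc/raid1.conf might contain, going by
the raidctl(8)/raid(4) file format.  The device names, partition letters,
and queue settings are guesses rather than Grex's actual values; the first
number in the layout line is sectors per stripe unit, so 512 sectors of
512 bytes each would give the 256 K stripe:

```
START array
# numRow numCol numSpare
1 3 0

START disks
/dev/sd0e
/dev/sd1e
/dev/sd2e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
512 1 1 5

START queue
fifo 100
```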
jep
response 360 of 547: Mark Unseen   May 23 16:55 UTC 2003

How much would a RAID controller cost?  I'm sure Jan is right that it'd 
cost too much, but if there are substantial benefits maybe the users 
would spring for some more money.

I'm not sure the benefits would be all that substantial in any case.  
We're moving to brand new spiffy hardware and I expect that will already 
mean a big improvement in reliability.  Grex isn't unreliable even 
now.  But it seems like it'd be easier to discuss it now than after the 
new machine is in place and in use.
scg
response 361 of 547: Mark Unseen   May 23 17:11 UTC 2003

I want to dispute the claim that since Grex doesn't have to be up all the
time, the high availability provided by RAID isn't important.  Grex doesn't
pay anything for its staff time, but it is a scarce resource.  The difference
in staff time required to format a new disk and restore data to it, versus
just putting in a new disk and letting it happen automatically, is huge.

I too am curious about the costs of hardware RAID controllers.  It's been
years since I looked at such things, but given that they were widely
available three or four years ago, I'm surprised to hear the price hasn't
come down.
janc
response 362 of 547: Mark Unseen   May 23 17:16 UTC 2003

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU


raid 64  2000  9520  6.8  7974  1.3  5706  2.0 50932 62.5 63815 13.0 147.9 1.6
raid 256 2000  9483  7.1  8768  1.9  5443  2.6 56017 67.9 70599 14.3 183.1 1.3
scsi     2000 53754 43.4 54106 14.1 10090  2.6 60326 70.9 61067 11.5 201.2 0.8

OK, the second line is RAID with stripe size of 256 kiB instead of 64 kiB.
Generally things are better, but not dramatically so.  (Doing
'raidctl -sv raid1' confirms that it did get reconfigured.)

Generally, if you do a large number of small reads and writes to small files,
then a large stripe size is better, and if you read a smaller number of larger
files, a smaller stripe size is better.  Grex probably belongs on the larger
end of the spectrum.
janc
response 363 of 547: Mark Unseen   May 23 17:26 UTC 2003

Note that we have a hardware RAID controller on our motherboard, a "Promise"
device whose model number I've forgotten.  It works only with IDE drives and
is not supported by OpenBSD (they don't seem to think they are going to
support such things either).  So, there is a wide range of hardware RAID
controllers with different capabilities and prices.

Recovering from a disk crash certainly costs less staff time with RAID.  But
how often does it happen?  If you have a recent snapshot on another disk,
recovering from a disk crash isn't all that hard even without RAID.  Amortize
the time difference over the low frequency with which it happens, and I don't
see much weight to that argument.
jep
response 364 of 547: Mark Unseen   May 23 18:45 UTC 2003

I did a quick search on RAID controllers, and saw prices in the mid-
several hundreds ($300-700).  I don't know anything about what value 
would be provided by the different types.  I am not in position to 
analyze the number and effects of disk hardware failures, either.  I'm 
only asking a question.
gull
response 365 of 547: Mark Unseen   May 23 20:27 UTC 2003

Also, OpenBSD's hardware support is pretty limited even compared to
other open-source operating systems, so you can't buy just any RAID
controller and expect it to work.
janc
response 366 of 547: Mark Unseen   May 23 20:59 UTC 2003

http://www.openbsd.com/i386.html#hardware includes a list of hardware RAID
controllers supported by OpenBSD.  Not that I think we should get one.
lk
response 367 of 547: Mark Unseen   May 24 03:12 UTC 2003

As jep said, you can get a decent RAID controller for about $400.
OpenBSD drivers, though, are another matter.

I think Grex needs to move forward. The 2nd guessing can continue for
years, but the hardware is already in place (perhaps there should have
been more discussion earlier). Keep in mind that what we're "bickering"
over is what may (or may not) be a little bit better than the alternative.

Having said that, what about my idea?!  Have one boot disk with all the
(rarely changing) system directories on it and then configure the other
two "data" disks as RAID 1 (mirroring).  It entails 50% disk "waste",
but shouldn't have the performance hit while retaining availability
and redundancy.

After all, we live in compromising times....   (:
jep
response 368 of 547: Mark Unseen   May 24 04:09 UTC 2003

I didn't have the impression I was holding anything up, or that anyone 
else was, either, with the questions about RAID.  Dan has been making 
what appear to be useful suggestions -- I can conclude that, if only 
because Jan has been accepting some of them.

As for my part, I think it's clear enough to everyone here that I 
shouldn't have any input about RAID.  I've never set up a RAID system.

If there's a choice for a staffer between doing anything about the new 
system, and answering one of my questions or comments, by all means, 
work on the system.  (As if I even have to say that.)  
i
response 369 of 547: Mark Unseen   May 24 12:47 UTC 2003

Back in janc's "Intro to RAID":
   RAID 5 turns a disk write into 2 reads & 2 writes.  Better than what
janc suggested grex (with 3 disks, not 10) would face, but still
not good when (I believe) grex is doing plenty of writes.  (Is it?) 
   Good hardware RAID (with dedicated hardware to do parity calculations,
lots of private cache memory to reduce disk activity, etc.) could improve
this.  But disk space is cheap enough these days to make RAID 1 the way
to go if one wants redundancy in a "lots of writes" situation.  (At least
for our size & budget.)  RAID 1 is also considerably easier to do 
"acceptably" in software, and great software RAID is obviously not a 
priority for OpenBSD. 
   If we're eager to avoid downtime, a spare hard drive's great to have. 
When a dead drive has you down or limping, there's often a huge downtime
difference between "have an identical, well-tested spare drive on hand"
and "rush to research suitable replacement models, where they might be
bought, costs, and lead times".  *Especially* since different generations
of SCSI hard drives sometimes fail to "play well together" in flaky,
intermittent ways.
cross
response 370 of 547: Mark Unseen   May 24 16:03 UTC 2003

Hmm.  It would appear that RAID5 performance is just unacceptably slow
with RAIDframe in OpenBSD.  Weird; I'd have thought it'd be better.  Oh
well, it's not the first time I've been wrong.

If a hardware RAID controller is $400, one would have to weigh the cost
of buying one of those versus buying another SCSI disk for $200 and using
raid 0+1 (mirroring, and striping over the mirrors).  That I am reasonably
confident would be fast.  Is it worth it for grex?  That's another matter.
I agree with scg that it is, but I'm not paying all the bills.

I disagree with Leeron that doing mirroring by itself is the way to go;
I think the price/performance ratio isn't worth it.
lk
response 371 of 547: Mark Unseen   May 24 16:18 UTC 2003

Sorry, jep, I didn't mean to imply that you (or others) were holding up
anything. I certainly have no idea what the implementation time frame is.
For all I know, Grex budgeted the next 3 months for such discussion
before finalizing NewGrex and putting it on-line.  (:

There's a lot of worthy discussion here and many good suggestions.
But I do know how over-discussion can become negative on a BBS, and I don't
want to see that happen here.  Not to sound like the US Patent Office
commissioner of 125 years ago, I think all the constructive comments about
RAID, with all its pluses and minuses, have been made.  It's time to make
a decision....

These are the points I'd consider:
(Note that whether RAID is useful for Grex almost becomes a moot point)

1. We have no RAID controller
(and I'm not impressed by the list supported by OpenBSD)

2. The software RAID-5 performance rules that out.

3. Software RAID-1 remains a possibility.
(At least Walter and I think so.)
janc
response 372 of 547: Mark Unseen   May 24 23:04 UTC 2003

Yeah, it occurred to me a little after I wrote my introduction to RAID that
there were more efficient ways to maintain parity on writing - you can read
the parity disk, and the value you are about to overwrite, and use those to
compute the new checksum.  So 2 reads and 2 writes suffice no matter how
many disks you have in a RAID 5 array.  So, Walter's correction is correct.
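That parity shortcut is easy to convince yourself of with a toy example.
This is just shell arithmetic with small integers standing in for disk
blocks (real parity is computed bytewise over whole sectors, and none of
this is RAIDframe's actual code):

```shell
#!/bin/sh
# RAID 5 small-write parity update: read old data and old parity,
# XOR both with the new data, write new data and new parity.
d0=10; d1=20; d2=30              # "blocks" on three data disks
p=$(( d0 ^ d1 ^ d2 ))            # parity block

new_d1=99                        # value about to overwrite d1

# 2 reads (old d1, old parity) + 2 writes (new d1, new parity),
# with no need to touch d0 or d2:
p_new=$(( p ^ d1 ^ new_d1 ))
d1=$new_d1

# Sanity check: a full recompute over all the data disks agrees.
p_full=$(( d0 ^ d1 ^ d2 ))
[ "$p_new" -eq "$p_full" ] && echo "shortcut matches full recompute"
```

Since the update never reads the other data disks, the cost stays at 2
reads and 2 writes however wide the array is.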

I'm not at all unhappy with this discussion.  I think we are still in a mode
of usefully exploring options and collecting data.  If I feel the discussion
is stagnating, I'll bring it to completion, by declaring a solution by fiat
if necessary, though I'd prefer to boil it down to a few options and get some
consensus among staff.  (If Marcus weren't out of town this month, I'd
probably call a staff meeting.  We'll need one after he's back in any case.)

I'm interested in Leeron's RAID 1 suggestion.  Two disks in RAID 1 and one
disk plain wastes 1/3 of our space, just as RAID 5 would have.  If RAID 1
performs substantially better than RAID 5, then this might be a viable option.
The performance is going to have to be pretty good to convince me that this
is better than the rsync option, though.  However, I plan to rearrange two disks
into a RAID 1 array tonight, so we can benchmark that.

The other project I'm pursuing is improving my understanding of Grex's disk
usage patterns.  If you're reading this in coop, you may want to check out
Garage item 150 (I think) where I recently posted some statistics on old
Grex's disk usage.  Preliminary results seem to indicate that most of Grex's
disk usage is on the /var drive (which does not include /var/spool/mail). 
Apparently what Grex does most of is logging.  More than half the disk
activity is there, and it is almost all writes, not reads.  I want to keep
investigating this.

I don't think we are in any special hurry to get the new Grex up, but I want
to keep the process in motion, not letting it stagnate or stall.  We are not
stalled.  Things are good.
janc
response 373 of 547: Mark Unseen   May 25 03:08 UTC 2003

OK, I've re-arranged the disks once again.  Now /sd0 is a plain filesystem
on SCSI disk 0, and /raid is a RAID 1 array consisting of SCSI
drives 1 and 2.

             -------Sequential Output-------- ---Sequential Input-- --Random--
             -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine   MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU

scsi    2000 53754 43.4 54106 14.1 10090  2.6 60326 70.9 61067 11.5 201.2 0.8
raid 1  2000 16651 13.5 19368  3.4 10702  3.6 61614 73.5 68343 14.5 197.9 1.5
raid 5  2000  9520  6.8  7974  1.3  5706  2.0 50932 62.5 63815 13.0 147.9 1.6

This is definitely performing much better than RAID 5, but the writes are
still rather on the slow side.  (Though we do seem to be getting a slight
win on the READ side - looks like it is balancing reads across the two
disks well enough to get a moderate performance win over a single disk.)

Dan's multi-process benchmark might be worth trying.

Leeron's idea was to use RAID 1 for the more ephemeral partitions -
partitions where data changes rapidly, and restoring from a week-old
backup tape after a crash might be unsatisfactory.  RAID 1 would provide
a full backup of that data.

So, the RAID might have /bbs, root (mainly for /etc/passwd), /var
(current log files).  The regular disk might have /usr, /usr/local, etc.
Dunno where users would go.

The problem is that the partitions whose contents change a lot (and are
thus more interesting to keep a real-time mirror of) also tend to have
a lot of writes.  So putting a partition like /var, which is almost
write-only, on RAID would be pretty unattractive from a performance
point of view.  So there's a bit of a paradox here - RAID's advantage
over rsync is greatest when writes are frequent, but its performance
suffers most under those circumstances.

The one partition where RAID 1 looks good to me right now is /bbs.
More reading than writing certainly happens there, but there is enough
writing so that keeping a mirror would be nice.  I guess user partitions
would be a possibility for RAID too.
cross
response 374 of 547: Mark Unseen   May 25 04:03 UTC 2003

Hmm, I don't know.  The more we look at the performance numbers, the
less and less impressed I am, to the point of actually being really
disappointed in RAIDframe.  It almost doesn't seem worth it.

Doing something like RAID 1+0 might be better, but would require another
disk to really be useful.

I'm not sure I think doing RAID on one partition alone is really worth
it, the rationale being that if a disk dies, without using RAID everywhere,
you have to do a lot of work to bring it back online.  Doing the same work
with one or two partitions more doesn't seem like that much of an added
incremental cost.  That doesn't solve the problem of lost data, though.
One solution to that would be to leave a tape in the tape drive all the
time, and do a nightly full backup of /bbs and the user partitions (just
overwrite the tape).  Every now and then, do a full backup of everything
on a separate tape and keep it for posterity.
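As a concrete (and entirely hypothetical) sketch of that leave-the-tape-in
scheme as a root crontab fragment: the device names assume the traditional
BSD SCSI tape nodes (rst0 rewinds on close, nrst0 doesn't), and the
partition names are guesses:

```shell
# Nightly at 03:00: rewind, then overwrite the tape with level-0 dumps
# of the fast-changing partitions, appending the second dump via the
# no-rewind device.
0 3 * * *  mt -f /dev/rst0 rewind && dump -0au -f /dev/nrst0 /bbs && dump -0au -f /dev/nrst0 /home
```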
cross
response 375 of 547: Mark Unseen   May 25 04:15 UTC 2003

FYI, I logged back into the nextgrex machine and re-ran my simple
benchmark.  The one that took 81 seconds on the RAID5 partition (using an
interleave size of 64; I didn't get a chance to try it on the one with an
interleave size of 256) took about 5 and a quarter seconds on average.
About a 15-fold speed increase.  Bonnie shows that performance on
a mirror is about 1/3 that of a straight disk.  With another disk,
I'd champion using RAID 1+0, as I'm guessing that would be in the same
general area performance wise as `normal' partitions, while still giving
high availability.  It'd cost another $200 to get another disk to do
it, though.
janc
response 376 of 547: Mark Unseen   May 25 13:51 UTC 2003

Yeah, I think later today I'll make another pass at designing a
RAID-less partition scheme.  This has all been very educational, and
RAID 1 is almost good enough to use, but I don't feel it is quite good
enough.
scg
response 377 of 547: Mark Unseen   May 27 18:51 UTC 2003

I should note that I'm not pushing hard for using RAID.  My impression has
been that RAID is a good thing, all other things being equal, but I don't
know enough about it to make a good choice.

What I would object to, and what it seemed to me was being advocated in some
of the earlier arguments, is designing for low availability.  There are all
sorts of things it makes sense to design for in various situations, such as
low cost, low maintenance, high performance, high availability, and so forth,
and declaring one of those to be a high priority generally involves tradeoffs
in other areas.  If cost or performance are determined to be more important
than high availability, I might agree and I certainly wouldn't argue.  What
I was objecting to was skipping RAID purely on the grounds that high
availability isn't needed, and it doesn't sound to me like that's what's
going on here anymore.
janc
response 378 of 547: Mark Unseen   May 27 20:35 UTC 2003

No, that certainly isn't my thinking here.  RAID costs a lot of disk space,
which we can probably afford.  RAID, at least as implemented in software under
OpenBSD, seems to have a pretty huge performance penalty.  Much bigger than
it theoretically should have.  High availability *IS* the benefit of RAID.
(RAID 0 and well-implemented RAID 1 might give you performance benefits, but
in most versions of RAID other overhead will eat any performance gain). 
I like RAID, but my feeling is that high availability isn't important enough
to Grex to justify its other costs.

I may be wrong.  It may be that this new computer is going to be so fast,
that running Grex will hardly load it, and the performance cost of RAID
wouldn't mean anything to it.  If so, we should consider moving onto RAID
in the future.  I don't think making that change later will be hard.  We
need to rebuild the system every year and a half anyway, and changing the
disks from flat disks to RAID does not have broad implications for the
rest of the system configuration.
- Backtalk version 1.3.30 - Copyright 1996-2006, Jan Wolter and Steve Weiss