25 new of 547 responses total.

janc (response 334 of 547, May 21 20:09 UTC 2003):
OK, with a file size of 2000M, I get results from Bonnie. The validity of
these results is, however, questionable, since a lot of the file may have been
in memory instead of on disk.
-------Sequential Output-------- ---Sequential Input-- --Random--
-Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU
raid 5 2000 9520 6.8 7974 1.3 5706 2.0 50932 62.5 63815 13.0 147.9 1.6
scsi 2000 53754 43.4 54106 14.1 10090 2.6 60326 70.9 61067 11.5 201.2 0.8
We have two lines of results. The first was using the RAID 5 array of three
SCSI disks. The second was on a single plain ordinary SCSI disk.
For each test we have the speed and the % of CPU used.
There are three output tests:
Per Char - file written sequentially with 2 billion calls to putc()
Block - file written with block writes
Rewrite - each block read, changed and rewritten
There are two input tests
Per Char - 2 billion calls to getc()
Block - block reads
And a seek test
Seeks - four child processes each execute 4000 seeks and reads. After
10% of these they change and rewrite the block.
So, on writing, RAID was 5 to 6 times slower. Notice that the supposedly
optimum block writes were actually slower than the character writes for the
RAID. The SCSI was twice as fast as RAID on the rewrite test.
On read the RAID array was still slower than the plain disks on the Per
Char reads, but a bit faster on the block reads. It was substantially slower
on the seeks.
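Those ratios can be read straight off the table; a quick check using the K/sec figures quoted above:

```python
# K/sec figures copied from the Bonnie table above (raid 5 vs plain scsi)
raid = {"putc": 9520, "block_out": 7974, "rewrite": 5706,
        "getc": 50932, "block_in": 63815}
scsi = {"putc": 53754, "block_out": 54106, "rewrite": 10090,
        "getc": 60326, "block_in": 61067}

# how many times faster the single SCSI disk was on each test
ratios = {test: scsi[test] / raid[test] for test in raid}
# putc is ~5.6x and block output ~6.8x faster on plain SCSI; rewrite ~1.8x;
# per-char reads are close (~1.2x) and block reads slightly favor the RAID.
```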
Admitting that the benchmark is seriously questionable due to the small size
of the file relative to the large size of memory, this is not at all an
impressive result.
I reran the tests and got similar results.
-------Sequential Output-------- ---Sequential Input-- --Random--
-Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU
raid 5 2000 9520 6.8 7974 1.3 5706 2.0 50932 62.5 63815 13.0 147.9 1.6
raid 5 2000 8745 6.4 7654 1.3 5717 2.2 51345 63.5 64022 14.6 150.0 1.1
scsi 2000 53754 43.4 54106 14.1 10090 2.6 60326 70.9 61067 11.5 201.2 0.8
scsi 2000 54058 43.4 54618 14.1 10129 2.8 60552 71.0 60865 11.1 203.4 0.9
I suppose the main advantage in performance is in balancing load among multiple
spindles, but this would really only be noticeable if multiple processes were
reading/writing the disk at once. With a single process, we aren't going to
gain much. Only in the seek test are there multiple processes, and then only
four.
cross (response 335 of 547, May 21 23:48 UTC 2003):
Are softupdates turned on on the raid filesystem?
janc (response 336 of 547, May 22 02:31 UTC 2003):
No. They are not even enabled in the kernel. From what little I understand
of it, it improves performance only with respect to metadata updates -
updating inodes when files are created or destroyed. That wouldn't affect
these benchmarks. I don't get a clear feeling that it is super stable yet
either.
cross (response 337 of 547, May 22 03:29 UTC 2003):
Every write and every read is also a metadata update (mtime and atime).
Soft updates are definitely stable at this point; they're enabled by
default in FreeBSD. OpenBSD tends to be somewhat more conservative,
though.
Gads; security be damned. Grex would've been better off with FreeBSD.
janc (response 338 of 547, May 22 05:55 UTC 2003):
Well, I argued that. I have the impression softupdates are more mature in
FreeBSD than OpenBSD. It's not really clear though.
janc (response 339 of 547, May 22 06:30 UTC 2003):
For the heck of it, I ran eight copies of the Bonnie benchmark simultaneously
on the RAID 5 partition. Below, A through H were started simultaneously.
The last line is just one benchmark process running.
-------Sequential Output-------- ---Sequential Input-- --Random--
-Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU
A 2047 8087 5.9 5867 1.2 331 0.1 1647 2.0 1775 0.4 22.5 0.1
B 2047 1889 1.4 7770 1.4 545 0.2 890 0.9 1646 0.3 14.2 0.1
C 2047 1020 0.8 7038 1.2 417 0.1 1929 2.7 1578 0.3 19.9 0.2
D 2047 8647 6.3 7474 1.3 253 0.1 1905 2.4 4597 1.1 89.4 0.8
E 2047 3997 2.9 6946 1.2 215 0.1 23458 27.9 29250 6.4 155.5 1.4
F 2047 8314 6.2 7149 1.3 369 0.1 1333 1.6 1707 0.3 21.2 0.1
G 2047 8926 6.3 7899 1.4 512 0.2 865 1.1 1132 0.3 15.0 0.1
H 2047 4280 3.2 7861 1.3 458 0.1 954 1.2 1649 0.4 19.1 0.1
raid 5 2000 9520 6.8 7974 1.3 5706 2.0 50932 62.5 63815 13.0 147.9 1.6
They didn't stay well synchronized - you can tell that process E continued
running long after the others had finished (process scheduling doesn't seem
to be very fair). The write speeds didn't suffer too badly from the
competition, but the read times took a terrific beating - they are mostly
around 1/25 of the speed of one process. Note that there were probably some
write processes still running while the read processes were going.
Here's a more sensible test, a comparison against the SCSI and IDE drives,
in non-RAID configuration, with just one process running:
-------Sequential Output-------- ---Sequential Input-- --Random--
-Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU
scsi 2000 53754 43.4 54106 14.1 10090 2.6 60326 70.9 61067 11.5 201.2 0.8
ide 2000 27188 21.7 27038 6.9 9634 2.6 24889 29.9 25640 5.2 99.0 0.8
Seems the SCSI is about twice as fast on most benchmarks, and about the same
on the Rewrite test.
gull (response 340 of 547, May 22 13:07 UTC 2003):
RAID 5 is always going to be slower than a single disk, especially using
software RAID. There's more processing overhead, and you're doing a
third more reads/writes because of the parity. Still, I'm surprised to
see it 5 times slower. That doesn't seem very acceptable at all.
scott (response 341 of 547, May 22 13:10 UTC 2003):
RAID would be nice, and if we're making such a huge jump in processing power
then I don't think the performance penalty (assuming it's only 2-1 or
something less) is an issue.
janc (response 342 of 547, May 22 13:50 UTC 2003):
I'm beginning to suspect that some of these fast read times are
coming out of buffers. The drastic crash in read speed when I ran 8 bonnies
could be because instead of trying to buffer one 2G file in 1.5G of memory, we
were trying to buffer a total of 16G of files in 1.5G of memory. Some of
these really fast speeds (the ones around 50M/sec) are likely being done
largely out of cache. This makes the results pretty meaningless.
Anyway, I ran three simultaneous bonnies on a plain SCSI. I couldn't run
8 because I didn't have a 16 Gig partition.
-------Sequential Output-------- ---Sequential Input-- --Random--
-Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU
A 2047 15240 16.1 20747 5.0 2018 0.6 3882 4.5 9506 1.7 164.7 0.6
B 2047 16768 13.7 20491 5.3 3016 0.9 4543 5.3 5598 1.1 31.6 0.2
C 2047 16812 13.6 17945 4.6 2977 0.9 4145 4.9 4513 0.8 46.5 0.2
scsi 2000 53754 43.4 54106 14.1 10090 2.6 60326 70.9 61067 11.5 201.2 0.8
scsi/3 2000 17918 43.4 18035 14.1 3363 2.6 20108 70.9 20355 11.5 67.1 0.8
The last line is just the one-process SCSI values divided by three. Notice
the write statistics for the three processes are all pretty close to one
third of the write statistics for a single process. The reads are way lower.
Is this an artifact of buffering? The seeks are a bit hard to tell, because
by that time the processes were pretty much out of synchronization.
The degradation in read performance is similar in magnitude to what we saw
on the raid (keeping in mind that we only have 3 processes instead of 8).
I think there must be a buffering thing going on here. The write statistics
are much better for the RAID - most of the 8 processes wrote much faster than
1/8 of the single process.
Note that in both cases, the single processes read faster than they write,
while the multiple processes write faster than they read. That's just weird.
aruba (response 343 of 547, May 22 13:59 UTC 2003):
Jan - can you fool the OS into thinking Grex has less memory than it really
does? Or tell it not to cache disk reads?
janc (response 344 of 547, May 22 14:31 UTC 2003):
Re #341: We are certainly taking a huge jump in processing power, but the
disk I/O performance improvement, while good, probably isn't as spectacular.
Disk speeds just haven't been growing as fast as processor speeds, and old
Grex's disks aren't nearly as old as its processor. So the performance jump
in disk I/O from old Grex to new Grex might not be that huge. (Maybe I
should run some benchmarks on old Grex to compare with - will everyone please
log off?). I expect the new Grex will have memory to spare, cpu to spare,
disk space to spare, but maybe not disk bandwidth to spare (and certainly not
net bandwidth to spare).
I think the main benefits of RAID are:
- Availability. If a disk dies, the system can keep running. Performance
degrades, but it still works. If you have a hot spare disk, it can
be brought on line, replacing the dead disk, without interruption in
service.
I do not consider this very important to Grex. We can afford short
downtimes in the case of disaster.
- Data Protection. If a disk dies, the data on the drives is not lost.
This is important to Grex. However, it can be achieved other ways.
We could do daily rsync's from /var, /bbs, /home, and /etc to the IDE
drive (or even another machine). You might copy certain critical files
(/etc/passwd) more frequently. This has a performance penalty, of course.
In the case of a crash, your backup will not be fully up to date, so there
will be some data loss, but it should be tolerable. In the case of
accidental (or deliberate) deletion of data, this gives you a much better
safety net than RAID, so much so that we'll want to do at least some
of this even if we have RAID.
- Performance. RAID can balance the load over the drives nicely.
Yes, but so can ccd (pretty much equivalent to RAID 0).
So this doesn't really make a strong argument for RAID. However, there is
a bit of a flaw in the above break-down. These three aspects are not fully
separable. Suppose we merge our three SCSI drives into one big virtual ccd
drive and partition it up. Load balancing over the drives should be great.
Then one SCSI drive fails. You just lost a third of your data, scattered
randomly all over the system. You still have the other two thirds, but
doing anything with it is going to be a nightmare. Effectively a single
drive failure cooks all your data, instead of 1/3 of your data. I don't think
the performance improvement given by ccd or RAID 0 is worth the increased
risk of losing the whole system.
So I think the real alternative to RAID is what I originally proposed -
simple partitions, scattered across the drives in an ad hoc manner in hopes
of balancing the load across the spindles, with rsyncs to the IDE drive
for data protection.
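A minimal sketch of what those periodic rsyncs might look like (the destination mount point /backup is an assumption, not a decided path):

```python
import subprocess

# Partitions proposed for copying; the destination is hypothetical.
SOURCES = ["/var", "/bbs", "/home", "/etc"]
DEST = "/backup"        # assumed mount point on the IDE drive

def rsync_command(src, dest=DEST):
    """Build one rsync invocation; --delete drops files removed at the source."""
    return ["rsync", "-ax", "--delete", src, dest]

def run_backups(dry_run=True):
    """Run (or just print) the rsync for each source partition."""
    for src in SOURCES:
        cmd = rsync_command(src)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Run from cron a couple of times a day, this gives the ad hoc redundancy described above without any RAID layer.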
I'm really starting to feel that might be the best choice. The advantages
of RAID for Grex are faint enough so that they don't quite overwhelm the
KISS factor in my estimation.
janc (response 345 of 547, May 22 14:33 UTC 2003):
Re 343: probably - but I'm not sure how. I thought of just creating a
RAMDISK and letting that eat up much of the memory (I could also run the
benchmark on a ramdisk, which might be interesting), but it looks like
you need to do a lot of kernel work to bring up a ramdisk, and I'm
insufficiently motivated.
cross (response 346 of 547, May 22 14:37 UTC 2003):
One can lower the amount of memory the kernel will use for caching by
mucking with the kernel. It looks like, when caching is taken out of
the picture, performance between RAID and the straight SCSI disks is
more or less on par?
janc (response 347 of 547, May 22 14:48 UTC 2003):
Hmmm...the faq (http://www.openbsd.org/faq/faq11.html) talks about the
BUFCACHEPERCENT kernel value. It says the default is 5%. I haven't touched
it, so if I'm reading this right, there should be 75M or less of disk cache.
Hmmm...Linux uses all free memory as disk cache. A much nicer setup.
Well, if that's the case then I'm not sure what makes those benchmark numbers
so goofy.
gull (response 348 of 547, May 22 17:12 UTC 2003):
Re #344: I think that's starting to make sense, yes. Unless it turns
out the performance hit you're seeing is an artifact of your testing
method, we may be better off going with using the disks "straight".
Getting only 20% of the potential performance of the disk subsystem in
exchange for easier recovery on the rare occasions when we have disks
fail doesn't seem like a good tradeoff. I'm still having trouble
believing RAIDframe is *that* inefficient, though.
cross (response 349 of 547, May 22 18:20 UTC 2003):
So am I; it seems unreasonably slow, and it looks vaguely like the
numbers start to converge when you have many processes working at
once, which is the normal mode of operation. I'd be interested in
seeing what a test simulating a timesharing load would be like.
lk (response 350 of 547, May 23 03:37 UTC 2003):
One simple way to "fool" the kernel into thinking that NextGrex has less
memory is... to remove all but one memory module. Guaranteed to work. (:
You might also want to test mirroring. Might be more efficient (less CPU
utilization for striping and no extra parity data) while offering both
availability and redundancy. The "cost" here is 50% drive overhead.
The boot disk, with the system partitions (and /tmp or was that IDE?)
could be one disk while the other pair could be mirrored.
janc (response 351 of 547, May 23 13:49 UTC 2003):
I'm reluctant to take the machine apart for such purposes. Anyway, I'm a
software guy.
I certainly agree that we need better benchmarks, but I'm not sure how to
obtain them. Anyone with better ideas is welcome to suggest them. Those of
you with accounts on the system can probably run them yourselves, as the
relevant disk partitions are permitted 777. We really want to get some sense
of how RAID would affect a realistic multi-user load.
I tried running a benchmark with a really small file, one where you should be
getting lots of use from cache. Here's the 50 MB and 2000 MB results. Explain
this, if you will:
-------Sequential Output-------- ---Sequential Input-- --Random--
-Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU
raid 5 50 24882 20.9 22641 3.5 3731 1.2 7555 9.6 64346 14.7 511.7 3.1
raid 5 2000 9520 6.8 7974 1.3 5706 2.0 50932 62.5 63815 13.0 147.9 1.6
The small run has much faster output, and significantly faster seek times.
The block read is about as fast as the large file (suggesting that it is
mostly reading from buffer). But what's going on with the per char read?
Note that the sequence of the tests is:
Per Char Output
Rewrite Output
Block Output
Per Char Input
Block Input
Seek
So it may be that the Per Char read was from disk, but left the entire file
in cache, so the block read was then very fast. But why wouldn't it already
be in cache after the block output? And why would the same speed be
achieved on the Block Read with the 2000M file, which can't have all been
in cache?
I don't think I know enough about how buffering and disk I/O works in OpenBSD
to really interpret this stuff.
aruba (response 352 of 547, May 23 14:06 UTC 2003):
The information on the Bonnie web page (http://www.textuality.com/bonnie/)
makes it sound like the tests are designed to correct for caching. There's
some info there on how to interpret the results.
janc (response 353 of 547, May 23 15:00 UTC 2003):
Maybe to help more people figure out what is being discussed here,
I should give a brief overview of RAID.
RAID stands for "Redundant Array of Inexpensive Disks" (the I-word
varies). Someone wrote a paper once upon a time surveying various options for
putting a lot of small disks together, and named the variations RAID 1,
RAID 2, RAID 3, RAID 4, and RAID 5. The RAID 0 name was coined later and
isn't really RAID. The interesting ones are RAID 0, RAID 1 and RAID 5.
I'll also discuss RAID 4 because understanding it makes RAID 5 easier
to understand.
Suppose you needed a 100 Gig disk, and all you had was ten 10 Gig disks.
Well, you could put them all together in a box, and write a little
controller that would write the first 10 Gig to disk one, the next 10
Gig to disk two and so on. To the computer, your box would look like
a single disk.
The performance of this disk array wouldn't be so hot though. Most
programs access files sequentially, so as the 100 Gig file was read,
we'd first have disk one very busy, while the other nine sit idle, then
disk two would be busy, and so forth. It'd be nice to balance the load
among the disks.
Which brings us to RAID 0 - also known as striping. We slice the disks
into 32K chunks. As you write a big file to the disk, the first 32K
goes to disk one, the second 32K to disk two, on through the tenth
32K chunk going to disk ten. That completes a stripe. The eleventh
32K chunk goes to disk one again. This balances the load over all ten
disks, so you get better performance. You can vary the chunk size for
different applications.
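The chunk-to-disk mapping just described can be written out directly (10 disks, 32K chunks, as in the example):

```python
CHUNK = 32 * 1024   # stripe unit, in bytes
NDISKS = 10

def locate(offset, ndisks=NDISKS, chunk=CHUNK):
    """Map a byte offset on the virtual RAID 0 disk to
    (disk index, byte offset on that physical disk)."""
    chunk_no = offset // chunk      # which 32K chunk of the virtual disk
    disk = chunk_no % ndisks        # chunks rotate round-robin over the disks
    stripe = chunk_no // ndisks     # full stripes that precede this chunk
    return disk, stripe * chunk + offset % chunk
```

So the eleventh 32K chunk (offset 10*CHUNK) lands back on disk one, one stripe in, exactly as described above.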
So RAID 0 gets you a large virtual disk and balances load over your
drives. It doesn't give you any increase in reliability. Quite the
contrary. If a drive dies, then instead of losing a 10Gig hunk of data,
you lose lots of 32K hunks of data scattered through all your data.
This is probably harder to restore.
Load balancing over multiple spindles would be nice for Grex, but not
vital. We don't have just a single process reading the disk sequentially.
Increasing the difficulty of reconstructing the file system after a
disk crash is too high a cost to pay for slightly better load balancing.
I think we can rule RAID 0 out as an option.
There is no Redundancy in RAID 0 (so it should be called "AID 0").
Real RAID starts with RAID 1 - also called "mirroring". We are still
trying to make a virtual disk out of many real disks. This time we'll
group our ten 10Gig disks into five pairs, disk 1A, 1B, 2A, 2B, etc.
Whenever we write data to disk 1A, we also write a copy of the same data
to the corresponding location on disk 1B. The first obvious effect is
that our virtual disk only contains 50 Gig instead of 100 Gig. But now,
if disk 1B dies, we have an up-to-the-nano-second backup copy. We can
replace the disk 1B with a new disk, copy the contents of disk 1A onto
it, and be back up and running with no loss of data.
Ideally, in RAID 1, we'd do the writes to the two disks simultaneously,
so writing is no slower than reading. (In software implementations of
RAID 1, this may not entirely work.) On reads, we don't have to read
from both disk. We just select the one that is less busy at the moment
and read from that. So, we get decent performance and the capability
to survive a single drive failure, but at the cost of half our disk space.
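The "read from the less busy drive" policy is simple enough to sketch (queue length as a stand-in for drive busyness):

```python
def choose_mirror(queue_a, queue_b):
    """Pick which half of a RAID 1 pair services a read: the drive with
    the shorter outstanding-request queue, i.e. the less busy one."""
    return "A" if queue_a <= queue_b else "B"

# Writes, by contrast, always go to both halves, so mirroring can roughly
# double read bandwidth but never write bandwidth.
```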
I've heard of RAID 0+1, but not read much about it. I assume it's just
striping over the 5 pairs of mirrored disks in the example above.
RAID 4 is an attempt to get the same benefits as RAID 1, but with less
loss of disk space. This time we call 9 of our disks "data disks" and
the other one a "parity disk". Parity just means "even" or "odd". The
129th bit stored on the parity disk depends on the values of the 129th
bit stored on the other nine drives. If an odd number of those nine bits
are 1's, then a 1 is stored at that location on the parity disk. If an
even number of them are 1's then a 0 is stored at that location on the parity
disk. In geek terms, the content of the parity disk is just a bit-wise
exclusive-OR of the contents of all the other drives.
Suppose a drive dies. If it was the parity drive, we can just recompute its
value from the other drives. But what if a data drive dies? Well, we have
all the other drives and the parity drive. So for each bit we have something
like:
data1 data2 data3 data4 data5 data6 data7 data8 data9 parity
1 0 1 X 0 1 0 0 1 1
The parity bit is 1, so we originally had an odd number of 1's on the
data disk. There are 4 ones on the surviving drives, so the bit on the
dead drive must have been 1. (In fact the dead drive's contents are just
the bit-wise exclusive-OR of all the surviving data and parity drives, so
the reconstruction process for a dead data drive is identical to the
reconstruction process for a dead parity drive).
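Both the parity computation and the reconstruction described above are the same XOR operation; a small sketch with nine toy "drives":

```python
from functools import reduce

def parity(chunks):
    """Bitwise XOR of the chunks -- the contents of the parity disk."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)

def reconstruct(surviving):
    """A dead drive's contents are the XOR of all surviving drives,
    data and parity alike, exactly as in the bit table above."""
    return parity(surviving)

# nine data chunks plus one parity chunk
data = [bytes([i]) * 4 for i in range(9)]
p = parity(data)
# lose data drive 3; rebuild it from the other eight data chunks plus parity
rebuilt = reconstruct(data[:3] + data[4:] + [p])
assert rebuilt == data[3]
```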
So, this is cool. We now have a virtual drive holding 90Gig of data, so
we've lost only 10% of our storage, and we can still reconstruct all the
data on any lost drive.
There are some additional performance costs though. The first problem is
the parity drive. Every time you write data to a drive, you have to update
the data on the parity drive. So though data writing is split over nine
drives, parity writing is all on one drive, so that drive is nine times as
busy as the other drives. It becomes a performance bottleneck.
The solution to this problem is RAID 5 - stripe the parity data over all the
drives. For example, the parity data for the first 32K of all the drives
would be on drive 1, the parity for the second 32K of all the drives would
be on drive 2, and so on. So there is no one parity drive and parity is
spread over all disks. (Note that disk reconstruction doesn't change -
you still just exclusive-OR all the other drives to reconstruct the
lost drive.)
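The rotating placement can be expressed in one line (0-indexed; real implementations vary the rotation scheme, so this is only illustrative):

```python
def parity_disk(stripe, ndisks=10):
    """In RAID 5 the parity chunk moves to a different disk each stripe:
    stripe 0's parity on disk 0, stripe 1's on disk 1, and so on around.
    RAID 4 would return the same disk for every stripe."""
    return stripe % ndisks
```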
There is a second performance hit in RAID 4 and 5 though. Like RAID 1, every
write is to two drives - data to one drive and parity to another. However,
before we can write the parity, we have to compute the parity, and that means
we need to read the corresponding data from the other eight data drives. So
a simple write turns into 8 reads and 2 writes.
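The I/O amplification described here can be counted out. (As an aside, real implementations usually avoid the full-stripe read by doing a read-modify-write: read old data and old parity, XOR out the old data and XOR in the new, for 2 reads and 2 writes regardless of array width.)

```python
def naive_small_write_ios(ndisks):
    """Physical I/Os for one small write under the naive scheme described
    above: read every other data disk to recompute parity, then write the
    new data chunk and the new parity chunk."""
    reads = ndisks - 2      # all data disks except the one being written
    writes = 2              # new data chunk + new parity chunk
    return reads, writes
```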
Also, in RAID 1, we were able to improve read performance by always reading
the data from the less busy drive of the two that had the data. In RAID 4
and 5, the data is only on one drive, so we can only read it from that drive.
However, we like to assume the striping in RAID 5 will balance the load among
the drives pretty well anyway.
There are lots of hardware RAID devices that optimize this kind of thing, but
we can't afford them. The option we are considering is software RAID, which
is implemented in the OpenBSD kernel by a program called RAIDframe. It's
pretty solid and rather nice. You can set up a RAID array, possibly with
spare drives. If a drive fails, and there is a spare on-line, it will
automatically bring the spare on line, reconstruct the lost data and proceed
without interruption of service. If there are no spares, it'll run with a
drive short (in RAID 5, any read from the dead drive is simulated by reading
from all the others and exclusive-Oring them). This is all terrific if you
need a server up 24x7, which Grex doesn't really need.
Note that the redundancy in RAID gives you some protection against single
disk failures (it's assumed that you do something before the second disk
dies). It does not replace a backup. If you accidentally delete the wrong
file, or a vandal breaks in and alters all your files, the RAID will give
you nice redundant copies of the altered files, not the original ones.
So RAID is not a substitute for backups. It's protection against hardware
failure and that's all.
RAID 0 can give you some performance enhancements by load balancing. The
other versions of RAID are all likely to be slower than a non-RAID setup,
especially if implemented in software. RAID 0 doesn't cost you any disk
space. The other versions are going to eat up some of your disk space.
In our case, since we have 3 drives, RAID 1 doesn't quite work and RAID 5
would eat up 1/3 of our disk space.
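The space overheads mentioned here can be computed for any array size:

```python
def usable_fraction(level, ndisks):
    """Fraction of raw capacity left for data: RAID 0 keeps everything,
    RAID 1 mirrors everything (half the space), and RAID 4/5 give up
    one disk's worth of space to parity."""
    if level == 0:
        return 1.0
    if level == 1:
        return 0.5
    if level in (4, 5):
        return (ndisks - 1) / ndisks
    raise ValueError("unsupported RAID level")

# With three drives, RAID 5 leaves 2/3 of the raw space for data,
# i.e. parity eats one third, as noted above.
```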
janc (response 354 of 547, May 23 15:03 UTC 2003):
OK, that wasn't so brief. But writing it just made me more sure that RAID
isn't right for Grex. The problem it is primarily designed to solve isn't
an important issue for Grex.
I may do some experimenting with rsync, and see if I can get a sense of how
expensive it would be to regularly rsync to the IDE disk.
gull (response 355 of 547, May 23 15:43 UTC 2003):
Where I work, we use rsync to keep a mirror of about 50 gigs worth of
data. We're doing it across the Internet, via a T1, as well. It does
cause a fair amount of disk thrashing on both ends when it figures out
what files need to be transferred (very much like doing a 'find' across
the filesystem) but overall it seems very efficient. It's worked well
for us. My guess is the "expense" of doing an rsync to another local
disk a couple times a day is going to be pretty low, especially since
you're not transferring over a network and so won't need to involve ssh
or compression.
cross (response 356 of 547, May 23 15:51 UTC 2003):
I ran a benchmark last night; one of my own design. It's nothing really
fancy or scientific; I wrote it a few years ago to try and get a feel for
how various disk subsystems and filesystem types handled a load I thought
was fairly typical of timesharing style machines. Basically, it just
copies a bunch of 32KB files all over the place.
Running on both the IDE and SCSI drives took about 4 seconds. Running on
the RAID took around 80 seconds.
Something is wrong here; there's no reason RAIDframe should be *20 times*
slower than a `normal' filesystem, I just can't believe it's that bad.
Perhaps I'm wrong about the stripe size; maybe 64 is just too small. Jan,
could you up it to 256 and see if that helps any? I see at least one
post from someone who says they used an interleave size of 168 and got
decent performance, but 32 (and probably 64) was too small.
janc (response 357 of 547, May 23 16:02 UTC 2003):
Right. I installed rsync from the ports tree (I like the ports tree).
I then went to /sd0 (the test partition on the first scsi disk) and did
time rsync -ax /usr .
This should copy the whole /usr partition from the IDE disk to the SCSI
disk (which is backwards from the direction we would be going) and give
me some statistics. The /usr partition contains 664,632K of data. The
result from 'time' was:
12.0u 24.5s 4:28.34 13.6% 0+0k 161046+660947io 36pf+0w
So it took 4.5 minutes elapsed time, eating 13.6% of an otherwise idle CPU.
I then reran it. In this case it should be checking the two copies against
each other, and copying over only what changed (little or nothing). The
time result was:
3.8u 3.5s 0:46.13 16.1% 0+0k 47000+1454io 1pf+0w
This took 45 seconds.
In real life we'd want the --delete option on the command, so files that
don't exist on the source are removed from the copy, but I didn't do it in
the test because I was paranoid about getting the arguments backward. Even
so, we'd want our target partitions rather larger than the source partitions.
Maybe just one big target partition instead of separate ones corresponding
to the different source partitions, the whole thing readable only by root
and possibly unmounted when it isn't being updated.
Doing this a couple times a day seems a much lower impact way to get data
redundancy than RAID.
It'd be tempting to keep two copies of some partitions, and update them on
alternate days. Dunno if that's necessary.
This is not a substitute for real backups to tape, of course.
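For scale, the copy rate implied by that first 'time' output works out as follows:

```python
# Numbers from the first rsync run above: 664,632K copied,
# elapsed time 4:28.34, i.e. 268.34 seconds.
kbytes = 664632
elapsed = 4 * 60 + 28.34

rate_kb_per_sec = kbytes / elapsed   # roughly 2500 K/sec, about 2.4 MB/s
```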
janc (response 358 of 547, May 23 16:04 UTC 2003):
Dan slipped in. I'll try reconfiguring the RAID.