Grex Oldcoop Conference

Item 397: Grex needs to buy a new disk, pronto

Entered by steve on Mon Feb 19 04:28:25 2007:

   Grex has a failing disk at the moment.  I'm pleased to say that
it didn't just crap out as we've had happen in the past, but it's definitely
sick and we're living on borrowed time.

   This last Saturday I spent time on Grex first making backups of
nearly all the system, and then replaced the failing disk with the
replacement disk we got from Seagate when we had our last disk
disaster.  This replacement was a "certified repaired" disk we got,
which of course was certified bad--in the process of restoring
data to our new disk all sorts of random errors cropped up, and
playing with it more only revealed more weirdness.  At this point
it was getting close to 7pm, so I put the original dying disk
back in service.

   So once again we've managed to skirt around a disk disaster,
at least today.  We need to get a new disk, and soon.

   We can't get an 18G disk, they aren't made any more.  We can
still however get a 36G disk, just like last time, for about $250.
This isn't a bad thing, as the three partitions on the disk are
/tmp (4g), /c (5g) and /var/mail (8g).  Having a 36G disk there
would mean we could have a 24G /var/mail partition, so we could
hold more spam. ;-)
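
The partition arithmetic above can be checked with a minimal Python sketch. The sizes are the ones quoted in the post; treating "G" as whole gigabytes, ignoring filesystem overhead, and keeping /tmp and /c at their current sizes on the new disk are all assumptions made for illustration.

```python
# Partition sizes quoted in the post, in gigabytes (overhead ignored).
current = {"/tmp": 4, "/c": 5, "/var/mail": 8}

# Assumed layout on the 36G disk: only /var/mail grows, to the 24G mentioned.
proposed = {"/tmp": 4, "/c": 5, "/var/mail": 24}

OLD_DISK_G = 18
NEW_DISK_G = 36


def total(layout):
    """Sum of partition sizes in a layout, in gigabytes."""
    return sum(layout.values())


print("current layout uses", total(current), "of", OLD_DISK_G, "G")
print("proposed layout uses", total(proposed), "of", NEW_DISK_G, "G")
print("left over on the new disk:", NEW_DISK_G - total(proposed), "G")
```

Under those assumptions a 24G /var/mail fits on the 36G disk with a few gigabytes to spare.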

   We've been talking about getting a raid system, so in a way
this is spending money only to change things later, but I think
we don't have much of a choice here.  We need a replacement now,
and given the problems with the lack of /var/mail space, getting
a 36G disk makes a lot of sense.  Add the fact that "certified
repaired" disks are all too often not, getting a new disk is the
most reasonable thing.

   I sent mail to Leeron Kopelman to see if his place still
sells disks, since we've gotten things from him in the past.
If he doesn't Newegg.com has them for $250.

   We need to act on this in the next day or two.  We're
being given extra time here.  When we get a replacement I'll
take time off from work to install it if I have to.
26 responses total.

#1 of 26 by cross on Mon Feb 19 04:58:36 2007:

It occurs to me that the disk that is dying is sd2.  However, grex has
plenty of reserved space on sd1 to take over the duties of sd2 until we
could implement a more robust disk storage subsystem.

In particular, currently, /b is empty.  We could dump the contents of /c
into /b (for a negligible overall reduction in space) and remount /dev/sd0k
(which currently holds /b) on /c.  Similarly, we could dump the contents of
/tmp into sd1e, which is currently being mounted as /alt/usr (and which
isn't likely to change that much over the next few weeks) and remount that
as /tmp; as it is, /tmp is ridiculously oversized and while the partition
we'd copy it onto is only one quarter the size of what we have now, we'd
still be close to 0% utilization on it.  Finally, we could dump /var/mail to
sd1f, which is presently mounted as /alt/usr/local (again, not likely to
change drastically over the next couple of weeks), and remount that onto
/var/mail; that partition and the current partition are close in size.

To summarize:

       CURRENT                        GETS REMAPPED TO
/tmp      (/dev/sd2a)           /alt/usr       (/dev/sd1e)
/c        (/dev/sd2d)           /b             (/dev/sd0k)
/var/mail (/dev/sd2e)           /alt/usr/local (/dev/sd1f)

This increases the load on sd0 and sd1, but only for a short time until we
can get and configure a RAID system and it saves us $250.  Plus, this is
something we can do *now*, instead of waiting for a new disk to be
delivered, someone to install it into grex, partition it, newfs it, etc.
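
As a sanity check on the remapping table above, here is a small Python sketch. The mount points and device names are copied from the table; the helper names (disk_of, check) are hypothetical, and the rule that the last letter of a device name is the partition letter is an assumption based on the names shown. It only inspects the plan and does not touch any disks.

```python
# Remapping from the table: mount point -> (current device, new device).
remap = {
    "/tmp": ("/dev/sd2a", "/dev/sd1e"),
    "/c": ("/dev/sd2d", "/dev/sd0k"),
    "/var/mail": ("/dev/sd2e", "/dev/sd1f"),
}

FAILING_DISK = "sd2"


def disk_of(device):
    """Return the disk name for a partition device, e.g. /dev/sd1e -> sd1."""
    name = device.rsplit("/", 1)[-1]  # "sd1e"
    return name[:-1]                  # drop the trailing partition letter


def check(plan, failing):
    """Return a list of problems with the plan; an empty list means none found."""
    problems = []
    targets = [new for _, new in plan.values()]
    if len(set(targets)) != len(targets):
        problems.append("two mount points share a target partition")
    for mount, (_, new) in plan.items():
        if disk_of(new) == failing:
            problems.append(f"{mount} would still live on {failing}")
    return problems


print(check(remap, FAILING_DISK) or "plan moves everything off " + FAILING_DISK)
```

The check confirms two things only: every mount point gets its own target partition, and none of the targets is on the failing disk.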


#2 of 26 by cmcgee on Mon Feb 19 13:33:57 2007:

Thanks Steve for spending so much time, and for being willing to take time
off to make sure we stay running!


#3 of 26 by janc on Mon Feb 19 15:49:59 2007:

My vote as board member and staff is to purchase immediately:
  (1) new disk drive as STeve recommends.
  (2) a DVD-W drive for Grex.
The DVD-W drive is to make backups easier, and ordering it at the same time
is so that we can install both drives at the same time.


#4 of 26 by slynne on Mon Feb 19 16:12:56 2007:

That sounds like a good idea to me. I am fully in support of that


#5 of 26 by cross on Mon Feb 19 16:20:12 2007:

I wonder why we want to buy a new disk when we can use the disk we already
have and start moving towards a RAID solution.


#6 of 26 by cross on Mon Feb 19 16:26:11 2007:

Regarding #3, #4; Is there a reason why either of you disagree with #1?


#7 of 26 by nharmon on Mon Feb 19 16:47:30 2007:

The refurbed drive isn't warrantied?


#8 of 26 by steve on Mon Feb 19 17:35:19 2007:

    Dan has a most excellent idea.  I am abashed to say that I had
forgotten all about the /b partition.  I'm used to thinking of /b
as the picospan code, rather than a partition for users.

   With that, I think Dan is right and we have the space to make
the alt partitions usable for other things.  The /tmp space would
be 1/4 the size, but I think we can live with that for the time
being.  /var/mail would be within a few percent of its original
size, and moving /c to /b is about the same thing.

   Thanks Dan -- I think we can do this.  Let me do work work
for a bit as I ponder this; if it doesn't work out we can always
get a new disk.


#9 of 26 by cross on Mon Feb 19 17:40:02 2007:

No problem, Steve!  My pleasure to help out!  If you need any backup, and
there's anything I can do, please let me know.  I'm home sick and crawling
the walls with boredom.  :-)


#10 of 26 by drew on Mon Feb 19 21:11:09 2007:

$250 for 18G sounds excessive to me, even for Scuzzy.
Best Buy has hard drives on sale this week:

    160GB Western Digital EIDE or SATA, $59.99
    320GB WD (probably EIDE) for $109.99
    250GB Seagate for $99.99
    Instant savings, no rebates involved.

I've been happy personally with Western Digital.

Don't modern motherboards have built-in EIDE controllers? Get a
160GB drive from Best Buy, put it in, and move the whole system
to /dev/hda[1-n].


#11 of 26 by steve on Mon Feb 19 21:23:19 2007:

   We've been using scsi disks because of their speed; the ones
we have are 15K rpm.  When I was testing stuff, I was getting 
about 70M/sec transfer rates.  To contrast that with my laptop
(udma mode 5), I can get about 42M/sec via dd.  These are also
U320 disks; we have a U160 controller currently, but if we 
decided to stay with scsi we could get a u320 disk controller
and have better disk i/o.
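
For anyone who wants to reproduce this kind of sequential-read figure without dd, here is a rough Python sketch. It is not the method used above; the function name read_throughput and the 8 MB scratch file are assumptions for illustration, and a real test would read a large file or raw device so that the OS cache does not dominate the result.

```python
import os
import tempfile
import time


def read_throughput(path, block_size=1024 * 1024):
    """Read a file sequentially and return (bytes_read, megabytes_per_second)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = total / elapsed / 1e6 if elapsed > 0 else float("inf")
    return total, rate


# Demo on a small scratch file; far too small for a meaningful benchmark.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(8 * 1024 * 1024))
    scratch = tmp.name

nbytes, mb_per_s = read_throughput(scratch)
os.remove(scratch)
print(f"read {nbytes} bytes at about {mb_per_s:.0f} MB/s")
```

Numbers from a run like this are only comparable between machines when the file is much larger than RAM or the cache is otherwise bypassed.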


#12 of 26 by ric on Wed Feb 21 14:39:25 2007:

If y'all don't mind me asking...  $250 *IS* outrageously expensive.. why don't
you hit ebay for a replacement drive?  There are *MANY* listings for 18 gig,
15k RPM U160 drives on ebay.  


#13 of 26 by other on Thu Feb 22 03:52:26 2007:

A study has just come out based on real-world usage of a vast array of disks,
and one of the conclusions was that failure rates between commercial and
consumer grade drives did not substantially differ. Does this mean there are
more inexpensive disks we should consider purchasing?


#14 of 26 by cross on Thu Feb 22 04:03:18 2007:

Interesting, but believable.  Eric, do you have a cite?


#15 of 26 by mcnally on Thu Feb 22 05:41:20 2007:

 There're two papers he could be talking about.  Both made Slashdot headlines
 in the past couple of days; one was from Google and the other was from some
 large research consortium if I remember correctly.


#16 of 26 by other on Thu Feb 22 16:09:49 2007:

Them's the ones. I only saw the lead in my RSS reader.


#17 of 26 by ric on Thu Feb 22 19:39:15 2007:

I wanna get me a couple of those 'perpendicular' hard drives like the
Barracuda 7200.10...


#18 of 26 by steve on Sat Feb 24 23:59:42 2007:

   The study Eric is talking about compares the normal disks to 
"extended duty" disks.  IBM's travelstar disks in laptops were
like that.  The bottom line is that all of them are getting
better, such that the differences between those two flavors
didn't amount to much.

   However, the type of disk does matter.  You can see it today
in the length of the warranty offered.  IDE disks are typically
1 year warranty.  Their rock-bottom price coupled with ever
increasing performance meant something had to give, and that
was, sadly, quality.  SCSI disks cost a lot more, offer much
better transfer rates (well, they used to) and have better
warranties.  The newest kind of disks, SATA, are interesting:
they are cheap, have some pretty decent transfer rates, and
have a failure rate in the field of about 0.5%.  I'm trying
to get that study so I can post it.  SCSI disks of the kind
we have are still the fastest disks, in that they rotate at
15K rpm and are ultra-320 speed, for 320MB/sec rates.  But
they cost a *lot* more, and I'm not sure that Grex's next
generation of disks needs to be SCSI any more.  Time
marches on.

   We're now off of the defective sd2 disk, using other
partitions that weren't used.  Thanks to Dan for that
idea, as now we have a little breathing room without
spending money on another scsi disk.


#19 of 26 by cross on Sun Feb 25 04:10:01 2007:

Here's a pointer to Google's study.  Most of the disks in use in google data
centers are serial and parallel ATA.
http://labs.google.com/papers/disk_failures.pdf

Here's a speed comparison chart Seagate has put together; a 7200RPM SATA
Barracuda is somewhere between a SCSI 10K RPM Cheetah and a SCSI 15K Cheetah,
except that the Cheetahs are between two and three times faster in terms of
access time.  I think a RAID controller with a lot of cache memory would
amortize most of that difference.


#20 of 26 by cross on Sun Feb 25 04:10:45 2007:

Whoops, here's the pointer to the Seagate page:
http://www.seagate.com/www/en-us/support/before_you_buy/speed_considerations/


#21 of 26 by jared on Wed Mar 7 19:05:05 2007:

re#11
Laptop drives are some of the worst to test against because they typically
run at lower rpms (even as low as 4200rpm) to keep noise and heat at a much
lower level.

I've been getting cheap disks for my hosts from 3btech.net (located in
Indiana and free ground shipping) for several years now without any
failures.  I typically buy the OEM and/or Refurb disks and use them
for my backup solutions for cheap storage.

http://3btech.net/ideover160.html

We could also use an ATA or SATA hardware raid controller (I have a SATA one
I could donate) to do raid 0+1 across 2 or 4 disks.  Even if they're slower
discs, you will see better than the 70MB/s out of the SCSI disc if you have
multiple spindles and do round-robin reads.

I've also stopped partitioning my systems quite as much as grex
currently is partitioned.  While I agree on a public host you need to
divide things up some, because we're not talking about 20MB disks
these days, going with something like a set of 250G "white label"
(refurb/OEM Western Digital) drives for $56 each would give another ~500g
of space for around $250 (buy 4, plus an ide raid controller, cables, etc..)
of mirrored space.
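
The cost and capacity figures in that last paragraph can be worked through with a short Python sketch. The drive price, drive size, drive count and rough budget come from the post; the constant names are made up for illustration, and no controller or cable price is assumed, only what the budget leaves over.

```python
DRIVE_PRICE_USD = 56   # per-drive price quoted in the post
DRIVE_SIZE_G = 250     # "white label" drive size quoted in the post
DRIVES = 4
BUDGET_USD = 250       # "around $250" from the post

drive_cost = DRIVES * DRIVE_PRICE_USD

# In a mirrored layout half of the raw space holds the second copy.
usable_g = DRIVES * DRIVE_SIZE_G // 2

left_for_controller = BUDGET_USD - drive_cost

print("four drives cost", drive_cost, "USD")
print("usable mirrored space:", usable_g, "G")
print("left for controller, cables, etc.:", left_for_controller, "USD")
```

The mirrored total matches the ~500g in the post; how far the remainder goes toward a controller and cables depends on prices the post does not give.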


#22 of 26 by arthurp on Tue Mar 27 13:46:35 2007:

Yep.  Hardware mirroring with hot spares, and good OS support would be
the way to go.  Speed increase on reads.  Auto reliability.  Cheap.

I like 3btech as well.


#23 of 26 by maus on Fri Mar 30 02:24:21 2007:

Is 3btech a vendor that you order through? If I have been looking at
references to the right group (http://3btech.net), they appear to be a
vendor with a very strong reputation. A strong reputation nearly always
beats a bargain, IMHO. Can you get an estimated quote on the stack of
Serial ATA drives and the RAID board and the cage and cables that we
were looking at? 



#24 of 26 by krokus on Mon Apr 2 18:10:54 2007:

I guess that depends on if it's a good reputation, and how severe
the bargain is.


#25 of 26 by ric on Sat May 5 03:57:42 2007:

I'm not sure that TRANSFER SPEED is the most important factor for Grex.  It's
probably more about access time, because Grex is accessing thousands of
relatively small files every second.
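
A back-of-the-envelope Python sketch of why access time can matter more than transfer speed for small files. Only the 70 MB/s figure comes from this item; the access times, the 100 MB/s rate and the file sizes are hypothetical numbers picked for illustration, and the model (one seek per file, then a sequential read) is a deliberate simplification.

```python
def seconds_per_file(access_ms, rate_mb_s, file_bytes):
    """Time to fetch one file: one access delay plus sequential transfer time."""
    return access_ms / 1000 + file_bytes / (rate_mb_s * 1e6)


# Disk A: quick access, 70 MB/s.  Disk B: slower access, higher transfer rate.
# Both access times and disk B's rate are hypothetical.
disk_a = {"access_ms": 6, "rate_mb_s": 70}
disk_b = {"access_ms": 12, "rate_mb_s": 100}

SMALL_FILE = 4096          # a small file, in bytes
LARGE_FILE = 10 ** 9       # a 1 GB file, in bytes

for label, size in (("small", SMALL_FILE), ("large", LARGE_FILE)):
    t_a = seconds_per_file(file_bytes=size, **disk_a)
    t_b = seconds_per_file(file_bytes=size, **disk_b)
    winner = "A" if t_a < t_b else "B"
    print(f"{label} file: A {t_a:.4f}s, B {t_b:.4f}s, disk {winner} is faster")
```

With these made-up numbers the quick-access disk wins on small files and the high-transfer disk wins on large ones, which is the distinction the response is drawing.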


#26 of 26 by tsty on Tue May 22 07:55:19 2007:

do we still need disks, controller(s)?

