Grex Coop10 Conference

Item 54: Buy more 4/670 parts?

Entered by janc on Wed Nov 26 17:49:49 1997:

In the past, we've always made sure that we had enough extra bits and pieces
of computers so that if we had a hardware failure, we could swap spare parts
in and get Grex running again quickly.

After we shift to the 4/670, we won't have any spare parts, except a few spare
memory chips.  If the 4/670 fails, we will have to fall back to the 4/260.

This isn't a totally simple fallback.  The 4/260 is a slightly different
architecture than the 4/670.  It runs most of the same binaries, but not all.
So if we have to fall back, we need to make software changes as well as
hardware changes.  We will keep around the current 4/260 disk partitions, but
unless we continue to maintain those, they will rapidly become obsolete.  So
if we have a serious hardware failure, it is likely to take us as long as a
day of downtime to get the 4/260 back up.

Options:

(1) Start shopping for spare 4/670 parts:

      motherboard - this contains scsi controller, sockets for the
        first 128Meg of memory, the ethernet interface, and several serial
        ports.  We have only one of these.  I haven't found any prices.
        I think they may be about $500-$600.  (it's called 501-1686 or
        501-2055).

      cpu board - this is a little piggyback board that contains two CPUs.
        The motherboard can take two of these.  We have one.  We originally
        budgeted money for two and bought two, but one of them turned out
        to have only one working processor on it.  I think Rob never charged
        us for this and still has it.  The money to buy a spare was in the
        4/670 budget.  If we buy one, we not only have a spare, but we can
        experiment with running Grex on 4 processors instead of 2.  (Actually,
        using Rob's half-dead board to run on three processors might work
        better than either - we should probably try to acquire that card).
        (I think the ones we have are SM100's, aka 370-1388 - prices seem to
        range from $150-$450).

      chassis:  The chassis we are running the 4/670 in is a newer model
        than the other three we have.  We don't know if it can be run in
        the older chassis.  My guess is that it can be, especially after we
        change over to the terminal server.  After that the 4/670 will be
        essentially a single-board computer, so the only thing it wants from
        the bus is power.  So I don't think there is a hurry to buy a spare
        chassis.

      memory:  In the old Grex, memory chips weren't socketed, so we had to
        have a whole spare memory board.  Now we have SIMMs, so we mainly
        just need spare SIMMs.  We have two or three, I think.  That should
        be fine.

      memory board:  The first 128Meg of memory is socketed in the
        motherboard.  To expand beyond that, we have a memory board.  We
        haven't got any memory to put in it yet, and haven't budgeted for
        any yet.  Getting a spare for this is probably not a priority.

     So basically, we'd need to budget money for a motherboard spare.  We
     should buy a cpu spare.  We should keep an eye out for good deals on
     other stuff.

(2) Continue using the 4/260 as a backup machine.

     This means staff needs to improve procedures for maintaining both the
     4/260 and 4/670 software suites.  In theory this shouldn't be too
     difficult, since they differ mainly in the kernels and in a few programs
     that interact closely with the kernel (like "ps").
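
     One way to keep an eye on drift between the two software suites would
     be to compare the two trees from time to time.  What follows is only a
     minimal sketch in Python, not anything staff actually run: the function
     name and the example paths are hypothetical, and it relies on the
     standard filecmp module, whose default comparison is shallow (files
     with matching type, size and modification time are taken as equal
     without reading their contents).

```python
import filecmp
import os


def differing_files(tree_a, tree_b):
    """Return sorted relative paths that differ or exist in only one tree."""
    results = []

    def walk(cmp, prefix):
        # Files whose contents differ, files present on one side only,
        # and files that could not be compared at all.
        for name in (cmp.diff_files + cmp.left_only +
                     cmp.right_only + cmp.funny_files):
            results.append(os.path.join(prefix, name))
        # Recurse into directories that exist in both trees.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(tree_a, tree_b), "")
    return sorted(results)


# Hypothetical usage (these paths are made up for illustration):
#   for path in differing_files("/mnt/260root/usr/bin", "/usr/bin"):
#       print(path)
```

     A short list (kernels, "ps" and the like) would suggest the suites are
     still in step; a long one would suggest the 4/260 partitions have gone
     stale.
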
37 responses total.

#1 of 37 by jep on Wed Nov 26 19:58:24 1997:

You could ask on Usenet News and at a SemiSLUG meeting for donations.  I 
don't know how current this computer is; if it isn't very current, 
someone might have one lying around that they'd be willing to give to 
Grex.


#2 of 37 by other on Thu Nov 27 05:43:14 1997:

I support the notion of parallel maintenance of both machines, especially
since it will be much simpler to do the same or similar changes on both
machines at one time (at the cost of a little extra time) than to try to play
catch-up in a crisis.  Some analysis and discussion to determine the
availability of the necessary time should take place, and if necessary, we
should consider adding more staff.  Certainly this scheme seems to make the
most sense until either we have the necessary parts or a greater sense of the
reliability of the current ones.

I'd like to learn more about programming in the unix environment, though my
background is fairly limited (old BASIC and HYPERTALK, the scripting language
for the Mac HyperCard program, with a very little introduction to HTML).
I've also dabbled with a little scripting in my home directory to facilitate
some frequently used commands...


#3 of 37 by mta on Fri Nov 28 00:44:31 1997:

The only problem with "adding more staff" is that the staff is always on the
lookout for talent and if they knew of someone appropriate and interested,
they would probably have recommended them to the board by now.

Of course they may have their eye on someone who is still in the consideration
period, but if so, I haven't heard about it, so it must be early on.

We want to be careful about adding staffers since no one likes having it not
work out.


#4 of 37 by other on Fri Nov 28 06:43:35 1997:

understandably.  i just want to avoid taxing existing staff beyond burnout.
we place a lot of demands on them, and this is just another one...


#5 of 37 by mta on Fri Nov 28 16:26:40 1997:

Oh, I agree.  I think everyone on staff does.


#6 of 37 by dang on Wed Dec 3 02:04:57 1997:

I'd like to see enough spare hardware to keep the 670 running.  This means,
as Jan said, a spare motherboard, and a spare CPU.  We already have a spare
ALM, for when we will be using that, and a spare terminal server, for when
we will be using that.  Among other things, keeping the old computer in synch
as a hot backup makes it harder to use it for anything else, and I'd like to
see us use it for something.


#7 of 37 by valerie on Thu Dec 4 16:52:00 1997:

This response has been erased.



#8 of 37 by ajax on Tue Dec 16 18:00:31 1997:

SM100s should cost no more than $100, and could be $50 if you find a
deal.  Motherboards are a little hard to come by separately, but you
might find a good deal on a cheap system, if you keep up with Usenet
ads.

If you read Usenet through DejaNews, I'd suggest creating your filter
for the newsgroup misc.forsale.computers.workstation, covering the
last couple months, then for the subject search after creating the
filter, use "'600mp' or '630mp' or '670mp' or '690mp'" as search
criteria.  An example "package" deal listed from this June:

> Sun Sparcserver board out of a 670MP, includes 2 dual cpu
> boards (ross 40mhz) , GX framebuffer, 64 megs of ram, and
> keyboard & mouse.  I would like to get $750 for this.
>
> The case may be available with a 1gig drive and cdrom.

I'm sure we purchased a couple spare SIMMs before.  I reimbursed
Grex for the bad SM100 board, intending to return it, although
since I didn't, I'll give that back as a freebie...it did seem to work
in the 670 with only one functioning CPU, so it should work as a
hobble-along backup processor.  Stuff I've read says running two
SM100s (i.e. four processors, since each SM100 has two CPUs) with
SunOS is usually slower than using one SM100, though it depends on
the type of load on the system.

If you buy another processor, you may want to consider an SM41 or
SM51 (I've forgotten precisely what submodels are compatible with
SunOS), rather than an SM100...I think SM51s might be around
$150-200 each.


#9 of 37 by kaplan on Tue Dec 16 19:52:17 1997:

My understanding of the bad CPU is that it is the secondary CPU which works
and the primary one which does not work.  So this board is a valid backup for
the second CPU, but we would still need a backup primary CPU.


#10 of 37 by janc on Wed Dec 17 20:32:35 1997:

Right.  My memory of the bad SM100 was that the first CPU on it was bad.  The
4/600 boots up on the first CPU, and only starts using the second one later.
I think that means that we could not boot with the bad SM100 card as the only
CPU card.  However, the card may be repairable.  It could be something as
simple as a bad trace on the PC board.  Maybe someone who understands such
things can look at it.

I think Birdsall's Sun Hardware Reference says that SM41 and SM51 modules can
only be used with Solaris 2.something.  That means that to use those, we'd
have to upgrade from SunOS to Solaris.  This may be something we want to do
someday, but it won't be simple.

So I think we should stick with SM100's for now.

Steve Weiss recently ordered and installed another 16 4M memory chips for the
4/670.  It is now running on 128Meg of memory.


#11 of 37 by kaplan on Thu Dec 18 14:09:35 1997:

I assume we bought a license to run SunOS4 when we bought the 
hardware.  Technical questions aside, how much would it cost to license 
a copy of Solaris 2.x if we were thinking about upgrading?


#12 of 37 by ajax on Sat Dec 20 04:01:37 1997:

You can run SunOS 4.1.3 on SM41s and SM51s, but you need
boot PROMs of a certain version (2.8v2 for SM41s and 2.10
for SM51s), and I'm not positive if you can run multiple
processors with those CPUs.  But having the same processor
module would be best for trouble-shooting, too.  I'll keep
a lookout for SM100s.

By the way, this is a hardly-related tangent other than
being hardware Grex might want, but the latest Corporate
Systems Center catalog (from which Grex got some 2 gig
HP drives) lists an 8.5 gig Micropolis for $300...5.25"
FH SCSI2, new w/warranty.  Seems like a decent deal.


#13 of 37 by scg on Sat Dec 20 07:35:04 1997:

Is somebody else covering Micropolis's warranties now that Micropolis is going
out of business, or is the warranty still as worthless as it was when
Micropolis was in business?  (of the two Micropolis disks that Grex sent back
for warranty replacement, neither was ever seen again)


#14 of 37 by ajax on Sat Dec 20 13:16:16 1997:

Didn't know they were going under...that would explain the
decent price!  The catalog lists a one year warranty, but
not by whom.  You could ask CSC by phone (408.743.8770) or
check their web site (www.corpsys.com) for an e-mail address.


#15 of 37 by valerie on Sat Dec 20 14:05:43 1997:

This response has been erased.



#16 of 37 by tsty on Sat Dec 27 23:00:21 1997:

however, it sounds as if grex has already experienced two not-good
disk deals from micropolis - beware of the third?


#17 of 37 by lilmo on Sat Jan 3 20:09:26 1998:

Have we had other disks from them for which we had no problems?


#18 of 37 by valerie on Thu Jan 8 04:24:13 1998:

This response has been erased.



#19 of 37 by dang on Thu Jan 8 22:18:19 1998:

Both of our current 2 gig disks are HP's.  We have a 1.5 or so gig disk that
I don't know about.


#20 of 37 by lilmo on Tue Jan 13 02:20:49 1998:

Unless we get *really* good deals, such that using a backup is still
economical, then I don't mind patronizing Micropolis again.  Otherwise, let's
not bother with them.


#21 of 37 by djf on Sun Mar 8 22:25:13 1998:

I'm not sure if this is necessarily a good place to volunteer a
donation of hardware, but I didn't see any better place after some
cursory poking around.

I have a not quite two year old 3.5" half-height 1.2G Quantum Fireball
single-ended narrow SCSI-2 drive which I'm not actively using now that
I've upgraded to bigger drives.  It's been run only in a well
ventilated drive enclosure and is in top shape as far as I can tell.
This unit has a three year warranty which expires in June of 1999.

Someone involved in Grex admin just drop me a note if interested in
this drive.


#22 of 37 by janc on Wed Mar 11 17:12:39 1998:

I've replied to this in E-mail - yes, we'd be very interested.  We have
been planning to set up a mail machine and need a drive for it.


#23 of 37 by valerie on Wed Mar 11 20:47:03 1998:

This response has been erased.



#24 of 37 by djf on Thu Mar 12 00:29:48 1998:

You're quite welcome.

I have used M-Net and Grex off and on since around 1985, though mostly
"off" these days.  The Altos was the first UNIX machine I ever used.
It's what prompted me to buy my first UNIX box, a firesale AT&T 3B1,
in 1987.  After having made my living doing UNIX and IP network admin
stuff for several years now I'm glad to be able to give something
back.  In 1985 I'm sure I would have seriously doubted that I'd ever
own a 1.2G drive, much less be giving one away.  It's a testament to
how far things have come since then.


#25 of 37 by arthurp on Tue Mar 17 04:24:37 1998:

I've been away for some time now, but I'll pick up this thread.  I have 
the motherboard and CPU for this system, and would love to put it in a 
case for grex.  I don't have a case for it, or I would have put it 
together already and given it.


#26 of 37 by scg on Tue Mar 17 05:33:25 1998:

We have a case sitting in the Pumpkin.  Let me know when you want to pick it
up (late evenings are probably best).


#27 of 37 by janc on Wed Mar 18 17:23:18 1998:

Or give me a call, my schedule is a bit more flexible than Steve's.  I also
have David's hard disk.


#28 of 37 by arthurp on Fri Mar 20 04:16:20 1998:

Late evenings are fine.  After this weekend I should be free.


#29 of 37 by jared on Mon Mar 23 23:44:11 1998:

Back to parts for the 4/670, since we're up on it, might i suggest
maxing out the number of cpu's?  i seem to recall steve (login: steve)
saying a spare cpu was $30, why not spend $60, and have 4 cpus, and
do it that way.  That should increase the speed of the non io-bound
operations on the system.  I'm going to do some stats as i find
time to see if the system is slow because of io operations or
cpu, etc..


#30 of 37 by dang on Tue Mar 24 04:51:10 1998:

I think the argument was that, since SunOS isn't multithreaded, more CPU's
wouldn't help much.  We do have another one we could possibly put in and test
with.


#31 of 37 by janc on Tue Mar 24 05:14:29 1998:

Well, we should buy another CPU card with two more CPU's as a spare.  Once
we have it, we should try running on both to see if it helps.  Rumor says it
doesn't and can even hurt.  I've also seen some suggestions that SunOS won't
even work with more than two CPUs.  But certainly we should give it a try.


#32 of 37 by mdw on Tue Mar 24 06:25:43 1998:

The SunOS kernel isn't multi-threaded, so only one process can be
executing kernel code at a time.  Grex, like most typical Unix systems,
spends about 50% of its time in kernel mode.  This means Grex can make
good use of 2 processors, because (on average) one processor can be in
the kernel while the other one is executing user code.  In theory, with
3 or more processors, the fact that only one processor can be executing
kernel code becomes the limiting factor, and performance should not be
significantly better than with 2 processors.  It could be worse if there
is any significant penalty for extra processors.
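
The argument above can be put into numbers with a very simplified model.
This is a sketch under stated assumptions, not a measurement of Grex: it
assumes user code runs in parallel on all processors, that only one
processor can be in the kernel at a time, and that extra processors add no
overhead.  The function name is made up for illustration.

```python
def max_speedup(n_cpus, kernel_fraction):
    """Upper bound on throughput relative to one CPU.

    Each unit of work needs `kernel_fraction` of serialized kernel time,
    so throughput can exceed neither n_cpus nor 1 / kernel_fraction.
    """
    if kernel_fraction <= 0:
        return float(n_cpus)
    return min(float(n_cpus), 1.0 / kernel_fraction)


# With roughly 50% of time in kernel mode, the bound stops growing at 2 CPUs.
for n in (1, 2, 3, 4):
    print(n, max_speedup(n, 0.5))
```

Under these assumptions the bound is 2.0 for two, three, or four processors,
which is the "limiting factor" described above.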

Obviously, in SunOS, the MP support is pretty primitive.  It is possible
to fix this, basically by completely rewriting the kernel.  Sun has done
this, and the result is called Solaris 2.  Solaris 2 can, in fact, make
good use of many processors, however, for a uniprocessor system, the
resulting MP overhead actually hurts performance.

Rather than trying to deploy a single large MP system, however, there is
another very different architectural approach, and that is to implement
a real distributed environment.  Basically, this means deploying a
number of smaller loosely coupled machines, and splitting up the stuff
that you see here on grex, between these multiple machines.  That might
mean, for instance, a few file servers, an authentication server, a mail
server, several login machines, and several terminal servers.  This
architecture has some important advantages over a monolithic large MP
server; for instance, if a machine breaks, it's easier to take it out of
service and replace it.  This architecture also scales better (up to
thousands of online users at once), and is in some ways much more robust
(something that clogs up the mail server with tons of mail may barely
affect anything else.)

Another important advantage (for grex) is that new MP servers are
extremely expensive, and used MP servers are not likely to be at all
common; while the "small server" distributed environment can make good
use of used workstations, which are extremely common.


#33 of 37 by jared on Wed Mar 25 04:20:18 1998:

If you run a modern version of Solaris (2.5, 2.5.1, 2.6) it's nice and
fast.  I run those at various locations, including on machines slower than
grex, and it works well.  An upgrade to a multithreaded OS is
something to consider.  As for the security part, it's even
simpler:

Most security issues are with X software, or fancy software that does
fun stuff.  This software is not needed on grex.

The kernel hacks for restricting network access are required for Solaris,
i have nothing to do with those, and can't recommend a good fix for that.

Remove all suid bits, except on well known, trusted software, such as
top, etc..  Nether.net is a very secure system that way :)

Grex can also cut some of the time it spends in kernel mode (it's
actually blocking for IO) by purchasing 7200RPM disks instead of the
slow disks it has, and using those for swap instead.  Using 3600RPM
disks for swap is not a great thing for performance.


#34 of 37 by mdw on Wed Mar 25 23:54:15 1998:

Faster disks will make very little difference with kernel overhead.  The
kernel overhead for a scsi disk of any speed will be virtually
identical, consisting of the time to set up the control blocks, pass
them to the controller, and later, to respond to the interrupt from the
controller when the transfer is done.  The speed of the disk is
completely irrelevant for these 2 activities; they'll take almost
exactly the same amount of CPU no matter how fast (or slow) the disk is.
7200RPM drives will have lower rotational latency, which will certainly
make things more spiffy, but may also have higher transfer rates, which
(if the controller is capable of it), could actually "hurt" CPU
performance by eating more memory bandwidth.  (The "hurt" disappears
when you consider overall system performance; it takes the same # of
memory cycles to transfer disk data to memory, regardless of whether
it's all squashed together or spread out a bit.) 7200RPM drives would
certainly still help performance, but finding a disk with fast seek
times is probably more important.
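
To put rough numbers on the rotational latency point: on average the head
waits half a revolution for the sector to come around.  The sketch below
just does that arithmetic; the function names are made up, and any seek
time passed in is the caller's own assumption, not a figure for Grex's
disks.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency: half of one revolution, in milliseconds."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def avg_access_time_ms(seek_ms, rpm):
    """Average seek time plus average rotational latency, in milliseconds."""
    return seek_ms + avg_rotational_latency_ms(rpm)


for rpm in (3600, 7200):
    print(rpm, round(avg_rotational_latency_ms(rpm), 2))
```

That works out to about 8.33 ms at 3600 RPM and about 4.17 ms at 7200 RPM,
so the seek time can easily be the larger term in the total.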

This is very different from the PC world, where with IDE, a faster disk
might well result in improved CPU utilization.  IDE disks don't do DMA,
but instead, the CPU transfers the data from the disk.  I've seen one
report that on some newer machines, an IDE drive can eat about 50% of
the CPU, vs. about 10% for a similar SCSI disk.  A faster spinning IDE
drive might well be able to manage faster transfer rates, and thus take
less CPU.


#35 of 37 by jared on Thu Mar 26 00:11:16 1998:

I'm talking about 7200rpm scsi disks.  I'm not aware of one of those
that has a seek time above 9ms.  The current disks we have are much
higher in their seek times, and it takes longer to get back because
of the slow spin.  You'd be shocked to notice the speed increase
you'll get in having a disk 2x as fast in rpm, and half the seek time,
which is what you'd get.

I'm not sure why you stuck a useless note in there about IDE, as
that is irrelevant to the way grex operates, as it's not on a PC.


#36 of 37 by srw on Thu Mar 26 04:50:01 1998:

You can get a good idea of how much a faster disk will help throughput 
on Grex by looking at the queue lengths as reported by vmstat. 
Try vmstat 5 and let it run for a minute or two.

Because the "r" queue lengths completely dominate the "b" numbers, I am 
not convinced that a faster disk will speed things up very much. More 
CPU might even help more than disk, even under SunOS 4.1.4, though the 
need for more CPUs is not as great for performance as it is for backup.

Grex effectively uses all the time while waiting for disk IO to complete 
by running other processes. If we did have a disk with higher IO rates, 
though, swap would be the first choice for where to put it, simply to 
make sure that there were always enough processes in memory to keep the 
CPUs busy. It seems there are, though, in our 128M of Ram.
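
For anyone who would rather not eyeball the columns, here is a small sketch
in Python that averages the "r" and "b" queue lengths from captured vmstat
output.  The sample text is invented for illustration (it is not real Grex
data), the function name is made up, and the code assumes "r" and "b" are
the first two columns of each data line.

```python
SAMPLE = """\
 procs     memory              page               disk       faults     cpu
 r b w   avm   fre  re at  pi  po  fr  de  sr s0 s1 s2 s3  in  sy  cs us sy id
 5 0 0     0  1200   0  3   0   0   0   0   0  2  1  0  0 210 900 150 12 30 58
 7 1 0     0  1100   0  5   0   0   0   0   0  3  0  0  0 230 950 160 11 31 58
 6 0 0     0  1150   0  4   0   0   0   0   0  1  1  0  0 220 920 155 10 29 61
"""


def average_queues(vmstat_text):
    """Return (average r, average b) over the data lines of vmstat output."""
    r_total = b_total = samples = 0
    for line in vmstat_text.splitlines():
        fields = line.split()
        # Skip header lines: data lines start with two integers.
        if len(fields) < 2 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        r_total += int(fields[0])
        b_total += int(fields[1])
        samples += 1
    if samples == 0:
        raise ValueError("no vmstat data lines found")
    return r_total / samples, b_total / samples


avg_r, avg_b = average_queues(SAMPLE)
print("avg r:", avg_r, "avg b:", round(avg_b, 2))
```

If the average "r" is much larger than the average "b", processes are mostly
waiting for CPU rather than for disk, which is the case argued above.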


#37 of 37 by mdw on Fri Mar 27 03:32:02 1998:

The main reason I mentioned IDE is that you claimed 7200 RPM disks would
decrease kernel mode time.  That might be true for IDE; it's not likely
to be the case with SCSI.  Although this is certainly not an issue with
the sun-4, it *is* an issue with the proposed 486 based mail server
we've been talking about, or with other similar possible future servers.
Unix does not spend kernel CPU while waiting for an I/O completion
interrupt - it either schedules another process to run (almost always
true on grex), or it idles (likely case on a Unix workstation).

The system is currently managing about 11% user, 30% system, & 58% idle.
There does not seem to be much actual paging going on, so a faster
paging device may not make much difference in system performance.
However, the paging system does seem a bit active in terms of page
attaches and detaches, and that means we're a bit short on memory - so
more memory would certainly be helpful.

