Grex > Helpers > #140: Grex System Problems - Spring 2005

russ
response 100 of 457:
Apr 6 01:46 UTC 2005
Looks like we have edonkey abusers:
duggacone dugga cone *ft - Tue 21:28
duggacone dugga cone *ft - Tue 21:30
duggacone dugga cone *ft - Tue 21:31
duggacone dugga cone *ft - Tue 21:32
duggacone dugga cone *ft - Tue 21:33
duggacone dugga cone *ft - Tue 21:34
duggacone dugga cone *ft - Tue 21:37
duggacone dugga cone *ft - Tue 21:38

keesan
response 101 of 457:
Apr 6 13:53 UTC 2005
What was making top go up near about 70 a while back?

janc
response 102 of 457:
Apr 15 16:07 UTC 2005
I'm going to give an incomplete and partial report on why Grex was down so
long.

janc
response 103 of 457:
Apr 15 16:14 UTC 2005
That was odd. Try again.
I first heard Grex was down in an E-mail message from Krj saying that STeve
Andre was in the hospital. He'd been working on repairing Grex, but had to
be rushed off to the emergency room before he got very far. I'll leave
detailed reports on STeve's health to people who know a bit more about it,
but it was serious and he spent quite a bit of time in the hospital. I
presume he's home now, but haven't heard anything since Tuesday, when I think
he was still in the hospital.

mcnally
response 104 of 457:
Apr 15 16:26 UTC 2005
That's terrible about STeve. I hope he recovers quickly.

janc
response 105 of 457:
Apr 15 16:29 UTC 2005
OK, gate seems to be crashing a lot. Not good. But to continue...

It took a bit of time for anyone else to find out what STeve had done.
But he reported that the sd0 drive, which contains the system root,
/usr/local, /tmp, and /a (half the user directories), had died. He'd
backed up parts of it.

The disk actually still seems to be mostly readable. I haven't actually
seen any error messages from it, but haven't tried to use it very much.
I don't know exactly how sick it is.

The initial feeling was that we wouldn't be able to do anything until we
got a new disk. But on the Grex walk, John Remmers and I thought up an
alternate approach. We have some spare partitions designed to be used
to work on a new installation of OpenBSD while still running off the old
version. By using those spare partitions to replace the ones from the
dead disk, plus some space on the big (comparatively) slow IDE drive,
we should be able to bring up the system without getting a new disk.
Then in a couple of months, when the OpenBSD 3.7 release comes out, we
could rebuild with a new OS and a new disk at the same time.

This was supposed to be a speedy way to get Grex up faster. It didn't
entirely work out that way.

So on Saturday afternoon John and I went in and started rearranging things.
We restored most things off the mirror partitions on the IDE drive and
built a new root and a new /usr/local on the sd1 drive. We didn't
know how to make that new root partition bootable, so eventually we
went home to research it some more. We figured that out, and on
Monday (I think) John went back, ran the "installboot" we hadn't
known about, and created the /dev partition that we had forgotten
about (it's not mirrored, for obvious reasons). This got the system
to a point where we could boot it up, but we still needed to restore
/a.

I think STeve restored /a on Tuesday. I think he was working on his
laptop from his hospital bed. By this time I was hopelessly buried in
my taxes and wasn't really paying attention anymore. (I filed tax returns
for six different entities this year.) I finished taxes yesterday and
got back to looking at Grex today. Nothing appears to have happened since
Tuesday. I really have no idea why not. Well, STeve was in the hospital,
I was buried in my taxes, and I presume other people had things to do too.

So this morning I did some rearranging of where /a was being temporarily
stored, fixed some problems with mail and with quotas, and turned
things back on again.

I hope that nothing was lost in the crash. There really shouldn't have
been.

I apologize for the excessively long downtime. It hit at a bad time
for everyone.

janc
response 106 of 457:
Apr 15 16:32 UTC 2005
Grex is probably a bit slower than it was. /a has been moved from the fast
SCSI disk to the less fast IDE disk. So has /tmp.
Everything is scattered over fewer disks. This means all disks are busier
and overall performance will be slower.
We have less swap space than we used to.
Shouldn't be too bad, but might be enough to be noticeable at times.
We do intend to replace the dead disk and get back to something closer to our
old configuration.

mcnally
response 107 of 457:
Apr 15 16:39 UTC 2005
Permissions on /tmp are messed up. This prevents both pine and vi from
functioning properly.

mcnally
response 108 of 457:
Apr 15 16:40 UTC 2005
grex% ls -ld /tmp
drwxr-xr-x 2 root wheel 512 Apr 15 11:57 /tmp/
As you can see, /tmp is currently 755.
It *should* be 1777.
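Why 1777 and not 755: with 755 and root as owner, only root can create files in /tmp, so any program that writes scratch files there (pine and vi among them) would be expected to fail for ordinary users. In 1777 the trailing 777 lets everyone read, write, and search the directory, and the leading 1 is the sticky bit, which means a file in /tmp can only be deleted or renamed by its owner, the directory's owner, or root. On the system itself the fix is simply `chmod 1777 /tmp` run as root. The minimal Python sketch below (the function names are hypothetical, not part of any Grex tool) only illustrates the notation: it translates an `ls -l` style mode string such as the one above into its numeric mode, so 755 and 1777 can be compared directly.

```python
import stat

# (bit, letter) pairs in the order `ls -l` prints the nine permission characters.
_PERM_BITS = [
    (stat.S_IRUSR, "r"), (stat.S_IWUSR, "w"), (stat.S_IXUSR, "x"),
    (stat.S_IRGRP, "r"), (stat.S_IWGRP, "w"), (stat.S_IXGRP, "x"),
    (stat.S_IROTH, "r"), (stat.S_IWOTH, "w"), (stat.S_IXOTH, "x"),
]

# Special bits share an execute slot: lowercase means special bit plus execute,
# uppercase means the special bit without execute.
_SPECIAL = {
    2: ("s", "S", stat.S_ISUID, stat.S_IXUSR),  # setuid, user execute slot
    5: ("s", "S", stat.S_ISGID, stat.S_IXGRP),  # setgid, group execute slot
    8: ("t", "T", stat.S_ISVTX, stat.S_IXOTH),  # sticky, other execute slot
}


def ls_mode_to_octal(mode_str: str) -> int:
    """Turn an ls-style mode such as 'drwxrwxrwt' into its numeric mode bits."""
    perms = mode_str[1:10]  # skip the file-type character ('d' for a directory)
    if len(perms) != 9:
        raise ValueError(f"expected 10 mode characters, got {mode_str!r}")
    bits = 0
    for i, ((flag, letter), ch) in enumerate(zip(_PERM_BITS, perms)):
        if ch == letter:
            bits |= flag
        elif i in _SPECIAL:
            lower, upper, special, exec_bit = _SPECIAL[i]
            if ch == lower:
                bits |= special | exec_bit
            elif ch == upper:
                bits |= special
    return bits


def tmp_mode_ok(mode_str: str) -> bool:
    """True if the mode is the conventional /tmp setting, 1777 (drwxrwxrwt)."""
    return ls_mode_to_octal(mode_str) == 0o1777


if __name__ == "__main__":
    for mode in ("drwxr-xr-x", "drwxrwxrwt"):
        print(mode, oct(ls_mode_to_octal(mode)), tmp_mode_ok(mode))
```

Running it prints `drwxr-xr-x 0o755 False` for the broken /tmp shown above and `drwxrwxrwt 0o1777 True` for a correctly set one; the `t` in the last position is how `ls` displays the sticky bit.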

rcurl
response 109 of 457:
Apr 15 17:02 UTC 2005
Could not delete mail on pine (which I presume is the result of the above?).

mary
response 110 of 457:
Apr 15 17:27 UTC 2005
Thank you Jan, STeve and John.

rcurl
response 111 of 457:
Apr 15 17:47 UTC 2005
picospan has just stopped echoing commands at the OK: prompt.

naftee
response 112 of 457:
Apr 15 18:04 UTC 2005
whoa !
Does this mean everyone who was on /a is now on /c ?!

jep
response 113 of 457:
Apr 15 18:18 UTC 2005
Thanks for all of the work you did to get Grex back up and running!

remmers
response 114 of 457:
Apr 15 19:08 UTC 2005
Re #112: No, it just means that /a is now on a good disk instead of the
one that died.

tod
response 115 of 457:
Apr 15 19:09 UTC 2005
Did /a join a motorcycle gang and get piercings or something?

gelinas
response 116 of 457:
Apr 15 19:12 UTC 2005
No, it doesn't, naftee. /a and /c are still on different disks. It's just
that /a is not on the disk it was on.

scholar
response 117 of 457:
Apr 15 19:53 UTC 2005
I'D LIKE TO PERSONALLY WELCOME GREX BACK TO THE INTERNET.
WE MISSED YOU, BUDDY.

naftee
response 118 of 457:
Apr 15 22:11 UTC 2005
ahh, thanks sirs.

tsty
response 119 of 457:
Apr 16 01:38 UTC 2005
much appreciated efforts and successes for all involved - thank you.

slynne
response 120 of 457:
Apr 16 03:10 UTC 2005
Thank goodness we have the that we do!

aruba
response 121 of 457:
Apr 16 04:51 UTC 2005
Thanks to everyone who worked on getting Grex back up.

naftee
response 122 of 457:
Apr 16 05:08 UTC 2005
thanks, tsty! thanks, slynne ! thanks ,aruba !

scott
response 123 of 457:
Apr 16 05:14 UTC 2005
Thanks, guys! Working from a hospital bed is serious dedication.

naftee
response 124 of 457:
Apr 16 17:42 UTC 2005
thanks, scott !