25 new of 281 responses total.
remmers
response 247 of 281: Aug 31 11:37 UTC 2005

Re Step 5:  Typing "n" instead of "/reboot" will also skip to the next
reboot entry.

Also, if you just want to see a list of recent reboots with other login
information filtered out, you can use the Unix 'grep' utility.  Type
this at the shell prompt:

 last|grep '^reboot '|more

Using 'awk', you can get a list of reboots, with each reboot followed by
a list of who was logged in at the time of the immediately preceding crash:

 last|awk '{if (/^reboot /) print $0; else if (/- crash/) print " "$1}'
 |more

(Backtalk wrapped the preceding command; it should be typed all on one
line.)
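
A minimal sketch for trying the awk filter above without reading real login records. The sample lines, user names, host names, and the two function names are made up for illustration, and the exact column layout of 'last' output varies between systems; the only things the filter relies on are a line starting with "reboot " and the text "- crash".

```shell
#!/bin/sh
# Hypothetical sample of 'last' output, newest entries first.
sample_last_output() {
cat <<'EOF'
dave      ttyp3    host4.example.org  Wed Aug 31 10:05   still logged in
reboot    ~                           Wed Aug 31 09:45
alice     ttyp1    host1.example.org  Wed Aug 31 09:10 - crash  (00:35)
bob       ttyp2    host2.example.org  Wed Aug 31 09:02 - 09:20  (00:18)
reboot    ~                           Tue Aug 30 23:15
carol     ttyp0    host3.example.org  Tue Aug 30 22:01 - crash  (01:12)
EOF
}

# Same awk program as in the post: print reboot lines whole, and for
# sessions that ended in a crash print just the login name, indented.
filter_reboots() {
    awk '{if (/^reboot /) print $0; else if (/- crash/) print " "$1}'
}

sample_last_output | filter_reboots
```

With this sample, each reboot line should be followed by the indented names of the users whose sessions ended in the preceding crash (alice, then carol), while dave and bob are dropped. On a real system, replace sample_last_output with last.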

These reboots are not planned.  For a few days now, Grex has been
crashing a couple of times a day, resulting in downtime of 20 minutes or
so while it reboots itself.  At this point, cause unknown.  Usually the
reboot is successful; when it's not, somebody (usually somebody at our
colo, and on some occasions me) has to push the reset button manually.

I realize the sporadic outages are annoying.  Hopefully we'll get the
problem resolved soon.
keesan
response 248 of 281: Aug 31 14:28 UTC 2005

I was logged on twice this week when it happened, I think.  Lucky me.
I have been emailing gelinas each time - is this appropriate?  Should I email
colo instead?  Or phone them?
remmers
response 249 of 281: Aug 31 14:46 UTC 2005

As a practical matter, I'm online often enough that most of the time I
notice that Grex is down sooner than another staff member is likely to
notice or to check their email.  So for this particular problem, I don't
think emailing someone speeds up the process of getting Grex back up
when it doesn't successfully reboot itself.

You shouldn't contact the colo directly.  They are just hosting our
server and don't maintain it.  They are willing to do something simple,
like power-cycle it or hit the reset button, but for security reasons
only on the direct request of a Grex staff member who is known to them.
albaugh
response 250 of 281: Aug 31 17:51 UTC 2005

Is it known yet whether the reboots are hardware-initiated,
software-initiated, or both?
remmers
response 251 of 281: Sep 1 12:54 UTC 2005

Not known to me.

No reboots in two days.  (cross fingers)
albaugh
response 252 of 281: Sep 1 16:36 UTC 2005

As opposed to "finger cross".  ;-)
cross
response 253 of 281: Sep 1 20:32 UTC 2005

This response has been erased.

mcnally
response 254 of 281: Sep 2 01:15 UTC 2005

 Think pretty highly of yourself, don't you?
cross
response 255 of 281: Sep 2 17:03 UTC 2005

This response has been erased.

tod
response 256 of 281: Sep 2 17:04 UTC 2005

Would you settle for an MRE?
cross
response 257 of 281: Sep 2 17:13 UTC 2005

This response has been erased.

happyboy
response 258 of 281: Sep 2 17:16 UTC 2005

/send dan a big bucket of popeye's wings and a soady-pop
drew
response 259 of 281: Sep 3 17:47 UTC 2005

Now it's refusing to let me enter stuff direct-dialed using vi. Just got a
"nasty error message" or something when I tried to enter a response.
richard
response 260 of 281: Sep 13 18:54 UTC 2005

grex is back! thanks to staff for what sounds like a lot of work to 
repair the labor day attack.

what exactly happened that caused this mess anyway?
aruba
response 261 of 281: Sep 14 14:16 UTC 2005

Thanks to the staff member(s) who got Grex back up.  Could we hear the
story?
eprom
response 262 of 281: Sep 14 16:28 UTC 2005

The response time was outrageous!  We need some accountability here.
People need to be fired or demoted and a contingency plan should be
drafted up just in case this happens again!
remmers
response 263 of 281: Sep 14 16:35 UTC 2005

The staff member who got Grex back up was me, aided by Jan Wolter's
life-saving mirroring software and some helpful advice in email from
Marcus Watts.  I'm only sorry that I wasn't able to devote much
attention to it sooner, due to other commitments last week.

What happened:  Some files in the /etc disk partition (in particular,
the password file) became corrupt, for reasons unknown to me but
probably due to a software glitch (don't know if it was OS software or
application software, either).  I made a trip to our colo and was able
to run some tests and verify that the disks and filesystems were
healthy, but didn't have time to investigate further.  On a subsequent
trip, I booted into single user mode and took some time to look around
the filesystem, eventually discovering that the password file (and
possibly others) had been corrupted.

Grex's important file systems (system directories, user directories,
bbs) are backed up to a spare hard drive every few hours, thanks to some
mirroring software that Jan Wolter wrote.  Because of this, I was able
to restore "good" versions of the files in /etc from the state they were
in about 4 hours before the crash.  Thankfully, that's all it took to
get Grex to boot successfully.  The most that was lost was whatever new
accounts were created via newuser in that 4-hour period, I think.

Diagnosis of the cause of the problem will have to be left to someone
who knows more about OpenBSD than I do.  Until the cause is addressed,
the problem may well recur.  If it does, at least we know where to look
now, and Grex should be up a lot sooner.  I'm sorry that it all took so
long this time.

edina
response 264 of 281: Sep 14 16:43 UTC 2005

John, thank you for your assistance.  It is appreciated.
jiffer
response 265 of 281: Sep 14 17:01 UTC 2005

I say thanks to all the staff for spending their PRECIOUS time to help restore
grex. So, if you want to complain that it wasn't up faster, get the knowledge,
skill and volunteer to do it.
twenex
response 266 of 281: Sep 14 20:25 UTC 2005

Re: #264, #265. Hear, hear!
naftee
response 267 of 281: Sep 14 20:42 UTC 2005

Har, har !
rcurl
response 268 of 281: Sep 15 05:09 UTC 2005

Re #265: while I agree that the staff are to be thanked heartily for their
efforts in maintaining Grex, I think it is unreasonable to expect everyone
to become equally skilled before they can complain. After all, the members
of Grex that do not have the skills to do what staff does are still donating
the funds required for staff to do what they do. I think some thanks are
due for even just that - and that members do gain some license to complain
thereby. In addition, it would be a huge waste of time and money for
*everyone* using Grex to become equally skilled as staff, as then how could
all that talent possibly be used simultaneously? Isn't there a suitable
maximum to the number of staff required to adequately service Grex?
nharmon
response 269 of 281: Sep 15 12:10 UTC 2005

The more talented staff there are, the better the chances that
someone will be available to work on the system at all hours of the day
and night.
bru
response 270 of 281: Sep 15 12:13 UTC 2005

How about grex spend a little cash on the machine, hire a tech to come in and
FIX whatever is actually wrong and stop it from crashing.
jep
response 271 of 281: Sep 15 14:07 UTC 2005

Minor thing, but isn't it time to remove the MOTD item that some 
loginids are missing but the staff is working to recover them?  It's 
been there for something like 6 months, if not longer.  The staff is 
not working on recovering them at this point, or at least so I prefer 
to believe.  That announcement is kind of painful to see, day after day.

- Backtalk version 1.3.30 - Copyright 1996-2006, Jan Wolter and Steve Weiss