Author Message
25 new of 176 responses total.
tod
response 35 of 176: Mark Unseen   Nov 25 21:14 UTC 2005

I also noticed some items from the parenting cf have come up missing.
Can you restore them, STeve?
naftee
response 36 of 176: Mark Unseen   Nov 25 23:10 UTC 2005

yeah, steVE.  it's bugging us.
cross
response 37 of 176: Mark Unseen   Nov 26 03:11 UTC 2005

Badda bing!
tsty
response 38 of 176: Mark Unseen   Nov 29 04:36 UTC 2005

grex did not go out of its way to destroy any data. borg/staff went out
of its way to favor its phony  invincibility - not an unknown arrogance
inside consensus-only leaderless organizations which eschew 'outsiders'
of any stripe. 
  
grex has, sometimes, 'taken in' 'strays' but only until something can
be trumped up into a 'scandal' and then ... poof! 
  
different thoughts are prohibited inside the inner navel-gaze.
  
it's sadly predictable but no one would listen - invincible arrogance or
something similar. maybe bad history teachers helped. maybe no
systems analysts/engineers permitted
  
it's not STeve .. it's borg.
  

tsty
response 39 of 176: Mark Unseen   Nov 29 04:37 UTC 2005

hey, cross, how ya doin?
janc
response 40 of 176: Mark Unseen   Nov 29 15:08 UTC 2005

I think it is ridiculous to say that Grex lost the mail partition
because of some great organizational fault.  Sure Grex has
organizational issues, and I believe some of them contributed directly
to the excessively long down time for the upgrade, but I don't think
that the mail partition problem was particularly caused by this.

The backups were performed by STeve Andre and John Remmers.  Probably
STeve was typing and talking about what he was doing, and John was
looking over his shoulder (this is the mode they were working in when I
dropped by later).  This is a pretty good way to do things like this,
because the second person has a good chance of catching the first
person's errors.  In this case the error was subtle - I think they had
both forgotten that /var/mail was unmounted because it was done some
time before.  When they tar'ed up /var, they got a huge file.  STeve did
a partial listing of its contents to see if the right sort of thing was
in it, but he didn't look far enough.  He saved a copy on Grex's IDE
disk, and uploaded another to his laptop.  There was supposed to be an
additional safety net - the mirror drive.  Unfortunately, my mirror
scripts weren't smart enough to not mirror an unmounted drive and nobody
remembered to turn them off, so we lost the mirror copy of /var/mail too.

So we had several safety nets in place, and all of them failed.  It's
pathetic and unfortunate, but it's not an organizational failure, and
it's not a failure based on a phoney sense of invincibility.  Like all
computer professionals, we are well aware of our capacity to screw up
and take precautions to protect ourselves.  But sometimes the
precautions fail.  That's life.
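
[Editor's illustration, not part of the original post.] The partial listing described above missed the gap because it stopped before reaching where the mail would have been. A minimal sketch of a check that looks for a specific subtree in a tar file, assuming Python's standard tarfile module; the function name and the example paths are hypothetical and are not taken from anything staff actually ran:

```python
import tarfile


def archive_has_entries_under(archive, prefix):
    """Return True if any member of an open tar archive lives below prefix.

    An empty directory entry for the prefix itself (which is what an
    unmounted mount point leaves behind) does not count.
    """
    wanted = prefix.strip("/") + "/"
    for member in archive:
        name = member.name
        if name.startswith("./"):
            name = name[2:]
        name = name.lstrip("/")
        if name.startswith(wanted):
            return True
    return False
```

With something like this, a backup of /var could be opened with tarfile.open() and rejected if archive_has_entries_under(tf, "/var/mail") comes back False, regardless of how large the file is.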
tod
response 41 of 176: Mark Unseen   Nov 29 16:49 UTC 2005

 I think it is ridiculous to say that Grex lost the mail partition
 because of some great organizational fault.
Let's step back for a second.  There was a time when it was considered polite
to notify users of intended downtime due to upgrades.  That was so users could
ensure they have their precious data moved offline if they felt the need. 
It was also so they'd know not to plan on being online at that time.
Organizational fault is written all over the last "upgrade."  The Board is
slacking off by letting a bunch of part time hobbyhorse types take the system
offline without forewarning.  The board has a fiduciary responsibility to the
members to keep the system around and available.  No accountability in this
organization, imo.  My recommendation is that the board seek some fresh blood
for the staff and also take some lessons in diplomacy and accountability.
No need to chant "volunteer organization" at me, neither.  That's a dead horse
not worth beating and everyone is tired of that excuse.
keesan
response 42 of 176: Mark Unseen   Nov 29 17:25 UTC 2005

Todd, why don't you start your own bbs and do it right?  Our volunteers are
not perfect, they admit to this, they accept suggestions, and they don't need
more complaints.  A polite request for a few days' notice next time grex is
going down would accomplish more than #41.   Are you looking for someone to
volunteer as full-time staff?
nharmon
response 43 of 176: Mark Unseen   Nov 29 17:31 UTC 2005

The function of the staff should be to advise the BoD on technical
issues, implement the decisions made by the BoD, and intervene on their
own initiative in some circumstances.  I say should be, because with the
exception of a few board motions, there is not a lot that defines what
staff's duties, responsibilities, requirements, etc. are.

Ideally, there should be one person appointed by the BoD responsible for
supervising the staff. This person would be accountable to the BoD, and
the other staffers accountable to him/her.

Right now it seems there is a hash of staff members who do a good job of
working together but without any real guidance or direction. I think
finding someone with the time and drive to give direction is something
Grex needs desperately.
nharmon
response 44 of 176: Mark Unseen   Nov 29 17:32 UTC 2005

I think it is a shame that someone takes the time to voice their
recommendations on how to improve Grex only to be told to leave and start
their own BBS if they don't like how Grex is run.

That sort of attitude is exactly what ruins good organizations like this.
tod
response 45 of 176: Mark Unseen   Nov 29 18:10 UTC 2005

#42 of 44: by Sindi Keesan (keesan) on Tue, Nov 29, 2005 (12:25):
 Todd, why don't you start your own bbs and do it right?  
I'm a member of Cyberspace, Inc.  I like this BBS (when it's online.)

 Our volunteers are not perfect, they admit to this, they accept suggestions,
 and they don't need more complaints.  
How do we provide a community service without community feedback?

 A polite request for a few days notice next time grex is
 going down would accomplish more than #41.
I would request politely if I thought the Board would listen.  I suspect
that the Board is 2nd fiddle to janc and STeve's whims, though.  
Therefore, I'm using a harsher tone in hopes of a constructive response
and action.

   Are you looking for someone to volunteer as full-time staff?
I think the Board seriously should be.  The downtimes of the recent past
have been at the full mercy of staffers with Grex nowhere near the
top of their priorities.  I don't fault them for it but I do fault the
Board for not seeking additional available and willing staff.
Retaining staff should also include a bit more diplomacy in the
way it treats existing volunteers and members.

#43 of 44: by Nathan Harmon (nharmon) on Tue, Nov 29, 2005 (12:31):
 Ideally, there should be one person appointed by the BoD responsible for
 supervising the staff. This person would be accountable to the BoD, and
 the other staffers accountable to him/her.
I agree with you, Nathan.  I'd also interject that the BoD is
ultimately responsible.  As members, we should not be told to shut up
when we ask the Board why no one is being accountable for Grex.
steve
response 46 of 176: Mark Unseen   Nov 29 18:25 UTC 2005

   Grex has never operated with a chief staff person.  It's always been
more of a collective thing.  It's worked out at least as well as
workplaces I've been at which had an official structure.

   Tod, you know as well as I do that there isn't going to be a
full-time staff person.

mcnally
response 47 of 176: Mark Unseen   Nov 29 18:39 UTC 2005

>  I think it is ridiculous to say that Grex lost the mail partition
>  because of some great organizational fault.

I don't think that's ridiculous at all.

I think the mirroring scheme was set up by someone other than the person
who did the repartitioning, and the person doing the repartitioning didn't
fully understand the implications of the backup scheme, namely that
unmounted partitions aren't backed up.
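
[Editor's illustration, not part of the original post.] One way a mirror or backup job can guard against this is to check each mount point before copying and refuse to touch any that are not mounted. A minimal sketch, assuming Python and os.path.ismount; the function name is hypothetical and this is not how Grex's actual mirror scripts were written:

```python
import os


def partitions_safe_to_mirror(mount_points, is_mounted=os.path.ismount):
    """Split mount points into (mounted, skipped) lists.

    Only the mounted ones should be mirrored. Mirroring an unmounted
    mount point copies an empty directory over the good mirror copy.
    """
    mounted, skipped = [], []
    for path in mount_points:
        if is_mounted(path):
            mounted.append(path)
        else:
            skipped.append(path)
    return mounted, skipped
```

The is_mounted parameter exists only so the check can be exercised without real partitions. A job using this would log or mail the skipped list rather than silently carrying on.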

Furthermore I think that there are organizational issues that led to
the mail disaster in other ways, too.  I've refrained from commenting
because I haven't had a good idea how to separate criticism of the
upgrade from criticism of the people who performed the upgrade, but I
personally think it was a very bad idea to upgrade and restore in place.
If we had a spare SCSI disk (or perhaps set of disks) [which we should
have anyway, for disaster recovery] the entire upgrade could have been
performed without ever risking the data on the disk(s) the system had
been running on.  As I understand it Grex has got a not excessive, but
still reasonable amount of money in the bank.  Perhaps we should
invest in preventing exactly this sort of behavior the next time around.

And while I would never suggest that anyone jettisoned the mail on
purpose, I suspect a contributing factor in the mail loss is that
none of the people involved depend on the mail system here in any
way that's truly important to them.  They shouldn't *have to* to
administer the system but it does tend to focus one's attention
when you've got something to lose.
tod
response 48 of 176: Mark Unseen   Nov 29 18:42 UTC 2005

I agree to disagree with STeve about a full-time staff person.  The Board can
at least make an attempt to find such person(s) with flexibility in
availability and accountability.  A staff of several persons with dedicated
timeslots would be ideal but needs to happen by someone taking that task as
the lead.  "We never did it before" and "isn't going to be" are empty excuses,
imo.  Why is improving Grex uptime and maintenance so painful a concept?
steve
response 49 of 176: Mark Unseen   Nov 29 18:46 UTC 2005

   And how do we pay for this person?
steve
response 50 of 176: Mark Unseen   Nov 29 18:50 UTC 2005

   Well Mike, I would have *liked* to have had spare disks for the
upgrade.  Here at work I keep entire spare machines such that an 
upgrade is done on the next machine, with data transfers done onto
the new machine, and a switchover of IP addresses.  I've had as
little as 4 seconds of downtime for such upgrades.

   But I did not succeed in getting the board to move on getting
the PC Weasel because of costs.  Yes, I thought of asking for
money for at least one more 36G scsi disk, but I didn't want to
go through that, dealing about money again.

   I lost mail of mine too, Mike, so I felt the pain as well...
mcnally
response 51 of 176: Mark Unseen   Nov 29 19:19 UTC 2005

> But I did not succeed in getting the board to move on getting
> the PC Weasel because of costs.  yes, I thought of asking for
> money for at least one more 36G scsi disk, but I didn't want to
> go through that, dealing about money again.

Right.  Which supports my counter-argument against Jan's statement
and suggests that maybe the recent trouble points to some organizational
problems that we can remedy before the next time we reach a crisis.  
tod
response 52 of 176: Mark Unseen   Nov 29 19:23 UTC 2005

re #49
    And how do we pay for this person?
We can pay them double what you're getting from Grex.  ;)
steve
response 53 of 176: Mark Unseen   Nov 29 19:24 UTC 2005

   I'm not so sure.  The management of Grex has always been prudent
about financial things, to keep the system healthy. Perhaps one
could say there is too much of that at times, but that's life.  No
organization is perfect.  Overall I think Grex does things pretty
well, and, in spite of my thoughts on things, I'd rather have
this organization than even a lot of "professional" organizations
in terms of how they work.  This is not to say that we couldn't
stand improvement, just that we're less screwed up than most
business places, from my point of view.
steve
response 54 of 176: Mark Unseen   Nov 29 20:12 UTC 2005

   Re #40:  It was *my* fault that we lost the mail partition, and mine
alone.
nharmon
response 55 of 176: Mark Unseen   Nov 29 20:31 UTC 2005

Thank you for not displacing blame and taking responsibility Steve. I'm
sure we all know you didn't do it deliberately and will probably not
make the same mistake again. Thank you for being honest.
mcnally
response 56 of 176: Mark Unseen   Nov 29 20:32 UTC 2005

 re #54:  Your error was the proximate cause, but that doesn't mean that
 there weren't contributing issues that we should address in anticipation
 of future mistakes -- anytime people are involved mistakes are inevitable
 but through proper planning and procedures you can make a huge difference
 in the outcomes.

 I think you're seeing this as a discussion about blame, which is probably
 not an unreasonable way to look at it from your standpoint, especially as
 there are still a lot of people who want to discuss blame.  I'd much rather
 try to figure out how best to keep it from happening again, which requires
 some degree of understanding what happened and why, but is a question to
 which blame is pretty much irrelevant.
steve
response 57 of 176: Mark Unseen   Nov 29 20:38 UTC 2005

   No, I'm not looking at it from a blame standpoint, just the truth.

   I do however fully agree with you that we need to be able to look at
things and do some things better.  That's a good thing to do.
naftee
response 58 of 176: Mark Unseen   Nov 29 22:04 UTC 2005

hey tod.  how do you say 'fiduciary' in Romanian ?
bhoward
response 59 of 176: Mark Unseen   Nov 29 22:28 UTC 2005

Re#50: It should be noted for the record, as of the last board
meeting, we have reconfirmed that the cost of a PC Weasel is being
covered by an anonymous donation.

Has the order been placed?

Once this and other post-mortem discussions have run their course,
I suggest we (staff) should take the points made, write up a summary
of how we'll go about the next one, making certain the process
described will address the shortcomings identified in the last one.
 

- Backtalk version 1.3.30 - Copyright 1996-2006, Jan Wolter and Steve Weiss