In the process of making backups of Grex, I bungled it, such that when I thought I'd backed up the entire /var partition, a piece was missing, namely /var/mail. The /var/mail partition is a separate thing from /var, and when I did the backup for /var I thought /var/mail was mounted. Oops. Because of this we lost all the mail that was sitting there for people. Perhaps the least affected are the users who use Grex for mail every day--if you dealt with new mail on Friday or Saturday before we went down, then you probably didn't lose much. Sporadic users lost the most, sigh. I lost mail as well, so I'm afraid I know that bad feeling when one realizes that mail is lost. I'm sorry. I bungled that part of the backup.
176 responses total.
That's OK, steVE !
Speaking of mail, Did you send any libelous mail about users to anyone this week? A quote from dearly departed staff member Daniel Cross of the MARINES: "Steve wrongly sent an email to gmail claiming that polytarp did something he didn't actually do. He did not retract it even when it was demonstrated to him that he was wrong. In fact, as I recall, he argued that he wasn't wrong, despite clear evidence to the contrary."
Thanks for backing up Grex when you didn't announce it was going offline for a month.
You're welcome, Todd!
wasn't it only a week ? actually, i'm surprised how fast that week went past !
I actually got a lot of things done. THANK YOU!
Thanks, Jan!
(good reason not to rely on email provided by an organization with no paid employees)
Yes, my mistake. And it will be the last.
We have said all along not to keep anything on Grex that you cannot afford to lose. I don't even keep stuff on my own website without having it backed up on my home system (and that also gets backed up frequently).
That doesn't mean Grex should go out of its way to destroy people's data.
Why not? It would prevent complacency.
I doubt that grex went out of its way to delete anyone's data. They just screwed up. Hey, shit happens. That said, every guide to upgrading *I've* ever read says to back up to stable media first (i.e., a tape).
You think we went out of our way to destroy people's data? You're delusional. I agree about stable media -- we had (have) copies of the partition data in tar files that were stored in two places: Grex's /mirror on the IDE disk, and a travelstar disk on my laptop which I tested by ensuring the checksums of the tar files were the same in both places. If that isn't stable, what is? My blunder was that I did not check that /var/mail was mounted, a specific check for that. I won't make that mistake again.
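The specific check described above (confirming that /var/mail is mounted before archiving /var) can be sketched in a few lines of Python. This is only an illustration, not the procedure Grex staff used: the function names are invented here, and the list of required mount points is an assumption based on the two paths named in this item. Matching checksums on two copies show only that the copies agree, not that the archive is complete, so a check like this would have to run before tar does.

```python
import os

def unmounted(required, ismount=os.path.ismount):
    """Return the required mount points that are not currently mounted."""
    return [path for path in required if not ismount(path)]

def check_before_backup(required, ismount=os.path.ismount):
    """Refuse to continue if any required filesystem is missing.

    Without this, tar would quietly archive the empty directory that
    sits underneath an unmounted filesystem.
    """
    missing = unmounted(required, ismount)
    if missing:
        raise RuntimeError("not mounted: " + ", ".join(missing))

# Hypothetical use at the top of a backup script:
#   check_before_backup(["/var", "/var/mail"])
```

The `ismount` parameter exists only so the check can be exercised without touching a real system; by default it uses the standard `os.path.ismount`.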
Regarding #14, First paragraph; Who are you talking to? I said you *didn't* go out of your way to destroy people's data, just made a mistake. Were you referring to me, or polytarp in #11? Please specify. What I was really getting at is that something other than tar should've been used to do the backups. Like, e.g., dump.
I'd like to make it clear that I also didn't say Grex went out of its way to destroy data.
(What I'm trying to say here is that, even though people SHOULDN'T keep unbacked up data on Grex, fact is they DO and the staff ought to take that into consideration when they do things, which they obviously didn't do in this case.)
Come on Dan -- we all know that you didn't say that. I was responding to the eternal noise machine that infests Grex these days. In the case of the error I made, dump would have given the same results. dump is useful at times, but a tar file you can rip apart with vi to extract things. Not so with dump. Re #17: if we didn't take into account that people do have data here, why would we have even bothered to try at all? Really, your comment here is simply absurd. Grex staff has had a long history of *saving* data on bad disks, such that disaster was avoided. This is the worst data loss that Grex has ever had in its 14 year history.
And it's because of negligence. YOUR negligence.
You really are having fun, aren't you. Well, I can't expect anything else from you. You don't have the ability to be creative, constructive or helpful. You simply snarl at things. As pissed as I am about you, I feel more compassion in the end: you are a sad unhappy person. Please continue. It is all you can do.
Are there *any* restorable backup tapes of /var/mail? I had personal messages going back a long way in my spool file and some recovery would be better than none. I'm sure there are other people in the same boat.
There will be some backups on 8mm tape, which are pretty old. I'd say at least a year? I have the tape box. I'll look for the latest tape that mentions /var.
Regarding #18; Just clarifying. However, dump *wouldn't* have given the same result: dump works by interpreting the filesystem data on the raw disk devices itself, which means that it doesn't have to be mounted (in fact, it's somewhat better if it *isn't*). And you *can* rip apart a dump file to pull things out with, e.g., a text editor.
Right, but since /var is separate from /var/mail, dump wouldn't have included it. Given the choice of tar or dump for tearing apart, I'll take tar. It also has the advantage of working on Windows systems. I should have said "not reasonably" with dump. This gets more into philosophical areas. The problem was I made an error overlooking the partitions and I don't think my error would have been different with dump.
Presumably, you'd use dump on every filesystem on the system! That's the whole point! /var/mail was missed because it was unmounted when you ran tar. Dump doesn't care; you just tell it to dump a filesystem and it does it, regardless of whether that filesystem is online at the time. That's the big difference. With tar, it *has* to be online, with dump, it doesn't (and in some ways, it's better if it's not. Dump favors a quiescent filesystem). I'm not sure, in this case, why moving the data to a Windows system could have been useful, though I can see the portability of tar as being an asset in more general situations.
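The point above, that every filesystem gets backed up whether or not it is online, suggests taking the list of things to back up from the filesystem table rather than from memory or from whatever happens to be mounted. A minimal Python sketch of that idea follows. The device names in the sample are hypothetical, the set of filesystem types is an assumption, and the function only lists filesystems; it does not run dump or tar.

```python
def filesystems(fstab_text, fstypes=("ffs", "ufs", "ext4")):
    """Return (device, mount point) pairs for every on-disk filesystem
    declared in fstab-style text, whether or not it is mounted now."""
    entries = []
    for line in fstab_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 3 and fields[2] in fstypes:
            entries.append((fields[0], fields[1]))
    return entries

# Hypothetical table; these are not Grex's real device names.
SAMPLE = """
# device   mountpoint  type  options  dump  pass
/dev/sd0a  /           ffs   rw       1     1
/dev/sd0d  /var        ffs   rw       1     2
/dev/sd0e  /var/mail   ffs   rw       1     2
/dev/sd0b  none        swap  sw       0     0
"""

for device, mountpoint in filesystems(SAMPLE):
    print(device, mountpoint)
```

A backup script driven by this list would visit /var/mail as its own entry, so a forgotten unmount shows up as an explicit item instead of a silent gap.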
re #21 Thanks, Mike and STeve.
re 20 You're pissed at polytarp about him making fun of your mistake(s) ? I guess that is something to get mildly ticked off at, but oh well@!
Umm.. Steve.. stop responding to him/them.
Be sure to send that RAM in for the lifetime guarantee replacement, STeve.
I'm going to send these two in to Crucial for testing. Indeed they do have a good warranty. Never had to use it before. We'll see.
What two?! Me and naftee?!
I wouldn't bother. We know you're defective.
haha zing
Ah well. I had some personal mail that I would have liked to have kept but I guess it isn't the end of the world to lose it either.
I also noticed some items from the parenting cf have come up missing. Can you restore them, STeve?
yeah, steVE. it's bugging us.
Badda bing!
grex did not go out of its way to destroy any data. borg/staff went out of its way to favor its phony invincibility - not an unknown arrogance inside consensus-only leaderless organizations which eschew 'outsiders' of any stripe. grex has, sometimes, 'taken in' 'strays' but only until something can be trumped up into a 'scandal' and then ... poof! different thoughts are prohibited inside the inner navel-gaze. it's sadly predictable but no one would listen - invincible arrogance or something similar. maybe bad history teachers helped. maybe no systems analysts/engineers permitted. it's not STeve .. it's borg.
hey, cross, how ya doin?
I think it is ridiculous to say that Grex lost the mail partition because of some great organizational fault. Sure Grex has organizational issues, and I believe some of them contributed directly to the excessively long down time for the upgrade, but I don't think that the mail partition problem was particularly caused by this. The backups were performed by STeve Andre and John Remmers. Probably STeve was typing and talking about what he was doing, and John was looking over his shoulder (this is the mode they were working in when I dropped by later). This is a pretty good way to do things like this, because the second person has a good chance of catching the first person's errors. In this case the error was subtle - I think they had both forgotten that /var/mail was unmounted because it was done some time before. When they tar'ed up /var, they got a huge file. STeve did a partial listing of its contents to see if the right sort of thing was in it, but he didn't look far enough. He saved a copy on Grex's IDE disk, and uploaded another to his laptop. There was supposed to be an additional safety net - the mirror drive. Unfortunately, my mirror scripts weren't smart enough to not mirror an unmounted drive and nobody remembered to turn them off, so we lost the mirror copy of /var/mail too. So we had several safety nets in place, and all of them failed. It's pathetic and unfortunate, but it's not an organizational failure, and it's not a failure based on a phoney sense of invincibility. Like all computer professionals, we are well aware of our capacity to screw up and take precautions to protect ourselves. But sometimes the precautions fail. That's life.
> I think it is ridiculous to say that Grex lost the mail partition
> because of some great organizational fault.
Let's step back for a second. There was a time when it was considered polite to notify users of intended downtime due to upgrades. That was so users could ensure they have their precious data moved offline if they felt the need. It was also so they'd know not to plan on being online at that time. Organizational fault is written all over the last "upgrade." The Board is slacking off by letting a bunch of part time hobbyhorse types take the system offline without forewarning. The board has a fiduciary responsibility to the members to keep the system around and available. No accountability in this organization, imo. My recommendation is that the board seek some fresh blood for the staff and also take some lessons in diplomacy and accountability. No need to chant "volunteer organization" at me, neither. That's a dead horse not worth beating and everyone is tired of that excuse.
Todd, why don't you start your own bbs and do it right? Our volunteers are not perfect, they admit to this, they accept suggestions, and they don't need more complaints. A polite request for a few days notice next time grex is going down would accomplish more than #41. Are you looking for someone to volunteer as full-time staff?
The function of the staff should be to advise the BoD on technical issues, implement the decisions made by the BoD, and intervene on their own initiative in some circumstances. I say should be, because with the exception of a few board motions, there is not a lot that defines what staff's duties, responsibilities, requirements, etc. are. Ideally, there should be one person appointed by the BoD responsible for supervising the staff. This person would be accountable to the BoD, and the other staffers accountable to him/her. Right now it seems there is a hash of staff members who do a good job of working together but without any real guidance or direction. I think finding someone with the time and drive to give direction is something Grex needs desperately.
I think it is a shame that someone takes the time to voice their recommendations on how to improve Grex only to be told to leave and start their own BBS if they don't like how Grex is run. That sort of attitude is exactly what ruins good organizations like this.
#42 of 44: by Sindi Keesan (keesan) on Tue, Nov 29, 2005 (12:25):
> Todd, why don't you start your own bbs and do it right?
I'm a member of Cyberspace, Inc. I like this BBS (when it's online.)
> Our volunteers are not perfect, they admit to this, they accept
> suggestions, and they don't need more complaints.
How do we provide a community service without community feedback?
> A polite request for a few days notice next time grex is going down
> would accomplish more than #41.
I would request politely if I thought the Board would listen. I suspect that the Board is 2nd fiddle to janc and STeve's whims, though. Therefore, I'm using a harsher tone in hopes of a constructive response and action.
> Are you looking for someone to volunteer as full-time staff?
I think the Board seriously should be. The downtimes of the recent past have been at the full mercy of staffers with Grex nowhere near the top of their priorities. I don't fault them for it but I do fault the Board for not seeking additional available and willing staff. Retaining staff should also include a bit more diplomacy in the way it treats existing volunteers and members.
#43 of 44: by Nathan Harmon (nharmon) on Tue, Nov 29, 2005 (12:31):
> Ideally, there should be one person appointed by the BoD responsible
> for supervising the staff. This person would be accountable to the
> BoD, and the other staffers accountable to him/her.
I agree with you, Nathan. I'd also interject that the BoD is ultimately responsible. As members, we should not be told to shut up when we ask the Board why no one is being accountable for Grex.
Grex has never operated with a chief staff person. It's always been more of a collective thing. It's worked out at least as well as work places I've been at which had an official structure. Tod, you know as well as I do that there isn't going to be a full-time staff person.
> I think it is ridiculous to say that Grex lost the mail partition
> because of some great organizational fault.
I don't think that's ridiculous at all. I think the mirroring scheme was set up by someone other than the person who did the repartitioning, and the person doing the repartitioning didn't fully understand the implications of the backup scheme, namely that unmounted partitions aren't backed up. Furthermore I think that there are organizational issues that led to the mail disaster in other ways, too. I've refrained from commenting because I haven't had a good idea how to separate criticism of the upgrade from criticism of the people who performed the upgrade, but I personally think it was a very bad idea to upgrade and restore in place. If we had a spare SCSI disk (or perhaps set of disks) [which we should have anyway, for disaster recovery] the entire upgrade could have been performed without ever risking the data on the disk(s) the system had been running on. As I understand it Grex has got a not excessive, but still reasonable amount of money in the bank. Perhaps we should invest in preventing exactly this sort of behavior the next time around. And while I would never suggest that anyone jettisoned the mail on purpose, I suspect a contributing factor in the mail loss is that none of the people involved depend on the mail system here in any way that's truly important to them. They shouldn't *have to* to administer the system but it does tend to focus one's attention when you've got something to lose.
I agree to disagree with STeve about a full-time staff person. The Board can at least make an attempt to find such person(s) with flexibility in availability and accountability. A staff of several persons with dedicated timeslots would be ideal but needs to happen by someone taking that task as the lead. "We never did it before" and "isn't going to be" are empty excuses, imo. Why is improving Grex uptime and maintenance so painful a concept?
And how do we pay for this person?
Well Mike, I would have *liked* to have had spare disks for the upgrade. Here at work I keep entire spare machines such that an upgrade is done on the next machine, with data transfers done onto the new machine, and a switchover of IP addresses. I've had as little as 4 seconds of downtime for such upgrades. But I did not succeed in getting the board to move on getting the PC Weasel because of costs. yes, I thought of asking for money for at least one more 36G scsi disk, but I didn't want to go through that, dealing about money again. I lost mail of mine too, Mike, so I felt the pain as well...
> But I did not succeed in getting the board to move on getting
> the PC Weasel because of costs. yes, I thought of asking for
> money for at least one more 36G scsi disk, but I didn't want to
> go through that, dealing about money again.
Right. Which supports my counter-argument against Jan's statement and suggests that maybe the recent trouble points to some organizational problems that we can remedy before the next time we reach a crisis.
re #49
And how do we pay for this person?
We can pay them double what you're getting from Grex. ;)
I'm not so sure. The management of Grex has always been prudent about financial things, to keep the system healthy. Perhaps one could say there is too much of that at times, but that's life. No organization is perfect. Overall I think Grex does things pretty well, and, in spite of my thoughts on things, I'd rather have this organization than even a lot of "professional" organizations in terms of how they work. This is not to say that we couldn't stand improvement, just that we're less screwed up than most business places, from my point of view.
Re #40: It was *my* fault that we lost the mail partition, and mine alone.
Thank you for not displacing blame and taking responsibility Steve. I'm sure we all know you didn't do it deliberately and will probably not make the same mistake again. Thank you for being honest.
re #54: Your error was the proximate cause, but that doesn't mean that there weren't contributing issues that we should address in anticipation of future mistakes -- anytime people are involved mistakes are inevitable but through proper planning and procedures you can make a huge difference in the outcomes. I think you're seeing this as a discussion about blame, which is probably not an unreasonable way to look at it from your standpoint, especially as there are still a lot of people who want to discuss blame. I'd much rather try to figure out how best to keep it from happening again, which requires some degree of understanding what happened and why, but is a question to which blame is pretty much irrelevant.
No, I'm not looking at it from a blame standpoint, just the truth. I do however fully agree with you that we need to be able to look at things and do some things better. That's a good thing to do.
hey tod. how do you say 'fiduciary' in Romanian ?
Re#50: It should be noted for the record, as of the last board meeting, we have reconfirmed that the cost of a PC Weasel is being covered by an anonymous donation. Has the order been placed? Once this and other post-mortem discussions have run their course, I suggest we (staff) should take the points made, write up a summary of how we'll go about the next one, making certain the process described will address the shortcomings identified in the last one.
I have a suggestion. Documentation tends to be written and then squirreled away in any of a number of places where it may or may not ever be read or seen again. I propose that Grex operational documentation be kept in a single file or directory, and that the contents be tagged (XML, perhaps?) and the relevant scripts/programs modified so that those scripts and programs, when run, can access and echo to the screen of the calling user any information which they should have in mind before any actions are performed. Ideally, they would require an acknowledgement before continuing. The advantages are: easy updating of documentation (all in the same location); improvement of documentation (since it would constantly be appearing, chances are it would be written or rewritten to better communicate important information); and less time wasted either writing useless documentation or because of lack of documentation where and when it was needed. This is a fairly easy to implement suggestion, and will make it much easier to have new staff trained in the vagaries of Grex whenever there is new staff to train. This may represent a significant allocation of time, but for those who have already spent lots of time writing documentation, it shouldn't be hard to see why this is necessary. It should be prioritized as highly as any other staff responsibility including keeping the system running and secure, because it will make both of those goals easier and faster. Lastly, if anyone thinks they don't need to have stuff documented because either "everyone knows it" or "I'm the only one who does this and I know it," those are the persons who most need to be doing this.
By the way, this scheme easily allows for pointing to additional tools and documentation to supplement the echoed information. For that matter, both tools/scripts and documentation might be collected in a keyword searchable database (using the same tagged source documents) for anyone needing to know how to perform a certain function on the system. The more of this kind of thing that gets done, the less the system is dependent on a few individuals with highly specialized knowledge to do most of the things necessary to keep the system running properly.
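The suggestion in the two responses above (tagged documentation that scripts echo to the operator, with an acknowledgement required before continuing) can be sketched as follows. Everything specific here is an assumption made for illustration: the `<grexdoc>` and `<note>` element names, the `task` attribute, and the sample notes are invented and are not the contents of the real /grexdoc directory.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged documentation; the real notes would live in one
# shared file or directory, as the proposal describes.
DOCS = """
<grexdoc>
  <note task="backup">Confirm /var/mail is mounted before archiving /var.</note>
  <note task="backup">Turn off the mirror scripts before unmounting anything.</note>
  <note task="motd">Remove outage notices once they are out of date.</note>
</grexdoc>
"""

def notes_for(task, xml_text=DOCS):
    """Return the text of every note tagged for the given task."""
    root = ET.fromstring(xml_text)
    return [note.text.strip() for note in root.iter("note")
            if note.get("task") == task]

def confirm(task, ask=input, show=print):
    """Echo the notes for a task and require an explicit 'yes' to go on."""
    for note in notes_for(task):
        show("NOTE: " + note)
    return ask("Type 'yes' to continue: ").strip().lower() == "yes"

# A maintenance script might begin with (hypothetical usage):
#   if not confirm("backup"):
#       raise SystemExit("stopped: notes not acknowledged")
```

Because the notes are read from the same source every time, correcting one note corrects what every script shows, which is the updating advantage the proposal mentions.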
re #58 hey tod. how do you say 'fiduciary' in Romanian ? demn de incredere
We have a good start at documentation in the /grexdoc directory and in the staff conference. Both need more work, but we do have a good start for it.
Re #59: I haven't ordered the PC Weasel yet, but I will soon. Someone needs to find out from Provide Net what it's going to cost us per month to have a separate machine running, which is, as I understand it, what we will need in order to make the PC Weasel work. Tod said the board should be trying harder to get more staff. Well, I'm not on the board right now, but I think I speak for them when I say, they're open to suggestions.
I'll send mail to John A again about the cost.
Re #41: You keep saying that it would have been nice if notice had been given before the upgrade. It was in the motd for several days beforehand that the upgrade would happen that weekend if at all possible. The upgrade happening as soon as STeve got everything together and could get with John was discussed in at least one item for a couple of weeks before it was done. How much warning do you need, or did you expect a personal email?
The problem with messages in the motd is that people forget to change them when they get outdated so we tend to ignore them. If there were only relevant messages there I would read them. I don't really want to know that grex was down two weeks ago for a day.
Things like maintaining the MOTD are tasks to give to people who want to join staff as a way of seeing how they handle it. Start him/her off here, and then go up from there.
If not the motd, where? It was also discussed in at least one item here and in Agora. Short of sending email to every account on Grex what are we supposed to do? I manage to glance at the motd every time I log on enough to notice if something new is posted. It only takes a couple of seconds, it isn't that long and it has been rather up to date lately. If you choose to ignore it, that is more your problem than it is staff's. Yes, I agree that outdated things should be removed, but let's get real here.
What would be the impact of sending an e-mail message to every account on Grex? OR, better yet, what about an opt-in mailing list for people who would like to get system announcements.
Now *that* is a good idea, a mailing list for announcements of system work, downtime, etc. Excellent. The impact of staff sending out mail to every account on Grex would be 1) to take about 20 minutes of system pounding to deliver about 29,000 emails, 2) consume about 50M of /var/mail space, and 3) would likely generate a couple hundred emails back with 1/2 asking if this was real, and the other half asking about why and when the system would be back up, regardless of what we said in the mail. ;-)
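For a sense of scale, the estimate above works out to fairly small numbers per message. The short Python calculation below uses only the three figures given in the post; reading "50M" as 50 MiB is an assumption.

```python
ACCOUNTS = 29_000              # messages, one per account (figure from the post)
SPOOL_BYTES = 50 * 1024 ** 2   # "50M", read here as 50 MiB (an assumption)
MINUTES = 20                   # delivery time (figure from the post)

def bytes_per_message(spool_bytes=SPOOL_BYTES, accounts=ACCOUNTS):
    """Average spool space taken by one delivered announcement."""
    return spool_bytes / accounts

def messages_per_second(accounts=ACCOUNTS, minutes=MINUTES):
    """Sustained delivery rate needed to finish in the stated time."""
    return accounts / (minutes * 60)

print(round(bytes_per_message()))        # average bytes per announcement
print(round(messages_per_second(), 1))   # deliveries per second
```

That is roughly 1.8 KB per announcement and about 24 deliveries a second for twenty minutes, which is the load an opt-in list would avoid by reaching only the people who asked for it.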
;) GreX should provide an escort service
Re#67 Sindi, we can certainly remove motd messages more aggressively. Notices for things such as recent outages tend to stay in motd for at least a week to ensure that folks not regularly logging in or reading the conferences still will have some idea why the system may have recently crashed or otherwise been unavailable. A week's notice for major notices is a (hopefully) reasonable balance between those who log in daily and those who hit the system at least weekly (arguably it is a balance between those who only need to be told once and those for whom the message may not register until they've seen it several times).
Sorry Bruce but I don't think we should commit to that. Access to Grex's hardware is simply too limiting. Back when Grex was starting to crash every day some months ago, I wanted to get to Grex and do things for a week, every day, and simply couldn't get there in time to be able to do anything with the 10pm curfew we live under now. Yes, it's a *good thing* to give advance notice on shutdowns, I fully agree. But let's not lock ourselves to it.
Steve, I was referring to Sindi's complaint that motd has messages about past crashes and outages too long *after* the event. I made no comment as to how much warning there should be before there is an outage. I think our current routine of announcing system down time several days in advance for scheduled downtime, and best effort warning for anything else is sufficient. On a related note, I think we should commit to updating the hvcn page with current system status *before* commencing any system work, emergency or otherwise, that will keep grex down or unavailable for more than a few minutes.
Sigh. OK, upon rereading this I see what you mean. Don't type before coffee should be my mantra on these rare days when I'm up before 8am. Yes, putting an announcement on the hvcn page is something we need to do.
an operation as large as an os upgrade ought to have had a written checklist - and that checklist could have been discussed looong before hand in public, agora &/or coop. but then shoulda/coulda/woulda only has recrimination value after the fact. since this type of operation isn't about to happen too often, the disaster is just lurking around - AS ALWAYS - waiting for memories to fade . the previous upgrades were not as complicated, had much more notice, and were thought through with more precision - might even have had a written checklist handy!
re hvcn ... i went there for info but had to call mary and ask where it was on hvcn .. there was nothing (at that time) that would have led anyone to know where to click --- unless you already knew in advance and bookmarked it. re #74 -- what 10pm curfew??? i thought 24/7/365.25 was the deal? we got out of ken's warehouse for the same curfew problem, and now we are back into another curfew? guess i wasn't paying enough attention. btw, back on the checklist thought ... at least those with military training would have, by default, created their own instructions sheet. not that you have to have had military training to figger that out but it helps. and systems engineering 101, remedial, would have demanded a check list ... system analysis 099, non-remedial, would have had checklist provisions built-in to the course. hell, the repetitive event sequence of starting up an airplane is done by two people with a checklist! hell, back when *i* halted grex, by accident, i wanted that non-existent checklist to provide some thoughtful path. wasn't one. left grex 'as is'. nothing was damaged, nothing was lost except my staff responsibilities - backups (how ironic). at least one of the ppl (me) who foresaw precisely this disaster *sometime* in the future and volunteered to backstop it from being the disaster it now is, was 'offed' and dissed in the process. future borg and future staff *could* have an agenda item: monthly backup accomplished? checklist agenda item: yes/no first things first. secure your environment, what is contemporary, before wandering off into the unknown future with NO ROUTE HOME IN PLACE. it is only in the last few years that i no longer enter a new environment without already knowing THE OTHER WAY OUT, just in case. catastrophe theory (my masters subject) says, 'you can't get back from here.' therefore you prepare everything so that you will never *HAVE* to go back where you can't get to.
if you can't get back and you can't cover your ass, you fscking STAY PUT until another path is found/created to prevent exactly this sort of catastrophe. my explanation of that, simplified, back when i halted grex was apparently unintelligible to the recipient(s). or, it was forgotten over time, which is more of what i think, just damn forgotten. blithely erased from the cache of collected wisdom. not much cache in that account anymore, eh?
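The written checklist argued for above, with the rule of staying put when a step cannot be confirmed, can be expressed as a small Python sketch. It is only an illustration: the function name is invented, and the steps in the usage note are hypothetical examples drawn from this item, not an actual Grex procedure.

```python
def run_checklist(steps):
    """Run (description, check) pairs in order.

    Stops at the first check that fails and returns the steps completed
    so far plus the description of the failed step (or None if all passed).
    """
    done = []
    for description, check in steps:
        if not check():
            return done, description
        done.append(description)
    return done, None
```

A hypothetical upgrade checklist might list "every filesystem is mounted", "archive holds entries under var/mail", and "a second copy is stored off the machine", each paired with a function that returns True only when the condition actually holds. If `run_checklist` reports a failed step, the upgrade does not start.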
I'm not really sure how to respond to what you are saying. I don't think you understand the nature of the upgrade, in that there was no path back. The op system had problems with both the networking card and filesystem issues, and as an extra treat hardware problems. This all dances around the critical issue that no one should keep valuable mail on Grex only. Keeping valuable mail in the /var/spool area is even worse; it's an active filesystem, the most active one on Grex and as such is more prone to failures than anything else.
I don't think anyone can coherently respond to one of tsty's posts :(
(hah)
Re #79: You know, STeve, I just don't think that's a good enough answer. Yes Grex is run by volunteers, and yes people can't expect the same accountability from Grex that they can from someone they're paying to keep their data safe. But if Grex is to be anything people care about, then the board and staff have to themselves care enough to do their best for Grex. Frankly, I'm glad some people are really pissed about losing their mail. I wish more people were. If no one gave a damn, well, then Grex would really be nearing the end of its life as a viable community. I agree with tsty that backing up the system ought not to be an ad hoc thing that someone works out as he goes along. There ought to be a procedure in the GrexDoc which gets followed each time. If the system changes so that it gets done differently, then the changes ought to go into the documentation.
And you don't think I care about Grex? Good God Mark, I made a mistake. WHERE do you think that I a) don't feel badly about missing it, and b) that I don't care?
I didn't say that or mean it, STeve. But I don't think telling people that they shouldn't ever store anything that matters on Grex is a good enough answer. I believe Bruce when he says the mistake you made could have happened to anyone. *But I don't think it should have been you and John alone in the room doing the backup and upgrade*. I think there should be an institutional procedure in place for these things, so that the collective knowledge and experience of all the staff is brought to bear on the procedure. Grex shouldn't be dependent on one person's judgement. And actually, I thought there *was* a procedure in place, in the person of the GrexDoc. And I thought you promised the board you would follow the GrexDoc when you did the upgrade. Maybe the doc wasn't complete - I don't know if it covered the backup part of the upgrade. Did it?
I haven't logged into Grex in the last year or two, but I've been lurking on the staff mailing list. If I lost anything in this backup error, it wasn't anything I cared about. That said, I, too, am somewhat puzzled at the procedure that was followed. I got my start doing Internet stuff as a member of the Grex staff more than ten years ago, so I remember the constraints we had to work under then. Grex was a rare piece of ancient Sun hardware, disks were really expensive, and none of it was any more reliable than most of the other stuff running on the Internet in those days. When something needed to be done, it often meant taking the system offline, sometimes for a full weekend in the case of a few major upgrades or disk crashes. We had a much bigger staff back then, and for many of us whose social lives revolved around the Grex community it was a pretty high priority, so when something needed to be done there were typically lots of people around to work on it. I can certainly see how doing things as we did them then, but with a smaller and less focused on Grex staff, would lead to long periods of downtime. But I'm puzzled about why I see the same methods being used on Grex now, when hardware is considerably cheaper and staff time appears to be a much scarcer resource. My perspective is arguably a bit skewed. The non-profit where I'm now a paid full-time staff member is pretty impoverished, but still has a budget a couple of orders of magnitude higher than Grex's, and I tend to come at systems stuff as a manager rather than as a hands-on sysadmin these days. Still, it doesn't look to me like the problems that are being talked about here are difficult to solve. If I recall correctly, Grex is now running on PC hardware that's at least two or three years old. In other words, getting some equivalent systems should be cheap (or free, given that that's replacement age at a lot of places, and Grex is 501(c)3).
Installing new software versions on new hardware, testing, and then copying over whatever is dynamic at the last minute, seems pretty obvious. Falling back to the old system at that point if something doesn't work is at most a matter of moving an ethernet cable. Likewise, having spare systems ready to copy whatever is dynamic onto is a good way of dealing with hardware failures. This really, I think, comes down to whether anybody still cares enough about Grex to make it worth dealing with. My own view is that the community I once cared about seems to have gone on to other things, and the services Grex is providing aren't anything special anymore. But if people care about keeping Grex operating, it looks like something needs to change.
I'm upset about the mail debacle too, but unfortunately, anytime any constructive criticism is offered, it's somehow taken as a personal attack on that person. Back when I was in the AF, every section had a recall roster, and a binder with documentation of a basic contingency plan and checklist. I think that would be a good idea for grex. Too bad that would be construed as tying staff's hands or micromanaging. I also think the hierarchy of operation should change from every staff member operating as equals to someone volunteering as a main sys-admin, who is accountable and reports to the board. It seems that the current way of doing things is broken.
I think SCG (hi, Steve!) hits a few important points in his resp:85. In particular, I would like to stress the need to evolve our thinking from "This computer is Grex," to "This internet service is Grex, and these are the hardware components we have to support our service." Right now, *everything* is a single point of failure for Grex, and as we just learned, staff can't back out of an upgrade because the upgrade is done on top of the old disks. Amazon.com and LiveJournal don't go dark for a week while they do upgrades; they acquire the hardware they need so that upgrades can be rolled into production with a minimum of disruption. ---- Longer term: all of the community-building services Grex offers are now offered, for free, by large organizations with professional support staffs. The one thing which isn't common is the open access to a shell prompt; but that's also one thing which creates huge social/behavior management problems. It's also unclear to me if that's a core function of Cyberspace Communications as it was organized, rather than the tool towards the community-building goals which was available 14 years ago.
Amazon.com and LiveJournal have massive capital investments in hardware and engineers that Grex simply cannot and will not ever provide. Further, the financial impact of outages is different for Grex than it is for those two.
You're right that Grex doesn't have as much money to spend as Amazon or LiveJournal, though I don't think that point escaped anyone even before you explicitly stated it. A more salient point is that Grex has enough money in the bank to afford a backup disk. We just didn't plan to use it for that.
...Or the money could be spent on a colo that would give us 24/7 access to the machine, thus giving staff a larger window to recover from outages. You see, I think this is the sort of direction that some have been saying Grex lacks. We're not sure what takes precedence. Another suggestion: Grex has security goals; why not have overall system goals? Maybe even a mission statement? These goals could be put in order from most important to least important...they could be things like: "Maintain a conference system void of censorship", or "provide for limited dialup internet access in the Ann Arbor area", or "provide for user data integrity through fault-tolerant disk storage and regular backups". Then, when it came to making decisions on expending resources, everyone would be on the same page as to what problems took priority.
I know that because of his work hours and long commute, physical access during the day and the early evening is not feasible for STeve, but when 24/7 access is suggested nobody ever says who's hypothetically going to be fixing the system at 3 AM, so I'm not sure access hours are the real issue.
I care about Grex and M-Net (for different reasons). And I still think that anyone who uses either system with the expectation that their files are safe OR secure is a fool, and I don't have any sympathy for people who lost important email they had stored on grex.
Won't ric be surprised when he finds out I used my staff access to delete his home directory, conference participation files, and uid! Just kidding, of course, but if he thinks users shouldn't expect their e-mail to be safe from sudden disappearance I'm not sure what else on the system ought to be sacrosanct.
If most of the users agree with you mcnally, then that should be one of Grex's goals.
Mike in resp:91 :: before Grex left the Pumpkin, there were numerous times when I dropped Steve off there after we got back from work, and he worked on Grex for some hours in the very late evening or early morning.
I seem to remember a few times that you dropped him off at the Pumpkin when you got back into town, and picked him up there in the morning to go back to work. For those advocating having an equivalent system for doing upgrades and recoveries: where do we store it? The colo charges for space. If a staffer stores it we still have problems with access unless that staffer is the ONE doing the upgrade/recovery.
re 93 - I would be surprised, and I'd probably ask for your removal if you did that on purpose without good reason. But it wouldn't really bother me much. I'd just create a new account. I participate in two conferences - coop and agora. And I have used the forget statement on all but one item in agora. I don't have any files in my home directory that are important to me. The only thing that might upset me is if I was unable to get my username "ric" back, since I've pretty much been known as "ric" in the mnet/grex world since 1986. (Though I think there was a period of time in the mid 90s where someone else had that ID on Grex cuz I got reaped)
Even though we are just a small organization, there is nothing wrong with us doing the best we can in all situations. I also think that criticism is ok although I sometimes think that some people around here have trouble presenting their criticism in the best possible way. It is pretty easy to start feeling defensive about things. As for the email loss. It was a mistake. It can't be undone and that is that. No one did it out of malice. And even the most competent technical people make mistakes sometimes and email sometimes gets lost even at for-profit firms. As for what we can do to prevent such a loss in the future...Well, there are a lot of good ideas being presented here. I don't know what the answer is. Our finances aren't great and I know that there is a reluctance to spend a lot of money. However, exploring backup options is really something we should do.
I just found the info that I had saved in a recent email and it is actually nice not to have to go through all 200 or so old mails deciding if there was anything important in them, so I am actually grateful now, and pine starts up so much faster with an empty inbox. I wish spamassassin would work again.
Have you tried in the last few days? I reinstalled spamassassin and spamd a day or so ago.
I had been using a copy in someone else's account, because he said he updated it more often. I will switch to the grex version, thanks. I had gone back to my old filter, which is about 10 pages long and lets some things through.
re #79 ... excuse me! it would seem that *i* understand "the nature of the upgrade" one helluva lot better than either you or other staff or other borg! shit! "there was no path back" --- that is *precisely* the sysadmin situation for which i have been * t r a i n e d * !! whether it is an air defense missile system or a fscking os upgrade - the cover-your-ass attributes are identical. somewhere along the line i copied this: "Worse, a great deal of the delay was because we as staff really failed to work together effectively. We ran into deep differences in basic philosophy about how grex should be run that cost us extra days. Because we didn't all agree on what we were going to be doing before we started, our preparation for the rebuild was not complete. We ended up redoing significant portions of the job more than once." i don't know, at this moment, where it came from, but i did *not* write it. some rooty-tooty (not sTeve) did .. and borg & staff are intimately responsible for the fsck-up. mostly borg! 'in-place' .... WTF!
I believe that Jan said that.
StEve steVE sTeve STeVE
Quick! What's 5 choose 2? Answer: (5!)/((5-2)!(2)!) = (5!)/((3!)(2!)) =
(5*4)/2 = 5*2 = 10. Think of the permutations of the capitalization of
letters in Steve's name this way: Given a string of 5 characters, taken
from the Alphabet {0, 1}, how many ways may I write such a string with
exactly two 1's? Clearly there are 5 choose 2 such ways, and as we have
seen, that means 10 possibilities. Now, I take a 1 to mean a capital
letter and a 0 to be a lowercase letter and enumerate:
STeve = 11000
StEve = 10100
SteVe = 10010
StevE = 10001
sTEve = 01100
sTeVe = 01010
sTevE = 01001
stEVe = 00110
stEvE = 00101
steVE = 00011
These make up the set of permutations of Steve's name with his preferred
number of capitals (though his preferred choice is one specific element).
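For anyone who would rather check the count by machine than by hand, here is a minimal Python sketch of the enumeration above. The function name `capitalizations` is made up for illustration; it picks every pair of positions to capitalize, which should give the same ten strings in the same order as the list.

```python
from itertools import combinations
from math import comb


def capitalizations(name: str, n_caps: int) -> list[str]:
    """Return every way to capitalize exactly n_caps letters of name."""
    results = []
    # Each combination is one choice of positions that get a capital (a "1").
    for positions in combinations(range(len(name)), n_caps):
        chars = [
            c.upper() if i in positions else c.lower()
            for i, c in enumerate(name)
        ]
        results.append("".join(chars))
    return results


variants = capitalizations("steve", 2)
# The number of variants should match 5 choose 2.
assert len(variants) == comb(5, 2)
print(variants)
```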
His preferred choice came about by accident. When he first started using conferencing systems, he didn't release the shift key fast enough. He went to the National Computer Conference and while in a conversation someone asked him if he was the S T eve. He laughed and replied that he was and they had a great time talking. He decided that it was a good thing to keep and has used it, purposely, ever since.
I purposely pervert his choice by being a wiseguy and taking the other oddity
of his name ('), which is applied to his last name, and applying it to his
first name.
I didn't do this because I had a great time talking.
re 105 I prefer using my calculator to solve those types of problems, but really; i was just goofing around !@
that's also the number of handshakes in a party with 5 ppl, if everyone shakes hands with everyone else.
That's true. Think of each bit as being two people shaking hands.
I agree 100% that we shouldn't have rebuilt the system by overwriting the old disk partitions. One of the recommendations I made in my post-mortem item immediately after the new system came up was to never do that again. Alas, I did not make that recommendation before the rebuild - though that was certainly part of the upgrade method defined in Grexdoc - that's why the ALT partitions exist. But I'm really not an experienced system administrator anyway. I'm not sure that the need to avoid a destructive rebuild was as clear in my head before this fiasco as it was afterwards. Live and learn. In any case, I wasn't around to give any recommendations. Before the upgrade, John was really the only active staff member. He was doing the reboots. He was debugging grexdoc on another machine. He was reluctant to undertake the rebuild by himself though. My impression was that there was something of a panic at the board meeting. Grex was crashing regularly, and there wasn't much of a staff plan to do anything about it. STeve, a board member and a staff member, responded to the emergency by committing his next weekend to a Grex upgrade. I had been neglecting Grex so completely that I didn't even know about it until I talked to John and Mary on the Grex walk the morning before the upgrade. There was never really any staff meeting to discuss the upgrade. If there had been, we might have given it enough thought to realize that there were alternatives to doing a destructive rebuild. In fact, I think we have a spare (rebuilt) 18G drive lying around. I think with that we could have managed the rebuild without buying a new disk. But buying a disk would have made sense too. We rushed into the upgrade. It felt like Grex was in crisis. If we had held a staff meeting first, I'm not sure anyone except John would have shown up.
Tod said the board should be trying harder to get more staff. Well, I'm not on the board right now, but I think I speak for them when I say, they're open to suggestions. Could have fooled me. I see nothing but excuses being made and "we do/did ENOUGH already" Excuse me for asking for something more than an MOTD, decent backup, and effort to find staff with more availability. How dare me for making suggestions. Shame shame.
shame on you.
THanks Michael Moore!
thanks tod :L)(
re 100 - I'm not suggesting that staff doesn't do the best they possibly can to avoid email loss and other such things. I'm suggesting that we as users should not expect or demand anything more. The fact is, if this were a commercial organization, there would be daily tape backups, stored off site, our hardware would probably be more "enterprise" level and all sorts of such things - policies in place to prevent such occurrences, and paid employees whose PRIMARY responsibility is maintenance of the server(s). Grex is nobody's primary responsibility. I'm pretty sure it's nobody's secondary responsibility - at the very best, I would expect Grex to come somewhere after job and family.
Grex is nobody's primary responsibility. Grex is the fiduciary responsibility of all elected volunteer board members. If someone is not willing to be responsible for Grex's operation, they shouldn't be on the Cyberspace board of directors. It's that simple.
re #117: Are you seriously arguing that the board has an *obligation* to ensure that Grex is run at the same level of service and reliability as a commercial service? If not, what *does* your statement imply?
re #118 Obligation: "ensure that Grex is run"-ning for such purposes as "public education and scientific endeavor through interaction with computers, and humans via computers, using computer conferencing.." because "The Corporation assumes all liability to any person other than the Corporation or its members for all acts or omissions of a volunteer director incurred in good faith performance of their duty as an officer" I'm not saying people are going to get sued or that businesses will crumble as a result of downtime. What I am saying is that "good faith effort" should be a minimum goal of any director of Cyberspace Communications when assuring Grex stays online and maintained.
And who says that it isn't? You're talking about an obligation which is so vaguely defined that in legal terms, someone would have to be actively subverting the system or sabotaging it to be provably NOT complying with your demand. It is a volunteer organization, with a volunteer staff, and a volunteer board. As such, the reality is that it will get whatever benefits of goodwill it gets in terms of money and time, and that's it. You can't make it something it isn't, and something that it isn't is a service with the possibility of being held to the standard of performance of a commercial service provider with contractual obligations.
re #120 I'm not making demands. I'm simply reflecting on the current status of Cyberspace. A status that lacks some leadership in the management of Grex when it craps out. Is that so much to ask? These cries of pay-for-service levels are spin. We've had numerous outages and waited days on end before someone could get to Grex. And then when they did, it was ad-hoc, and files were lost. I'm simply looking for a lil assurance from the Board that somebody is in charge and that everyone knows who that is. Who is accountable next time Grex goes offline for a week? Answer that.
No one. It's all volunteer. But then, you're saying that's a problem (and so it is).
It is a problem but it isn't one I see an easy answer to. I am not going to demand that a volunteer give more time than they offer to give. I try to remember to let them know I appreciate their efforts but I am admittedly not the best at that. I do really appreciate all the volunteer time that goes into running this place though. And frankly, if someone with more energy than me were to step up to do a better job, I would gladly step out of their way to let them do it. So who is accountable the next time Grex goes offline for a week? I don't know. Whichever staff person steps up. We are pretty lucky that we have anyone at all really. Maybe next time no one will do anything and then the board will have to scramble to figure something out although I hope it never comes to that because I honestly don't have any idea what I would do in such a situation.
I refuse to believe that Cyberspace's elected directors can't do a better job with staffing Grex.
The organizations I volunteer with do not accept the excuse "I'm only a volunteer". If you ask me, that's a learned attitude in an organization.
lol do u volunteer for gay fags 4 america lol u probably do and ur excuse is im straigt lol
When it is necessary for your volunteers to bring with them a certain and specific skill set, and there are not large numbers of people from which to choose to fill volunteer positions, then you have to accept less commitment. That's just reality. You can't change it by wishing it away or declaiming it. The other thing is that just because Grex has been more stable in the past than it has been recently doesn't mean anyone was any more accountable then or that anything has changed in the organization. The only thing that is substantively different is that the machine is less accessible when it is convenient for those who can do something with it to do so, and those volunteers who are able to do something may be less available for whatever reason now than they may have been in the past. This too may pass. Bitching about the situation and blaming the existing volunteers for having lives and responsibilities other than Grex only serves to make those volunteers feel less like the efforts they do make are appreciated and that very likely has the natural consequence of making their efforts here a lower priority in their lives than other things they may find more rewarding. This has been a particularly wordy way of saying "There's really nothing that can be done about it, so get over it and stop potentially making it worse."
For some people here, Grex IS their life. Check it out - barely an hour or two can go by without their jumping in with commentary. They live here. I'm not surprised they have a hard time seeing that not everyone sets the same priorities. But I certainly wouldn't wish that level of involvement on anyone who wasn't being paid to do a job and then get on with real life.
Hey Tod, I'm going to walk across the street and ask the volunteer firemen what would happen if they showed up to fight fires whenever they wanted.
Also be sure to ask them what would happen if some person chewed them out for not going back into their burning house to save their photo album, and if a bunch of people joined in and started piling on about how poor the firefighting had been lately and how they really needed to commit themselves more "or else."
Or just do something simple and go in and ask "Who's in charge?" I haven't seen any "or else" demands. That's just more spin. Mary is right. Some of us take downtime a little more seriously. I appreciate Lynne's participation in this discussion because she is honest without throwing rocks. I ask who is in charge and she says "the first staff person to step up." That sounds logical. Every disaster's first incident responder is obviously the first person on the scene. Now, how incidents are handled after that are where things could probably improve. There needs to be a "go to" so when the system goes down, the rest of the Board knows who to call for a status..and then the members can ask any Board member available and get some sort of decent response. I'm not saying it has to be chinese fire drills and all corporate red tape but at least just some sort of formal person that shows up at board meetings to represent staff. If that's STeve or Remmers or whoever, great. I'd at least like to see the Board address it at their next meeting and come up with something.
I'm not trying to pick on staff too much, Mike. You guys really do a fantastic job.
No, actually, lately we don't, which is clearly a problem. I'm not trying to sugar-coat what happened or shut down criticism. What I would like to do, however, is promote a pragmatic view of the situation. We do have a problem, but we also have very limited resources with which to fix it. Arguing about what "should" happen is kind of pointless at this point unless it's something that also *could* happen. Until/unless a proposed solution is possible with the constraints we have to deal with it's kind of a waste of time to spend a lot of time arguing about it.
Regarding #128; Maybe you should encourage some of them to become staff. Oh, wait.... You know, something the board *could* do is advertise a position for a staff liaison person; something that someone could run for if they chose. Them taking that position would sort of make them the chief staffer, but also make them accountable. If circumstances in their life changed so that they couldn't handle it anymore, they could resign. Since they volunteered for that position, with the additional responsibilities it entails, there really shouldn't be much of a problem with asking them to do whatever extra it entails.
i'm proud to call GreX my home!
I'm seeing a lot of comments here about how things work in commercial environments, making it sound as if there's one way of doing things in such places. In fact, from what I've seen, there's a pretty wide spectrum. Commercial organizations have a wide variety of experience, budgets, resource constraints, contractual obligations, perceived levels of importance, and operational philosophies, even if they're providing services that may look quite similar from the outside. It seems non-useful for people to say, "commercial content providers do X, therefore Grex should too." It likewise seems non-useful to say, "Grex isn't a commercial organization, so it can't do what commercial organizations do." It's perhaps worth taking a look at change management procedures in some of the slowest changing but most stable network operators -- traditional phone companies. At the one whose web hosting division I worked in, nothing could be done without filling out lots of change management documentation: extensive documentation about the change procedure, including exact commands that would be entered, test procedures, backout plans, justification of why the change was needed, who was going to be involved, when it was going to happen, what the impacts were going to be and to which customers, and so forth. This all had to go through a committee, which might approve it a couple of weeks after it was submitted. It wasn't fun. Nobody did anything just because they thought it might make some small incremental improvement. Problems were often left alone until they became emergencies, because the bureaucracy involved in fixing them would become somewhat easier then. But at the same time, human error-caused outages became pretty rare. The committee that reviewed these things didn't really know how to do anything other than see if the questions had been answered, but answering the questions forced people to think through things carefully.
Adopting a very stripped down version of that protocol, asking people to answer a list of standard questions to their own satisfaction before diving into major changes, gets a lot of the same benefits and doesn't cost much. There are also the comments I've seen here about enterprise-class hardware that Grex can't afford. A lot of commercial sites also can't afford it, or decide it's not worth the cost. A lot of services which the Internet would be perceived as not working without -- some of the root and top level DNS infrastructure, Akamai caches, Google, etc. -- involve standard off the shelf hardware deployed in large enough numbers that if some piece of it breaks, end users won't notice in the few days it may take to fix it. What sort of hardware to use, how much of it, and how much support to provide in case it breaks, are interrelated decisions with costs associated, and different organizations come up with different answers. Managing volunteers is different than managing employees. Managing employees who are paid less than they could earn elsewhere is different than managing employees who are paid more than they could earn elsewhere. A general question to ask is, "are we getting more out of this person than we're paying them?" I've dealt with employees who have been hard to deal with, but who were occasionally doing things that were really important, and they've seemed worth keeping. I think I've even been such an employee at a few former jobs. At my current non-profit employer, I've "fired" volunteers who were taking more of my time to manage than it would have taken to do the work they were doing. At the same time, if somebody isn't doing anything, is known to not be doing anything, and isn't costing anything, telling them to go away probably isn't all that useful. Having volunteers who occasionally do something that wouldn't otherwise get done can be a very useful thing.
Telling anybody to go away before you're sure you want them gone can have some less than desirable consequences. On the other hand, having somebody be in charge, with at least the authority to tell volunteers what not to do, may have more positive impact than its cost in ruffled feathers.
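To make the "list of standard questions" idea from this response a bit more concrete, here is a rough Python sketch. It is only an illustration: the `ChangePlan` class and the `unanswered` helper are hypothetical names, and the fields are simply the items listed in the phone-company description above, not anything Grex staff actually uses.

```python
from dataclasses import dataclass, fields


@dataclass
class ChangePlan:
    """One free-text answer per standard question; empty means unanswered."""
    justification: str = ""    # why the change is needed
    exact_commands: str = ""   # the commands that will be entered
    test_procedure: str = ""   # how to tell whether it worked
    backout_plan: str = ""     # how to get back to the old system
    people_involved: str = ""  # who is doing the work
    scheduled_time: str = ""   # when it will happen
    user_impact: str = ""      # what users will notice, and for how long


def unanswered(plan: ChangePlan) -> list[str]:
    """Return the names of questions that still have no answer."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]
```

The point of a structure like this is not bureaucracy; a plan with `backout_plan` still listed by `unanswered` is simply a signal to stop and think before starting.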
Tod - Yes, the elected officers have a fiduciary responsibility to manage Grex. It's still not their primary responsibility. I'd feel sad for anyone who felt running Grex was the most important thing in their life. Aren't you on the arbornet board? Seems to me that it is YOUR fiduciary responsibility to have the annual meeting that was required by law which still has not occurred. But you see, Arbornet is not your primary responsibility, is it? It's not even your secondary responsibility. I bet your family and job come first. I bet there's a lot of things you consider more important than your obligations as a volunteer on the Arbornet Board of directors.
Please, that's just deflecting responsibility. Someone really does need to be "in charge" of Grex. Besides, arbornet not having its annual meeting isn't necessarily Todd's fault.
System downtime vs. annual meeting: shall we take a poll on order of importance? Governance has not been an issue for Arbornet, nor has accountability of staff and system maintenance. Let's talk about Grex since this is where we are.
a very Romanian response.
re 138 - people are "in charge" of grex. Where did I say they weren't? Nor did I say it was Todd's "fault" that Arbornet hasn't had its legally required annual meeting. It's the Arbornet Board of Directors' "fault". People are in charge of Grex. People are responsible for Grex. But those people have more important things in their life than Grex, and I don't blame anyone for that. I have a responsibility to my job because without it, I can't provide for my family. What is Steve's responsibility to Grex? He does these things as a volunteer, but you can be sure that his job and his family are more important to him than Grex. (Speak up, Steve, if I am wrong). That being said, if Grex is down for 3 days because Steve (or any other staff member) doesn't have time to fix it because of family and job obligations, I think it is ridiculous to criticize them for those decisions. And if a MISTAKE is made during the operation of Grex, what are you going to do, fire the staffer who made the mistake? I don't see a huge line of people volunteering to run these organizations. Most of M-Net's volunteers left for Grex or left the conferencing world entirely. It doesn't look like there's a ton of volunteers here on Grex either, so you take what you can get. The fact that either of these systems still exists is nothing short of amazing.
Being volunteers doesn't remove them from the responsibility to do quality work when they decide to use the powers over the system they're given. The whole backup thing was terribly poor work. Even the most novice, inexperienced of system administrators know how important backups are. The people involved in the mail mishap are apparently a gaggle of fools with FAKE pocket protectors.
I, for one, appreciate the volunteer efforts of anyone willing to do such jobs. And I realistically understand that these people are volunteers and have many other more important responsibilities in other areas of their life. I choose to not rely on systems operated by such people, and therefore, I've never lost anything important due to such issues. You may choose to rely on systems operated by volunteers. You may try to hold someone responsible for mistakes leading to loss of data or anything else that may arise from system downtime. You'd be a fool to do so and you probably won't get anywhere trying.
The loss of data wasn't caused by system downtime. It was caused by people not making proper backups. Even in a volunteer organization, there must be some work ethic.
What do you intend to do to force that? Have them all removed?
I don't have to be able to "force" something for it to be the right thing.
Such adamant defenses for complacency. I'm glad none of these folks work for larger non-profits.
And how do you suppose you could do better? Backups were made. A listing was made of the said backups to see that all the files were there; the listing reported that the mail directory and files were there, it just didn't say how big they were. Is the person doing the backups supposed to go in and look at all the 100s of thousands of files individually to make sure that the sizes are correct? When I do backups, I do listings to see that the major files exist. I usually don't unzip them and look at the size; with that many files there just isn't enough time to do so, especially when there are time limitations.
ric is like richard, except he types better
It's not particularly difficult to compare the size of files in an archive to the size of files in a directory, though the fact you think it is difficult speaks to your ignorance of Unix. It's also not particularly difficult to make sure the backup is done right in the first place.
Regarding #148; That's impossible. If Steve's account was accurate, none of the spool files would have shown up in the file listing. Regarding #141; Oh please. Call a spade a spade. No one is saying that people need to make grex the primary focus of their life. But someone needs to be accountable for it, and no one is. No one takes the responsibility for making sure grex is running. If they did, it wouldn't stay down for a week at a time. Now, I'm not saying people shouldn't make the decisions they do, just that grex needs to solicit someone to step up to the plate when no one else does. Of course, I expect I'll be flamed to pieces for challenging the status quo and not being an apologist. The grexists are a lot like the neocons when it comes to questioning things. They just don't like it when anyone challenges anything. Sad, really. And people wonder why grex isn't as popular as it once was.
re #148:
> And how do you suppose you could do better?
I've tried to refrain from criticizing STeve's mistake for a number of reasons -- (1) it doesn't get the deleted mail back, (2) I suspect he feels (or felt) bad enough, and (3) nobody else was stepping up to volunteer to get the job done and it's unfair how much of the responsibility has devolved onto STeve, but.. Your defense, while commendable from a family loyalty standpoint, is wholly misguided from a technical standpoint. A couple of really serious mistakes were made (chiefly, the backup was badly botched *and* the decision had been made to repartition in place.) The results turned out to be a minor disaster for many of us, and it's insulting to pretend that there was no way it could have been prevented.
> Backups were made.
As it turned out, some were, some weren't. That's the issue.
> Is the person doing the backups supposed to go in and look at all
> the 100s of thousands of files individually to make sure that the
> sizes are correct?
Actually, it's not that hard to write a program to do that, but even if you don't want to go to that much trouble one can get a pretty good idea by comparing the size taken up by the backup with the size taken up by the originals.
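As a rough illustration of the size comparison suggested in this response, here is a minimal Python sketch. It assumes the backup is a tar archive readable by Python's standard `tarfile` module; the helper names (`tree_size`, `archive_size`, `backup_looks_complete`) and the 1% tolerance are assumptions made up for the example, not anything staff actually ran. It treats an empty source tree as suspicious, since an unmounted mount point looks like an empty directory.

```python
import os
import tarfile


def tree_size(root: str) -> int:
    """Total bytes of regular files under root (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def archive_size(archive_path: str) -> int:
    """Total bytes of regular-file members recorded in a tar archive."""
    with tarfile.open(archive_path) as tar:
        return sum(m.size for m in tar.getmembers() if m.isfile())


def backup_looks_complete(root: str, archive_path: str,
                          tolerance: float = 0.01) -> bool:
    """Compare the size of the originals with the size held in the backup."""
    expected = tree_size(root)
    if expected == 0:
        # Assumption: an empty source is itself a warning sign
        # (for example, a mount point whose filesystem is not mounted).
        return False
    actual = archive_size(archive_path)
    return abs(expected - actual) <= tolerance * expected
```

This is a coarse check only. Hard-linked files are counted once in the archive but once per name on disk, and files that change during the backup will shift the totals, which is why the sketch allows a tolerance rather than demanding equality.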
re #152 Thanks, Mike. I didn't even want to go there but you present a pretty simple guideline for next time.
Actually, this would have been avoided had Steve used the dump program instead of tar to do the backups, as I suggested. Steve wrote something somewhere that I thought was funny that seemed to indicate he thought it wouldn't have made a difference; actually, it would. Dump doesn't go through the filesystem to get the data it backs up; rather, it looks at the filesystem data on the raw disk devices. Tar goes through the file system; hence it's sensitive to whether the disk was mounted at the time. A better way to do the backups would have been to use dump. But I really don't want to beat up on Steve about this. I've done the exact same thing myself (luckily, I only deleted the mail spool of one user, but he was still pretty pissed off). Hey, live and learn. My major concern is with grex as a whole, and the idea that no one really seems to be in charge, despite claims to the contrary.
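If a tree-walking tool like tar is used anyway, one cheap guard is to confirm that every separately mounted directory really is mounted before the backup starts. The following is a hedged Python sketch of that idea, not a description of Grex's procedure. The helper names `unmounted` and `check_before_backup` are hypothetical; the only library call relied on is `os.path.ismount`.

```python
import os


def unmounted(paths):
    """Return the paths that are NOT currently mount points."""
    return [p for p in paths if not os.path.ismount(p)]


def check_before_backup(paths):
    """Refuse to proceed if any expected mount point is missing."""
    missing = unmounted(paths)
    if missing:
        raise RuntimeError("not mounted: " + ", ".join(missing))
```

Calling `check_before_backup(["/var", "/var/mail"])` before running the archiver would stop with an error if /var/mail were only an empty directory on the /var filesystem; the two paths here are just the ones named in this discussion.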
Again, i'm not saying it could not have been prevented, and I'm not suggesting that people don't try to do better "next time". i'm just saying that we all know how Grex operates, and we should set our expectations accordingly.
If we all know how grex operates, and should set our expectations accordingly, then you *are* suggesting that people don't try to do better next time. You are, without a doubt, saying that the status quo is perfectly fine. I am not.
Dan, Don't you realize that most Grex folk get seasick if there is even the slightest boat rocking?
Oh, sorry. My bad.
To be quite honest, yes - the status quo works for me because I don't rely on Grex for anything. If my participation files get hosed, I'll get over it pretty quickly. I don't rely on Grex for email either because in my opinion, nobody should rely on email hosted by an organization with no employees and nobody whose primary job responsibility is maintaining that system. I haven't seen anything suggested here that would make things on Grex any better - other than simple acknowledgement of mistakes made, and some hope that lessons have been learned. I don't know what YOU got out of this "situation" but for me, it's just an affirmation that relying on grex for anything is foolish.
Well, I made a suggestion that grex solicit a staff member to be `in charge' in the case of a failure. Others suggested that a written plan be made prior to a major change (such as an upgrade). Both of those seem like suggestions that could make things better.
I asked for a "go to" from the Board. I think it makes sense to ask who we should contact for status updates when Grex horks. Ric doesn't care either way and that's fine for him. I don't see what his point is other than "this place is unreliable"
> To be quite honest, yes - the status quo works for me because I don't
> rely on Grex for anything.
How nice for you.. What does it have to do with the rest of us?
Actually, it's "This place is unreliable and if you lose anything important that is stored here it's your own damn fault"
(but you were close)
i take it back, I rely on Grex for this sad and pathetic form of social interaction that I've grown accustomed to over the last 20 years. (On grex it's more party than BBS.. on m-net it's more BBS than party). So I donate financially to Grex and M-Net in hopes that they will both remain "alive"... it's nice having both so that when one goes down (because they are both unreliable) I can go hang out on the other.
I don't like the status quo on grex either. Like other people, I wish there were lots of people with lots of energy running things. Instead, we have a pretty good board (in my opinion) with weak leadership. A lot of that has to do with the amount of time I am willing to put into grex. I don't know the solution to the issue except to say that I also wish things were different. I wish we had a lot more financial support. I wish that more people were willing to become voting members and participate more actively in grex. I wish there were so many great people running for board that I got voted out. Wishing for things to be different doesn't change anything.
Change the board of directors titles to Official Eor
what ever happened to that 24/7/365.25 access concept at provide? what are the hours/days now?
Provide Net never offered us all night access.
Not that we'd have someone on staff running over there sooner than 48 hours anyhow.
fwiw, i live about a spit and a holler from hewitt and mich ave. and if you notice i have a reasonable amount of free time at all sorts of odd times of the day/week. i've pissed off a whole buncha folks - including myself now and then - over the years but when shit needs to get done, i get it done. fwiw.
also, whilst thoughts of the above reverberate .... how is the 8mm tape recovery coming? mcnally and i and a *bunch* of us are curious, seriously interested, hopefully awaiting, etc.
now that grex has returned, maybe some action on 168-173 ??
How 'bout shutting off the idle daemon?
hmmmm, seems neither borg nor staff has interpreted 168-173 ... yet.??
TROGG IS DAVID BLAINE