I haven't logged into Grex in the last year or two, but I've been lurking on the staff mailing list. If I lost anything in this backup error, it wasn't anything I cared about. That said, I, too, am somewhat puzzled at the procedure that was followed.

I got my start doing Internet stuff as a member of the Grex staff more than ten years ago, so I remember the constraints we had to work under then. Grex ran on a rare piece of ancient Sun hardware, disks were really expensive, and none of it was any more reliable than most of the other stuff running on the Internet in those days. When something needed to be done, it often meant taking the system offline, sometimes for a full weekend in the case of a few major upgrades or disk crashes. We had a much bigger staff back then, and for many of us whose social lives revolved around the Grex community it was a pretty high priority, so when something needed to be done there were typically lots of people around to work on it.

I can certainly see how doing things as we did them then, but with a smaller staff that is less focused on Grex, would lead to long periods of downtime. But I'm puzzled about why I see the same methods being used on Grex now, when hardware is considerably cheaper and staff time appears to be a much scarcer resource.

My perspective is arguably a bit skewed. The non-profit where I'm now a paid full-time staff member is pretty impoverished, but still has a budget a couple of orders of magnitude higher than Grex's, and I tend to come at systems stuff as a manager rather than as a hands-on sysadmin these days. Still, it doesn't look to me like the problems that are being talked about here are difficult to solve. If I recall correctly, Grex is now running on PC hardware that's at least two or three years old. In other words, getting some equivalent systems should be cheap (or free, given that that's replacement age at a lot of places, and Grex is a 501(c)(3)).
Installing new software versions on new hardware, testing, and then copying over whatever is dynamic at the last minute, seems pretty obvious. Falling back to the old system at that point if something doesn't work is at most a matter of moving an Ethernet cable. Likewise, having spare systems ready to copy whatever is dynamic onto is a good way of dealing with hardware failures.

This really, I think, comes down to whether anybody still cares enough about Grex to make it worth dealing with. My own view is that the community I once cared about seems to have gone on to other things, and the services Grex is providing aren't anything special anymore. But if people care about keeping Grex operating, it looks like something needs to change.