Staff Meeting - February 11, 2007
Present:
Jan Wolter
Steve Weiss
Steve Andre
Joe Gelinas
Things to do, in no particular order:
Zoneinfo update.
Daylight saving time starts three weeks earlier. If we leave the old
zoneinfo files in place, Grex's clock will be an hour off for three weeks.
For new files, upload the latest version of zoneinfo from the latest
OpenBSD-current release and install. Need to run zic to compile
zoneinfo files, which may not be entirely simple.
CVS Server.
Grexdoc is currently maintained on a CVS server hosted by John
Remmers. John wants to shut down this machine soon. Steve Andre
will set up a new grexdoc server in his office.
CVS Checkin.
Jan should check in latest changes to grexdoc.
Spam Filtering.
We'd be interested in trying to set up global spam filtering on
Grex, using spamd. Plan would probably be to discard definite
spam. There is some doubt about whether Grex's machine would be
able to handle the load of running SpamAssassin on every bit of
incoming mail, but it seems worth a try.
Optional Incoming Mail.
Probably a more effective way to reduce the resource drain of
incoming mail would be to make it optional. New grex accounts
would not be emailable by default. There would be a command
which users could use to turn incoming email for their account
on or off.
The theory is that many grex users don't actually want to receive
mail on their grex accounts. If they all turned off mail, then
Grex would have much less mail to process.
Outbound Mail Approval Script.
Outbound mail is currently turned off for new accounts. There are
volunteers who would be willing to handle the process of approving
people who want outbound mail turned on. We need a tool that would
enable the mail-approvers to turn on mail for a user, basically by
adding their names to /usr/local/etc/outbound.
This needs to be implemented so that when an account is reaped and
a new account is created by the same name, the new account does
not inherit email approval from the old one.
Jan may do this, but won't mind if someone else does.
Mail to Staff.
Currently users who are not approved for offsite email can't send
mail to staff because staff is off-site. Need to fix this.
Outbound Mail Limit.
There was a plan approved by board to limit outbound mail to 50
messages per user per day. This could be done by exim script,
but possibly the outbound mail approval thing is easier to
implement and might suffice to solve the problem.
Outbound net access script.
Currently new accounts are created in group 1003, which does not
have outbound internet access. We want to be able to have a list
of non-root users move people into group 1002 so they can have
outbound internet access. Need a su-root script to do this.
Probably a task for Jan.
Fix rmuser.
The rmuser script dumps core, so we can't do reaps right now.
Reaping would be good.
Newuser changes
Dan Cross wants to make some fixes to newuser. Jan will
install these when they are available.
RAID and more Disk
There is some interest in buying a hardware RAID controller
and a number of large fast disks to use with it. This plan
would vastly increase disk space and reliability. Cost might
be around $1600, but STeve will make a better estimate.
OpenBSD 4.1 upgrade
OpenBSD 4.1 will be released later this year. We should
upgrade Grex to that.
Might be a good time to also do things like adding RAID disks.
118 responses total.
All the staff people at the meeting recognized that the biggest staff problem right now is a lack of active staff. Too many of the current staff have too much else on their plate, and many of us are not really very active. We really all want to do more, but we aren't promising much. Still, we thought it would be good to have a list of things that we think ought to be done, so that the consensus building part of any staff task is at least partly out of the way, and all we need to do is find someone to do them. The day after the meeting STeve noticed problems with one of the disks. We're going to need to replace that before it entirely fails, probably in a shorter time frame than we could possibly implement RAID in. Luckily we do have a spare disk.
Could grex reap mail accounts which have not been accessed for 3 months even if the user is still active? Not including accounts that are forwarding any mail that comes to them.
Regarding #1; Once again, I can do staff work if needed. Regarding #0; A couple of points. (a) OpenBSD upgrades are only supported through point releases. Right now, we're running OpenBSD 3.8 on grex. 4.0 is current right now. To do a supported upgrade, we really should upgrade to 3.9, then to 4.0, and then to 4.1.... It's a pain, but would be the best way to approach it. (b) With respect to CVS servers for grexdoc, I'd *really* rather see grexdoc hosted on grex and replicated to other machines offsite by CVSup or anoncvs or something similar. (c) What does rmuser do that userdel doesn't? Userdel is an OpenBSD tool; could we shoehorn it into the reap process? (d) Mail to staff could be handled by putting an intelligent agent midway in the process. Something like RT could be leveraged to good effect on grex and solve this problem at the same time. http://www.bestpractical.com/.
I was also at the staff meeting, as a staff member.
What's being done to recruit new staff?
Todd read my mind. Jan got about halfway there, then stopped.
Just curious where the 1600$ estimate came from. Are we planning to keep going with SCSI drives? I know a first order approximation based on one vendor's webpage was barely over half that price, and drive prices are slowly coming down. Or are we looking at a big-ass 8 drive array so we get mondo space and redundancy?
I've created an item in the garage conference for further discussion/research into a RAID storage solution for Grex.
re #6 I read "...lack of active staff..." "...not promising much..." "...all we need to do is find someone to do them..." So I'm kinda wondering if the Board is going to help or is staff being relied upon? Jan & Dan, you're both on the board. What say you about the recruiting efforts for new staff members?
I think I would classify myself as "inactive staff". I do not see my schedule opening up for large amounts of staff work. I still hope to be available for helping out with system upgrades and some light software development and bug fixes, but even that is going to be inconsistent. I recognize that we need regular staff. This includes especially people who are local to Ann Arbor and can do things like reboots and hands-on system work. I haven't the slightest idea how to recruit them. Meanwhile, I've done some checkin of grexdoc, and have implemented a command that would let a selected list of users enable outgoing mail for other users. I'm told some people volunteered for this, but I don't know who they were.
I think rmuser is really called zapuser. As usual, I've forgotten everything
about it, even though I wrote it.
I seem to recall that one of the points of it was that it was optimized for
removing large numbers of users in one fell swoop. Some of the alternative
tools I'd seen would rebuild the hashed password file after each user
deletion. When you are deleting 10,000 users at once, this is a very bad
idea. I don't know if userdel is smart about this or not. The manual page
seems to suggest that it takes only one user name as an argument, which
suggests that it probably would require 10,000 passes through the password
file, and 10,000 hash rebuilds (which take many minutes these days if my
recent experience with vipw can be trusted) to remove 10,000 users.
Looking through the source code, I see it also has a lot of other checks:
- won't delete users with uid's under 1000 (or whatever is configured)
- won't delete members or staff
- won't delete accounts unless home directory is in one of the usual
spots for grex home directories.
- won't delete accounts that are on the immortals list
The user directory deletion is done in a paranoid mode, su-ing to the user
before beginning the deletion. This is a final failsafe to avoid traps where
a user subdirectory is swapped with a symlink to /etc between the time
zapuser checks if it is a symlink or a directory and the time that zapuser
cd's into it. Of course, we have other safeguards against this too, like
checking if the inode number of the directory we cd'ed into matches the inode
number of the directory we thought we were going to cd into, but it never
hurts to be extra safe, and by running as the user we know we can't be fooled
into deleting anything we shouldn't.
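The inode comparison described above can be sketched roughly as follows. This is only an illustration of the idea in Python, not zapuser's actual code; the function name chdir_checked is made up for the example.

```python
import os
import stat


def chdir_checked(path):
    """cd into path only if it is a real directory, and confirm afterwards
    that we landed in the same directory we inspected.

    Hypothetical helper for illustration; zapuser's real checks may differ.
    """
    before = os.lstat(path)  # lstat does not follow symlinks
    if not stat.S_ISDIR(before.st_mode):
        raise RuntimeError("%s is not a plain directory" % path)

    # Remember where we are so we can go back if the check fails.
    old_cwd = os.open(".", os.O_RDONLY)
    try:
        os.chdir(path)
        after = os.stat(".")
        # If someone swapped the directory for a symlink between the lstat
        # and the chdir, the device/inode pair will not match.
        if (before.st_dev, before.st_ino) != (after.st_dev, after.st_ino):
            os.fchdir(old_cwd)
            raise RuntimeError("%s changed underneath us" % path)
    finally:
        os.close(old_cwd)
```

A caller would refuse to delete anything when the function raises. The check narrows the race window but does not replace running the deletion as the user.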
The directory deletion code is also designed to be able to delete arbitrarily
deeply nested directories, something older versions of "rm -R" generally
failed at.
In addition to deleting the home directory, it deletes mailboxes,
screen and layer files. I should probably add something to it to
delete people from the exim.outgoing mail file.
My apologies to Glenda for omitting her from the list of attending staff members. Relying on my memory is always a perilous thing.
I've repaired the crash in zapuser. I have not yet taught it to scrub people
from the exim.outbound file.
Grex hasn't done a reap in a long time. I don't know that I know the correct
rules for reaping. I think it is:
Guest accounts which either:
- were created more than 3 weeks ago, and have never been logged
into, or
- have not been logged into for more than 90 days
If that's true, then about 48,000 of the 53,000 accounts on Grex are overdue
for reaping.
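As a rough sketch of the rules as I've stated them (which may not be the current policy), the test for a single account could look like this in Python. The function and the "protected" flag (standing in for members, staff and the immortals list) are assumptions for illustration, not the real reap tools.

```python
from datetime import datetime, timedelta

# Thresholds taken from the rules quoted above.
NEVER_LOGGED_IN_GRACE = timedelta(weeks=3)
INACTIVITY_LIMIT = timedelta(days=90)


def is_reapable(created, last_login, now, protected=False):
    """Return True if a guest account is overdue for reaping.

    created    -- datetime the account was created
    last_login -- datetime of the last login, or None if never logged in
    now        -- datetime to evaluate against
    protected  -- hypothetical flag for members, staff and immortals
    """
    if protected:
        return False
    if last_login is None:
        # Never logged in: reap once the account is more than 3 weeks old.
        return now - created > NEVER_LOGGED_IN_GRACE
    # Otherwise reap after more than 90 days without a login.
    return now - last_login > INACTIVITY_LIMIT
```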
Reaping them would be a very good thing. Probably all those accounts are
receiving spam. Getting rid of them would greatly reduce the incoming mail
load, and make running a global spam filter much more likely to be possible.
Many of these accounts probably also have full mailboxes. Getting rid of them
might even allow us to increase the mailbox size limit for users who actually
log on sometimes.
The tools set for doing reaps should now be in place. However I have not
yet run one, because I don't know if the criteria above is actually still the
current criteria.
That seems like a high number of inactive accounts (and a low number of active ones). Are you sure the method you are using to check for logins is reliable, Jan? I gather that whatever finger uses has not been reliable since we moved to OpenBSD.
I think finger's output is reliable; Mark, what gives you the impression that it is not? The procedure for running a reap was to run the reap collection program (which would generate the list of accounts to be reaped), then look at it to make sure there were no `errors' (ie, staffers being deleted by accident), and then run the actual reap.
Thank you, Jan. I will undertake to perform a reap over the weekend. It'll probably take me a little while to uncover the directions; at least I know where to look. :) One other check is for fairwitnesses that have been reaped; I vaguely remember there being a separate tool for that purpose. It's also documented, in the same place as the rest of the reap.
Thank you all.
Followup to #11: I confirmed that if we used the standard OpenBSD userdel to reap users, we'd have to run the program once for each user. It can't do multiple users in one pass. That means the hash files would have to be rebuilt once for each user. Currently, rebuilding the hash files takes almost exactly 10 minutes. Multiply that by 48,000 users, and you have a pretty good idea why we need zapuser. (OK, you can probably divide that by two, since the rebuild times will get faster as the password file gets shorter, so run time will only be about half a year.) Zapuser now deletes from the outbound mail file. Yes, I am concerned that 48,000 of 53,000 users have been selected for deletion. It seems high. But every spot check I've done seems OK. I don't see users with files modified after their supposed last login dates. I've confirmed that http logins are correctly updating the last log file. If anyone has evidence that the last login dates shown by 'finger' and 'laston' are incorrect, I'd be interested in knowing. I think we just don't have all that many active accounts. Joe is planning to do a reap this weekend.
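For anyone who wants to check the back-of-envelope estimate, here is the arithmetic as a small Python sketch. The 10 minute rebuild and the 48,000 and 53,000 account counts come from this item; the assumption that rebuild time shrinks in proportion to the size of the password file is only a guess.

```python
REBUILD_MINUTES = 10.0     # one full hash rebuild at the current file size
TOTAL_ACCOUNTS = 53_000    # accounts currently on Grex
REAPED_ACCOUNTS = 48_000   # accounts selected for deletion
MINUTES_PER_DAY = 24 * 60


def naive_days():
    """Every deletion costs a full 10 minute rebuild."""
    return REAPED_ACCOUNTS * REBUILD_MINUTES / MINUTES_PER_DAY


def shrinking_days():
    """Assume rebuild time is proportional to the number of accounts left
    (an assumption for illustration, not a measurement)."""
    minutes = 0.0
    for deleted in range(REAPED_ACCOUNTS):
        remaining = TOTAL_ACCOUNTS - deleted
        minutes += REBUILD_MINUTES * remaining / TOTAL_ACCOUNTS
    return minutes / MINUTES_PER_DAY


if __name__ == "__main__":
    print("full rebuild every time: %.0f days" % naive_days())
    print("halved, as guessed above: %.0f days" % (naive_days() / 2))
    print("proportional shrinkage:   %.0f days" % shrinking_days())
```

The proportional model comes out slightly above the simple halving because the file only shrinks to about 5,000 accounts, not to zero. Either way the run time is on the order of half a year.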
If someone could do a backup before the reap, I'd have a warmer, fuzzier feeling about deleting 48,000 users and their mail and their files. If we've got drives that may be developing problems it'd probably be an excellent thing to have handy in any event. I wouldn't think it would be a problem to bring over a firewire or USB 2.0 enclosure with a cheap IDE disk in it and do a dump of critical filesystems to it. Even a dump of live filesystems would be better than no dump at all.
Good point, Mike. This will probably delay the reap, but better a late reap than a trashed system.
Regarding #18; Ten minutes to rebuild the password hash on this machine seems like a really, really long time to me. Hey, if userdel won't work, then it won't work; worse things have happened. I'm a bit surprised that it doesn't use the `-u' option to pwd_mkdb, which just updates a single record, instead of the entire hash file (notice that changing passwords and adding users doesn't take that long). But anyway it's a moot point. I suspect that Jan is right: we just don't have that many active users.
Re finger: I used to use finger to decide if members who weren't responding to my hails had disappeared from Grex. I'm afraid I don't remember details, but I quit using finger because it gave me results that seemed wrong. I remember asking about it (in agora?) a while back, and being told (I think by you, Dan) that there was no good way to tell when someone had last logged on. It's possible I'm confusing the last-logged-in time with the last-checked-mail time.
Hmm, I don't remember that conversation. I'd say it might be the last-checked-mail time that we were talking about, if anything; logging in interactively or via backtalk or (maybe) via FTP will certainly update the lastlog file, which is what finger looks at to tell you when the last time a user logged in was. But, finger's details about a person's email reading habits aren't particularly useful, since people can forward email off of grex, and lots of programs will change the timestamps on the mail spool file without the user actually *reading* the mail.... (For instance, the `from' command will modify the atime of the mail spool file.)
You know, you may be right. I didn't actually try userdel. 10 minutes is how long the password rebuild takes after a zapuser, which does do a full rebuild (even if only one user has been deleted). So I dunno. Maybe userdel would be fast after all. Some kind of backup would be a good idea. Another possibility would be to run zapuser without the -d flag. In that case, it doesn't delete the user's home directory, but stashes it in /a/deleted. It always saves their passwd file line. However, zapuser always deletes the mail file, so this isn't a great option if you seriously expect to want to restore the user. We should back up.
I know in the past I've tried to finger those in conferences that I'm a fw of, to see who and how long it's been since those users have checked into the conf. But it was taking forever and a day, so I finally got out of it. Though I'm not sure how relevant that is to this current conversation. :-)
Re #23: The conversation I was referring to was resp:agora56,4,87-106 Unfortunately all of Dan's responses have been scribbled. I can't quite reconstruct what was said, but it's clear I was confused by the answer, and I'm still confused. Bruce Howard said this in response 97: It appears if you log in with a non-interactive shell, for example: "ssh grex.org bash -i" no login record is made.
STeve is planning on doing backups and, if possible, replacing the flaky disk tomorrow.
Regarding #26; Yah, that was sort of for work. Long story (and one I can't really explain anyway). Perhaps the discussion was for non-interactive shells. E.g., when one does, ``ssh cyberspace.org ls'' and things like that (incidentally, that's what is happening when one does ssh cyberspace.org bash -i...).
Well, STeve did a backup, and Joe did a reap (probably the first since we moved to OpenBSD). I guess we'll soon find out if this was a problem. I'd be surprised if there weren't at least a few people among the 48,000 or so that we deleted that maybe shouldn't have been. I did spend some time before the reap adding people to the immortals list whom I thought ought to be there.
That should fix /var/mail for quite a while. Thanks.
Yes, it was the first reap since December, 2004. Since /var/mail is down to 38%, it probably will be fine. For a while.
Did you reap a lot of mail accounts without any mail or spam in them?
Why do some accounts have 20MB of mail in them? For instance munkey, whose mail account is dated Sept 18. I thought we had a 1MB limit. Is something broken? Munkey last logged on Feb 15 but may have abandoned the mail account to the spammers. Is there some way to tell if an account is being used to forward mail, and if not, reap it after 3 months of disuse?
Re your first question: No idea; we don't track that statistic. Re your second question: the quota was raised some time back. As to your third question: What (other) people do, or don't do, with their mailboxes is not really any of my, or your, business.
I agree with Joe.
Re question three, since new users are not being given mail accounts without requesting them, and since many or most of the old users with mail accounts probably are not using them for anything, would it make sense to reduce the number of unused mail accounts of people who are not doing ANYTHING with their mailboxes and did not want them in the first place? Note the '3 months of disuse'. I was not suggesting keeping people from forwarding mail.
I think the reap process works rather well; we just cleared out over 40,000 accounts. Unfortunately, we're not doing opt-in email yet.
What is the new mailbox limit? I have been forwarding anything over 100K to another account. What are 5000 people still using grex for if not mail?
Assuming the reap was the standard "not active in the past 90 days" and that newuser was off for nearly 9 months, I'm amazed and delighted that we still provide service for 5000 people!! Now, for the sobering thought, how do we know how many of those are spammers?
- Backtalk version 1.3.30 - Copyright 1996-2006, Jan Wolter and Steve Weiss