|
|
| Author |
Message |
janc
|
|
Staff Meeting Report
|
Feb 14 17:16 UTC 2007 |
Staff Meeting - February 11, 2007
Present:
Jan Wolter
Steve Weiss
Steve Andre
Joe Gelinas
Things to do, in no particular order:
Zoneinfo update.
Daylight savings time starts three weeks earlier. If we leave in old
zoneinfo files, Grex's clock will be an hour off for three weeks.
For new files, upload the latest version of zoneinfo from the latest
OpenBSD-current release and install. Need to run zic to compile
zoneinfo files, which may not be entirely simple.
CVS Server.
Grexdoc is currently maintained on a CVS server hosted by John
Remmers. John wants to shut down this machine soon. Steve Andre
will set up a new grexdoc server in his office.
CVS Checkin.
Jan should check in latest changes to grexdoc.
Spam Filtering.
We'd be interested in trying to set up global spam filtering on
Grex, using spamd. Plan would probably be to discard definite
spam. There is some doubt about whether Grex's machine would be
able to handle the load of running SpamAssassin on every bit of
incoming mail, but it seems worth a try.
Optional Incoming Mail.
Probably a more effective way to reduce the resource drain of
incoming mail would be to make it optional. New grex accounts
would not be emailable by default. There would be a command
which users could use to turn incoming email for their account
on or off.
The theory is that many grex users don't actually want to receive
mail on their grex accounts. If they all turned off mail, then
Grex would have much less mail to process.
Outbound Mail Approval Script.
Outbound mail is currently turned off for new accounts. There are
volunteers who would be willing to handle the process of approving
people who want outbound mail turned on. We need a tool that would
enable the mail-approvers to turn on mail for a user, basically by
adding their names to /usr/local/etc/outbound.
This needs to be implemented so that when an account is reaped and
a new account is created by the same name, the new account does
not inherit email approval from the old one.
Jan may do this, but won't mind if someone else does.
Mail to Staff.
Currently users who are not approved for offsite email can't send
mail to staff because staff is off-site. Need to fix this.
Outbound Mail Limit.
There was a plan approved by board to limit outbound mail to 50
messages per user per day. This could be done by exim script,
but possibly the outbound mail approval thing is easier to
implement and might suffice to solve the problem.
Outbound net access script.
Currently new accounts are created in group 1003, which does not
have outbound internet access. We want to be able to have a list
of non-root users move people into group 1002 so they can have
outbound internet access. Need a su-root script to do this.
Probably a task for Jan.
Fix rmuser.
The rmuser script dumps core, so we can't do reaps right now.
Reaping would be good.
Newuser changes
Dan Cross wants to make some fixes to newuser. Jan will
install these when they are available.
RAID and more Disk
There is some interest in buying a hardware RAID controller
and a number of large fast disks to use with it. This plan
would vastly increase disk space and reliability. Cost might
be around $1600, but STeve will make a better estimate.
OpenBSD 4.1 upgrade
OpenBSD 4.1 will be released later this year. We should
upgrade Grex to that.
Might be a good time to also do things like adding RAID disks.
|
| 118 responses total. |
janc
|
|
response 1 of 118:
|
Feb 14 17:20 UTC 2007 |
All the staff people at the meeting recognized that the biggest staff problem
right now is a lack of active staff. Too many of the current staff have too
much else on their plate, and many of us are not really very active. We
really all want to do more, but we aren't promising much. Still, we thought
it would be good to have a list of things that we think ought to be done, so
that the consensus building part of any staff task is at least partly out of
the way, and all we need to do is find someone to do them.
The day after the meeting STeve noticed problems with one of the disks.
We're going to need to replace that before it entirely fails, probably in
a shorter time frame than we could possibly implement RAID in. Luckily we
do have a spare disk.
|
keesan
|
|
response 2 of 118:
|
Feb 14 18:04 UTC 2007 |
Could grex reap mail accounts which have not been accessed for 3 months even
if the user is still active? Not including accounts that are forwarding any
mail that comes to them.
|
cross
|
|
response 3 of 118:
|
Feb 14 18:59 UTC 2007 |
Regarding #1; Once again, I can do staff work if needed.
Regarding #0; A couple of points. (a) OpenBSD upgrades are only supported
through point releases. Right now, we're running OpenBSD 3.8 on grex. 4.0
is current right now. To do a supported upgrade, we really should upgrade
to 3.9, then to 4.0, and then to 4.1.... It's a pain, but would be the best
way to approach it.
(b) With respect to CVS servers for grexdoc, I'd *really* rather see grexdoc
hosted on grex and replicated to other machines offsite by CVSup or anoncvs
or something similar.
(c) What does rmuser do that userdel doesn't? Userdel is an OpenBSD tool;
could we shoehorn it into the reap process?
(d) Mail to staff could be handled by putting an intelligent agent midway in
the process. Something like RT could be leveraged to good effect on grex and
solve this problem at the same time. http://www.bestpractical.com/.
|
glenda
|
|
response 4 of 118:
|
Feb 14 19:37 UTC 2007 |
I was also at the staff meeting, as a staff member.
|
tod
|
|
response 5 of 118:
|
Feb 14 19:47 UTC 2007 |
What's being done to recruit new staff?
|
cyklone
|
|
response 6 of 118:
|
Feb 14 23:54 UTC 2007 |
Todd read my mind. Jan got about halfway there, then stopped.
|
maus
|
|
response 7 of 118:
|
Feb 15 01:04 UTC 2007 |
Just curious where the 1600$ estimate came from. Are we planning to keep
going with SCSI drives? I know a first order approximation based on one
vendor's webpage was barely over half that price, and drive prices are
slowly coming down. Or are we looking at a big-ass 8 drive array so we
get mondo space and redundancy?
|
nharmon
|
|
response 8 of 118:
|
Feb 15 15:02 UTC 2007 |
I've created an item in the garage conference for further
discussion/research into a RAID storage solution for Grex.
|
tod
|
|
response 9 of 118:
|
Feb 15 15:18 UTC 2007 |
re #6
I read "...lack of active staff..." "...not promising much..." "...all we need
to do is find someone to do them..."
So I'm kinda wondering if the Board is going to help or is staff being relied
upon?
Jan & Dan, you're both on the board. What say you about the recruiting
efforts for new staff members?
|
janc
|
|
response 10 of 118:
|
Feb 15 16:36 UTC 2007 |
I think I would classify myself as "inactive staff". I do not see my schedule
opening up for large amounts of staff work. I still hope to be available
for helping out with system upgrades and some light software development and
bug fixes, but even that is going to be inconsistent.
I recognize that we need regular staff. This includes especially people who
are local to Ann Arbor and can do things like reboots and hands-on system
work. I haven't the slightest idea how to recruit them.
Meanwhile, I've done some checkin of grexdoc, and have implemented a command
that would let a selected list of users enable outgoing mail for other users.
I'm told some people volunteered for this, but I don't know who they were.
|
janc
|
|
response 11 of 118:
|
Feb 15 16:56 UTC 2007 |
I think rmuser is really called zapuser. As usual, I've forgotten everything
about it, even though I wrote it.
I seem to recall that one of the points of it was that it was optimized for
removing large numbers of users in one fell swoop. Some of the alternative
tools I'd seen would rebuild the hashed password file after each user
deletion. When you are deleting 10,000 users at once, this is a very bad
idea. I don't know if userdel is smart about this or not. The manual page
seems to suggest that it takes only one user name as an argument, which
suggests that it probably would require 10,000 passes through the password
file, and 10,000 hash rebuilds (which take many minutes these days if my
recent experience with vipw can be trusted) to remove 10,000 users.
Looking through the source code, I see it also has a lot of other checks:
- won't delete users with uid's under 1000 (or whatever is configured)
- won't delete members or staff
- won't delete accounts unless home directory is in one of the usual
spots for grex home directories.
- won't delete accounts that are on the immortals list
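The checks listed above amount to a single eligibility test applied to each candidate account. A minimal Python sketch of that idea follows; it is not zapuser's actual code, which is not shown in this item. The `Account` record, the `MIN_UID` floor, the `HOME_PREFIXES` tuple and the two name sets are all hypothetical stand-ins for whatever zapuser really reads from its configuration.

```python
from dataclasses import dataclass

# Hypothetical configuration values, chosen only for illustration.
MIN_UID = 1000                  # "uid's under 1000 (or whatever is configured)"
HOME_PREFIXES = ("/a/", "/c/")  # assumed "usual spots" for home directories


@dataclass
class Account:
    login: str
    uid: int
    home: str


def may_delete(acct, protected, immortals):
    """Return True only if the account passes every safety check."""
    if acct.uid < MIN_UID:
        return False            # system and low-numbered accounts
    if acct.login in protected:
        return False            # members or staff
    if not acct.home.startswith(HOME_PREFIXES):
        return False            # home directory in an unexpected place
    if acct.login in immortals:
        return False            # explicitly exempted accounts
    return True
```

An account is removed only when every check passes, so any single failing condition is enough to spare it.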
The user directory deletion is done in a paranoid mode, su-ing to the user
before beginning the deletion. This is a final failsafe to avoid traps where
a user subdirectory is swapped with a symlink to /etc between the time
zapuser checks if it is a symlink or a directory and the time that zapuser
cd's into it. Of course, we have other safeguards against this too, like
checking if the inode number of the directory we cd'ed into matches the inode
number of the directory we thought we were going to cd into, but it never
hurts to be extra safe, and by running as the user we know we can't be fooled
into deleting anything we shouldn't.
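The inode comparison described above can be sketched in a few lines of Python. This is an illustration of the general technique under assumed names, not zapuser's code: the function `enter_directory_safely` is hypothetical, and it does not attempt the additional step of switching to the user's uid before deleting.

```python
import os
import stat


def enter_directory_safely(path):
    """chdir into path, refusing symlinks and detecting a swap in between."""
    # lstat does not follow symlinks, so a symlink is seen as a symlink.
    before = os.lstat(path)
    if not stat.S_ISDIR(before.st_mode):
        raise ValueError(f"{path} is not a real directory")

    os.chdir(path)

    # Compare the directory we actually landed in with the one we inspected.
    after = os.stat(".")
    if (before.st_dev, before.st_ino) != (after.st_dev, after.st_ino):
        raise RuntimeError(f"{path} changed between the check and the chdir")
```

If the directory is replaced by a symlink after the `lstat` but before the `chdir`, the device and inode pair of the current directory no longer matches what was inspected, and the function raises rather than letting a deletion proceed in the wrong place.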
The directory deletion code is also designed to be able to delete arbitrarily
deeply nested directories, something older versions of "rm -R" generally
failed at.
In addition to deleting the home directory, it deletes mailboxes,
screen and layer files. I should probably add something to it to
delete people from the exim.outgoing mail file.
|
janc
|
|
response 12 of 118:
|
Feb 15 16:57 UTC 2007 |
My apologies to Glenda for omitting her from the list of attending staff
members. Relying on my memory is always a perilous thing.
|
janc
|
|
response 13 of 118:
|
Feb 15 23:06 UTC 2007 |
I've repaired the crash in zapuser. I have not yet taught it to scrub people
from the exim.outbound file.
Grex hasn't done a reap in a long time. I don't know that I know the correct
rules for reaping. I think it is:
Guest accounts which either:
- were created more than 3 weeks ago, and have never been logged
into, or
- have not been logged into for more than 90 days
If that's true, then about 48,000 of the 53,000 accounts on Grex are overdue
for reaping.
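Written out as code, the rule as recalled above might look like the following Python sketch. It assumes the criteria really are as stated (which this response itself is unsure of), and the function name and parameters are hypothetical rather than taken from the actual reap tools.

```python
from datetime import datetime, timedelta

NEVER_USED_GRACE = timedelta(weeks=3)   # created but never logged into
INACTIVE_LIMIT = timedelta(days=90)     # no login for this long


def overdue_for_reaping(created, last_login, now, is_guest=True):
    """Apply the recalled reap rule; last_login is None if never logged in."""
    if not is_guest:
        return False
    if last_login is None:
        return now - created > NEVER_USED_GRACE
    return now - last_login > INACTIVE_LIMIT
```

For example, with `now` set to Feb 15 2007, a guest account created on Jan 1 2007 and never used would be selected, while one last logged into on Jan 1 2007 would not.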
Reaping them would be a very good thing. Probably all those accounts are
receiving spam. Getting rid of them would greatly reduce the incoming mail
load, and make running a global spam filter much more likely to be possible.
Many of these accounts probably also have full mailboxes. Getting rid of them
might even allow us to increase the mailbox size limit for users who actually
log on sometimes.
The tools set for doing reaps should now be in place. However I have not
yet run one, because I don't know if the criteria above are actually still the
current criteria.
|
aruba
|
|
response 14 of 118:
|
Feb 15 23:18 UTC 2007 |
That seems like a high number of inactive accounts (and a low number of
active ones). Are you sure the method you are using to check for logins is
reliable, Jan? I gather that whatever finger uses has not been reliable
since we moved to OpenBSD.
|
cross
|
|
response 15 of 118:
|
Feb 16 00:09 UTC 2007 |
I think finger's output is reliable; Mark, what gives you the impression that
it is not?
The procedure for running a reap was to run the reap collection program (which
would generate the list of accounts to be reaped), then look at it to make
sure there were no `errors' (ie, staffers being deleted by accident), and then
run the actual reap.
|
gelinas
|
|
response 16 of 118:
|
Feb 16 02:16 UTC 2007 |
Thank you, Jan. I will undertake to perform a reap over the weekend. It'll
probably take me a little while to uncover the directions; at least I know
where to look. :)
One other check is for fairwitnesses that have been reaped; I vaguely remember
there being a separate tool for that purpose. It's also documented, in the
same place as the rest of the reap.
|
keesan
|
|
response 17 of 118:
|
Feb 16 04:03 UTC 2007 |
Thank you all.
|
janc
|
|
response 18 of 118:
|
Feb 16 04:13 UTC 2007 |
Followup to #11: I confirmed that if we used the standard OpenBSD userdel
to reap users, we'd have to run the program once for each user. It can't do
multiple users in one pass. That means the hash files would have to be
rebuilt once for each user. Currently, rebuilding the hash files takes almost
exactly 10 minutes. Multiply that by 48,000 users, and you have a pretty
good idea why we need zapuser. (OK, you can probably divide that by two,
since the rebuild times will get faster as the password file gets shorter,
so run time will only be about half a year.)
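The arithmetic behind that estimate, spelled out as a short Python calculation. The only inputs are the two figures quoted above (10 minutes per rebuild, 48,000 users); the halving is the rough allowance for rebuilds getting faster as the password file shrinks.

```python
MINUTES_PER_REBUILD = 10
USERS_TO_REAP = 48_000

total_minutes = MINUTES_PER_REBUILD * USERS_TO_REAP   # one rebuild per user
total_days = total_minutes / (60 * 24)
halved_days = total_days / 2                          # rough "divide by two"

print(f"{total_minutes} minutes is about {total_days:.0f} days")
print(f"halved: about {halved_days:.0f} days")
```

That is 480,000 minutes, or roughly 333 days of continuous rebuilding, and still about 167 days after halving.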
Zapuser now deletes from the outbound mail file.
Yes, I am concerned that 48,000 of 53,000 users have been selected for
deletion. It seems high. But every spot check I've done seems OK. I don't
see users with files modified after their supposed last login dates. I've
confirmed that http logins are correctly updating the last log file. If
anyone has evidence that the last login dates shown by 'finger' and
'laston' are incorrect, I'd be interested in knowing.
I think we just don't have all that many active accounts.
Joe is planning to do a reap this weekend.
|
mcnally
|
|
response 19 of 118:
|
Feb 16 04:36 UTC 2007 |
If someone could do a backup before the reap, I'd have a warmer,
fuzzier feeling about deleting 48,000 users and their mail and
their files.
If we've got drives that may be developing problems it'd probably
be an excellent thing to have handy in any event.
I wouldn't think it would be a problem to bring over a firewire
or USB 2.0 enclosure with a cheap IDE disk in it and do a dump
of critical filesystems to it. Even a dump of live filesystems
would be better than no dump at all.
|
gelinas
|
|
response 20 of 118:
|
Feb 16 04:55 UTC 2007 |
Good point, Mike. This will probably delay the reap, but better a late reap
than a trashed system.
|
cross
|
|
response 21 of 118:
|
Feb 16 13:54 UTC 2007 |
Regarding #18; Ten minutes to rebuild the password hash on this machine seems
like a really, really long time to me. Hey, if userdel won't work, then it
won't work; worse things have happened. I'm a bit surprised that it doesn't
use the `-u' option to pwd_mkdb, which just updates a single record, instead
of the entire hash file (notice that changing passwords and adding users
doesn't take that long). But anyway it's a moot point.
I suspect that Jan is right: we just don't have that many active users.
|
aruba
|
|
response 22 of 118:
|
Feb 16 13:55 UTC 2007 |
Re finger: I used to use finger to decide if members who weren't responding
to my hails had disappeared from Grex. I'm afraid I don't remember details,
but I quit using finger because it gave me results that seemed wrong. I
remember asking about it (in agora?) a while back, and being told (I think
by you, Dan) that there was no good way to tell when someone had last logged
on.
It's possible I'm confusing the last-logged-in time with the
last-checked-mail time.
|
cross
|
|
response 23 of 118:
|
Feb 16 14:07 UTC 2007 |
Hmm, I don't remember that conversation. I'd say it might be the
last-checked-mail time that we were talking about, if anything; logging in
interactively or via backtalk or (maybe) via FTP will certainly update
the lastlog file, which is what finger looks at to tell you when the last time
a user logged in was. But, finger's details about a person's email reading
habits aren't particularly useful, since people can forward email off of grex,
and lots of programs will change the timestamps on the mail spool file without
the user actually *reading* the mail.... (For instance, the `from' command
will modify the atime of the mail spool file.)
|
janc
|
|
response 24 of 118:
|
Feb 16 14:47 UTC 2007 |
You know, you may be right. I didn't actually try userdel. 10 minutes is
how long the password rebuild takes after a zapuser, which does do a full
rebuild (even if only one user has been deleted). So I dunno. Maybe userdel
would be fast after all.
Some kind of backup would be a good idea.
Another possibility would be to run zapuser without the -d flag. In that
case, it doesn't delete the user's home directory, but stashes it in
/a/deleted. It always saves their passwd file line. However, zapuser always
deletes the mail file, so this isn't a great option if you seriously expect to
want to restore the user.
We should backup.
|