

Grex Coop Item 11: Staff Meeting Report
Entered by janc on Wed Feb 14 17:16:07 UTC 2007:

Staff Meeting - February 11, 2007

Present:

   Jan Wolter
   Steve Weiss
   Steve Andre
   Joe Gelinas

Things to do, in no particular order:

   Zoneinfo update.
   
      Daylight saving time starts three weeks earlier.  If we leave in old
      zoneinfo files, Grex's clock will be an hour off for three weeks.
      For new files, upload the latest version of zoneinfo from the latest
      OpenBSD-current release and install.  Need to run zic to compile
      zoneinfo files, which may not be entirely simple.
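
A minimal sketch of where the "three weeks" comes from, assuming the US rules apply (DST began on the first Sunday in April before 2007 and begins on the second Sunday in March from 2007 on). The helper name nth_sunday is made up for illustration; it is not part of zoneinfo or zic.

```python
from datetime import date, timedelta


def nth_sunday(year: int, month: int, n: int) -> date:
    """Return the n-th Sunday (1-based) of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday is 0, Sunday is 6
    days_until_sunday = (6 - first.weekday()) % 7
    return first + timedelta(days=days_until_sunday + 7 * (n - 1))


# Old US rule: first Sunday in April.
old_start = nth_sunday(2007, 4, 1)
# New US rule (2007 onward): second Sunday in March.
new_start = nth_sunday(2007, 3, 2)

# Period during which stale zoneinfo files would leave the clock an hour off.
gap = old_start - new_start
print(new_start, old_start, gap.days)
```

With stale files the clock would be wrong between the two printed dates.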

   CVS Server.

      Grexdoc is currently maintained on a CVS server hosted by John
      Remmers.  John wants to shut down this machine soon.  Steve Andre
      will set up a new grexdoc server in his office.

   CVS Checkin.
   
       Jan should check in latest changes to grexdoc.

   Spam Filtering.

       We'd be interested in trying to set up global spam filtering on
       Grex, using spamd.  Plan would probably be to discard definite
       spam.  There is some doubt about whether Grex's machine would be
       able to handle the load of running SpamAssassin on every bit of
       incoming mail, but it seems worth a try.

    Optional Incoming Mail.
    
       Probably a more effective way to reduce the resource drain of
       incoming mail would be to make it optional.  New grex accounts
       would not be emailable by default.  There would be a command
       which users could use to turn incoming email for their account
       on or off.

       The theory is that many grex users don't actually want to receive
       mail on their grex accounts.  If they all turned off mail, then
       Grex would have much less mail to process.

    Outbound Mail Approval Script.

       Outbound mail is currently turned off for new accounts.  There are
       volunteers who would be willing to handle the process of approving
       people who want outbound mail turned on.  We need a tool that would
       enable the mail-approvers to turn on mail for a user, basically by
       adding their names to /usr/local/etc/outbound.

       This needs to be implemented so that when an account is reaped and
       a new account is created by the same name, then the new account does
       not inherit email approval from the old one.
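
One possible way to avoid inherited approval, sketched under assumptions: each approval line stores the uid alongside the login, so a recycled name with a different uid no longer matches. The "login:uid" format and both function names are hypothetical, not the actual format of /usr/local/etc/outbound, and the scheme only helps if uids are not reused. Scrubbing the name from the file at reap time is the other obvious approach.

```python
def parse_approvals(text: str) -> dict:
    """Parse lines of the hypothetical form 'login:uid' into {login: uid}."""
    approvals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        login, _, uid = line.partition(":")
        approvals[login] = int(uid)
    return approvals


def is_approved(approvals: dict, login: str, current_uid: int) -> bool:
    """Approval counts only if it was granted to this exact uid."""
    return approvals.get(login) == current_uid


approvals = parse_approvals("alice:2001\n# approved by a volunteer\nbob:2002\n")
print(is_approved(approvals, "alice", 2001))  # the original alice
print(is_approved(approvals, "alice", 3050))  # a later account reusing the name
```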

       Jan may do this, but won't mind if someone else does.

    Mail to Staff.

       Currently users who are not approved for offsite email can't send
       mail to staff because staff is off-site.  Need to fix this.

    Outbound Mail Limit.

       There was a plan approved by the board to limit outbound mail to 50
       messages per user per day.  This could be done by an exim script,
       but possibly the outbound mail approval thing is easier to
       implement and might suffice to solve the problem.
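
The 50-per-day rule itself is simple to state in code. This is only a sketch of the counting logic in Python, not an exim configuration; the class name OutboundCounter and its in-memory storage are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date

DAILY_LIMIT = 50  # messages per user per day, per the board-approved plan


class OutboundCounter:
    """Track outbound messages per (user, day) and refuse once over the limit."""

    def __init__(self, limit: int = DAILY_LIMIT):
        self.limit = limit
        self.counts = defaultdict(int)  # (user, day) -> messages sent so far

    def allow(self, user: str, day: date) -> bool:
        key = (user, day)
        if self.counts[key] >= self.limit:
            return False  # limit reached, reject this message
        self.counts[key] += 1
        return True


counter = OutboundCounter()
today = date(2007, 2, 14)
sent = sum(counter.allow("someuser", today) for _ in range(60))
print(sent)  # number of the 60 attempts that were accepted
```

A real implementation would need persistent counts that survive restarts; the dictionary here is just the simplest stand-in.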

     Outbound net access script.

        Currently new accounts are created in group 1003, which does not
        have outbound internet access.  We want a designated list of
        non-root users to be able to move people into group 1002 so they
        can have outbound internet access.  Need a su-root script to do this.

        Probably a task for Jan.

      Fix rmuser.

        The rmuser script dumps core, so we can't do reaps right now.
        Reaping would be good.

      Newuser changes

         Dan Cross wants to make some fixes to newuser.  Jan will
         install these when they are available.

      RAID and more Disk

         There is some interest in buying a hardware RAID controller
         and a number of large fast disks to use with it.  This plan
         would vastly increase disk space and reliability.  Cost might
         be around $1600, but STeve will make a better estimate.

      OpenBSD 4.1 upgrade

          OpenBSD 4.1 will be released later this year.  We should
          upgrade Grex to that.

          Might be a good time to also do things like adding RAID disks.

118 responses total.



#1 of 118 by janc on Wed Feb 14 17:20:31 2007:

All the staff people at the meeting recognized that the biggest staff problem
right now is a lack of active staff.  Too many of the current staff have too
much else on their plate, and many of us are not really very active.  We
really all want to do more, but we aren't promising much.  Still, we thought
it would be good to have a list of things that we think ought to be done, so
that the consensus building part of any staff task is at least partly out of
the way, and all we need to do is find someone to do them.

The day after the meeting STeve noticed problems with one of the disks.
We're going to need to replace that before it entirely fails, probably in
a shorter time frame than we could possibly implement RAID in.  Luckily we
do have a spare disk.


#2 of 118 by keesan on Wed Feb 14 18:04:01 2007:

Could grex reap mail accounts which have not been accessed for 3 months even
if the user is still active?  Not including accounts that are forwarding any
mail that comes to them.


#3 of 118 by cross on Wed Feb 14 18:59:49 2007:

Regarding #1; Once again, I can do staff work if needed.

Regarding #0; A couple of points.  (a) OpenBSD upgrades are only supported
through point releases.  Right now, we're running OpenBSD 3.8 on grex.  4.0
is current right now.  To do a supported upgrade, we really should upgrade
to 3.9, then to 4.0, and then to 4.1....  It's a pain, but would be the best
way to approach it.

(b) With respect to CVS servers for grexdoc, I'd *really* rather see grexdoc
hosted on grex and replicated to other machines offsite by CVSup or anoncvs
or something similar.

(c) What does rmuser do that userdel doesn't?  Userdel is an OpenBSD tool;
could we shoehorn it into the reap process?

(d) Mail to staff could be handled by putting an intelligent agent midway in
the process.  Something like RT could be leveraged to good effect on grex and
solve this problem at the same time.  http://www.bestpractical.com/.


#4 of 118 by glenda on Wed Feb 14 19:37:15 2007:

I was also at the staff meeting, as a staff member.


#5 of 118 by tod on Wed Feb 14 19:47:58 2007:

What's being done to recruit new staff?


#6 of 118 by cyklone on Wed Feb 14 23:54:00 2007:

Todd read my mind. Jan got about halfway there, then stopped.


#7 of 118 by maus on Thu Feb 15 01:04:30 2007:

Just curious where the 1600$ estimate came from. Are we planning to keep
going with SCSI drives? I know a first order approximation based on one
vendor's webpage was barely over half that price, and drive prices are
slowly coming down. Or are we looking at a big-ass 8 drive array so we
get mondo space and redundancy? 


#8 of 118 by nharmon on Thu Feb 15 15:02:28 2007:

I've created an item in the garage conference for further
discussion/research into a RAID storage solution for Grex.


#9 of 118 by tod on Thu Feb 15 15:18:25 2007:

re #6
I read "...lack of active staff..." "...not promising much..." "...all we need
to do is find someone to do them..."
So I'm kinda wondering if the Board is going to help or is staff being relied
upon?

Jan & Dan, you're both on the board.  What say you about the recruiting
efforts for new staff members?


#10 of 118 by janc on Thu Feb 15 16:36:36 2007:

I think I would classify myself as "inactive staff".  I do not see my schedule
opening up for large amounts of staff work.  I still hope to be available
for helping out with system upgrades and some light software development and
bug fixes, but even that is going to be inconsistent.

I recognize that we need regular staff.  This includes especially people who
are local to Ann Arbor and can do things like reboots and hands-on system
work.  I haven't the slightest idea how to recruit them.

Meanwhile, I've done some checkin of grexdoc, and have implemented a command
that would let a selected list of users enable outgoing mail for other users.
I'm told some people volunteered for this, but I don't know who they were.


#11 of 118 by janc on Thu Feb 15 16:56:11 2007:

I think rmuser is really called zapuser.  As usual, I've forgotten everything
about it, even though I wrote it.

I seem to recall that one of the points of it was that it was optimized for
removing large numbers of users in one fell swoop.  Some of the alternative
tools I'd seen would rebuild the hashed password file after each user
deletion.  When you are deleting 10,000 users at once, this is a very bad
idea.  I don't know if userdel is smart about this or not.  The manual page
seems to suggest that it takes only one user name as an argument, which
suggests that it probably would require 10,000 passes through the password
file, and 10,000 hash rebuilds (which take many minutes these days if my
recent experience with vipw can be trusted) to remove 10,000 users.

Looking through the source code, I see it also has a lot of other checks:
  - won't delete users with uid's under 1000 (or whatever is configured)
  - won't delete members or staff
  - won't delete accounts unless home directory is in one of the usual
    spots for grex home directories.
  - won't delete accounts that are on the immortals list
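
The checks listed above can be sketched as a single predicate. This is an illustration only, not zapuser's code: the Account fields, the may_delete name, and the HOME_PREFIXES values are all placeholders.

```python
import os
from dataclasses import dataclass

MIN_UID = 1000                  # "or whatever is configured"
HOME_PREFIXES = ("/a/", "/c/")  # placeholder locations, not Grex's real list


@dataclass
class Account:
    login: str
    uid: int
    home: str
    protected: bool = False     # True for members and staff


def may_delete(acct: Account, immortals: set) -> bool:
    """Return True only if every safety check passes."""
    if acct.uid < MIN_UID:
        return False            # system accounts are never reaped
    if acct.protected:
        return False            # members and staff are never reaped
    # Normalize first so a path like "/a/../etc" cannot slip through.
    home = os.path.normpath(acct.home) + "/"
    if not any(home.startswith(p) and home != p for p in HOME_PREFIXES):
        return False            # home directory is not in a usual spot
    if acct.login in immortals:
        return False            # explicitly exempted accounts
    return True


print(may_delete(Account("guest1", 5000, "/a/g/guest1"), set()))
print(may_delete(Account("odd", 5000, "/a/../etc"), set()))
```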

The user directory deletion is done in a paranoid mode, su-ing to the user
before beginning the deletion.  This is a final failsafe to avoid traps where
a user subdirectory is swapped with a symlink to /etc between the time
zapuser checks if it is a symlink or a directory and the time that zapuser
cd's into it.  Of course, we have other safeguards against this too, like
checking if the inode number of the directory we cd'ed into matches the inode
number of the directory we thought we were going to cd into, but it never
hurts to be extra safe, and by running as the user we know we can't be fooled
into deleting anything we shouldn't.
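
The inode comparison described above can be sketched as follows, assuming a POSIX system. The function name enter_directory_safely is made up for illustration and this is not zapuser's actual code; the su-to-the-user step is left out because it needs root.

```python
import os
import stat


def enter_directory_safely(path: str) -> bool:
    """cd into path only if it is a real directory and was not swapped
    for something else between the check and the chdir."""
    start = os.getcwd()
    before = os.lstat(path)  # lstat does not follow symlinks
    if not stat.S_ISDIR(before.st_mode):
        return False         # a symlink or a plain file: refuse
    os.chdir(path)
    after = os.stat(".")
    # Same device and inode means we landed where we checked.
    if (after.st_dev, after.st_ino) != (before.st_dev, before.st_ino):
        os.chdir(start)      # swapped underneath us: back out
        return False
    return True
```

On success the process is left inside the directory, ready to start deleting; on failure the working directory is unchanged.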

The directory deletion code is also designed to be able to delete arbitrarily
deeply nested directories, something older versions of "rm -R" generally
failed at.

In addition to deleting the home directory, it deletes mailboxes,
screen files, and layer files.  I should probably add something to it to
delete people from the exim.outgoing mail file.


#12 of 118 by janc on Thu Feb 15 16:57:34 2007:

My apologies to Glenda for omitting her from the list of attending staff
members.  Relying on my memory is always a perilous thing.


#13 of 118 by janc on Thu Feb 15 23:06:46 2007:

I've repaired the crash in zapuser.  I have not yet taught it to scrub people
from the exim.outbound file.

Grex hasn't done a reap in a long time.  I don't know that I know the correct
rules for reaping.   I think it is:

    Guest accounts which either:

      - were created more than 3 weeks ago, and have never been logged
        into, or

      - have not been logged into for more than 90 days

If that's true, then about 48,000 of the 53,000 accounts on Grex are overdue
for reaping.
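
The rule as stated above can be written as a small predicate. This is a sketch of the criteria only, not the reap collection program; is_reapable and its parameters are hypothetical names, with last_login set to None for an account that has never been logged into.

```python
from datetime import datetime, timedelta
from typing import Optional

NEVER_USED_GRACE = timedelta(weeks=3)  # created but never logged into
INACTIVE_LIMIT = timedelta(days=90)    # no login for this long


def is_reapable(created: datetime, last_login: Optional[datetime],
                now: datetime, is_guest: bool = True) -> bool:
    """Apply the two guest-account reap criteria."""
    if not is_guest:
        return False
    if last_login is None:
        return now - created > NEVER_USED_GRACE
    return now - last_login > INACTIVE_LIMIT


now = datetime(2007, 2, 16)
print(is_reapable(datetime(2007, 1, 1), None, now))                     # never used
print(is_reapable(datetime(2005, 1, 1), datetime(2007, 1, 15), now))    # recently active
```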

Reaping them would be a very good thing.  Probably all those accounts are
receiving spam.  Getting rid of them would greatly reduce the incoming mail
load, and make running a global spam filter much more likely to be possible.
Many of these accounts probably also have full mailboxes.  Getting rid of them
might even allow us to increase the mailbox size limit for users who actually
log on sometimes.

The tools set for doing reaps should now be in place.  However I have not
yet run one, because I don't know if the criteria above are actually still the
current criteria.


#14 of 118 by aruba on Thu Feb 15 23:18:14 2007:

That seems like a high number of inactive accounts (and a low number of
active ones).  Are you sure the method you are using to check for logins is
reliable, Jan?  I gather that whatever finger uses has not been reliable
since we moved to OpenBSD.


#15 of 118 by cross on Fri Feb 16 00:09:27 2007:

I think finger's output is reliable; Mark, what gives you the impression that
it is not?

The procedure for running a reap was to run the reap collection program (which
would generate the list of accounts to be reaped), then look at it to make
sure there were no `errors' (ie, staffers being deleted by accident), and then
run the actual reap.


#16 of 118 by gelinas on Fri Feb 16 02:16:49 2007:

Thank you, Jan.  I will undertake to perform a reap over the weekend.  It'll
probably take me a little while to uncover the directions; at least I know
where to look. :)

One other check is for fairwitnesses that have been reaped; I vaguely remember
there being a separate tool for that purpose.  It's also documented, in the
same place as the rest of the reap.


#17 of 118 by keesan on Fri Feb 16 04:03:14 2007:

Thank you all.


#18 of 118 by janc on Fri Feb 16 04:13:31 2007:

Followup to #11:  I confirmed that if we used the standard OpenBSD userdel
to reap users, we'd have to run the program once for each user.  It can't do
multiple users in one pass.  That means the hash files would have to be
rebuilt once for each user.  Currently, rebuilding the hash files takes almost
exactly 10 minutes.  Multiply that by 48,000 users, and you have a pretty
good idea why we need zapuser.  (OK, you can probably divide that by two,
since the rebuild times will get faster as the password file gets shorter,
so run time will only be about half a year.)
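
The arithmetic behind that estimate, using only the figures given above (10 minutes per rebuild, 48,000 users, and the rough "divide by two" allowance for the shrinking password file):

```python
REBUILD_MINUTES = 10       # one full hash rebuild, as measured
USERS = 48_000             # accounts selected for reaping
MINUTES_PER_DAY = 60 * 24

# One rebuild per deleted user, at the full rebuild time each.
worst_case_days = REBUILD_MINUTES * USERS / MINUTES_PER_DAY
# Rough allowance for rebuilds getting faster as the file shrinks.
halved_days = worst_case_days / 2

print(round(worst_case_days), round(halved_days))
```

That is about 333 days at the full rate, or roughly 167 days after halving, which is where "about half a year" comes from.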

Zapuser now deletes from the outbound mail file.

Yes, I am concerned that 48,000 of 53,000 users have been selected for
deletion.  It seems high.  But every spot check I've done seems OK.  I don't
see users with files modified after their supposed last login dates.  I've
confirmed that http logins are correctly updating the last log file.  If
anyone has evidence that the last login dates shown by 'finger' and
'laston' are incorrect, I'd be interested in knowing.

I think we just don't have all that many active accounts.

Joe is planning to do a reap this weekend.


#19 of 118 by mcnally on Fri Feb 16 04:36:48 2007:

 If someone could do a backup before the reap, I'd have a warmer,
 fuzzier feeling about deleting 48,000 users and their mail and 
 their files.

 If we've got drives that may be developing problems it'd probably
 be an excellent thing to have handy in any event.

 I wouldn't think it would be a problem to bring over a firewire
 or USB 2.0 enclosure with a cheap IDE disk in it and do a dump
 of critical filesystems to it.  Even a dump of live filesystems
 would be better than no dump at all.


#20 of 118 by gelinas on Fri Feb 16 04:55:01 2007:

Good point, Mike.  This will probably delay the reap, but better a late reap
than a trashed system.


#21 of 118 by cross on Fri Feb 16 13:54:14 2007:

Regarding #18; Ten minutes to rebuild the password hash on this machine seems
like a really, really long time to me.  Hey, if userdel won't work, then it
won't work; worse things have happened.  I'm a bit surprised that it doesn't
use the `-u' option to pwd_mkdb, which just updates a single record, instead
of the entire hash file (notice that changing passwords and adding users
doesn't take that long).  But anyway it's a moot point.

I suspect that Jan is right: we just don't have that many active users.


#22 of 118 by aruba on Fri Feb 16 13:55:26 2007:

Re finger: I used to use finger to decide if members who weren't responding
to my hails had disappeared from Grex.  I'm afraid I don't remember details,
but I quit using finger because it gave me results that seemed wrong.  I
remember asking about it (in agora?) a while back, and being told (I think
by you, Dan) that there was no good way to tell when someone had last logged
on.

It's possible I'm confusing the last-logged-in time with the
last-checked-mail time.


#23 of 118 by cross on Fri Feb 16 14:07:12 2007:

Hmm, I don't remember that conversation.  I'd say it might be the
last-checked-mail time that we were talking about, if anything; logging in
interactively or via backtalk or (maybe) via FTP will certainly update
the lastlog file, which is what finger looks at to tell you when the last time
a user logged in was.  But, finger's details about a person's email reading
habits aren't particularly useful, since people can forward email off of grex,
and lots of programs will change the timestamps on the mail spool file without
the user actually *reading* the mail....  (For instance, the `from' command
will modify the atime of the mail spool file.)


#24 of 118 by janc on Fri Feb 16 14:47:35 2007:

You know, you may be right.  I didn't actually try userdel.  10 minutes is
how long the password rebuild takes after a zapuser, which does do a full
rebuild (even if only one user has been deleted).  So I dunno.  Maybe userdel
would be fast after all.

Some kind of backup would be a good idea.

Another possibility would be to run zapuser without the -d flag.  In that
case, it doesn't delete the user's home directory, but stashes it in
/a/deleted.  It always saves their passwd file line.  However, zapuser always
deletes the mail file, so this isn't a great option if you seriously expect to
want to restore the user.

We should backup.


#25 of 118 by denise on Fri Feb 16 15:48:52 2007:

I know in the past I've tried to finger those in conferences that I'm a fw
of, to see who and how long it's been since those users have checked into the
conf.  But it was taking forever and a day, so I finally got out of it. Though
I'm not sure how relevant that is to this current conversation. :-)



#26 of 118 by aruba on Sat Feb 17 01:22:22 2007:

Re #23: The conversation I was referring to was

  resp:agora56,4,87-106

Unfortunately all of Dan's responses have been scribbled.  I can't quite
reconstruct what was said, but it's clear I was confused by the answer, and
I'm still confused.  Bruce Howard said this in response 97:

It appears if you log in with a non-interactive shell, for example:
   "ssh grex.org bash -i"
no login record is made.


#27 of 118 by glenda on Sat Feb 17 03:02:21 2007:

STeve is planning on doing backups and, if possible, replacing the flaky disk
tomorrow.


#28 of 118 by cross on Sat Feb 17 04:19:54 2007:

Regarding #26; Yah, that was sort of for work.  Long story (and one I can't
really explain anyway).

Perhaps the discussion was for non-interactive shells.  E.g., when one does,
``ssh cyberspace.org ls'' and things like that (incidentally, that's what is
happening when one does ssh cyberspace.org bash -i...).


#29 of 118 by janc on Mon Feb 19 15:53:09 2007:

Well, STeve did a backup, and Joe did a reap (probably the first since we
moved to OpenBSD).  I guess we'll soon find out if this was a problem.  I'd
be surprised if there weren't at least a few people among the 48,000 or so
that we deleted that maybe shouldn't have been.  I did spend some time before
the reap adding people to the immortals list whom I thought ought to be there.


#30 of 118 by keesan on Mon Feb 19 16:05:23 2007:

That should fix /var/mail for quite a while.  Thanks.


#31 of 118 by gelinas on Tue Feb 20 02:56:12 2007:

Yes, it was the first reap since December, 2004.

Since /var/mail is down to 38%, it probably will be fine.  For a while.


#32 of 118 by keesan on Tue Feb 20 03:51:13 2007:

Did you reap a lot of mail accounts without any mail or spam in them?


#33 of 118 by keesan on Tue Feb 20 03:57:13 2007:

Why do some accounts have 20MB of mail in them?  For instance munkey, whose
mail account is dated Sept 18.  I thought we had a 1MB limit.  Is something
broken?  Munkey last logged on Feb 15 but may have abandoned the mail account
to the spammers.  Is there some way to tell if an account is being used to
forward mail, and if not, reap it after 3 months of disuse?


#34 of 118 by gelinas on Tue Feb 20 04:05:46 2007:

Re your first question: No idea; we don't track that statistic.

Re your second question: the quota was raised some time back.

As to your third question: What (other) people do, or don't do, with their
mailboxes is not really any of my, or your, business.


#35 of 118 by nharmon on Tue Feb 20 13:09:44 2007:

I agree with Joe. 


#36 of 118 by keesan on Tue Feb 20 15:48:50 2007:

Re question three, since new users are not being given mail accounts without
requesting them, and since many or most of the old users with mail accounts
probably are not using them for anything, would it make sense to reduce the
number of unused mail accounts of people who are not doing ANYTHING with their
mailboxes and did not want them in the first place.  Note the '3 months of
disuse'.  I was not suggesting keeping people from forwarding mail.


#37 of 118 by cross on Tue Feb 20 16:10:43 2007:

I think the reap process works rather well; we just cleared out over 40,000
accounts.  Unfortunately, we're not doing opt-in email yet.


#38 of 118 by keesan on Tue Feb 20 18:05:14 2007:

What is the new mailbox limit?  I have been forwarding anything over 100K to
another account.  What are 5000 people still using grex for if not mail?


#39 of 118 by cmcgee on Tue Feb 20 18:16:30 2007:

Assuming the reap was the standard "not active in the past 90 days" and that
newuser was off for nearly 9 months, I'm amazed and delighted that we still
provide service for 5000 people!!

Now, for the sobering thought, how do we know how many of those are spammers?


