|
Grex > Coop11 > #239: brainstorming solutions for the full disk problem |  |
|
| Author |
Message |
| 25 new of 203 responses total. |
keesan
|
|
response 50 of 203:
|
Mar 9 15:19 UTC 2001 |
I have just realized that there is no longer any need to move files from the
web via my home directory. With the new improved Kermit it is possible to
download directly to my own computer. The older version required typing in
some settings to run it at more than glacial speed and to get around the Y2K
problem (it would not download files at all without doing something about the
date attribute). This way I will not be forgetting to remove 1M files from
'local disk'. Kermit is Option 2 for downloads from the web with lynx.
Steve Weiss - thanks again for the new Kermit.
|
i
|
|
response 51 of 203:
|
Mar 10 04:16 UTC 2001 |
If i understand correctly, most of the problem comes from just-using-us-
as-they-pass-through folks, and longer-term users (members or not) are
pretty well behaved. Could we have a two-tier system that was newcomers/
non-newcomers (instead of non-members/members), with some pretty draconian
limits just on the former and a "we're very sorry, but due to the huge
number of people just looking for a system they can abuse in a hurry..."
up-front message explaining it?
|
gull
|
|
response 52 of 203:
|
Mar 10 05:18 UTC 2001 |
Maybe we just need to reap inactive accounts quicker. What's the current
time for that?
|
aruba
|
|
response 53 of 203:
|
Mar 10 18:45 UTC 2001 |
3 months. If Walter's idea could be implemented in some reasonable way, it
sounds promising. It gets around the whole "perks to members" problem.
|
pfv
|
|
response 54 of 203:
|
Mar 10 19:19 UTC 2001 |
Using a small program and dbopen() would work well and quickly: just
use the logid for the key, and a date-in-the-future for the data.
Alternately, have a root-owned file in the user-space? Or, just base
it from his account-creation date?
Now, what would Tier Zero do, (assuming that Tier 1 is what guests
currently have)?
What programs would be affected? How would you affect them?
( I suspect you'd need a wrapper around lynx, w3m, etc - that checks
that database/those-files. What about file-attachments?)
Tiering sounds a lot like adding a new, more limited group akin to
member or guest.
Being promoted to "guest" would still leave the OTHER wastrels, but
I suspect most will have been winnowed out at Tier Zero.
Isn't there a global, system-defined .logout and .login file that
grex uses? On a per-user basis, it seems to me that cleaning right
up after a user would be very light on the system. Cleaning up or
denying them .login might also be of use, (not sure in the latter
case).
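A minimal sketch of the account-creation-date variant, assuming a hypothetical flat file of "login creation-epoch" pairs standing in for a dbopen() database. The function name, the file layout, and the 30-day cutoff are all invented for illustration; nothing here reflects how Grex actually stores account data.

```shell
#!/bin/sh
# Hypothetical tier check: prints "tier0" for accounts younger than a cutoff
# and "tier1" otherwise. Logins missing from the file are treated as tier0.
# Usage: user_tier LOGIN DBFILE NOW_EPOCH [CUTOFF_DAYS]
user_tier() {
    login=$1 db=$2 now=$3 days=${4:-30}
    awk -v u="$login" -v now="$now" -v days="$days" '
        $1 == u { found = 1; created = $2 }
        END {
            if (found && now - created >= days * 86400) print "tier1"
            else print "tier0"
        }' "$db"
}
```

A wrapper around lynx or w3m could call something like this before deciding whether to allow a download.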
|
keesan
|
|
response 55 of 203:
|
Mar 10 19:46 UTC 2001 |
Wasn't there some discussion of dropping people in less than three months if
they had only logged on once? Say in one month instead of three? And maybe
only if they did not have any email.
|
scott
|
|
response 56 of 203:
|
Mar 10 21:13 UTC 2001 |
It already happens.
|
carson
|
|
response 57 of 203:
|
Mar 12 00:17 UTC 2001 |
(is there still an auto-reaper, or has that gone by the wayside?)
|
don
|
|
response 58 of 203:
|
Mar 16 04:26 UTC 2001 |
A completely unorthodox and probably not-very-useful suggestion: probably the
biggest consumer of space for non-malicious users is mail, right? Most
people's reason for not deleting mail is so that they can refer back to it.
Space could be readily freed up if these mail archives were removed. So, a
possible solution would be to activate POP mail services. With a local copy
of their mail, people would be much more inclined to remove their mail from
grex. There wouldn't be any increased bandwidth used, since either way the
mail goes from grex to the user. The only drawbacks I could see with this
would be reduced incentive to participate in the grex community (although
opening elm or pine at a bash prompt doesn't provide much incentive either)
and an increase in freeloaders looking for free POP mail (this doesn't seem
to be a problem with m-net's POP service, although they have a slightly
different situation over there). A small advantage would be freeing up ports
due to a lower number of people logging in to read mail.
|
pfv
|
|
response 59 of 203:
|
Mar 17 18:15 UTC 2001 |
I don't believe it's mail.. Unless you use procmail, the mail is on
a different drive.
Do we have any consensus, yet?
I've been tinkering with scripts, which I hate, and - as you would
expect - it's not a LOT of users.. There are a small number of users
that use vastly more space than they need; it's a small number of
users that simply refuse to read newuser; it's a small number of
users that download jpg's, jpegs, gifs, mpegs, and even mp3's. Exe's
and com files.. It's also "dead files" that never get reaped.
Someone feel like posting a member/staff overview?
|
russ
|
|
response 60 of 203:
|
Mar 18 14:35 UTC 2001 |
Are there any common elements between the people filling up the disk,
like downloading eggdrop? Or are they all different?
If there are one or a few common profiles we could do something to
address them. Just having ftp look for common eggdrop file names and
just making a symlink to a local copy would save tons of bandwidth and
disk space. If it saved a record of doing this, a daemon could follow
up later and remove the mess automatically.
These people have nothing to offer Grex. The faster we convince them that
Grex is a waste of their time, the faster we can get rid of their files.
I like the idea of keeping a "serious users" partition which hosts
long-term users, people who participate in BBS, and the like. Let
the newusers fill up the newuser partition. Better yet, put the
login names of offenders in the MOTD and hold them up to public shame.
Nothing like a few dozen nastygrams in the mail to wake people up.
|
gull
|
|
response 61 of 203:
|
Mar 18 18:52 UTC 2001 |
I'd bet most of the offenders never come back once they find out eggdrop
won't run. They probably never read their mail or the MOTD, and probably
couldn't care less what the rest of us think of them. We should just reap
their accounts quicker. Anything else is just a waste of time, probably.
|
pfv
|
|
response 62 of 203:
|
Mar 18 19:11 UTC 2001 |
This is why, above somewhere, I asked about ".logout".
If, (at logout), they have ANY files we object to - remove them,
(if we REALLY object to them, just reap the account).
|
jp2
|
|
response 63 of 203:
|
Mar 22 14:36 UTC 2001 |
This response has been erased.
|
tenser
|
|
response 64 of 203:
|
Mar 30 08:09 UTC 2001 |
Regarding #63. Perhaps I'm being naive, but I'm almost willing to bet
that the people who are downloading lots of pictures, copies of eggdrop,
etc are not imaginative enough to name them ``addressbook.txt'' in
order to disguise the fact. Probably, they download eggdrop, realize
it won't run (if they can even compile it), and split leaving a pretty
hefty tarball in their home directory. What's more, said tarball will
be named something like, yup, you guessed it: eggdrop.tar.gz.
A skulker script that runs nightly or on the weekends ought to be able
to look for certain file globs, and delete them if it finds them. (eg,
eggdrop*.tar.gz, nekked*.jpg, *.mp3, etc). If the list of acceptable
deletable file names is small enough, and widely published, I don't
see how many conflicts would arise in practice. The only one I can
realistically think of is eggdrop_soup_recipes.tar.gz :-).
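A rough sketch of what the matching half of such a skulker might look like, using the example globs above. The function name is hypothetical, and this version only lists candidates rather than deleting anything, since the list would presumably need review before anyone trusted it with rm.

```shell
#!/bin/sh
# Dry-run skulker sketch: list (do not delete) regular files under ROOT
# whose names match a small, published set of globs.
# Usage: skulk_candidates ROOT
skulk_candidates() {
    find "$1" -type f \( \
        -name 'eggdrop*.tar.gz' -o \
        -name 'nekked*.jpg' -o \
        -name '*.mp3' \
    \) -print
}
```

Note that eggdrop_soup_recipes.tar.gz would indeed be caught by the first glob, which is the conflict joked about above.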
Another solution is to turn on disk quotas. Maybe not the best idea for
SunOS 4 (Although I used it back in the day; I don't recall it being
that bad, but I never tried it with 30,000 users. Others I knew did,
though, and didn't complain. :-) If the soft quota were set to, say, 1
meg, and the hard quota to, say, 10 megs, then it seems that those users
wishing to move transient files around would still be accommodated while
the goal of putting an upper `cap' on a user's total disk usage would
still be met. (Note for those unfamiliar with how Unix disk quotas work:
You can have two numbers; one is the ``hard'' quota which you are not
allowed to violate. The other is the soft quota, which you are allowed
to violate for a short amount of time, but which becomes the hard limit
if it's violated for too long.)
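To make the soft/hard rule concrete, here is a toy model of the decision in shell. It is only an illustration: real quota enforcement happens in the kernel, and the function name, the argument order, and the idea of passing in "days over the soft limit" are assumptions made for this sketch.

```shell
#!/bin/sh
# Illustrative only: models the soft/hard quota rule described above.
# Usage: quota_state USED_KB SOFT_KB HARD_KB DAYS_OVER_SOFT GRACE_DAYS
quota_state() {
    used=$1 soft=$2 hard=$3 days_over=$4 grace=$5
    if [ "$used" -ge "$hard" ]; then
        echo "deny"     # at the hard limit: no further allocation
    elif [ "$used" -le "$soft" ]; then
        echo "ok"       # at or under the soft limit
    elif [ "$days_over" -gt "$grace" ]; then
        echo "deny"     # grace period expired: soft limit now acts as hard
    else
        echo "warn"     # over the soft limit but still within the grace period
    fi
}
```

With a 1 meg soft limit and a 10 meg hard limit, a user staging a 5 meg file would get "warn" for the length of the grace period and "deny" afterwards.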
Moving to, eg, Solaris could help make this a reality. While I think
that it's kind of cool that grex still runs SunOS 4, I just don't see
that being a realistic thing to do for much longer (read: a few years
at the max).
In order to gather more meaningful data, I'd be interested in what
the output of, eg, ``quot -v /dev/sd<up> | sort -nr | fgrep -vf \
list_of_staff_accounts'' is. Could someone from the staff please post
the output of that command somewhere? (If it's felt this would be too
much of an invasion of privacy, then perhaps strip out the login using
awk or cut first). It would be interesting to see what percentage of
the user population accounts for what percentage of the space, and also
what percentage of the total disk space hasn't been accessed in the last
3 months. Also, what's the correlation between disk usage and activity
on the system? Can output of sa(8) or something similar be matched
up against the output of the above pipeline?
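One way the "what percentage of users accounts for what percentage of space" question might be answered, sketched under the assumption that the input has been reduced to lines of the form "blocks login", one per user. The function name is made up, and the exact column layout of quot -v on a given system should be checked before feeding it in.

```shell
#!/bin/sh
# Reads "blocks login" lines on stdin (any order) and prints, for each user
# from largest to smallest: login, blocks, and the running percentage of
# the total space accounted for so far.
usage_share() {
    sort -k1,1nr | awk '
        { blocks[NR] = $1; user[NR] = $2; total += $1 }
        END {
            if (total == 0) exit    # nothing to report, avoid dividing by zero
            for (i = 1; i <= NR; i++) {
                run += blocks[i]
                printf "%s %d %.1f\n", user[i], blocks[i], 100 * run / total
            }
        }'
}
```

If privacy is a concern, the login column could be dropped from the printf, leaving just the distribution.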
One other thing which might help is to move staff home directories to
their own filesystem. I say this because staff users have a legitimate
need to use largish amounts of disk space during the course of their
work (imagine compiling emacs :-), but it seems unfair for this work
to infringe upon general user space. Also, from my own days as a Unix
admin, it helped to have administrative users on their own FS in order
to prevent the possible damage from an ``oops.'' I won't elaborate on
that, as it brings back bad memories. :-)
Another thought for those using grex to stage files from the Internet: you
might find it convenient to consider downloading the files to a public,
but much smaller, filesystem such as /tmp/$USER, or maybe a new /scratch
or /var/stage or something. These directories can be policed pretty
heavily, and with the understanding that files left in them absolutely
will be deleted after a day or two. (I say it's convenient since the
burden for cleaning up can be left to a nightly script.... :-)
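The nightly cleanup for such a staging area could be as small as the following sketch. The function name and the default of 2 days are assumptions, and it only prints what it would remove.

```shell
#!/bin/sh
# Dry-run cleanup for a staging area: list regular files whose age in whole
# days exceeds MAX_DAYS (find's -mtime +N rule). Nothing is deleted here;
# -print could be swapped for a removal action once the output is trusted.
# Usage: stale_files DIR [MAX_DAYS]
stale_files() {
    find "$1" -type f -mtime +"${2:-2}" -print
}
```

Run from cron against something like /scratch, this keeps the cleanup burden off the users, as suggested above.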
Finally, on a somewhat related note.... Might I suggest putting script
wrappers around the most popular MUA's that conditionally print out a
message saying something like, ``Hi, before we start the mail program,
may we suggest that if you're only here for the free email, you instead
use something like Hotmail or Yahoo!'s free email service? The load
email places on our server is very high... (etc)'', waits a few seconds,
and then starts the MUA? Printing of the message could be related to
the existence of an environment variable or file in one's home directory.
Eg, in /usr/local/bin/elm:
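#!/bin/sh
# Show the notice and pause only for users who still have the marker file.
[ -f "$HOME/.email_only" ] && cat /etc/email_only_notice && sleep 10
# Hand off to the real mail program, passing along any arguments.
exec /usr/local/real-elm "$@"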
or something similar. The ``in your face'' nature of this scheme might
help in convincing people that there are better places to get email....
The delay helps, since a >1sec delay has the effect of making people
lose track of what they were doing, thus making them do things they
might not normally do, like read text on the screen. :-) Those who
really need to read mail on grex can simply ``rm $HOME/.email_only''
to avoid the notice and the delay.
- Dan C.
|
carson
|
|
response 65 of 203:
|
Mar 30 08:53 UTC 2001 |
(wow. that's a lotta suggestion. kewl.)
|
gelinas
|
|
response 66 of 203:
|
Mar 30 14:08 UTC 2001 |
I don't like Swiss Army knives; they look good, but they try to do too much,
in my opinion. I feel the same way about most tools. So suggesting
HoTMaiL and its ilk isn't going to impress me. I want my kids to have e-mail.
Right now, I can't run a mail server in the house. So grex is the next best
thing. To the best of my knowledge and belief, they are not using picospan.
Eventually, they may. Discouraging them from using grex because they are
just reading mail seems to me short-sighted.
|
pfv
|
|
response 67 of 203:
|
Mar 30 14:33 UTC 2001 |
I suspect the general use of mail is not a problem, and I'm aware
that grex mail is size-restricted anyway, (although I think it gets
the whole thing and THEN complains about oversize).
Otoh, spam-mail MIGHT be a problem.. And, certainly, pine and elm
(and sendmail?) leave droppings around in many directories.
Further droppings are left via crazy web-users saving freaky "pages"
(the names are wild, and I suspect the content as well).
Downloading gifs and jp[e]gs and suchlike.. And, yes.. The infamous
eggdrops as well as ssh, bnc, psybnc, irc, bitchx, mIRC and lord
knows what else. Further, it isn't JUST the tarballs, it's the
expansions.. the objects.. the useless compiles, hell the cpu wasted
FOR the compile..
I'm still waiting for a staffer to report if there is a global
.logout being run, or CAPABLE of being run.. Yes, a cleanup therein
is probably too little, too late - but it also means that the
staffers could waste less time, (and grex less cpu), by checking on
a per-user basis.
Perhaps "login" should fork a root process that checks 'df' and can
check 'last' for, ohh... the last 1000 lines? Just checking THOSE
user-spaces might manage to clean up most of the mess.
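A sketch of the "check the last 1000 lines of last" part, assuming the output of last is piped in on stdin and that the login name is the first field. The function name is invented, and the pseudo-entries it filters out (reboot, shutdown, and the trailing "wtmp begins" line) are assumptions about what a typical last listing contains.

```shell
#!/bin/sh
# Reads `last`-style output on stdin and prints the unique login names
# found in the first N lines, skipping blank lines and pseudo-entries.
# Usage: last | recent_logins [N]
recent_logins() {
    head -n "${1:-1000}" |
        awk 'NF > 0 && $1 != "reboot" && $1 != "shutdown" && $1 != "wtmp" { print $1 }' |
        sort -u
}
```

The resulting list of logins is what a cleanup pass would then walk, instead of every home directory on the system.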
|
scott
|
|
response 68 of 203:
|
Mar 30 14:43 UTC 2001 |
Perhaps Grex should just stop allowing people to use the system; it would
certainly be easier to administer. ;)
|
carson
|
|
response 69 of 203:
|
Mar 30 15:17 UTC 2001 |
(that's a great idea, scott! it seems to work for M-Net.) ;)
|
pfv
|
|
response 70 of 203:
|
Mar 30 15:19 UTC 2001 |
<shrug> Could be, but that wasn't suggested anywhere else.
|
gelinas
|
|
response 71 of 203:
|
Mar 30 15:34 UTC 2001 |
(Thank you, Pete, for reminding me *why* my umask is set to 077.)
|
pfv
|
|
response 72 of 203:
|
Mar 30 15:39 UTC 2001 |
(You are quite welcome.)
|
tenser
|
|
response 73 of 203:
|
Mar 30 18:27 UTC 2001 |
Regarding #66. I'm sorry, but I don't quite follow what you were
saying.
- Dan C.
|
tenser
|
|
response 74 of 203:
|
Mar 30 19:23 UTC 2001 |
Regarding #67. It's been my experience that email can be a pretty big
resource hog. It's not just the disk usage, but also the memory, CPU,
and network bandwidth which is consumed, as well as the load on the
I/O system's bandwidth for spooling and generally moving data around.
Consider, for instance, what happens if grex goes offline for a few
hours and then comes back on.... The load associated with the resulting
``mail spike'' is nontrivial.
(Note also that I'm not advocating turning off email access; just,
by default, making it a little more inconvenient.)
Csh and derived shells, when invoked as a login shell, will run a
``.logout'' file when a user logs out if it exists in the user's home
directory and is readable. However, the user can ``turn this off''
by removing ~/.logout, and it's not entirely clear what would be in
it anyway. For instance, does it do a find and look for common ``junk''
directory and file names? What does it do then? Wouldn't a ``normal''
user be kind of bothered if they had to sit there trying to log out
for 5 minutes while something walks their directory tree and does a ton
of stat(2) calls? (On this machine, I'm not sure that's unlikely even
for a user with a relatively small directory...) Do you let it run in
the background after the shell has exited, and risk it being killed by
this ``robocop'' program, possibly leaving the user's directory in a
terribly inconsistent state?
No, it's much better to run such things in a controlled way, via an
administrative program invoked by cron.
As for having login fork and run a program that invokes df and last,
what exactly does that do? I'm confused. :-)
Any sort of skulker script which looks around the filesystem and deletes
things could look for directories named things like eggdrop* (in addition
to files) and remove them.
As for the CPU cycles lost to people compiling things that just won't
run on grex: yeah, but what can you do about that, realistically?
You could disable the C compiler, but that seems overly harsh. Maybe a
solution would be to move it to a directory that's not ordinarily in the
user path (Solaris puts it in /opt/SUNWspro/bin, but then the compiler is
an add on product under Solaris), and then put a note in the FAQ saying,
``if you want to use the C compiler, put the following in your PATH:
....'' That might solve a lot of problems anyway, but is a pain.
|