| Author | Message |
don
|
|
Solving the space problem
|
Dec 9 21:24 UTC 1999 |
The recent disk space crunch has not really been discussed in bbs. I would
like to start a discussion here, and provide a few suggestions:
1) Take newuser offline. I know this will result in many people saying that
this contradicts grex's mission and its openness, but look at it this way:
we have over 26500 users. We can live without a few more for a week or two
if it means saving a whole lot of space.
2) If taking newuser offline is too radical, then just block newuser for
different areas of the world. I.e., one week stop new users from South America,
the next week Europe, the next Africa, the next Asia... This will still allow
for new people while cutting down on the influx.
3) Ban files of over, say, .75 MB. Give people a few weeks' notice, then
delete them. This will get rid of most eggdrops and the like, and since nobody
should be using more than a meg anyway there won't be any morality problems.
It would be much easier than eggdrop sweeps and could be processed at an
off-peak time (I doubt it's going to cause too much of a problem when there
are only 20 people logged in.) You can nice it to some large number so it
wouldn't disrupt grex too much.
4) Make a shell script that tars up people's bbs participation files, and then
untars them when they run picospan. This would save a huge amount of space. The
script I'm already using for this is:
#!/bin/sh
# unpack the archived participation files (if any), run picospan, then re-archive them
[ -f cf.files ] && tar -xzf cf.files && rm cf.files
picospan
tar -czf cf.files .*.cf && rm .*.cf
5) Get those new disks in.
|
| 73 responses total. |
gull
|
|
response 1 of 73:
|
Dec 9 22:10 UTC 1999 |
#2 won't work. There's no sure way to tell what part of the world someone's
coming in from just from their IP address. Reverse-lookups sometimes work,
but they aren't reliable either, and they'd add overhead.
The idea of compressing the conference participation files isn't a bad one,
though. I may use your script.
|
flem
|
|
response 2 of 73:
|
Dec 9 22:54 UTC 1999 |
Before doing anything as drastic as what has been suggested above, I
think it would be a good idea to get an accurate report on what it is
that is using all the disk space, so we have some idea what kinds of
things might actually help. Anybody on staff feel like commenting on
where the disk space is going?
|
don
|
|
response 3 of 73:
|
Dec 10 00:14 UTC 1999 |
Though I can't give you a breakdown of the disk usage, /a and /c are purely
user files.
|
don
|
|
response 4 of 73:
|
Dec 10 00:18 UTC 1999 |
re 1, you can just run a symbolic link to my script if you want:
ln -s ~don/pspan pspan
|
orinoco
|
|
response 5 of 73:
|
Dec 10 00:52 UTC 1999 |
The first two seem a bit too drastic for my taste, but #3 (banning files of
a certain size) sounds plausible. What would the technical issues surrounding
that one be? Would it be as much of a processor load as checking constantly
to see if x user is over a certain disk quota? Would people just be able to
circumvent it by downloading eggdrop (or whatever) in a few chunks?
|
gull
|
|
response 6 of 73:
|
Dec 10 02:54 UTC 1999 |
Re #4: Thanks, but I don't link to other users' scripts or binaries. That's
just asking to be hacked, not that I'm saying you'd do that.
Now that I think about it, it's likely that none of the individual files
involved in compiling eggdrop are over 3/4 meg. You'd almost have to look
for subdirectories that were greater than that size.
Question: What happens when your size-checking script hits a really deep
directory tree created by some malicious user? Or a tree with a directory
that's a hardlink to a directory higher up? Will it loop forever? These
are real issues you have to think about when designing things like this.
|
prp
|
|
response 7 of 73:
|
Dec 10 06:06 UTC 1999 |
As I understand it, there is an additional disk in the pumpkin, just
waiting to be installed. This will be done ASAP, but most staff members
are busy with their personal lives.
Perhaps the problem here is the threshold used to determine when it is
time to add a new disk. That is, rather than waiting until the disks are,
say, 90% full most of the time, 100% full sometimes, and people are complaining,
disk space should become a top priority when the disks are 80% full.
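A nightly cron job could make that threshold automatic. A rough sketch, assuming
df -k's fifth column is the capacity percentage on this system, and with "staff"
standing in for whatever alias would actually get the mail:
# check /a and /c against an 80% threshold; only send mail if something is over it
out=`df -k /a /c | awk 'NR > 1 && $5 + 0 >= 80 { print $6 " is at " $5 }'`
[ -n "$out" ] && echo "$out" | mail -s "grex disk space warning" staff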
|
mdw
|
|
response 8 of 73:
|
Dec 10 06:07 UTC 1999 |
You can't make a hard link to directories. It is possible to make
deeply nested directories and various other bad things. What happens at
that point depends on the tool. A sufficiently dumb tool can core dump,
which could provide the vandal an interesting opportunity to hijack the
tool (if he can construct a "filename" with executable code & addresses
at just the right offsets...) This is one of many reasons why it's
important to be careful about what's run as root.
As for running tar & gzip each time someone runs PicoSpan - this would be a
bad thing to do system-wide. The first problem is a simple one of what
to do if something goes wrong. Running out of disk space is an obvious
problem for which the script has no provisions. Another, more serious, problem
is that gzip in particular, but also gunzip, are CPU and memory hogs. Disk
space is a lot cheaper than more CPU or memory. Yet another issue is
that, for most users, running gzip et al is in fact a lose; most people
only have one participation file that is fairly small. Running tar &
gzip actually produces a larger file. A typical example: a 2444-byte
.agora31.cf file turns into a 10240-byte tgz archive. Just for the
record, the block size on /a and /c is 8192 bytes and the fragment size is
1024 bytes. A difference in size that does not cross a 1K boundary will
not save any space at all.
A more interesting way to save space would be to fix pine so that it
writes out .pinerc's that don't have the helpful comments in them. The
.pinerc that pine currently creates is about 10322 bytes in size;
stripped of all comments, there are only 1208 bytes of "real" data in it.
Pine is a notorious CPU hog, but nevertheless we apparently encourage
all new users to use it, and pine is far & away more popular than
PicoSpan. Also, mail of various forms accounts for much more disk usage
than computer conferencing. At a sheer guess, it's very probable that
unmodified .pinerc's account for more disk usage on grex than all the
participation files put together.
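For what it's worth, here is a crude way to test that guess, assuming home
directories live under /a and /c as noted above and that an untouched .pinerc
still weighs in at roughly 10K:
# count and total the .pinerc files still large enough to look unmodified
find /a /c -name .pinerc -size +9000c -exec ls -l {} \; 2>/dev/null |
    awk '{ n++; total += $5 } END { print n " files, " total " bytes" }'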
The right answer to *all* of this is simply to install more disk.
|
prp
|
|
response 9 of 73:
|
Dec 10 06:53 UTC 1999 |
A potential solution to the "Eggdrop" problem: limit FTP transfers into Grex
to "participating IDs", where a participating ID is defined as one
that has entered responses in three separate twenty-four-hour periods over the
last year, or whose owner has had an email exchange with staff about why they
would like to use FTP, or something similar that's easier to implement.
|
keesan
|
|
response 10 of 73:
|
Dec 10 18:17 UTC 1999 |
Jdeigert and I occasionally use grex to download a file over 1M (for instance
DOS-based software for dealing with gifs and jpegs). We try to get the larger
files deleted within a day or two.
|
other
|
|
response 11 of 73:
|
Dec 10 18:31 UTC 1999 |
how about launching an elm education campaign and making elm the default
mailer on grex. it seems less resource intensive, and if the .elmrc files
are large we can do the comment removal there, too...
and the features elm offers are not insubstantial...
|
pfv
|
|
response 12 of 73:
|
Dec 10 18:44 UTC 1999 |
Why on *earth* would you use grex to DL *DOS* software?
|
gull
|
|
response 13 of 73:
|
Dec 10 19:04 UTC 1999 |
Re #11: Ugh. I used elm for a while. It's *nasty*. I particularly hate
its inability to deal with attachments in a non-brain-dead manner. It seems
to lack much of the power of Pine, and struck me as being kind of cobbled
together.
Re #12: It makes sense if Grex is one's only ISP.
|
pfv
|
|
response 14 of 73:
|
Dec 10 19:47 UTC 1999 |
Grex is not an ISP - that's commonly stated.
Sorry, sans attachments, elm is primo. For attachments, pine is
certainly preferred. Does this then imply that THAT is why all
these "people" use pine..? For attachments..?
(gee, that's sure a surprise *sigh* )
|
don
|
|
response 15 of 73:
|
Dec 10 20:53 UTC 1999 |
re 5, it can be done simply with the following command:
find . -maxdepth 7 -type f -size +750k -exec rm -f {} \;
basically, when this is run from /a and /c, it will search through 7 directory
levels (so it won't get messed up when some idiot nests a ton of directories)
for files (so it won't delete directories) larger than 750 KB,
and delete them. This is the simplest way to do it, although I'm sure
enhancements could be made to fix technical problems. I've been running this
on /a and have found 4 big files (2 core dumps, one of which was mine, which
I promptly deleted, and 2 binaries in droy's directory). So it seems to work.
re 6, then you can just copy it. If 750k doesn't seem effective enough, it
can easily be changed. And the -maxdepth switch will protect against deep
directory trees.
I think there's also a way to add a little sendmail message, but my bash
programming skills are rusty, and it would take me a while to figure out how
to have two exec commands within find.
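For what it's worth, find will take more than one -exec clause in a single run,
so something along these lines ought to work for at least logging what gets
removed (the log path here is just made up):
# both -exec clauses run for each matching file: first log its details, then remove it
find . -maxdepth 7 -type f -size +750k \
    -exec ls -l {} \; -exec rm -f {} \; >> /tmp/bigfile-removals.log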
|
flem
|
|
response 16 of 73:
|
Dec 10 21:41 UTC 1999 |
It seems like it wouldn't be too difficult to write a script that would
strip stock comments out of a .pinerc, while still leaving custom user
comments.
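For example, if a pristine copy of the stock .pinerc were kept around (the
/usr/local/lib/pinerc.stock path below is made up), one rough approach would be
to drop every line that matches the stock file verbatim and keep everything the
user added or changed:
# -F literal strings, -x whole-line matches, -v keep only lines NOT in the stock copy
grep -F -x -v -f /usr/local/lib/pinerc.stock $HOME/.pinerc > $HOME/.pinerc.new &&
    mv $HOME/.pinerc.new $HOME/.pinerc
Settings still at their stock values would get dropped too, but pine should
simply fall back to its defaults for anything that's missing.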
|
mdw
|
|
response 17 of 73:
|
Dec 11 04:21 UTC 1999 |
#15 - using "exec" with find in that fashion is dangerous - if you're not
sure why, then you shouldn't be coding find scripts to run as root.
People use pine because we tell them to. For instance, new users are
given a .mailrc that tells them to use pine instead.
|
gull
|
|
response 18 of 73:
|
Dec 11 04:23 UTC 1999 |
Re #14: I practically never get attachments here (except that stupid HTML
email stuff), but I still use Pine because it's easy to use and I'm used to
it.
|
don
|
|
response 19 of 73:
|
Dec 11 17:38 UTC 1999 |
re 17, yes, I understand why it would be dangerous -- the rm could go on a
rampage killing every 750+k file in sight. But that's what it's *supposed* to
do in this case. None of the system-sensitive stuff is on /a or /c, so all
that would be killed are large space wasters. And, like I said, that was
just a skeletal idea; of course some sort of technical change would have to
be made to have it work better. Take out the -f or change exec to ok if you
want.
|
mdw
|
|
response 20 of 73:
|
Dec 11 18:30 UTC 1999 |
Nope, you don't understand the dangers. Just for starters -- exec looks
for "rm" in the path -- if . is before /bin, then it's trivial for the
vandal to make find run his own program, which gives him instant root
access. Usually root has a path that doesn't have . in it, but it's
still good practice to say "/bin/rm" not "rm" in such a script.
Secondly, find does an "lstat" to look at entries, and later on does a
"chdir" to traverse directories. The vandal can play race games here,
and by rapidly renaming files, can probably trick "find" into escaping
out of /a and /c, and erasing all those big files we don't really need
in / and /lib, like vmunix, libc.a, etc. Now, "-xdev" will stop that,
but it's still possible for the vandal to remove any file he wants on /a
or /c by doing a variation on the race game, with some patience.
|
don
|
|
response 21 of 73:
|
Dec 11 19:47 UTC 1999 |
Okay, easy way to fix that... make exec call /bin/rm instead, and add the
-xdev (see the sketch below). And as for vandals, I'm sure there are ways to fix that kind of hole
just as everything else on this system has been patched up for added security.
Perhaps there's a better script/program that could do this. I was just
offering an idea. But a vandal who wants to wreak havoc is going to wreak
havoc sooner or later anyway. That's why there are backup disks. Do you have
a better idea to fix the disk space problem?
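Putting the /bin/rm and -xdev fixes together with the notification idea from
response 15, a sketch might look like this. It assumes the GNU-style find
switches used above and a local mail(1) command, and mdw's race-condition
caveat still applies:
find /a /c -xdev -maxdepth 7 -type f -size +750k -print |
while read f
do
    owner=`ls -ld "$f" | awk '{ print $3 }'`    # the file's owner, from the ls listing
    /bin/rm -f "$f"                             # full path to rm, so PATH games don't matter
    echo "$f was over 750k and was removed to free disk space." |
        mail -s "large file removed" "$owner"
done
Filenames with embedded newlines could still confuse the read loop, so this
isn't vandal-proof either; it's just a starting point.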
|
devnull
|
|
response 22 of 73:
|
Dec 12 00:30 UTC 1999 |
Re #9: I'm inclined to suspect that just requiring people to send mail to
some alias, and having something automatically skim the mail looking for
keywords in some cases to approve ftp might work, or at least improve the
situation.
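Purely as a sketch of the skimming part (the keyword and the approved-list path
are both invented here), the alias could pipe each incoming message through
something as small as:
#!/bin/sh
# read one mail message on stdin; if it mentions "ftp", record the sender's address
msg=`cat`
from=`echo "$msg" | sed -n 's/^From: *//p' | sed 1q`
echo "$msg" | grep -i ftp > /dev/null && echo "$from" >> /usr/local/lib/ftp-approved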
|
mdw
|
|
response 23 of 73:
|
Dec 12 07:27 UTC 1999 |
I'm one of the people who got to figure out how to restore some files on
/etc that got destroyed by accident. Are you *sure* you want to tell me
this is a trivial process?
|
i
|
|
response 24 of 73:
|
Dec 12 13:56 UTC 1999 |
Re: #21
- Grex has NO backup disks that I know of. Backup *tapes*, yes....but
tape backups are only done weekly, and recovering some files from tape
is probably an hours-long job for our all-volunteer staff.
- Grex has been pretty successful in disproving that "...is going to
wreak havoc sooner or later..." thesis. With anonymous free shell
accounts from almost anywhere in the world, a C compiler, about 25,000
users, etc., conventional sysadmin wisdom is that grex should be a
doormat/piece of cake/etc. for vandals. Well, grex isn't - thanks
to a bunch of security experts on staff - who have good reason to be
paranoid about opening a little security hole while trying to fix
another problem.....
|