Grex > Coop11 > #239: brainstorming solutions for the full disk problem
valerie
response 123 of 203:
Apr 6 20:20 UTC 2001
This response has been erased.
pfv
response 124 of 203:
Apr 6 20:26 UTC 2001
I sympathize, Val... I really do... You'll not GET "a solution" and,
the way things are arranged, no one who WOULD help you COULD help
you - even tentatively. Between calcification and perms, a potential
solution can't even be tested.
I'm sorry, I wish I could help.
i
response 125 of 203:
Apr 6 22:42 UTC 2001
Was there any plausible way to implement my very-limited-newbies/not-so-
limited-old-timers idea (back around #51)? (Did it have enough basis in
fact to be really usable anyway?)
If we could get a fair fraction of a cure by putting a few more of our
2GB drives on line, and the too-limited time of the few staffers who grok
the hardware well enough to do that is the problem, could we organize a
volunteer drive to mow the lawns, watch the kids, wash the cars, etc. of
those staffers while they get the extra drives on line?
If Technical Solution X would sufficiently cure the problem, but a lot of
folks are highly opposed to X, do we need to go to a segregated system
where users pick between partition /X_but_usable and /non-X_often_full
and live with their choice?
tenser
response 126 of 203:
Apr 7 20:24 UTC 2001
Regarding #120; Sorry I haven't had time to respond recently; it's that
great American pastime of chasing the almighty dollar. :-)
I don't understand your example using hard links.
In particular, it's easy to see how the *original* problem worked
(ie, I have two users, mary and joe; joe sees that Mary's maildrop in
/var/spool/mail doesn't exist and so does ( cd /var/spool/mail && ln joe
mary ), causing all sorts of merry problems [pun intended]). However,
when moving mail to the user's home directory, this attack no longer
works because joe *cannot* write to mary's directory to create a hard
link to his maildrop. Sure, he can hardlink *her* file to his own, if
hers already exists, but what does that get him? His mail goes to her?
I'm not sure that would even work....I don't know for sure, but I'm
willing to bet that even sendmail isn't stupid enough to deliver into
a file if it's neither owned by the intended user nor owned by root.
He certainly can't read it, since the permissions in the inode don't
give him that access (recall that two files that are ``hard linked'' to
each other are just two directory entries that point to the same inode,
and that permissions are stored in the inode).
So, either Joe's email would go to mary (and what would the point of
that be? The only malicious thing I can think of would be a denial
of service attack, but there are other ways of doing the same thing),
or Joe's email would just bounce (again, only really good for denial
of service, this time against grex itself, but there are other ways of
doing the same thing).
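The inode point is easy to verify with throwaway files; this sketch uses
disposable paths under mktemp, purely for illustration, to show that two
hard-linked names share one inode and one set of permission bits:

```shell
#!/bin/sh
# Two hard-linked names are just two directory entries for one inode;
# the permission bits live in the inode, so they are shared.
# All paths are throwaway illustrations, not grex's real layout.
set -e
dir=$(mktemp -d)
echo "mail for mary" > "$dir/mary"
ln "$dir/mary" "$dir/joe"              # second name, same inode

ino_mary=$(ls -i "$dir/mary" | awk '{print $1}')
ino_joe=$(ls -i "$dir/joe" | awk '{print $1}')
[ "$ino_mary" = "$ino_joe" ] && echo "same inode"

chmod 600 "$dir/mary"                  # restrict via one name...
mode_joe=$(ls -l "$dir/joe" | cut -c1-10)
[ "$mode_joe" = "-rw-------" ] && echo "mode shared"

rm -rf "$dir"
```

Restricting either name restricts both, which is why joe linking to an
existing mailbox of mary's gains him no read access.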
Regarding NFS vs. AFS.... I was not aware that AFS does encryption;
does OpenAFS? Can you point me to a documentation reference so that I can
turn it on? :-) Certainly, there are features in AFS that don't exist
in NFS (in particular, the AFS caching model remains superior to that of
NFS) but then they're not the same product, and there are ``features''
in NFS that aren't in AFS (paying attention to the Unix permission bits,
for instance).
You bring up good points regarding locking, semantics, and legacy
support. Yes, any network filesystem must have different semantics
than the ``normal'' Unix file model presents. It's not clear that one
must teach *every* program how to deal with that, though, especially if
compatibility hooks are provided (eg, an fsync(2) call that ``does the
right thing''). Indeed, even under Unix, just because write(2) returns
doesn't necessarily mean that my data is on disk. I must still fsync(2)
or close(2), and applications should already be written assuming as much.
The jump from that to a network filesystem (as long as it has decent
locking) isn't really that big for most code.
Regarding reliability for mail and network filesystems, it would seem
that on a system like grex, the best thing to do would be deliver
mail onto the file server. If using NFS, you end up dealing with the
traditional FFS semantics for reliability by doing so. It's worth
noting that mail is weird; it's really something that needs ACID style
transactional semantics, but the Unix file model isn't really robust
enough to provide that. Certainly, I don't know any DFS's that are.
Yes, the concepts of root and setuid become particularly hairy when
extended over the network, but both concepts are flawed. root was an
architectural necessity at the time Unix was written, and setuid was
a workaround for limitations in the overall Unix model (in particular
the lack of per-process file namespace manipulation and a complete form
of IPC; half-duplex pipes aren't really good for much other than simple
coroutine sequences). It's worth noting that in the follow-on systems
that have been developed at Bell Labs, both concepts have been eliminated.
In the modern Unix model, we now have decent IPC and ways to work around
a lot of the root issues, and indeed, network filesystem access provides
hooks (although of a rather limited nature) to weaken or eliminate
the concepts of root and setuid.
The only other thing I have to say is that I disagree with the idea that
grex is unique. Grex represents the union of several things, which when
combined may be somewhat unique, but the discrete components are not.
In particular, the idea of untrusted access by possible miscreants is seen
by any university that uses Unix. Many have been using networked Unix
systems in these environments for decades now. Another thing is high I/O
requirements. Most database servers running Oracle under Solaris see 10
times the load that grex sees with few problems. Many of these also exist
in untrusted networked environments (consider a popular database-backed
web site). I think it's a mistake not to leverage the wisdom that these
sites have gained over the course of years and apply it here. Indeed,
both the user load and the I/O load are somewhat light in comparison to
many other sites which have successfully employed network filesystems
for an untrusted userbase for years, with few problems.
mdw
response 127 of 203:
Apr 9 02:39 UTC 2001
I haven't time to respond to all that just now, but one quick thing: I
*work* at a large university. Trust me, universities *can* kick people
out for doing bad things with computers, and the "big stick" model works
very well there. I suspect sites that use Oracle in production don't
order bottom-of-the-line IDE drives.
tenser
response 128 of 203:
Apr 9 04:46 UTC 2001
Regarding #127; Yes, but I used to work for a large university as well.
(In fact, a football rival of yours, but I digress....). Yes, you can
kick some students out, but not all of them, and it's better to assume
evil rather than good in such a setting. I certainly know that it's
easier to do so than to clean up after some overzealous undergraduate
trying to impress the ``cute girl'' in CS 101.
Is grex ordering bottom of the line IDE drives now? I never suggested
such a thing, though someone else did. In fact, I argued against it.
btw- you'd be surprised what some folks run Oracle on in production....
Hmm, on second thought, perhaps ``mortified'' would be a better word.
But the point is that it's just not accurate to assume a priori that grex
is in some way unique with respect to load or user behavior profile.
mdw
response 129 of 203:
Apr 9 04:58 UTC 2001
I see you aren't done sweating. Yes, it's true, *in theory* you
shouldn't be able to hardlink *to* other people's directories. "Should"
is an important weasel word there; there are people who carelessly
leave their home directory at mode 775 or even 777, and there is the
ever-present trojan horse (granted, there are other more obvious
problems with trojan horses.) With symbolic links, it's possible to have
more "fun"; exactly what depends on how other things are implemented.
For instance, does the mail delivery agent run as root? Or as the user?
In what order and using what calls are the file mode/perms checked?
There isn't an atomic way to open a file and avoid symbolic links in
Unix, so whatever set of calls is used, there's probably some trickery
that can be done. Also, I never said it was impossible to do all this
right, just difficult. The real problem is what we have here is a
"complex" problem, and in the security world, "complex" is a bad word.
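A quick demonstration of the symlink concern, using throwaway paths with
"mailbox" standing in for a real spool file; note that the link check at
the end is a separate system call from the append, which is exactly the
non-atomicity at issue:

```shell
#!/bin/sh
# Demo: writes follow symlinks, so a planted link redirects delivery.
# Throwaway paths only; "mailbox" stands in for a real spool file.
set -e
dir=$(mktemp -d)
echo "victim data" > "$dir/target"
ln -s "$dir/target" "$dir/mailbox"     # attacker plants the link

echo "new mail" >> "$dir/mailbox"      # a naive append follows the link

redirected=no
grep -q "new mail" "$dir/target" && redirected=yes
echo "write landed in target: $redirected"

# The usual defense is to refuse symlinks -- but this test is a
# separate system call from the append above, and that gap is the
# race window described here:
is_link=no
[ -L "$dir/mailbox" ] && is_link=yes
echo "path is a symlink: $is_link"

rm -rf "$dir"
```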
tenser
response 130 of 203:
Apr 9 05:44 UTC 2001
Regarding #129; if I chmod my directory 777 or even 775, I have far
bigger problems than someone hard or symbolic linking my mail spool file
somewhere else. The attack you describe requires this as a prerequisite
for working, but at this point, I'm already compromised.
No, I'm not done sweating, but then, I never really began sweating.
That's because extremely qualified people have spent quite a bit of time
analysing this problem already, and they have done the sweating for me.
Those people have determined that it's perfectly safe to deliver into
$HOME, and indeed, that doing so is safer than delivering into /var/mail.
Some of them even offer cash rewards to those who can prove otherwise.
If that's not intellectual honesty, I don't know what is.
On the other hand, vague rumblings about hard links and soft links and the
ordering of system calls are far less convincing, since the techniques
for doing all this securely have been well understood for a decade.
These remarks are especially unconvincing if they are predicated
upon users doing something which completely compromises themselves
as a prerequisite. They become nearly inconsequential when put into
the context of, ``all the hard work has already been done for you.''
Yes, email is complex because of all the security domains it crosses,
but I really don't understand why you seem to continue implying that it
hasn't been done already, and with a demonstrated good security track
record, to boot....
btw- the point of going to home directory delivery is to avoid running
things like delivery agents as root, in addition to freeing up space.
One other thing I'd like to point out is that the large, monolithic,
setuid root program that grex currently uses to deliver mail has been
demonstrated over the course of many years to have a far worse track
record with respect to security than almost any other commonly used
software package on the net. It seems to me that, in the interest of
security, one ought to be seeking to replace that as soon as possible
instead of worrying about largely theoretical attacks against already
compromised users.
http://www.qmail.org/
http://www.postfix.org/
(btw- Why do I feel like I'm on USENET all of a sudden? :-)
mdw
response 131 of 203:
Apr 9 05:56 UTC 2001
Actually, grex doesn't use sendmail to deliver mail.
tenser
response 132 of 203:
Apr 9 16:00 UTC 2001
No, true, sendmail isn't what actually writes to the user's mailbox,
that's another horribly insecure piece of code (unless you've really
hacked mail.local). But then I meant ``deliver mail'' in the
all-encompassing sense.
But semantic nits don't mitigate the security problems that running
sendmail presents.
gull
response 133 of 203:
Apr 9 17:37 UTC 2001
This debate is interesting, particularly to someone like me who's
fascinated by security issues. But it doesn't really address the
problem.
Since quotas, automatic clean-up programs, and limits on file transfer
sizes have all been rejected as solutions, I'm going to suggest that we
simply reduce the amount of time an account can be idle before it's
reaped. Since most eggdroppers probably give up on Grex after realizing
eggdrop won't work here, this would reduce the amount of disk space
consumed by causing their accounts and the associated files to be
removed more quickly.
I'm not sure what the reap time is currently, but 60 days with no
activity is pretty common on free web sites. If you're concerned about
inadvertently reaping college students during summer vacations, etc.,
there are two options: Allow people to request that they be exempted
from reaping, or make the reap time longer for people who have logged in
more than, say, 10 or 20 times. I'm guessing few eggdroppers stick
around that long, though it'd be interesting to research that. Maybe
pfv could check, since he seems to enjoy locating this stuff.
I'm less concerned about things like stored mail. Those are reasonable
uses for the system. If we get rid of most of the abuse problem, and we
still have space problems, then it's time to add more disk. But we need
a better way to manage the abuse problem than staff manually going
through the disk.
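As a sketch only: assuming a /home-style layout and using newest-file
mtime as a stand-in for login activity (a real reaper would consult
login records instead), flagging idle accounts might look like:

```shell
#!/bin/sh
# Hedged sketch of idle-account reaping: flag accounts under $root
# whose files have all gone untouched for more than $days days.
# The layout and the mtime heuristic are assumptions for illustration;
# a real reaper would key off login records (e.g. lastlog).
reap_candidates() {
    root=$1
    days=$2
    for home in "$root"/*; do
        [ -d "$home" ] || continue
        # Anything modified within the window? Then the account is live.
        recent=$(find "$home" -mtime "-$days" 2>/dev/null | head -n 1)
        [ -z "$recent" ] && basename "$home"
    done
}
```

Something like `reap_candidates /home 90` would print one username per
line; exemptions, or a longer window for people who've logged in 10 or
20 times, would be extra bookkeeping on top of this.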
scott
response 134 of 203:
Apr 9 18:17 UTC 2001
Reap time is something like one week if an account is only ever logged into
once.
steve
response 135 of 203:
Apr 9 23:38 UTC 2001
Reap time is after 90 days of inactivity, OR if the account was used
only in the first 24 hours after its creation, it can be reaped after 21
days.
mdw
response 136 of 203:
Apr 10 05:07 UTC 2001
We really hacked mail.local. Actually, *I* really hacked mail.local,
hence my lack of enthusiasm for changing horses. If we were doing this
"from scratch" today, it's very likely sendmail wouldn't be our first
choice. Back when we switched *to* sendmail, it really was the best
choice (um, you do know why we switched *to* sendmail, don't you?) At
this point, sendmail (now sadly mutilated from its original form) seems
to be serving people well enough functionally speaking. I think if
nothing else, we can agree that switching from sendmail to something
else is not going to help the home directory disk usage problem in the
slightest, which means it's not really relevant to the topic at hand
here.
I had a look at the kernel config. So far as I can tell, we'd need to
build a new kernel that knows about a 2nd controller, but the controller
should then be just a drop-in solution. We might be able to install up
to 4 controllers, if we wanted. I'm not sure whether we have any
cabling issues, but I believe we have enough spare disk enclosures to
have more drives (even 3.5" drives, if we had any.)
tenser
response 137 of 203:
Apr 10 23:03 UTC 2001
I assumed you switched to sendmail to get rid of the abomination that
was sendmail.mx. Ahh yes, one still sees it on the disk, although
chmod'ed to 400 now. I myself escaped on at least a couple of systems
via the Zmailer, smail, or MMDF routes. The latter being, as Henry
Spencer might put it, ``a particularly revolting invention of satan.''
(Actually, I never thought it was that bad, but it was weird.)
I disagree that you wouldn't save any disk space, as you could put
mail drops in user home directories at that point (at the expense of
many wasted feet of backup tape), which is what the whole argument was
about, anyway. But I, too, am tired of discussing it, so it's probably
best to just agree to disagree. :-)
i
response 138 of 203:
Apr 11 00:21 UTC 2001
Ignoring for the moment the really-cool-to-read technical discussions,
are there any semi-decent partial solutions to the original problem in
the offing? If so, what are they? What's between us and having them
implemented?
tenser
response 139 of 203:
Apr 11 04:11 UTC 2001
Regarding #128; Walter, it seems that the problem is creating a net big
enough to let the legitimate users come in and feel comfortable, but
small enough to catch the IRC weenies, thus encouraging them to go away.
It seems that a lot rides on stopping the incoming flood of, eg, eggdrop,
but we really don't know.
Let me ask this. Valerie, you said you spent all day the other day
cleaning up grex's disks.... What did you actually delete? Knowing that
gives us the basis for figuring out where to best target our resources.
Also, does anyone know how this stuff actually gets on the system? Ie,
is it via FTP, email, SSH, or something else? Do people frequently use
lynx to download eggdrop or other goodies? (Don't worry, I'm not going
to suggest making lynx go away... :-)
If a lot of the problem is junk coming in via FTP, then perhaps one thing
to do would be modify the FTP daemon so that users can't transfer files
larger than a certain size (say, 30-50KB) on the first day that they
have an account. *After* the first or second day, the limit disappears.
The idea here is that legitimate new users have enough access so that
they can bring over personalized dot files or programs they've written
or what have you, but eggdrop and friends are too big to fit through
the hole. For legitimate users, they'd probably never even notice.
On the other hand, the one-off IRC weenie looking for an easy kill will
get discouraged and split, probably forever, but without leaving behind
large eggdroppings.
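A sketch of the policy half of this idea only; the 50KB figure mirrors
the suggestion above, and the hook into the FTP daemon itself is
omitted:

```shell
#!/bin/sh
# Hedged sketch of the first-day transfer cap: map account age in days
# to a per-file byte limit. The 50KB figure mirrors the suggestion
# above; wiring this into the FTP daemon is not shown.
xfer_limit_bytes() {
    age_days=$1
    if [ "$age_days" -lt 2 ]; then
        echo 51200          # 50KB cap while the account is brand new
    else
        echo 0              # 0 means unlimited after the first day or two
    fi
}
```

Dot files and small personal programs fit under the cap, so legitimate
new users would probably never notice, while eggdrop and friends would
not fit through the hole.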
mdw
response 140 of 203:
Apr 11 08:02 UTC 2001
Actually, we went from sendmail, to smail, *back* to sendmail. smail
turns out to have really *bad* behavior in the face of congestion, and
was generally less efficient. Evidently most places are used to a
considerably more favorable machine/user ratio. Sendmail, with its
early production history on VAX-11/750's and other old big slow
expensive hardware, turned out to be better adapted to our needs.
While I wouldn't say that we've done a *lot* of experimentation with
eggdrop users, there is some evidence that these people know they are
doing something "bad", and that they're willing to try whatever it takes
to get around the annoying technical blocks. I haven't looked at this
recently, but at one point, I had the impression that a lot of the
vandal tools we found on grex were due to eggdrop users who tried
running it here, found that it didn't work, and decided to try to break
root so they could "fix" the problem. It's as if there were a FAQ out
there somewhere that included directions like "... find a free shell
provider. Install eggbot. If it doesn't work, become root and fix the
problem. ..." I should say the problem isn't *just* egg-drop; there were
a number of other "bot" packages (some of them sounding decidedly
non-friendly) that seemed to be just as popular.
russ
response 141 of 203:
Apr 12 02:13 UTC 2001
Re #139: Banning large files is definitely the wrong thing to do.
It's too easy to work around (use "split" and ftp the resulting
little files, uuencode them and send them by mail, etc.). What it
would mostly do is make the operations harder to find and clean up.
Having ftp, ssh etc. *watch for* and *log* files over a certain size
or having names matching certain patterns would be helpful. You
could schedule automatic cleanups of 'bot programs, or trigger alarms
if vandal tools were detected. (More subtle interventions, such as
carefully buggering the source of the vandal tools so that they'd
compile OK but core dump when they got to the right spot, are possible.
I doubt that anyone here has time to do that. I sure don't.)
In the case of people using vandal tools, reporting them to their ISP
and mentioning that they violated our terms of service might be helpful.
Nothing like getting someone kicked off the 'net to keep them from
bothering us for a while.
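The watch-and-log idea as a cron-able sketch; the 1MB threshold, the
name patterns, and the paths are all illustrative placeholders:

```shell
#!/bin/sh
# Hedged sketch of "watch for and log" rather than block: report files
# over a size threshold or with names matching known 'bot packages.
# Threshold, patterns, and paths are illustrative placeholders.
scan_hogs() {
    root=$1
    # Files over 1MB (-size counts 512-byte blocks, so +2048 = >1MB):
    find "$root" -type f -size +2048 2>/dev/null
    # Names matching known bot/vandal tools:
    find "$root" -type f \( -name 'eggdrop*' -o -name '*.bot' \) \
        2>/dev/null
}
```

Run from cron with the output appended to a log, this gives staff a
chance to review and schedule cleanups later, rather than blocking
transfers as they happen.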
aruba
response 142 of 203:
Apr 12 14:46 UTC 2001
The idea of logging large transfers and then cleaning up after them later
appeals to me. If we waited, say, a day after large transfers before
deleting the files, that would give the vandal time to find out that
eggdrop doesn't work and go away. (It also gives legitimate users
leeway to transfer files for short periods, and delete them quickly.)
I think Russ is correct that if we try to stop transfers as they happen,
then people will find a way around.
I also thought tenser's suggestion that we get some hard data on what
files are problems and how they arrive was a good one.
pfv
response 143 of 203:
Apr 12 15:22 UTC 2001
re 142:
Why not leave the hogs there only until the user leaves or gets
disconnected? (Even 24 hours can be dangerous, when several of these
folks decide to spend the day downloading/extracting/compiling,
since it can fill the drive quite quickly).
aruba
response 144 of 203:
Apr 12 16:13 UTC 2001
Maybe we could start at 24 hours, and then move the deadline back if we have
problems.
pfv
response 145 of 203:
Apr 12 19:27 UTC 2001
I agree, but I wondered about the technical problems (remember I
asked about 'logout' above?).
What triggers what and HOW and WHEN seems to be the foremost issue.
jared
response 146 of 203:
Apr 15 18:33 UTC 2001
Hmm.. 80G of disk for $234 including 2-day shipping.
nether.net is getting its disk upgrade this week :)
gull
response 147 of 203:
Apr 15 20:34 UTC 2001
"I buy the smallest disks I can find, because I just don't believe you
can reliably squeeze 80 gigs into such a small package." -- the author
of SpinRite, in a radio interview.