Grex > Helpers > #140: Grex System Problems - Spring 2005
|
tod
|
|
response 206 of 457:
|
May 3 15:14 UTC 2005 |
Yea, Dan. It's the DISKS, Dan. It gives worms to ex-girlfriends, Dan.
<pats self on back like bird flapping wings>
What's wrong with the RAID suggestion? It makes sense. If RAID won't fix
the problem then the OS needs to be replaced.
|
cross
|
|
response 207 of 457:
|
May 3 15:39 UTC 2005 |
This response has been erased.
|
cross
|
|
response 208 of 457:
|
May 3 15:41 UTC 2005 |
This response has been erased.
|
tod
|
|
response 209 of 457:
|
May 3 15:42 UTC 2005 |
re #207
> Okay, I'll let the cat out of the bag: Staff had a report of a
> security problem where random, unauthorized users could run *cat*
> on a tty device and see users connecting and typing their passwords.
This was on Grex? Were the users notified that they should change their
passwords?
|
gull
|
|
response 210 of 457:
|
May 3 15:46 UTC 2005 |
FWIW, I think the security arguments for OpenBSD over FreeBSD are
overstated. FreeBSD gets the benefits of OpenBSD's code audits, because
a lot of code is shared. I also suspect FreeBSD has a larger installed
base, which tends to flush out driver problems sooner. I never ran into
them on my x86 machines. I've run into a few on my AlphaPC, but Alpha
is a minority platform that doesn't receive as much testing.
I'm not trying to weigh in on one side or the other here. I'm just
saying that the two operating systems are, from my perspective,
extremely similar, so I think in many ways it's an arbitrary decision.
Migrating from OpenBSD to FreeBSD, if you choose to do so, would
probably be fairly painless; much of the configuration is kept in the
same places. It still may not be easier than fixing what you've got,
though.
(Incidentally, keep in mind that OpenBSD's much-touted "only one hole in
the last 8 years" security claim applies only to *remote* exploits.
That suggests to me that security in a situation where you have local
shell users may not be their first priority.)
|
cross
|
|
response 211 of 457:
|
May 3 16:32 UTC 2005 |
This response has been erased.
|
tod
|
|
response 212 of 457:
|
May 3 16:36 UTC 2005 |
re #203
> Did I gather correctly that an ordinary unprivileged
> user can take Grex down with a fork bomb? Haven't we set per-user file,
> memory, and process limits to reasonable values?
That's what I read from the explanation.
re #211
> What's more, most of the security auditing that happens is poorly done
> by amateurs. I wouldn't rely on it to run a bank.
I would hope you would want better security for the financial or healthcare
sector than for Grex, too. ;) At least with Grex, I'd hope we could find a
way to keep the system from crashing for several days at a time.
|
mcnally
|
|
response 213 of 457:
|
May 3 16:53 UTC 2005 |
Since I don't want to see this dissolve into a BSD-vs-BSD flame-war
death match, I'm going to try to subvert the conversation by proposing
some concrete suggestions that don't require immediate consensus on
the OS issue and don't require planning an OS upgrade or replacement
any time soon.
How about if we begin by:
Immediate, Critical:
1) Fixing the TTY security problem if it isn't already solved.
2) Making sure that sensible run-time limits are enforced and that
   no ordinary user can cripple or crash the system with a fork
   bomb.
Very Important:
3) Ensuring that the Exim version is sufficient to use recent SpamAssassin
   integration features, and beginning to test system load under a more
   aggressive spam-filtering program.
4) Verifying whether network driver support for RealTek NICs really is
   affected by known bugs and, if so, adding a new Ethernet card based on
   a better-supported chipset.
5) Considering setting up CVS or another versioning system to checkpoint
   multiple backup copies of critical system files like /etc/passwd.
6) Researching OpenBSD disk problems to see whether others are experiencing
   similar crashes. If they are, we should reconsider OpenBSD; if not, we
   should consider hardware problems as a root cause.
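As a rough illustration of item 5, here is a minimal Python sketch of checkpointing a file without a full versioning system: it keeps timestamped copies and only makes a new one when the contents have changed. The names `checkpoint`, `file_digest`, `backup_dir`, and `keep` are made up for this example, and nothing here reflects what Grex staff actually run.

```python
import hashlib
import shutil
import time
from pathlib import Path
from typing import Optional


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def checkpoint(source: Path, backup_dir: Path, keep: int = 10) -> Optional[Path]:
    """Copy `source` into `backup_dir` if it differs from the newest copy.

    Returns the new backup path, or None when nothing changed.
    `keep` (assumed to be at least 1) is how many copies to retain.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    copies = sorted(backup_dir.glob(source.name + ".*"))
    if copies and file_digest(copies[-1]) == file_digest(source):
        return None  # unchanged since the last checkpoint
    # Zero-padded nanosecond timestamp so lexical order matches time order.
    target = backup_dir / f"{source.name}.{time.time_ns():020d}"
    shutil.copy2(source, target)
    # Prune the oldest copies beyond the retention count.
    for old in (copies + [target])[:-keep]:
        old.unlink()
    return target
```

Run from cron against something like /etc/passwd, this would leave a short history of known-good copies to fall back on. A real versioning system such as CVS would also record who changed what and why, which this sketch does not attempt.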
|
nharmon
|
|
response 214 of 457:
|
May 3 17:42 UTC 2005 |
> I wouldn't rely on it to run a bank.
With most financial institutions, security is concentrated on the perimeter,
usually because the mainframe systems that run banking software use insecure
operating systems (Windows 2000 Datacenter comes to mind).
|
tod
|
|
response 215 of 457:
|
May 3 17:50 UTC 2005 |
re #214
> With most financial institutions, security is concentrated on the perimeter
Actually, security is concentrated "in depth," as in at multiple layers, like
a fortress with a moat, gate, guard tower, huge wall, etc.
A firewall simply doesn't cut it anymore when you have GLBA worries, IT
productivity problems, password headaches, etc.
The least you should have is two firewalls of different flavors at the
perimeter of a financial institution, but this is not a DMZ or IPS discussion.
The fact is, Grex had a security flaw and it wasn't reported to the users.
I'm disheartened at how this and the subpoena discussions have been buried
from the public discussion.
|
marcvh
|
|
response 216 of 457:
|
May 3 17:58 UTC 2005 |
Sure, and also financial security is based on the concept of transactions
and auditability. Grex doesn't have such beasts.
|
cross
|
|
response 217 of 457:
|
May 3 18:30 UTC 2005 |
This response has been erased.
|
nharmon
|
|
response 218 of 457:
|
May 3 18:32 UTC 2005 |
Re #215 - In Grex's defense, perhaps the Coop conference is a more appropriate
place to discuss Grex policies regarding notifying users. I've posted an item
that hopefully attracts some comments on the pros/cons.
Re #216 - It used to be that banks didn't have to care very much about their
customers' names and addresses, etc., because this data was regularly bought
and sold to other companies. But the GLBA now requires us to safeguard this
information with the utmost diligence...to the extent that some banks will
fire employees for not locking their PCs and leaving them with customer
information still on the screen.
|
steve
|
|
response 219 of 457:
|
May 3 18:39 UTC 2005 |
We use a Broadcom 5702x NIC.
Grex isn't a transaction system. I will agree that such a thing presents
more of a load than Grex does, but it also has hardware better suited to that
task. We've *listened* to the disks Dan. Honestly. There were times on the
Sun-4/670 that you could just sit there and hear them madly running around.
Perhaps I didn't say it well enough but OpenBSD may be significantly different
from SunOS in this regard; maybe it will be kinder on the disks due to caching
issues. I guess we'll see.
|
tod
|
|
response 220 of 457:
|
May 3 18:49 UTC 2005 |
Maybe IDE would be kinder than SCSI?
|
naftee
|
|
response 221 of 457:
|
May 3 18:51 UTC 2005 |
i use FreeBSD and Realtek and am pleased by the performance of both.
|
steve
|
|
response 222 of 457:
|
May 3 18:56 UTC 2005 |
I don't think the disk interface matters much. However, it has occurred
to me in the last few minutes that we're swimming in memory compared to what
we had under SunOS: 256M there, and 1.5G here. That will eliminate swapping
and use about 75M of RAM for file caching, which will also help.
I just changed the default limits in /etc/login.conf for maxproc to 32.
Maxproc-max was at 128.
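For anyone who wants to check what limits a login session actually ends up with, here is a small Python sketch. It assumes the login.conf settings surface as the soft and hard RLIMIT_NPROC values of the process; the helper names `process_limit` and `describe` are invented for this example.

```python
import resource


def process_limit() -> tuple[int, int]:
    """Return the (soft, hard) limit on processes for the current user."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def describe(value: int) -> str:
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    soft, hard = process_limit()
    print(f"maxproc soft={describe(soft)} hard={describe(hard)}")
```

A fork bomb run by an unprivileged user should stop at the soft limit, and the user can raise that limit only as far as the hard one.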
|
steve
|
|
response 223 of 457:
|
May 3 19:06 UTC 2005 |
Now, as for sd0 having a problem, I just mounted it and tried copying
spwd.db to /dev/null. It failed. The message in /var/log/messages is:
May 3 15:00:59 grex /bsd: sd0(ahc1:0:0): Check Condition on opcode 0x28
May 3 15:00:59 grex /bsd: SENSE KEY: Media Error
May 3 15:00:59 grex /bsd: INFO FIELD: 116647
May 3 15:00:59 grex /bsd: ASC/ASCQ: Unrecovered Read Error
May 3 15:00:59 grex /bsd: FRU CODE: 0xe4
May 3 15:00:59 grex /bsd: SKSV: Actual Retry Count: 134
May 3 15:01:00 grex /bsd: sd0(ahc1:0:0): Check Condition on opcode 0x28
May 3 15:01:00 grex /bsd: SENSE KEY: Media Error
May 3 15:01:00 grex /bsd: INFO FIELD: 116647
May 3 15:01:00 grex /bsd: ASC/ASCQ: Unrecovered Read Error
May 3 15:01:00 grex /bsd: FRU CODE: 0xe4
May 3 15:01:00 grex /bsd: SKSV: Actual Retry Count: 134
There are other errors on the disk as well. When I tried to dd the
entire disk I brought the system down, the day Joe said that newuser
was failing.
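Since these errors are easy to miss unless someone reads the log, here is a hedged Python sketch that pulls Check Condition reports like the ones above out of /var/log/messages and counts them per device. The regular expressions are written only against the lines quoted here and may need adjusting for other drivers or kernels; the function names are made up for this example.

```python
import re
from collections import Counter

# First line of each report, e.g.
# "... /bsd: sd0(ahc1:0:0): Check Condition on opcode 0x28"
HEADER = re.compile(
    r"/bsd: (?P<dev>\w+)\([^)]*\): Check Condition on opcode (?P<op>0x[0-9a-fA-F]+)"
)
# Follow-up detail lines, e.g. "... /bsd: SENSE KEY: Media Error"
DETAIL = re.compile(r"/bsd: (?P<key>SENSE KEY|INFO FIELD|ASC/ASCQ): (?P<value>.+)$")


def parse_errors(lines):
    """Group kernel messages into one dict per Check Condition report."""
    reports = []
    for line in lines:
        line = line.rstrip()
        header = HEADER.search(line)
        if header:
            reports.append({"device": header["dev"], "opcode": header["op"]})
            continue
        detail = DETAIL.search(line)
        if detail and reports:
            # Attach the detail to the most recent report.
            reports[-1][detail["key"]] = detail["value"]
    return reports


def summarize(reports):
    """Count reports per (device, sense key, info field) combination."""
    return Counter(
        (r["device"], r.get("SENSE KEY"), r.get("INFO FIELD")) for r in reports
    )
```

Fed the twelve lines above, this groups them into two reports for sd0 with the same sense key and info field. If the info field is the failing block address, as is usual for media errors, that points at the same spot on the disk failing repeatedly.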
Have to go back and do work work now...
|
naftee
|
|
response 224 of 457:
|
May 3 19:12 UTC 2005 |
work work work
|
steve
|
|
response 225 of 457:
|
May 3 19:15 UTC 2005 |
work plod work
|
nharmon
|
|
response 226 of 457:
|
May 3 19:29 UTC 2005 |
plod no work,... abort, retry, or ignore?
|
cross
|
|
response 227 of 457:
|
May 3 20:02 UTC 2005 |
This response has been erased.
|
steve
|
|
response 228 of 457:
|
May 3 20:11 UTC 2005 |
Dan, you are sliding off into fantasy land here. THE DISK HAS PROBLEMS.
It is as simple as that. If no one else saw the errors it was because no
one looked at /var/log/messages. I will point out that you could have
rummaged around there yourself to find errors. Sigh, I don't know why
I'm bothering to respond to some of your comments, but I will say that I
think Marcus and I know the difference between the sound of bearings
and the noise a disk makes when the heads are constantly moving.
|
tod
|
|
response 229 of 457:
|
May 3 20:17 UTC 2005 |
We're not worthy.
|
gull
|
|
response 230 of 457:
|
May 3 20:28 UTC 2005 |
Re resp:211: In my (admittedly limited) experience, banks run on Windows
and proprietary mainframes. The bank I worked for had *no* Internet
connections at all, though. All branch-to-branch connections were on
leased lines.
Re resp:213: #3 is a minor issue. AFAIK there are no major bugfixes in
recent versions of Exim. While versions earlier than 4.50 do not come
with Exiscan out of the box, it's easy to patch in, and the OpenBSD port
probably already includes a flag you can toggle to include it.
FreeBSD's does.
Re resp:219: Are we swapping? Maybe we need more RAM. Even if we're
not swapping, more RAM means more disk cache. RAM is cheap.
Re resp:227: Those messages pretty clearly indicate a hardware problem,
and if they were the result of an incorrect request on the part of the
driver we'd be seeing them on the other disks, too. I think you're
really reaching to blame OpenBSD here, which is unfortunate, because it
makes this look like a matter of religion on your part instead of a
technical argument.
If you're really convinced that OpenBSD is somehow causing the illusion
of a hardware failure on this disk, I suggest connecting it to another
system running a different OS and trying to access it. That should
settle the issue.
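One way to run that test is a plain block-by-block read that records where reads fail instead of stopping at the first error. This is only a sketch: `scan_for_read_errors` and its `block_size` parameter are invented for the example, and finding the size with a seek to the end may not work for raw device nodes on every OS.

```python
import os


def scan_for_read_errors(path, block_size=64 * 1024):
    """Read `path` block by block and return (bytes_read, failed_offsets)."""
    failed = []
    bytes_read = 0
    offset = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Assumption: seeking to the end reports the size of the file or device.
        size = os.lseek(fd, 0, os.SEEK_END)
        while offset < size:
            length = min(block_size, size - offset)
            try:
                chunk = os.pread(fd, length, offset)
            except OSError:
                # Record the failing offset and skip to the next block.
                failed.append(offset)
                offset += length
                continue
            if not chunk:
                break  # unexpected end of data
            bytes_read += len(chunk)
            offset += len(chunk)
    finally:
        os.close(fd)
    return bytes_read, failed
```

If the same offsets fail under a different OS on different hardware, the disk itself is the likely culprit. Given that a full dd of this disk reportedly brought Grex down, it would be safer to try this on the other machine rather than on the live system.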
|