Author Message
25 new of 457 responses total.
mcnally
response 155 of 457: Mark Unseen   Apr 27 00:35 UTC 2005

 I believe that reducing the amount of incoming Spam will dramatically
 reduce the demand for mail storage space and finding ways to block 
 some of it will be an important step towards a sustainable mail service.
paull
response 156 of 457: Mark Unseen   Apr 28 19:07 UTC 2005

(This is actually davel, using paull's account to post this.
You'll see why in a moment.)

This is very, very strange.  Grex has apparently been down now for a couple
of days.  I can't connect to it at all via the internet.  Grace (gracel) has
been trying to dial in.  Yesterday she got no connection.  Today she
was able to connect, but it kept telling her that her login was incorrect.
I was able to duplicate this.  But then she tried it with Paul (paull)'s
account, & it let her in.  I have, obviously, reproduced this behavior,
too.

Now comes the really weird part.  Wondering if somehow Grace & I had
gotten deleted from the password file, I tried the following:
   egrep '(davel)|(gracel)' /etc/passwd
   davel:*:2681:1002:Dave Lovelace:/a/d/a/davel:/bin/bash
   gracel:*:2731:1002:Grace Lovelace:/a/g/r/gracel:/bin/bash

But if I scan for paull (to which I'm logged in right now, remember)
or for my other son, Jon (kingjon), it finds nothing in the password
file.  But I can log in as paull, even though there's no entry in /etc/passwd.

I see that there is no /a mounted (& gracel's & my home dirs in /etc/passwd
are under /a), that logged in as paull (no /etc/passwd entry) I find myself
in what looks like Paul's home directory, but it's /grex/a/p/a/paull -
and that /grex/a/d/a/davel and /grex/a/g/r/gracel do exist.  I suspect
that this may have something to do with the "incorrect login" messages.
gelinas
response 157 of 457: Mark Unseen   Apr 30 08:19 UTC 2005

On Wednesday, April 27, 2005, STeve Andre wrote, in a message:

}  Someone did something to use up all the CPU.  You can start an ssh telnet
} or ftp session but nothing ever starts so its brain dead at the moment.

Attempts to reboot the machine failed.  When we did get the machine to
reboot, we discovered that "the pw database's in a totally inconsistent
state.  /etc/passwd had about 1137 entries in it, and the database files were
of different lengths.  This means the passwd system was in chaos, . . . "
(STeve, in a message dated April 29, 2005).  Which is why some people could
log on, but others couldn't.

At or about 02:42 this morning (April 30, 2005), STeve wrote: " Grex is up at
the moment.  The damage done with the accounts can be fixed this weekend or
not, but it won't affect all the other accounts.  Better to have Grex up now
for the majority of people on Grex."

There appear to be some problems with newuser, so it has been turned off
until the problems can be investigated and repaired.  The web newuser program
is also disabled.
russ
response 158 of 457: Mark Unseen   Apr 30 11:41 UTC 2005

As long as newuser is down anyway, it's time to splat the trolls.

I want to note here that NOBODY updated the hvcn status page nor
the main Grex page with any information whatsoever since Wednesday.
happyboy
response 159 of 457: Mark Unseen   Apr 30 14:56 UTC 2005

splat the trolls = remove the ribbon
twenex
response 160 of 457: Mark Unseen   Apr 30 20:45 UTC 2005

I pinged grex last night to see if it was running; apparently i left the ping
command running all day. Apologies if it caused any problems.
gelinas
response 161 of 457: Mark Unseen   Apr 30 22:18 UTC 2005

It did get updated at 23:00 or so last night.  I'll leave it that way a bit
longer.
naftee
response 162 of 457: Mark Unseen   May 1 03:51 UTC 2005

the grex conference on m-net was updated in a satisfactory and timely manner.
steve
response 163 of 457: Mark Unseen   May 1 05:48 UTC 2005

   I'm baack. ;-)

   I wish things had gone more evenly over the last three weeks.  It's
been alternately boring and too exciting lately.  I think things are
evening out, thankfully.

   Grex went down on Tuesday because of what I believe to be a fork
bomb, which pretty much absorbed all the system.  Some will remember
that the start of a telnet or ssh session worked and then nothing.
I think that was the daemon responding to the socket connect but then
not being able to do anything else.  During that time there were
people trying to create accounts, primarily via the web newuser, and
pretty much all hell broke loose internally.  The new accounts were
created in the /etc/passwd file OK, but the logging of them is just
totally bizarre.  There are multiple entries for many accounts,
with incorrect data associated with them.  I'm still trying to sort
all that out.

   Once I got over to Provide.net it became clear that the passwd file
was really messed up.  Root's password wasn't what it should be, nor
were any others that I knew of.  Booting Grex into single user mode
revealed an /etc/passwd file of about 1100 accounts, not the 24,000
that it should have had.  Worse, the "master.passwd" main passwd file
and associated database (.db) files were messed up as well. The
master.passwd file had a different line count than passwd did, which
is horribly wrong.
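
   A quick way to spot that kind of mismatch is to compare the login
names in the two files.  The following is only a sketch under some
assumptions: both files are colon-separated with the login name in the
first field, and the function names are made up for illustration.  It
does not look at the .db files at all.

```python
def login_names(text):
    """Return the set of login names (first colon-separated field)."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        names.add(line.split(":", 1)[0])
    return names


def compare_passwd(passwd_text, master_text):
    """Report logins that appear in one file but not the other."""
    passwd_names = login_names(passwd_text)
    master_names = login_names(master_text)
    return {
        "only_in_passwd": sorted(passwd_names - master_names),
        "only_in_master": sorted(master_names - passwd_names),
    }
```

   If both lists come back empty the two files at least agree on who
exists; anything else means the files have drifted apart.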

   Hours before I went into the hospital on April 6th I discovered
that the reason why Grex couldn't create accounts was because of a
bad disk spot that was underneath one of the passwd database files
and prevented accounts from being updated properly.  It was at that
point that I made a backup of the /etc directory.  That proved very
useful.

   At Provide.net I finally figured out the stuff about the weirdness
with the passwd files and copied my set over.  That then let me put
Grex back into multi-user mode with an /etc/nologin file so Grex
could start processing mail.  Once Glenda and I were satisfied that
the system seemed to be OK other than the passwd stuff, we left
Provide.

   As an aside, Grex now lives in the Attic.  This is the space on
the second floor of Provide.net, and with its curved in walls it looks
like an attic.  It's a nice facility by the way, nicely air-conditioned
and several UPS's to feed all the computers.  The highest tech attic
I've yet encountered. ;-)

   Anyway, it was leaving Provide that was a mistake.  By bringing Grex
up and not doing anything special, I wound up destroying a perfectly good
copy of the passwd file.  To understand this you have to understand that
Jan wrote a very nice set of shell scripts which make backups of Grex
onto our IDE disk on the /mirror partition.  This occurs every day so
we have a backup of things.  It's already come in handy for me, twice.
The problem is that it only has one level of backups.  I had the passwd
stuff in place on /etc (my version from 4/6) when the backup script ran,
and it overwrote the complete copy of the passwd stuff.  By the time I
realized this might happen it was too late: all /mirror held was a copy
of my backup passwd.

   That was stupid of me and I think I owe about 1,100 apologies to 
the people whose accounts are in limbo.  Their home directories are 
on the disk, but with no associated entries in the passwd file.
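
   Finding those directories can be automated.  The sketch below
assumes home directories sit three levels under a root, in the
/grex/a/p/a/paull style mentioned earlier in this item, and that the
set of valid login names has already been read from the passwd file.
The function name is hypothetical.

```python
from pathlib import Path


def orphaned_homes(home_root, login_names):
    """List home directories whose name has no matching login."""
    known = set(login_names)
    orphans = []
    # Assumed layout: <root>/<letter>/<letter>/<login>
    for path in sorted(Path(home_root).glob("*/*/*")):
        if path.is_dir() and path.name not in known:
            orphans.append(str(path))
    return orphans
```

   The result is a list of paths to review by hand, not something to
act on blindly.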

   It was then that I discovered the data in the nulogfile (the copy
of newuser runs) was crazy.

   So I'm now in the process of sorting that data out and figuring out
the best way to restore accounts.  If the password information in the
nulogfile is correct I think I can restore some? all? accounts.  We'll
see.

   After the problem on 4/6 I remembered that I needed to recreate the
scripts I wrote on SunOS, where copies of /etc/passwd were made every
6 hours.  I did not do this, much to my shame.  Had I done this none
of this passwd weirdness would have mattered.  I'm going to fix that
soon, and add something, namely teach a machine at work to sftp into
Grex once a day and grab a copy of master.passwd and group, so we'll
have an off site backup of this data.
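
   The point of keeping several generations, rather than a single
mirror copy, can be sketched as below.  This is not the old SunOS
script or Jan's mirror scripts, just an illustration: the function
name, the timestamp format and the default of 28 snapshots (a week at
one every 6 hours) are all assumptions.

```python
import shutil
from datetime import datetime
from pathlib import Path


def snapshot(files, backup_root, keep=28, now=None):
    """Copy files into a new timestamped directory, then prune old ones."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    now = now or datetime.now()
    backup_root = Path(backup_root)
    target = backup_root / now.strftime("%Y%m%d-%H%M%S")
    target.mkdir(parents=True)
    for f in files:
        shutil.copy2(f, target / Path(f).name)
    # Timestamped names sort chronologically, so the oldest come first.
    snapshots = sorted(p for p in backup_root.iterdir() if p.is_dir())
    for old in snapshots[:-keep]:
        shutil.rmtree(old)
    return target
```

   Run from cron every 6 hours, a bad copy of the passwd files would
then push out only the oldest snapshot instead of replacing the one
good copy.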

   Making a few comments on stuff I've read earlier in this item:
We need to alter the size of some of the partitions on Grex.  In
particular the /var/mail partition is too small.  We keep on running
into this because of the ever increasing amounts of spam we're getting.

   Speaking of spam, it is my *hope* that I'll be able to get back
to working with Exim and SpamAssassin soon, and talk with other staff
about using it here.  Spam is a *complex* problem, one that has no
easy solutions, but we now have enough raw CPU power to start dealing
with it.  We've not done this for far too long.

   I'm not sure why we haven't ordered the new disk yet; I think
there has been some confusion over this.  I hope that's settled before
long.  This touches on the upgrade we need.  OpenBSD 3.7 has started
to ship, so I think we can upgrade sometime in May.  There are several
things that have been improved, including support for our network card,
which has caused one or two panics.

   So that's it for the moment.
keesan
response 164 of 457: Mark Unseen   May 1 15:54 UTC 2005

Could you set up a simple script that lets anyone who chooses to do so throw
out anything with an X-RBL warning?  This would eliminate about half my spam.
I keep a log.  And restore the 100K mail size limit somehow?  Or let people
choose to throw out anything over that size, with the same script?  People
who have tried to send me large attachments generally write me with a smaller
mail when they bounce and I explain to send elsewhere.  
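
The rule being asked for is small enough to sketch.  Nothing below is
an existing Grex script: the header name X-RBL-Warning, the function
name and the 100K default are assumptions, and how it would be hooked
into mail delivery is left open.

```python
from email import message_from_bytes


def should_discard(raw, max_size=100 * 1024, header="X-RBL-Warning"):
    """Return True if a raw message is oversized or carries the RBL header."""
    if len(raw) > max_size:
        return True
    msg = message_from_bytes(raw)
    # Header lookup is case-insensitive; a missing header gives None.
    return msg[header] is not None
```

A user who wanted only one of the two rules could pass a very large
max_size or a header name that never appears.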

How old are the hard disks that grex is running on now?  Can they be checked
regularly for bad spots or the likelihood of crashing?  We used some program
on a disk that Scott gave us, which told us it was in imminent danger of
failing (it had already slowed down a lot).  
drew
response 165 of 457: Mark Unseen   May 1 18:03 UTC 2005

I've heard of such a program from, I think, Symantec. Designed to deal with
the fact that modern hard drives have circuitry which attempts to hide the
fact that some of the disk goes bad throughout its service life by moving data
around to good sectors and lying about the actual state of the media. I forget
what it's called.
richard
response 166 of 457: Mark Unseen   May 1 21:06 UTC 2005

STeve said:

"We need to alter the size of some of the partitions on Grex.  In
particular the /var/mail partition is too small.  We keep on running
into this because of the ever increasing amounts of spam we're getting"

Is this not further evidence that grex needs to get out of the offsite 
email business?  grex should continue to offer email within the grex 
site for all users, but to send or receive email outside grex, you 
should have to be a paying member.  Grex doesn't have the resources 
anymore, if it ever did, to be a free email provider for the universe, 
and too many people have and will abuse Grex with its free anonymous 
email addresses, or worse use it for unethical purposes.  

In fact, I'd think it a good possibility that the FBI probably has Grex 
on a list of websites that could potentially be used to traffic 
terrorist information, because grex gives out free anonymous email 
addresses with an automated program and no verification. Do you keep 
making the partitions larger and larger, and keep taking risks that 
vandals or terrorists might be coming here, or do you finally 
say, "enough is enough, go use hotmail or yahoo for email!"
steve
response 167 of 457: Mark Unseen   May 1 21:56 UTC 2005

   The disks were bought around May 2003.  Checking disks for problems
is a difficult thing.  There is a system that IBM developed for its
own disks that has failed to catch more problems than it has found,
in my usage of it.  Grex munches on disks.  We might want to consider
replacing them every X years, but figuring out what X should be is
interesting.

   I don't see needing to increase the size of /var as an
indication that we need to stop doing email.  Disk is the one thing
that has dropped in cost over everything else.  For about $250 we
could devote a 36G disk to mail and likely have enough disk for some
time.  Also, with spam filtering our disk needs will slow down.

   If Grex is on some government list, which I wouldn't be surprised
if true, it doesn't matter if we offer mail or not.  The fact that
we're an open system is enough. This is still America and the secret
police aren't quite here yet.  I'm not going to worry about it.
mcnally
response 168 of 457: Mark Unseen   May 1 22:31 UTC 2005

 re #166:  I propose that Richard create a list of services that
 he approves of or uses personally so that we can all know which
 other services should be eliminated..
naftee
response 169 of 457: Mark Unseen   May 2 01:09 UTC 2005

whoa, steVE!  are your lungs ok ?!
aruba
response 170 of 457: Mark Unseen   May 2 02:58 UTC 2005

I ordered the new disk from Leeron on Saturday.  There was some confusion
about how dead the old disk is, and whether we should send it in for
warranty repair.  The consensus was that we should send it in, but use the
replacement they send us back as a backup.

But, mostly, I've been dragging my heels because I've had other things to do.
So sorry for the delay.  We should have a new disk within a week.
keesan
response 171 of 457: Mark Unseen   May 2 02:59 UTC 2005

I have several friends who use grex ONLY for email and one of them was a paid
member (she may stop paying since she lives in Chelsea) but the others still
appreciate the email.  I told them they should not feel obligated to pay for
light use.  One of them also has an ISP with mail but got used to grex.  

STeve, were these disks bought new in 2003 and only put into service a few
months ago?  If so, would a warranty at least cover them going bad if we got
similar ones new now and they lasted under a year?
aruba
response 172 of 457: Mark Unseen   May 2 03:02 UTC 2005

Re #171: Sindi - yes, the disks are warrantied by Seagate for 5 years.  So
we should be able to get a replacement for the one that failed, as soon as
someone can pull it out of the machine and get it to me.
richard
response 173 of 457: Mark Unseen   May 2 04:05 UTC 2005

,
steve
response 174 of 457: Mark Unseen   May 2 04:13 UTC 2005

   The warranty is almost irrelevant.  Disks going down are a disaster for
any entity, and with Grex it's even worse because of access and staff time
issues.  I try to optimize on disks that have a decent record of not dying
and use those.

   Replacement disks obtained from a warranty exchange make me queasy.  They
are almost universally refurbished disks, meaning they came into the
manufacturer because of some problem and got "fixed".  I've never liked using
this kind of disk in an intense environment, and that's exactly what Grex is. 
These days we can get a 36G SCSI disk for about $250, which is pretty amazing. 
That's a 15,000 rpm ultra-320 speed disk, too.  Amazing.
cross
response 175 of 457: Mark Unseen   May 2 11:53 UTC 2005

This response has been erased.

steve
response 176 of 457: Mark Unseen   May 2 12:04 UTC 2005

   In terms of disk I/O, it is.
gull
response 177 of 457: Mark Unseen   May 2 13:42 UTC 2005

Spamassassin is a great program, and it'd be great if Grex could use it.
 I'd suggest being very careful about implementing it, though, because
it can consume large amounts of CPU and memory.  Here are my
suggestions, after playing with it for a while myself:

- Large messages should bypass Spamassassin.  "Large" here meaning
anything over 100K or so.  These messages are unlikely to be spam,
anyway, and scanning them takes far too long.

- Use spamd/spamc, don't call Spamassassin directly.  Exim 4.50 and
later, or earlier versions with the Exiscan patch, can call spamd
directly from the DATA acl.  I recommend this because it saves some
process overhead, and it allows a lot of flexibility.

- When you test it, monitor the CPU load carefully, and be ready to take
Spamassassin back out of the loop if you find it's consuming too many
resources.

- Running a caching nameserver can really speed up the Spamassassin
tests that rely on DNS-based blacklists.  If there isn't already one on
the local LAN, you may want to run one.
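
The first and third suggestions amount to a gate in front of the
scanner, which might look something like this.  It is only a sketch:
the function name and the load threshold of 4.0 are made up, and a
real setup would put the size limit in the Exim or spamc configuration
rather than in separate code.

```python
def should_scan(message_size, load_1min, max_size=100 * 1024, max_load=4.0):
    """Decide whether a message should be handed to the spam scanner.

    Messages over max_size bytes bypass scanning, and scanning is
    skipped entirely while the 1-minute load average is above max_load.
    """
    return message_size <= max_size and load_1min <= max_load


# Example call on a Unix system (os.getloadavg returns three averages):
#   import os
#   should_scan(len(raw_message), os.getloadavg()[0])
```

Taking the load average as a parameter keeps the decision easy to test
and lets the threshold be tuned while watching the CPU.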
cross
response 178 of 457: Mark Unseen   May 2 15:36 UTC 2005

This response has been erased.

tod
response 179 of 457: Mark Unseen   May 2 15:45 UTC 2005

re #163
    Once I got over to Provide.net it became clear that the passwd file
 was really messed up.  Root's password wasn't what it should be, nor
 were any others that I knew of.  Booting Grex into single user mode
 revealed an /etc/passwd file of about 1100 accounts, not the 24,000
 that it should have had.  Worse, the "master.passwd" main passwd file
 and associated database (.db) files were messed up as well.
I hate to break it to you but it sounds like Grex was hacked.  It'd be in
everyone's best interest to change their password and also any other passwords
they may have attempted to use here that match that of a login to a system
elsewhere.  For more information on what I'm talking about, catch the article
on Arbornet in next month's Fortune Magazine.

- Backtalk version 1.3.30 - Copyright 1996-2006, Jan Wolter and Steve Weiss