Grex Helpers Conference

Item 134: Grex System Problems - Summer 2004

Entered by i on Tue Jun 22 02:27:23 2004:

258 new of 286 responses total.


#29 of 286 by twenex on Tue Jul 13 22:18:30 2004:

Looks like DSL woes are back, AGAIN. What's happening with the new modem?


#30 of 286 by pgreen on Tue Jul 13 22:45:47 2004:

This response has been erased.



#31 of 286 by jor on Wed Jul 14 00:31:53 2004:

        Where can we read about NextGrex?
        SciFi.cf??


#32 of 286 by pgreen on Wed Jul 14 01:30:03 2004:

This response has been erased.



#33 of 286 by pgreen on Wed Jul 14 02:38:30 2004:

This response has been erased.



#34 of 286 by grexmom on Wed Jul 14 02:39:20 2004:

sghSGSWAH

Hum......another fine example of that fine educational system and its 
smart Ph.D-ed personnel??


#35 of 286 by pgreen on Wed Jul 14 03:23:33 2004:

This response has been erased.



#36 of 286 by twenex on Wed Jul 14 15:35:54 2004:

Re: #34. Heh.


#37 of 286 by pgreen on Wed Jul 14 16:55:35 2004:

This response has been erased.



#38 of 286 by i on Sun Jul 18 22:10:08 2004:

Grex was down from about 6:24AM to 6:00PM (EDT) today due to another hardware
lockup/crash.  Too warm in the pumpkin, bad alignment of stars, or what?


#39 of 286 by cross on Sun Jul 18 22:19:26 2004:

This response has been erased.



#40 of 286 by naftee on Mon Jul 19 02:12:46 2004:

New fan?


#41 of 286 by bru on Mon Jul 19 06:21:43 2004:

or the effects of polytarps actions.


#42 of 286 by gregb on Mon Jul 19 14:45:56 2004:

If he's aliasing as Pgreen, I vote for the latter.


#43 of 286 by keesan on Mon Jul 19 21:50:37 2004:

174 new items in agora.  


#44 of 286 by cross on Tue Jul 20 01:23:43 2004:

This response has been erased.



#45 of 286 by gregb on Tue Jul 20 01:28:46 2004:

<RANT>
Just zap his ass.  It was very tedious getting rid of all his @#$% from
my topics list (I use Backtalk).  How much of this stupid crap do we
have to put up with before something gets done?
</RANT>


#46 of 286 by tpryan on Tue Jul 20 03:26:50 2004:

        Dan, I give more authority to ask 'em to stop than it has
for posting them.


#47 of 286 by naftee on Tue Jul 20 03:33:02 2004:

re 44 Can you please retire those items?


#48 of 286 by glenda on Tue Jul 20 04:19:21 2004:

I believe that they have been removed.


#49 of 286 by cross on Tue Jul 20 04:52:29 2004:

This response has been erased.



#50 of 286 by tsty on Tue Jul 20 07:29:29 2004:

hell, a buncha of us *wish* we could take credit for the salvation.
  
congratulations to whomever it was!
/


#51 of 286 by naftee on Wed Jul 21 02:18:07 2004:

re 49 oic, but why'd it take so long ?!


#52 of 286 by jor on Wed Jul 21 12:09:03 2004:

 grex: telnet

/tmp: write failed, file system is full

/tmp: write failed, file system is full
/usr/local/grex-scripts/.inet_real/telnet> 



#53 of 286 by jor on Wed Jul 21 12:11:13 2004:

 grex: df
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/sd0a             109823   73971   24870    75%    /
/dev/sd0d             156783  120973   20132    86%    /usr
/dev/sd6h            1971009 1798533       0   101%    /usr/local
/dev/sd0e             706783  372750  263355    59%    /bbs
/dev/sd0f             471183  450069       0   106%    /x
/dev/sd6g            1969885 1621510  151387    91%    /var
/dev/sd7h            1969885 1026946  745951    58%    /var/spool/mail
/dev/sd2a              31023   16317   11604    58%    /rootbak
/dev/sd2d              31023   15998   11923    57%    /suidbin
/dev/sd2f              62863   56584       0   100%    /tmp
/dev/sd2h             842574  709514   48803    94%    /s
/dev/sd4a            1944365 1749935       0   100%    /c
/dev/sd7g            1971009  774405  999504    44%    /d
/dev/sd11g           1971692 1733443   41080    98%    /a
/dev/sd2e             699223  455142  174159    72%    /oldvar
 grex:
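Here is a minimal Python sketch that scans output like the listing above for full filesystems. It assumes the six-column SunOS 4.x layout shown, and the function name `full_filesystems` is made up for illustration. Capacities above 100% fit the usual 10% of space reserved for root on BSD-style filesystems (for /x, 450069 / (0.9 * 471183) is about 1.06). The reserve setting actually used on Grex is an assumption.

```python
# Sketch: flag nearly full filesystems in `df` output like the listing above.
# Assumes the SunOS 4.x column layout: Filesystem kbytes used avail capacity Mounted-on.


def full_filesystems(df_text: str, threshold: int = 100) -> list[tuple[str, int]]:
    """Return (mount point, capacity %) for rows at or above `threshold` percent."""
    flagged = []
    for line in df_text.splitlines():
        fields = line.split()
        # Skip the header ("Mounted on" splits into two words), prompts, and blanks.
        if len(fields) != 6 or not fields[4].endswith("%"):
            continue
        capacity = int(fields[4].rstrip("%"))
        if capacity >= threshold:
            flagged.append((fields[5], capacity))
    return flagged


sample = """\
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/sd6h            1971009 1798533       0   101%    /usr/local
/dev/sd0f             471183  450069       0   106%    /x
/dev/sd2f              62863   56584       0   100%    /tmp
/dev/sd7g            1971009  774405  999504    44%    /d
"""

print(full_filesystems(sample))
```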




#54 of 286 by naftee on Wed Jul 21 15:48:16 2004:

Filesystem            kbytes    used   avail capacity  Mounted on
/dev/sd0a             109823   73979   24862    75%    /
/dev/sd0d             156783  121239   19866    86%    /usr
/dev/sd6h            1971009 1798541       0   101%    /usr/local
/dev/sd0e             706783  372754  263351    59%    /bbs
/dev/sd0f             471183  450069       0   106%    /x
/dev/sd6g            1969885 1629334  143563    92%    /var
/dev/sd7h            1969885 1028824  744073    58%    /var/spool/mail
/dev/sd2a              31023   16317   11604    58%    /rootbak
/dev/sd2d              31023   15998   11923    57%    /suidbin
/dev/sd2f              62863    3873   52704     7%    /tmp
/dev/sd2h             842574  709514   48803    94%    /s
/dev/sd4a            1944365 1671559   78370    96%    /c
/dev/sd7g            1971009  780978  992931    44%    /d
/dev/sd11g           1971692 1733229   41294    98%    /a
/dev/sd2e             699223  455142  174159    72%    /oldvar


#55 of 286 by rcurl on Thu Jul 22 05:26:01 2004:

The vandal has struck here too - 50 newresponse items, with many just
'activations' of items with no response entered. Does Grex have any
way to stop these denial-of-service attacks?


#56 of 286 by glenda on Thu Jul 22 05:34:39 2004:

These are the result of Tod scribbling all his responses yet again.


#57 of 286 by rcurl on Thu Jul 22 05:37:59 2004:

It sure is a nuisance.


#58 of 286 by glenda on Thu Jul 22 05:39:33 2004:

He hit almost every cf, including 107 new responses in Fall 2003 Agora and
60 in Winter 2003/2004.


#59 of 286 by rcurl on Thu Jul 22 05:57:21 2004:

It would be useful if responses could only be scribbled within some short
time - 24 hours? - after posting them. 


#60 of 286 by mfp on Thu Jul 22 06:53:08 2004:

That's not technically feasible.  Especially since no-one's going to spend
any time doing it.  Since they aren't even making New Grex go.


#61 of 286 by slynne on Thu Jul 22 15:19:53 2004:

resp:59 - yes, it would be useful if that were the rule. We could have 
avoided the whole valerie and jep thing if that were the case. As it 
is, as long as we allow some users to delete their posts, we have to 
allow everyone to do it even if some people choose to be obnoxious 
about it. 


#62 of 286 by albaugh on Thu Jul 22 17:03:59 2004:

I don't think a limit on scribbling is warranted, as much as I'm annoyed.
As intelligently as he writes, tod is acting like a moron every time he does
this.  Maybe he's trying to keep up with the polytarp's in the twit race...


#63 of 286 by tod on Thu Jul 22 18:16:49 2004:

Or maybe I'm exercising my right to scribble old responses and saving Grex
some disk space while others see it as a nuisance.  Move along, nothing to
see here, folks.


#64 of 286 by rcurl on Thu Jul 22 18:59:18 2004:

Why don't you think a limit on scribbling is warranted, albaugh? It
could give anyone adequate time to scribble if they want. The point is
to stop this *wholesale* scribbling, which grinds one's use of the bbs
to a crawl, as one encounters and bypasses numerous empty responses. If
there were a limit, those that want to scribble would not have as many
to scribble all at once. 


#65 of 286 by tod on Thu Jul 22 19:07:03 2004:

Why not change Picospan to not show items as having a new response when there
is just a scribble INSTEAD of punishing those that want to remove their
responses for whatever reasons they may have?


#66 of 286 by gelinas on Thu Jul 22 19:59:24 2004:

I don't _think_ removing your responses will save grex significant disk-space:
each response is lines in a file, and the file still remains after the lines
are gone.

Picospan (and probably every other conferencing system) just compares the
last-modified time of the item file with the last-read time in the
participation file.  Deleting a response updates the modification time of the
item file.
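The comparison Joe describes can be sketched as follows. This is a hypothetical Python illustration, not PicoSpan's actual code. The item file name `_42`, the helper `has_new_activity`, and the timestamps are all invented. Timestamps are set explicitly with `os.utime` so the example does not depend on clock resolution.

```python
import os
import tempfile


def has_new_activity(item_path: str, last_read: float) -> bool:
    # Any write to the item file, including erasing a response, bumps its
    # modification time, so this test cannot tell a scribble from a new post.
    return os.stat(item_path).st_mtime > last_read


with tempfile.TemporaryDirectory() as d:
    item = os.path.join(d, "_42")             # hypothetical item file name
    with open(item, "w") as f:
        f.write("response 1\nresponse 2\n")
    os.utime(item, (1000.0, 1000.0))          # pretend it was last written at t=1000
    last_read = 2000.0                        # reader caught up at t=2000
    print(has_new_activity(item, last_read))  # nothing written since the last read

    # "Scribble" a response: no new response, just an erasure notice.
    with open(item, "w") as f:
        f.write("response 1\nThis response has been erased.\n")
    os.utime(item, (3000.0, 3000.0))          # the rewrite happened at t=3000
    print(has_new_activity(item, last_read))  # now flagged as "new"
```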


#67 of 286 by marcvh on Thu Jul 22 20:02:12 2004:

I'm not sure what the 24-hour time period would do.  But it's become
pretty clear that allowing authors to scribble their own responses
causes more problems than it solves.


#68 of 286 by tod on Thu Jul 22 20:48:37 2004:

The discussion of "allowing authors to" is a problem outside of the truth
complaint I'm hearing.  People complain they are being told there is a new
response when there isn't one.  It has nothing to do with motivation within
authors.

Joe says, "Picospan (and probably every other conferencing system) just
compares the last-modified time of the item file with the last-read time in
the participation file.  Deleting a response updates the modification time
of the item file."

That sounds like a problem to me.  Perhaps it's time to revisit the
modification detection process in BBS.


#69 of 286 by marcvh on Thu Jul 22 21:03:36 2004:

"Problem" is relative.  PicoSpan was written under the assumption that 
modifying responses after they are entered would be a relatively rare
event, and so a little anomalous behavior in this case was acceptable.

Unfortunately, on an open system one can assume that anything which can
be done to annoy people will be done, and not rarely.


#70 of 286 by tod on Thu Jul 22 23:25:14 2004:

"a little anomalous behavior in this case was acceptable"
So the author is responsible if some deem "little" as "too much"?
Illogical.


#71 of 286 by marcvh on Thu Jul 22 23:30:29 2004:

If the "little" behavior is multiplied by being done hundreds of times,
and thereby becomes "too much", and the author did it hundreds of times
for the express purpose of annoying people in order to beat a dead
horse, then yes.


#72 of 286 by tod on Thu Jul 22 23:44:49 2004:

Let's break this down. The "little" behavior was a one time thing unless
you're saying that somehow I populated Grex with hundreds of responses in the
sum of one day(which I did not).
If I delete all my responses in one day after years of posting, that
means I'm doing it "for the express purpose of annoying people in order to
beat a dead horse"?
I find the analysis of the behavior a spin from the real complaint that there
is "a little anomalous behavior in this case was acceptable".


#73 of 286 by marcvh on Fri Jul 23 00:05:46 2004:

This was not a one-time thing; you have repeatedly run scribble scripts,
which slow the system down, scribble your items, and produce mild
annoyance due to PicoSpan timestamp issues or people who now have
difficulty following conversations in items where your content has been
removed.  I have no idea what your point is with the "entering hundreds
of responses in one day" part so I'll ignore it.

Yes, based on the information available, I feel the most reasonable
conclusion is that you run scribble scripts for the express purpose of
annoying people in order to beat a dead horse.  If you can honestly say
that you have some other primary reason and you have no wish to annoy
people, then I apologize.

I agree with you that this is a bug in PicoSpan, but that seems only
peripherally relevant.


#74 of 286 by tpryan on Fri Jul 23 11:57:08 2004:

        Let's get down to the point of abuse of the system and 
stay there.  98% of the responses recently scribbled had already
been scribbled.  That is consuming resources beyond reasonable use.
        Tod, I thought you were one for taking responsibility for
one's deeds.  It is not the fault of the system.  The fault is yours.


#75 of 286 by gull on Fri Jul 23 13:20:35 2004:

Re resp:63: Since scribbled responses are logged elsewhere, I don't
think scribbling is a net savings of disk space.  In fact, since I'm not
sure the file containing the response that was scribbled is actually
shortened, it may be a net loss.  (I'm not certain about that, but
removing lines from the *middle* of a long file is a time-consuming
process, so I would guess it'd be avoided.)


#76 of 286 by scott on Fri Jul 23 13:40:16 2004:

Quite aside from Tod's other annoyances, he's been running an idle-evader (or
worse?) script recently.


#77 of 286 by tod on Fri Jul 23 15:54:43 2004:

Sorry if I'm annoying everyone with my presence and interaction.  I do not
apologize.


#78 of 286 by albaugh on Fri Jul 23 16:43:52 2004:

Your presence & interaction is welcome, not annoying.  Your penchant for
periodic wide scribbling of your responses is "stupid" and annoying.


#79 of 286 by tod on Fri Jul 23 16:45:03 2004:

You say tomato, I say roma tomato.


#80 of 286 by albaugh on Fri Jul 23 16:49:31 2004:

All roads lead to roma.


#81 of 286 by tod on Fri Jul 23 17:00:03 2004:

On an evenin in Roma


#82 of 286 by naftee on Sat Jul 24 00:50:11 2004:

I've noticed that running scribble scripts nowadays doesn't seem to annoy
people as much as it used to.


#83 of 286 by rcurl on Sat Jul 24 02:55:40 2004:

Perhaps the people doing it now are of less consequence.


#84 of 286 by naftee on Sat Jul 24 14:41:55 2004:

Or people have just memorized the item numbers with all the activity



#85 of 286 by twenex on Mon Jul 26 13:52:41 2004:

Sometimes I can ping grex but not connect to it. Sometimes I can telnet in
but can't use backtalk. What gives?


#86 of 286 by mfp on Mon Jul 26 17:12:21 2004:

Erectile disfunction is common among people with paralyzed backs, twenex.


#87 of 286 by twenex on Mon Jul 26 17:13:36 2004:

My back is not paralized. Unlike your brain.


#88 of 286 by mfp on Mon Jul 26 17:28:51 2004:

What trash talk.

You bring it on, bitch!


#89 of 286 by twenex on Mon Jul 26 17:30:00 2004:

Apologies for the misseplling in that response (#87).


#90 of 286 by tod on Mon Jul 26 17:30:15 2004:

What's disfunction?  Is that when you skip Algebra homework in Detroit?


#91 of 286 by twenex on Mon Jul 26 17:31:15 2004:

Holy crap. Apologies for the misspellings in responses #87 and #89.


#92 of 286 by mfp on Mon Jul 26 17:32:10 2004:

It's okay, twenex.  I wouldn't even have noticed if you hadn't pointed them
out.


#93 of 286 by happyboy on Mon Jul 26 18:01:20 2004:

re 90:  *groan*


#94 of 286 by i on Tue Jul 27 01:13:43 2004:

Re: #85
It looks like the internet connection was getting flakey again...i just
rebooted the DSL modem; recent experience says that should fix the problem
for roughly 2 weeks.


#95 of 286 by twenex on Tue Jul 27 01:24:17 2004:

Thankyou ta.


#96 of 286 by naftee on Tue Jul 27 07:14:48 2004:

Apologies for the spelling errors in resp:95 .  Ta.


#97 of 286 by mfp on Tue Jul 27 07:24:05 2004:

Go baq to Iraq.


#98 of 286 by gregb on Wed Jul 28 16:56:12 2004:

Re. 75:  Is this a straight, text file?  If so, and if each line is
independent of the others (e.g., each line ends in a CR or EOL), then
you could use a text editor that can sort the lines so that all the
scribbled lines would be grouped together (I assume there's some
indicator that says it's been scribbled) so it would be easy to delete
them in one shot.
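One caution about the sorting idea: sorting would group the scribbled lines together, but it would also scramble the order of the surviving responses. A plain filter keeps that order. The Python sketch below assumes, as the suggestion does, one response per line and a hypothetical marker string. The real item-file format is likely more involved.

```python
# Hypothetical marker: assumes one response per line, as in the suggestion above.
SCRIBBLE_MARKER = "This response has been erased."


def drop_scribbled(lines: list[str], marker: str = SCRIBBLE_MARKER) -> list[str]:
    """Drop marker lines while keeping the remaining lines in their original order."""
    # sorted(lines) would also group the marker lines together, but it would
    # reorder every surviving response; a filter avoids that.
    return [line for line in lines if line.rstrip("\n") != marker]


item = ["zebra reply\n", "This response has been erased.\n", "apple reply\n"]
print(drop_scribbled(item))
```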


#99 of 286 by cross on Thu Jul 29 00:41:04 2004:

This response has been erased.



#100 of 286 by gull on Thu Jul 29 15:04:23 2004:

Conferencing systems now seem to mostly be based on some kind of
database system.  Usually MySQL, but that's mainly because it's so
widely available -- if all you have is a hammer, everything starts to
look like a nail. ;)


#101 of 286 by gelinas on Thu Jul 29 17:40:20 2004:

(Confer U was based on gdbm, IIRC.)


#102 of 286 by tsty on Sat Jul 31 19:32:09 2004:

((( that was the third iteration/generation of parnes' confer ???)


#103 of 286 by gelinas on Sun Aug 1 00:38:50 2004:

It was the port to Unix, yes.  It was based on WMU's port to VMS, Confer V.


#104 of 286 by russ on Sun Aug 8 15:50:21 2004:

Grex is inaccessible via ssh, and the web interface is excruciatingly
slow.  What's going on?


#105 of 286 by rcurl on Sun Aug 8 18:13:58 2004:

I recall it's accessible via ssh1, although I only have an ssh2 client so
can't use that. 


#106 of 286 by russ on Sun Aug 8 19:58:07 2004:

It takes over 1 minute to load a new page in Backtalk.  I cannot
establish an ssh connection at all; it times out before connecting.

Doesn't anyone even know that something is wrong?


#107 of 286 by mfp on Sun Aug 8 21:36:24 2004:

It's fine now.


#108 of 286 by davel on Mon Aug 9 00:35:31 2004:

See coop item 105 resp 73 as to (maybe) why network connections were slow.


#109 of 286 by gelinas on Mon Aug 9 01:24:43 2004:

It looks like a vandal was misbehaving; its processes were killed off around
17:30.


#110 of 286 by sholmes on Wed Aug 11 02:27:54 2004:

Someone is hogging the system
12% w joelgsm
 10:22pm  up 23 days,  4:34,  42 users,  load average: 5.36, 4.48, 3.95
User     tty       login@  idle   JCPU   PCPU  what
joelgsm  ttyp7    10:21pm     1                /usr/local/bin/bash
joelgsm  ttypf    10:13pm     8                -
joelgsm  ttyq3     9:42pm    39                -
joelgsm  ttyq7     8:50pm  1:30                -
joelgsm  ttyq8    10:17pm     4                -
joelgsm  ttyqb    10:15pm     6                -
joelgsm  ttyqe     9:21pm    59                -
joelgsm  ttyr0     9:27pm    53                -
joelgsm  ttyr5    10:05pm    16                -
joelgsm  ttyr8     9:52pm    28                -
joelgsm  ttyra     9:13pm  1:08                -
joelgsm  ttyrb     9:48pm    33                -
joelgsm  ttyrc     8:09pm  2:12                -
joelgsm  ttys2     9:46pm    35                -
joelgsm  ttys8     9:32pm    49                -
joelgsm  ttys9    10:19pm     2                -
joelgsm  ttysb     9:50pm    31                -
joelgsm  ttyt0     8:40pm  1:41                -
joelgsm  ttyt3    10:07pm    14                -
joelgsm  ttyu4     9:34pm    47                -
joelgsm  ttyu6     9:23pm    57                -
joelgsm  ttyuc     9:44pm    37                -


#111 of 286 by cross on Wed Aug 11 03:05:56 2004:

This response has been erased.



#112 of 286 by katie on Thu Aug 12 19:50:23 2004:

My sister Meg (and I) would love it if Grex didn't interpret her
"mintsol.com" as spam. I would like to be able to receive her email.
Can this be fixed?



#113 of 286 by prp on Sat Aug 14 18:47:47 2004:

Yesterday, 2004.08.13, from about 2pm-6pm backtalk was extremely slow,
also Telnet and SSH connections failed.  Backtalk was so slow IE could
not cope, but Netscape could.


#114 of 286 by tpryan on Sat Aug 14 19:39:42 2004:

        Any reason for not being able to phone in Friday and early
today, while the system has been up for 26 days?


#115 of 286 by gelinas on Mon Aug 16 03:01:05 2004:

Looks like we had someone using up lots of resources.  A staff member was 
able to visit the Pumpkin and kill off the user's processes.


#116 of 286 by mary on Mon Aug 16 10:42:30 2004:

Thank you, staff, for Grex.


#117 of 286 by slynne on Mon Aug 16 12:48:26 2004:

Yay staff!


#118 of 286 by jor on Mon Aug 16 18:43:48 2004:

        email: negative function?



#119 of 286 by rcurl on Tue Aug 17 21:39:50 2004:

I get the following message when logging in using ssh-1 from Terminal in Mac
OS 10.3.2

Warning: Server lies about size of server public key: actual size is 767 bits
vs. announced 768.
Warning: This may be due to an old implementation of ssh.

The connection works OK thereafter, so what is the significance of this
warning?
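On the 767-vs-768 question, one plausible cause is a key built as the product of two 384-bit numbers, which can come out one bit short. This is an assumption, not something confirmed for Grex's sshd. The arithmetic is easy to check with Python's `int.bit_length`. The factors below are not primes, just the smallest and largest 384-bit values, and `product_bits` is a made-up helper.

```python
def product_bits(p: int, q: int) -> int:
    """Bit length of an RSA-style modulus n = p * q."""
    return (p * q).bit_length()


# Two 384-bit factors (top bit set) can yield a 767-bit or a 768-bit product.
smallest = 1 << 383              # smallest 384-bit number
largest = (1 << 384) - 1         # largest 384-bit number
print(smallest.bit_length(), largest.bit_length())
print(product_bits(smallest, smallest))   # one bit short of 768: would draw the warning
print(product_bits(largest, largest))     # matches an announced size of 768
```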


#120 of 286 by jor on Tue Aug 17 22:30:18 2004:

        1 bit


#121 of 286 by cross on Wed Aug 18 00:15:40 2004:

This response has been erased.



#122 of 286 by tsty on Wed Aug 18 03:38:53 2004:

  
   grex is a spammer, it says here:
  
   REALLY need this fixed,   pp ll ee aa ss ee !!
  
*******  See http://spamblock.outblaze.com/216.93.104.34  <<<<<<<
  
from:
  
The original message was received at Tue, 17 Aug 2004 23:12:48 -0400
from YYYyyy@localhost

   ----- The following addresses had delivery problems -----
XXXXXxxxxxx@lycos.com  (unrecoverable error)

   ----- Transcript of session follows -----
... while talking to lycos-com.mr.outblaze.com.:
>>> RCPT To:<XXXXxxxx@lycos.com>
<<< 554 EMail from mailserver at 216.93.104.34 is refused. 

*******  See http://spamblock.outblaze.com/216.93.104.34  <<<<<<<

554 XXXXxxxx@lycos.com... Service unavailable

   ----- Original message follows -----

--XAA22521.1092798775/grex.cyberspace.org
Content-Type: message/rfc822

 /cry


#123 of 286 by gull on Thu Aug 19 13:56:57 2004:

From Outblaze's webpage:

"This listing indicates that the mailserver you are sending our users
mail from is a "Direct Spam Source" - that is, it has been blocked
locally by our administrators as a result of spam coming in from that
server to our users.

"Please have your ISP or systems administrator contact us at
postmaster@outblaze.com regarding this issue."


#124 of 286 by blaise on Thu Aug 19 20:11:11 2004:

The TTY queue is hosed, as is part of spring agora.


#125 of 286 by davel on Fri Aug 27 16:13:27 2004:

/tmp needs to have its permissions corrected.


#126 of 286 by mcnally on Fri Aug 27 16:23:09 2004:

 It sure does..


#127 of 286 by tpryan on Fri Aug 27 16:24:04 2004:

        I tried to get my mail, I got permission denied.  An aftereffect
of catching up on the mail?
like /tmp/Rxa06999: permission denied.


#128 of 286 by kip on Fri Aug 27 16:30:33 2004:

Okay, I believe I have /tmp permissioned correctly now.  I'm still a little
groggy, so holler if that's not it.


#129 of 286 by gull on Fri Aug 27 17:06:01 2004:

I crashed Backtalk just now by hitting the 'Next Conf' button.  I didn't
copy the whole error (I assume it's logged somewhere) but here's the
first part.  Looks like there's a permission problem.

ERROR: Could not open item file /bbs/agora49/_49 - Permission denied
executing "conf_new" on line 63 of pistachio/confhome.bt

Version: Backtalk version 1.2.24

Stack:
(<HTML><HEAD>\n<BASE HREF=")
(http://www.cyberspace.org/cgi-bin/pw/backtalk)
(/)
(pistachio/)
(confhome">\n)
MARK[0]


#130 of 286 by janc on Fri Aug 27 17:20:02 2004:

Yeah, there were some problems restoring agora49 (the previous agora to this
one).  A lot of its files are munged.


#131 of 286 by albaugh on Fri Aug 27 17:26:27 2004:

I realize that grex has just gotten back "on the air".  But could someone give
us the Reader's Digest version of what happened to grex?  What was done to
fix it?  Why was it so hard?  Is this likely to repeat?  Whether or not the
same thing could happen on Next Grex?


#132 of 286 by albaugh on Fri Aug 27 17:27:27 2004:

With the problems that grex experienced, would there have been any way to
display some terse "system down because X" + auto-logout during connection
via telnet or dialup?


#133 of 286 by gregb on Fri Aug 27 18:08:51 2004:

Re. 131:  Would U like fries with that order?  B-)


#134 of 286 by albaugh on Fri Aug 27 18:10:20 2004:

Onion rings & a chocolate shake, please!  :-)


#135 of 286 by dpc on Fri Aug 27 18:29:24 2004:

I'd also like to know what happened.  This "disk disaster" ranks with
the accidental destruction of the password file (and its hand-rebuilding
by Marcus) several years ago.


#136 of 286 by janc on Fri Aug 27 18:37:46 2004:

Here's what I know:

  Grex's disk drive zero died.  This disk is used to boot the system, and
  contains most of the operating system (root, /usr, /usr/local) and the
  bbs data (/bbs).  It was still partially working, but not enough to
  boot the system.

  Our last tape backup was many months old.  I forgot how many, but way too
  many.

  STeve and Kip did nearly all the work restoring things.  I was on vacation
  when this started and didn't get involved till late and then only in
  limited ways.

  STeve and Kip's initial problem was that you can't build a new disk on
  a machine that you can't boot.  So their plan was to take Grex's tape
  drive to one of the machines at Kip's workplace, and build a new boot
  disk for Grex from the backup tape there.  However, they didn't have
  the latest backup tape, which (appropriately) was not stored on site.
  So they got the tape drive hooked up, and discovered that they couldn't
  read the much older tape backups that they had brought from the pumpkin.

  Later, Joe passed the latest backup tape to STeve.

  I came home from vacation, and spent a little time in the pumpkin.  Since
  I did much of the original building of the current Grex system, I
  remembered that we had a CD-drive for Grex (standard ones won't work
  with SunOS) and a 4.1.4 distribution CD which can be booted.  I hooked
  this up and figured out how to boot from the CDrom and documented this.
  Booting from the CDrom gives you an extremely limited set of tools. It
  looked to me to be too limited to actually do anything useful, plus I
  had neither the tape drive, nor the backup tape, so I left things at that.

  Kip and STeve again got together, with tape drive and backup tape.  They
  actually managed to figure out how to do a restore when booted from the
  CDrom.  (I'd still like to know how they did that.)  However, they
  discovered that most of the spare disk drives in the pumpkin were
  unusable.  Some are differential drives.  Some are too small.  The
  only viable candidates were four 4Gig Conner drives.  They tried two
  of these and found that both were defective.  (I had tried using one
  of these years ago and found it didn't work, but they didn't know that).

  When I saw their emailed report the next day, I went and searched my
  house for some other disk drives.  When we had started putting together
  the new Grex, but didn't have drives yet, I had borrowed some drives from
  the pumpkin that I thought I could use temporarily.  These were 4Gig
  Seagate drives, which had previously been used on a development system
  that we ran for a while called "grease".  I never ended up using them on
  the NextGrex project, and couldn't really remember what I'd done with them.
  I found them in my garage and returned them to the pumpkin.

  STeve and Kip did another late night session, restoring the backups of
  root, /usr, and /usr/local onto one of the Seagate disks.  Years ago,
  STeve had set up a cron process that backed up the /etc/passwd file and
  related files to various other disks periodically, so they had a current
  copy of that.  However, he had not backed up /etc/group or the system
  mail aliases, so new versions of those have to be built.  There is
  probably more work to do updating things that had changed since the last
  backup, but not that much has.
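A job like the one STeve set up might look roughly like this Python sketch, run periodically from cron. The file list (extended here to /etc/group and a mail aliases file, the pieces that were missing this time), the destination directories, and the name `copy_to_targets` are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical lists: the original cron job's files and target disks are not given.
FILES = ["/etc/passwd", "/etc/group", "/etc/aliases"]
TARGETS = ["/rootbak/etc-copy", "/d/etc-copy"]


def copy_to_targets(files: list[str], targets: list[str]) -> int:
    """Copy each existing file into every target directory; return the copy count."""
    copied = 0
    for target in targets:
        dest = Path(target)
        dest.mkdir(parents=True, exist_ok=True)
        for name in files:
            src = Path(name)
            if src.exists():
                # copy2 also preserves modification times, handy when comparing copies.
                shutil.copy2(src, dest / src.name)
                copied += 1
    return copied


if __name__ == "__main__":
    # A real run would be copy_to_targets(FILES, TARGETS); this dry run uses
    # throwaway directories instead of the real /etc.
    with tempfile.TemporaryDirectory() as tmp:
        fake_passwd = Path(tmp) / "passwd"
        fake_passwd.write_text("root:*:0:1:Operator:/:/bin/csh\n")
        n = copy_to_targets([str(fake_passwd)], [f"{tmp}/disk1", f"{tmp}/disk2"])
        print(n, "copies made")
```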

  I believe they got /bbs by reading off the dead drive.  Mostly the /bbs
  partition of the old drive was still readable, but there seems to have been
  some damage.  Items 19 through 58 in oldagora (agora49) were trashed.
  I think someone said they had an offsite backup of this though, so we may
  be able to restore those.

  Overall, we took way too long to get this job done.  We repeatedly allowed
  ourselves to fall into resource deadlocks - first STeve and Kip couldn't
  do much because Joe had the backup tape, then I couldn't do much because
  STeve and Kip had the backup tape and tape drive respectively, then STeve
  and Kip couldn't do much because I had the only good disk drives.  Each
  time STeve and Kip's work got blocked, it took them some time to be able
  to get together again - they both have very busy schedules.

  I think that part of the problem is that knowledge of these old Sun systems
  is thinly scattered.  STeve and Kip are experienced system administrators,
  but I doubt either has done much SunOS work for years.  I'm not a system
  administrator at all, and my knowledge about this stuff is very spotty, but
  I did do a lot of the "recent" work on Grex, so I know more about what
  CD drives and disk drives and bootable CD's we have.  I don't think any
  one of us knows enough to be able to readily do this kind of job on our
  own.  This means that we have to work together on jobs like this, and
  that slows things down, as it is hard to coordinate among us.  Hopefully
  this will be less of an issue with the new machine and operating system
  which are better known to more people.


#137 of 286 by albaugh on Fri Aug 27 18:51:16 2004:

Thanks very much to everyone who helped with the restoration, in any way!

It sounds like this is the first time this has happened to grex.  It probably
shouldn't happen again.  It shouldn't happen any more frequently on Next Grex,
hopefully much less.  And if it did, Next Grex should be easier to recover
from.  Is this a correct assessment?


#138 of 286 by mary on Fri Aug 27 22:10:08 2004:

A huge thank you to Kip, STeve and Jan.  While the rest of us
were missing Grex you folks were spending hours of your time 
thrashing though problem after problem.  You are our heroes.


#139 of 286 by jor on Sat Aug 28 00:01:50 2004:

        no no, we just criticise, no gratitude.

        next we'll dock their pay.


#140 of 286 by kip on Sat Aug 28 01:06:15 2004:

Thanks Jan, that was a very good summary of the events.

As for the "trick" to restoring from tape while booting from the CD,
actually after booting from the CD, you have an option to install a miniroot
system (like a modern Linux rescue disk) to one of the swap partitions on
the system and then boot from that where you can then create mount points and
start to work with the filesystems and eventually restore from the tape. 

A rather nice feature, wouldn't you say?


#141 of 286 by janc on Sat Aug 28 03:15:14 2004:

I created a mini root, but I didn't see a mount command or a restore
command.  I thought that was rather pathetic.  Probably I was hallucinating.
That would be just too dumb to be real.


#142 of 286 by charcat on Sat Aug 28 03:18:51 2004:

Thanks to Kip, Steve, Jan and all others who resurrected Grex! (charcat does
the snoopy happydance!)


#143 of 286 by keesan on Sat Aug 28 03:27:41 2004:

Jim asks whether the next grex will have more stuff on a bootable CD.
And whether you can do backups to CD or DVD instead of tape.


#144 of 286 by gelinas on Sat Aug 28 03:44:03 2004:

One advantage of the new machine, which could be put to use on the current
one, is that much of the documentation, and thus the critical files, are
being stored in CVS, on a separate machine.  

Yes, backups can be done to CD instead of tape.  Backups can also be done
to separate disks.  As disk gets cheaper, many folks are finding it makes
more (economic) sense to mirror to disk than to tape or CD.


#145 of 286 by janc on Sat Aug 28 16:31:33 2004:

Next Grex has lots of extra disk space, part of which is currently configured
as a mirror disk.  We don't yet have a CD-R drive for the machine.  We should
probably start a discussion of backup strategies for it.

Currently I've got the disks set up so that we can always have two copies of
the OS installed.  Each time I upgrade the OS, I replace the older copy with
the new one.  Thus it should always be possible to boot Grex into either of
the last two OS versions.  Eventually I want to change over to a procedure
where we can install the next version of the OS on the alternate partitions
while Grex is running.  Theoretically it should be possible to do an OS
upgrade with almost no down time.
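The two-copy scheme described above works like an A/B slot rotation. The toy Python model below shows that an upgrade always overwrites the older slot, which leaves the last two versions bootable. The class name and version labels are hypothetical, and nothing here touches real partitions.

```python
from dataclasses import dataclass, field


@dataclass
class BootSlots:
    """Toy model of two OS partitions; an upgrade overwrites the older one."""
    versions: dict = field(default_factory=lambda: {"A": None, "B": None})
    stamps: dict = field(default_factory=lambda: {"A": -1, "B": -1})
    clock: int = 0

    def upgrade(self, version: str) -> str:
        # The slot holding the oldest (or no) install is the one we replace.
        target = min(self.versions, key=lambda s: self.stamps[s])
        self.clock += 1
        self.versions[target] = version
        self.stamps[target] = self.clock
        return target

    def bootable(self) -> list:
        # Installed versions, newest first: the ones the machine could boot.
        order = sorted(self.versions, key=lambda s: self.stamps[s], reverse=True)
        return [self.versions[s] for s in order if self.versions[s] is not None]


slots = BootSlots()
for release in ["v1", "v2", "v3"]:
    print(release, "->", slots.upgrade(release))
print(slots.bootable())
```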

Also, as Joe says we are putting everything needed to build a new Grex into
the off-site CVS archive.  My goal is to be able to build and configure a new
Grex, starting from a blank machine with a net connection, in under 24 hours.
So we are checking all the grex-specific code we have into the archive, all
the config files, together with scripts to fetch packages from the net, build,
configure and install them.  Of course, user data and bbs data will need
to be restored from backup.


#146 of 286 by albaugh on Sat Aug 28 19:31:59 2004:

The one limitation of backing up to disk is that it would still be in close
proximity to the master, so if a disaster struck the pumpkin there would be
no off-site storage to recover from.


#147 of 286 by gregb on Sat Aug 28 21:54:50 2004:

Why is SunOS used as opposed to Linux or BSD?


#148 of 286 by jor on Sat Aug 28 23:51:28 2004:

        runs on a SUN


#149 of 286 by janc on Sun Aug 29 03:44:18 2004:

Next Grex runs on OpenBSD.

Grex opened for business on July 18, 1991.  Linus Torvalds released the very
first version of Linux about two months later. ("Hello everybody out there
using minix - I'm doing a (free) operating system (just a hobby, won't be
big and professional like gnu) for 386(486) AT clones.")  Somehow, the
founders didn't seem to think Linux was quite ready for the job at the time.

I'm not exactly sure what the situation was with BSD in 1991, but it wasn't
an option the founders were likely to have spent an awful lot of time
thinking about either.


#150 of 286 by keesan on Sun Aug 29 09:13:07 2004:

So what OS did first grex use?  And what hardware?


#151 of 286 by remmers on Sun Aug 29 14:14:44 2004:

The hardware was a Sun 2, running SunOS (I forget which version).

Jan's right - Linux didn't exist yet, BSD wasn't easily available at low
cost, and we did have access to Sun (which was regarded as the Cadillac
of Unixes at the time).

Times have changed though, and I'm glad we're making the switch to x86
hardware and BSD.


#152 of 286 by janc on Sun Aug 29 14:56:00 2004:

I think the first open source version of BSD, BSD/386 was also released in
1991.  It too would have been horribly inadequate for Grex's needs.  FreeBSD,
NetBSD, and OpenBSD were all years later.

I presume that it was SunOS 4.1.3 on the Sun 2.  The differences between
that and the SunOS 4.1.4 running on this system are entirely unnoticeable.
Mostly bug fixes.

At the time, SunOS was clearly the most stable, most capable version of
Unix available in our price range (probably in any price range).  It's
still a remarkably solid piece of software.  For me the main reason to
move off it is that too many of the open source packages that we want to
use (like mysql) no longer compile on SunOS.


#153 of 286 by dpc on Mon Aug 30 14:34:21 2004:

Thanks to Kip, STeve and Jan!


#154 of 286 by tsty on Mon Aug 30 17:14:20 2004:

nice job ... we all appreciate the efforts and results -thank you


#155 of 286 by mfp on Tue Aug 31 19:55:43 2004:

Hi, all!  I was in Ann Arbor!  I ate at the Fleetwood!


#156 of 286 by happyboy on Tue Aug 31 20:13:15 2004:

i'm sorry.

i used to work there.  yuk.


#157 of 286 by tod on Tue Aug 31 20:18:27 2004:

I used to consume hippy hash served by a tracked up Lisa.  The coffee sucked
but they had a torlet so what the hell.


#158 of 286 by happyboy on Tue Aug 31 20:29:51 2004:

i remember some of the grafitti from the torlet:

"The Fleetwood makes me shit PURE WATER SHIT."

 accompanied by a childlike drawing of a screaming
person sitting on a torlet.

the cook used to pork his girlfriend in the storeroom and would
ash his ciggies in the chilipot.

bon appetit!


#159 of 286 by tod on Tue Aug 31 20:48:25 2004:

We put an arbornet sticker in that torlet..wonder if it's still there


#160 of 286 by naftee on Wed Sep 1 03:38:02 2004:

INSIDE the torlet?  Highly unlikely it survived.


#161 of 286 by gregb on Wed Sep 1 14:47:30 2004:

Getting back to "Grex System Problems..."

In Backtalk, I disabled the "Favorites" items, but my listings are still
being separated.


#162 of 286 by keesan on Thu Sep 2 23:16:55 2004:

I am using procmail filter and I turned on the verbose part to figure out why
I get messages about locked filters:

Locking "var/spool/mail/k/e/keesan.lock"
Procmail:  Error while writing to "/var/spool/mail/k/e/_w30Ggrex.cybe"
I get several of the above lines, then it unlocks things, every time.  Why?


#163 of 286 by gelinas on Fri Sep 3 03:09:49 2004:

The error occurs because you don't (and shouldn't) have 'write' access to the 
directory /var/spool/mail/k/e/ .  The file "/var/spool/mail/k/e/_w30Ggrex.cybe"
doesn't exist.


#164 of 286 by keesan on Fri Sep 3 13:48:47 2004:

Have I set up .procmailrc wrong?  Why is it trying to do something it should
not do?  MAIL=/var/spool/mail/k/e/keesan is my first line of the filter, which
is no longer working to catch spams.  Maybe I broke it?  But I was frequently
getting these lock messages before and about 1-2 spams a day that should have
been caught were not, and now ALL of the spams are getting through (8 in the
last 5 hours or so).  I would appreciate if you could take a look at the
filter, or I could post the complete (verbose) log file for one spam.  I am
wondering if this is something to do with the grex revival (a bug).


#165 of 286 by keesan on Fri Sep 3 14:55:43 2004:

I checked my log and every single message (none of which were caught by the
filter) says I had a lock failure.  I think previously only the ones that the
spam filter missed said that.  Has something changed at grex or did I mess
up my filter?  .procmailrc    


#166 of 286 by keesan on Fri Sep 3 20:04:33 2004:

Here is a typical entry in my log file

procmail: Lock failure on "/var/spool/mail/k/e/keesan.lock"
From dmawllet@hotmail.com  Fri Sep  3 10:54:30 2004
 Subject: Cailis for $6 ($3 a dose)
  Folder: /var/spool/mail/k/e/keesan                                        925


When I get the lock failure, the spam is not filtered to /dev/null as it
should be.  What is causing the lock failure and how can I (or staff) fix
it?  This is 10 times as bad as it was before the grex disaster.

Todd, are you having spam filter problems (you use my filter, I think).


#167 of 286 by keesan on Fri Sep 3 20:17:48 2004:

I think I caught the problem - I am sending anything Received from ... grex
or cyberspace to my inbox and this part of the header includes not only the
sender's but also the recipient's address.  Sorry to bother people but I still
don't understand the 'lock' business.  


#168 of 286 by gelinas on Sat Sep 4 02:39:13 2004:

According to the man page for procmailrc, the format of a block is:

          :0 [flags] [ : [locallockfile] ]
          <zero or more conditions (one per line)>
          <exactly one action line>

In some cases, you do not have a newline immediately following the
second colon.  For example:

        :0: 
        * ^Received:.*zillion
        /dev/null

Has a couple of spaces at the end of the first line.  I don't know that the
spaces are significant, since I haven't tried to correlate the messages that
cause lock errors with specific blocks in your .procmailrc.  Neither have
I looked at every block to see if you have specified a lockfile somewhere.


#169 of 286 by keesan on Sat Sep 4 05:58:57 2004:

Thanks, I will delete spaces on a line after the :.  How did you find them?
Is there some way to view them with pico?
Can I put all the lines beginning with * ^ in between just a single
I have no idea how to specify a lockfile.


#170 of 286 by tpryan on Sat Sep 4 23:08:20 2004:

        I'm no expert, but from the man page, as I read the notation, if a
locallockfile is used, the colon must precede it.  That structure is
optional, so the trailing colon should probably be removed, as it may
be thinking your locallockfile is named ' '.


#171 of 286 by cmcgee on Sat Sep 4 23:10:16 2004:

What's the deal with Grex being off the net?


#172 of 286 by krj on Sun Sep 5 00:56:50 2004:

Grex remains off the net.  Sigh.


#173 of 286 by gelinas on Sun Sep 5 01:54:13 2004:

Sindi, I don't know how to find lines that end in ": " in pico.  I used vi,
and searched for ": "

I don't know why grex is off the net; it is reachable from other machines
on the network in the Pumpkin, and those machines are reachable from the
Internet.  I tried rebooting and a few other things that didn't help.


#174 of 286 by keesan on Sun Sep 5 14:33:09 2004:

I searched with pico Ctrl-W for space-space and found a lot of places with
double spaces after the colon and deleted the spaces.  I think you are saying
that the spaces are being misinterpreted and that Tim is saying that I don't
need the second colon - is that true?  Joe, thanks for working on all our
problems.


#175 of 286 by bhoward on Sun Sep 5 16:50:13 2004:

Grex was never off the net but something broke within our ISP's
routing tables for a time cutting off direct access.  Seems to
have recovered in the last hour or so.


#176 of 286 by tpryan on Sun Sep 5 17:15:32 2004:

        If you are not sure, you can replace any run of spaces at the end
of a line with nothing:  s/ *$//    (?).
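A minimal sketch of the same idea as two shell functions, assuming a POSIX grep and sed that understand the [[:blank:]] class (space or tab). The function names are made up for illustration: the first lists offending lines with their line numbers, the second writes a cleaned copy and keeps the original as a .bak file, since portable sed has no in-place option.

```shell
#!/bin/sh
# Sketch: report and remove trailing blanks (spaces/tabs) in a file
# such as ~/.procmailrc. Function names are hypothetical.

# List every line that ends in a blank, prefixed by its line number.
show_trailing_blanks() {
    grep -n '[[:blank:]]$' "$1"
}

# Write a cleaned copy, save the original as FILE.bak, then swap them.
# Portable sed has no in-place option, hence the temporary file.
strip_trailing_blanks() {
    sed 's/[[:blank:]]*$//' "$1" > "$1.tmp" &&
        cp "$1" "$1.bak" &&
        mv "$1.tmp" "$1"
}
```

For example, run show_trailing_blanks ~/.procmailrc first and look at what it lists before running strip_trailing_blanks ~/.procmailrc: a trailing space can be deliberate inside a condition's regular expression, and the second function strips every trailing blank, not only those after a recipe's second colon.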


#177 of 286 by blaise on Sun Sep 5 18:27:33 2004:

You do need the trailing colons because your incoming mail file is in
mbox format, so you need to prevent multiple processes from writing to
it at the same time.  Just make sure that there are no trailing spaces
after the colons.


#178 of 286 by twenex on Sun Sep 5 18:32:54 2004:

Procmail's syntax sounds as fascist as JCL.


#179 of 286 by janc on Sun Sep 5 21:46:27 2004:

As Bruce said, we had some down time probably because our ISP modified our
routing table to route Grex off into cloud cuckoo land.  After Joe and Walter
each spent some time poking at Grex to try to figure out why it wasn't on the
net, I began to suspect the ISP and phoned them.  They said they'd ask their
engineer to look into it, and after a while connectivity came back.


#180 of 286 by cmcgee on Mon Sep 6 01:31:23 2004:

Is there some reason that mail is still backed up? Someone just sent me an
email and it's been more than 30 mins and it hasn't arrived.  Do we have a
mail backlog or is there a problem at the sender's end?


#181 of 286 by keesan on Mon Sep 6 03:22:31 2004:

Thanks Joe - I was still getting locallockfile messages so I hunted for
colon-space and removed some of those.  But my spam filter missed this:


From apsmith@aps.org Sun Sep  5 23:19:23 2004
Date: Sun, 5 Sep 2004 21:09:22 -0400
From: apsmith@aps.org
To: keesan@cyberspace.org
Subject: nqmqpnpxdbe

Dear user keesan@cyberspace.org,

We have found that your account has been used to send a huge amount of junk
email messages during the recent week. We suspect that your computer had been
infected by a recent virus and now contains a trojaned proxy server.

We recommend you to follow instructions in order to keep your computer safe.

Best regards,
cyberspace.org technical support team.


  [Part 2, Application/OCTET-STREAM (Name: "INSTRUCTION.EXE")  39KB]
  [Unable to print this part]



-----
I wonder why the technical support team here could not find a more
spellable subject line and why they chose to send out the fix in a DOS
format.  


#182 of 286 by mcnally on Mon Sep 6 05:38:01 2004:

  Of course it's a trojan horse, and a really lame one at that.
  The author apparently couldn't even forge the headers to appear
  as though the mail was coming from the domain it claimed to
  speak for..


#183 of 286 by krj on Thu Sep 9 04:26:55 2004:

Party people are experiencing connection lockups and/or disconnects.
See the party log for the last hour or so for what meager evidence
there is.


#184 of 286 by keesan on Thu Sep 9 05:07:56 2004:

I got several long lockups while dialed into grex and telnetted elsewhere.
2 min.


#185 of 286 by bru on Thu Sep 9 05:43:19 2004:

System would not let me in thru telnet, but would let me telnet from the
homepage.


#186 of 286 by mcnally on Thu Sep 9 16:43:49 2004:

 Grex was unreachable on my first attempt this morning and I just had a
 ~3 minute lag while entering this response.  Something is up with the DSL.


#187 of 286 by drew on Thu Sep 9 18:57:25 2004:

I'm in through dialup now. The internet connection does not answer, and there
are few people online.


#188 of 286 by aruba on Thu Sep 9 21:30:51 2004:

It looks like Grex is off the net.


#189 of 286 by albaugh on Thu Sep 9 23:25:41 2004:

I'm sorry, I certainly won't blame anyone, but this unreliability of grex is
becoming intolerable.  If things are going to continue like this indefinitely,
then that constitutes the beginning of the end...


#190 of 286 by i on Thu Sep 9 23:45:14 2004:

I just "rebooted" the DSL modem again after finding grex off the net, that
seems to have "fixed" things.  The "slowly goes bad" DSL connection issue
is an old issue, easily fixed with a few quick button pushes.

More bothersome to me is that i tried dialing in before coming down to
this-here pumpkin...and got the greeting from the terminal server, but
NOT the grex login prompt...leading to a preliminary mis-diagnosis of an
actual grex crash.  Has anyone else seen problems getting all the way to
the grex login prompt when grex really is alive?


#191 of 286 by cmcgee on Fri Sep 10 00:20:39 2004:

It just took 1.5 mins between the dial-in welcome screen and the login:
prompt.


#192 of 286 by aruba on Fri Sep 10 01:02:47 2004:

Right - I had the same experience as Colleen. I suspect the ssh daemon was
trying to do something (a reverse lookup?) which didn't work because the net
connection was down, and it had to time out before it would let me log in.


#193 of 286 by gelinas on Fri Sep 10 01:41:01 2004:

hmm.... Load seems within normal limits.  But it did take a while to get
a login prompt using ssh.  Once I got the prompt, though, things seemed
to work fine.  traceroute looks normal, as does top.

I dunno.


#194 of 286 by bru on Fri Sep 10 08:13:36 2004:

just tried telnetting in and failed.


#195 of 286 by i on Fri Sep 10 09:36:33 2004:

Ditto #194, it took a while to get the login: prompt via dial-in.
Wish i knew how to restart telnetd...or enough to deduce what the
problem really is.


#196 of 286 by gelinas on Fri Sep 10 11:52:17 2004:

I put ssh into 'verbose' mode for this session.  The pause was between these
two lines:

]  debug1: identity file /Users/gelinas/.ssh/id_dsa type -1
]  debug1: Remote protocol version 1.5, remote software version 1.2.20

Not that I know what that means, right now; it's just a datum.


#197 of 286 by mfp on Fri Sep 10 13:43:50 2004:

You mean a piece of data.

We're not speaking Latin here.


#198 of 286 by aruba on Fri Sep 10 14:01:54 2004:

Still getting a long pause before the login prompt appears.


#199 of 286 by twenex on Fri Sep 10 14:36:36 2004:

The problem might be easy to fix, but it requires someone to be physically
present to fix it, not always possible with a volunteer system. What happened
to tod's kindly donated modem?


#200 of 286 by keesan on Fri Sep 10 15:51:33 2004:

Has anyone besides me been unable to print with Pine on this revived grex?


#201 of 286 by tod on Fri Sep 10 16:07:51 2004:

re #190
Is the DSL modem I sent months ago being used yet or is Grex still on the
broken one?


#202 of 286 by tpryan on Fri Sep 10 16:32:35 2004:

re 201:
        I think it's holding up a wombly table.

        Same delay in dial-up connect and log-in prompt.


#203 of 286 by twenex on Fri Sep 10 16:39:29 2004:

A *wombly* table?

"Underground, overground, Wombling free, the Wombles of Wimbledon Common are
we..."

Y'all are SO missing a trick if you've no idea what I'm on about!


#204 of 286 by tod on Fri Sep 10 16:43:57 2004:

re #202
I'm willing to do what I can to unscrew Grex's problem from a hardware
standpoint but staff needs to meet me halfway by installing it.


#205 of 286 by twenex on Fri Sep 10 16:57:17 2004:

LOL. Good point.


#206 of 286 by mfp on Fri Sep 10 20:24:49 2004:

Point.


#207 of 286 by happyboy on Fri Sep 10 23:45:43 2004:

PINT


*hic*


#208 of 286 by keesan on Sat Sep 11 04:46:48 2004:

Pine printing works now, in DOS.  (I was in linux before and was not set up
for printing - need to load a module first and who knows what else).


#209 of 286 by keesan on Sat Sep 11 20:53:22 2004:

Grex (today) is taking a very long time to connect when I dial in.


#210 of 286 by krj on Sun Sep 12 05:49:54 2004:

Connections via telnet have also seemed slower than usual, though
I have not attempted to measure the time.


#211 of 286 by mcnally on Mon Sep 13 05:41:06 2004:

 I am still experiencing an extended wait between the time that I start
 my ssh client (PuTTY) and the time I receive a login prompt from Grex.
 Performance is more or less normal once I get logged in but it takes 
 quite a while for the login prompt to show up.  While connecting to my
 present session I timed it and it took 79 seconds before I got a prompt.

 Any ideas?


#212 of 286 by rcurl on Mon Sep 13 06:14:20 2004:

I just opened another ssh1 terminal window to Grex, and it also took 79
seconds for the login prompt to appear. 


#213 of 286 by jor on Mon Sep 13 12:07:55 2004:

        similar delay with dialin. 


#214 of 286 by davel on Mon Sep 13 12:57:02 2004:

and with telnet.


#215 of 286 by aruba on Mon Sep 13 13:03:34 2004:

Yup, I get the same thing.


#216 of 286 by tod on Mon Sep 13 13:52:43 2004:

telnet takes 2 min


#217 of 286 by gull on Mon Sep 13 16:02:18 2004:

Staff seems to have given up on commenting in this item.  Maybe someone
will have to go prod them in coop or garage to find out what's actually
going on?


#218 of 286 by twenex on Mon Sep 13 16:07:58 2004:

prod them in coop[, or in] garage... or in person.


#219 of 286 by mfp on Mon Sep 13 16:18:07 2004:

Point.


#220 of 286 by cmcgee on Mon Sep 13 20:28:53 2004:

Very consistent 78-79 second delay between welcome to Grex and login prompt
when I dial in.


#221 of 286 by naftee on Mon Sep 13 21:00:41 2004:

Yeah, I was using m-net and forgot about the GreX window for a secONde.


#222 of 286 by gull on Mon Sep 13 23:17:28 2004:

Sounds suspiciously like it's waiting for a dead nameserver to time out.


#223 of 286 by krokus on Tue Sep 14 02:06:33 2004:

re 217
Despite the staff being staff, Grex isn't their entire life.  I'm sure
they'll comment when they have a few spare minutes.


#224 of 286 by gelinas on Tue Sep 14 03:47:48 2004:

Or when we know something worth saying.


#225 of 286 by gelinas on Tue Sep 14 03:59:02 2004:

Based on comments above, I took a look at /etc/resolv.conf, commented
out the loop-back address and restarted named.  Now to see if it actually
makes a difference.


#226 of 286 by marcvh on Tue Sep 14 05:04:24 2004:

So far it does not seem to make a difference.


#227 of 286 by mcnally on Tue Sep 14 06:50:18 2004:

 re #222: 
 > Sounds suspiciously like it's waiting for a dead nameserver to time out.

 A not unreasonable guess but that doesn't seem to be the problem.
 Assuming a reasonable caching policy on Grex it wouldn't have fit
 the pattern anyway, as it takes a long time to get the login prompt
 every time one tries, not just the first time (after which, presumably
 the value would be stored in the local cache.)

 re #225:
 > Based on comments above, I took a look at /etc/resolv.conf, commented
 > out the loop-back address and restarted named.  Now to see if it actually
 > makes a difference.

 Doesn't seem to have..  (And I doubt that restarting named was necessary.)


#228 of 286 by dpc on Tue Sep 14 14:05:05 2004:

It took a minute and 19 seconds to get a prompt just now.  I really
hope this situation can be fixed soon.


#229 of 286 by aruba on Tue Sep 14 14:29:13 2004:

Still a long pause for me.


#230 of 286 by gregb on Tue Sep 14 17:07:26 2004:

This is one time I'm glad I use Backtalk.


#231 of 286 by marcvh on Tue Sep 14 17:13:02 2004:

Re #227, caching failed lookups is not universal.  Some systems only
cache success, not failure.


#232 of 286 by jor on Tue Sep 14 23:19:46 2004:

        I just measured the pause at about 78 seconds.
        The temperature is 74. Winds are calm and
        my disposition is sunny.


#233 of 286 by tod on Tue Sep 14 23:30:24 2004:

Partly cloudy
hayz buddhist messiah thetan


#234 of 286 by mcnally on Wed Sep 15 07:55:27 2004:

  re #231:  I was assuming a resolv.conf with more than one nameserver
  listed (which I think is how we started) and also assuming only one
  broken nameserver such that the resolver query would block until the
  first (broken) nameserver timed out and then succeed upon querying 
  the second server.

  I suppose it still could be a nameserver issue but it seems to be 
  something else.  I wonder what's special about 78-79 seconds?  Several
  people have measured the same delay time at this point (and one or two
  report longer times.)


#235 of 286 by jor on Wed Sep 15 12:36:05 2004:

        But I get the same delay via dialin.


#236 of 286 by davel on Wed Sep 15 13:16:11 2004:

Dialin is really a telnet connection, from the termserver.


#237 of 286 by naftee on Wed Sep 15 16:39:35 2004:

i just aspirated on some apple juice :(


#238 of 286 by mcnally on Wed Sep 15 23:41:58 2004:

The mysterious 79 second delay also seems to affect outgoing mail.
For instance, if I start a clock ticking and then send mail to 
my Gmail account it hangs for about 79 seconds (there's that number
again!) before connecting:

grex% ./tick.csh; sleep 2; echo "starting mail test now"; date \
      | Mail -v -s "test mail" mmcnally@gmailDOTcom
[1] 17068
Wed Sep 15 19:33:02 EDT 2004
starting mail test now
Wed Sep 15 19:33:04 EDT 2004
mmcnally@gmail.com... Connecting to gsmtp171.google.com. (smtp)...
Wed Sep 15 19:33:10 EDT 2004
Wed Sep 15 19:33:13 EDT 2004
Wed Sep 15 19:33:16 EDT 2004
Wed Sep 15 19:33:18 EDT 2004
Wed Sep 15 19:33:20 EDT 2004
Wed Sep 15 19:33:21 EDT 2004
Wed Sep 15 19:33:22 EDT 2004
Wed Sep 15 19:33:24 EDT 2004
Wed Sep 15 19:33:25 EDT 2004
Wed Sep 15 19:33:26 EDT 2004
Wed Sep 15 19:33:28 EDT 2004
Wed Sep 15 19:33:29 EDT 2004
Wed Sep 15 19:33:30 EDT 2004
Wed Sep 15 19:33:32 EDT 2004
Wed Sep 15 19:33:33 EDT 2004
Wed Sep 15 19:33:34 EDT 2004
Wed Sep 15 19:33:36 EDT 2004
Wed Sep 15 19:33:37 EDT 2004
Wed Sep 15 19:33:39 EDT 2004
Wed Sep 15 19:33:41 EDT 2004
Wed Sep 15 19:33:44 EDT 2004
Wed Sep 15 19:33:45 EDT 2004
Wed Sep 15 19:33:47 EDT 2004
Wed Sep 15 19:33:48 EDT 2004
Wed Sep 15 19:33:50 EDT 2004
Wed Sep 15 19:33:51 EDT 2004
Wed Sep 15 19:33:53 EDT 2004
Wed Sep 15 19:33:55 EDT 2004
Wed Sep 15 19:33:57 EDT 2004
Wed Sep 15 19:33:58 EDT 2004
Wed Sep 15 19:34:00 EDT 2004
Wed Sep 15 19:34:01 EDT 2004
Wed Sep 15 19:34:02 EDT 2004
Wed Sep 15 19:34:04 EDT 2004
Wed Sep 15 19:34:06 EDT 2004
Wed Sep 15 19:34:07 EDT 2004
Wed Sep 15 19:34:09 EDT 2004
Wed Sep 15 19:34:10 EDT 2004
Wed Sep 15 19:34:12 EDT 2004
Wed Sep 15 19:34:14 EDT 2004
Wed Sep 15 19:34:15 EDT 2004
Wed Sep 15 19:34:17 EDT 2004
Wed Sep 15 19:34:18 EDT 2004
Wed Sep 15 19:34:19 EDT 2004
Wed Sep 15 19:34:21 EDT 2004
Wed Sep 15 19:34:22 EDT 2004
220 mx.gmail.com ESMTP 80si57756rnb
>>> HELO grex.cyberspace.org
250 mx.gmail.com at your service
>>> MAIL From:<mcnally@cyberspace.org>
250 OK
>>> RCPT To:<mmcnally@gmail.com>
250 OK
>>> DATA
354 Go ahead
>>> .
Wed Sep 15 19:34:24 EDT 2004
250 OK 1095291185
mmcnally@gmail.com... Sent (OK 1095291185)
Closing connection to gsmtp171.google.com.
>>> QUIT
221 mx.gmail.com closing connection
grex%
Wed Sep 15 19:34:25 EDT 2004


#239 of 286 by gelinas on Thu Sep 16 03:03:17 2004:

Odd.  The same thing happens with traceroute, to an IP address.


#240 of 286 by marcvh on Thu Sep 16 04:00:00 2004:

How about traceroute -n to an IP address?


#241 of 286 by gelinas on Thu Sep 16 04:29:28 2004:

Same delay.  FWIW, the delay does NOT appear between hops, just on the
start-up.


#242 of 286 by mcnally on Thu Sep 16 06:36:00 2004:

 Are there any outgoing IP applications that are not affected?


#243 of 286 by remmers on Thu Sep 16 12:18:34 2004:

Outgoing telnet isn't affected.


#244 of 286 by mcnally on Thu Sep 16 17:00:42 2004:

  Hmmm..  
  What do the affected programs have in common?
  What is the significance of the common ~79 second delay?
  What changed during the reconstruction of the boot drive
    that might explain this?

  Any ideas, anyone?


  It really does not appear to be name-service related, as both
  forward and reverse lookups seem to finish in normal amounts
  of time using nslookup, which presumably uses the same underlying
  resolver calls.

  What else might have some timeout that's taking place?
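To compare the affected and unaffected programs (ssh, SMTP, traceroute, nslookup, outgoing telnet) against one yardstick instead of a ticking clock, a small timing wrapper can help. This is a sketch, assuming a date(1) that supports +%s (GNU and modern BSD versions do; an old SunOS 4 date may not); the function name elapsed is hypothetical.

```shell
#!/bin/sh
# Sketch: run a command and report, on stderr, how many whole
# seconds it took. Assumes date(1) supports +%s (epoch seconds).
elapsed() {
    t0=$(date +%s)
    rc=0
    "$@" || rc=$?          # keep the command's own exit status
    t1=$(date +%s)
    echo "elapsed: $((t1 - t0))s (exit status $rc)" >&2
    return "$rc"
}
```

For example, elapsed traceroute -n 192.0.2.1 or elapsed nslookup grex.cyberspace.org. The timing goes to standard error so the command's own output is left untouched, and the resolution is only one second, which is plenty for a delay of about 79 seconds.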


#245 of 286 by naftee on Thu Sep 16 19:30:26 2004:

It's faster to use fronttalk from M-net than to try to SSH into GreX and use
BBS.


#246 of 286 by drew on Sat Sep 18 04:59:00 2004:

Ftp keeps timing out.

psftp gets me just past the Password prompt, then it says something to the
effect of sftp-server: command not found, could not connect.


#247 of 286 by krj on Sun Sep 19 04:31:21 2004:

While using party tonight, I kept getting disconnected; eventually the 
network connection appeared to go down solidly.  I dialed in direct
to enter this.  (It takes about 79 (?) seconds to log in on a direct
dial, too.)


#248 of 286 by charcat on Sun Sep 19 06:33:44 2004:




#249 of 286 by krj on Sun Sep 19 17:14:47 2004:

My telnet connections continue to be unstable, though I was able
to connect long enough to enter this...


#250 of 286 by marcvh on Sun Sep 19 18:51:02 2004:

My telnet is repeatedly being disconnected also.


#251 of 286 by mcnally on Sun Sep 19 22:03:17 2004:

  I was disconnected last night and unable to log in this morning.


#252 of 286 by xpandora on Sun Sep 19 23:21:09 2004:

I get a connection attempt to (TCP Port 6346) from 142.163.11.15 
whenever I try to log on. It also takes around 70-80 seconds for a 
whenever I try to log on.It also takes around 70-80 seconds for a 
prompt.


#253 of 286 by xpandora on Sun Sep 19 23:23:59 2004:

got the same connection attempt just as i hit the submit button on my 
previous post.


#254 of 286 by krj on Mon Sep 20 02:43:11 2004:

Connectivity has been exceptionally unstable for the last couple of 
hours.  Type five lines in party at most, and then log in again....


#255 of 286 by mcnally on Mon Sep 20 04:09:48 2004:

  re #252, 253:  That's very odd, but probably a coincidence..
  TCP port 6346 is used for the P2P file-sharing program gnutella
  and "!dig ptr 15.11.163.142.in-addr.arpa" shows the host that's
  trying to connect belongs to a Canadian ISP, sympatico.ca, probably
  a user in Newfoundland.

  > 15.11.163.142.in-addr.arpa.     86354   PTR     stj515.nf.sympatico.ca.

  I'll check my firewall logs tomorrow to see whether I'm getting the
  same behavior but I suspect I won't be.
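The name queried in that dig command is just the IPv4 address with its four octets reversed and .in-addr.arpa appended. A small sketch of that transformation, assuming a dotted-quad IPv4 address as input (the function name ptr_name is made up):

```shell
#!/bin/sh
# Sketch: build the in-addr.arpa name a PTR (reverse) lookup queries.
# Assumes a dotted-quad IPv4 address; no input validation is done.
ptr_name() {
    printf '%s\n' "$1" |
        awk -F. '{ printf "%s.%s.%s.%s.in-addr.arpa\n", $4, $3, $2, $1 }'
}
```

With it, dig ptr "$(ptr_name 142.163.11.15)" asks the same question as the command above; dig -x 142.163.11.15 performs the reversal itself.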


#256 of 286 by tod on Mon Sep 20 18:10:27 2004:

These sporadic lag sessions that last more than a minute every 4 minutes SUCK.
People keep getting booted off.


#257 of 286 by keesan on Mon Sep 20 18:18:02 2004:

I got 2 minute lags three times while telnetted to another shell account and
writing a single mail, not a terribly long one.  More time down than up.
Telnetted from grex, I mean.  I can't telnet there directly from DOS, it does
not like my DOS telnet program.


#258 of 286 by aruba on Mon Sep 20 20:34:15 2004:

I got a lot of long pauses today too.


#259 of 286 by cmcgee on Tue Sep 21 01:04:43 2004:

I got lagged completely off three times today, while dialed in, and was unable
to complete a short email.

Then, just now, it took more than 3 minutes to get from the welcome screen
to the login while dialing in.


#260 of 286 by gull on Tue Sep 21 03:03:58 2004:

Grex has gotten so unreliable in the last few months that, for the first
time since I created my account nine years ago, I've given up on using
it for email.


#261 of 286 by albaugh on Tue Sep 21 04:07:36 2004:

I'll give things the 4Q to clear up...


#262 of 286 by rcurl on Tue Sep 21 05:38:23 2004:

I've nearly done the same thing myself. It is not only the spam and the
downtime, but also that there have been no upgrades on handling file sizes
and types, while the rest of the online world is bloating. 


#263 of 286 by i on Tue Sep 21 12:22:39 2004:

I just power cycled the DSL modem.  Hopefully that'll cure it for a while.


#264 of 286 by dpc on Tue Sep 21 13:18:00 2004:

What Rane said.


#265 of 286 by bru on Tue Sep 21 13:32:36 2004:

I tried dialing in today and got an error.


#266 of 286 by keesan on Tue Sep 21 16:18:50 2004:

Someone that I sent an email to complained they never got it.  I sent it again
today directly, and also forwarded via another shell account.  They just wrote
back that they got the SECOND mail.  Is grex on Spamcop's blacklist again?

I got connected dialin after the usual wait.  I tend to go wash dishes.

I hope that the new Grex has a version of Pine that can handle HTML (with
lynx).  A  friend of mine using grex for email has an ISP account specifically
to forward her html emails to because she cannot figure out how to deal with
them here.  It expires in November.  


#267 of 286 by aruba on Tue Sep 21 17:16:42 2004:

Thanks, Walter, for cycling the modem.


#268 of 286 by mary on Tue Sep 21 22:34:14 2004:

I'm not sure if this was announced somewhere or not, but folks should be
aware that if they have any files in their directories they'd mind
losing, they should get them off Grex or at least have a safe backup
stored elsewhere.  The partition with user directories hasn't been
successfully backed up since sometime in 2003.  And STeve thinks this
might be our next disk to fail. 

If I got that wrong, someone with more information please set the
record straight.  


#269 of 286 by slynne on Wed Sep 22 02:07:08 2004:

Thanks for the reminder, Mary


#270 of 286 by tod on Wed Sep 22 17:12:06 2004:

Would someone on staff please respond to item 196 in the coop cf? Thanks!


#271 of 286 by albaugh on Wed Sep 22 18:03:45 2004:

Are there any prospects for user partition backups any time soon?


#272 of 286 by gull on Wed Sep 22 19:36:36 2004:

I think Grex's policy has always been that the user partition is not
backed up on any regular basis.  As long as I've been here there have
been warnings not to store anything too important there.

It doesn't take too long to tar up all your files and then use FTP or
SCP to copy them to your home machine.


#273 of 286 by rcurl on Wed Sep 22 19:44:56 2004:

I'd like some help in doing that. I can do it file by file, but can I just
compress and download my whole directory, and open it as needed on my
home Mac (OS X)? If so, how (e-mail me, if it's convenient). 


#274 of 286 by gull on Wed Sep 22 20:01:08 2004:

I'll post it here, so that other people can benefit, too.

To get all your files into one tarball:

Change to your home directory and type 
   tar cvf mystuff.tar .

If you then want to compress it, type
   gzip mystuff.tar
and you'll end up with a file called mystuff.tar.gz.

I'm not sure what the best utility to open this on OS X would be, though
I bet you can extract it in a terminal with
   tar xvzf mystuff.tar.gz
but if you want a GUI solution, something like StuffIt may work.


#275 of 286 by gull on Wed Sep 22 20:02:05 2004:

(Incidentally, I should probably point out that there's a period on the
end of that first command line.  Don't forget it; it's there to indicate
you want everything in the current directory.)


#276 of 286 by keesan on Wed Sep 22 20:13:35 2004:

How about tar zcvf to produce .tar.gz ?
When I telnet here with Kermit it gives me some helpful hints about what is
taking so long:  The Telnet server is not sending required responses:

WILL TERMINAL-TYPE
WILL NAWS
WILL NEW-ENVIRONMENT
WILL COM-PORT-CONTROL

Does this mean there is a software problem at grex?


#277 of 286 by mcnally on Wed Sep 22 20:48:08 2004:

 re #274: 

 > Change to your home directory and type 
 >  tar cvf mystuff.tar .

 What happens when the tar command tries to add ./mystuff.tar to 
 the archive it's building in ./mystuff.tar?

 Another problem (well, not really a *problem*, but a potential
 complication) is that when files are extracted from the mystuff.tar
 file they'll be dumped unceremoniously into the directory where
 the tar extract command is run.  You ought to either tell the
 user to extract in a fresh directory or else back up a level before
 building the tar file, e.g.

    cd ~/..; tar cvzf /tmp/grex_homedir_$USER.tar.gz $USER

 (in most shells, anyway..)
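A sketch that combines the suggestions in this thread, under stated assumptions: the archive is built from the parent directory so it unpacks into one subdirectory named after the account; it is written to a path outside the tree being archived; umask 077 keeps it readable only by its owner if it lands somewhere shared like /tmp; and it pipes through gzip rather than relying on tar's z flag, which not every tar has. The function name backup_dir and the example paths are hypothetical.

```shell
#!/bin/sh
# Sketch: archive a directory from its parent so the tarball unpacks
# into a single subdirectory. OUT must live outside DIR.
backup_dir() {
    dir=$1      # e.g. "$HOME"
    out=$2      # e.g. "/tmp/grex_homedir_$USER.tar.gz" (hypothetical)
    parent=$(dirname "$dir")
    name=$(basename "$dir")
    # Subshell: umask and cd do not leak into the caller's shell.
    # umask 077 makes the new archive owner-readable only (mode 600).
    ( umask 077 && cd "$parent" && tar cf - "$name" | gzip > "$out" )
}
```

For example, backup_dir "$HOME" "/tmp/grex_homedir_$USER.tar.gz", then copy the file home with scp or ftp and delete it from /tmp. On the receiving end, gzip -dc grex_homedir_$USER.tar.gz | tar xf - recreates a single directory named after the account. Note that the function's exit status is gzip's, so a tar error part-way through would not be reported by it.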


#278 of 286 by mcnally on Wed Sep 22 20:50:54 2004:

  Also, while tar is the most convenient archive utility for most
  Unix users, PC and Mac aficionados may find zip to be more useful
  for them (Mac users can unpack zip archives using UnStuffIt; I'm
  not sure how well it handles gzipped or bzipped tar files.)


#279 of 286 by tod on Wed Sep 22 20:52:46 2004:

tar is nicer because it retains the perms info


#280 of 286 by mcnally on Wed Sep 22 21:36:49 2004:

  I agree that tar is the way to go if you're going to be extracting
  under Unix or a Unix-like system..


#281 of 286 by mcnally on Wed Sep 22 21:39:42 2004:

re #276:

> When I telnet here with Kermit it gives me some helpful hints about what is
> taking so long:  The Telnet server is not sending required responses:
> 
> WILL TERMINAL-TYPE
> WILL NAWS
> WILL NEW-ENVIRONMENT
> WILL COM-PORT-CONTROL

  Based on other investigation I doubt it's a telnet- or telnet-and-ssh-
  specific problem.  The same 79 second delay seems to affect outgoing
  SMTP connections, too, for example.

> Does this mean there is a software problem at grex?

  It's definitely a configuration problem or software error of some sort.


#282 of 286 by tod on Wed Sep 22 21:50:33 2004:

Something in inetd perhaps?


#283 of 286 by gelinas on Thu Sep 23 02:43:02 2004:

No promises, but it looks like I'll have some time on Friday to attempt
a back-up of grex.  I hope folks can stand losing it for a day during
the week.  ;/


#284 of 286 by rcurl on Thu Sep 23 06:43:52 2004:

From my reading of recent responses, there seem to be some questions, or
suggestions, inre what gull said in #274. What should I take as the
current consensus inre my ? in #273? 



#285 of 286 by gull on Thu Sep 23 13:13:00 2004:

Re resp:277: The version of tar that Grex has is smart enough not to add
the archive it's creating to itself...as long as you *don't* use the z
flag.  That's why I didn't use it as was suggested in resp:276.  I
thought about telling people to create the tar file in the /tmp
directory, but I didn't for two reasons:  There's not much room in /tmp,
and people would inevitably leave the file lying around there where
anyone can read it.  I agree with your other comments.

Re resp:284: Any of the suggestions that have been given will work. 
There's More Than One Way To Do It(tm).


#286 of 286 by naftee on Fri Sep 24 06:24:42 2004:

Fronttalk from m-net works well

