Author Message
remmers
An Exercise in URL Rewriting   Mar 8 17:49 UTC 2007

The mod_rewrite module of the Apache webserver can be used to translate 
a requested URL to a different one.  This is useful, for example, if a 
page has been moved to a new location but you still want the old URL to 
work.

If the webserver has been configured to allow it, this facility is 
available to users on a per-directory basis.  You create a file 
named .htaccess in the directory in which you want URL translations to 
apply and put some directives in the file that specify how the 
translation should be done.

As an exercise for myself in writing .htaccess files, I've implemented a 
simplified URL scheme for read-only access to Grex conferences, items, 
and responses.  It works as follows:

  http://jremmers.org/grex/bbs               -list of all conferences
  http://jremmers.org/grex/bbs/CONF          -index of conference CONF
  http://jremmers.org/grex/bbs/CONF/ITEM     -content of an item
  http://jremmers.org/grex/bbs/CONF/ITEM/SEL -selected part of an item

Examples:

  http://jremmers.org/grex/bbs/kitchen     -index of kitchen cf.
  http://jremmers.org/grex/bbs/web/5       -item 5 of web cf.
  http://jremmers.org/grex/bbs/web/5/2     -resp 2 of item 5 of web cf.
  http://jremmers.org/grex/bbs/web/5/1-4   -resps 1-4 of that item

Note that even though the domain given in the URLs is my website, no 
Grex conference content is actually stored there.

Feel free to play around with this.  I'll explain how I did it in a 
subsequent response.
11 responses total.
other
response 1 of 11:   Mar 9 13:21 UTC 2007

What does that do to web logs?  Is the rewritten URL logged as referrer?
remmers
response 2 of 11:   Mar 9 21:08 UTC 2007

Hm, dunno.  Anybody?
remmers
response 3 of 11:   Mar 9 21:59 UTC 2007

I did some reading up on and experimentation with the HTTP referer 
header.  Typically it's sent by a browser when you follow a link; its 
value is the URL of the page on which the link occurs.  It's a reverse 
link from the target of the original link back to the source.

If you're reading this in Backtalk, you can see the value of the referer 
header by clicking on this link:  http://c2.com/cgi/test/
You'll get a display of the list of HTTP headers that your browser sent 
to the server at c2.com.  Unless your browser is configured not to send 
referer headers, one of the headers will be HTTP_REFERER; its value is 
the URL of the Backtalk page on which the link occurs.

On the other hand, if you go to a URL by simply typing it into your 
browser address window, your browser shouldn't send a referer header.  
You can try this out with c2.com too.

I think all that is completely independent of any rewriting that 
mod_rewrite does, though, since the referer header is sent by the 
browser before any rewriting on the server takes place.
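
To make that concrete, here is a small Python sketch that builds two 
requests on the client side without sending anything over the network.  
The helper name build_request is made up for illustration; the point is 
only that the Referer value is chosen by the client when it composes the 
request, so nothing the server does later (rewriting included) is 
involved in setting it.

```python
from urllib.request import Request


def build_request(target, came_from=None):
    """Compose a request the way a browser might (hypothetical helper).

    came_from is the URL of the page holding the link that was followed,
    or None when the address was typed in directly.
    """
    headers = {}
    if came_from is not None:
        # The client decides on this value before the server sees anything.
        headers["Referer"] = came_from
    return Request(target, headers=headers)


# Following a link from a page: a Referer header is attached.
followed = build_request(
    "http://c2.com/cgi/test/",
    came_from="http://grex.org/~remmers/referer.html",
)

# Typing the URL directly: no Referer header at all.
typed = build_request("http://c2.com/cgi/test/")

print(followed.get_header("Referer"))
print(typed.get_header("Referer"))
```

The first print shows the URL of the linking page and the second shows 
None, which mirrors the two cases described above.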

(My starting point for the above was looking at the "HTTP referer" 
article in Wikipedia (http://en.wikipedia.org/wiki/HTTP_Referer).  The 
article points out that the correct spelling is "referrer" and that 
whoever made up the HTTP header name misspelled it.)
remmers
response 4 of 11:   Mar 9 22:15 UTC 2007

Hm... In testing the link in resp:3, it seems that no referer header is 
sent, at least by my browser (Safari).  However, I made a page
http://grex.org/~remmers/referer.html that links to c2.com/cgi/test/;
when I click on *that* link, Safari sends the expected referer header.

Same results with Firefox.

Not sure what's going on.  If I click on the link in resp:3 from my usual 
Backtalk interface (pistachio), no referer is sent.  However, one is sent 
when I click on it from the vanilla interface.
cmcgee
response 5 of 11:   Mar 10 13:42 UTC 2007

John, I'm reading, even if it's mostly over my head.  Thank you for musing
out loud about this stuff.
remmers
response 6 of 11:   Mar 10 14:05 UTC 2007

You're welcome.

Okay, a little more investigation seems to indicate that a referer is sent 
if you're using Backtalk in readonly mode and not if you're using it as an 
authenticated user.  Maybe it's a security feature.
remmers
response 7 of 11:   Mar 10 15:05 UTC 2007

Getting back to the original topic, here are the details on how the URL 
rewriting is done.  In the root web directory of my website, I created a 
directory called "grex" and a subdirectory of that called "bbs" (which 
you can see via the link http://jremmers.org/grex/).

The only file in the bbs directory is a .htaccess file that specifies 
how anything following "grex/bbs/" is translated.  (The line numbers are 
supplied for ease of reference and aren't actually part of the file.  
Also, for readability I've done some line wrapping; each number 
corresponds to one line of the file.  You can see the actual .htaccess 
file at http://jremmers.org/htaccess-example.txt)

-----------------------------------------------------------------------
1. RewriteEngine 
   on
2. RewriteRule 
   ^/*$ 
   http://grex.org/cgi-bin/backtalk/vanilla/conflist
3. RewriteRule 
   ^([^/]+)/*$ 
   http://grex.org/cgi-bin/backtalk/vanilla/browse?conf=$1
4. RewriteRule 
   ^([^/]+)/+([0-9]+)/*$ 
   http://grex.org/cgi-bin/backtalk/vanilla/read?conf=$1&item=$2&rsel=all
5. RewriteRule 
   ^([^/]+)/+([0-9]+)/([^/]+)/*$ 
   http://grex.org/cgi-bin/backtalk/vanilla/read?conf=$1&item=$2&rsel=$3
----------------------------------------------------------------------
Line 1 tells Apache to pay attention to the rewriting rules.

The next 4 lines are rewriting rules, each of the form

  RewriteRule PATTERN REPLACEMENT

The PATTERN is a "regular expression" that specifies the form of what is 
to be replaced.  The REPLACEMENT is what the matching text is replaced 
with.  I won't attempt to explain regular expressions in general here 
(see Google or Wikipedia), but for example the regular expression 
"^([^/]+)/*$" matches any string of one or more non-slash characters, 
followed by 0 or more slashes.  The parentheses around "[^/]+" tell the 
rewrite engine to store the string of non-slashes in a variable named 
"$1" which can then be referenced in the replacement.
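
If you want to poke at that one pattern outside of Apache, Python's re 
module handles it the same way for simple expressions like this.  The 
sketch below is only a way to experiment, not anything Apache runs; the 
function name first_group is made up, and Python calls the captured 
piece group(1) where mod_rewrite calls it $1.

```python
import re

# Same expression as in the rule for a conference index.
CONF_PATTERN = re.compile(r"^([^/]+)/*$")


def first_group(text):
    """Return what the parentheses captured, or None if there is no match."""
    m = CONF_PATTERN.match(text)
    return m.group(1) if m else None


for candidate in ["coop", "coop/", "coop///", "web/5", ""]:
    print(repr(candidate), "->", first_group(candidate))
```

The first three candidates all capture "coop", since trailing slashes 
are allowed but not captured.  "web/5" doesn't match because something 
other than slashes follows the first slash, and the empty string doesn't 
match because at least one non-slash character is required.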

For example, in the URL "http://jremmers.org/grex/bbs/coop", the string 
"coop" matches the string-of-non-slashes expression "[^/]+".  Since the 
latter is parenthesized, it's stored as "$1" and then dumped into the 
corresponding replacement.  The rewritten URL is thus

  http://grex.org/cgi-bin/backtalk/vanilla/browse?conf=coop

which is a standard Backtalk URL for generating the index to a 
conference.
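
For anyone who wants to try all four rules at once without setting up 
Apache, here is a rough Python imitation.  It is a simplification under 
some assumptions: the names RULES and rewrite are made up, the input is 
the part of the URL path after "grex/bbs/", and it stops at the first 
rule that matches.  Real mod_rewrite keeps going through later rules, 
but with these particular patterns none of them match an already 
rewritten URL, so the result comes out the same.

```python
import re

BASE = "http://grex.org/cgi-bin/backtalk/vanilla/"

# (PATTERN, REPLACEMENT) pairs copied from rules 2 through 5 above.
RULES = [
    (r"^/*$", BASE + "conflist"),
    (r"^([^/]+)/*$", BASE + "browse?conf=$1"),
    (r"^([^/]+)/+([0-9]+)/*$", BASE + "read?conf=$1&item=$2&rsel=all"),
    (r"^([^/]+)/+([0-9]+)/([^/]+)/*$", BASE + "read?conf=$1&item=$2&rsel=$3"),
]


def rewrite(path):
    """Return the rewritten URL for path, or None if no rule matches."""
    for pattern, replacement in RULES:
        m = re.match(pattern, path)
        if m:
            # Fill in $1, $2, $3 from the parenthesized parts of the match.
            return re.sub(
                r"\$(\d)",
                lambda ref: m.group(int(ref.group(1))),
                replacement,
            )
    return None


for path in ["", "kitchen", "web/5", "web/5/2", "web/5/1-4", "web/abc"]:
    print(repr(path), "->", rewrite(path))
```

Note that "web/abc" falls through all four rules, because the item 
position only accepts digits.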
remmers
response 8 of 11:   Mar 10 15:16 UTC 2007

Late-breaking news:  I created a .htaccess file in the above-mentioned 
"grex" directory that makes the browsing scheme a little more 
hierarchical; the URL "http://jremmers.org/grex" takes you to the Grex 
homepage.

Exercise for the technically inclined:  What .htaccess file would achieve 
this effect?
fuzzball
response 9 of 11:   Mar 23 15:51 UTC 2007

very nice john....
wouldn't have thought of doing this.....
:)
remmers
response 10 of 11:   Mar 24 13:05 UTC 2007

What led me to do this is some recent reading about "well-designed URLs". 
This got me to thinking about what a simple, clean, human-friendly URL 
scheme for bbs items and responses might look like.

For some good ideas on the issue of well-designed URLs, see Mike 
Schinkel's post 
  http://www.mikeschinkel.com/blog/welldesignedurlsarebeautiful/
and the various references he gives.
madmike
response 11 of 11:   Sep 26 16:04 UTC 2008

Very good stuff. Thanks for the link to mikeschinkel.com. I will have 
to mess around with this. 
Isn't .htaccess strictly an Apache thing, or is it also supported by 
the Windows server platforms? 
