remmers
|
|
An Exercise in URL Rewriting
|
Mar 8 17:49 UTC 2007 |
The mod_rewrite module of the Apache webserver can be used to translate
a requested URL to a different one. This is useful, for example, if a
page has been moved to a new location but you still want the old URL to
work.
If the webserver has been configured to allow it, this facility is
available to users on a per-directory basis. You create a file
named .htaccess in the directory in which you want URL translations to
apply and put some directives in the file that specify how the
translation should be done.
As an exercise for myself in writing .htaccess files, I've implemented a
simplified URL scheme for read-only access to Grex conferences, items,
and responses. It works as follows:
http://jremmers.org/grex/bbs                 - list of all conferences
http://jremmers.org/grex/bbs/CONF            - index of conference CONF
http://jremmers.org/grex/bbs/CONF/ITEM       - content of an item
http://jremmers.org/grex/bbs/CONF/ITEM/SEL   - selected part of an item
Examples:
http://jremmers.org/grex/bbs/kitchen         - index of kitchen cf.
http://jremmers.org/grex/bbs/web/5           - item 5 of web cf.
http://jremmers.org/grex/bbs/web/5/2         - resp 2 of item 5 of web cf.
http://jremmers.org/grex/bbs/web/5/1-4       - resps 1-4 of that item
Note that even though the domain given in the URLs is my website, no
Grex conference content is actually stored there.
Feel free to play around with this. I'll explain how I did it in a
subsequent response.
|
| 11 responses total. |
other
|
|
response 1 of 11:
|
Mar 9 13:21 UTC 2007 |
What does that do to web logs? Is the rewritten URL logged as referrer?
|
remmers
|
|
response 2 of 11:
|
Mar 9 21:08 UTC 2007 |
Hm, dunno. Anybody?
|
remmers
|
|
response 3 of 11:
|
Mar 9 21:59 UTC 2007 |
I did some reading up on and experimentation with the HTTP referer
header. Typically it's sent by a browser when you follow a link; its
value is the URL of the page on which the link occurs. It's a reverse
link from the target of the original link back to the source.
If you're reading this in Backtalk, you can see the value of the referer
header by clicking on this link: http://c2.com/cgi/test/
You'll get a display of the list of HTTP headers that your browser sent
to the server at c2.com. Unless your browser is configured not to send
referer headers, one of the headers will be HTTP_REFERER; its value is
the URL of the Backtalk page on which the link occurs.
On the other hand, if you go to a URL by simply typing it into your
browser address window, your browser shouldn't send a referer header.
You can try this out with c2.com too.
I think all that is completely independent of any rewriting that
mod_rewrite does, though, since the referer header is sent by the
browser before any rewriting on the server takes place.
(My starting point for the above was looking at the "HTTP referer"
article in Wikipedia (http://en.wikipedia.org/wiki/HTTP_Referer). The
article points out that the correct spelling is "referrer" and that
whoever made up the HTTP header name misspelled it.)
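To illustrate the point that the referer value is simply whatever the
client chooses to send, here is a minimal Python sketch. Everything in it
is made up for illustration: the EchoReferer handler, the fetch and demo
helpers, and the example.org URL are not part of Backtalk or c2.com. It
starts a throwaway server on localhost that echoes back the Referer
header, then makes one request with the header set (as a browser would
when following a link) and one without (as when typing the URL directly).

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoReferer(BaseHTTPRequestHandler):
    """Hypothetical handler: replies with the Referer header it received."""

    def do_GET(self):
        referer = self.headers.get("Referer", "(none)")
        body = referer.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the demo quiet


def fetch(url, referer=None):
    """GET url, optionally sending a Referer header, and return the body."""
    headers = {"Referer": referer} if referer else {}
    request = urllib.request.Request(url, headers=headers)
    # An empty ProxyHandler makes sure the request goes straight to localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return response.read().decode("utf-8")


def demo():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), EchoReferer)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_port
    try:
        followed_link = fetch(url, referer="http://example.org/page-with-link.html")
        typed_url = fetch(url)
    finally:
        server.shutdown()
        server.server_close()
    return followed_link, typed_url


if __name__ == "__main__":
    print(demo())
```

The server only ever sees what arrives in the request, which is why any
rewriting done afterwards on the server side can't change the header.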
|
remmers
|
|
response 4 of 11:
|
Mar 9 22:15 UTC 2007 |
Hm... In testing the link in resp:3, it seems that no referer header is
sent, at least by my browser (Safari). However, I made a page
http://grex.org/~remmers/referer.html that links to c2.com/cgi/test/;
when I click on *that* link, Safari sends the expected referer header.
Same results with Firefox.
Not sure what's going on. If I click on the link in resp:3 from my usual
Backtalk interface (pistachio), no referer is sent. However, one is sent
from the vanilla interface.
|
cmcgee
|
|
response 5 of 11:
|
Mar 10 13:42 UTC 2007 |
John, I'm reading, even if it's mostly over my head. Thank you for musing
out loud about this stuff.
|
remmers
|
|
response 6 of 11:
|
Mar 10 14:05 UTC 2007 |
You're welcome.
Okay, a little more investigation seems to indicate that a referer is sent
if you're using Backtalk in readonly mode and not if you're using it as an
authenticated user. Maybe it's a security feature.
|
remmers
|
|
response 7 of 11:
|
Mar 10 15:05 UTC 2007 |
Getting back to the original topic, here are the details on how the URL
rewriting is done. In the root web directory of my website, I created a
directory called "grex" and a subdirectory of that called "bbs" (which
you can see via the link http://jremmers.org/grex/).
The only file in the bbs directory is a .htaccess file that specifies
how anything following "grex/bbs/" is translated. (The line numbers are
supplied for ease of reference and aren't actually part of the file.
Also, for readability I've done some line wrapping; each number
corresponds to one line of the file. You can see the actual .htaccess
file at http://jremmers.org/htaccess-example.txt)
-----------------------------------------------------------------------
1. RewriteEngine on
2. RewriteRule ^/*$
     http://grex.org/cgi-bin/backtalk/vanilla/conflist
3. RewriteRule ^([^/]+)/*$
     http://grex.org/cgi-bin/backtalk/vanilla/browse?conf=$1
4. RewriteRule ^([^/]+)/+([0-9]+)/*$
     http://grex.org/cgi-bin/backtalk/vanilla/read?conf=$1&item=$2&rsel=all
5. RewriteRule ^([^/]+)/+([0-9]+)/([^/]+)/*$
     http://grex.org/cgi-bin/backtalk/vanilla/read?conf=$1&item=$2&rsel=$3
-----------------------------------------------------------------------
Line 1 tells Apache to pay attention to the rewriting rules.
The next 4 lines are rewriting rules, each of the form
RewriteRule PATTERN REPLACEMENT
The PATTERN is a "regular expression" that specifies the form of what is
to be replaced. The REPLACEMENT is what any string matching the pattern
is replaced with. I won't attempt to explain regular expressions in
general here (see Google or Wikipedia), but for example the regular
expression "^([^/]+)/*$" matches any string of one or more non-slash
characters, followed by 0 or more slashes. The parentheses around
"[^/]+" tell the rewrite engine to store the string of non-slashes in a
variable named "$1" which can then be referenced in the replacement.
For example, in the URL "http://jremmers.org/grex/bbs/coop", the string
"coop" matches the string-of-non-slashes expression "[^/]+". Since the
latter is parenthesized, it's stored as "$1" and then dumped into the
corresponding replacement. The rewritten URL is thus
http://grex.org/cgi-bin/backtalk/vanilla/browse?conf=coop
which is a standard Backtalk URL for generating the index to a
conference.
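For anyone who wants to experiment with the patterns without touching a
webserver, the matching and substitution can be imitated in a few lines
of Python. This is only a rough sketch, not how Apache itself works: the
rewrite function and the RULES table are names made up here, Apache's
"$1" is written as Python's "\1", and the sketch assumes that the first
matching rule wins and that the input is the part of the URL after
"grex/bbs/". It computes the target URL only; it doesn't model the
redirect or any other mod_rewrite behavior.

```python
import re

# The four patterns and replacements from the .htaccess listing above,
# with Apache's $N backreferences written as Python's \N.
BASE = "http://grex.org/cgi-bin/backtalk/vanilla/"
RULES = [
    (r"^/*$", BASE + "conflist"),
    (r"^([^/]+)/*$", BASE + r"browse?conf=\1"),
    (r"^([^/]+)/+([0-9]+)/*$", BASE + r"read?conf=\1&item=\2&rsel=all"),
    (r"^([^/]+)/+([0-9]+)/([^/]+)/*$", BASE + r"read?conf=\1&item=\2&rsel=\3"),
]


def rewrite(path):
    """Return the rewritten URL for the path after grex/bbs/, or None.

    Stops at the first rule whose pattern matches (an assumption of
    this sketch, not a statement about mod_rewrite internals).
    """
    for pattern, replacement in RULES:
        match = re.match(pattern, path)
        if match:
            # expand() fills \1, \2, ... with the parenthesized groups.
            return match.expand(replacement)
    return None


if __name__ == "__main__":
    for path in ["", "coop", "web/5", "web/5/1-4", "web/abc"]:
        print(repr(path), "->", rewrite(path))
```

Trying "coop" gives the same browse?conf=coop URL worked out by hand
above, and a path such as "web/abc" matches none of the four patterns.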
|
remmers
|
|
response 8 of 11:
|
Mar 10 15:16 UTC 2007 |
Late-breaking news: I created a .htaccess file in the above-mentioned
"grex" directory that makes the browsing scheme a little more
hierarchical; the URL "http://jremmers.org/grex" takes you to the Grex
homepage.
Exercise for the technically inclined: What .htaccess file would achieve
this effect?
|
fuzzball
|
|
response 9 of 11:
|
Mar 23 15:51 UTC 2007 |
very nice john....
wouldn't have thought of doing this.....
:)
|
remmers
|
|
response 10 of 11:
|
Mar 24 13:05 UTC 2007 |
What led me to do this is some recent reading about "well-designed URLs".
This got me to thinking about what a simple, clean, human-friendly URL
scheme for bbs items and responses might look like.
For some good ideas on the issue of well-designed URLs, see Mike
Schinkel's post
http://www.mikeschinkel.com/blog/welldesignedurlsarebeautiful/
and the various references he gives.
|
madmike
|
|
response 11 of 11:
|
Sep 26 16:04 UTC 2008 |
Very good stuff. Thanks for the link to mikeschinkel.com. I will have
to mess around with this.
Isn't .htaccess strictly an Apache thing, or is it also supported by
the Windows server platforms?
|