The first generation of the web was concerned mainly with presenting
data in ways convenient for human beings to read, understand, and
interact with. But Tim Berners-Lee's original vision included a
"semantic web" in which not only human beings, but also software
agents, could extract, share, and repurpose information in useful
ways. In other words, the underlying semantics of data on the web
should be discoverable and manipulable by machines as well as by
people.
HTML (possibly augmented by CSS and JavaScript) is very good at
presenting information for human consumption, but the flexibility
allowed by HTML frustrates automated extraction of semantics.
Simple example: Suppose several different sites rate restaurants. One
site does it this way, using a simple paragraph structure:
<p>The address of Cafe Marie is 1759 Plymouth Road, Ann Arbor,
Michigan, 48105. Phone 734-662-2272.</p>
<p>Very good; I rate it 3 stars.</p>
A second site uses a list and some explicit line breaks for more
structured formatting:
<ul>
<li>Name: Cafe Marie</li>
<li>Location:<br />
1759 Plymouth Road<br />
Ann Arbor<br />
Michigan 48105
</li>
<li>Telephone: 734-662-2272</li>
<li>Rating: 4 stars (excellent)</li>
</ul>
A third site has yet another format:
<p>Cafe Marie, 1759 Plymouth Road, Ann Arbor, Michigan 48105.
734-662-2272.
As if you should care; I think it's lousy.
<strong>(1 star)</strong>.</p>
Same restaurant, different ratings. Now, suppose you wanted to write
a web client that collects restaurant reviews from a number of
different sites and integrates them somehow, for example computing an
"average rating". How would you write software that sees that pieces
of various web pages on different sites are reviews, recognizes that
they're all talking about the same restaurant, and figures out what
the rating is? Given the virtually limitless possible variations of
format in HTML, this would be very difficult to do. Easy for a
human, not for a machine.
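To make the difficulty concrete, here is a minimal Python sketch of the naive approach: scanning the raw HTML for something that looks like a star rating. The regular expression and the name guess_rating are illustrative assumptions of mine, not part of any real tool. It happens to work on the three snippets above, but only because each of them spells the rating as a digit followed by "star".

```python
import re

# Rating fragments taken from the three example sites above.
REVIEWS = [
    "<p>Very good; I rate it 3 stars.</p>",
    "<li>Rating: 4 stars (excellent)</li>",
    "<p>As if you should care; I think it's lousy. <strong>(1 star)</strong>.</p>",
]

# Hypothetical heuristic: a single digit followed by "star" or "stars".
RATING_PATTERN = re.compile(r"(\d)\s+stars?\b")


def guess_rating(html):
    """Return the first digit that precedes 'star(s)', or None if none is found."""
    match = RATING_PATTERN.search(html)
    return int(match.group(1)) if match else None


ratings = [guess_rating(review) for review in REVIEWS]
average = sum(ratings) / len(ratings)
print(ratings, round(average, 2))

# The heuristic breaks as soon as a site words things differently.
print(guess_rating("<p>Five stars!</p>"))  # None: the number is spelled out
```

Even when the pattern matches, nothing in the markup tells the program that the number is a rating, or that all three pages describe the same restaurant. That knowledge lives only in the reader's head.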
What's needed is a notation, or "language", for representing semantic
information in a machine-readable way. To expose its semantics, a web
site would have to make this semantic information available to web
spiders and other software agents. The best-known approach is RDF
(Resource Description Framework), an XML-based notation for making
"assertions" about properties of data and relationships between data
items. But although RDF has been around for several years, and is an
official W3C recommendation, it has not yet been widely adopted.
I suspect that RDF's day may come. In the meantime, a very simple
approach called "microformats" appears to be gaining traction. Rather
than attempting to solve the general problem of representing the
semantics of everything and inventing a new language in which to do
it, the microformats project focuses on some common types of data
found on the web and uses the standard 'class' attribute of HTML (and
XHTML) to embed semantic information in a web page in such a way that
it can be easily extracted by software. Microformat definition is an
open process supported by a wiki that anybody can access and some
mailing lists that anybody can join. So far, they've developed
microformats for calendar entries (i.e. scheduled events like plays,
concerts, or professional meetings), licensing information
(e.g. software licenses and copyright specifications), address-book
entries, social relationships (friends, colleagues, acquaintances,
etc.), and a few other things. Other microformats are currently under
discussion. See http://microformats.org for information about what
microformats are and how the adoption process works.
Using the microformat approach, any of the websites in the
hypothetical example above could add semantic information to their
reviews using standard HTML or XHTML and without changing the
appearance of the review to a human reader, such that it would be
feasible to write software to parse out the semantic information.
The microformat for reviews is called "hReview". The specification
hasn't been finalized yet, so I'll illustrate the microformat process
with something simpler: Marking up just the restaurant address itself
with microformat encoding, such that software could extract the
location information in such a way that it could, for example, be
imported into an address book or a mapping service such as Google
Maps.
The microformat for designating an entity such as a person or a
business is called "hCard" and is based on the "vCard" format defined
in RFC2426 (http://www.ietf.org/rfc/rfc2426.txt). To designate that a
portion of a web page represents such an entity, you can enclose it in
a <div> element with a class value of "vcard":
<div class="vcard">
(entity description goes here)
</div>
Then you mark up the components of the entity description (name,
street address, city, etc.) with elements having class attribute
values borrowed from the vCard standard. Something like
<p>The address of Cafe Marie is 1759 Plymouth Road, Ann Arbor,
Michigan 48105.</p>
becomes
<div class="vcard">
<p>The address of <span class="fn org">Cafe Marie</span> is
<span class="adr">
<span class="street-address">1759 Plymouth Road</span>,
<span class="locality">Ann Arbor</span>,
<span class="region">Michigan</span>
<span class="postal-code">48105</span></span>.</p>
</div>
That's a lot of extra markup, but notice that the basic paragraph
structure of the HTML hasn't changed, nor will the appearance of the
text in a web browser. The big win is that the semantic labels enable
hCard-aware software to find the information, parse it, and re-use it.
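As a rough illustration of what "hCard-aware software" might do, here is a minimal sketch using only Python's standard html.parser module. It is not how Operator or any other real tool works; the class name HCardParser, the function extract_hcard, and the short list of properties are my own assumptions, and a real parser would have to handle the full hCard specification (multiple cards per page, abbr and img elements, and so on).

```python
from html.parser import HTMLParser

# A small, assumed subset of hCard property names; the real spec defines more.
HCARD_PROPERTIES = {"fn", "org", "street-address", "locality", "region", "postal-code"}
# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

SAMPLE = """
<div class="vcard">
<p>The address of <span class="fn org">Cafe Marie</span> is
<span class="adr">
<span class="street-address">1759 Plymouth Road</span>,
<span class="locality">Ann Arbor</span>,
<span class="region">Michigan</span>
<span class="postal-code">48105</span></span>.</p>
</div>
"""


class HCardParser(HTMLParser):
    """Collects the text of elements whose class names are hCard properties."""

    def __init__(self):
        super().__init__()
        self.stack = []  # class names of each currently open element
        self.card = {}   # property name -> accumulated text

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((dict(attrs).get("class") or "").split())

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Ignore any text that is not inside an element marked class="vcard".
        if not any("vcard" in classes for classes in self.stack):
            return
        for classes in self.stack:
            for name in classes:
                if name in HCARD_PROPERTIES:
                    self.card[name] = self.card.get(name, "") + data


def extract_hcard(html):
    """Return a dict of hCard properties found in the given HTML fragment."""
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return {name: text.strip() for name, text in parser.card.items()}


card = extract_hcard(SAMPLE)
for name, value in card.items():
    print(f"{name}: {value}")
```

Note that the program never looks at the words "The address of" or at the punctuation between the fields. It relies only on the class names, which is why the same code would work on any of the three review layouts once they carry the same markup.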
A number of sites are using microformats. No web browsers support
microformats out of the box yet, although Firefox 3.0 is expected to
do so, and Bill Gates has stated that microformats are a Good Thing,
so they'll probably be supported by IE7 eventually. In the meantime,
there's a Firefox add-on called Operator that processes a limited
number of microformats. If you have Firefox 2, you can download and
install the add-on from
http://labs.mozilla.com/2006/12/introducing-operator . Then visit
http://cyberspace.org/~remmers/vcard-example.html and have a look at
the menu Operator provides when you right-click on the Cafe Marie
hCard. You can export it to an address book or look it up on Google
Maps, for example.
The microformat effort focuses on representing semantic information
in a standard way, for simple kinds of data that are already widely
used on the web. They call it "paving the cowpaths." Microformats
won't get us all the way to the full-blown semantic web, but they're
a promising start.
10 responses total.
(Typo correction: The link to the microformats web site should be http://microformats.org)
I just ran across a microformats bookmarklet for Safari at http://leftlogic.com/info/articles/microformats_bookmarklet. The author claims that it works with Firefox also, although I haven't tested it. It's just JavaScript, so it may work in other browsers as well. To install, drag the bookmarklet link to the bookmarks bar. To use it: Click on the bookmarklet while viewing a page, and a window will pop up showing a list of links to all detected microformats on the page. Click on one and it will export the microformatted data in a format suitable for import into an appropriate application. For example, when I click on a scheduled event (such as an upcoming concert), a .ics file is created on my desktop that can be imported to iCal (or any other calendaring program that understands iCalendar format). This installs the event in my calendar without my having to type in the details by hand. Visit http://upcoming.org and use the bookmarklet to see what I'm talking about. With the increasing availability of software such as browser add-ons that support microformats, and the added convenience they afford the user, I think that it is only a matter of time before more websites that list events start marking them up in the hCalendar microformat. Speaking as an Ann Arbor resident, it would really be great if arborweb.com did that (and supplied RSS feeds as well).
dang, did you type all that in by hand? lol.... and it is interesting how most of that works still.
Yeah, these are interesting tools. It's interesting that formats are developed before any tools have popped up to make use of them. john, what are you envisioning?
Re resp:3: Um, yes, I typed it all in myself. Did you think I had a secretary? :) Re resp:4: Tool development typically lags behind format definition. Logically, it kind of has to wait until the format is fairly well specified. Who wants to write software to process a moving target? For microformats, some simple tools were developed in parallel with the specs; you can find pointers to them at the microformats website, http://microformats.org. As microformats become more popular - as I expect they will - more tools will come along. More and more, the big websites are microformatting their data. If you go to, say, http://local.yahoo.com and go to the list of recommended restaurants, each restaurant listing is marked up in "hCard" microformat. This makes it easy for hCard-aware clients to extract information from the listing and do intelligent things with it. For example, the Operator Firefox extension will offer to add it to your address book or locate it for you in either Yahoo or Google maps. You're not locked in to whatever the host website decides to support. This kind of thing is an advantage to both authors and consumers of web content. If a website that lists businesses adds hCard markup to the listings, then things like adding to address books and displaying maps and driving directions can be done on the client side, using a microformat-aware web client. Rumor has it that Firefox 3, due out in a few months, will support microformats natively. I suspect that IE will too, eventually. Once native browser support becomes standard, this will encourage more sites to add microformat markup to their data (which is pretty simple to do).
RE: 5 on RE: 3 no, i just meant it seemed very detailed, and, um... nevermind.......
This Microformats business points out the benefits inherent in standards-based design. In other words... Present your content in tagged hierarchical format and the end user can better choose the best means to parse the information (to suit their own situation.) See also, XML ;-)
I just found a recent article regarding Microformats. For - perhaps - some fresh info on the subject check this page. http://www.visitmix.com/Articles/Prototype-Oomph-A-Microformats-Toolkit
The sentiment behind microformats is great. After reading microformats-related mailing lists for a while, I've got some reservations about the execution, which strikes me as overly politicized. There's an ad hoc centralized body to give a microformat some official "stamp of approval", but unfortunately an ill-defined process for reaching such approval. People go around and around for month after month after month... An alternative approach that appears to be gaining traction is RDFa, a standard for embedding RDF semantic information in XHTML. It's recently become an official W3C recommendation.
Nearly four years on.... What is the current status of microformats? Microdata is part of HTML5, which seems to be the future (unfortunately? I feel like they threw out the baby with the bathwater on giving up on XHTML. Say what you will about XML, but at least you knew it was well-formed). RDFa has more marketshare than microdata, but less than microformats. Microformats seem to have more than both combined; what should one choose?