Author Message
remmers
Microformats and the Semantic Web   Feb 28 23:38 UTC 2007

The first generation of the web was concerned mainly with presenting
data in ways convenient for human beings to read, understand, and
interact with.  But Tim Berners-Lee's original vision included a
"semantic web" in which not only human beings, but also software
agents, could extract, share, and repurpose information in useful
ways.  In other words, the underlying semantics of data on the web
should be discoverable and manipulable by machines as well as by 
people.

HTML (possibly augmented by CSS and JavaScript) is very good at 
presenting information for human consumption, but the flexibility 
allowed by HTML frustrates automated extraction of semantics.

Simple example: Suppose several different sites rate restaurants.  One
site does it this way, using a simple paragraph structure:

    <p>The address of Cafe Marie is 1759 Plymouth Road, Ann Arbor,
       Michigan, 48105.  Phone 734-662-2272.</p>  
    <p>Very good; I rate it 3 stars.</p>

A second site uses a list and some explicit line breaks for more
structured formatting:

    <ul>
    <li>Name: Cafe Marie</li>
    <li>Location:<br />
        1759 Plymouth Road<br />
        Ann Arbor<br />
        Michigan 48105
    </li>
    <li>Telephone: 734-662-2272</li>
    <li>Rating: 4 stars (excellent)</li>
    </ul>

A third site has yet another format:

    <p>Cafe Marie, 1759 Plymouth Road, Ann Arbor, Michigan 48105. 
       734-662-2272.
        As if you should care; I think it's lousy. 
        <strong>(1 star)</strong>.</p>

Same restaurant, different ratings.  Now, suppose you wanted to write
a web client that collects restaurant reviews from a number of
different sites and integrates them somehow, for example computing an
"average rating".  How would you write software that sees that pieces
of various web pages on different sites are reviews, recognizes that
they're all talking about the same restaurant, and figures out what
the rating is?  Given the virtually limitless possible variations of
format in HTML, this would be very difficult to do.  Easy for a
human, not for a machine.

What's needed is a notation, or "language", for representing semantic
information in a machine-readable way.  To expose its semantics, a web
site would have to make this semantic information available to web
spiders and other software agents.  The best-known approach is RDF
(Resource Description Framework), a data model, most often written in
an XML syntax, for making "assertions" about properties of data and
relationships between data
items.  But although RDF has been around for several years, and is an
official W3C recommendation, it has not yet been widely adopted.

I suspect that RDF's day may come.  In the meantime, a very simple
approach called "microformats" appears to be gaining traction.  Rather
than attempting to solve the general problem of representing the
semantics of everything and inventing a new language in which to do
it, the microformats project focuses on some common types of data
found on the web and uses the standard 'class' attribute of HTML (and
XHTML) to embed semantic information in a web page in such a way that
it can be easily extracted by software.  Microformat definition is an
open process supported by a wiki that anybody can access and some
mailing lists that anybody can join.  So far, they've developed
microformats for calendar entries (i.e. scheduled events like plays,
concerts, or professional meetings), licensing information
(e.g. software licenses and copyright specifications), address-book
entries, social relationships (friends, colleagues, acquaintances,
etc.), and a few other things.  Other microformats are currently under
discussion.  See http://microformats.org for information about what
microformats are and how the adoption process works.

Using the microformat approach, any of the websites in the
hypothetical example above could add semantic information to their
reviews using standard HTML or XHTML and without changing the
appearance of the review to a human reader, such that it would be
feasible to write software to parse out the semantic information.

The microformat for reviews is called "hReview".  The specification
hasn't been finalized yet, so I'll illustrate the microformat approach
with something simpler: marking up just the restaurant's address, so
that software could extract the location and, for example, import it
into an address book or hand it to a mapping service such as Google
Maps.

The microformat for designating an entity such as a person or a
business is called "hCard" and is based on the "vCard" format defined
in RFC2426 (http://www.ietf.org/rfc/rfc2426.txt).  To designate that a
portion of a web page represents such an entity, you can enclose it in
a <div> element with a class value of "vcard":

    <div class="vcard">
        (entity description goes here)
    </div>

Then you mark up the components of the entity description (name,
street address, city, etc.) with elements having class attribute
values borrowed from the vCard standard.  Something like

    <p>The address of Cafe Marie is 1759 Plymouth Road, Ann Arbor,
       Michigan 48105.</p>

becomes

    <div class="vcard">
    <p>The address of <span class="fn org">Cafe Marie</span> is
       <span class="adr"> 
       <span class="street-address">1759 Plymouth Road</span>, 
       <span class="locality">Ann Arbor</span>, 
       <span class="region">Michigan</span>
       <span class="postal-code">48105</span></span>.</p>
    </div>

That's a lot of extra markup, but notice that the basic paragraph
structure of the HTML hasn't changed, nor will the appearance of the
text in a web browser.  The big win is that the semantic labels enable
hCard-aware software to find the information, parse it, and re-use it.
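
As a rough illustration of what "hCard-aware software" might do with
the markup above, here is a minimal Python sketch using only the
standard library.  The names HCardParser, extract_hcards and
HCARD_PROPS are made up for this example, only the handful of
properties used above are recognized, and it assumes well-formed
markup in which every non-void element is explicitly closed.  It is
not how Operator or any real hCard parser is implemented.

```python
from html.parser import HTMLParser

# Only the hCard property classes used in the Cafe Marie example.
HCARD_PROPS = {"fn", "org", "street-address", "locality",
               "region", "postal-code"}

# Elements that never have a closing tag, so they are not tracked.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class HCardParser(HTMLParser):
    """Collect property text from elements nested in class="vcard"."""

    def __init__(self):
        super().__init__()
        self.cards = []       # one dict per completed hCard
        self._card = None     # hCard currently being read, if any
        self._stack = []      # per open element: the properties it carries
        self._card_depth = 0  # stack depth at which the vcard opened

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self._card is None and "vcard" in classes:
            self._card = {}
            self._card_depth = len(self._stack)
        props = classes & HCARD_PROPS if self._card is not None else set()
        self._stack.append(props)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._stack:
            return
        self._stack.pop()
        # Back at the depth where the vcard opened: the card is complete.
        if self._card is not None and len(self._stack) == self._card_depth:
            self.cards.append(
                {k: " ".join(v.split()) for k, v in self._card.items()})
            self._card = None

    def handle_data(self, data):
        if self._card is None:
            return
        # Text belongs to every property whose element is still open.
        for props in self._stack:
            for name in props:
                self._card[name] = self._card.get(name, "") + data


def extract_hcards(html):
    """Return a list of dicts, one per hCard found in the HTML string."""
    parser = HCardParser()
    parser.feed(html)
    return parser.cards


sample = """
<div class="vcard">
<p>The address of <span class="fn org">Cafe Marie</span> is
   <span class="adr">
   <span class="street-address">1759 Plymouth Road</span>,
   <span class="locality">Ann Arbor</span>,
   <span class="region">Michigan</span>
   <span class="postal-code">48105</span></span>.</p>
</div>
"""
card = extract_hcards(sample)[0]
print(card["fn"], "|", card["locality"])   # prints: Cafe Marie | Ann Arbor
```

The point of the sketch is that the extractor keys on class names
only, so the same code would work whether the review is laid out as a
paragraph, a list, or anything else.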

A number of sites are using microformats.  No web browsers support
microformats out of the box yet, although Firefox 3.0 is expected to
do so, and Bill Gates has stated that microformats are a Good Thing,
so they'll probably be supported by IE eventually.  In the meantime,
there's a Firefox add-on called Operator that processes a limited
number of microformats.  If you have Firefox 2, you can download and
install the add-on from
http://labs.mozilla.com/2006/12/introducing-operator .  Then visit
http://cyberspace.org/~remmers/vcard-example.html and have a look at
the menu Operator provides when you right-click on the Cafe Marie
hCard.  You can export it to an address book or look it up on Google
Maps, for example.

The microformat effort focuses on representing semantic information
in a standard way, for simple kinds of data that are already widely
used on the web.  They call it "paving the cowpaths."  Microformats
won't get us all the way to the full-blown semantic web, but they're
a promising start.
10 responses total.
remmers
response 1 of 10:   Feb 28 23:39 UTC 2007

(Typo correction:  The link to the microformats web site should be
http://microformats.org)
remmers
response 2 of 10:   Mar 1 19:54 UTC 2007

I just ran across a microformats bookmarklet for Safari at
http://leftlogic.com/info/articles/microformats_bookmarklet.
The author claims that it works with Firefox also, although I haven't 
tested it.  It's just JavaScript, so it may work in other browsers as 
well.  To install, drag the bookmarklet link to the bookmarks bar.

To use it:  Click on the bookmarklet while viewing a page, and a window 
will pop up showing a list of links to all detected microformats on the 
page.  Click on one and it will export the microformatted data in a 
format suitable for import into an appropriate application.  For 
example, when I click on a scheduled event (such as an upcoming 
concert), a .ics file is created on my desktop that can be imported to 
iCal (or any other calendaring program that understands iCalendar 
format).  This installs the event in my calendar without my having to 
type in the details by hand.
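
To give a feel for what such an export involves, here is a minimal
Python sketch that writes one event as iCalendar text.  It is not the
bookmarklet's code (that is JavaScript, and I haven't read it); the
function names, the PRODID string and the sample event are all
invented for illustration.  It assumes the event fields have already
been pulled out of the page, and it skips details a real exporter
would need, such as folding long lines.

```python
def escape_text(value):
    """Escape characters that are special in iCalendar TEXT values."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def event_to_ics(uid, dtstamp, dtstart, summary, location=None):
    """Return a one-event iCalendar document as a string.

    dtstamp and dtstart are expected in iCalendar date-time form,
    for example "20070315T200000Z".
    """
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//hcalendar-sketch//EN",  # made-up identifier
        "BEGIN:VEVENT",
        "UID:" + uid,
        "DTSTAMP:" + dtstamp,
        "DTSTART:" + dtstart,
        "SUMMARY:" + escape_text(summary),
    ]
    if location:
        lines.append("LOCATION:" + escape_text(location))
    lines += ["END:VEVENT", "END:VCALENDAR"]
    # iCalendar content lines are terminated by CRLF.
    return "\r\n".join(lines) + "\r\n"


# Hypothetical event, not taken from any real listing.
ics = event_to_ics(
    uid="example-1@example.invalid",
    dtstamp="20070301T195400Z",
    dtstart="20070315T200000Z",
    summary="Spring concert, with strings",
    location="Ann Arbor",
)
with open("event.ics", "w", newline="") as f:
    f.write(ics)
```

A calendar program that understands iCalendar should be able to
import the resulting event.ics file.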

Visit http://upcoming.org and use the bookmarklet to see what I'm 
talking about.

With the increasing availability of software such as browser add-ons 
that support microformats, and the added convenience they afford the 
user, I think that it is only a matter of time before more websites that 
list events start marking them up in the hCalendar microformat.  
Speaking as an Ann Arbor resident, it would really be great if 
arborweb.com did that (and supplied RSS feeds as well).

fuzzball
response 3 of 10:   Mar 23 15:58 UTC 2007

dang, did you type all that in by hand? lol....

and it is interesting how most of that works still.
h0h0h0
response 4 of 10:   Mar 24 02:43 UTC 2007

Yeah these are interesting tools.  It's interesting that formats are developed
before any tools have popped up to make use of them.  john, what are you
envisioning?
remmers
response 5 of 10:   Mar 24 22:52 UTC 2007

Re resp:3:  Um, yes, I typed it all in myself.  Did you think I had a 
secretary?  :)

Re resp:4:  Tool development typically lags behind format definition.  
Logically, it kind of has to wait until the format is fairly well 
specified.  Who wants to write software to process a moving target?  

For microformats, some simple tools were developed in parallel with the 
specs; you can find pointers to them at the microformats website, 
http://microformats.org.  As microformats become more popular - as I 
expect they will - more tools will come along.

More and more, the big websites are microformatting their data.  If you 
go to, say, http://local.yahoo.com and go to the list of recommended 
restaurants, each restaurant listing is marked up in "hCard" 
microformat.  This makes it easy for hCard-aware clients to extract 
information from the listing and do intelligent things with it.  For 
example, the Operator Firefox extension will offer to add it to your 
address book or locate it for you in either Yahoo or Google maps. You're 
not locked in to whatever the host website decides to support.

This kind of thing is an advantage to both authors and consumers of web 
content.  If a website that lists businesses adds hCard markup to the 
listings, then things like adding to address books and displaying maps 
and driving directions can be done on the client side, using a 
microformat-aware web client.  Rumor has it that Firefox 3, due out in a 
few months, will support microformats natively.  I suspect that IE will 
too, eventually.  Once native browser support becomes standard, this 
will encourage more sites to add microformat markup to their data (which 
is pretty simple to do).
fuzzball
response 6 of 10:   Mar 25 04:53 UTC 2007

RE: 5 on RE: 3

no, i just meant it seemed very detailed, and, um...
nevermind.......
madmike
response 7 of 10:   Sep 30 19:59 UTC 2008

This Microformats business points out the benefits inherent in
standards-based design. In other words...

Present your content in a tagged hierarchical format and the end user
can better choose the best means to parse the information (to suit
their own situation).

See also, XML ;-)
madmike
response 8 of 10:   Oct 23 12:33 UTC 2008

I just found a recent article regarding Microformats. For - perhaps - 
some fresh info on the subject check this page.

http://www.visitmix.com/Articles/Prototype-Oomph-A-Microformats-Toolkit
remmers
response 9 of 10:   Oct 23 21:43 UTC 2008

The sentiment behind microformats is great.  After reading
microformats-related mailing lists for a while, I've got some
reservations about the execution, which strikes me as overly
politicized.  There's an ad hoc centralized body that gives a
microformat an official "stamp of approval", but unfortunately the
process for reaching such approval is ill-defined.  People go around
and around for month after month after month...

An alternative approach that appears to be gaining traction is RDFa, a
standard for embedding RDF semantic information in XHTML.  It's recently
become an official W3C recommendation.
cross
response 10 of 10:   Sep 2 10:04 UTC 2012

Nearly four years on....

What is the current status of microformats?  Microdata is part of HTML5, which
seems to be the future (unfortunately?  I feel like they threw out the baby
with the bathwater in giving up on XHTML.  Say what you will about XML, but
at least you knew it was well-formed).  RDFa has more market share than
microdata, but less than microformats.  Microformats seem to have more than
both combined; what should one choose?

- Backtalk version 1.3.30 - Copyright 1996-2006, Jan Wolter and Steve Weiss