The time has come for the semantic web to SPARQL
PodTech recently published three videos of Sir Tim Berners-Lee in action. The first was a presentation of the semantic web at HP Labs in Palo Alto, California, the second was the resulting Q&A session, and the third a brief interview with Robert Scoble. The ever-watchful Paul Miller of Talis has linked to all three (tinyurl.com/2c6wm3).
The semantic web has been much heralded. A headline back in 2002 announced: “The semantic web lifts off”. One of that article’s authors was Berners-Lee. Yet six years later it is hard to use the term “lift-off” even though a lot of the architectural underpinnings are already in place. Some tough problems lie ahead. Trust, in particular.
It reminds me of the many IT projects that progress rapidly to 80% and then slow to a crawl. However, Berners-Lee expects the new year to start with a very useful new tool called SPARQL (yep, it’s pronounced “sparkle”). Like GNU before it, it is a recursive acronym (it stands for SPARQL Protocol And RDF Query Language) and is designed to pick up truly relevant information from the internet in RDF (Resource Description Framework) format.
Until now, web searches have parsed the content of web pages and assembled the results as lists of apparently matching URLs. But this is a fairly hopeless process when it comes to machine-processing the results.
Better to use RDF to describe entities within documents and other data stores so that the resulting hits are credible and can be manipulated to give more comprehensive and meaningful results.
What’s of interest is not the web pages or documents themselves but the networks of things (names, places, dates and so on) that transcend the documents. They will match the original search and may then unearth related documents, or the query results may lead to further automatic searches. It’s a bit like the agent technology we’ve been talking about for so long.
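To make the idea of querying a network of things a little more concrete, here is a minimal sketch in Python. It is not SPARQL and not a real RDF store: it is a toy list of subject-predicate-object triples with a hand-rolled pattern matcher, intended only to suggest how a query with variables can join facts that no single document states. Every identifier with an `ex:` prefix is invented for the illustration, as are the function names.

```python
# Toy illustration only: an in-memory list of triples and a naive matcher.
# This is NOT a SPARQL engine; the ex: names are made up for the example.

TRIPLES = [
    ("ex:TimBL", "ex:name", "Tim Berners-Lee"),
    ("ex:TimBL", "ex:spokeAt", "ex:HPLabs"),
    ("ex:HPLabs", "ex:locatedIn", "ex:PaloAlto"),
    ("ex:WendyHall", "ex:name", "Wendy Hall"),
    ("ex:WendyHall", "ex:spokeAt", "ex:HPLabs"),
    ("ex:WendyHall", "ex:worksAt", "ex:Southampton"),
]


def is_var(term):
    """Pattern terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, bindings):
    """Unify one pattern with one triple; return extended bindings or None."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or fail if it is already bound to something else.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def query(patterns, triples):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, bindings)) is not None
        ]
    return solutions


# Roughly: "names of everyone who spoke somewhere located in Palo Alto".
results = query(
    [
        ("?person", "ex:spokeAt", "?place"),
        ("?place", "ex:locatedIn", "ex:PaloAlto"),
        ("?person", "ex:name", "?name"),
    ],
    TRIPLES,
)
names = sorted(r["?name"] for r in results)
print(names)
```

The point of the sketch is that the answer comes from following links between entities rather than from matching words on a page, which is what makes the results usable by other software.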
Fortunately, the W3C folk have invented a way to do some of the heavy lifting needed to get us from where we are to where we need to be. It’s a way to extract RDF from XML and XHTML sources. It’s called GRDDL (Gleaning Resource Descriptions from Dialects of Languages).
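As a rough illustration of what gleaning means, the sketch below pulls triples out of a small XHTML fragment. GRDDL itself works by pointing a document at a transformation (typically XSLT) that produces RDF; the version here swaps in a hand-written Python function as a stand-in, so treat it as an analogy rather than an implementation. The markup, the `glean` function, the `ex:` property names and the `example.org` base address are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment using microformat-style class names.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="vcard" id="wendy">
      <span class="fn">Wendy Hall</span>
      <span class="org">University of Southampton</span>
    </div>
  </body>
</html>"""

NS = {"x": "http://www.w3.org/1999/xhtml"}


def glean(xhtml, base="http://example.org/page"):
    """Turn each vcard block into (subject, predicate, object) triples.

    Stand-in for a GRDDL transformation: real GRDDL would apply a
    transformation referenced by the document rather than this function.
    """
    root = ET.fromstring(xhtml)
    triples = []
    for card in root.iterfind(".//x:div[@class='vcard']", NS):
        subject = f"{base}#{card.get('id')}"
        for span in card.iterfind("x:span", NS):
            # The class attribute becomes the (made-up) property name.
            triples.append((subject, "ex:" + span.get("class"), span.text))
    return triples


gleaned = glean(XHTML)
for triple in gleaned:
    print(triple)
```

The markup stays as it is for human readers, while the extracted triples can be fed to whatever is doing the querying.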
One of Berners-Lee’s co-panellists in Palo Alto was Wendy Hall, from the University of Southampton, who speculated about keeping her larder contents online so delivery people could keep tabs on what needs replenishing. She said she was amazed at how much time people have and how much they want to say about themselves. Making information about yourself visible could lead to all sorts of interesting consequences. She even started talking about revealing her vital statistics, but then bit her tongue.
Which leads to security and trust. These are major issues for the semantic web, certainly where personal data is concerned. It might be a case of the agent software trawling through the links until it is sure of the provenance of a piece of information or the authority of the user to receive it.
As someone who once wrote software to mimic the way the brain stores and links information, I am only too aware how easy it is for ambiguities to creep in. Presumably, by following the network of related links, software will be able to disambiguate multiple uses of the same name or term.
Talis, which had a big hand in developing SPARQL and GRDDL, has incorporated semantic web services in its platform. Using the APIs, it has developed Engage, one of the first commercial semantic web apps.
