A semantic boost for Web users

Jeff Heflin

Go to an Internet search engine and type in a multifaceted query. Ask for a list of West Coast cities that have a microbrewery and that also boast an NFL team that won a Super Bowl in the 1980s.

Current search engines, says Jeff Heflin, associate professor of computer science and engineering, will not list the San Francisco 49ers and the Anchor Brewing Company on the same page of results.

Or try this search: researchers who have received NSF grants and who work at universities in states with fewer than one million residents.

“It’s not that this query cannot be answered by one of today’s search engines,” says Heflin. “But it can’t be done automatically and it is not obvious how it should be done. It requires a lot of cognitive ability on the part of the Internet user, and it also requires a lot of trial and error.”

Heflin is part of an international effort to improve the “Semantic Web” by integrating ontologies in the cyberworld. In artificial intelligence, an ontology is a set of terms defined precisely enough to provide a common vocabulary for information exchange among people in related fields.

The Semantic Web at present, says Heflin, consists of many independent ontologies, but they require mapping, or alignment, to be useful to a wide variety of Internet users.

“Without some form of alignment, the data that is described in terms of one ontology will be inaccessible to users who ask questions [using the terminology] of another ontology,” Heflin wrote in a technical report in June 2007.
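The problem Heflin describes can be illustrated with a toy sketch. This is not Heflin's system and not real OWL: plain Python tuples stand in for facts, a list of term pairs stands in for an alignment, and every vocabulary name (`uni:Professor`, `edu:FacultyMember`) is a hypothetical placeholder.

```python
# Toy illustration only: tuples stand in for facts and a list of pairs
# stands in for an OWL-style equivalence mapping between two ontologies.
# All names below are hypothetical.

# Facts published by two sources, each using its own ontology's terms.
FACTS = {
    ("alice", "type", "uni:Professor"),
    ("bob", "type", "edu:FacultyMember"),
}

# A small alignment: each pair of terms is declared equivalent.
ALIGNMENT = [("uni:Professor", "edu:FacultyMember")]


def equivalent_terms(term, alignment):
    """Return the term plus every term reachable through equivalence pairs."""
    found = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for a, b in alignment:
            # Equivalence is symmetric, so follow each pair in both directions.
            for src, dst in ((a, b), (b, a)):
                if src == current and dst not in found:
                    found.add(dst)
                    frontier.append(dst)
    return found


def instances_of(term, facts, alignment=()):
    """List subjects typed with the term or with any term aligned to it."""
    classes = equivalent_terms(term, alignment)
    return sorted(s for s, p, o in facts if p == "type" and o in classes)


without = instances_of("uni:Professor", FACTS)
with_alignment = instances_of("uni:Professor", FACTS, ALIGNMENT)
print(without)         # ['alice']
print(with_alignment)  # ['alice', 'bob']
```

Without the alignment, a question phrased in the first vocabulary misses the fact recorded in the second. With it, both facts are returned.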

That alignment, says Heflin, can be provided by the Web Ontology Language, or OWL, which was developed by the Web Ontology Working Group, of which Heflin was a member.

“OWL can integrate these ontologies and thereby integrate the data sources that commit to them,” says Heflin. OWL and two other Web languages, HTML and XML, were recommended by the World Wide Web Consortium, which is directed by Tim Berners-Lee, inventor of the World Wide Web.

True integration of data sources, says Heflin, requires more than ontology alignment.

“For the Semantic Web to be really effective, a lot of people have to put information on the Internet in that form. The more people who do that, the more powerful the system will be for Internet users.

“A potential network effect arises when many individuals each contribute a few small maps. Because OWL is shareable via the Web, anyone can create alignments and make them available for others to use.”

Heflin’s Hawkeye

Heflin and his students have developed two prototype Semantic Web domains—one for e-academia and one for e-government—by uploading to a knowledge base 1.5 million real-world Web pages in each category. The knowledge base, which is called Hawkeye, contains more than 166 million facts from real-world data sources, says Heflin.

“We can query our system and find, for example, all bills dealing with clean air that have been sponsored by Congressmen from states with a population of less than 10 million,” says Heflin.
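A multifaceted query of this kind amounts to joining facts that come from different sources. The sketch below shows only the shape of such a join. It is not Hawkeye's implementation, and every identifier and number in it is an invented placeholder, not real legislative or census data.

```python
# Hypothetical placeholder data: no real bills, sponsors or populations.
BILLS = [
    {"id": "bill-1", "topic": "clean air", "sponsor": "rep-a"},
    {"id": "bill-2", "topic": "clean air", "sponsor": "rep-b"},
    {"id": "bill-3", "topic": "highways", "sponsor": "rep-a"},
]
SPONSOR_STATE = {"rep-a": "state-x", "rep-b": "state-y"}
STATE_POPULATION = {"state-x": 4_000_000, "state-y": 25_000_000}


def bills_on_topic_from_small_states(topic, max_population):
    """Join bills -> sponsor -> state -> population and filter on both facets."""
    results = []
    for bill in BILLS:
        if bill["topic"] != topic:
            continue
        state = SPONSOR_STATE.get(bill["sponsor"])
        population = STATE_POPULATION.get(state)
        # Skip bills whose sponsor or state is unknown to the knowledge base.
        if population is not None and population < max_population:
            results.append(bill["id"])
    return results


matches = bills_on_topic_from_small_states("clean air", 10_000_000)
print(matches)  # ['bill-1']
```

Each dictionary here plays the role of a separate data source. The query succeeds only because the three sources share identifiers, which is the kind of agreement that ontologies and their alignments are meant to supply.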

“Our system can answer complex queries in less than a minute and simple queries in just seconds. We hope to improve on that.”

The explosion of knowledge and the growing interaction between academic disciplines have increased the need for a more robust Semantic Web, says Heflin, who has an NSF CAREER Award.

“Biology and physics are two fields whose interests are overlapping more and more, but they use different vocabularies. The Semantic Web could be used to create ontologies for these fields and then inter-relate these vocabularies.”

In the future, says Heflin, Internet users will go to Semantic Web search engines and enter multifaceted queries. The search engine will answer questions and, if multiple, conflicting answers exist, will link answers to originating sites so that users can determine for themselves which answers are most accurate.

“We’re trying to give the Semantic Web a kick-start by creating a system that collects information from many pages about many topics that are different but still related. Hopefully, once Internet users start querying our system and similar systems, they will be encouraged to create their own sites.”

Last month, two of Heflin’s graduate students, Abir Qasem and Fabiana Prabhakar, demonstrated the e-government and e-academia domains at the Sixth International Semantic Web Conference (ISWC-2007) in Busan, South Korea.

What form will the Semantic Web eventually take?

“You can only speculate,” says Heflin. “Ten or 15 years ago, Berners-Lee would never have imagined that the Web would be used today for shopping, socializing, e-commerce, research and so on.

“Virtually everything you need to know is on the Web. The problem is trying to find it. We want people to be able to take advantage of the information that’s out there and use it as efficiently as possible.”

--Kurt Pfitzer