
A. F. Salam and Jason R. Stevens

"Semantic Web Technologies and E-Business: Toward the Integrated Virtual Organization and Business Process Automation"


The explosion of Web documents, many of which are different descriptions of the
same facts, will also bring about the need to recognize which facts are conceptually
equivalent. Craven et al. (2002) refer to this as the multiple Elvis problem. In
our current work, we extract facts from multiple Web sources, including not only
the WSJ but also Reuters, filter out the duplicates, and use this information to
create a knowledge base that contains only novel facts. Semantically conflicting
facts are identified and quarantined until new information validates or refutes
one or the other, so that the conflict can be resolved. In this approach, the multiple
sources of a given fact are remembered (via URL references to the source articles)
for verification purposes, but each fact is stored only once.
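
The bookkeeping described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each extracted fact has already been normalized into a hypothetical (subject, predicate, value) triple, that two facts are duplicates when their triples are identical, and that two facts conflict when they give different values for the same subject and predicate. The class names, the company, and the URLs are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fact:
    """Hypothetical normalized fact; frozen so it can be used as a dict key."""
    subject: str
    predicate: str
    value: str

    @property
    def key(self):
        # Assumption: a (subject, predicate) pair has exactly one true value.
        return (self.subject, self.predicate)


@dataclass
class KnowledgeBase:
    accepted: dict = field(default_factory=dict)    # key -> Fact, stored once
    quarantine: dict = field(default_factory=dict)  # key -> set of rival Facts
    sources: dict = field(default_factory=dict)     # Fact -> set of source URLs

    def add(self, fact, url):
        """Record a fact seen at url; return 'novel', 'duplicate' or 'conflict'."""
        # Every sighting is remembered, even when the fact itself is not new.
        self.sources.setdefault(fact, set()).add(url)
        key = fact.key
        if key in self.quarantine:
            already_seen = fact in self.quarantine[key]
            self.quarantine[key].add(fact)
            return "duplicate" if already_seen else "conflict"
        current = self.accepted.get(key)
        if current is None:
            self.accepted[key] = fact
            return "novel"
        if current == fact:
            return "duplicate"
        # Same subject and predicate, different value: quarantine both.
        del self.accepted[key]
        self.quarantine[key] = {current, fact}
        return "conflict"

    def resolve(self, winner):
        """Accept the validated fact and discard its quarantined rivals."""
        rivals = self.quarantine.pop(winner.key)  # KeyError if not quarantined
        for loser in rivals - {winner}:
            self.sources.pop(loser, None)
        self.accepted[winner.key] = winner


# Invented example data, for illustration only.
kb = KnowledgeBase()
smith = Fact("Acme Corp", "ceo", "J. Smith")
jones = Fact("Acme Corp", "ceo", "R. Jones")
print(kb.add(smith, "https://example.com/wsj/1"))      # novel
print(kb.add(smith, "https://example.com/reuters/7"))  # duplicate
print(kb.add(jones, "https://example.com/reuters/9"))  # conflict
kb.resolve(jones)  # e.g. after later reports confirm the change
```

Keeping the source URLs in a separate map is one way to let a fact be stored once while every article that reported it remains available for verification. Deciding whether two differently worded statements express the same fact is the hard part of the problem, and this sketch leaves it to the extraction step.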

Since Web information providers may be slow to convert their existing content into
a rich XML format, much of the semantic encoding may have to be done by
third-party e-business service providers, or by end users themselves, using browser-side
extraction and encoding tools, such as the Thresher tool proposed by Hogue and
Karger (2005).

