This chapter explores how natural language processing
(NLP) principles, using linguistic analysis, can be employed to extract information
from unstructured Web documents and translate it into extensible markup language
(XML), the enabling currency of today's e-business applications and the foundation
for the emerging Semantic Web languages of tomorrow. Our prototype system
is built and tested with online financial documents.
Conlon, Lukose, Hale, and Vinjamur
Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission
of Idea Group Inc. is prohibited.
Introduction
Business decision makers demand relevant, accurate, and complete information
about the marketplaces in which they compete. The World Wide Web is a rich but
unmanageably huge source of human-readable business information: some of it novel,
accurate, and relevant; some repetitive, wrong, or out of date. As the flood of Web
documents tops 11.5 billion pages and continues to rise (Gulli & Signorini, 2005),
the human task of grasping the business information it bears seems more and more
hopeless.