The overall process is shown in Figure 1. This section begins with a review of the
information extraction literature and then describes how FIRST extracts information
from online documents to produce XML-formatted files.
Information Extraction
The explosion of textual information on the Web calls for new technologies that can
recognize information originally structured for human consumption rather than for
data processing. Research in artificial intelligence (AI) has sought ways to help
computers perform tasks that would otherwise require human judgment. Natural
language processing (NLP), a subarea of AI, deals with spoken and written human
languages. Its own subareas include machine translation, natural language interfaces,
language understanding, and text generation. Because NLP tasks are very difficult,
few NLP applications have been developed commercially. Currently, the most
successful are grammar checkers and machine translation programs.
To deal with textual data, information systems need to be able to understand the
documents they read.
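The simplest form of information extraction sidesteps understanding and relies on hand-written patterns. The sketch below is not FIRST's method. It shows, under stated assumptions, how such a pattern-based extractor might turn a sentence written for human readers into an XML record. The regular expression, the element names (`records`, `record`, `company`, and so on), and the sample sentences are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern for sentences such as
# "Acme Corp reported revenue of $12.5 million for Q3 2004."
# Each named group becomes one XML child element of a <record>.
REVENUE_PATTERN = re.compile(
    r"(?P<company>[A-Z][\w&.]*(?: [A-Z][\w&.]*)*) reported revenue of "
    r"\$(?P<amount>[\d.]+) (?P<unit>million|billion) for (?P<period>Q[1-4] \d{4})"
)


def extract_to_xml(text: str) -> str:
    """Return an XML string with one <record> per pattern match in text."""
    root = ET.Element("records")
    for match in REVENUE_PATTERN.finditer(text):
        record = ET.SubElement(root, "record")
        # groupdict() preserves the order in which the groups are defined.
        for field, value in match.groupdict().items():
            ET.SubElement(record, field).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(extract_to_xml("Acme Corp reported revenue of $12.5 million for Q3 2004."))
```

Such patterns are brittle. A paraphrase like "Acme's Q3 2004 revenue was $12.5 million" slips past them entirely. This brittleness is why the text argues that information systems need some ability to understand the documents they read.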