Figure 8. A document used by FIRST for extraction
Conlon, Lukose, Hale, and Vinjamur
Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission
of Idea Group Inc. is prohibited.
XML Formatting
To maximize the usefulness of a system like FIRST, it should extract facts and
record them in a format that will travel well from one e-business application to
another. XML is such a format. Thus, FIRST has been enhanced with an XML
converter. To convert an online WSJ corporate earnings article into XML, the
user enters the article's URL into a browser. This triggers the FIRST system
to semantically process the article. The facts extracted by FIRST are fed as input
to the XML processor, which is implemented in Java. Data items are tagged as a
set of companies or organizations, along with generic header information, like the
Figure 10. The XML formatted output file
Figure 9. User interface page
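The tagging step described above can be illustrated with a minimal Java sketch. It is not the FIRST converter itself: the class name, the method names, and every element name (`earningsArticle`, `header`, `companies`, `company`, and the field names) are hypothetical, chosen only to show how extracted facts might be grouped into generic header information plus a set of company entries using the standard Java DOM and transformer APIs. The values in `main` are placeholders, not data from the article.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch: the element names below are assumptions, not the schema used by FIRST.
class EarningsXmlSketch {

    /** Serializes header fields and one <company> element per extracted company. */
    static String toXml(Map<String, String> header,
                        List<Map<String, String>> companies) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("earningsArticle");
        doc.appendChild(root);

        // Generic header information about the article.
        Element headerEl = doc.createElement("header");
        root.appendChild(headerEl);
        appendFields(doc, headerEl, header);

        // The set of companies or organizations mentioned in the article.
        Element companiesEl = doc.createElement("companies");
        root.appendChild(companiesEl);
        for (Map<String, String> facts : companies) {
            Element companyEl = doc.createElement("company");
            companiesEl.appendChild(companyEl);
            appendFields(doc, companyEl, facts);
        }

        // Write the DOM tree to a string without the XML declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /** Adds one child element per map entry; the key is the tag, the value is the text. */
    private static void appendFields(Document doc, Element parent,
                                     Map<String, String> fields) {
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            Element child = doc.createElement(entry.getKey());
            child.setTextContent(entry.getValue());
            parent.appendChild(child);
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> header = new LinkedHashMap<>();
        header.put("source", "WSJ");                 // placeholder value
        Map<String, String> company = new LinkedHashMap<>();
        company.put("name", "Example Corp");         // placeholder value
        System.out.println(toXml(header, List.of(company)));
    }
}
```

Building the document through the DOM API rather than by string concatenation leaves the escaping of characters such as `&` and `<` in extracted text to the serializer, which matters when the facts come from free-form news prose.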
Automatically Extracting and Tagging Business Information for E-Business Systems