S. Securities and
Exchange Commission (SEC) had begun accepting voluntary financial filings in
XBRL, the Federal Deposit Insurance Corporation (FDIC) was requiring XBRL
reporting, and a growing number of publicly traded corporations were producing
financial statements in XBRL (XBRL, 2006).
We present a prototype system that uses natural language processing techniques to
extract specific types of facts from corporate earnings articles in the Wall Street
Journal. These facts are represented in template form to demonstrate their
structured nature and are then converted into XBRL for Web portability.
Extracting Information from Online Articles
This section discusses the process of generating XML-formatted files from online
documents. Our system, Flexible Information extRaction SysTem (FIRST), analyzes
online documents from the WSJ using syntactic and simple semantic analysis
(Hale, Conlon, McCready, Lukose, & Vinjamur, 2005; Lukose, Mathew, Conlon, &
Lawhead, 2004; Vinjamur, Conlon, Lukose, McCready, & Hale, 2005). Syntactic
analysis helps FIRST to detect sentence structure, while semantic analysis helps
FIRST to identify the concepts that are represented by different terms.
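
To make the division of labor concrete, the following minimal Python sketch illustrates the general idea only; it is not the FIRST implementation. A regular expression stands in for syntactic analysis by matching one sentence pattern, and a small synonym table stands in for semantic analysis by mapping different terms to a single concept. The pattern, the synonym table, the concept names, the function names, and the sample sentence are all hypothetical, and the output is plain XML rather than valid XBRL.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical synonym table (a stand-in for semantic analysis):
# different surface terms map to one concept name.
CONCEPTS = {
    "net income": "NetIncome",
    "profit": "NetIncome",
    "earnings": "NetIncome",
    "revenue": "Revenue",
    "sales": "Revenue",
}

# Hypothetical sentence pattern (a stand-in for syntactic analysis):
# "<term> of $<amount> <scale>".
PATTERN = re.compile(
    r"(?P<term>net income|profit|earnings|revenue|sales)\s+of\s+"
    r"\$(?P<amount>\d+(?:\.\d+)?)\s+(?P<scale>million|billion)",
    re.IGNORECASE,
)

SCALE = {"million": 1_000_000, "billion": 1_000_000_000}


def extract_facts(sentence):
    """Return a list of template-like dicts, one per matched fact."""
    facts = []
    for match in PATTERN.finditer(sentence):
        concept = CONCEPTS[match.group("term").lower()]
        value = float(match.group("amount")) * SCALE[match.group("scale").lower()]
        facts.append({"concept": concept, "value": int(round(value))})
    return facts


def to_xml(facts):
    """Serialize the facts as simple XML (illustrative only, not XBRL)."""
    root = ET.Element("facts")
    for fact in facts:
        element = ET.SubElement(root, fact["concept"])
        element.text = str(fact["value"])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up sentence for illustration; not taken from any article.
    sample = "Acme Corp. reported profit of $12.5 million on sales of $1.2 billion."
    print(to_xml(extract_facts(sample)))
```

In this sketch, "profit" and "sales" are normalized to the concepts NetIncome and Revenue before serialization, which mirrors the role that semantic analysis plays after sentence structure has been detected.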