
A. F. Salam and Jason R. Stevens

"Semantic Web Technologies and E-Business: Toward the Integrated Virtual Organization and Business Process Automation"


edu/SLM_info.html). The CMU-SLM toolkit provides several functions, including
word frequency lists and vocabularies, word bigram and trigram counts, bigram- and
trigram-related statistics, and various back-off bigram and trigram language models.
Table 1 shows some 3-gram and 4-gram data.
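
The counting step described above can be illustrated with a minimal sketch. It is not the CMU-SLM toolkit and does not reproduce its interface; it only shows, in plain Python, what word frequency lists and bigram and trigram counts contain. The helper name ngram_counts, the whitespace tokenization, and the sample string (built from the example phrases on this page) are assumptions made for illustration.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens (hypothetical helper)."""
    return Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


# Assumption: simple whitespace tokenization; punctuation stays attached.
text = "sales declined to $475.6 million, sales declined 42%, to $53.4"
tokens = text.split()

unigrams = ngram_counts(tokens, 1)   # word frequency list
bigrams = ngram_counts(tokens, 2)    # word bigram counts
trigrams = ngram_counts(tokens, 3)   # word trigram counts

for gram, count in bigrams.most_common(3):
    print(" ".join(gram), count)
```

A real toolkit would also normalize case, handle punctuation, and estimate back-off probabilities from these counts; none of that is attempted here.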
Figure 4. Sample rows from a KWIC index file
Automatically Extracting and Tagging Business Information for E-Business Systems
Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission
of Idea Group Inc. is prohibited.
Figure 5. Sample rows from KWIC index file where "sales" appears in column 1
Table 1. Sample 3-grams and 4-grams
We are interested in two types of n-gram patterns for a given word, for example,
sales:
• Patterns where sales is the first term, followed by n-1 words:
sales declined 42%, to $53.4
sales declined to $475.6 million,
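
As a rough sketch of how patterns of this first type could be collected, the function below returns every n-gram whose first token is the target word. The function name ngrams_starting_with, the case-insensitive match, and the whitespace tokenization are illustrative assumptions, not the authors' implementation; the input lines are the two sample phrases shown above.

```python
def ngrams_starting_with(tokens, target, n):
    """Return each n-gram whose first token equals target (hypothetical helper)."""
    target = target.lower()
    return [
        tokens[i:i + n]
        for i in range(len(tokens) - n + 1)
        if tokens[i].lower() == target   # assumption: case-insensitive match
    ]


# The two sample phrases from the text, tokenized on whitespace.
lines = [
    "sales declined 42%, to $53.4",
    "sales declined to $475.6 million,",
]

hits = [
    gram
    for line in lines
    for gram in ngrams_starting_with(line.split(), "sales", 4)
]

for gram in hits:
    print(" ".join(gram))
```

With n set to 4, each hit is the word sales followed by the next three tokens, which matches the "n-1 words after it" description.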
Conlon, Lukose, Hale, and Vinjamur

