
A. F. Salam and Jason R. Stevens

"Semantic Web Technologies and E-Business: Toward the Integrated Virtual Organization and Business Process Automation"

We have generated more than 5 million rows of data. Once many sentences have been
generated in the file, we look at the key terms that we believe may be used to express
important information, that is, the specific types of information we aim to extract. For
example, suppose we believe that the word sale will lead to important information
about stock prices, but we are not sure how other words relate to the word sale. We
therefore select all the rows in the database whose first column (W1) begins with
sale, using the following structured query language (SQL) statement:
Select W1, W2, W3, W4, W5
From WSJ_1987
Where W1 like 'sale%'
Order by W1, W2;
Many rows are returned from this SQL statement. Some rows are useful and show
interesting patterns, but others are not. Figure 5 shows some sample rows that have
the word sales appearing in column 1. Using this technique, we are able to find
several patterns within which the word sales appears.
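The selection step can be sketched in Python. This is only an illustration under stated assumptions, not the authors' implementation: it assumes each row of WSJ_1987 holds five consecutive words of a sentence (inferred from the column names W1 to W5), it uses an in-memory SQLite database in place of whatever database was actually used, and the helper names and the two sample sentences are invented for the example.

```python
import sqlite3


def five_word_windows(sentence, width=5):
    """Yield every run of `width` consecutive words in the sentence (assumed row layout)."""
    words = sentence.lower().split()
    for i in range(len(words) - width + 1):
        yield tuple(words[i:i + width])


def build_table(sentences):
    """Load the word windows into an in-memory table shaped like WSJ_1987."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE WSJ_1987 (W1 TEXT, W2 TEXT, W3 TEXT, W4 TEXT, W5 TEXT)"
    )
    for sentence in sentences:
        conn.executemany(
            "INSERT INTO WSJ_1987 VALUES (?, ?, ?, ?, ?)",
            five_word_windows(sentence),
        )
    return conn


def rows_starting_with(conn, prefix):
    """Run the query from the text, with the prefix passed as a bound parameter."""
    query = (
        "SELECT W1, W2, W3, W4, W5 FROM WSJ_1987 "
        "WHERE W1 LIKE ? ORDER BY W1, W2"
    )
    return conn.execute(query, (prefix + "%",)).fetchall()


# Made-up sentences, for illustration only.
SAMPLE = [
    "sales of the company rose sharply last quarter",
    "the sale of its unit boosted the stock price",
]

conn = build_table(SAMPLE)
rows = rows_starting_with(conn, "sale")
for row in rows:
    print(" ".join(row))
```

Because the pattern is 'sale%', the query returns rows beginning with sale as well as sales, which is why the sample rows discussed in the text show the plural form.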
We also look for patterns using n-gram data produced by the Carnegie Mellon
Statistical Language Modeling (CMU-SLM) Toolkit (http://www.speech.cs.cmu.

