Event Extraction from Unstructured Text Data

Part of the Lecture Notes in Computer Science book series (LNISA, volume 9261)

Abstract

We extend a bootstrapping method, originally developed for extracting relations from webpages, to the problem of extracting content from large collections of short unstructured text. Such data appear as field notes in enterprise applications and as messages in social media services. The method iteratively learns sentence patterns that match a set of representative event mentions and then uses the learnt patterns to extract further mentions. At every step, the semantic similarity between the text and the set of patterns determines whether a pattern is matched. Semantic similarity is calculated using the WordNet lexical database. Where possible, local structure features such as bigrams are extracted from the data to improve the accuracy of pattern matching. We rank and filter the learnt patterns to balance the precision and recall of the approach with respect to extracted events. We demonstrate the approach on two datasets: a collection of field notes from an enterprise dataset and a collection of “tweets” from the Twitter social network. We evaluate the accuracy of the extracted events as the method parameters are varied.
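
The iterative loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synonym table is a hypothetical stand-in for WordNet-based similarity, the 0.8 synonym score and 0.6 threshold are arbitrary, each extracted mention is reused verbatim as a pattern, and the bigram features and the pattern ranking and filtering step are omitted. All function and variable names are assumptions made for illustration.

```python
# Toy bootstrapping loop: seed mentions -> patterns -> new mentions -> repeat.
# SYNONYMS is a hypothetical stand-in for a WordNet similarity lookup.
SYNONYMS = {
    frozenset({"leak", "spill"}),
    frozenset({"pump", "compressor"}),
}


def word_sim(a, b):
    """Return 1.0 for identical words, 0.8 for listed synonyms, else 0.0."""
    if a == b:
        return 1.0
    return 0.8 if frozenset({a, b}) in SYNONYMS else 0.0


def similarity(text_tokens, pattern_tokens):
    """Average, over pattern words, of the best similarity to any text word."""
    if not text_tokens or not pattern_tokens:
        return 0.0
    total = sum(max(word_sim(p, t) for t in text_tokens) for p in pattern_tokens)
    return total / len(pattern_tokens)


def bootstrap(corpus, seeds, threshold=0.6, max_iters=5):
    """Grow the set of event mentions until no new text matches a pattern."""
    extracted = set(seeds)
    for _ in range(max_iters):
        # Assumption: every mention extracted so far serves as a pattern.
        patterns = [mention.lower().split() for mention in extracted]
        new = {
            text
            for text in corpus
            if text not in extracted
            and any(similarity(text.lower().split(), p) >= threshold for p in patterns)
        }
        if not new:
            break  # fixed point reached: nothing else matches
        extracted |= new
    return extracted


corpus = [
    "pump leak detected",
    "compressor spill detected",
    "compressor spill reported",
    "routine inspection done",
]
seeds = ["pump leak detected"]
print(sorted(bootstrap(corpus, seeds)))
```

In this toy corpus, "compressor spill reported" is too dissimilar to the seed to match in the first iteration and is picked up only in the second, once "compressor spill detected" has been extracted and reused as a pattern. The same mechanism can also admit unrelated text over successive iterations, which is one reason a ranking and filtering step matters in practice.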

Keywords

  • Semantic Similarity
  • Content Extraction
  • Stop Word
  • Event Extraction
  • Relation Extraction

Acknowledgment

This work is supported by Chevron U.S.A. Inc. under the joint project, Center for Interactive Smart Oilfield Technologies (CiSoft), at the University of Southern California.

Corresponding author

Correspondence to Anand Panangadan.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Shang, C., Panangadan, A., Prasanna, V.K. (2015). Event Extraction from Unstructured Text Data. In: Chen, Q., Hameurlain, A., Toumani, F., Wagner, R., Decker, H. (eds) Database and Expert Systems Applications. Globe DEXA 2015. Lecture Notes in Computer Science, vol 9261. Springer, Cham. https://doi.org/10.1007/978-3-319-22849-5_38

  • DOI: https://doi.org/10.1007/978-3-319-22849-5_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22848-8

  • Online ISBN: 978-3-319-22849-5

  • eBook Packages: Computer Science, Computer Science (R0)