Extracting 5W1H Event Semantic Elements from Chinese Online News

  • Conference paper
Web-Age Information Management (WAIM 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6184)

Abstract

This paper proposes a verb-driven approach to extracting 5W1H (Who, What, Whom, When, Where and How) event semantic information from Chinese online news. Our work makes two main contributions. First, given the usual structure of a news story, we propose a novel algorithm that extracts topic sentences by stressing the importance of the news headline. Second, we extract event facts (i.e. 5W1H) from these topic sentences by applying a rule-based (verb-driven) method and a supervised machine-learning method (SVM). This method significantly improves the predicate-argument structure used in the Automatic Content Extraction (ACE) Event Extraction (EE) task by considering the valency of a Chinese verb (its capacity to govern noun phrases). Extensive experiments on the ACE 2005 datasets confirm its effectiveness. The method also scales very well, since it considers only the topic sentences and surface text features. Based on this method, we build a prototype system named Chinese News Fact Extractor (CNFE). CNFE is evaluated on a real-world corpus containing 30,000 newspaper documents. Experimental results show that CNFE can extract event facts efficiently.
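
The abstract does not spell out the topic-sentence algorithm, so the following is only an illustrative sketch of the general idea of stressing the headline, not the paper's method. It assumes a hypothetical scoring rule: each sentence is scored by its character-bigram overlap with the headline, blended with a simple position prior. The function names, the bigram representation (used here to avoid a Chinese word segmenter), and the `headline_weight` value are all assumptions made for illustration.

```python
from typing import List, Set, Tuple


def char_bigrams(text: str) -> Set[str]:
    """Return the set of character bigrams in text, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return {a + b for a, b in zip(chars, chars[1:])}


def rank_topic_sentences(
    headline: str,
    sentences: List[str],
    headline_weight: float = 0.7,  # assumed value, not taken from the paper
) -> List[Tuple[float, str]]:
    """Rank sentences by headline overlap blended with a position prior."""
    head = char_bigrams(headline)
    n = len(sentences)
    scored = []
    for i, sentence in enumerate(sentences):
        grams = char_bigrams(sentence)
        # Fraction of headline bigrams that also occur in the sentence.
        overlap = len(grams & head) / len(head) if head else 0.0
        # Earlier sentences get a higher prior, as is common in news text.
        position = 1.0 - i / n
        score = headline_weight * overlap + (1.0 - headline_weight) * position
        scored.append((score, sentence))
    # Highest-scoring sentences first.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    ranked = rank_topic_sentences(
        "北京发布暴雨预警",
        ["今天天气很好。", "北京市气象台发布暴雨预警。", "市民应注意安全。"],
    )
    for score, sentence in ranked:
        print(f"{score:.2f}  {sentence}")
```

In a pipeline of the kind the abstract describes, the top-ranked sentences would then be passed to the verb-driven rules and the SVM classifier; that stage is not sketched here because the abstract gives no details of its features or rules.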

This work is sponsored by Beijing Municipal Science & Technology Commission project “R & D of 3-dimensional risk warning and integrated prevention technology” and China Postdoctoral Science Foundation (No.20080440260).


© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, W., Zhao, D., Zou, L., Wang, D., Zheng, W. (2010). Extracting 5W1H Event Semantic Elements from Chinese Online News. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds) Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14246-8_62

  • DOI: https://doi.org/10.1007/978-3-642-14246-8_62

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14245-1

  • Online ISBN: 978-3-642-14246-8

  • eBook Packages: Computer Science (R0)
