Abstract
This paper proposes a verb-driven approach to extracting 5W1H (Who, What, Whom, When, Where and How) event semantic information from Chinese online news. The main contributions of our work are twofold. First, given the usual structure of a news story, we propose a novel algorithm that extracts topic sentences by stressing the importance of the news headline. Second, we extract event facts (i.e., 5W1H) from these topic sentences by applying a rule-based (verb-driven) method and a supervised machine-learning method (SVM). This method significantly improves on the predicate-argument structure used in the Automatic Content Extraction (ACE) Event Extraction (EE) task by taking into account the valency of a Chinese verb, that is, its capacity to govern noun phrases. Extensive experiments on the ACE 2005 datasets confirm its effectiveness, and the method is also highly scalable, since it considers only topic sentences and surface text features. Based on this method, we build a prototype system named Chinese News Fact Extractor (CNFE). CNFE is evaluated on a real-world corpus containing 30,000 newspaper documents. Experimental results show that CNFE can extract event facts efficiently.
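The two rule-based steps described above (headline-driven topic sentence selection, then verb-driven slot filling) can be illustrated with a toy sketch. It is not the authors' algorithm: the headline-overlap score, the valency lexicon `VALENCY`, and the helpers `headline_overlap`, `pick_topic_sentences` and `fill_slots` are all hypothetical, the input is assumed to be already segmented and POS-tagged with a PKU-style tagset, and neither the SVM component nor the How slot is covered.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# (word, POS) pairs; ASSUMES text is already segmented and tagged with a
# PKU-style tagset (n* nouns, nr person, ns place, t time, v* verbs).
Token = Tuple[str, str]

# HYPOTHETICAL toy valency lexicon: how many noun-phrase arguments a verb governs.
VALENCY: Dict[str, int] = {"出生": 1, "会见": 2, "送": 3}


def headline_overlap(headline: List[Token], sentence: List[Token]) -> float:
    """Share of the headline's noun/verb words that also appear in the sentence."""
    content = {w for w, p in headline if p.startswith(("n", "v"))}
    if not content:
        return 0.0
    words = {w for w, _ in sentence}
    return len(content & words) / len(content)


def pick_topic_sentences(headline: List[Token], sentences: List[List[Token]],
                         k: int = 1) -> List[List[Token]]:
    """Keep the k sentences closest to the headline; ties go to earlier sentences."""
    ranked = sorted(range(len(sentences)),
                    key=lambda i: (-headline_overlap(headline, sentences[i]), i))
    return [sentences[i] for i in sorted(ranked[:k])]


@dataclass
class Event:
    what: str                    # trigger verb (plus theme for trivalent verbs)
    who: Optional[str] = None    # nearest argument left of the verb
    whom: Optional[str] = None   # first argument right of the verb (valency >= 2)
    when: Optional[str] = None   # first time word (POS "t")
    where: Optional[str] = None  # first place name (POS "ns")


def first(tokens: List[Token], pos: str) -> Optional[str]:
    """Return the first word carrying exactly this POS tag, if any."""
    return next((w for w, p in tokens if p == pos), None)


def is_arg(pos: str) -> bool:
    """Nouns other than place names may fill actor/recipient/theme slots."""
    return pos.startswith("n") and pos != "ns"


def fill_slots(sentence: List[Token],
               valency: Dict[str, int] = VALENCY) -> Optional[Event]:
    """Fill slots around the first verb found in the valency lexicon."""
    for i, (word, pos) in enumerate(sentence):
        if pos.startswith("v") and word in valency:
            verb, arity = word, valency[word]
            break
    else:
        return None  # no known verb: nothing to anchor the event on

    left = [w for w, p in sentence[:i] if is_arg(p)]
    right = [w for w, p in sentence[i + 1:] if is_arg(p)]
    event = Event(what=verb, when=first(sentence, "t"), where=first(sentence, "ns"))
    if arity >= 1 and left:
        event.who = left[-1]
    if arity >= 2 and right:
        event.whom = right[0]
    if arity == 3 and len(right) >= 2:
        event.what = verb + right[1]  # e.g. 送 + 书: the transferred object
    return event


if __name__ == "__main__":
    headline = [("张三", "nr"), ("会见", "v"), ("李四", "nr")]
    sentences = [
        [("据", "p"), ("报道", "v")],
        [("昨天", "t"), ("张三", "nr"), ("在", "p"), ("北京", "ns"),
         ("会见", "v"), ("了", "u"), ("李四", "nr")],
    ]
    topic = pick_topic_sentences(headline, sentences)[0]
    print(fill_slots(topic))
    # Event(what='会见', who='张三', whom='李四', when='昨天', where='北京')
```

The valency count decides how many slots the verb may fill: a bivalent verb such as 会见 ("meet") takes a Who and a Whom, while a trivalent verb such as 送 ("give") additionally takes a transferred object. A real system would draw valencies from a lexical resource rather than a hand-written dictionary.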
This work is sponsored by the Beijing Municipal Science & Technology Commission project “R & D of 3-dimensional risk warning and integrated prevention technology” and the China Postdoctoral Science Foundation (No.20080440260).
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, W., Zhao, D., Zou, L., Wang, D., Zheng, W. (2010). Extracting 5W1H Event Semantic Elements from Chinese Online News. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds) Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14246-8_62
DOI: https://doi.org/10.1007/978-3-642-14246-8_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14245-1
Online ISBN: 978-3-642-14246-8
eBook Packages: Computer Science; Computer Science (R0)