Pattern Learning for Chinese Open Information Extraction

  • Yang Li
  • Qingliang Miao
  • Tong Guo
  • Ji Geng
  • Changjian Hu
  • Feiyu Xu
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 957)


Open Information Extraction (OIE) systems such as ReVerb, OLLIE, ClausIE, OpenIE 4.2, Stanford OIE, and PredPatt have attracted much attention for English. However, few studies have been reported on OIE for languages other than English. This paper presents PLCOIE, a Chinese OIE system that extracts binary relation triples and N-ary relation tuples from Chinese documents. Our goal is to learn general patterns, composed of both dependency parsing roles and parts of speech, from a large corpus; the learned patterns are then used to extract relation tuples from documents. In addition, this paper alleviates the trans-classed word issue and the light verb construction (LVC) issue. Experiments on four real-world data sets show that the results are more precise than those of state-of-the-art Chinese OIE systems, which indicates that PLCOIE is feasible and effective.
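
To make concrete the idea of a pattern that combines dependency roles with parts of speech, the following is a minimal sketch and not the PLCOIE implementation. It assumes a sentence that has already been segmented, tagged, and parsed. The `Token` and `Pattern` types, the `extract` function, the tag and role names (`NR`, `NN`, `VV`, `AS`, `nsubj`, `dobj`, `asp`), and the example parse are all hypothetical and hand-written for illustration; in the paper the patterns are learned from a corpus rather than written by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    idx: int     # 1-based position in the sentence
    text: str
    pos: str     # part-of-speech tag
    head: int    # index of the syntactic head, 0 for the root
    deprel: str  # dependency role relative to the head


@dataclass(frozen=True)
class Pattern:
    pred_pos: str     # POS tag required on the predicate
    arg_slots: tuple  # ((dependency role, allowed POS tags), ...)


def extract(tokens, pattern):
    """Return one tuple (arg1, predicate, arg2, ...) per matching predicate."""
    if not pattern.arg_slots:
        raise ValueError("a pattern needs at least one argument slot")
    results = []
    for pred in tokens:
        if pred.pos != pattern.pred_pos:
            continue
        children = [t for t in tokens if t.head == pred.idx]
        args = []
        for deprel, allowed_pos in pattern.arg_slots:
            # Take the first child that fills this slot's role and POS constraint.
            match = next(
                (c for c in children
                 if c.deprel == deprel and c.pos in allowed_pos),
                None,
            )
            if match is None:
                break  # a required slot is unfilled: this predicate does not match
            args.append(match.text)
        else:
            # Reached only when every slot was filled.
            results.append((args[0], pred.text, *args[1:]))
    return results


# Hand-annotated parse (an assumption) of "李明 创办 了 公司"
# ("Li Ming founded a company").
SENTENCE = [
    Token(1, "李明", "NR", 2, "nsubj"),
    Token(2, "创办", "VV", 0, "root"),
    Token(3, "了", "AS", 2, "asp"),
    Token(4, "公司", "NN", 2, "dobj"),
]

# Hypothetical subject-verb-object pattern.
SVO_PATTERN = Pattern(
    pred_pos="VV",
    arg_slots=(("nsubj", frozenset({"NR", "NN"})),
               ("dobj", frozenset({"NR", "NN"}))),
)

print(extract(SENTENCE, SVO_PATTERN))  # [('李明', '创办', '公司')]
```

Adding further slots to `arg_slots` (for example a time or location modifier) would yield tuples with more than two arguments, which is one simple way to picture the step from binary triples to N-ary tuples.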


Information extraction · Trans-classed word · LVC · Logistic regression


Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Yang Li (1), corresponding author
  • Qingliang Miao (1)
  • Tong Guo (1)
  • Ji Geng (2)
  • Changjian Hu (1)
  • Feiyu Xu (1)

  1. Lenovo, Beijing, China
  2. UESTC, Chengdu, China
