International Journal on Digital Libraries, Volume 19, Issue 2–3, pp 173–190

Computational linguistics literature and citations oriented citation linkage, classification and summarization

  • Lei Li (corresponding author)
  • Liyuan Mao
  • Yazhao Zhang
  • Junqi Chi
  • Taiwen Huang
  • Xiaoyue Cong
  • Heng Peng


Scientific literature is currently the most important resource for scholars, and citations provide researchers with a powerful latent way to analyze scientific trends, influences, and relationships among works and authors. This paper focuses on automatic citation analysis and summarization for the scientific literature of computational linguistics, which are also the shared tasks of the 2nd Computational Linguistics Scientific Document Summarization workshop at BIRNDL 2016 (the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries). Each citation linkage between a citation and the spans of text in the reference paper is recognized according to their content similarity via various computational methods. Each cited text span is then classified into one of five predefined facets, i.e., Hypothesis, Implication, Aim, Results, and Method, based on various lexicon and rule features via a Support Vector Machine and a voting method. Finally, a summary of the reference paper, at most 250 words long, is generated from the cited text spans. The hLDA (hierarchical Latent Dirichlet Allocation) topic model is adopted for content modeling; it provides knowledge about sentence clustering (subtopics) and word distributions (abstractiveness) for summarization. We combine hLDA knowledge with several other classical features, using different weights and proportions, to score the sentences in the reference paper. Our systems were ranked first and second in the evaluation results published by BIRNDL 2016, which verifies the effectiveness of our methods.
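The citation-linkage step described above can be illustrated with a minimal sketch (not the authors' actual system, which uses several similarity methods): a citation sentence is linked to the reference-paper sentence with the highest TF-IDF cosine similarity. The tokenizer and example sentences below are illustrative assumptions.

```python
import math
from collections import Counter

def tokenize(text):
    # Crude tokenizer: lowercase words, strip surrounding punctuation.
    return [w.lower().strip(".,;:()") for w in text.split()]

def tfidf_vectors(sentences):
    # One bag-of-words TF-IDF vector per sentence.
    bags = [Counter(tokenize(s)) for s in sentences]
    n = len(bags)
    df = Counter(w for bag in bags for w in bag)
    return [{w: tf * math.log(n / df[w]) for w, tf in bag.items()}
            for bag in bags]

def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def link_citation(citation, ref_sentences):
    # Index of the reference sentence most similar to the citation.
    vecs = tfidf_vectors([citation] + ref_sentences)
    cit, refs = vecs[0], vecs[1:]
    return max(range(len(refs)), key=lambda i: cosine(cit, refs[i]))

ref = ["We propose a hierarchical topic model for document summarization.",
       "Experiments were run on a standard evaluation dataset.",
       "Related work includes graph based ranking methods."]
citation = "Their hierarchical topic model improves summarization quality."
print(link_citation(citation, ref))  # selects the first reference sentence
```

In practice the shared-task systems score candidate spans with richer representations (word vectors, document vectors) and thresholds rather than a single argmax, but the matching principle is the same.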


Keywords: Citation linkage · Facet classification · Summarization · Word vector · Document vector · SVM · hLDA



This work was supported by the National Natural Science Foundation of China under Grants 91546121, 61202247, 71231002 and 61472046; the EU FP7 IRSES MobileCloud Project (Grant No. 612212); the 111 Project of China under Grant B08004; the Engineering Research Center of Information Networks, Ministry of Education (MOE); the MOE Liberal Arts and Social Sciences Foundation under Grant 16YJA630011; the Beijing Institute of Science and Technology Information; and CapInfo Company Limited.



Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  • Lei Li
    • 1
  • Liyuan Mao
    • 1
  • Yazhao Zhang
    • 1
  • Junqi Chi
    • 1
  • Taiwen Huang
    • 1
  • Xiaoyue Cong
    • 1
  • Heng Peng
    • 1
  1. Center for Intelligence Science and Technology (CIST), School of Computer, Beijing University of Posts and Telecommunications (BUPT), Beijing, People's Republic of China
