International Journal on Digital Libraries

, Volume 19, Issue 4, pp 323–337 | Cite as

Neural ParsCit: a deep learning-based reference string parser

  • Animesh PrasadEmail author
  • Manpreet Kaur
  • Min-Yen Kan


We present a deep learning approach for the core digital libraries task of parsing bibliographic reference strings. We deploy the state-of-the-art long short-term memory (LSTM) neural network architecture, a variant of a recurrent neural network to capture long-range dependencies in reference strings. We explore word embeddings and character-based word embeddings as an alternative to handcrafted features. We incrementally experiment with features, architectural configurations, and the diversity of the dataset. Our final model is an LSTM-based architecture, which layers a linear chain conditional random field (CRF) over the LSTM output. In extensive experiments in both English in-domain (computer science) and out-of-domain (humanities) test cases, as well as multilingual data, our results show a significant gain (\(p<0.01\)) over the reported state-of-the-art CRF-only-based parser.


Reference string parsing Sequence labeling CRF LSTM 



We would like to acknowledge the support of the NExT research grant funds, supported by the National Research Foundation, Prime Minister’s Office, Singapore, under its IRC @ SG Funding Initiative. We would also like to gratefully acknowledge the support of NVIDIA Corporation with the donation of the GeForce GTX Titan X GPU used for this research. We also acknowledge Muthu Kumar Chandrasekaran for insightful feedback on data handling and editing, along with Kishaloy Halder and Wenqiang Lei for their help in the ongoing integration with the current ParsCit pipeline.


  1. 1.
    Bengio, Y.: Learning deep architectures for AI. Found. trends\({\textregistered }\) Mach. Learn.2(1), 1–127 (2009)CrossRefGoogle Scholar
  2. 2.
    Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5(2), 157–166 (1994)CrossRefGoogle Scholar
  3. 3.
    Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., Bengio, Y.: Theano: A CPU and GPU math compiler in python. In: Proceedings of 9th Python in Science Conference, pp. 1–7 (2010)Google Scholar
  4. 4.
    Chen, C.C., Yang, K.H., Chen, C.L., Ho, J.M.: Bibpro: a citation parser based on sequence alignment. IEEE Trans. Knowl. Data Eng. 24(2), 236–250 (2012)CrossRefGoogle Scholar
  5. 5.
    Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12(Aug), 2493–2537 (2011)zbMATHGoogle Scholar
  6. 6.
    Councill, I.G., Giles, C.L., Kan, M.Y.: Parscit: an open-source CRF reference string parsing package. LREC 8, 661–667 (2008)Google Scholar
  7. 7.
    Cuong, N.V., Chandrasekaran, M.K., Kan, M.Y., Lee, W.S.: Scholarly document information extraction using extensible features for efficient higher order semi-CRFs. In: Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries, ACM, pp. 61–64 (2015)Google Scholar
  8. 8.
    Gers, F.A., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with LSTM. Neural Comput. 12(10), 2451–2471 (2000)CrossRefGoogle Scholar
  9. 9.
    Giles, C.L., Bollacker, K.D., Lawrence, S.: Citeseer: an automatic citation indexing system. In: Proceedings of the Third ACM Conference on Digital Libraries, ACM, pp. 89–98 (1998)Google Scholar
  10. 10.
    Greff, K., Srivastava, R.K., Koutník, J., Steunebrink, B.R., Schmidhuber, J.: LSTM: a search space odyssey. arXiv preprint arXiv:1503.04069 (2015)
  11. 11.
    Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRefGoogle Scholar
  12. 12.
    Kern, R., Klampfl, S.: Extraction of references using layout and formatting information from scientific articles. D-Lib Mag. 19(9/10), (2013)Google Scholar
  13. 13.
    Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp. 1097–1105 (2012)Google Scholar
  14. 14.
    Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the Eighteenth International Conference on Machine Learning, ICML, vol. 1, pp. 282–289 (2001)Google Scholar
  15. 15.
    Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., Dyer, C.: Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360 (2016)
  16. 16.
    LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)CrossRefGoogle Scholar
  17. 17.
    Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
  18. 18.
    Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp. 3111–3119 (2013)Google Scholar
  19. 19.
    Mohamed, A.R., Dahl, G.E., Hinton, G.: Acoustic modeling using deep belief networks. IEEE Trans. Audio Speech Lang. Process. 20(1), 14–22 (2012)CrossRefGoogle Scholar
  20. 20.
    Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. ICML 3(28), 1310–1318 (2013)Google Scholar
  21. 21.
    Pennington, J., Socher, R., Manning, C.D.: Glove: global vectors for word representation. EMNLP 14, 1532–43 (2014)Google Scholar
  22. 22.
    Rabiner, L., Juang, B.: An introduction to hidden Markov models. IEEE ASSP Mag. 3(1), 4–16 (1986)CrossRefGoogle Scholar
  23. 23.
    Rifai, S., Vincent, P., Muller, X., Glorot, X., Bengio, Y.: Contractive auto-encoders: explicit invariance during feature extraction. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp. 833–840 (2011)Google Scholar
  24. 24.
    Romanello, M., Boschetti, F., Crane, G.: Citations in the digital library of classics: extracting canonical references by using conditional random fields. In: Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries, Association for Computational Linguistics, pp. 80–87 (2009)Google Scholar
  25. 25.
    Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning internal representations by error propagation. Technical report, DTIC Document (1985)Google Scholar
  26. 26.
    Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015)CrossRefGoogle Scholar
  27. 27.
    Tkaczyk, D., Szostek, P., Fedoryszak, M., Dendek, P.J., Bolikowski, Ł.: Cermine: automatic extraction of structured metadata from scientific literature. IJDAR 18(4), 317–335 (2015)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. 1.School of ComputingNational University of SingaporeSingaporeSingapore

Personalised recommendations