Methodology for Evaluating Citation Parsing and Matching

  • Mateusz Fedoryszak
  • Łukasz Bolikowski
  • Dominika Tkaczyk
  • Krzyś Wojciechowski
Part of the Studies in Computational Intelligence book series (SCI, volume 467)


Bibliographic references between scholarly publications contain valuable information for researchers and developers involved with digital repositories. They indicate topical similarity between linked texts and the impact of the referenced document, and they improve navigation in the user interfaces of digital libraries. Consequently, several approaches to extracting, parsing and resolving such references have been proposed to date. In this paper we develop a methodology for evaluating parsing and matching algorithms and for choosing the most appropriate one for the document collection at hand. We apply the methodology to evaluate the reference parsing and matching module of the YADDA2 software platform.


Keywords: citation parsing, citation matching, evaluation, test set, YADDA2 software platform





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Mateusz Fedoryszak (1)
  • Łukasz Bolikowski (1)
  • Dominika Tkaczyk (1)
  • Krzyś Wojciechowski (1)
  1. Interdisciplinary Centre for Mathematical and Computational Modelling, Warsaw University, Warsaw, Poland
