Abstract
Bibliographic references between scholarly publications contain valuable information for researchers and developers involved with digital repositories. They indicate topical similarity between linked texts and the impact of the referenced document, and they improve navigation in the user interfaces of digital libraries. Consequently, several approaches to extracting, parsing and resolving such references have been proposed to date. In this paper we develop a methodology for evaluating parsing and matching algorithms and for choosing the most appropriate one for the document collection at hand. We apply the methodology to evaluate the reference parsing and matching module of the YADDA2 software platform.
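The abstract does not say which measures the methodology uses. As a rough illustration of what evaluating a matching algorithm can involve, the sketch below scores predicted reference-to-document links against a hand-labelled gold standard using precision, recall and F1. The choice of these measures, the function name `evaluate_matching` and the toy identifiers are assumptions made for illustration; they are not taken from the chapter or from YADDA2.

```python
from typing import Dict, Optional, Tuple


def evaluate_matching(
    predicted: Dict[str, Optional[str]],
    gold: Dict[str, Optional[str]],
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for reference-to-document links.

    Both arguments map a reference id to the id of the document it
    resolves to, or to None when no target is proposed or exists.
    """
    # Only references for which a target document is given count as links.
    proposed = {ref: doc for ref, doc in predicted.items() if doc is not None}
    expected = {ref: doc for ref, doc in gold.items() if doc is not None}

    # A proposed link is correct when the gold standard names the same document.
    correct = sum(1 for ref, doc in proposed.items() if expected.get(ref) == doc)

    precision = correct / len(proposed) if proposed else 0.0
    recall = correct / len(expected) if expected else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical toy data: four references, three of which have a true target.
gold = {"r1": "docA", "r2": "docB", "r3": None, "r4": "docC"}
predicted = {"r1": "docA", "r2": "docX", "r3": None, "r4": None}

precision, recall, f1 = evaluate_matching(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Running the same scoring over the output of several candidate algorithms on one labelled sample would give a simple basis for comparing them on a particular collection.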
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Fedoryszak, M., Bolikowski, Ł., Tkaczyk, D., Wojciechowski, K. (2013). Methodology for Evaluating Citation Parsing and Matching. In: Bembenik, R., Skonieczny, L., Rybinski, H., Kryszkiewicz, M., Niezgodka, M. (eds) Intelligent Tools for Building a Scientific Information Platform. Studies in Computational Intelligence, vol 467. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35647-6_11
DOI: https://doi.org/10.1007/978-3-642-35647-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35646-9
Online ISBN: 978-3-642-35647-6
eBook Packages: Engineering, Engineering (R0)