Extraction and Characterization of Citations in Scientific Papers

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 475)


We propose a hybrid method for extracting and characterizing citations in scientific papers that combines machine learning with rule-based approaches. Our protocol consists of metadata extraction, bibliography parsing, section title processing, and fine-grained semantic annotation of texts at the sentence level. This allows us to generate Linked Open Data from a set of research papers in XML.
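
As a rough illustration of the last two steps (sentence-level citation detection and Linked Open Data output), the following minimal sketch finds numeric citation markers in a sentence and emits N-Triples. It is not the method of the paper: the bracketed marker style, the use of the `cito:cites` property from the CiTO vocabulary, the `example.org` URIs, and the function names `extract_citations` and `to_ntriples` are all assumptions made for the example, and the machine learning component is left out entirely.

```python
import re

# CiTO namespace; using cito:cites here is an assumption of this sketch.
CITO = "http://purl.org/spar/cito/"

# Matches numeric markers such as [3] or [1, 4] (an assumed citation style).
MARKER = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def extract_citations(sentence):
    """Return the reference numbers cited in one sentence, in order."""
    numbers = []
    for match in MARKER.finditer(sentence):
        numbers.extend(int(n) for n in match.group(1).split(","))
    return numbers


def to_ntriples(paper_uri, sentence, bibliography):
    """Emit one cito:cites triple per citation found in the bibliography."""
    lines = []
    for number in extract_citations(sentence):
        target = bibliography.get(number)
        if target is not None:  # skip markers with no parsed reference
            lines.append(f"<{paper_uri}> <{CITO}cites> <{target}> .")
    return lines


# Hypothetical parsed bibliography: reference number -> resource URI.
bibliography = {
    2: "http://example.org/ref/2",
    4: "http://example.org/ref/4",
}
sentence = "We parse references with a CRF model [2] and type them [4, 9]."
triples = to_ntriples("http://example.org/paper/1", sentence, bibliography)
for line in triples:
    print(line)
```

Markers that cannot be resolved against the bibliography (here, 9) are skipped rather than guessed, so the generated graph only contains links backed by a parsed reference.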


Keywords: Semantic annotation; Citation acts; CRF; RDF graphs; Linked Open Data; Bibliography parsing



We thank Angelo Di Iorio at the Department of Computer Science and Engineering (DISI) of the University of Bologna for providing the gold standard and the evaluation.



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. CIRST, Université du Québec à Montréal, Montreal, Canada
