Extraction and Characterization of Citations in Scientific Papers

Part of the Communications in Computer and Information Science book series (CCIS, volume 475)


We propose a hybrid method for the extraction and characterization of citations in scientific papers that combines machine learning with rule-based approaches. Our protocol consists of metadata extraction, bibliography parsing, section title processing, and fine-grained semantic annotation of texts at the sentence level. This allows us to generate Linked Open Data from a set of research papers in XML.
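To make the shape of such a pipeline concrete, here is a minimal Python sketch. It is not the authors' implementation: it covers only a rule-based pass (no machine learning component), and the XML layout, the `BASE` URI, and all function names are assumptions made for illustration. It normalizes section titles, splits paragraphs into sentences, finds bracketed numeric citations, and emits one `cito:cites` triple per cited reference in N-Triples syntax.

```python
import re
import xml.etree.ElementTree as ET

CITO = "http://purl.org/spar/cito/"   # CiTO namespace
BASE = "http://example.org/paper/"    # hypothetical base URI for this sketch

# Hypothetical, simplified input; real corpora use richer schemas.
SAMPLE_XML = """
<article id="p1">
  <body>
    <sec><title>1. Introduction</title>
      <p>Reference parsing is often done with CRFs [2]. We extend earlier work [1, 7].</p>
    </sec>
    <sec><title>2. Method</title>
      <p>We use the citation typing ontology [4]. No citation here.</p>
    </sec>
  </body>
  <back>
    <ref id="1"/><ref id="2"/><ref id="4"/><ref id="7"/>
  </back>
</article>
"""

CITATION = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")  # matches [2] or [1, 7]
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")         # naive sentence boundary


def normalize_title(title):
    """Strip leading section numbering such as '1.' or '2.3'."""
    return re.sub(r"^\s*\d+(?:\.\d+)*\.?\s*", "", title).strip()


def extract_citations(xml_text):
    """Return the paper id and a list of (section, sentence, ref_id) tuples."""
    root = ET.fromstring(xml_text)
    known = {ref.get("id") for ref in root.iter("ref")}  # parsed bibliography
    found = []
    for sec in root.iter("sec"):
        section = normalize_title(sec.findtext("title", default=""))
        for p in sec.iter("p"):
            for sentence in SENTENCE_END.split((p.text or "").strip()):
                for group in CITATION.findall(sentence):
                    for ref_id in re.split(r"\s*,\s*", group):
                        if ref_id in known:  # keep only resolvable references
                            found.append((section, sentence, ref_id))
    return root.get("id"), found


def to_ntriples(paper_id, citations):
    """Emit one cito:cites triple per distinct cited reference."""
    seen, lines = set(), []
    for _, _, ref_id in citations:
        if ref_id not in seen:
            seen.add(ref_id)
            lines.append(
                f"<{BASE}{paper_id}> <{CITO}cites> <{BASE}{paper_id}/ref/{ref_id}> ."
            )
    return lines


if __name__ == "__main__":
    pid, cites = extract_citations(SAMPLE_XML)
    for section, sentence, ref_id in cites:
        print(f"{section}: ref {ref_id} in: {sentence}")
    print("\n".join(to_ntriples(pid, cites)))
```

Keeping the section title and the full sentence next to each reference identifier is what would let a later annotation step attach a more specific citation type than the generic `cito:cites` used here.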


  • Semantic annotation
  • Citation acts
  • CRF
  • RDF graphs
  • Linked Open Data
  • Bibliography parsing

  • DOI: 10.1007/978-3-319-12024-9_16
  • Chapter length: 7 pages


  1. PubMed ID Converter API: api/

  2. The D2RQ Platform is a system for accessing relational databases as RDF graphs:




We thank Angelo Di Iorio at the Department of Computer Science and Engineering (DISI) of the University of Bologna for providing the gold standard and the evaluation.

Author information

Corresponding author

Correspondence to Marc Bertin.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Bertin, M., Atanassova, I. (2014). Extraction and Characterization of Citations in Scientific Papers. In: , et al. Semantic Web Evaluation Challenge. SemWebEval 2014. Communications in Computer and Information Science, vol 475. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12023-2

  • Online ISBN: 978-3-319-12024-9

  • eBook Packages: Computer Science, Computer Science (R0)