Partridge: An Effective System for the Automatic Classification of the Types of Academic Papers

Abstract

Partridge is a system that enables intelligent search for academic papers by allowing users to query terms within sentences that express a particular core scientific concept (e.g. Hypothesis, Result). The system also automatically classifies papers by article type (e.g. Review, Case Study). Here, we focus on this second capability. For each paper, Partridge extracts the full text from the PDF file, converts it to XML, determines sentence boundaries, labels each sentence with a core scientific concept, and then uses a random forest model to classify the paper type. We show that the type of a paper can be reliably predicted by a model that analyses the distribution of core scientific concepts across the sentences of the paper. We also discuss the appropriateness of many existing paper types used by major journals, along with their corresponding concept distributions. Partridge is available online, includes a browser-friendly bookmarklet for submitting new papers, and demonstrates a range of possibilities for more intelligent search in the scientific literature. The Partridge instance and further information about the project can be found at http://papro.org.uk.
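The final step described above, predicting a paper's type from the distribution of core scientific concepts over its sentences, can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions, not Partridge's actual implementation: it assumes sentences have already been labelled with the 11 CoreSC categories, uses a hypothetical feature function `coresc_distribution` (the fraction of sentences in each category), and swaps in a small hand-written random forest (bootstrap samples, a random subset of about √d features per split, Gini impurity, majority vote). The papers and type labels are invented toy data, not results from the paper.

```python
import random
from collections import Counter

# ASSUMPTION: the 11 categories of the CoreSC annotation scheme.
CORESC_LABELS = ["Background", "Conclusion", "Experiment", "Goal", "Hypothesis", "Method",
                 "Model", "Motivation", "Object", "Observation", "Result"]


def coresc_distribution(sentence_labels):
    """Hypothetical feature vector: fraction of sentences in each CoreSC category."""
    counts = Counter(sentence_labels)
    total = len(sentence_labels) or 1  # guard against papers with no sentences
    return [counts[label] / total for label in CORESC_LABELS]


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, depth, max_depth, n_features, rng):
    """Grow one tree; each split considers only a random subset of features."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority  # leaf: a paper-type label
    best = None  # (weighted impurity, feature index, threshold)
    for f in rng.sample(range(len(X[0])), n_features):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # this threshold separates nothing
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return majority
    _, f, t = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, n_features, rng)

    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t, grow(left_idx), grow(right_idx))


def predict_tree(node, x):
    """Follow (feature, threshold, left, right) nodes down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


class RandomForest:
    """Bagged decision trees with random feature subsets and majority voting."""

    def __init__(self, n_trees=25, max_depth=4, seed=0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        n_features = max(1, int(len(X[0]) ** 0.5))  # common sqrt(d) heuristic
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         0, self.max_depth, n_features, self.rng))
        return self

    def predict(self, x):
        votes = Counter(predict_tree(tree, x) for tree in self.trees)
        return votes.most_common(1)[0][0]


def make_paper(counts):
    """Expand {label: n_sentences} into a flat list of per-sentence labels."""
    return [label for label, n in counts.items() for _ in range(n)]


# Invented toy data: "Review" papers are Background-heavy, while "Research"
# papers are dominated by Method/Experiment/Result sentences.
reviews = [make_paper({"Background": b, "Motivation": 2, "Goal": 1, "Conclusion": c})
           for b, c in [(8, 2), (6, 3), (10, 1), (7, 2)]]
research = [make_paper({"Method": m, "Experiment": 3, "Observation": 2,
                        "Result": r, "Model": 1})
            for m, r in [(5, 4), (4, 6), (6, 3), (3, 5)]]
X = [coresc_distribution(p) for p in reviews + research]
y = ["Review"] * len(reviews) + ["Research"] * len(research)
forest = RandomForest(n_trees=25, seed=0).fit(X, y)
```

Classifying a new paper is then `forest.predict(coresc_distribution(labels))`, where `labels` is that paper's list of per-sentence CoreSC labels. Using proportions rather than raw counts keeps long and short papers comparable; whether Partridge normalises its features the same way is an assumption here.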

Keywords

  • Core Scientific Concepts
  • Document Type
  • Determine Sentence Boundaries
  • Random Forest Learning
  • CoreSC

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Fig. 1

Acknowledgments

We thank the Leverhulme Trust for the support to Dr Liakata’s Early Career Fellowship and also EMBL-EBI, Cambridge UK for the facilities offered to Dr Liakata.

Author information

Correspondence to James Ravenscroft.

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Ravenscroft, J., Liakata, M., Clare, A. (2013). Partridge: An Effective System for the Automatic Classification of the Types of Academic Papers. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXX. SGAI 2013. Springer, Cham. https://doi.org/10.1007/978-3-319-02621-3_26

  • DOI: https://doi.org/10.1007/978-3-319-02621-3_26

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-02620-6

  • Online ISBN: 978-3-319-02621-3

  • eBook Packages: Computer Science, Computer Science (R0)