
Content Visualization of Scientific Corpora Using an Extensible Relational Database Implementation

  • Conference paper
Theory and Practice of Digital Libraries -- TPDL 2013 Selected Workshops (TPDL 2013)

Abstract

A method for supervised classification and visualization of collections of scientific publications is presented. It combines a text classification module, which produces class probability estimates, with a dimensionality reduction technique that represents each class in 2-D space. Together, these allow any collection of unlabelled documents to be visualized. The classification and visualization modules have been trained on three different datasets and their respective categorizations. We illustrate the system's functionality by visualizing the content of collections of publications that share a common funding scheme. To support this, we have developed a funding mining submodule that identifies documents belonging to particular funding schemes. All modules have been implemented using the madIS system, which provides data analysis functionality through an extended relational database.
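The abstract does not specify the classifier or the projection, so the following is only a minimal sketch of such a pipeline in plain Python rather than madIS. It rests on several assumptions:

- The classifier is a multinomial Naive Bayes model that yields class probabilities.
- Principal component analysis (PCA) stands in for the dimensionality reduction step. Orthogonal iteration maps each class's smoothed word distribution to a 2-D anchor.
- An unlabelled document is placed at the probability-weighted average of those anchors. This placement rule is an illustrative choice, not necessarily the authors' method.
- The function names `train_nb`, `class_probs`, `class_layout` and `place_document` are hypothetical.

```python
import math
import random
from collections import Counter

def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model with add-one (Laplace) smoothing."""
    classes = sorted(set(labels))
    vocab = sorted({w for d in docs for w in d.split()})
    counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        counts[c].update(doc.split())
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    word_prob = {}
    for c in classes:
        total = sum(counts[c].values()) + len(vocab)
        word_prob[c] = [(counts[c][w] + 1) / total for w in vocab]
    return classes, vocab, log_prior, word_prob

def class_probs(model, doc):
    """Posterior P(class | doc); words outside the training vocabulary are ignored."""
    classes, vocab, log_prior, word_prob = model
    index = {w: j for j, w in enumerate(vocab)}
    scores = []
    for c in classes:
        s = log_prior[c]
        for w in doc.split():
            if w in index:
                s += math.log(word_prob[c][index[w]])
        scores.append(s)
    top = max(scores)  # subtract the max before exponentiating (log-sum-exp trick)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def class_layout(model, iters=100, seed=0):
    """Assumed stand-in: project each class's word distribution to 2-D with PCA."""
    classes, vocab, _, word_prob = model
    n, dim = len(classes), len(vocab)
    mean = [sum(word_prob[c][j] for c in classes) / n for j in range(dim)]
    X = [[word_prob[c][j] - mean[j] for j in range(dim)] for c in classes]  # centred rows
    rng = random.Random(seed)
    axes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(2)]
    for _ in range(iters):  # orthogonal iteration for the top-2 principal axes
        new_axes = []
        for v in axes:
            xv = [sum(r[j] * v[j] for j in range(dim)) for r in X]  # X v
            w = [sum(X[i][j] * xv[i] for i in range(n)) for j in range(dim)]  # X^T X v
            for u in new_axes:  # Gram-Schmidt against axes already fixed
                d = sum(a * b for a, b in zip(w, u))
                w = [a - d * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w)) or 1.0
            new_axes.append([a / norm for a in w])
        axes = new_axes
    # 2-D coordinates of each class = projection of its centred row onto the two axes
    return {c: tuple(sum(r[j] * a[j] for j in range(dim)) for a in axes)
            for c, r in zip(classes, X)}

def place_document(model, layout, doc):
    """Hypothetical placement: probability-weighted average of the class anchors."""
    probs = class_probs(model, doc)
    x = sum(p * layout[c][0] for p, c in zip(probs, model[0]))
    y = sum(p * layout[c][1] for p, c in zip(probs, model[0]))
    return x, y
```

The sketch is used as follows:

1. Train on labelled abstracts with `train_nb`.
2. Compute the class anchors once with `class_layout`.
3. Call `place_document` for each unlabelled publication, for example each document attributed to one funding scheme, and plot the resulting points.

Because a document with a confident prediction lands close to its class anchor while an ambiguous one falls between anchors, the plot conveys both the topic and the classifier's uncertainty.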




Acknowledgments

The research leading to these results has received funding from the EU’s FP7 under grant agreement no. RI-283595 (OpenAIREplus).

Author information

Corresponding author

Correspondence to Theodoros Giannakopoulos.


Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Giannakopoulos, T., Stamatogiannakis, E., Foufoulas, I., Dimitropoulos, H., Manola, N., Ioannidis, Y. (2014). Content Visualization of Scientific Corpora Using an Extensible Relational Database Implementation. In: Bolikowski, Ł., Casarosa, V., Goodale, P., Houssos, N., Manghi, P., Schirrwagen, J. (eds) Theory and Practice of Digital Libraries -- TPDL 2013 Selected Workshops. TPDL 2013. Communications in Computer and Information Science, vol 416. Springer, Cham. https://doi.org/10.1007/978-3-319-08425-1_10


  • DOI: https://doi.org/10.1007/978-3-319-08425-1_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08424-4

  • Online ISBN: 978-3-319-08425-1

  • eBook Packages: Computer Science, Computer Science (R0)
