AIR: A Semi-Automatic System for Archiving Institutional Repositories

  • Natalia Ponomareva
  • Jose Manuel Gomez
  • Viktor Pekar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5723)


Manual population of institutional repositories with citation data is extremely time- and resource-consuming. These costs are a bottleneck on the growth and updating of large repositories. This paper describes the AIR system, developed at the University of Wolverhampton to address this problem. The system implements a semi-automatic approach for archiving institutional repositories: first, it automatically discovers and extracts bibliographical data from the university web site; second, it interacts with users (authors or librarians), who verify and correct the extracted data. The system is integrated with the Wolverhampton Intellectual Repository and E-theses (WIRE), which is built on standard software adopted by many UK universities. We demonstrate that the system can considerably increase the intake of new publication data into an institutional repository without compromising its quality.


Information Extraction, Conditional Random Field, Bibliographical Data, Institutional Repository, Orthographic Feature
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Natalia Ponomareva (1)
  • Jose Manuel Gomez (2)
  • Viktor Pekar (3)
  1. University of Wolverhampton, UK
  2. University of Alicante, Spain
  3. Oxford University Press
