AIR: A Semi-Automatic System for Archiving Institutional Repositories
Manual population of institutional repositories with citation data is an extremely time- and resource-consuming process. These costs act as a bottleneck on the growth and updating of large repositories. This paper describes the AIR system, developed at the University of Wolverhampton to address this problem. The system implements a semi-automatic approach for archiving institutional repositories: first, it automatically discovers and extracts bibliographical data from the university web site; second, it interacts with users (authors or librarians), who verify and correct the extracted data. The system is integrated with the Wolverhampton Intellectual Repository and E-theses (WIRE), which was designed on the basis of standard software adopted by many UK universities. In this paper we demonstrate that the system can considerably increase the intake of new publication data into an institutional repository without compromising its quality.
Keywords: Information Extraction, Conditional Random Field, Bibliographical Data, Institutional Repository, Orthographic Feature
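The keywords point to orthographic features and conditional random fields as ingredients of the extraction step. As a rough illustration only, the sketch below computes a few surface-form features for each token of a citation string. The function names, the particular feature set, and the whitespace tokenisation are all assumptions made for this example; the abstract does not specify which features AIR actually uses.

```python
import re


def orthographic_features(token: str) -> dict:
    """Return a few surface-form features for one token.

    The feature set is hypothetical and chosen for illustration;
    it is not taken from the AIR system.
    """
    return {
        "is_capitalised": token[:1].isupper(),
        "is_all_caps": token.isalpha() and token.isupper(),
        "has_digit": any(ch.isdigit() for ch in token),
        # A bare four-digit year such as 1995 or 2001.
        "is_year": re.fullmatch(r"(19|20)\d{2}", token) is not None,
        # An author initial such as "J."
        "is_initial": re.fullmatch(r"[A-Z]\.", token) is not None,
        "ends_with_punct": token[-1:] in {",", ".", ":", ";"},
    }


def featurise(citation: str) -> list:
    """Pair each token with its features (whitespace tokenisation is an assumption)."""
    return [(tok, orthographic_features(tok)) for tok in citation.split()]


if __name__ == "__main__":
    for tok, feats in featurise("Lafferty, J. 2001"):
        print(tok, feats)
```

A sequence labeller such as a conditional random field could take these per-token feature dictionaries as input and assign field labels (author, title, year, and so on); training such a model is outside the scope of this sketch.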