Statistical Analysis of Bibliographic Strings for Constructing an Integrated Document Space

  • Atsuhiro Takasu
Conference paper

DOI: 10.1007/3-540-45747-X_6

Part of the Lecture Notes in Computer Science book series (LNCS, volume 2458)
Cite this paper as:
Takasu A. (2002) Statistical Analysis of Bibliographic Strings for Constructing an Integrated Document Space. In: Agosti M., Thanos C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2002. Lecture Notes in Computer Science, vol 2458. Springer, Berlin, Heidelberg

Abstract

It is important to utilize retrospective documents when constructing a large digital library. This paper proposes a method for analyzing recognized bibliographic strings using an extended hidden Markov model. The proposed method enables analysis of erroneous bibliographic strings and integrates many documents accumulated as printed articles in a citation index. The proposed method has the advantage of providing a robust bibliographic matching function using the statistical description of the syntax of bibliographic strings, a language model and an Optical Character Recognition (OCR) error model. The method also has the advantage of reducing the cost of preparing training data for parameter estimation, using records in the bibliographic database.
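As a rough illustration of how a statistical model can assign bibliographic fields to an OCR-damaged string, the sketch below runs Viterbi decoding over a plain first-order hidden Markov model. It is not the extended model proposed in the paper. The field set, all probabilities, the `token_class` helper and the `OCR_DIGIT_LOOKALIKES` set are hypothetical values chosen for the example; the look-alike set is only a crude stand-in for an OCR error model, and the transition table is a toy stand-in for the syntax of bibliographic strings.

```python
import math

# Hypothetical field set; a real citation grammar would have more fields.
STATES = ["AUTHOR", "TITLE", "YEAR"]

# Assumed left-to-right field syntax: author -> title -> year.
START = {"AUTHOR": 0.9, "TITLE": 0.1, "YEAR": 0.0}
TRANS = {
    "AUTHOR": {"AUTHOR": 0.5, "TITLE": 0.5, "YEAR": 0.0},
    "TITLE": {"AUTHOR": 0.0, "TITLE": 0.7, "YEAR": 0.3},
    "YEAR": {"AUTHOR": 0.0, "TITLE": 0.0, "YEAR": 1.0},
}

# Assumed emission probabilities over coarse token classes.
EMIT = {
    "AUTHOR": {"capitalized": 0.80, "lower": 0.15, "number": 0.05},
    "TITLE": {"capitalized": 0.30, "lower": 0.65, "number": 0.05},
    "YEAR": {"capitalized": 0.02, "lower": 0.02, "number": 0.96},
}

# Toy OCR confusions: letters that a recognizer may output in place of a digit.
OCR_DIGIT_LOOKALIKES = set("OolISB")


def token_class(token):
    """Map a token to a coarse class, tolerating digit look-alikes."""
    core = token.strip(".,()")
    digit_like = all(c.isdigit() or c in OCR_DIGIT_LOOKALIKES for c in core)
    if core and digit_like and any(c.isdigit() for c in core):
        return "number"
    if core[:1].isupper():
        return "capitalized"
    return "lower"


def log(p):
    """Log probability, with log(0) mapped to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(tokens):
    """Return the most probable field label for each token."""
    if not tokens:
        return []
    obs = [token_class(t) for t in tokens]
    score = [{s: log(START[s]) + log(EMIT[s][obs[0]]) for s in STATES}]
    back = []
    for o in obs[1:]:
        prev = score[-1]
        row, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in state s.
            best = max(STATES, key=lambda r: prev[r] + log(TRANS[r][s]))
            row[s] = prev[best] + log(TRANS[best][s]) + log(EMIT[s][o])
            ptr[s] = best
        score.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


tokens = ["Takasu,", "A.", "statistical", "analysis", "of", "strings", "2OO2"]
for token, field in zip(tokens, viterbi(tokens)):
    print(f"{field:7s} {token}")
```

With these assumed parameters, the misrecognized year "2OO2" (letter O in place of zero) is still labelled as a year, because the token class tolerates digit look-alikes and the transition table favors a year after the title.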

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Atsuhiro Takasu (1)
  1. National Institute of Informatics, Chiyoda-ku, Tokyo, Japan
