Lazy Query Enrichment: A Method for Indexing Large Specialized Document Bases with Morphology and Concept Hierarchy

  • Alexander F. Gelbukh
Conference paper

DOI: 10.1007/3-540-44469-6_49

Part of the Lecture Notes in Computer Science book series (LNCS, volume 1873)
Cite this paper as:
Gelbukh A.F. (2000) Lazy Query Enrichment: A Method for Indexing Large Specialized Document Bases with Morphology and Concept Hierarchy. In: Ibrahim M., Küng J., Revell N. (eds) Database and Expert Systems Applications. DEXA 2000. Lecture Notes in Computer Science, vol 1873. Springer, Berlin, Heidelberg

Abstract

A full-text information retrieval system has to deal with various phenomena of string equivalence: case-insensitive matching, morphological inflection, derivation, synonymy, and hyponymy or hyperonymy. Technically, this can be handled either at indexing time, by reducing equivalent strings to a common form, or at query processing time, by enriching the query with the whole set of equivalent forms. We argue that the latter approach allows for greater flexibility and easier maintenance, while being more affordable than is usually assumed. Our proposal is to enrich the query only with those forms that actually appear in the document base. Our experiments with a thesaurus-based information retrieval system showed, on average, only an insignificant increase in query size with a 200-megabyte document base, even for the highly inflective Spanish language.
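The core idea of the abstract, enriching a query only with equivalent forms that actually occur in the document base, can be illustrated with a minimal Python sketch. This is not the implementation described in the paper: the function names (`build_index`, `enrich_query`, `search`), the toy documents, and the hand-written `equivalents` table are all assumptions made for illustration. In a real system the equivalent forms would presumably come from a morphological generator and a thesaurus.

```python
from collections import defaultdict


def build_index(docs):
    """Build an inverted index keyed by the surface form (lowercased only).

    No stemming or lemmatization is applied at indexing time.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def enrich_query(term, equivalents, index):
    """Return the equivalent forms of `term` that occur in the document base.

    `equivalents` is a hypothetical lookup table standing in for a
    morphological generator and a thesaurus.
    """
    term = term.lower()
    candidates = {term} | equivalents.get(term, set())
    # The "lazy" step: forms absent from the index are dropped.
    return sorted(form for form in candidates if form in index)


def search(term, equivalents, index):
    """Union of the posting sets of all surviving forms."""
    hits = set()
    for form in enrich_query(term, equivalents, index):
        hits |= index[form]
    return hits


# Toy document base (illustrative data, not from the paper).
docs = {
    1: "El gato duerme",
    2: "Los gatos duermen",
    3: "Un perro ladra",
}

# Hypothetical table of equivalent forms for one query term.
equivalents = {"gato": {"gatos", "gata", "gatas", "gatito", "gatitos"}}

index = build_index(docs)
enriched = enrich_query("gato", equivalents, index)
results = search("gato", equivalents, index)

print(enriched)  # ['gato', 'gatos']
print(results)   # {1, 2}
```

Of the six candidate forms, only the two present in the index survive, so the enriched query stays small even though the full set of equivalents is larger. The index itself stores unreduced forms, which is what leaves the choice of equivalences to query time.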


Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Alexander F. Gelbukh
  1. Center for Computing Research (CIC), National Polytechnic Institute (IPN), D.F., Mexico
