Indexing

  • Peter Christen
Chapter
Part of the Data-Centric Systems and Applications book series (DCSA)

Abstract

The naive approach to matching two databases (comparing each record from one database with all records from the other database) has quadratic computational complexity. Clearly, this approach is not feasible for today’s large databases, which contain many millions or even billions of records. Not only would the number of record pair comparisons be huge, but the number of possible matches would also be very small compared to the number of non-matches, because the number of matches grows only linearly with the size of the databases to be matched, while the number of record pair comparisons grows quadratically. Techniques are therefore required that reduce the potentially large number of record pairs to be compared by generating only candidate record pairs that are likely to refer to true matches. This process has traditionally been referred to as blocking, while more generally it is known as indexing. Various indexing techniques for data matching have been developed in the past decade by researchers from different fields. This chapter covers the issues that need to be considered in order to achieve efficient indexing, provides an overview of the major indexing techniques proposed, and includes a comparative evaluation of these techniques.
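
To make the contrast concrete, the following is a minimal sketch of standard blocking set against the naive all-pairs comparison. It is an illustration only, not the chapter's own implementation: the record layout, the function names, and the blocking key (the first three letters of a hypothetical `surname` field) are all assumptions chosen for the example.

```python
from collections import defaultdict
from itertools import product


def naive_pairs(db_a, db_b):
    """Every record of A paired with every record of B: |A| * |B| pairs."""
    return set(product(range(len(db_a)), range(len(db_b))))


def blocking_key(record):
    # Hypothetical key: first three letters of the surname, lower-cased.
    return record["surname"][:3].lower()


def blocked_pairs(db_a, db_b, key=blocking_key):
    """Candidate pairs restricted to records that share a blocking key."""
    # Build an inverted index from blocking key to record positions in B.
    index = defaultdict(list)
    for j, rec in enumerate(db_b):
        index[key(rec)].append(j)
    # Pair each record of A only with the B records in the same block.
    pairs = set()
    for i, rec in enumerate(db_a):
        for j in index.get(key(rec), ()):
            pairs.add((i, j))
    return pairs


# Toy data, invented for this example.
db_a = [{"surname": "Smith"}, {"surname": "Miller"}, {"surname": "Christen"}]
db_b = [
    {"surname": "Smithe"},
    {"surname": "Millar"},
    {"surname": "Jones"},
    {"surname": "Christensen"},
]

print(len(naive_pairs(db_a, db_b)))       # 12 comparisons without indexing
print(sorted(blocked_pairs(db_a, db_b)))  # [(0, 0), (1, 1), (2, 3)]
```

On this toy input the candidate set shrinks from 12 pairs to 3. The trade-off is that a pair whose records disagree on the blocking key (for example, a typo in the first letters of the surname) is never compared, so the choice of key affects how many true matches are missed.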

Keywords

Inverted Index, True Match, Indexing Technique, Data Match, Suffix Array
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Peter Christen (1)
  1. Research School of Computer Science, The Australian National University, Canberra, Australia