Indexing

Data Matching

Part of the book series: Data-Centric Systems and Applications ((DCSA))

Abstract

The naive approach to matching two databases, comparing each record from one database with all records from the other, has quadratic computational complexity. Clearly this approach is not feasible for today’s large databases that contain many millions or even billions of records. Not only would the number of record pair comparisons be huge, but the proportion of true matches among them would also be very small, because the number of matches grows only linearly with the size of the databases to be matched while the number of record pair comparisons grows quadratically. Techniques are therefore required that reduce the potentially large number of record pairs that are compared, by generating only candidate record pairs that likely refer to true matches. This process has traditionally been referred to as blocking, while more generally it is known as indexing. Various indexing techniques for data matching have been developed in the past decade by researchers from different fields. This chapter covers the different issues that need to be considered in order to achieve efficient indexing, provides an overview of the major indexing techniques proposed, and includes a comparative evaluation of these techniques.
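The contrast between naive comparison and blocking can be illustrated with a minimal sketch. It is not taken from the chapter: the record layout (an identifier plus a surname), the sample records, and the blocking key (the first letter of the surname) are all assumptions chosen for illustration, and real blocking keys are usually more carefully designed.

```python
from collections import defaultdict
from itertools import product


def naive_pairs(db_a, db_b):
    """Compare every record in db_a with every record in db_b (quadratic)."""
    return list(product(db_a, db_b))


def blocked_pairs(db_a, db_b, key):
    """Generate candidate pairs only for records that share a blocking key."""
    # Build an index over db_b: blocking key value -> records with that value.
    index = defaultdict(list)
    for rec in db_b:
        index[key(rec)].append(rec)
    # Pair each record in db_a only with the db_b records in the same block.
    return [(a, b) for a in db_a for b in index.get(key(a), [])]


def surname_initial(rec):
    """Hypothetical blocking key: upper-cased first letter of the surname."""
    return rec[1][0].upper()


# Illustrative records: (identifier, surname). Assumed data, not from the source.
db_a = [("a1", "Smith"), ("a2", "Miller"), ("a3", "Jones")]
db_b = [("b1", "Smyth"), ("b2", "Millar"), ("b3", "Brown"), ("b4", "Johns")]

print(len(naive_pairs(db_a, db_b)))                     # 3 * 4 = 12 pairs
print(len(blocked_pairs(db_a, db_b, surname_initial)))  # 3 candidate pairs
```

In this toy case blocking cuts 12 comparisons down to 3 while still keeping the plausible pairs (Smith/Smyth, Miller/Millar, Jones/Johns). The trade-off is that a true match whose records disagree on the blocking key, for example through a typo in the first letter, would never be compared.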

Notes

  1. http://secondstring.sourceforge.net

  2. Available from: https://sourceforge.net/projects/febrl/

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Christen, P. (2012). Indexing. In: Data Matching. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31164-2_4

  • DOI: https://doi.org/10.1007/978-3-642-31164-2_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31163-5

  • Online ISBN: 978-3-642-31164-2

  • eBook Packages: Computer Science (R0)
