Indexing

Christen, Peter

doi:10.1007/978-3-642-31164-2_4

Part of the book series: Data-Centric Systems and Applications (DCSA)

Abstract

The naive approach to matching two databases, comparing each record from one database with all records from the other, has quadratic computational complexity. Clearly this approach is not feasible for today’s large databases that contain many millions or even billions of records. Not only would the number of record pair comparisons be huge, the proportion of true matches among them would also be very small, because the number of matches grows only linearly with the size of the databases to be matched while the number of record pair comparisons grows quadratically. Techniques are therefore required that reduce the potentially large number of record pairs to be compared, by generating only candidate record pairs that likely refer to true matches. This process has traditionally been referred to as blocking, while more generally it is known as indexing. Various indexing techniques for data matching have been developed in the past decade by researchers from different fields. This chapter covers the issues that need to be considered in order to achieve efficient indexing, provides an overview of the major indexing techniques proposed, and includes a comparative evaluation of these techniques.
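
To make the contrast between exhaustive comparison and blocking concrete, here is a minimal sketch in Python. It is not taken from the chapter: the toy records, the function names `naive_pairs` and `blocked_pairs`, and the choice of postcode as the blocking key are all assumptions made for illustration. It shows only the simplest form of blocking, in which records are compared only if they share the same key value.

```python
from collections import defaultdict
from itertools import product


def naive_pairs(db_a, db_b):
    """Every record in db_a paired with every record in db_b: |A| * |B| pairs."""
    return list(product(db_a, db_b))


def blocked_pairs(db_a, db_b, key):
    """Only pairs whose two records share the same blocking key value."""
    blocks = defaultdict(list)
    for rec in db_b:
        blocks[key(rec)].append(rec)  # group db_b records by key value
    # Each db_a record is paired only with db_b records in its own block.
    return [(a, b) for a in db_a for b in blocks.get(key(a), [])]


# Hypothetical toy records: (identifier, surname, postcode).
db_a = [("a1", "smith", "2600"), ("a2", "miller", "2601"), ("a3", "jones", "2602")]
db_b = [("b1", "smith", "2600"), ("b2", "smyth", "2600"),
        ("b3", "jones", "2602"), ("b4", "brown", "2603")]


def postcode(rec):
    """Assumed blocking key: the postcode field of a record."""
    return rec[2]


all_pairs = naive_pairs(db_a, db_b)
candidates = blocked_pairs(db_a, db_b, postcode)

print(len(all_pairs), "pairs without blocking")
print(len(candidates), "candidate pairs with blocking")
```

In this toy case blocking reduces the comparisons from every possible pair to only those pairs that agree on postcode. The trade-off is that a true match whose records disagree on the blocking key (for example, because of a mistyped postcode) is never generated as a candidate, which is one reason the choice of key matters.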

Notes

1. http://secondstring.sourceforge.net
2. Available from: https://sourceforge.net/projects/febrl/

Author information

Authors and Affiliations

Research School of Computer Science, The Australian National University, Canberra, ACT, Australia
Peter Christen

Cite this chapter

Christen, P. (2012). Indexing. In: Data Matching. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31164-2_4

DOI: https://doi.org/10.1007/978-3-642-31164-2_4
Published: 05 July 2012
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31163-5
Online ISBN: 978-3-642-31164-2
eBook Packages: Computer Science, Computer Science (R0)
