  • Book
  • © 2007

Data Quality and Record Linkage Techniques

Authors: Thomas N. Herzog, Fritz J. Scheuren, William E. Winkler

  • There are no other books available addressing this subject

  • Readers will find this book a mixture of practical advice, mathematical rigor, management insight and philosophy

  • The authors also discuss the software that has been developed to apply the techniques described in the text

  • Includes supplementary material: sn.pub/extras

Buying options

eBook
USD 109.00
Price excludes VAT (USA)
  • ISBN: 978-0-387-69505-1
Softcover Book
USD 139.99
Price excludes VAT (USA)

Table of contents (20 chapters)

  1. Front Matter

    Pages I-XIII
  2. Introduction

    1. Front Matter

      Pages 1-1
    2. Introduction

      • Thomas N. Herzog, Fritz J. Scheuren, William E. Winkler
      Pages 1-3
  3. Data Quality: What It is, Why It is Important, and How to Achieve It

    1. Front Matter

      Pages 5-5
    2. What is Data Quality and Why Should We Care?

      • Thomas N. Herzog, Fritz J. Scheuren, William E. Winkler
      Pages 7-15
    3. Examples of Entities Using Data to their Advantage/Disadvantage

      • Thomas N. Herzog, Fritz J. Scheuren, William E. Winkler
      Pages 17-27
    4. Properties of Data Quality and Metrics for Measuring It

      • Thomas N. Herzog, Fritz J. Scheuren, William E. Winkler
      Pages 29-35
    5. Basic Data Quality Tools

      • Thomas N. Herzog, Fritz J. Scheuren, William E. Winkler
      Pages 37-48
  4. Specialized Tools for Database Improvement

    1. Front Matter

      Pages 49-49
    2. Mathematical Preliminaries for Specialized Data Quality Techniques

      • Thomas N. Herzog, Fritz J. Scheuren, William E. Winkler
      Pages 51-60
    3. Automatic Editing and Imputation of Sample Survey Data

      • Thomas N. Herzog, Fritz J. Scheuren, William E. Winkler
      Pages 61-80
    4. Record Linkage – Methodology

      • Thomas N. Herzog, Fritz J. Scheuren, William E. Winkler
      Pages 81-92
    5. Estimating the Parameters of the Fellegi–Sunter Record Linkage Model

      • Thomas N. Herzog, Fritz J. Scheuren, William E. Winkler
      Pages 93-106
    6. Standardization and Parsing

      • Thomas N. Herzog, Fritz J. Scheuren, William E. Winkler
      Pages 107-114
    7. Phonetic Coding Systems for Names

      • Thomas N. Herzog, Fritz J. Scheuren, William E. Winkler
      Pages 115-121
    8. Blocking

      • Thomas N. Herzog, Fritz J. Scheuren, William E. Winkler
      Pages 123-130
    9. String Comparator Metrics for Typographical Error

      • Thomas N. Herzog, Fritz J. Scheuren, William E. Winkler
      Pages 131-135
  5. Record Linkage Case Studies

    1. Front Matter

      Pages 137-137
    2. Duplicate FHA Single-Family Mortgage Records

      • Thomas N. Herzog, Fritz J. Scheuren, William E. Winkler
      Pages 139-149
    3. Record Linkage Case Studies in the Medical, Biomedical, and Highway Safety Areas

      • Thomas N. Herzog, Fritz J. Scheuren, William E. Winkler
      Pages 151-158

About this book

This book helps practitioners gain a deeper understanding, at an applied level, of the issues involved in improving data quality through editing, imputation, and record linkage. The first part of the book deals with methods and models, focusing on the Fellegi-Holt edit-imputation model, the Little-Rubin multiple-imputation scheme, and the Fellegi-Sunter record linkage model. Brief examples are included to show how these techniques work.

In the second part of the book, the authors present real-world case studies in which one or more of these techniques are used. They cover a wide variety of application areas. These include mortgage guarantee insurance, medical, biomedical, highway safety, and social insurance as well as the construction of list frames and administrative lists.

Readers will find this book a mixture of practical advice, mathematical rigor, management insight and philosophy. The long list of references at the end of the book enables readers to delve more deeply into the subjects discussed here. The authors also discuss the software that has been developed to apply the techniques described in the text.

Thomas N. Herzog, Ph.D., ASA is the Chief Actuary at the U.S. Department of Housing and Urban Development. He holds a Ph.D. in mathematics from the University of Maryland and is also an Associate of the Society of Actuaries. He is the author or co-author of books on Credibility Theory, Monte Carlo Methods, and Models for Quantifying Risk.

Fritz J. Scheuren, Ph.D., is a Vice President for Statistics with the National Opinion Research Center at the University of Chicago. He has a Ph.D. in statistics from the George Washington University. He has published more than 300 papers and monographs. He is the 100th President of the American Statistical Association and a Fellow of both the American Statistical Association and the American Association for the Advancement of Science.

William E. Winkler, Ph.D., is Principal Researcher at the U.S. Census Bureau. He holds a Ph.D. in probability theory from Ohio State University and is a Fellow of the American Statistical Association. He has more than 130 papers in areas such as automated record linkage and data quality. He is the author or co-author of eight generalized software systems, some of which are used for production in the largest survey and administrative-list situations.

Keywords

  • coding
  • database
  • data quality
  • editing
  • imputation
  • missing data
  • record linkage

Reviews

From the reviews:

"Data Quality and Record Linkage Techniques is a landmark publication that will facilitate the work of actuaries and other statistical professionals." Douglas C. Borton for The Actuarial Digest

"This book is intended as a primer on editing, imputation and record linkage for analysts who are responsible for the quality of large databases. … The book provides an extended bibliography with references … . The examples given in the book can be valuable for organizations responsible for the quality of databases, in particular when these databases are constructed by linking several different data sources." (T. de Waal, Kwantitatieve Methoden, October, 2007)

"Tom Herzog has a history of writing books...that most mathematically literate people believe they already understand pretty well--until they read the book....This book...[is] interesting and informative. Anyone who works with large databases should read it." (Bruce D. Schoebel, Contingencies, Jan/Feb 2008)

"Who should read this book? The short answer is everyone who is concerned about data quality and what can be done to improve it. Buy a copy for yourself; buy another copy for your IT support." (Kevin Pledge, CompAct, October 2007)

"Data Quality and Record Linkage Techniques is one of the few books on data quality and record linkage that try to cover and discuss the possible errors in different types of data in practical situations. … The intended audience consists of actuaries, economists, statisticians and computer scientists. … This is a good short book for an overview of data quality problems and record linkage techniques. … Statisticians, data analysts and indeed anyone who is going to collect data should first read this book … ." (Waqas Ahmed Malik and Antony Unwin, Psychometrika, Vol. 73 (1), 2008)

"This book covers two related and important topics: data quality and record linkage. … case studies are the book’s major strength; they contain a treasure trove of useful guidelines and tips. For that reason, the book is an excellent purchase for practitioners in business, government, and research settings who plan to undertake major data collection or record linkage efforts. … serves as a stand-alone resource on record linkage techniques. … The book is aimed squarely at practitioners." (Jerome Reiter, Journal of the American Statistical Association, Vol. 103 (482), 2008)

"The book provides a good, sound, verbal introduction and summary, and a useful point of departure into the more technical side of database quality and record linkage problems. In summary, it should be a core sourcebook for non-mathematical statisticians in official statistics agencies, and database designers and managers in government and commerce. It also provides a useful introduction to this important topic, and a comprehensive reference list for further study, for professional statisticians and academics." (Stephan Haslett, International Statistical Reviews, Vol. 76 (2), 2008)

Authors and Affiliations

  • Office of Evaluation, Federal Housing Administration, U.S. Department of Housing and Urban Development, Washington DC

    Thomas N. Herzog

  • National Opinion Research Center, University of Chicago, Alexandria

    Fritz J. Scheuren

  • Statistical Research Division, U.S. Census Bureau, Washington DC

    William E. Winkler
