Adaptive Temporal Entity Resolution on Dynamic Databases

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7819)

Abstract

Entity resolution is the process of matching records that refer to the same entities from one or several databases in situations where the records to be matched do not include unique entity identifiers. Matching therefore has to rely on partially identifying information, such as names and addresses. Traditionally, entity resolution has been applied in batch mode and on static databases. Increasingly, however, organisations face a stream of query records that must be matched against a database of known entities. As these query records are matched, they are inserted into the database, either as a new entity or as the latest representation of an existing entity. We investigate how temporal and dynamic aspects, such as time differences between query and database records and changes in database content, affect matching quality. We propose an approach that adaptively adjusts similarities between records depending on the values of the records' attributes and the time differences between records. We evaluate our approach on synthetic data and a large real US voter database, with results showing that our approach can outperform static matching approaches.
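
To make the idea of time-dependent similarity concrete, the following is a minimal sketch and not the method from the paper, whose formulas are not given in this abstract. It assumes that each attribute has a fixed yearly probability of legitimately changing, and it down-weights an attribute as the time gap between two records grows. The names `Record`, `temporal_sim`, `resolve`, the values in `CHANGE_PROB_PER_YEAR`, and the threshold of 0.8 are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher

# Assumed yearly probability that an attribute value legitimately changes.
# Illustrative numbers only; in practice they would be estimated from data.
CHANGE_PROB_PER_YEAR = {"name": 0.02, "address": 0.15}


@dataclass
class Record:
    name: str
    address: str
    seen: date
    entity_id: int = -1  # -1 means "not yet resolved"


def string_sim(a: str, b: str) -> float:
    """Approximate string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def temporal_sim(query: Record, known: Record) -> float:
    """Weighted mean of attribute similarities; an attribute's weight shrinks
    as the time gap makes a legitimate change of its value more likely."""
    years = abs((query.seen - known.seen).days) / 365.25
    total = weight_sum = 0.0
    for attr, p in CHANGE_PROB_PER_YEAR.items():
        weight = (1.0 - p) ** years  # assumed probability the value is unchanged
        total += weight * string_sim(getattr(query, attr), getattr(known, attr))
        weight_sum += weight
    return total / weight_sum


def resolve(query: Record, database: list[Record], threshold: float = 0.8) -> Record:
    """Match a query record, then insert it as a new or an existing entity."""
    best = max(database, key=lambda r: temporal_sim(query, r), default=None)
    if best is not None and temporal_sim(query, best) >= threshold:
        query.entity_id = best.entity_id  # latest record of a known entity
    else:
        query.entity_id = 1 + max((r.entity_id for r in database), default=0)
    database.append(query)  # the database grows as the stream is processed
    return query


# Usage: the same person reappears ten years later at a different address.
db = [Record("Joan Smith", "12 Oak Street", date(2002, 5, 1), entity_id=1)]
moved = resolve(Record("Joan Smith", "7 Hill Road", date(2012, 5, 1)), db)
```

With a zero time gap all weights equal 1, so the score reduces to a plain mean of the attribute similarities, which corresponds to a static matcher. With a long gap the address disagreement counts for less, because under the assumed change rates an address is far more likely than a name to have changed.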

This research was funded by the Australian Research Council (ARC), Veda Advantage, and Funnelback Pty. Ltd., under Linkage Project LP100200079.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Christen, P., Gayler, R.W. (2013). Adaptive Temporal Entity Resolution on Dynamic Databases. In: Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2013. Lecture Notes in Computer Science, vol 7819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37456-2_47

  • DOI: https://doi.org/10.1007/978-3-642-37456-2_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37455-5

  • Online ISBN: 978-3-642-37456-2

  • eBook Packages: Computer Science, Computer Science (R0)
