Abstract
We describe how we wrote the DASFAA 2003 paper titled "Efficient Record Linkage in Large Data Sets", which received the DASFAA 2013 ten-year best paper award, and the follow-up research after the paper.
References
Alsubaiee, S.M., Behm, A., Li, C.: Supporting location-based approximate-keyword queries. In: ACM SIGSPATIAL GIS (2010)
Alsubaiee, S., Li, C.: Fuzzy keyword search on spatial data. In: Kitagawa, H., Ishikawa, Y., Li, Q., Watanabe, C. (eds.) DASFAA 2010, Part II. LNCS, vol. 5982, pp. 464–467. Springer, Heidelberg (2010)
Behm, A., Ji, S., Li, C., Lu, J.: Space-constrained gram-based indexing for efficient approximate string search. In: ICDE (2009)
Behm, A., Li, C., Carey, M.J.: Answering approximate string queries on large data sets using external memory. In: ICDE, pp. 888–899 (2011)
Chen, Z., Kalashnikov, D.V., Mehrotra, S.: Exploiting relationships for object consolidation. In: IQIS, pp. 47–58 (2005)
Chen, Z., Kalashnikov, D.V., Mehrotra, S.: Adaptive graphical approach to entity resolution. In: JCDL, pp. 204–213 (2007)
Chen, Z., Kalashnikov, D.V., Mehrotra, S.: Exploiting context analysis for combining multiple entity resolution systems. In: SIGMOD Conference, pp. 207–218 (2009)
Jin, L., Koudas, N., Li, C.: NNH: Improving performance of nearest-neighbor searches using histograms. In: Bertino, E., Christodoulakis, S., Plexousakis, D., Christophides, V., Koubarakis, M., Böhm, K. (eds.) EDBT 2004. LNCS, vol. 2992, pp. 385–402. Springer, Heidelberg (2004)
Jin, L., Koudas, N., Li, C., Tung, A.K.H.: Indexing mixed types for approximate retrieval. In: VLDB, pp. 793–804 (2005)
Jin, L., Li, C.: Selectivity estimation for fuzzy string predicates in large data sets. In: VLDB, pp. 397–408 (2005)
Ji, S., Li, G., Li, C., Feng, J.: Efficient interactive fuzzy keyword search. In: WWW, pp. 371–380 (2009)
Jin, L., Li, C., Mehrotra, S.: Efficient record linkage in large data sets. In: DASFAA, pp. 137–146 (2003)
Jin, L., Li, C., Vernica, R.: SEPIA: estimating selectivities of approximate string predicates in large databases. VLDB J. 17(5), 1213–1229 (2008)
Kalashnikov, D.V., Chen, Z., Mehrotra, S., Nuray-Turan, R.: Web people search via connection analysis. IEEE Trans. Knowl. Data Eng. 20(11), 1550–1565 (2008)
Kalashnikov, D.V., Chen, Z., Nuray-Turan, R., Mehrotra, S., Zhang, Z.: WEST: Modern technologies for web people search. In: ICDE, pp. 1487–1490 (2009)
Koudas, N., Li, C., Tung, A.K.H., Vernica, R.: Relaxing join and selection queries. In: VLDB, pp. 199–210 (2006)
Kalashnikov, D.V., Mehrotra, S.: Domain-independent data cleaning via analysis of entity-relationship graph. ACM Trans. Database Syst. 31(2), 716–767 (2006)
Kalashnikov, D.V., Mehrotra, S., Chen, Z.: Exploiting relationships for domain-independent data cleaning. In: SDM (2005)
Kalashnikov, D.V., Nuray-Turan, R., Mehrotra, S.: Towards breaking the quality curse: a web-querying approach to web people search. In: SIGIR, pp. 27–34 (2008)
Li, C., Lu, J., Lu, Y.: Efficient merging and filtering algorithms for approximate string searches. In: ICDE, pp. 257–266 (2008)
Li, C., Wang, B., Yang, X.: VGRAM: Improving performance of approximate queries on string collections using variable-length grams. In: VLDB, pp. 303–314 (2007)
Nuray-Turan, R., Chen, Z., Kalashnikov, D.V., Mehrotra, S.: Exploiting web querying for web people search in WePS2. In: WePS (2009)
Nuray-Turan, R., Kalashnikov, D.V., Mehrotra, S.: Self-tuning in graph-based reference disambiguation. In: Kotagiri, R., Radha Krishna, P., Mohania, M., Nantajeewarawat, E. (eds.) DASFAA 2007. LNCS, vol. 4443, pp. 325–336. Springer, Heidelberg (2007)
Nuray-Turan, R., Kalashnikov, D.V., Mehrotra, S.: Exploiting web querying for web people search. ACM Trans. Database Syst. 37(1), 7 (2012)
Nuray-Turan, R., Kalashnikov, D.V., Mehrotra, S.: Adaptive connection strength models for relationship-based entity resolution. ACM JDIQ (2013)
Nuray-Turan, R., Kalashnikov, D.V., Mehrotra, S., Yu, Y.: Attribute and object selection queries on objects with probabilistic attributes. ACM Trans. Database Syst. 37(1), 3 (2012)
Vernica, R., Carey, M., Li, C.: Efficient parallel set-similarity joins using MapReduce. In: SIGMOD Conference (2010)
Vernica, R., Li, C.: Efficient top-k algorithms for fuzzy search in string collections. In: KEYS, pp. 9–14 (2009)
Yang, X., Wang, B., Li, C.: Cost-based variable-length-gram selection for string collections to support approximate queries efficiently. In: SIGMOD Conference (2008)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, C., Mehrotra, S., Jin, L. (2013). Record Linkage: A 10-Year Retrospective. In: Meng, W., Feng, L., Bressan, S., Winiwarter, W., Song, W. (eds) Database Systems for Advanced Applications. DASFAA 2013. Lecture Notes in Computer Science, vol 7825. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37487-6_3
DOI: https://doi.org/10.1007/978-3-642-37487-6_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37486-9
Online ISBN: 978-3-642-37487-6
eBook Packages: Computer Science, Computer Science (R0)