Fairness-Aware Privacy-Preserving Record Linkage

Vatsalan, Dinusha; Yu, Joyce; Henecka, Wilko; Thorne, Brian

doi:10.1007/978-3-030-66172-4_1


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12484)


Abstract

Record linkage aims to identify records corresponding to the same real-world entity across different databases, while Privacy-Preserving Record Linkage (PPRL) performs this linkage in a way that does not compromise private and sensitive information about individuals. Linking records is treated as a classification task in which pairs of records from different databases are classified as matches (i.e. they refer to the same entity) or non-matches (i.e. they refer to different entities). Because unique entity identifiers are absent across databases, commonly available quasi-identifiers (QIDs), such as name, gender, address, and date of birth, are used to determine the linkage. The values in such QIDs are often prone to data errors and variations, which makes the linkage task challenging.
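To make the pair-classification view concrete, the following Python sketch encodes each record's QID values into a Bloom filter of character bigrams, compares two encodings with the Dice coefficient, and labels the pair a match when the similarity reaches a threshold. Bloom-filter encoding is a widely used PPRL technique, but the abstract does not say which encoding or classifier the authors use; the filter length, number of hash positions, secret key, and threshold of 0.8 are illustrative assumptions, and all function names are hypothetical.

```python
import hashlib
import hmac

BF_LEN = 1000                  # Bloom filter length in bits (illustrative assumption)
NUM_HASH = 10                  # hash positions per bigram (illustrative assumption)
SECRET_KEY = b"shared-secret"  # hypothetical key shared by the database owners only


def bigrams(value: str) -> set:
    """Character bigrams of a padded, lower-cased QID value."""
    s = f"_{value.strip().lower()}_"
    return {s[i:i + 2] for i in range(len(s) - 1)}


def bloom_encode(qids: list) -> set:
    """Hash the bigrams of all QID values of one record into a single Bloom filter,
    represented here as the set of bit positions that are set to 1."""
    bits = set()
    for value in qids:
        for gram in bigrams(value):
            for k in range(NUM_HASH):
                # A keyed hash (HMAC-SHA256) per (k, bigram) stands in for k independent hash functions.
                digest = hmac.new(SECRET_KEY, f"{k}:{gram}".encode(), hashlib.sha256).digest()
                bits.add(int.from_bytes(digest[:8], "big") % BF_LEN)
    return bits


def dice(a: set, b: set) -> float:
    """Dice coefficient of two Bloom filters: 2|A & B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def classify(a: set, b: set, threshold: float = 0.8) -> bool:
    """Label a record pair a match if its encoded similarity reaches the threshold."""
    return dice(a, b) >= threshold


# Example: a spelling variation in the last name lowers, but does not destroy, similarity.
r1 = bloom_encode(["Jane", "Smith", "1985-03-02"])
r2 = bloom_encode(["Jane", "Smyth", "1985-03-02"])
r3 = bloom_encode(["Tom", "Brown", "1972-11-20"])
print(f"r1 vs r2: {dice(r1, r2):.2f} -> match={classify(r1, r2)}")
print(f"r1 vs r3: {dice(r1, r3):.2f} -> match={classify(r1, r3)}")
```

The single-character variation Smith/Smyth removes only a few shared bigrams, so the pair still clears the threshold. A last name that changes completely, for example after marriage, removes far more, which is one way QID variation can affect some groups more than others.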

Fairness in classification is an emerging concept that measures how far a classifier deviates from producing correct predictions with equal probabilities for individuals across different protected groups defined by sensitive features (e.g. gender or race). Developing classifiers that are fair with respect to such sensitive features is an important problem for classification in general, and specifically for PPRL, to mitigate bias against sensitive and/or minority groups; for example, bias against the female group due to a higher likelihood of variations in QIDs such as last name and address. While there has been increasing interest in this field, fairness in PPRL has not been studied in the literature so far. Fairness for PPRL brings specific challenges and requirements.
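One way to make this notion measurable is to compare the error rates of a linkage classifier across protected groups. The Python sketch below computes, for each group, the true-positive rate (recall over true matches) and the false-positive rate, plus the equal-opportunity gap, i.e. the largest recall difference between groups. These are standard group-fairness criteria, not necessarily the metric the paper settles on; the group labels, outcomes, and names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PairOutcome:
    group: str       # protected-group value for the pair (hypothetical labels, e.g. "female"/"male")
    is_match: bool   # ground truth: both records refer to the same entity
    predicted: bool  # the linkage classifier's decision


def group_rates(outcomes):
    """Return {group: (true-positive rate, false-positive rate)} over labelled record pairs."""
    rates = {}
    for g in sorted({o.group for o in outcomes}):
        pos = [o for o in outcomes if o.group == g and o.is_match]
        neg = [o for o in outcomes if o.group == g and not o.is_match]
        tpr = sum(o.predicted for o in pos) / len(pos) if pos else math.nan
        fpr = sum(o.predicted for o in neg) / len(neg) if neg else math.nan
        rates[g] = (tpr, fpr)
    return rates


def equal_opportunity_gap(outcomes):
    """Largest recall difference between groups; 0 means true matches are found equally often."""
    tprs = [tpr for tpr, _ in group_rates(outcomes).values() if not math.isnan(tpr)]
    return max(tprs) - min(tprs)


# Hypothetical outcomes: half of the female true matches are missed.
outcomes = (
    [PairOutcome("female", True, True)] * 2
    + [PairOutcome("female", True, False)] * 2
    + [PairOutcome("female", False, False)]
    + [PairOutcome("male", True, True)] * 4
    + [PairOutcome("male", False, True)]
)
print(group_rates(outcomes))            # {'female': (0.5, 0.0), 'male': (1.0, 1.0)}
print(equal_opportunity_gap(outcomes))  # 0.5
```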

In this paper, we study fairness for PPRL classifiers, analyse appropriate fairness criteria and metrics for PPRL, study different forms of fairness bias in PPRL, and investigate the effectiveness of fairness-aware PPRL. Our experimental results on real and synthetically biased datasets show the efficacy and significance of incorporating fairness constraints in the linkage, leading to higher linkage quality in terms of both correctness and fairness.
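The abstract does not describe how fairness constraints are incorporated into the linkage. As one illustrative possibility, and not the authors' method, the Python sketch below swaps in a simple post-processing technique: on labelled record pairs, it picks a separate similarity threshold for each protected group so that every group reaches a target recall. The data, candidate thresholds, target recall, and function names are hypothetical.

```python
def recall_at(scored_pairs, threshold):
    """Fraction of true-match pairs whose similarity is at or above the threshold."""
    match_scores = [s for s, is_match in scored_pairs if is_match]
    return sum(s >= threshold for s in match_scores) / len(match_scores)


def group_thresholds(pairs_by_group, target_recall, candidates):
    """For each protected group, pick the highest candidate threshold that still
    reaches target_recall on labelled pairs; fall back to the lowest candidate."""
    chosen = {}
    for group, scored_pairs in pairs_by_group.items():
        feasible = [t for t in candidates if recall_at(scored_pairs, t) >= target_recall]
        chosen[group] = max(feasible) if feasible else min(candidates)
    return chosen


# Hypothetical labelled (similarity, is_match) pairs; female true matches score lower,
# e.g. because of last-name variations.
pairs_by_group = {
    "female": [(0.95, True), (0.72, True), (0.68, True), (0.60, False), (0.40, False)],
    "male": [(0.97, True), (0.91, True), (0.88, True), (0.65, False), (0.30, False)],
}
candidates = [0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90]

# A single global threshold of 0.80 finds every male true match but only a third of the female ones.
print({g: round(recall_at(p, 0.80), 2) for g, p in pairs_by_group.items()})
# {'female': 0.33, 'male': 1.0}
print(group_thresholds(pairs_by_group, target_recall=0.9, candidates=candidates))
# {'female': 0.65, 'male': 0.85}
```

Choosing the highest feasible threshold per group keeps false matches down while meeting the recall target. Post-processing of this kind needs the protected attribute at linkage time; in-processing methods that add a fairness constraint to the classifier's training objective are a common alternative.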



Acknowledgement

This work was funded by the Australian Department of Social Services (DSS) as part of the Platforms for Open Data (PfOD) project. We would like to thank Waylon Nielsen and Alex Ware, and Maruti Vadrevu from DSS for their support and feedback on this work.

Author information

Authors and Affiliations

CSIRO’s Data61, Eveleigh, NSW, 2015, Australia
Dinusha Vatsalan, Joyce Yu & Wilko Henecka
Hardbyte, Christchurch, New Zealand
Brian Thorne


Corresponding author

Correspondence to Dinusha Vatsalan.

Editor information

Editors and Affiliations

Télécom SudParis, Evry Cedex, France
Joaquin Garcia-Alfaro
Departament d’Enginyeria de la Informació i de les Comunicacions, Universitat Autònoma de Barcelona, Bellaterra, Spain
Guillermo Navarro-Arribas
Escola d’Enginyeria, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Barcelona, Spain
Jordi Herrera-Joancomarti


About this paper

Cite this paper

Vatsalan, D., Yu, J., Henecka, W., Thorne, B. (2020). Fairness-Aware Privacy-Preserving Record Linkage. In: Garcia-Alfaro, J., Navarro-Arribas, G., Herrera-Joancomarti, J. (eds) Data Privacy Management, Cryptocurrencies and Blockchain Technology. DPM CBT 2020. Lecture Notes in Computer Science, vol 12484. Springer, Cham. https://doi.org/10.1007/978-3-030-66172-4_1


DOI: https://doi.org/10.1007/978-3-030-66172-4_1
Published: 29 December 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66171-7
Online ISBN: 978-3-030-66172-4
eBook Packages: Computer Science (R0)
