Abstract
Record linkage aims to identify records that correspond to the same real-world entity across different databases. Privacy-Preserving Record Linkage (PPRL) performs this linkage without compromising private and sensitive information about individuals. Linking records is treated as a classification task in which pairs of records from different databases are classified as matches (i.e. they refer to the same entity) or non-matches (i.e. they refer to different entities). Because unique entity identifiers are not available across databases, commonly available quasi-identifiers (QIDs), such as name, gender, address, and date of birth, are used to determine the linkage. The values of such QIDs are often prone to data errors and variations, which makes the linkage task challenging.
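The pairwise classification described above can be illustrated with a minimal sketch. It is not the paper's method: it compares plaintext QID values using a Dice coefficient over character bigrams, whereas a PPRL protocol would compare encoded values. The field names, the records, and the 0.8 threshold are assumptions chosen for illustration.

```python
def qgrams(value, q=2):
    """Return the set of character q-grams of a lower-cased string."""
    s = value.lower()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def dice(a, b):
    """Dice coefficient of the q-gram sets of two strings (0.0 to 1.0)."""
    x, y = qgrams(a), qgrams(b)
    if not x and not y:
        return 1.0  # two values too short to yield q-grams are treated as equal
    return 2 * len(x & y) / (len(x) + len(y))


def classify_pair(rec_a, rec_b, qids, threshold=0.8):
    """Return (is_match, score), where score is the mean QID similarity."""
    score = sum(dice(rec_a[f], rec_b[f]) for f in qids) / len(qids)
    return score >= threshold, score


# Hypothetical records; the first pair differs only by a spelling variation.
QIDS = ["first_name", "last_name", "dob"]
rec_a = {"first_name": "Katherine", "last_name": "Smith", "dob": "1980-02-11"}
rec_b = {"first_name": "Catherine", "last_name": "Smith", "dob": "1980-02-11"}
rec_c = {"first_name": "Peter", "last_name": "Jones", "dob": "1975-07-30"}

match_ab, score_ab = classify_pair(rec_a, rec_b, QIDS)  # classified as a match
match_ac, score_ac = classify_pair(rec_a, rec_c, QIDS)  # classified as a non-match
```

A single spelling variation in one QID lowers the score only slightly here, but several variations in the same record (for example a changed last name and a changed address) can push a true match below the threshold.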
Fairness in classification is an emerging concept that measures how far a classifier deviates from producing correct predictions with equal probability for individuals across different protected groups defined by sensitive features (e.g. gender or race). Developing classifiers that are fair with respect to such sensitive features is an important problem for classification in general, and for PPRL in particular, in order to mitigate bias against sensitive and/or minority groups, for example against the female group, whose QIDs such as last name and address have a higher likelihood of variation. While there has been increased interest in this field, fairness in PPRL specifically has not been studied in the literature so far. Fairness for PPRL brings specific challenges and requirements.
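One way to make "correct predictions with equal probabilities across groups" concrete is to compare the true positive rate of the linkage classifier per group. The sketch below computes that gap, often called the equal opportunity difference. The abstract does not state which criterion the paper adopts, so this choice, the function names, and the toy labels are assumptions for illustration only.

```python
from collections import defaultdict


def true_positive_rates(y_true, y_pred, groups):
    """Per-group true positive rate: share of true matches predicted as matches."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth:  # only true matches count towards the true positive rate
            totals[group] += 1
            hits[group] += int(bool(pred))
    return {g: hits[g] / totals[g] for g in totals}


def equal_opportunity_gap(y_true, y_pred, groups):
    """Largest difference in true positive rate between any two groups."""
    rates = true_positive_rates(y_true, y_pred, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical labels for ten record pairs, five per group.
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
groups = ["m"] * 5 + ["f"] * 5

rates = true_positive_rates(y_true, y_pred, groups)  # {"m": 0.75, "f": 0.5}
gap = equal_opportunity_gap(y_true, y_pred, groups)  # 0.25
```

A gap of zero would mean true matches are found equally often in both groups. In this toy data, matches in group "f" are missed more often, which is the kind of bias the paragraph above describes.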
In this paper, we study fairness for PPRL classifiers, analyse appropriate fairness criteria and metrics for PPRL, study different forms of fairness bias for PPRL, and investigate the effectiveness of fairness-aware PPRL. Our experiments on real and synthetically biased datasets show the efficacy and significance of incorporating fairness constraints in the linkage, leading to higher linkage quality in terms of both correctness and fairness.
Acknowledgement
This work was funded by the Australian Department of Social Sciences (DSS) as part of the Platforms for Open Data (PfOD) project. We would like to thank Waylon Nielsen, Alex Ware, and Maruti Vadrevu from DSS for their support and feedback on this work.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Vatsalan, D., Yu, J., Henecka, W., Thorne, B. (2020). Fairness-Aware Privacy-Preserving Record Linkage. In: Garcia-Alfaro, J., Navarro-Arribas, G., Herrera-Joancomarti, J. (eds) Data Privacy Management, Cryptocurrencies and Blockchain Technology. DPM 2020, CBT 2020. Lecture Notes in Computer Science, vol 12484. Springer, Cham. https://doi.org/10.1007/978-3-030-66172-4_1
DOI: https://doi.org/10.1007/978-3-030-66172-4_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66171-7
Online ISBN: 978-3-030-66172-4
eBook Packages: Computer Science, Computer Science (R0)