Abstract
The problem of protecting datasets from the disclosure of confidential information, while keeping the published data useful for analysis, has recently gained momentum. To address this problem, anonymization techniques such as k-anonymity, \(\ell \)-diversity, and t-closeness have been used to generate anonymized datasets for training classifiers. While these techniques provide an effective means to generate anonymized datasets, an understanding of how their application affects the performance of classifiers is currently missing. Such knowledge would enable the data owner and the analyst to select the most appropriate classification algorithm and training parameters in order to guarantee high privacy requirements while minimizing the loss of accuracy. In this study, we perform extensive experiments to verify how classifiers' performance changes when they are trained on an anonymized dataset rather than on the original one, and we evaluate the impact of classification algorithms, dataset properties, and anonymization parameters on classifiers' performance.
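To make the anonymization setting concrete, the sketch below enforces k-anonymity on a toy table by globally recoding an age quasi-identifier into ever-wider intervals until every (age interval, truncated ZIP) combination occurs at least k times. The attribute names, interval widths, and records are illustrative assumptions only; they are not the anonymization algorithm or data used in the paper's experiments.

```python
from collections import Counter

def generalize_age(age, width):
    """Coarsen an exact age into an interval of the given width."""
    lo = (age // width) * width
    return f"[{lo}-{lo + width})"

def is_k_anonymous(records, k):
    """Check that every quasi-identifier combination occurs at least k times."""
    counts = Counter(records)
    return all(c >= k for c in counts.values())

def anonymize(ages, zip_codes, k):
    """Widen the age interval (global recoding) until k-anonymity holds
    on the quasi-identifiers (age interval, truncated ZIP code)."""
    for width in (5, 10, 20, 40, 80):
        qi = [(generalize_age(a, width), z[:3] + "**")
              for a, z in zip(ages, zip_codes)]
        if is_k_anonymous(qi, k):
            return qi, width
    # Fall back to fully suppressing the age attribute.
    return [("*", z[:3] + "**") for z in zip_codes], None

# Made-up example records (quasi-identifiers only, no sensitive attribute).
ages = [23, 27, 31, 36, 42, 45, 52, 57]
zip_codes = ["13053", "13068", "13053", "13068",
             "14850", "14850", "14853", "14853"]
anon, chosen_width = anonymize(ages, zip_codes, k=2)
```

Each widening step trades utility for privacy: the coarser the age intervals, the less information a classifier trained on the released table can exploit, which is exactly the accuracy loss the study measures.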
Notes
- 2. The code used for our experiments is available at https://github.com/minaalishahi/classifiersperformance.
Acknowledgement
This work has been supported by the EU H2020-funded project SECREDAS (GA #783119).
Appendix
Tables 5, 6, 7, and 8 report the Holm scores of the classifiers with respect to accuracy, precision, recall, and F1-score, respectively. Higher scores indicate better performance for the associated classification algorithm and metric.
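The Holm scores above rest on Holm's sequentially rejective procedure for multiple comparisons. Below is a minimal, self-contained sketch of the standard step-down p-value adjustment, not the paper's own scoring code; the example p-values are made up.

```python
def holm_adjust(pvalues):
    """Holm's step-down adjustment of raw p-values for m comparisons.

    Sorts p-values ascending, multiplies the i-th smallest by (m - i),
    enforces monotonicity with a running maximum, caps at 1.0, and
    returns the adjusted values in the original order.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = (m - rank) * pvalues[i]
        running_max = max(running_max, candidate)
        adjusted[i] = min(1.0, running_max)
    return adjusted

raw = [0.02, 0.01, 0.03]
adj = holm_adjust(raw)  # approximately [0.04, 0.03, 0.04]
```

A hypothesis is rejected at level \(\alpha\) when its adjusted p-value falls below \(\alpha\); the step-down scheme controls the family-wise error rate while rejecting at least as many hypotheses as the plain Bonferroni correction.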
Figures 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 and 16 show the performance of classifiers trained on the anonymized Credit, Absent, and Optic datasets for different values of \(k\), \(\ell \), and \(t\).
Copyright information
© 2021 IFIP International Federation for Information Processing
Cite this paper
Alishahi, M., Zannone, N. (2021). Not a Free Lunch, But a Cheap One: On Classifiers Performance on Anonymized Datasets. In: Barker, K., Ghazinour, K. (eds) Data and Applications Security and Privacy XXXV. DBSec 2021. Lecture Notes in Computer Science(), vol 12840. Springer, Cham. https://doi.org/10.1007/978-3-030-81242-3_14
Print ISBN: 978-3-030-81241-6
Online ISBN: 978-3-030-81242-3