A utility based approach for data stream anonymization

Sopaoglu, Ugur; Abul, Osman

doi:10.1007/s10844-019-00577-6

Published: 08 October 2019

Journal of Intelligent Information Systems, Volume 54, pages 605–631 (2020)

Abstract

Data streams are good models for characterizing the dynamic, on-line, fast, and high-volume data requirements of today’s businesses. However, the sensitivity of data is often an obstacle to deploying data stream applications. To address this challenge, many privacy-preserving models, including k-anonymity, have been adapted to data streams. Existing data stream anonymization frameworks have addressed how to preserve data quality as much as possible under bounded delays. In this work, our main motivation is to minimize average delay while keeping data quality high. We claim that, in data stream processing tasks, data utility is a function of both data quality and data aging, and that there is a tradeoff between optimizing for data aging and optimizing for data quality. To this end, we present a tunable data stream k-anonymization framework and an algorithm named UBDSA (Utility Based Approach for Data Stream Anonymization). To attain high-quality anonymity groups, UBDSA also introduces a new distance metric named CAIL (Cardinality Aware Information Loss). Our experimental evaluation compares UBDSA with approaches from the literature, and the results show its merit in terms of lower average delay and lower information loss.
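
The tradeoff between data aging and data quality described above can be illustrated with a small sketch. This is not UBDSA and does not implement CAIL, neither of which is specified in this preview. It is a generic buffer-and-generalize k-anonymizer over one numeric quasi-identifier, and the names Record, anonymize, and buffer_size, as well as the range-width information loss measure, are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Record:
    arrival: int  # time step at which the record entered the stream
    value: float  # one numeric quasi-identifier (e.g. age)


def anonymize(stream: List[Record], k: int, buffer_size: int,
              domain: Tuple[float, float]) -> Tuple[list, float, float]:
    """Return (published ranges, average delay, average information loss).

    Hypothetical sketch: records wait in a buffer, and each flush publishes
    groups of at least k records generalized to a shared [min, max] range.
    """
    if buffer_size < k:
        raise ValueError("buffer_size must be at least k")
    lo, hi = domain
    published, delays, losses = [], [], []
    buffer: List[Record] = []

    def flush(now: int) -> None:
        # Sort by value so that each group contains similar records.
        buffer.sort(key=lambda r: r.value)
        n_groups = len(buffer) // k
        for g in range(n_groups):
            # The last group absorbs the remainder, so every group has >= k records.
            end = (g + 1) * k if g < n_groups - 1 else len(buffer)
            group = buffer[g * k:end]
            g_lo, g_hi = group[0].value, group[-1].value
            for rec in group:
                published.append((g_lo, g_hi))
                delays.append(now - rec.arrival)          # data aging
                losses.append((g_hi - g_lo) / (hi - lo))  # range-width loss
        buffer.clear()

    for rec in stream:
        buffer.append(rec)
        if len(buffer) >= buffer_size:
            flush(rec.arrival)
    # Leftover records are published only if they can still form a group of k;
    # otherwise this sketch suppresses them.
    if len(buffer) >= k:
        flush(stream[-1].arrival)

    n = len(published)
    if n == 0:
        return published, 0.0, 0.0
    return published, sum(delays) / n, sum(losses) / n


stream = [Record(t, v) for t, v in enumerate([10, 50, 12, 52, 11, 51, 13, 53])]
for size in (2, 8):
    _, delay, loss = anonymize(stream, k=2, buffer_size=size, domain=(0, 100))
    print(size, delay, loss)
```

On this toy stream with k = 2, a buffer of 2 publishes quickly (average delay 0.5 time steps) but forms wide groups (average loss about 0.4), while a buffer of 8 forms tight groups (average loss about 0.01) at an average delay of 3.5 time steps. A tunable framework, as the abstract describes, would let the user choose a point between these extremes.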

Author information

Authors and Affiliations

Department of Computer Engineering, TOBB University of Economics and Technology, Ankara, Turkey
Ugur Sopaoglu & Osman Abul
Department of Big Data and Artificial Intelligence, HAVELSAN Inc., Ankara, Turkey
Ugur Sopaoglu

Corresponding author

Correspondence to Osman Abul.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sopaoglu, U., Abul, O. A utility based approach for data stream anonymization. J Intell Inf Syst 54, 605–631 (2020). https://doi.org/10.1007/s10844-019-00577-6

Received: 27 March 2019
Revised: 24 August 2019
Accepted: 27 August 2019
Published: 08 October 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s10844-019-00577-6
