Abstract
Technology has transformed our socioeconomic life. We increasingly depend on technology for day-to-day activities such as banking, e-commerce, and retail. Extensive use of smartphone apps and enormous public interest in social media have led to a data-rich digital environment in which organizations generate and share large-scale data to improve decision making and foster business through data analytics. However, data analytics poses privacy threats that can lead to disclosure of personal and sensitive data without the user's consent. Conventional data analytics operated on the data as a whole using aggregate queries. Modern applications such as recommendation systems and digital marketing involve analytics on person-specific individual records, which is more harmful to individual privacy. In this paper, we examine various privacy-related risks and privacy preservation strategies with their potential and limitations, and highlight important aspects of privacy legislation enacted in various countries, including India's Personal Data Protection Bill and the European Union's General Data Protection Regulation (GDPR).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Vasupula, N., Munnangi, V., Daggubati, S. (2022). Modern Privacy Risks and Protection Strategies in Data Analytics. In: Reddy, V.S., Prasad, V.K., Wang, J., Reddy, K.T.V. (eds) Soft Computing and Signal Processing. Advances in Intelligent Systems and Computing, vol 1340. Springer, Singapore. https://doi.org/10.1007/978-981-16-1249-7_9
DOI: https://doi.org/10.1007/978-981-16-1249-7_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1248-0
Online ISBN: 978-981-16-1249-7
eBook Packages: Intelligent Technologies and Robotics (R0)