Abstract
Technological advancements in Big Data and IoT have led to unprecedented growth in digital data. Business organizations, government agencies, and healthcare sectors collect data from multiple distributed sources and mine it for insights that support optimized decision making. The data thus amassed may also contain sensitive personal information about individuals, which is at risk of disclosure during analytics. Hence, there is a need for a privacy-aware system that enforces protection of sensitive data. Such a system, however, constrains the usefulness of the data. The study shows that although significant findings exist for balancing these conflicting objectives, the efficacy and scalability of these solutions continue to challenge the research community, given the volume of Big Data. Finding the appropriate blend of these objectives, for the mutual benefit of organizations and customers, requires leveraging the modern tools and technologies in the Big Data ecosystem. This study extensively reviews previous work on privacy-preserving Big Data analytics; the review is the first of its kind in exploring the challenges that must be overcome to strike a balance between data value, privacy, scalability, and performance.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Geetha, P., Naikodi, C., Setty, S.L.N. (2020). Design of Big Data Privacy Framework—A Balancing Act. In: Jain, V., Chaudhary, G., Taplamacioglu, M., Agarwal, M. (eds) Advances in Data Sciences, Security and Applications. Lecture Notes in Electrical Engineering, vol 612. Springer, Singapore. https://doi.org/10.1007/978-981-15-0372-6_19
DOI: https://doi.org/10.1007/978-981-15-0372-6_19
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0371-9
Online ISBN: 978-981-15-0372-6
eBook Packages: Intelligent Technologies and Robotics (R0)