Abstract
Organizations today are deeply involved in the Big Data era: the amount of data is growing at an unpredictable rate and arrives from various sources. When this massive data is processed and analyzed, privacy is a major concern alongside the utility of the data. Privacy preservation techniques that aim to balance utility and privacy have therefore become a recent trend among big data researchers. In this paper, we discuss a technique for big data privacy preservation based on clustering. Hierarchical particle swarm optimization (HPSO) is used to cluster similar data. To scale to big data, our method is built on the cloud infrastructure Hadoop MapReduce. The method is tested on a UCI dataset and the results are compared with an existing approach.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Wai, E.N.C., Tsai, PW., Pan, JS. (2017). Hierarchical PSO Clustering on MapReduce for Scalable Privacy Preservation in Big Data. In: Pan, JS., Lin, JW., Wang, CH., Jiang, X. (eds) Genetic and Evolutionary Computing. ICGEC 2016. Advances in Intelligent Systems and Computing, vol 536. Springer, Cham. https://doi.org/10.1007/978-3-319-48490-7_5
DOI: https://doi.org/10.1007/978-3-319-48490-7_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48489-1
Online ISBN: 978-3-319-48490-7
eBook Packages: Engineering (R0)