Abstract
This paper addresses the detection of sensitive data within the large volumes of data collected from social networks, cloud drives, local repository files, and similar sources. As technology has advanced, numerous tools have emerged and are actively used to extract useful and critical information about criminal activities from the big data that accumulates through the use of communication devices and applications. Many reduction techniques and data retrieval algorithms have been developed to extract sensitive information from the accumulated data of criminals, in order to prevent future criminal activity and to control unexpected events. In this paper, two reduction techniques are used: the Neural Network and K-Nearest Neighbor algorithms. Experiments with both algorithms were run in a similar environment while varying the data size and the number of nodes in the processing cluster. The experiments show that the Neural Network classification algorithm is superior to the K-Nearest Neighbor algorithm at retrieving sensitive data from big data.
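For readers unfamiliar with the second of the two algorithms, the following is a minimal single-machine sketch of K-Nearest Neighbor classification by majority vote. It is not the authors' cluster implementation: the function name `knn_predict`, the two-feature records, and the labels "sensitive" and "ordinary" are all hypothetical and chosen only for illustration.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    # Sort training records by distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels of those neighbours and return the most frequent one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two numeric features per record, illustrative labels.
train = [
    ((0.9, 0.8), "sensitive"),
    ((0.8, 0.9), "sensitive"),
    ((0.7, 0.7), "sensitive"),
    ((0.1, 0.2), "ordinary"),
    ((0.2, 0.1), "ordinary"),
    ((0.3, 0.3), "ordinary"),
]

print(knn_predict(train, (0.85, 0.75)))
print(knn_predict(train, (0.15, 0.25)))
```

Because K-Nearest Neighbor stores the training set and compares each query against it, its cost grows with the data size, which is one reason the paper varies both data size and cluster node count when comparing it with the Neural Network.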
Acknowledgment
This project was supported by the National Natural Science Foundation of China (No. 60973040, No. 61602057), the Outstanding Young Talent Project of Jilin Province (No. 2017052005954), and the Key Scientific and Technology Projects of Jilin Province (No. 20130206051GX).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Adhikari, B.K., Zuo, W.L., Maharjan, R., Guo, L. (2018). Sensitive Data Detection Using NN and KNN from Big Data. In: Vaidya, J., Li, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2018. Lecture Notes in Computer Science, vol 11337. Springer, Cham. https://doi.org/10.1007/978-3-030-05063-4_49
DOI: https://doi.org/10.1007/978-3-030-05063-4_49
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05062-7
Online ISBN: 978-3-030-05063-4
eBook Packages: Computer Science, Computer Science (R0)