Sensitive Data Detection Using NN and KNN from Big Data

  • Binod Kumar Adhikari
  • Wan Li Zuo
  • Ramesh Maharjan
  • Lin Guo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11337)


This paper focuses on identifying sensitive data in the huge mass of data collected from social networks, cloud drives, local repository files, and similar sources. As technology has advanced, numerous tools have emerged and are actively used to extract useful and critical information about criminal activities from the big data that accumulates through the use of communication devices and applications. Numerous reduction techniques and data retrieval algorithms have been devised to extract sensitive information from the accumulated data of criminals, in order to prevent future criminal activities and to control unexpected events. In this paper, two different reduction techniques are used: the Neural Network and K-Nearest Neighbor algorithms. Experiments for both algorithms were carried out in a similar environment while varying the data size and the number of nodes in the processing cluster. The experiments show that the Neural Network classification algorithm is superior to the K-Nearest Neighbor algorithm for retrieving sensitive data from big data.
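
As a rough illustration of the K-Nearest Neighbor idea named above, the following is a minimal single-machine sketch, not the authors' distributed implementation. The function name knn_predict, the two-feature records, and the labels (1 = sensitive, 0 = not sensitive) are all hypothetical and chosen only for the example.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs. All names and data
    here are hypothetical and are not taken from the paper.
    """
    # Order training points by Euclidean distance to the query, keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of those k neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy records (assumed): label 1 = sensitive, 0 = not sensitive.
train = [
    ((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.85, 0.7), 1),
    ((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.15, 0.3), 0),
]

label_near_sensitive = knn_predict(train, (0.8, 0.8))
label_near_other = knn_predict(train, (0.2, 0.2))
```

Because every query is compared against every stored record, the cost of this approach grows with the data size, which is one reason a comparison across data sizes and cluster node counts, as described in the abstract, is informative.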


Keywords: Terrorism; Hadoop Distributed File System (HDFS); MapReduce; Neurons; Cluster



Project supported by the National Natural Science Foundation of China (No. 60973040, No. 61602057), the Outstanding Young Talent Project of Jilin Province (No. 2017052005954), and the Key Scientific and Technology Projects of Jilin Province (No. 20130206051GX).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. College of Computer Science and Technology, Jilin University, Changchun, China
  2. Amrit Campus, Tribhuvan University, Kathmandu, Nepal
  3. School of Economic Management, Changchun University of Science and Technology, Changchun, China
