Sensitive Data Detection Using NN and KNN from Big Data

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2018)

Abstract

This paper focuses on identifying sensitive data within the huge mass of data collected from social networks, cloud drives, local repository files, and similar sources. As technology has advanced, numerous tools have emerged and are actively used to extract useful and critical information about criminal activities from the big data that accumulates through the use of communication devices and applications. Numerous reduction techniques and data retrieval algorithms have been devised to extract sensitive information from the accumulated data of criminals, in order to prevent future criminal activities and to control unexpected events. In this paper, two different reduction techniques are used: the Neural Network and K-Nearest Neighbor algorithms. Experiments for both algorithms were run in a similar environment while varying the data size and the number of nodes in the processing cluster. The experiments show that the Neural Network classification algorithm is superior to the K-Nearest Neighbor algorithm at retrieving sensitive data from big data.
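
As an illustration only, the two classifier families named in the abstract can be sketched on a toy dataset. This is not the authors' implementation, their feature set, or their cluster setup: the feature vectors, the labels, `TRAIN`, and every function name below are hypothetical, and a single sigmoid neuron stands in for a full neural network.

```python
import math
from collections import Counter

# Hypothetical toy features, e.g. (flagged keyword count, flagged contact count).
# Labels: 0 = normal record, 1 = sensitive record. Not data from the paper.
TRAIN = [
    ((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 0),
    ((5.0, 5.0), 1), ((6.0, 5.0), 1), ((5.0, 6.0), 1),
]


def knn_predict(train, query, k=3):
    """Label a query by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def train_neuron(train, epochs=1000, lr=0.1):
    """Fit one sigmoid neuron with batch gradient descent on the log-loss."""
    dim = len(train[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in train:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        # Average the gradients over the batch before stepping.
        w = [wi - lr * g / len(train) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(train)
    return w, b


def neuron_predict(model, query):
    """Return 1 (sensitive) if the neuron's pre-activation is positive, else 0."""
    w, b = model
    z = sum(wi * xi for wi, xi in zip(w, query)) + b
    return 1 if z > 0 else 0
```

The contrast the sketch is meant to show is structural: the K-nearest neighbor classifier keeps every training record and compares each query against all of them, while the trained neuron reduces the training set to a few weights and answers a query with one weighted sum.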

Acknowledgment

Project supported by the National Natural Science Foundation of China (No. 60973040, No. 61602057), the Outstanding Young Talent Project of Jilin Province (No. 2017052005954), and the Key Scientific and Technology Projects of Jilin Province (No. 20130206051GX).

Author information

Corresponding author

Correspondence to Binod Kumar Adhikari.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Adhikari, B.K., Zuo, W.L., Maharjan, R., Guo, L. (2018). Sensitive Data Detection Using NN and KNN from Big Data. In: Vaidya, J., Li, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2018. Lecture Notes in Computer Science, vol 11337. Springer, Cham. https://doi.org/10.1007/978-3-030-05063-4_49

  • DOI: https://doi.org/10.1007/978-3-030-05063-4_49

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05062-7

  • Online ISBN: 978-3-030-05063-4

  • eBook Packages: Computer Science, Computer Science (R0)
