A Comprehensive Review on Transforming Security and Privacy with NLP

  • Conference paper
Cryptology and Network Security with Machine Learning (ICCNSML 2023)

Abstract

Natural language processing (NLP) has broad implications for privacy and security. This article reviews the wide range of NLP applications for security and privacy protection. Through a systematic literature review, we identify the most important NLP approaches employed for security tasks such as phishing email detection, cyberthreat analysis, anomaly detection, and privacy-aware text processing. We also discuss the ethical issues that must be taken into account when deploying NLP, including defenses against adversarial attacks, responsible AI practices, and safeguards for personal data. In weighing the potential benefits and risks of NLP systems, the study emphasizes the importance of responsible and ethical use. Future directions for strengthening the use of NLP for privacy and security in the data-driven era are also discussed. This study will be of interest to researchers, practitioners, and policymakers who want to understand how NLP relates to security and privacy.

Author information

Corresponding author

Correspondence to Rachit Garg.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Garg, R., Gupta, A., Srivastava, A. (2024). A Comprehensive Review on Transforming Security and Privacy with NLP. In: Chaturvedi, A., Hasan, S.U., Roy, B.K., Tsaban, B. (eds) Cryptology and Network Security with Machine Learning. ICCNSML 2023. Lecture Notes in Networks and Systems, vol 918. Springer, Singapore. https://doi.org/10.1007/978-981-97-0641-9_10

  • DOI: https://doi.org/10.1007/978-981-97-0641-9_10

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-0640-2

  • Online ISBN: 978-981-97-0641-9

  • eBook Packages: Engineering, Engineering (R0)
