Efficient Approaches for the Discovery of Sensitive Information by Using Natural Language Processing Techniques

  • Conference paper
Artificial Intelligence Applications and Innovations (AIAI 2023)

Abstract

The volume of data and documents being created has grown sharply in recent years, and with it the amount of unstructured textual information. Sensitive data is usually extracted manually, following fixed rules, which costs additional time and raises the chance of errors and omissions. There is therefore a growing need to solve such problems automatically, using methods more intelligent than those applied previously. Automating this kind of task could significantly ease compliance with safety policies and imposed regulations. This work surveys the state of Named Entity Recognition (NER) by evaluating models and presenting their overall performance. It also describes the conflicts and factors that affect how a given entity is recognized.

Author information

Corresponding author

Correspondence to Doina Logofătu.

Copyright information

© 2023 IFIP International Federation for Information Processing

About this paper

Cite this paper

Dhani, K.S., Zundel, B., Logofătu, D. (2023). Efficient Approaches for the Discovery of Sensitive Information by Using Natural Language Processing Techniques. In: Maglogiannis, I., Iliadis, L., MacIntyre, J., Dominguez, M. (eds) Artificial Intelligence Applications and Innovations. AIAI 2023. IFIP Advances in Information and Communication Technology, vol 676. Springer, Cham. https://doi.org/10.1007/978-3-031-34107-6_32

  • DOI: https://doi.org/10.1007/978-3-031-34107-6_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-34106-9

  • Online ISBN: 978-3-031-34107-6

  • eBook Packages: Computer Science, Computer Science (R0)
