Efficient Approaches for the Discovery of Sensitive Information by Using Natural Language Processing Techniques

Dhani, Kushal Shree; Zundel, Benedikt; Logofătu, Doina

doi:10.1007/978-3-031-34107-6_32

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 676)

Included in the following conference series:

IFIP International Conference on Artificial Intelligence Applications and Innovations

Abstract

The volume of data and documents being created has grown sharply in recent years, and with it the amount of unstructured textual information. Sensitive data is usually extracted manually according to fixed rules, which costs additional time and raises the chance of errors and omissions. There is therefore a growing need to solve such problems automatically, using more intelligent methods than those applied previously. Automating this kind of task could significantly ease compliance with safety policies and imposed regulations. This work highlights the current state of Named Entity Recognition (NER) by evaluating models and reporting their overall performance. It also describes the conflicts and factors that affect how a given entity is recognized.

Author information

Authors and Affiliations

Frankfurt University of Applied Sciences, Frankfurt am Main, Germany
Kushal Shree Dhani, Benedikt Zundel & Doina Logofătu

Corresponding author

Correspondence to Doina Logofătu.

Editor information

Editors and Affiliations

University of Piraeus, Piraeus, Greece
Ilias Maglogiannis
Democritus University of Thrace, Xanthi, Greece
Lazaros Iliadis
University of Sunderland, Sunderland, UK
John MacIntyre
University of Leon, León, Spain
Manuel Dominguez

About this paper

Cite this paper

Dhani, K.S., Zundel, B., Logofătu, D. (2023). Efficient Approaches for the Discovery of Sensitive Information by Using Natural Language Processing Techniques. In: Maglogiannis, I., Iliadis, L., MacIntyre, J., Dominguez, M. (eds) Artificial Intelligence Applications and Innovations. AIAI 2023. IFIP Advances in Information and Communication Technology, vol 676. Springer, Cham. https://doi.org/10.1007/978-3-031-34107-6_32

DOI: https://doi.org/10.1007/978-3-031-34107-6_32
Published: 01 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34106-9
Online ISBN: 978-3-031-34107-6
eBook Packages: Computer Science, Computer Science (R0)
