Abstract
The volume of data and documents being created has grown sharply in recent years, and with it the amount of unstructured textual information. Sensitive data is usually extracted manually by following fixed rules, which costs additional time and raises the risk of errors and omissions. There is therefore a growing need to solve such problems automatically, using methods more intelligent than those applied so far. Automating this kind of task could significantly ease compliance with safety policies and imposed regulations. This work surveys the state of Named Entity Recognition (NER) by evaluating models and presenting their overall performance. It also describes the conflicts and factors that affect how a given entity is perceived.
Copyright information
© 2023 IFIP International Federation for Information Processing
About this paper
Cite this paper
Dhani, K.S., Zundel, B., Logofătu, D. (2023). Efficient Approaches for the Discovery of Sensitive Information by Using Natural Language Processing Techniques. In: Maglogiannis, I., Iliadis, L., MacIntyre, J., Dominguez, M. (eds) Artificial Intelligence Applications and Innovations. AIAI 2023. IFIP Advances in Information and Communication Technology, vol 676. Springer, Cham. https://doi.org/10.1007/978-3-031-34107-6_32
DOI: https://doi.org/10.1007/978-3-031-34107-6_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34106-9
Online ISBN: 978-3-031-34107-6
eBook Packages: Computer Science, Computer Science (R0)