NLP-Based Tools for Decoding the Language of Life

Chauhan, Aparna; Hasija, Yasha

doi:10.1007/978-981-16-3097-2_18

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1371)

Included in the following conference series:

International Conference on Emerging Trends and Technologies on Intelligent Systems

Abstract

As scientific knowledge expands worldwide, the demand for new technologies grows with it. This is evident from the number of papers being published and from discoveries that keep redefining what is considered possible. This paper describes one such technology, natural language processing (NLP), which enables computers not only to recognize natural (i.e., human) language but also to generate speech and text. Once machine learning was applied to large volumes of statistical data, deriving meaning from such data became easy, and statistical predictions could be made for datasets containing millions of data points. Analysis and prediction from textual data, however, remained a challenge. In 1950, Alan Turing's paper "Computing Machinery and Intelligence" introduced NLP to the computational field; NLP deals with converting human language into machine-readable form and generating written or spoken output. NLP can further be applied in bioinformatics to deduce the structure and function of a protein from its primary sequence, or to derive the end products of functional genes from their underlying sequences, because many researchers have found these sequences to resemble human language and call them the 'language of life'. Current studies apply the rules of NLP to the analysis of gene and protein sequences. This article explores the applications of natural language processing in bioinformatics and medical informatics.
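To illustrate what treating a biological sequence as text can mean in practice, here is a minimal sketch, not taken from the paper. It assumes one common convention in which overlapping k-mers play the role of "words"; the function names `kmer_tokens` and `kmer_counts`, the choice of k = 3, and the sample sequence are all hypothetical.

```python
from collections import Counter


def kmer_tokens(sequence: str, k: int = 3) -> list[str]:
    """Split a sequence into overlapping k-mers, used here as 'words'."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # A sequence shorter than k yields no tokens.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_counts(sequence: str, k: int = 3) -> Counter:
    """Count k-mer occurrences: a bag-of-words view of the sequence."""
    return Counter(kmer_tokens(sequence, k))


if __name__ == "__main__":
    # Hypothetical short peptide, for illustration only.
    print(kmer_tokens("MKTAYIAK", k=3))
    print(kmer_counts("MKTAYIAK", k=3).most_common(2))
```

Token lists or counts of this kind could then be passed to standard text-processing methods, although the specific tools surveyed in the paper may represent sequences differently.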

Author information

Authors and Affiliations

Department of Biotechnology, Delhi Technological University, Bawana Rd, Shahbad Daulatpur Village, Rohini, New Delhi, 110042, India
Aparna Chauhan & Yasha Hasija

Editor information

Editors and Affiliations

Education and Training Division, Centre for Development of Advanced Computing (CDAC), Noida, Uttar Pradesh, India
Arti Noor
Computer Science and Information Technology, Kwantlen Polytechnic University, Surrey, BC, Canada
Abhijit Sen
Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, Assam, India
Gaurav Trivedi

About this paper

Cite this paper

Chauhan, A., Hasija, Y. (2022). NLP-Based Tools for Decoding the Language of Life. In: Noor, A., Sen, A., Trivedi, G. (eds) Proceedings of Emerging Trends and Technologies on Intelligent Systems. ETTIS 2021. Advances in Intelligent Systems and Computing, vol 1371. Springer, Singapore. https://doi.org/10.1007/978-981-16-3097-2_18

DOI: https://doi.org/10.1007/978-981-16-3097-2_18
Published: 02 October 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-3096-5
Online ISBN: 978-981-16-3097-2
eBook Packages: Intelligent Technologies and Robotics (R0)
