
NLP-Based Tools for Decoding the Language of Life

  • Conference paper
Proceedings of Emerging Trends and Technologies on Intelligent Systems (ETTIS 2021)

Abstract

As scientific knowledge around the world expands, the demand for new technologies is also growing rapidly. This is evident from the number of papers being published and from new discoveries that keep redefining what is considered impossible. This paper discusses one such technology, natural language processing (NLP), which enables computers not only to recognize natural (i.e., human) language but also to generate speech and text. Once machine learning began to be used to analyze large amounts of (statistical) data, deriving meaning from data became much easier, and statistical predictions could be made from datasets containing millions of data points. However, analysis and prediction from textual data remained a challenge. Alan Turing's 1950 paper, "Computing Machinery and Intelligence", is often credited with introducing NLP into the computational field; NLP deals with converting human language into a machine-readable form and generating written or spoken output. NLP can also be applied in bioinformatics to deduce the structure and function of a protein from its primary amino acid sequence, or to derive the end products of functional genes from their base sequences, because many researchers have found these sequences to resemble human language and have called them the "language of life". Current studies apply the rules of NLP to the analysis of gene and protein sequences. This article explores the various applications of natural language processing in bioinformatics and medical informatics.
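Treating a gene or protein sequence as "language" usually starts by deciding what its "words" are. One common choice in sequence-as-language work is to slide a window of width k over the sequence and treat each overlapping k-mer as a token. The sketch below illustrates that idea under this assumption only; it is not the method of this paper or of any cited tool. The function names `kmer_tokens`, `kmer_counts` and `cosine_similarity`, the default k = 3, and the toy peptide sequences are all hypothetical, chosen for illustration.

```python
from collections import Counter


def kmer_tokens(sequence: str, k: int = 3) -> list[str]:
    """Split a biological sequence into overlapping k-mers ("words").

    A window of width k moved one residue at a time yields
    len(sequence) - k + 1 tokens; sequences shorter than k yield none.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_counts(sequence: str, k: int = 3) -> Counter:
    """Bag-of-k-mers representation: how often each k-mer occurs."""
    return Counter(kmer_tokens(sequence, k))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-k-mers count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical toy peptide sequences that differ only in the last residue.
    s1 = "MKTAYIAKQR"
    s2 = "MKTAYIAKQL"
    print(kmer_tokens(s1))
    # prints ['MKT', 'KTA', 'TAY', 'AYI', 'YIA', 'IAK', 'AKQ', 'KQR']
    print(round(cosine_similarity(kmer_counts(s1), kmer_counts(s2)), 3))
    # prints 0.875 (7 of 8 k-mers shared)
```

The same functions work unchanged on nucleotide strings. The choice of k is a trade-off: the possible vocabulary grows as 20^k for proteins and 4^k for DNA, so larger k captures more context but produces sparser count vectors.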



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Chauhan, A., Hasija, Y. (2022). NLP-Based Tools for Decoding the Language of Life. In: Noor, A., Sen, A., Trivedi, G. (eds) Proceedings of Emerging Trends and Technologies on Intelligent Systems. ETTIS 2021. Advances in Intelligent Systems and Computing, vol 1371. Springer, Singapore. https://doi.org/10.1007/978-981-16-3097-2_18
