Multi-perspective and Domain Specific Tagging of Chemical Documents

  • Conference paper
Data Science Analytics and Applications (DaSAA 2017)

Abstract

Text document search typically retrieves documents by exact keyword matching. In many domains, exact matching performs poorly because it ignores the morphemes, or internal structure, of words. The problem is especially significant in chemistry, where a user may search with a keyword that a document contains only as part of a chemical name. For example, the chemical name pentanone contains the ketone functional group, which can be found by a morphemic analysis based on chemical nomenclature: the prefix pent- denotes five carbon atoms, the infix -an- denotes single carbon-carbon bonds, and the suffix -one denotes a ketone. Each chemical name therefore encodes a great deal of information about the compound it names, and for efficient retrieval the chemical names in a document need to be tagged with all of their meaningful morphemes. A multi-perspective and domain-specific tagging system was designed based on the available chemical nomenclature, considering the type of bond, the number of carbon atoms, and the functional group of the chemical entity. The tagging system begins by extracting the chemical names in the document based on morphological and domain-specific features. Using these features and contextual knowledge, models were created by designing a second-order linear-chain conditional random field, and these serve as a baseline for the chemical entity extraction process. A morphemic (structural) analysis of the extracted named entities was then performed for the multi-perspective tagging system.
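The morphemic analysis described in the abstract can be illustrated with a minimal Python sketch that contrasts exact keyword matching with matching on morpheme tags. It assumes a small, hypothetical subset of IUPAC morphemes (chain-length prefixes meth- to dec-, the bond infixes -an-/-en-/-yn-, and a few functional-group suffixes) and handles only simple, unbranched, acyclic names. The names `analyse`, `exact_match`, `morphemic_match` and the tag strings are illustrative assumptions, not the authors' implementation or rule set.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical, deliberately tiny subset of IUPAC morphemes (not the paper's rule set).
CARBON_PREFIXES = {"meth": 1, "eth": 2, "prop": 3, "but": 4, "pent": 5,
                   "hex": 6, "hept": 7, "oct": 8, "non": 9, "dec": 10}
BOND_INFIXES = {"an": "single bond", "en": "double bond", "yn": "triple bond"}
GROUP_SUFFIXES = {"e": "hydrocarbon", "ol": "alcohol", "al": "aldehyde",
                  "one": "ketone", "amine": "amine"}
LOCANT = re.compile(r"-\d+(?:,\d+)*-")                 # e.g. "-2-" in "pentan-2-one"
TOKEN = re.compile(r"[a-z]+(?:-\d+(?:,\d+)*-[a-z]+)*")  # words, keeping inner locants


@dataclass(frozen=True)
class Morphemes:
    carbons: int
    bond: str
    group: str

    def tags(self) -> set[str]:
        """The perspectives a name is tagged with: chain length, bond type, group."""
        return {f"{self.carbons}-carbon chain", self.bond, self.group}


def analyse(name: str) -> Optional[Morphemes]:
    """Split a simple unbranched name into chain prefix + bond infix + suffix."""
    word = LOCANT.sub("", name.lower())  # "pentan-2-one" -> "pentanone"
    for prefix, carbons in CARBON_PREFIXES.items():
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for infix, bond in BOND_INFIXES.items():
            suffix = rest[len(infix):]
            if rest.startswith(infix) and suffix in GROUP_SUFFIXES:
                return Morphemes(carbons, bond, GROUP_SUFFIXES[suffix])
    return None  # not recognised by this toy grammar


def exact_match(query: str, docs: list[str]) -> list[int]:
    """Baseline: indices of documents containing the query as a whole token."""
    q = query.lower()
    return [i for i, d in enumerate(docs) if q in TOKEN.findall(d.lower())]


def morphemic_match(query: str, docs: list[str]) -> list[int]:
    """Also match documents whose chemical names carry the query as a morpheme tag."""
    q = query.lower()
    hits = []
    for i, doc in enumerate(docs):
        for token in TOKEN.findall(doc.lower()):
            m = analyse(token)
            if token == q or (m is not None and q in m.tags()):
                hits.append(i)
                break
    return hits


docs = [
    "The ketone was purified by distillation.",
    "Pentan-2-one was used as the solvent.",
    "Ethanol and but-2-ene were mixed.",
]
print(exact_match("ketone", docs))           # [0]
print(morphemic_match("ketone", docs))       # [0, 1]
print(morphemic_match("double bond", docs))  # [2]
```

Searching for "ketone" now also retrieves the document that mentions only pentan-2-one. A realistic system would need the full nomenclature (branching, multiple suffixes, rings, trivial names), which this sketch deliberately omits.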

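The entity-extraction step in the abstract relies on per-token morphological, domain-specific, and contextual features. The Python sketch below shows one plausible way to compute such features; the affix list `CHEMICAL_AFFIXES`, the feature names, and the context window `WINDOW = 2` are assumptions, since the abstract does not list the actual feature set. The resulting per-token dictionaries would be paired with sequence labels (for example a hypothetical B-CHEM/I-CHEM/O scheme) to train a sequence model such as the paper's second-order linear-chain CRF, which is not implemented here.

```python
import re

# Hypothetical domain lexicon of nomenclature affixes; the abstract does not
# list the features actually used by the authors.
CHEMICAL_AFFIXES = ("meth", "eth", "prop", "but", "pent", "hex", "ane", "ene",
                    "yne", "ol", "al", "one", "amine", "yl", "oxy")
WINDOW = 2  # assumed context window: two tokens on each side


def word_shape(token: str) -> str:
    """Collapse character runs into classes, e.g. 'Pentan-2-one' -> 'Xx-d-x'."""
    shape = re.sub(r"[A-Z]+", "X", token)
    shape = re.sub(r"[a-z]+", "x", shape)
    return re.sub(r"\d+", "d", shape)


def token_features(tokens: list[str], i: int) -> dict[str, object]:
    """Morphological, domain-specific and contextual features for tokens[i]."""
    tok = tokens[i]
    low = tok.lower()
    feats: dict[str, object] = {
        # Morphological features
        "lower": low,
        "prefix4": low[:4],
        "suffix3": low[-3:],
        "shape": word_shape(tok),
        "init_cap": tok[:1].isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "has_hyphen": "-" in tok,
        "has_bracket": any(c in "()[]" for c in tok),
        # Domain-specific feature: crude count of affix substrings in the token
        "affix_hits": sum(affix in low for affix in CHEMICAL_AFFIXES),
    }
    # Contextual features: neighbouring words, padded at sentence boundaries
    for offset in range(-WINDOW, WINDOW + 1):
        if offset == 0:
            continue
        j = i + offset
        feats[f"word[{offset:+d}]"] = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
    return feats


def sentence_features(tokens: list[str]) -> list[dict[str, object]]:
    """One feature dictionary per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


tokens = ["Pentan-2-one", "was", "used", "as", "the", "solvent", "."]
features = sentence_features(tokens)
print(features[0]["shape"], features[0]["affix_hits"], features[0]["word[+1]"])  # Xx-d-x 2 was
```

The context window is a feature-level choice and is independent of the CRF's order: a second-order CRF models the dependency of each label on the two preceding labels regardless of how many neighbouring words the features inspect.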
Author information

Correspondence to S. S. Deepika.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Deepika, S.S., Geetha, T.V., Sridhar, R. (2018). Multi-perspective and Domain Specific Tagging of Chemical Documents. In: R, S., Sharma, M. (eds) Data Science Analytics and Applications. DaSAA 2017. Communications in Computer and Information Science, vol 804. Springer, Singapore. https://doi.org/10.1007/978-981-10-8603-8_7

  • DOI: https://doi.org/10.1007/978-981-10-8603-8_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-8602-1

  • Online ISBN: 978-981-10-8603-8

  • eBook Packages: Computer Science, Computer Science (R0)
