Multi-perspective and Domain Specific Tagging of Chemical Documents

Deepika, S. S.; Geetha, T. V.; Sridhar, Rajeswari

doi:10.1007/978-981-10-8603-8_7

Part of the book series: Communications in Computer and Information Science (CCIS, volume 804)

Included in the following conference series:

International Conference on Data Science Analytics and Applications

Abstract

Text document search typically retrieves documents by exact keyword matching. Exact matching often performs poorly because it ignores the morphemes, or internal structure, of the words being searched. The problem is pronounced in chemistry, where a user may search with a keyword that appears in a document only as part of a chemical name. For example, the chemical name pentanone contains the ketone functional group, which can be identified through morphemic analysis guided by chemical nomenclature. Each chemical name carries a great deal of information about the compound it names. Hence, the chemical names in a document need to be tagged with all of their meaningful morphemes for search to perform well. A multi-perspective, domain-specific tagging system was designed based on the available chemical nomenclature, considering the type of bond, the number of carbon atoms, and the functional group of the chemical entity. The tagging system begins by extracting the chemical names in the document using morphological and domain-specific features. Based on these features and contextual knowledge, models were created using a linear-chain conditional random field of order two, and they serve as a baseline for the chemical entity extraction process. A morphemic, or structural, analysis of each extracted named entity was then performed for the multi-perspective tagging system.
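The morphemic analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' system: the lookup tables, the function name tag_chemical_name, and the tag keys are hypothetical, and the rules cover only simple unbranched names built from basic IUPAC stems (no locants such as "2-", no substituents). It splits a name into a carbon-count prefix, a bond-type infix, and a functional-group suffix, which are the three perspectives the abstract lists.

```python
import re

# Hypothetical lookup tables based on basic IUPAC stems; the paper's actual
# lexicon and rules are not given in the abstract.
CARBON_PREFIXES = {
    "meth": 1, "eth": 2, "prop": 3, "but": 4, "pent": 5,
    "hex": 6, "hept": 7, "oct": 8, "non": 9, "dec": 10,
}
BOND_INFIXES = {"an": "single", "en": "double", "yn": "triple"}
FUNCTIONAL_SUFFIXES = {
    "e": "hydrocarbon",
    "ol": "alcohol",
    "al": "aldehyde",
    "one": "ketone",
    "amine": "amine",
    "oic acid": "carboxylic acid",
}


def _alternation(keys):
    """Build a regex alternation, longest key first, with keys escaped."""
    ordered = sorted(keys, key=len, reverse=True)
    return "|".join(re.escape(k) for k in ordered)


# prefix (carbon count) + infix (bond type) + suffix (functional group)
NAME_RE = re.compile(
    rf"({_alternation(CARBON_PREFIXES)})"
    rf"({_alternation(BOND_INFIXES)})"
    rf"({_alternation(FUNCTIONAL_SUFFIXES)})"
)


def tag_chemical_name(name):
    """Return a dict of morpheme-derived tags, or None if the name
    does not fit the simple prefix/infix/suffix pattern."""
    normalized = name.strip().lower()
    match = NAME_RE.fullmatch(normalized)
    if match is None:
        return None
    prefix, infix, suffix = match.groups()
    return {
        "name": normalized,
        "carbon_atoms": CARBON_PREFIXES[prefix],
        "bond_type": BOND_INFIXES[infix],
        "functional_group": FUNCTIONAL_SUFFIXES[suffix],
    }


if __name__ == "__main__":
    for candidate in ["pentanone", "propene", "ethanol", "water"]:
        print(candidate, tag_chemical_name(candidate))
```

With tags like these stored alongside each extracted name, a query for "ketone" could match a document that mentions only pentanone, which is the search scenario the abstract motivates. A real system would need the full nomenclature rules and a trained extractor to find the names first.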

Author information

Authors and Affiliations

Department of Computer Science and Engineering, College of Engineering, Anna University, Chennai, India
S. S. Deepika, T. V. Geetha & Rajeswari Sridhar

Corresponding author

Correspondence to S. S. Deepika.

Editor information

Editors and Affiliations

Crescent University, Chennai, India
Shriram R
Birmingham City University, Birmingham, United Kingdom
Mak Sharma

About this paper

Cite this paper

Deepika, S.S., Geetha, T.V., Sridhar, R. (2018). Multi-perspective and Domain Specific Tagging of Chemical Documents. In: R, S., Sharma, M. (eds) Data Science Analytics and Applications. DaSAA 2017. Communications in Computer and Information Science, vol 804. Springer, Singapore. https://doi.org/10.1007/978-981-10-8603-8_7

DOI: https://doi.org/10.1007/978-981-10-8603-8_7
Published: 24 February 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8602-1
Online ISBN: 978-981-10-8603-8
eBook Packages: Computer Science, Computer Science (R0)
