Abstract
Text document search typically retrieves documents by exact keyword matching. Exact matching does not perform well in every domain, because it ignores the morphemes, or internal structure, of the words being searched. The problem is significant in chemistry, where a user may search with a keyword that appears in a document only as part of a chemical name. For example, the chemical name pentanone indicates a ketone functional group, which can be found by a morphemic analysis guided by chemical nomenclature. Each chemical name carries a lot of information about the compound it names. Hence, the chemical names in a document need to be tagged with all their meaningful morphemes for search to perform well. A multi-perspective, domain-specific tagging system was designed based on the available chemical nomenclature, considering the type of bond, the number of carbon atoms, and the functional group of the chemical entity. The tagging system begins by extracting the chemical names in the document using morphological and domain-specific features. Based on these features and contextual knowledge, models were created by designing a linear-chain conditional random field of order two, and they serve as a baseline for the chemical entity extraction process. A morphemic, or structural, analysis of each extracted named entity was then performed for the multi-perspective tagging system.
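The morphemic analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the lookup tables, the regular expression, and the function name `tag_chemical_name` are assumptions made for illustration, and they cover only a small subset of simple, unbranched IUPAC names. The sketch splits a name into a carbon-count root, a bond-type infix, and a functional-group suffix, which are the three perspectives the abstract names.

```python
import re

# Hypothetical lookup tables covering a small subset of IUPAC nomenclature.
CARBON_ROOTS = {
    "meth": 1, "eth": 2, "prop": 3, "but": 4, "pent": 5,
    "hex": 6, "hept": 7, "oct": 8, "non": 9, "dec": 10,
}
BOND_INFIXES = {"an": "single", "en": "double", "yn": "triple"}
GROUP_SUFFIXES = {
    "e": "hydrocarbon",
    "ol": "alcohol",
    "al": "aldehyde",
    "one": "ketone",
    "oic acid": "carboxylic acid",
}

# root + bond infix + suffix, e.g. pent + an + one.
NAME_PATTERN = re.compile(
    r"(?P<root>meth|eth|prop|but|pent|hex|hept|oct|non|dec)"
    r"(?P<bond>an|en|yn)"
    r"(?P<suffix>oic acid|one|ol|al|e)"
)


def tag_chemical_name(name):
    """Return morpheme-based tags for a simple chemical name, or None."""
    # Drop locants such as the "-2-" in "pentan-2-one" before matching.
    normalized = re.sub(r"[\d,\-]", "", name.lower().strip())
    match = NAME_PATTERN.fullmatch(normalized)
    if match is None:
        return None
    return {
        "carbon_atoms": CARBON_ROOTS[match.group("root")],
        "bond_type": BOND_INFIXES[match.group("bond")],
        "functional_group": GROUP_SUFFIXES[match.group("suffix")],
    }


def matches_query(name, keyword):
    """True if the keyword equals one of the name's morpheme tags."""
    tags = tag_chemical_name(name)
    return tags is not None and keyword.lower() in {str(v) for v in tags.values()}
```

With tags of this kind, a search for the keyword "ketone" can retrieve a document that mentions only "pentanone", which exact keyword matching would miss. A real tagger would also need to handle substituents, multiple functional groups, and trivial names, none of which this sketch attempts.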
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Deepika, S.S., Geetha, T.V., Sridhar, R. (2018). Multi-perspective and Domain Specific Tagging of Chemical Documents. In: R, S., Sharma, M. (eds) Data Science Analytics and Applications. DaSAA 2017. Communications in Computer and Information Science, vol 804. Springer, Singapore. https://doi.org/10.1007/978-981-10-8603-8_7
DOI: https://doi.org/10.1007/978-981-10-8603-8_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8602-1
Online ISBN: 978-981-10-8603-8
eBook Packages: Computer Science (R0)