Abstract
Statistical and machine learning approaches to named entity recognition (NER) have risen to prominence in natural language processing. Some named entities, biomedical software in particular, are difficult to identify. One direction suggested by previous researchers is to use contextual semantic information to assist with this task. We introduce an ontology-driven method that uses both information extraction and inherent features of ontologies (e.g., embedded semantic relationships and links to entities) to automatically identify familiar and unfamiliar software names. We evaluated this method on a set of biomedical research abstracts containing software entities. Our proposed approach could be used to further augment other named entity recognition methods.
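To make the distinction between familiar and unfamiliar software names more concrete, here is a minimal Python sketch. It is not the authors' method. A hypothetical toy ontology supplies labelled instances, which are recognised directly as familiar names. It also supplies cue words that stand in for semantic relationships, and these are used to attach an unseen capitalised token to a class as an unfamiliar name. The class names, labels, cue words, stopword list and the `find_software` function are all invented for illustration.

```python
import re

# Hypothetical toy ontology (NOT the paper's): each class lists labelled
# instances and cue words standing in for its semantic relationships.
TOY_ONTOLOGY = {
    "SequenceAlignmentSoftware": {"instances": {"BLAST", "ClustalW"}, "cues": {"aligner"}},
    "StatisticalSoftware": {"instances": {"R", "SPSS"}, "cues": {"package"}},
}
# Hypothetical cue words attached to the root "Software" class.
GENERIC_CUES = {"software", "tool", "program"}
# Capitalised function words that should never be proposed as names.
STOPWORDS = {"The", "A", "An", "This", "These", "Our", "We"}

TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9+-]*")


def find_software(text):
    """Return (name, ontology_class, kind) tuples; kind is familiar/unfamiliar."""
    tokens = TOKEN.findall(text)
    mentions = []
    for i, tok in enumerate(tokens):
        # Familiar name: the token is a labelled instance of some class.
        for cls, entry in TOY_ONTOLOGY.items():
            if tok in entry["instances"]:
                mentions.append((tok, cls, "familiar"))
                break
        else:
            # Unfamiliar name: a capitalised, non-stopword token directly
            # followed by a cue word; the cue picks the class it attaches to.
            if tok in STOPWORDS or not tok[0].isupper() or i + 1 >= len(tokens):
                continue
            cue = tokens[i + 1].lower()
            for cls, entry in TOY_ONTOLOGY.items():
                if cue in entry["cues"]:
                    mentions.append((tok, cls, "unfamiliar"))
                    break
            else:
                if cue in GENERIC_CUES:
                    mentions.append((tok, "Software", "unfamiliar"))
    return mentions
```

On the sentence "We aligned reads with BLAST and ran the MeKE tool and the R package." the sketch reports BLAST and R as familiar names and MeKE as an unfamiliar name attached to the root `Software` class. A one-token lookahead is deliberately crude: it misses multi-word names and cues that are not adjacent to the name. A fuller approach would draw on richer extracted relations.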
Notes
- 7. Hirst and St-Onge (1998), Leacock and Chodorow (1998), Banerjee and Pedersen (2002), Wu and Palmer (1994), Resnik (1995), Jiang and Conrath (1997), Lin (1998), and ws4J’s PATH measure.
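Note 7 lists several WordNet-based semantic similarity measures. As a rough illustration of what three of them compute (PATH, Wu and Palmer, and Leacock and Chodorow), here is a minimal Python sketch over a hypothetical five-concept is-a tree. The tree, the single-parent assumption and the root-depth-1 convention are choices made for illustration only. The sketch does not reproduce ws4J's implementation or use WordNet data.

```python
import math

# Hypothetical toy is-a hierarchy (child -> parent), standing in for WordNet.
# Single-parent tree with one root ("entity"), so an LCS always exists.
PARENT = {
    "software": "artifact",
    "program": "software",
    "application": "program",
    "database": "artifact",
    "artifact": "entity",
}


def ancestors(node):
    """Path from node up to the root, node first."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(node):
    # Convention assumed here: the root has depth 1.
    return len(ancestors(node))


def lcs(a, b):
    """Lowest common subsumer: first ancestor of a that also subsumes b."""
    b_anc = set(ancestors(b))
    for node in ancestors(a):
        if node in b_anc:
            return node
    return None


def path_edges(a, b):
    # In a tree the shortest path between a and b passes through their LCS.
    c = lcs(a, b)
    return (depth(a) - depth(c)) + (depth(b) - depth(c))


def path_sim(a, b):
    # PATH: inverse of the shortest path length counted in nodes (edges + 1).
    return 1.0 / (path_edges(a, b) + 1)


def wup_sim(a, b):
    # Wu and Palmer: 2 * depth(LCS) / (depth(a) + depth(b)).
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))


def lch_sim(a, b, max_depth):
    # Leacock and Chodorow: -log(path length in nodes / (2 * max taxonomy depth)).
    return -math.log((path_edges(a, b) + 1) / (2.0 * max_depth))
```

For example, `application` and `database` meet at `artifact` four edges apart. This gives a PATH score of 0.2 and a Wu and Palmer score of 0.5. With a maximum depth of 5, their Leacock and Chodorow score is log 2.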
Acknowledgements
Research was partially supported by the National Library of Medicine of the National Institutes of Health under Award Numbers R01LM011829 and R01AI130460, by the National Institutes of Health (NIH) through the NIH Big Data to Knowledge program, Grant 1U24AI117966-01, and by the Cancer Prevention Research Institute of Texas (CPRIT) Training Grant #RP160015.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Amith, M., Zhang, Y., Xu, H., Tao, C. (2017). Knowledge-Based Approach for Named Entity Recognition in Biomedical Literature: A Use Case in Biomedical Software Identification. In: Benferhat, S., Tabia, K., Ali, M. (eds) Advances in Artificial Intelligence: From Theory to Practice. IEA/AIE 2017. Lecture Notes in Computer Science, vol 10351. Springer, Cham. https://doi.org/10.1007/978-3-319-60045-1_40
DOI: https://doi.org/10.1007/978-3-319-60045-1_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60044-4
Online ISBN: 978-3-319-60045-1
eBook Packages: Computer Science, Computer Science (R0)