
Knowledge-Based Approach for Named Entity Recognition in Biomedical Literature: A Use Case in Biomedical Software Identification

  • Conference paper
Advances in Artificial Intelligence: From Theory to Practice (IEA/AIE 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10351)

Abstract

Statistical and machine learning approaches to named entity recognition have risen to prominence in natural language processing. Certain named entities, specifically biomedical software names, remain challenging to identify. One direction, alluded to by previous researchers, is to use contextual semantic information to assist in this task. We introduce an ontology-driven method that draws on both information extraction and inherited features of ontologies (e.g., embedded semantic relationships and links to entities) to automatically identify both familiar and unfamiliar software names. We evaluated this method on a set of biomedical research abstracts containing software entities. The proposed approach could be used to augment other named entity recognition methods.
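To make the split between familiar and unfamiliar names concrete, here is a minimal, hypothetical sketch of how ontology-derived knowledge might drive software-name tagging. Names already present as labels in an ontology are matched directly ("known"), while an unseen capitalized token is flagged when it is immediately followed by a cue word of the kind one could harvest from ontology class labels ("context"). The `KNOWN_SOFTWARE` lexicon, the `CUE_WORDS` and `STOPWORDS` sets, the `tag_software` function and the name `FooMapper` are all illustrative assumptions; this is not the authors' implementation and not content taken from the Software Ontology (SWO).

```python
import re

# Hypothetical ontology-derived lexicon: lower-cased software labels mapped to
# the class each is asserted under. Illustrative only, not taken from SWO.
KNOWN_SOFTWARE = {
    "blast": "sequence alignment software",
    "bowtie": "read alignment software",
    "samtools": "sequence data processing software",
}

# Hypothetical cue words of the kind that could be harvested from ontology
# class labels (e.g. "software", "tool") to spot names the lexicon lacks.
CUE_WORDS = {"software", "tool", "package", "program", "pipeline", "toolkit"}

# Function words that are often capitalized but are never treated as names.
STOPWORDS = {"the", "a", "an", "this", "our", "we", "these"}

TOKEN_RE = re.compile(r"[A-Za-z][A-Za-z0-9+\-]*")


def tag_software(sentence):
    """Return (token, reason) pairs for candidate software mentions.

    reason is "known" for an exact lexicon match (familiar name) and
    "context" for a capitalized token directly followed by a cue word
    (unfamiliar name).
    """
    tokens = TOKEN_RE.findall(sentence)
    candidates = []
    for i, token in enumerate(tokens):
        lowered = token.lower()
        if lowered in KNOWN_SOFTWARE:
            candidates.append((token, "known"))
        elif (
            token[0].isupper()
            and lowered not in STOPWORDS
            and i + 1 < len(tokens)
            and tokens[i + 1].lower() in CUE_WORDS
        ):
            candidates.append((token, "context"))
    return candidates


if __name__ == "__main__":
    text = "Reads were aligned with Bowtie and processed with the FooMapper package."
    print(tag_software(text))
    # [('Bowtie', 'known'), ('FooMapper', 'context')]
```

A realistic system would load labels and relations from an OWL ontology rather than a hard-coded dictionary, and would have to handle multi-word names and ambiguous labels; the sketch only illustrates combining ontology lookup with contextual cues.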


Notes

  1. https://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/.

  2. http://www.w3.org/TR/owl2-overview/.

  3. https://bd2k.nih.gov/.

  4. http://bioportal.bioontology.org/ontologies/SWO.

  5. http://owlapi.sourceforge.net/.

  6. https://code.google.com/p/ws4j/.

  7. Hirst and St-Onge (1998), Leacock and Chodorow (1998), Banerjee and Pedersen (2002), Wu and Palmer (1994), Resnik (1995), Jiang and Conrath (1997), Lin (1998), and ws4j's PATH.

  8. http://commons.apache.org/.

  9. https://github.com/google/guava.

  10. http://eclipse.org.
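Note 7 lists WordNet-based semantic relatedness measures available in ws4j. To illustrate how two of them behave, the sketch below computes Wu and Palmer similarity, 2 · depth(lcs) / (depth(a) + depth(b)), where lcs is the lowest common subsumer, and a path-based similarity, 1 / (1 + number of edges on the shortest path). The taxonomy and concept names are hypothetical, the hierarchy is assumed to be single-inheritance (WordNet allows multiple hypernyms), and depth is counted in nodes with the root at depth 1; ws4j's own implementations may differ in such conventions, and none of these functions are ws4j's API.

```python
# Hypothetical single-inheritance is-a taxonomy (child -> parent).
PARENT = {
    "software": "entity",
    "application": "software",
    "library": "software",
    "aligner": "application",
    "assembler": "application",
}


def ancestors(concept):
    """Return the chain [concept, parent, ..., root]."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(concept))


def lowest_common_subsumer(a, b):
    """Deepest concept that is an ancestor of (or equal to) both a and b."""
    b_chain = set(ancestors(b))
    # ancestors(a) runs from a upward, so the first shared node is the deepest.
    for node in ancestors(a):
        if node in b_chain:
            return node
    raise ValueError("concepts share no common ancestor")


def wu_palmer(a, b):
    """2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def path_similarity(a, b):
    """1 / (1 + number of edges on the path through the lcs)."""
    lcs = lowest_common_subsumer(a, b)
    edges = (depth(a) - depth(lcs)) + (depth(b) - depth(lcs))
    return 1 / (1 + edges)


if __name__ == "__main__":
    print(wu_palmer("aligner", "assembler"))        # 0.75
    print(path_similarity("aligner", "library"))    # 0.25
```

Both measures reward concepts that share a deep common ancestor: siblings under `application` score higher than concepts that meet only at `software`.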



Acknowledgements

Research was partially supported by the National Library of Medicine of the National Institutes of Health under Award Numbers R01LM011829 and R01AI130460, by the National Institutes of Health (NIH) through the NIH Big Data to Knowledge program, Grant 1U24AI117966-01, and by the Cancer Prevention Research Institute of Texas (CPRIT) Training Grant #RP160015.

Author information

Corresponding author

Correspondence to Cui Tao.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Amith, M., Zhang, Y., Xu, H., Tao, C. (2017). Knowledge-Based Approach for Named Entity Recognition in Biomedical Literature: A Use Case in Biomedical Software Identification. In: Benferhat, S., Tabia, K., Ali, M. (eds) Advances in Artificial Intelligence: From Theory to Practice. IEA/AIE 2017. Lecture Notes in Computer Science, vol 10351. Springer, Cham. https://doi.org/10.1007/978-3-319-60045-1_40


  • DOI: https://doi.org/10.1007/978-3-319-60045-1_40


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60044-4

  • Online ISBN: 978-3-319-60045-1

  • eBook Packages: Computer Science, Computer Science (R0)
