Ontology Design for Biomedical Text Mining

  • Chapter
Semantic Web

Abstract

Text mining in biology and biomedicine requires a large amount of domain-specific knowledge. Publicly accessible resources hold much of the needed information, yet integrating them into natural language processing (NLP) systems is hampered by many obstacles, most notably the semantic disconnectedness of the various resources and components. Ontologies can provide the framework needed for consistent semantic integration, while also bringing formal reasoning capabilities to NLP.

In this chapter, we address four important aspects of integrating ontologies and NLP: (i) an analysis of the different integration alternatives and their respective advantages; (ii) the design requirements for an ontology supporting NLP tasks; (iii) the creation and initialization of an ontology using publicly available tools and databases; and (iv) the connection of common NLP tasks with an ontology, including technical aspects of ontology deployment in a text mining framework. A concrete application example, text mining of enzyme mutations, is provided to motivate and illustrate these points.

Copyright information

© 2007 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Witte, R., Kappler, T., Baker, C.J.O. (2007). Ontology Design for Biomedical Text Mining. In: Baker, C.J.O., Cheung, KH. (eds) Semantic Web. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-48438-9_14
