
Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions

  • Conference paper
Data Integration in the Life Sciences (DILS 2006)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4075)


Abstract

Linking the biomedical literature to other data resources is notoriously difficult and requires text mining. Text mining aims to automatically extract facts from the literature. Since authors write in natural language, text mining is a major natural language processing challenge that is far from being solved. We propose an alternative: if authors and editors summarize the main facts in a controlled natural language, text mining will become easier and more powerful. To demonstrate this approach, we use the language Attempto Controlled English (ACE). We define a simple model to capture the main aspects of protein interactions. To evaluate our approach, we collected a dataset of 459 paragraph headings about protein interactions from the literature. Of these headings, 56% can be represented exactly in ACE and another 23% partially. These results indicate that our approach is feasible.
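To make the central idea concrete: a sentence written against a fixed, controlled pattern can be parsed deterministically instead of heuristically. The minimal Python sketch below assumes a hypothetical three-part pattern (protein name, relation phrase, protein name). The relation list, the protein-name regex, and the names `Interaction`, `parse`, and `render` are illustrative inventions; they are not the paper's interaction model and not the ACE grammar or its parser.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical controlled pattern: "<Protein> <relation> <Protein>."
# This is NOT the ACE grammar; it only mimics the idea of a fixed,
# unambiguous sentence shape that a machine can read without guessing.
RELATIONS = ("interacts with", "binds to", "inhibits", "activates")

PATTERN = re.compile(
    r"^(?P<subj>[A-Z][A-Za-z0-9-]*) "
    r"(?P<rel>" + "|".join(re.escape(r) for r in RELATIONS) + r") "
    r"(?P<obj>[A-Z][A-Za-z0-9-]*)\.$"
)


@dataclass(frozen=True)
class Interaction:
    """One extracted fact: subject protein, relation phrase, object protein."""
    subject: str
    relation: str
    obj: str


def parse(sentence: str) -> Optional[Interaction]:
    """Return the fact if the sentence fits the pattern exactly, else None."""
    m = PATTERN.match(sentence.strip())
    if m is None:
        return None  # outside the controlled fragment: no guessing
    return Interaction(m["subj"], m["rel"], m["obj"])


def render(fact: Interaction) -> str:
    """Turn a fact back into its single canonical controlled sentence."""
    return f"{fact.subject} {fact.relation} {fact.obj}."


if __name__ == "__main__":
    fact = parse("ProtA interacts with ProtB.")
    print(fact)
    print(render(fact))
    print(parse("ProtA strongly interacts with ProtB."))
```

A sentence outside the pattern, such as one with an inserted adverb, yields `None` rather than a best guess, and every accepted sentence maps to exactly one fact that renders back to the same text. A real controlled language such as ACE covers far more constructions than this toy pattern, but the same principle of rule-based, unambiguous interpretation is what makes the extraction step easier.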




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kuhn, T., Royer, L., Fuchs, N.E., Schröder, M. (2006). Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions. In: Leser, U., Naumann, F., Eckman, B. (eds) Data Integration in the Life Sciences. DILS 2006. Lecture Notes in Computer Science, vol 4075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11799511_7


  • DOI: https://doi.org/10.1007/11799511_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-36593-8

  • Online ISBN: 978-3-540-36595-2

  • eBook Packages: Computer Science (R0)
