Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions

  • Tobias Kuhn
  • Loïc Royer
  • Norbert E. Fuchs
  • Michael Schröder
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4075)


Linking the biomedical literature to other data resources is notoriously difficult and requires text mining, which aims to extract facts from the literature automatically. Because authors write in natural language, text mining poses a major natural language processing challenge that is far from solved. We propose an alternative: if authors and editors summarize the main facts in a controlled natural language, text mining becomes easier and more powerful. To demonstrate this approach, we use Attempto Controlled English (ACE) and define a simple model that captures the main aspects of protein interactions. To evaluate the approach, we collected a dataset of 459 paragraph headings about protein interactions from the literature. Of these headings, 56% can be represented exactly in ACE and another 23% partially. These results indicate that the approach is feasible.
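
The central idea, that a restricted sentence form turns fact extraction into simple pattern matching, can be illustrated with a minimal sketch. It is neither ACE nor the authors' model: the sentence pattern, the verb list, the placeholder protein names, and the names `Fact` and `parse_fact` are all assumptions made up for illustration.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical toy pattern, NOT the ACE grammar: "<Protein> <verb> <Protein>."
VERBS = ("interacts with", "binds", "activates", "inhibits")
PATTERN = re.compile(
    r"^(?P<subject>[A-Za-z0-9-]+) "
    r"(?P<verb>" + "|".join(VERBS) + r") "
    r"(?P<object>[A-Za-z0-9-]+)\.$"
)


class Fact(NamedTuple):
    """One extracted interaction: subject, relation, object."""
    subject: str
    relation: str
    object: str


def parse_fact(sentence: str) -> Optional[Fact]:
    """Return a Fact if the sentence fits the toy pattern, else None."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    return Fact(match.group("subject"), match.group("verb"), match.group("object"))


# A sentence in the restricted form is parsed directly.
print(parse_fact("ProtA interacts with ProtB."))
# Free-form prose does not fit the pattern and is rejected.
print(parse_fact("We observed that ProtA may associate with ProtB in vivo."))
```

A sentence outside the restricted form yields `None` instead of a guess, which is the trade-off a controlled language makes: less expressive freedom for the author in exchange for unambiguous extraction.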


Gene Ontology, Protein Interaction, Natural Language, Formal Language, Text Mining




Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Tobias Kuhn (1, 2)
  • Loïc Royer (1)
  • Norbert E. Fuchs (2)
  • Michael Schröder (1)

  1. TU Dresden, Biotechnological Center, Germany
  2. Department of Informatics, University of Zurich, Switzerland
