Grammatical Inference in Practice: A Case Study in the Biomedical Domain

  • Sophia Katrenko
  • Pieter Adriaans
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4201)


In this paper we discuss an approach to named entity recognition (NER) based on grammatical inference (GI). Previous GI approaches have aimed at constructing a grammar underlying a given text source. It has been noted that the rules produced by GI can also be interpreted semantically [16]: a non-terminal describes interchangeable elements that are instances of the same concept. This observation leads to the hypothesis that GI might be useful for finding concept instances in text. Furthermore, it should also be possible to discover relations between concepts or, more precisely, the ways in which such relations are expressed linguistically.

In this paper, we propose a general framework for using GI for named entity recognition and discuss several possible approaches. In addition, we demonstrate that these methods work successfully on biomedical data using an existing GI tool.
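The observation that a non-terminal groups interchangeable elements can be illustrated with a minimal sketch. It is not the authors' method and not the GI tool used in the paper: it only groups single tokens that occur between the same left and right neighbours, a much simplified form of the substitutability idea. The function name `substitution_classes` and the toy sentences are invented for illustration.

```python
from collections import defaultdict


def substitution_classes(sentences):
    """Group tokens that occur in an identical (left, right) context.

    Hypothetical helper: a simplified stand-in for the equivalence
    classes (non-terminals) that a GI algorithm would induce.
    """
    by_context = defaultdict(set)
    for sentence in sentences:
        # Pad with boundary markers so the first and last tokens have a context.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            by_context[(tokens[i - 1], tokens[i + 1])].add(tokens[i])
    # Keep only contexts that are shared by at least two different tokens.
    return {ctx: sorted(toks) for ctx, toks in by_context.items() if len(toks) > 1}


# Toy sentences, invented for this example (not taken from the paper's data).
corpus = [
    "IL-2 activates Jurkat cells",
    "IL-4 activates Jurkat cells",
    "IL-2 inhibits Jurkat cells",
]
classes = substitution_classes(corpus)
for context, members in classes.items():
    print(context, members)
```

On this toy input, one class collects the interchangeable entity names (candidate instances of one concept) and another collects the verbs that link them to the rest of the sentence (candidate expressions of a relation). A real GI system generalises over longer contexts and whole phrases rather than single tokens.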


Equivalence Class; Jurkat Cell; Name Entity Recog; Entity Recognition; Semantic Class
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Adriaans, P., van Zaanen, M.: Computational Grammar Induction for Linguists. Grammars 7, 57–68 (2004)
  2. Craven, M., Kumlien, J.: Constructing Biological Knowledge Bases by Extracting Information from Text Sources. In: Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology (ISMB 1999) (1999)
  3. Dietterich, T.G.: Ensemble Methods in Machine Learning. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 1–15. Springer, Heidelberg (2000)
  4. Freitag, D.: Using Grammatical Inference to Improve Precision in Information Extraction. In: Workshop on Grammatical Inference, Automata Induction, and Language Acquisition (ICML 1997), Nashville (1997)
  5. Hachey, B., Grover, C., et al.: Use of Ontologies for Cross-lingual Information Management in the Web. In: Proceedings of the Ontologies and Information Extraction International Workshop (EUROLAN 2003), Bucharest, Romania (July 28 - August 8, 2003)
  6. Hahn, U., Romacker, M.: An Integrated Model of Semantic and Conceptual Interpretation from Dependency Structures. In: Proceedings of the 18th Conference on Computational Linguistics, Saarbrücken, Germany, pp. 271–277 (2000)
  7. Hearst, M.A.: Automatic Acquisition of Hyponyms from Large Text Corpora. In: Proceedings of the 14th International Conference on Computational Linguistics, Nantes, France (1992)
  8. Karampatziakis, N., Paliouras, G., Pierrakos, D., Stamatopoulos, P.: Navigation pattern discovery using grammatical inference. In: Paliouras, G., Sakakibara, Y. (eds.) ICGI 2004. LNCS (LNAI), vol. 3264, pp. 187–198. Springer, Heidelberg (2004)
  9. Katrenko, S., Adriaans, P.W.: Learning Relations from Biomedical Corpora Using Dependency Tree Levels. In: Benelearn 2006 (2006)
  10. Kim, J.-D., et al.: Introduction to the Bio-Entity Recognition Task at JNLPBA. In: JNLPBA 2004 (2004)
  11. Kunik, V., Solan, Z., Edelman, S., Ruppin, E., Horn, D.: Motif Extraction and Protein Classification. In: CSB (2005)
  12. Pradhan, S., Hacioglu, K., Ward, W., Martin, J.H., Jurafsky, D.: Semantic Role Chunking Combining Complementary Syntactic Views. In: Proceedings of the 9th Conference on Computational Natural Language Learning (CoNLL 2005), Ann Arbor, MI (2005)
  13. Reinberger, M.-L., Spyns, P., Pretorius, A.J., Daelemans, W.: Automatic initiation of an ontology. In: Meersman, R., Tari, Z. (eds.) OTM 2004. LNCS, vol. 3290, pp. 600–617. Springer, Heidelberg (2004)
  14. Roberts, A., Atwell, E.: The Use of Corpora for Automatic Evaluation of Grammar Inference Systems. In: Proceedings of the Corpus Linguistics 2003 Conference (2003)
  15. Sigletos, G., Paliouras, G., Spyropoulos, C.D., Hatzopoulos, M.: Voting and Stacked Generalization. In: JMLR, pp. 1751–1782 (2005)
  16. Solan, Z., Ruppin, E., Horn, D., Edelman, S.: Automatic acquisition and efficient representation of syntactic structures. In: NIPS (2002)
  17. Thelen, M., Riloff, E.: A Bootstrapping Method for Learning Semantic Lexicons using Extraction Pattern Contexts. In: Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP) (2002)
  18. Valarakos, A.G., Paliouras, G., Karkaletsis, V., Vouros, G.A.: Enhancing Ontological Knowledge Through Ontology Population and Enrichment. In: Motta, E., Shadbolt, N.R., Stutt, A., Gibbins, N. (eds.) EKAW 2004. LNCS (LNAI), vol. 3257, pp. 144–156. Springer, Heidelberg (2004)
  19. van Zaanen, M., Adriaans, P.: Alignment-Based Learning versus EMILE: A Comparison. In: Proceedings of the Belgian-Dutch Conference on Artificial Intelligence (BNAIC), Amsterdam, The Netherlands, pp. 315–322 (2001)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Sophia Katrenko (1)
  • Pieter Adriaans (1)
  1. Human-Computer Studies Lab, University of Amsterdam
