Large-Scale Knowledge Acquisition from Botanical Texts

  • François Role
  • Milagros Fernandez Gavilanes
  • Éric Villemonte de la Clergerie
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4592)


Free-text botanical descriptions contained in printed floras can provide a wealth of valuable scientific information. Despite this richness, these texts have seldom been analyzed on a large scale using NLP techniques. To fill this gap, we describe how we extracted a set of terminological resources by parsing a large corpus of botanical texts. We present the tools and techniques used, as well as the rationale for favoring a deep parsing approach coupled with error mining methods over a simple pattern-matching approach.


Keywords: Domain Ontology; Syntactic Context; Description Section; Linguistic Marker; Subcategorization Frame





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • François Role: L3i, Université de La Rochelle, France
  • Milagros Fernandez Gavilanes: University of Vigo, Spain
  • Éric Villemonte de la Clergerie: INRIA Rocquencourt, France
