Populating a Database from Parallel Texts Using Ontology-Based Information Extraction

  • M. M. Wood
  • S. J. Lydon
  • V. Tablan
  • D. Maynard
  • H. Cunningham
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3136)


Legacy data in many mature descriptive sciences is distributed across multiple text descriptions. The challenge is both to extract this data and to correlate it once extracted. The MultiFlora system does this using an established Information Extraction system, tuned to the domain of botany and integrated with a formal ontology to structure and store the data. A range of output formats is supported through the W3C RDFS standard, making it simple to populate a database as desired.
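The extract-then-correlate idea in the abstract can be illustrated with a minimal Python sketch. This is not the MultiFlora implementation: it uses neither the Information Extraction system nor an ontology or RDFS, and the tuple layout, the `description` table, the function names, and the sample values are all assumptions made for illustration. It only shows how facts extracted from parallel descriptions of the same taxon might be merged and loaded into a database.

```python
import sqlite3
from collections import defaultdict

# Hypothetical output of an extraction step, one tuple per extracted fact:
# (source_text, taxon, attribute, value). Placeholder data, not from the paper.
EXTRACTED = [
    ("flora_a", "Ranunculus acris", "petal_colour", "yellow"),
    ("flora_b", "Ranunculus acris", "petal_colour", "yellow"),
    ("flora_b", "Ranunculus acris", "petal_count", "5"),
    ("flora_a", "Ranunculus acris", "leaf_shape", "palmate"),
]


def correlate(facts):
    """Group values by (taxon, attribute) and record which sources gave each value."""
    merged = defaultdict(lambda: defaultdict(set))
    for source, taxon, attribute, value in facts:
        merged[(taxon, attribute)][value].add(source)
    return merged


def populate(conn, merged):
    """Write one row per distinct (taxon, attribute, value) with its source count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS description ("
        "taxon TEXT, attribute TEXT, value TEXT, n_sources INTEGER)"
    )
    rows = [
        (taxon, attribute, value, len(sources))
        for (taxon, attribute), values in sorted(merged.items())
        for value, sources in sorted(values.items())
    ]
    conn.executemany("INSERT INTO description VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
merged = correlate(EXTRACTED)
n_rows = populate(conn, merged)
```

Keeping a per-value source count is one simple way to see where parallel texts agree and where one text supplies a fact the others omit; a real system would also need to reconcile differently worded values.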


Natural Language Processing; Resource Description Framework; Base Ontology; Legacy Data; Parallel Text
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • M. M. Wood (1)
  • S. J. Lydon (2)
  • V. Tablan (3)
  • D. Maynard (3)
  • H. Cunningham (3)
  1. Dept of Computer Science, University of Manchester, Manchester, UK
  2. Earth Science Education Unit, Keele University, Staffordshire, UK
  3. Dept of Computer Science, University of Sheffield, Sheffield, UK
