Populating a Database from Parallel Texts Using Ontology-Based Information Extraction

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3136)

Abstract

Legacy data in many mature descriptive sciences is distributed across multiple text descriptions. The challenge is both to extract this data and to correlate it once extracted. The MultiFlora system does this using an established Information Extraction system tuned to the domain of botany and integrated with a formal ontology to structure and store the data. A range of output formats is supported through the W3C RDFS standard, making it simple to populate a database as desired.
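The extract-then-correlate idea can be illustrated with a minimal sketch. It is not the MultiFlora implementation: the record layout, the placeholder values, the table name `character_value`, and the functions `correlate` and `populate` are all assumptions made for illustration. It groups values extracted from parallel descriptions by taxon and character, then loads the merged rows into an SQLite table using Python's standard library.

```python
import sqlite3
from collections import defaultdict

# Hypothetical extraction output: (source text, taxon, character, value).
# All names and values are invented placeholders, not data from the paper.
RECORDS = [
    ("text_a", "taxon_1", "leaf shape", "ovate"),
    ("text_b", "taxon_1", "leaf shape", "ovate"),
    ("text_b", "taxon_1", "petal count", "5"),
    ("text_a", "taxon_2", "leaf shape", "linear"),
]


def correlate(records):
    """Group values by (taxon, character) and track which sources report each."""
    merged = defaultdict(lambda: defaultdict(set))
    for source, taxon, character, value in records:
        merged[(taxon, character)][value].add(source)
    return merged


def populate(conn, merged):
    """Write one row per distinct value, with the number of supporting sources."""
    conn.execute(
        "CREATE TABLE character_value "
        "(taxon TEXT, character TEXT, value TEXT, n_sources INTEGER)"
    )
    rows = [
        (taxon, character, value, len(sources))
        for (taxon, character), values in sorted(merged.items())
        for value, sources in sorted(values.items())
    ]
    conn.executemany("INSERT INTO character_value VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Calling `populate(sqlite3.connect(":memory:"), correlate(RECORDS))` fills an in-memory database. Keeping a per-value source count is one simple way to show where parallel texts agree and where only one text supplies a value.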

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wood, M.M., Lydon, S.J., Tablan, V., Maynard, D., Cunningham, H. (2004). Populating a Database from Parallel Texts Using Ontology-Based Information Extraction. In: Meziane, F., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2004. Lecture Notes in Computer Science, vol 3136. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27779-8_22

  • DOI: https://doi.org/10.1007/978-3-540-27779-8_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22564-5

  • Online ISBN: 978-3-540-27779-8

  • eBook Packages: Springer Book Archive
