iDocument: Using Ontologies for Extracting and Annotating Information from Unstructured Text

  • Conference paper
KI 2009: Advances in Artificial Intelligence (KI 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5803)

Abstract

Given the huge amount of text on the World Wide Web, annotating unstructured text with semantic markup is a crucial topic in Semantic Web research. This work formally analyzes how domain ontologies are incorporated into information extraction tasks in iDocument. Ontology-based information extraction exploits the formalized, structured knowledge in domain ontologies to extract domain-relevant information from unannotated, unstructured text. iDocument provides a pipeline architecture, an extraction template interface, and the ability to exchange domain ontologies when performing information extraction tasks. This work outlines iDocument’s ontology-based architecture, the use of SPARQL queries as extraction templates, and an evaluation of iDocument in an automatic document annotation scenario.
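The idea of using a query as an extraction template can be illustrated with a minimal sketch. It is not iDocument's implementation: the toy ontology, the `ex:` names, and the functions `recognize_instances` and `match_template` are all hypothetical, and a small triple-pattern matcher in plain Python stands in for a real SPARQL engine. The sketch assumes a two-step pipeline: first find ontology instances whose labels occur in the text, then fill a template of triple patterns using only those instances.

```python
import re
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy domain ontology as (subject, predicate, object) triples.
ONTOLOGY: Set[Triple] = {
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "rdfs:label", "Alice Example"),
    ("ex:acme", "rdf:type", "ex:Organization"),
    ("ex:acme", "rdfs:label", "Acme Labs"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
}

# Hypothetical extraction template; terms starting with "?" are variables.
# It loosely mirrors the SPARQL query:
#   SELECT ?person ?org WHERE {
#     ?person rdf:type ex:Person . ?person ex:worksFor ?org .
#     ?org rdf:type ex:Organization . }
TEMPLATE: List[Triple] = [
    ("?person", "rdf:type", "ex:Person"),
    ("?person", "ex:worksFor", "?org"),
    ("?org", "rdf:type", "ex:Organization"),
]


def recognize_instances(text: str, ontology: Set[Triple]) -> Set[str]:
    """Step 1: return ontology instances whose rdfs:label occurs in the text."""
    found = set()
    for subject, predicate, obj in ontology:
        if predicate == "rdfs:label":
            if re.search(r"\b" + re.escape(obj) + r"\b", text):
                found.add(subject)
    return found


def match_template(template: List[Triple], ontology: Set[Triple],
                   mentioned: Set[str]) -> List[Dict[str, str]]:
    """Step 2: bind template variables against the ontology, keeping only
    solutions in which every bound instance was mentioned in the text."""
    bindings: List[Dict[str, str]] = [{}]
    for pattern in template:
        extended = []
        for binding in bindings:
            for triple in ontology:
                candidate = dict(binding)
                matches = True
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        # Bind the variable, or check an earlier binding.
                        if candidate.setdefault(term, value) != value:
                            matches = False
                            break
                    elif term != value:
                        matches = False
                        break
                if matches:
                    extended.append(candidate)
        bindings = extended
    return [b for b in bindings if all(v in mentioned for v in b.values())]


def extract(text: str) -> List[Dict[str, str]]:
    """Run both pipeline steps on one document."""
    mentioned = recognize_instances(text, ONTOLOGY)
    return match_template(TEMPLATE, ONTOLOGY, mentioned)
```

Calling `extract("Alice Example joined Acme Labs last year.")` yields one filled template that binds `?person` and `?org`, while a text that mentions only one of the two labels yields no result. Because the ontology and the template are passed in as data, either can be swapped without touching the matching code, which is the property the abstract describes as exchanging domain ontologies.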

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Adrian, B., Hees, J., van Elst, L., Dengel, A. (2009). iDocument: Using Ontologies for Extracting and Annotating Information from Unstructured Text. In: Mertsching, B., Hund, M., Aziz, Z. (eds) KI 2009: Advances in Artificial Intelligence. KI 2009. Lecture Notes in Computer Science, vol 5803. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04617-9_32

  • DOI: https://doi.org/10.1007/978-3-642-04617-9_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04616-2

  • Online ISBN: 978-3-642-04617-9

  • eBook Packages: Computer Science, Computer Science (R0)
