
Combining Information Extraction and Data Integration in the ESTEST System

  • Conference paper
Software and Data Technologies (ICSOFT 2006)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 10)


Abstract

We describe an approach that builds on techniques from Data Integration and Information Extraction to make better use of unstructured data in application domains, such as the Semantic Web, that require the integration of information from structured data sources, ontologies and text. We describe the design and implementation of the ESTEST system, which integrates the available structured and semi-structured data sources into a virtual global schema. This schema is then used to partially configure an information extraction process. The information extracted from the text is merged with the virtual global database and is available for query processing over the entire integrated resource. As a result of this semantic integration, queries can be answered that could not be answered from the structured and semi-structured data alone. We also present experimental results from the ESTEST system in use.
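The abstract describes a four-step pipeline: integrate the sources into a global view, use that view to configure extraction, merge the extracted facts back in, and query the combined result. A toy Python sketch can show that flow. It is not the ESTEST implementation. Every class, method and data value below (`VirtualGlobalDB`, `integrate`, `gazetteer`, `extract_and_merge`, the vehicle and weather terms, the two reports) is hypothetical. Plain whole-word dictionary lookup stands in for a real information extraction engine.

```python
import re
from dataclasses import dataclass, field


# Hypothetical sketch: VirtualGlobalDB and all data below are invented for
# illustration and are not part of ESTEST or any real toolkit.
@dataclass
class VirtualGlobalDB:
    # concept -> known instance names, pooled from every integrated source
    instances: dict[str, set[str]] = field(default_factory=dict)
    # (subject, predicate, object) facts, including ones extracted from text
    facts: set[tuple[str, str, str]] = field(default_factory=set)

    def integrate(self, concept: str, values: set[str]) -> None:
        """Merge one source's instances of a concept into the global view."""
        self.instances.setdefault(concept, set()).update(values)

    def gazetteer(self) -> dict[str, tuple[str, str]]:
        """Use the schema to configure extraction: lowercased term -> (concept, name)."""
        return {v.lower(): (c, v) for c, vs in self.instances.items() for v in vs}

    def extract_and_merge(self, doc_id: str, text: str, gaz: dict[str, tuple[str, str]]) -> None:
        """Record whole-word mentions of known instances as new facts about doc_id."""
        lowered = text.lower()
        for term, (concept, name) in gaz.items():
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                self.facts.add((doc_id, "mentions_" + concept, name))

    def query(self, predicate: str, obj: str) -> list[str]:
        """Return, sorted, the subjects of all facts matching (?, predicate, obj)."""
        return sorted(s for s, p, o in self.facts if p == predicate and o == obj)


db = VirtualGlobalDB()
db.integrate("vehicle", {"bus", "bicycle"})  # e.g. rows from a relational source
db.integrate("vehicle", {"lorry"})           # e.g. elements of a semi-structured source
db.integrate("weather", {"fog", "ice"})
gaz = db.gazetteer()
db.extract_and_merge("report1", "A lorry skidded on ice near the junction.", gaz)
db.extract_and_merge("report2", "Cyclist hit by a bus in heavy fog.", gaz)
print(db.query("mentions_weather", "ice"))  # ['report1']
```

The final query can only be answered after the extracted facts are merged. In this toy setup, the structured sources list the weather conditions but say nothing about which report mentions them.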

Editor information

Joaquim Filipe, Boris Shishkov, Markus Helfert

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Williams, D., Poulovassilis, A. (2008). Combining Information Extraction and Data Integration in the ESTEST System. In: Filipe, J., Shishkov, B., Helfert, M. (eds) Software and Data Technologies. ICSOFT 2006. Communications in Computer and Information Science, vol 10. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70621-2_23

  • DOI: https://doi.org/10.1007/978-3-540-70621-2_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70619-9

  • Online ISBN: 978-3-540-70621-2

  • eBook Packages: Computer Science, Computer Science (R0)
