Abstract
We describe an approach that builds on techniques from Data Integration and Information Extraction to make better use of the unstructured data found in application domains, such as the Semantic Web, that require the integration of information from structured data sources, ontologies and text. We describe the design and implementation of the ESTEST system, which integrates available structured and semi-structured data sources into a virtual global schema that is then used to partially configure an information extraction process. The information extracted from the text is merged with this virtual global database and is available for query processing over the entire integrated resource. As a result of this semantic integration, queries can be answered that could not be answered from the structured and semi-structured data alone. We give some experimental results from the ESTEST system in use.
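The flow summarised in the abstract (schema first, then schema-driven extraction, then merge and query) can be illustrated with a toy sketch. This is not the ESTEST implementation: the accident records, the `GLOBAL_SCHEMA` dictionary and every function name below are hypothetical, and simple regular-expression term matching stands in for a real information extraction pipeline.

```python
import re

# Hypothetical structured source: each record also carries a free-text field.
STRUCTURED_ROWS = [
    {"accident_id": 1, "road_surface": "wet",
     "description": "Car skidded and hit a lamp post."},
    {"accident_id": 2, "road_surface": "dry",
     "description": "Van reversed into a parked car."},
]

# Hypothetical global schema: concept name -> terms known from schema metadata.
GLOBAL_SCHEMA = {
    "vehicle": {"car", "van", "lorry"},
    "obstacle": {"lamp post", "wall", "tree"},
}


def build_patterns(schema):
    """Derive one regex per concept: the schema partially configures extraction."""
    patterns = {}
    for concept, terms in schema.items():
        # Longest terms first so multi-word terms are preferred.
        ordered = sorted(terms, key=len, reverse=True)
        alternation = "|".join(re.escape(term) for term in ordered)
        patterns[concept] = re.compile(r"\b(?:" + alternation + r")\b",
                                       re.IGNORECASE)
    return patterns


def extract(rows, patterns):
    """Return (accident_id, concept, term) facts found in the free text."""
    facts = set()
    for row in rows:
        for concept, pattern in patterns.items():
            for match in pattern.finditer(row["description"]):
                facts.add((row["accident_id"], concept, match.group(0).lower()))
    return facts


def merge(rows, facts, schema):
    """Combine structured attributes with extracted facts, keyed by record id."""
    merged = {}
    for row in rows:
        record = {"road_surface": row["road_surface"]}
        for concept in schema:
            record[concept] = set()
        merged[row["accident_id"]] = record
    for accident_id, concept, term in facts:
        merged[accident_id][concept].add(term)
    return merged


def wet_road_obstacle_accidents(merged):
    """A query that needs both the structured column and the extracted text."""
    return sorted(accident_id for accident_id, record in merged.items()
                  if record["road_surface"] == "wet" and record["obstacle"])


patterns = build_patterns(GLOBAL_SCHEMA)
facts = extract(STRUCTURED_ROWS, patterns)
merged = merge(STRUCTURED_ROWS, facts, GLOBAL_SCHEMA)
print(wet_road_obstacle_accidents(merged))
```

The final query combines a structured attribute (`road_surface`) with a fact that exists only in the free text (an obstacle was hit), which is the kind of question the abstract says becomes answerable after integration.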
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Williams, D., Poulovassilis, A. (2008). Combining Information Extraction and Data Integration in the ESTEST System. In: Filipe, J., Shishkov, B., Helfert, M. (eds) Software and Data Technologies. ICSOFT 2006. Communications in Computer and Information Science, vol 10. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70621-2_23
DOI: https://doi.org/10.1007/978-3-540-70621-2_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70619-9
Online ISBN: 978-3-540-70621-2
eBook Packages: Computer Science, Computer Science (R0)