Abstract
The volume of XML data is growing rapidly, and as a consequence a large amount of XML data has emerged in which similar content is described using different tag names and structures. In such a situation, users cannot write a query against the data unless they know its structure. In this research, we propose a scheme to cope with this problem. Specifically, we expand XPath queries by replacing tag names with similar ones with the help of ontologies. In addition, we aim to realize structural proximity matching of path expressions using edit similarity, a similarity measure based on edit distance. We also discuss the application of SSJoin, an operator that supports similarity joins in relational database systems, to speed up the proposed scheme. Finally, we show the effectiveness of the proposed method through a series of experiments.
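The two ideas in the abstract, tag-name expansion and edit-similarity matching of paths, can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions. Paths are simple child-axis XPath expressions, split into tag-name steps. Edit similarity is normalized as 1 - ED / max(length), which is one common definition and may differ from the one used in the paper. The `SYNONYMS` table is a hypothetical stand-in for an ontology lookup such as WordNet. All function names are illustrative.

```python
from itertools import product

# Hypothetical synonym table standing in for an ontology lookup (assumption).
SYNONYMS = {"author": ["writer"], "book": ["publication"]}


def path_steps(path):
    """Split a simple child-axis path such as '/book/author' into tag names."""
    return [step for step in path.split("/") if step]


def edit_distance(a, b):
    """Levenshtein distance between two sequences (here: lists of tag names)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_similarity(p, q):
    """Assumed normalization: 1 - ED / max(len), computed over path steps."""
    a, b = path_steps(p), path_steps(q)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def expand_query(path):
    """Enumerate variants of a path with each tag replaced by its synonyms."""
    choices = [[step] + SYNONYMS.get(step, []) for step in path_steps(path)]
    return ["/" + "/".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    print(expand_query("/book/author"))
    print(edit_similarity("/book/author/name", "/book/writer/name"))
```

In this sketch, a query path would first be expanded into its synonym variants, and each variant would then be compared against the paths found in the data, keeping those whose edit similarity exceeds a chosen threshold. Pairwise comparison of this kind is the step a similarity join operator such as SSJoin is meant to speed up.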
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Amagasa, T., Wen, L., Kitagawa, H. (2007). Proximity Search of XML Data Using Ontology and XPath Edit Similarity. In: Wagner, R., Revell, N., Pernul, G. (eds) Database and Expert Systems Applications. DEXA 2007. Lecture Notes in Computer Science, vol 4653. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74469-6_30
DOI: https://doi.org/10.1007/978-3-540-74469-6_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74467-2
Online ISBN: 978-3-540-74469-6
eBook Packages: Computer Science, Computer Science (R0)