Tree String Path Subsequences Automaton and Its Use for Indexing XML Documents

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 563)

Abstract

The theory of indexing texts is well researched, but the same does not hold for indexing other data structures, such as trees. In this paper, we present a simple method that uses a finite automaton to index a tree for subsequences of its string paths. We demonstrate the index on XML documents, for queries inspired by the XPath descendant-or-self axis. Given a subject tree \(\mathcal{T}\) with n nodes, the tree is preprocessed and an index, namely a directed acyclic subsequence graph for a set of strings, is constructed. The searching phase uses the index: it reads an input string path subsequence \(\mathcal{Q}\) of size m, derived from a specific XPath query, and computes the list of positions of all occurrences of \(\mathcal{Q}\) in the tree \(\mathcal{T}\). Searching takes \(\mathcal{O}(m)\) time and does not depend on n. Although the number of distinct valid queries is \(\mathcal{O}(2^n)\), the size of the index is \(\mathcal{O}(h^k)\), where h is the height of the tree \(\mathcal{T}\) and k is the number of its leaves. Moreover, we discuss why, when indexing a common XML document, the index is even smaller: \(\mathcal{O}(h \cdot 2^k)\).
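
The idea of answering a path-subsequence query with one automaton transition per query symbol can be illustrated with a small Python sketch. It is our own illustration under stated assumptions, not the authors' construction algorithm, and it makes no attempt to reach the size bounds above. We assume that the tree is given as nested (label, children) pairs, that positions are preorder node numbers, and that an occurrence of a query q1 q2 ... qm is a node labelled qm whose root-to-node string path contains the query as a subsequence ending at that node (roughly the XPath query //q1//q2...//qm). The names flatten, build_automaton and search are hypothetical. Each automaton state is identified by the set of nodes where the symbols read so far can end. Every transition moves strictly deeper into the tree, so the automaton is acyclic.

```python
from collections import deque

# Hypothetical illustration; names and representation are assumptions, not from the paper.
ROOT_NODES = (-1,)  # initial state: a virtual node sitting above the tree root


def flatten(tree):
    """Number the nodes of a nested (label, [children]) tree in preorder.

    Returns (labels, end): labels[v] is the label of node v, and the proper
    descendants of v are exactly the nodes v+1 .. end[v]-1.
    Uses recursion, so it is meant only for modestly deep trees.
    """
    labels, end = [], []

    def visit(node):
        label, kids = node
        v = len(labels)
        labels.append(label)
        end.append(0)  # placeholder, fixed once the subtree has been visited
        for kid in kids:
            visit(kid)
        end[v] = len(labels)

    visit(tree)
    return labels, end


def build_automaton(labels, end):
    """Build a deterministic, acyclic automaton over subsequences of string paths.

    The state with id s stands for the sorted tuple states[s] of nodes u such that
    the symbols read so far form a subsequence of the root-to-u label path whose
    last symbol is matched at u itself.
    """
    n = len(labels)

    def step(nodes, symbol):
        # Nodes labelled `symbol` that lie strictly below some node in `nodes`.
        hits = set()
        for v in nodes:
            lo, hi = (0, n) if v == -1 else (v + 1, end[v])
            hits.update(u for u in range(lo, hi) if labels[u] == symbol)
        return tuple(sorted(hits))

    alphabet = sorted(set(labels))
    states = [ROOT_NODES]   # state id -> tuple of nodes (its occurrence list)
    ids = {ROOT_NODES: 0}   # tuple of nodes -> state id
    delta = {}              # (state id, symbol) -> state id
    queue = deque([0])
    while queue:  # breadth-first exploration of all reachable states
        s = queue.popleft()
        for symbol in alphabet:
            target = step(states[s], symbol)
            if not target:
                continue  # no string path continues with this symbol
            if target not in ids:
                ids[target] = len(states)
                states.append(target)
                queue.append(ids[target])
            delta[(s, symbol)] = ids[target]
    return delta, states


def search(automaton, query):
    """Return the preorder positions of all occurrences of `query`.

    Performs one transition per query symbol, i.e. O(m) steps plus the output.
    """
    delta, states = automaton
    if not query:
        return []
    s = 0
    for symbol in query:
        s = delta.get((s, symbol))
        if s is None:
            return []
    return list(states[s])
```

For the tree a(b(c, b(c)), c), flatten numbers the nodes a=0, b=1, c=2, b=3, c=4, c=5, and searching for ["b", "c"] returns [2, 4], the two c nodes that have a b ancestor. The sketch explores node subsets directly, so its preprocessing can be far more expensive than the paper's construction. Only the query phase mirrors the claimed behaviour of spending one transition per query symbol.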

Keywords

Finite Automaton, Input Query, Suffix Array, XPath Query, Building Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Department of Theoretical Computer Science, Faculty of Information Technology, Czech Technical University in Prague, Prague 6, Czech Republic