One-dimensional and multi-dimensional substring selectivity estimation

Jagadish, H.V.; Kapitskaia, Olga; Ng, Raymond T.; Srivastava, Divesh

doi:10.1007/s007780000029

Regular contribution
Published: December 2000

The VLDB Journal, Volume 9, pages 214–230 (2000)

Abstract.

With the increasing importance of XML, LDAP directories, and text-based information sources on the Internet, there is an ever-greater need to evaluate queries involving (sub)string matching. In many cases, matches need to be on multiple attributes/dimensions, with correlations between the multiple dimensions. Effective query optimization in this context requires good selectivity estimates. In this paper, we use pruned count-suffix trees (PSTs) as the basic data structure for substring selectivity estimation. For the 1-D problem, we present a novel technique called MO (Maximal Overlap). We then develop and analyze two 1-D estimation algorithms, MOC and MOLC, based on MO and a constraint-based characterization of all possible completions of a given PST. For the k-D problem, we first generalize PSTs to multiple dimensions and develop a space- and time-efficient probabilistic algorithm to construct k-D PSTs directly. We then show how to extend MO to multiple dimensions. Finally, we demonstrate, both analytically and experimentally, that MO is both practical and substantially superior to competing algorithms.
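To make the basic data structure concrete, the following is a minimal sketch of a pruned count-suffix structure for the 1-D case. It is not the paper's construction: it assumes an uncompressed trie rather than a compressed suffix tree, assumes each node stores a presence count (the number of strings that contain the substring), and assumes a simple rule that prunes any node whose count falls below a threshold. The names `Node`, `build_count_suffix_trie`, `prune`, and `lookup` are hypothetical and chosen for illustration only. The MO estimator itself is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Number of input strings that contain the substring spelled by the
    # path from the root to this node (presence count, an assumption).
    count: int = 0
    children: dict = field(default_factory=dict)


def build_count_suffix_trie(strings):
    """Insert every suffix of every string into an uncompressed trie."""
    root = Node()
    for s in strings:
        root.count += 1  # the empty substring occurs in every string
        credited = set()  # nodes already counted for this string
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node.children.setdefault(ch, Node())
                if id(node) not in credited:
                    credited.add(id(node))
                    node.count += 1
    return root


def prune(node, min_count):
    """Drop every subtree whose root count is below min_count (assumed rule)."""
    node.children = {
        ch: child for ch, child in node.children.items()
        if child.count >= min_count
    }
    for child in node.children.values():
        prune(child, min_count)


def lookup(root, substring):
    """Return the stored count, or None if the substring was pruned or absent."""
    node = root
    for ch in substring:
        node = node.children.get(ch)
        if node is None:
            return None
    return node.count


if __name__ == "__main__":
    trie = build_count_suffix_trie(["banana", "bandana", "cabana"])
    prune(trie, min_count=2)
    print(lookup(trie, "ana"), lookup(trie, "band"))
```

A `None` result is exactly the case an estimator has to handle: the query substring is no longer in the pruned structure, so its count must be inferred from the shorter substrings that were retained.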

Author information

Authors and Affiliations

University of Michigan, Ann Arbor, US; E-mail: jag@umich.edu
H.V. Jagadish
Pôle Universitaire Léonard de Vinci, FR; E-mail: Olga.Kapitskaia@devinci.fr
Olga Kapitskaia
University of British Columbia, CA; E-mail: rng@cs.ubc.ca
Raymond T. Ng
AT&T Labs – Research, 180 Park Avenue, Bldg 103, Florham Park, NJ 07932, USA; E-mail: divesh@research.att.com
Divesh Srivastava

Additional information

Received April 28, 2000 / Accepted July 11, 2000

About this article

Cite this article

Jagadish, H., Kapitskaia, O., Ng, R. et al. One-dimensional and multi-dimensional substring selectivity estimation. The VLDB Journal 9, 214–230 (2000). https://doi.org/10.1007/s007780000029

Issue Date: December 2000
DOI: https://doi.org/10.1007/s007780000029

Key words: String selectivity – Maximal overlap – Short memory property – Pruned count-suffix tree
