Efficient Quantile Retrieval on Multi-dimensional Data

  • Man Lung Yiu
  • Nikos Mamoulis
  • Yufei Tao
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3896)

Abstract

Given a set of N multi-dimensional points, we study the computation of φ-quantiles according to a ranking function F, which is provided by the user at runtime. Specifically, F computes a score based on the coordinates of each point; our objective is to report the object whose score is the φN-th smallest in the dataset. φ-quantiles provide a succinct summary about the F-distribution of the underlying data, which is useful for online decision support, data mining, selectivity estimation, query optimization, etc. Assuming that the dataset is indexed by a spatial access method, we propose several algorithms for retrieving a quantile efficiently. Analytical and experimental results demonstrate that a branch-and-bound method is highly effective in practice, outperforming alternative approaches by a significant factor.
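To make the problem definition concrete, here is a minimal brute-force sketch of a φ-quantile under a user-supplied ranking function F. It is not the branch-and-bound method the abstract refers to, and it uses no spatial index: it simply scores every point and sorts. The names `quantile_point`, `points`, and `phi`, and the choice to round φN up to the next integer rank, are assumptions made for illustration only.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def quantile_point(points: Sequence[Point],
                   F: Callable[[Point], float],
                   phi: float) -> Point:
    """Return the point whose F-score is the (phi*N)-th smallest.

    Baseline only: scores and sorts all N points, O(N log N).
    Assumption: the rank is ceil(phi * N), clamped to at least 1.
    """
    if not points:
        raise ValueError("points must be non-empty")
    if not 0.0 < phi <= 1.0:
        raise ValueError("phi must lie in (0, 1]")
    rank = max(1, math.ceil(phi * len(points)))
    ranked = sorted(points, key=F)  # ascending by score
    return ranked[rank - 1]         # ranks are 1-based


# Hypothetical 2-D dataset and two ranking functions chosen at runtime.
points = [(1.0, 5.0), (2.0, 1.0), (4.0, 4.0), (3.0, 0.5)]
median_by_sum = quantile_point(points, lambda p: p[0] + p[1], 0.5)
median_by_x = quantile_point(points, lambda p: p[0], 0.5)
```

Because F is only known at query time, the same dataset can yield different quantile points for different ranking functions, which is why a precomputed one-dimensional ordering does not suffice.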

Keywords

Ranking Function; Query Point; Multidimensional Data; Skyline Query; Range Count
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Man Lung Yiu (1)
  • Nikos Mamoulis (1)
  • Yufei Tao (2)
  1. Department of Computer Science, University of Hong Kong, Hong Kong
  2. Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
