Efficient Quantile Retrieval on Multi-dimensional Data

  • Man Lung Yiu
  • Nikos Mamoulis
  • Yufei Tao
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3896)

Abstract

Given a set of N multi-dimensional points, we study the computation of φ-quantiles according to a ranking function F, which is provided by the user at runtime. Specifically, F computes a score from the coordinates of each point; our objective is to report the point whose score is the φN-th smallest in the dataset. φ-quantiles provide a succinct summary of the F-distribution of the underlying data, which is useful for online decision support, data mining, selectivity estimation, and query optimization. Assuming that the dataset is indexed by a spatial access method, we propose several algorithms for retrieving a quantile efficiently. Analytical and experimental results demonstrate that a branch-and-bound method is highly effective in practice, outperforming alternative approaches by a significant factor.
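
To make the problem statement concrete, the following minimal sketch computes a φ-quantile by brute force: it scores every point with F and returns the point whose score is the ⌈φN⌉-th smallest. It is not the branch-and-bound method of the paper and does not use a spatial index. The function name phi_quantile, the rounding of φN up to an integer rank, and the linear ranking function in the usage note are assumptions made for illustration only.

```python
import heapq
import math
from typing import Callable, Sequence

Point = Sequence[float]


def phi_quantile(points: Sequence[Point],
                 F: Callable[[Point], float],
                 phi: float) -> Point:
    """Return the point whose F-score is the ceil(phi * N)-th smallest.

    Brute-force baseline (hypothetical helper, not from the paper):
    every point is scored, so the cost grows with the full dataset size.
    """
    if not points:
        raise ValueError("points must be non-empty")
    if not 0.0 < phi <= 1.0:
        raise ValueError("phi must lie in (0, 1]")
    # Assumption: the rank phi * N is rounded up to the next integer.
    k = max(1, math.ceil(phi * len(points)))
    # nsmallest returns the k lowest-scoring points in ascending order,
    # so the last one is the k-th smallest.
    return heapq.nsmallest(k, points, key=F)[-1]
```

For example, with the five points (1, 5), (2, 2), (4, 1), (3, 3), (0, 6) and the illustrative ranking function F(p) = p[0] + 2 * p[1], the scores are 11, 6, 6, 9, and 12, so the median (φ = 0.5, rank 3) is the point (3, 3) with score 9.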


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Man Lung Yiu (1)
  • Nikos Mamoulis (1)
  • Yufei Tao (2)
  1. Department of Computer Science, University of Hong Kong, Hong Kong
  2. Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
