The VLDB Journal, Volume 18, Issue 2, pp 407–427

Anytime measures for top-k algorithms on exact and fuzzy data sets

  • Benjamin Arai
  • Gautam Das
  • Dimitrios Gunopulos
  • Nick Koudas
Special Issue Paper

Abstract

Top-k queries on large multi-attribute data sets are fundamental operations in information retrieval and ranking applications. In this article, we initiate research on the anytime behavior of top-k algorithms on exact and fuzzy data. In particular, given specific top-k algorithms (TA and TA-Sorted), we study their progress toward identifying the correct result at any point during their execution. We adopt a probabilistic approach in which we seek to report, at any point during the algorithm's operation, the confidence that the top-k result has been identified. Such functionality is a valuable asset when one is interested in reducing the runtime cost of top-k computations. We present a thorough experimental evaluation to validate our techniques using both synthetic and real data sets.
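As a rough illustration of the setting, the following is a minimal sketch of the classic threshold algorithm (TA) with sorted and random access, recording after each round the gap between the stopping threshold and the current k-th best score. It is not the authors' implementation: the function name `threshold_algorithm`, the sum aggregation, the sample `scores`, and the `trace` structure are all assumptions made for this example, and the recorded gap is only a simple progress signal, not the probabilistic confidence measure studied in the article.

```python
import heapq

def threshold_algorithm(lists, k):
    """Sketch of TA. `lists` holds one dict per attribute, mapping
    object id -> score. Assumes every object appears in every list and
    that the aggregate is a plain sum (any monotone function would do)."""
    # Sorted access order for each attribute: highest score first.
    sorted_lists = [sorted(l.items(), key=lambda kv: -kv[1]) for l in lists]
    seen = {}    # object id -> full aggregate score
    trace = []   # (depth, threshold, k-th best score) after each round
    top = []
    n = max((len(s) for s in sorted_lists), default=0)
    for depth in range(n):
        last_scores = []
        for s in sorted_lists:
            obj, score = s[depth]            # sorted access
            last_scores.append(score)
            if obj not in seen:
                # Random access: fetch the object's score in every list.
                seen[obj] = sum(l.get(obj, 0.0) for l in lists)
        # No unseen object can score higher than this threshold.
        threshold = sum(last_scores)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        kth = top[-1][1] if len(top) == k else float("-inf")
        trace.append((depth + 1, threshold, kth))
        if kth >= threshold:                 # top-k is now certain
            break
    return top, trace

# Hypothetical two-attribute data set.
scores = [
    {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.2},
    {"a": 0.8, "b": 0.3, "c": 0.9, "d": 0.1},
]
top, trace = threshold_algorithm(scores, k=2)
for depth, threshold, kth in trace:
    print(depth, threshold, kth)
```

Stopping before the last recorded round would return the current `top` without a guarantee of correctness; quantifying how likely that intermediate answer is to be correct is the question the article addresses.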

Keywords

Approximate query, Anytime, Top-k, Fuzzy data

Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  • Benjamin Arai (1)
  • Gautam Das (2)
  • Dimitrios Gunopulos (3)
  • Nick Koudas (4)

  1. University of California, Riverside, USA
  2. University of Texas, Arlington, USA
  3. University of Athens, Athens, Greece
  4. University of Toronto, Toronto, Canada
