Parallelizing Skyline Queries for Scalable Distribution

  • Ping Wu
  • Caijie Zhang
  • Ying Feng
  • Ben Y. Zhao
  • Divyakant Agrawal
  • Amr El Abbadi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3896)

Abstract

Skyline queries help users make intelligent decisions over complex data, where different and often conflicting criteria are considered. Current skyline computation methods are restricted to centralized query processors, limiting scalability and imposing a single point of failure. In this paper, we address the problem of parallelizing skyline query execution over a large number of machines by leveraging content-based data partitioning. We present a novel distributed skyline query processing algorithm (DSL) that discovers skyline points progressively. We propose two mechanisms, recursive region partitioning and dynamic region encoding, to enforce a partial order on query propagation in order to pipeline query execution. Our analysis shows that DSL is optimal in terms of the total number of local query invocations across all machines. In addition, simulations and measurements of a deployed system show that our system load balances communication and processing costs across cluster machines, providing incremental scalability and significant performance improvement over alternative distribution mechanisms.
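
To make the notion of a skyline concrete, the following is a minimal sketch of the dominance test and a naive centralized skyline computation, followed by a simple partition-then-merge baseline. It is not the DSL algorithm described above: it does not implement recursive region partitioning or dynamic region encoding. The function names (`dominates`, `skyline`, `partitioned_skyline`), the "smaller is better" convention, and the sample data are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive skyline: keep every point that no other point dominates."""
    result: List[Point] = []
    for p in points:
        if any(dominates(q, p) for q in result):
            continue  # p is dominated by a current candidate, so discard it
        # p survives: drop any candidates that p dominates, then keep p
        result = [q for q in result if not dominates(p, q)]
        result.append(p)
    return result


def partitioned_skyline(partitions: Sequence[Sequence[Point]]) -> List[Point]:
    """Illustrative baseline: compute a local skyline per partition (as each
    machine could), then merge the local results with one more skyline pass."""
    local_results = [skyline(part) for part in partitions]
    merged = [p for part in local_results for p in part]
    return skyline(merged)


# Hypothetical hotel data as (price, distance to beach); lower is better.
hotels = [(50, 8), (80, 2), (60, 5), (90, 3), (70, 9)]
print(sorted(skyline(hotels)))
print(sorted(partitioned_skyline([hotels[:2], hotels[2:]])))
```

The merge step is correct because a point in the global skyline is never dominated within its own partition, so it always survives the local pass. The cost of this baseline is that every partition is queried and all local results are gathered at one place, which is the kind of overhead a progressive, pipelined scheme aims to avoid.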

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ping Wu (1)
  • Caijie Zhang (1)
  • Ying Feng (1)
  • Ben Y. Zhao (1)
  • Divyakant Agrawal (1)
  • Amr El Abbadi (1)

  1. University of California at Santa Barbara
