Abstract
In large-scale Internet-based distributed systems, participants (consumers and providers) are typically autonomous, i.e. they may have specific interests in queries and in other participants. In this context, one way to prevent a participant from voluntarily leaving the system is to satisfy its interests when allocating queries. However, participants' satisfaction may also be negatively affected by the failures of other participants. Query replication is a way to deal with provider failures, but it is challenging because of autonomy: it can not only quickly overload the system but also dissatisfy participants with uninteresting queries. Thus, a natural question arises: should queries be replicated? If so, which ones, and how many times? In this paper, we answer these questions by revisiting query replication from a satisfaction and probabilistic point of view. We propose a new algorithm, called SbQR, that decides on-the-fly whether a query should be replicated and at which rate. As replicating a large number of queries might overload the system, we propose a variant of our algorithm, called SbQR+. The idea is to voluntarily fail to allocate as many replicas as required by consumers for low-critical queries so as to keep resources for highly critical queries during query-intensive periods. Our experimental results demonstrate that our algorithms significantly outperform the baseline algorithms from both the performance and satisfaction points of view. We also show that our algorithms automatically adapt to the criticality of queries and to different rates of participant failures.
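The probabilistic side of "whether a query should be replicated and at which rate" can be illustrated with a minimal sketch. This is not SbQR or SbQR+ as defined in the paper: it ignores participant satisfaction entirely and assumes that every provider fails independently with the same probability `fail_prob`, and that a query's criticality is the probability with which the consumer wants at least one replica to succeed. The names `Query`, `replicas_needed`, `replicas_to_allocate`, `load_threshold` and `low_criticality` are hypothetical. The second function mimics the SbQR+ idea of giving low-critical queries fewer replicas when the system is heavily loaded.

```python
from dataclasses import dataclass


@dataclass
class Query:
    """A query with a hypothetical criticality: the probability, in [0, 1],
    with which the consumer wants at least one replica to succeed."""
    query_id: str
    criticality: float


def replicas_needed(criticality: float, fail_prob: float, max_replicas: int = 10) -> int:
    """Smallest n such that 1 - fail_prob**n >= criticality, capped at max_replicas.

    Assumption (not from the paper): providers fail independently, each with
    the same probability fail_prob, so n replicas all fail with fail_prob**n.
    """
    if not 0.0 <= fail_prob < 1.0:
        raise ValueError("fail_prob must be in [0, 1)")
    n = 1
    # Add replicas until the chance that at least one succeeds meets the target.
    while 1.0 - fail_prob ** n < criticality and n < max_replicas:
        n += 1
    return n


def replicas_to_allocate(query: Query, fail_prob: float, load: float,
                         load_threshold: float = 0.8,
                         low_criticality: float = 0.5,
                         max_replicas: int = 10) -> int:
    """Load-aware variant in the spirit of SbQR+ (hypothetical thresholds).

    When the system load exceeds load_threshold, queries whose criticality is
    below low_criticality get a single copy, keeping resources for more
    critical queries. Otherwise the full probabilistic replica count is used.
    """
    if load > load_threshold and query.criticality < low_criticality:
        return 1
    return replicas_needed(query.criticality, fail_prob, max_replicas)


if __name__ == "__main__":
    urgent = Query("q1", criticality=0.9)
    routine = Query("q2", criticality=0.4)
    print(replicas_to_allocate(urgent, fail_prob=0.5, load=0.95))   # 4 replicas
    print(replicas_to_allocate(routine, fail_prob=0.8, load=0.3))   # 3 replicas
    print(replicas_to_allocate(routine, fail_prob=0.8, load=0.95))  # 1 replica
```

With a provider failure probability of 0.5, a query requiring 0.9 success probability needs 4 replicas (1 - 0.5^4 = 0.9375), whereas under heavy load the low-critical query is deliberately left with a single copy. A real allocator would also weigh the satisfaction of consumers and providers, which this sketch omits.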
Additional information
Communicated by M. Tamer Özsu.
Cite this article
Quiané-Ruiz, JA., Lamarre, P. & Valduriez, P. Satisfaction-based query replication. Distrib Parallel Databases 30, 1–26 (2012). https://doi.org/10.1007/s10619-011-7086-7