Knowledge and Information Systems, Volume 41, Issue 2, pp 379–400

AsyIter: tolerating computational skew of synchronous iterative applications via computing decomposition

  • Yu Zhang
  • Xiaofei Liao
  • Hai Jin (email author)
  • Bing Bing Zhou
Regular Paper

Abstract

Iterative computing is pervasive in web applications, data mining and scientific computing. Many parallel algorithms for such applications are synchronous: they require strict synchronization between iterations to ensure correctness, which makes their performance sensitive to computational skew in each iteration. Current load balancing approaches may alleviate the effect of computational skew, but cannot completely solve the problem. As a result, for many applications, skew persists in each iteration and accumulates across iterations, seriously affecting completion time. In this paper, we propose an effective approach that lets synchronous iterative applications themselves tolerate the negative effects of unresolved computational skew. The approach divides a large computational task on a computing node or worker into a number of sub-tasks, each of which depends only on the states of a few objects from the previous iteration. A sub-task in a subsequent iteration can therefore proceed in advance as soon as the states of its related data objects are available. Consequently, the idle time caused by strict synchronization is reduced and overall performance improves. Experimental results show that this approach can improve overall performance by up to \(2.45\times\) in comparison with state-of-the-art approaches.
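
The idea of computing decomposition described in the abstract can be illustrated with a small timing model. The sketch below is not the AsyIter implementation: the function names, the four-object workload, the cost pattern and the dependency pattern are all hypothetical, chosen only to show why starting a sub-task as soon as its few inputs are ready can shorten completion time relative to a global barrier per iteration.

```python
from typing import Callable, Dict, List, Tuple

Cost = Callable[[int, int], int]    # cost(obj, iteration) -> time units
Deps = Callable[[int], List[int]]   # deps(obj) -> objects read from iteration k-1


def barrier_makespan(blocks: List[List[int]], iterations: int, cost: Cost) -> int:
    """Strict synchronization: every iteration lasts as long as its slowest worker."""
    total = 0
    for k in range(iterations):
        total += max(sum(cost(i, k) for i in block) for block in blocks)
    return total


def decomposed_makespan(blocks: List[List[int]], iterations: int,
                        cost: Cost, deps: Deps) -> int:
    """Sub-task (i, k) starts once its worker is free and its inputs from k-1 exist."""
    finish: Dict[Tuple[int, int], int] = {}
    worker_free = [0] * len(blocks)
    for k in range(iterations):
        for w, block in enumerate(blocks):
            for i in block:  # each worker runs its own sub-tasks in a fixed order
                ready = max(finish[(j, k - 1)] for j in deps(i)) if k > 0 else 0
                start = max(worker_free[w], ready)
                finish[(i, k)] = start + cost(i, k)
                worker_free[w] = finish[(i, k)]
    return max(worker_free)


# Hypothetical workload: 4 objects on 2 workers, skew alternating between workers.
BLOCKS = [[0, 1], [2, 3]]
ITERATIONS = 4


def cost(obj: int, k: int) -> int:
    owner = obj // 2
    return 3 if (owner + k) % 2 == 0 else 1  # the "slow" worker flips every iteration


def deps(obj: int) -> List[int]:
    return [obj, (obj + 1) % 4]  # each object reads itself and one neighbour


if __name__ == "__main__":
    print("barrier:   ", barrier_makespan(BLOCKS, ITERATIONS, cost))
    print("decomposed:", decomposed_makespan(BLOCKS, ITERATIONS, cost, deps))
```

With this toy workload the barrier version takes 24 time units and the decomposed version 16, because the worker that happens to be fast in one iteration can begin part of the next iteration early. The numbers come from the made-up cost pattern and say nothing about the gains measured in the paper; a workload where every sub-task depends on every object would show no benefit in this model.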

Keywords

Synchronous iterative applications, Computational skew, Skew tolerance, Computing decomposition

Notes

Acknowledgments

This work was supported by the National High-tech Research and Development Program of China (863 Program) under Grant No. 2012AA010905, the China National Natural Science Foundation under Grant Nos. 61322210 and 61272408, the Doctoral Fund of the Ministry of Education of China under Grant No. 20130142110048, and the Natural Science Foundation of Hubei under Grant No. 2012FFA007.

Copyright information

© Springer-Verlag London 2014

Authors and Affiliations

  • Yu Zhang (1)
  • Xiaofei Liao (1)
  • Hai Jin (1, email author)
  • Bing Bing Zhou (2)

  1. Service Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
  2. School of Information Technologies, The University of Sydney, Sydney, Australia
