Abstract
Many scientific disciplines use maximum likelihood evaluation (MLE) as an analytical tool. As the volume of data to be analyzed grows, MLE demands more parallelism to keep analysis efficient. Unfortunately, it is difficult for scientists and engineers to develop their own distributed or parallelized MLE applications. In addition, self-adaptability is an important characteristic of computing-intensive applications because it improves efficiency. This paper presents a self-adaptive, parallelized MLE framework that consists of a master process and a set of worker processes in a distributed environment. The workers are responsible for computing tasks, while the master merges the computing results, decides whether to initiate another computing iteration or to terminate, and decides how to redistribute the computing tasks among the workers. The proposed approach uses neither a monitoring mechanism to collect system state nor a separate load-balancing decision mechanism to balance the workload. Instead, it measures the performance of each worker in computing an iteration and uses that information to adjust the workers' workloads accordingly. The experimental results show that the proposed framework not only adapts to environmental changes but is also effective: even in a stable environment dedicated to a single application, it still demonstrates a significant improvement from self-adaptation. The improvement is significant when the workload of the computing machines is unbalanced.
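The redistribution step described above can be illustrated with a minimal sketch. The abstract does not state the framework's actual adjustment rule, so the proportional-to-throughput rule below is an assumption, and the function name `rebalance` and its parameters are hypothetical. Given how many tasks each worker computed in the last iteration and how long it took, the master assigns the next iteration's tasks in proportion to each worker's measured rate, with no external monitoring involved.

```python
from typing import List


def rebalance(task_counts: List[int], elapsed: List[float]) -> List[int]:
    """Split the same total number of tasks among workers in proportion
    to the throughput each worker showed in the previous iteration.

    Hypothetical helper: the proportional rule is an assumption, not the
    rule used by the framework described in the abstract.
    """
    if len(task_counts) != len(elapsed):
        raise ValueError("one elapsed time is required per worker")
    if any(t <= 0 for t in elapsed):
        raise ValueError("elapsed times must be positive")

    total = sum(task_counts)
    # Measured throughput of each worker: tasks completed per second.
    rates = [n / t for n, t in zip(task_counts, elapsed)]
    rate_sum = sum(rates)
    if rate_sum == 0:
        # Nothing was computed, so there is no information to adapt to.
        return list(task_counts)

    # Ideal (fractional) share of the total for each worker.
    shares = [total * r / rate_sum for r in rates]
    new_counts = [int(s) for s in shares]

    # Hand the tasks lost to rounding down to the workers with the
    # largest fractional remainders, so the total is preserved.
    leftover = total - sum(new_counts)
    order = sorted(range(len(shares)),
                   key=lambda i: shares[i] - new_counts[i],
                   reverse=True)
    for i in order[:leftover]:
        new_counts[i] += 1
    return new_counts


if __name__ == "__main__":
    # Two workers had 100 tasks each; the second took three times as long.
    print(rebalance([100, 100], [1.0, 3.0]))
```

One limitation of this simple rule is that a worker assigned zero tasks has a measured rate of zero and would never receive work again; a real scheduler would likely keep a minimum share per worker or smooth the rates over several iterations.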
Cite this article
Wang, WJ., Chang, YS., Wu, CH. et al. A self-adaptive computing framework for parallel maximum likelihood evaluation. J Supercomput 61, 67–83 (2012). https://doi.org/10.1007/s11227-011-0648-7
DOI: https://doi.org/10.1007/s11227-011-0648-7