Application of Simulated Annealing to Data Distribution for All-to-All Comparison Problems in Homogeneous Systems
Distributed systems are widely used for solving large-scale and data-intensive computing problems, including all-to-all comparison (ATAC) problems. However, when used for ATAC problems, existing computational frameworks such as Hadoop focus on load balancing for allocating comparison tasks, without careful consideration of data distribution and storage usage. While Hadoop-based solutions provide users with simplicity of implementation, their inherent MapReduce computing pattern does not match the ATAC pattern. This leads to load imbalances and poor data locality when Hadoop’s data distribution strategy is used for ATAC problems. Here we present a data distribution strategy which considers data locality, load balancing and storage savings for ATAC computing problems in homogeneous distributed systems. A simulated annealing algorithm is developed for data distribution and task scheduling. Experimental results show a significant performance improvement for our approach over Hadoop-based solutions.
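The idea of annealing a data distribution can be illustrated with a small sketch. This is not the authors' algorithm; it is a minimal illustration under stated assumptions: every pair of data items is one comparison task, a node must store both items of each task assigned to it (data locality), load balance is kept by only swapping tasks between nodes, and the cost to minimise is the largest number of distinct items any node stores. The names `storage_cost` and `anneal_distribution`, the geometric cooling schedule, and all parameter values are hypothetical.

```python
import itertools
import math
import random


def storage_cost(assignment, num_nodes):
    """Largest number of distinct data items that any single node must store."""
    items = [set() for _ in range(num_nodes)]
    for (i, j), node in assignment.items():
        items[node].update((i, j))  # the node needs both items locally
    return max(len(s) for s in items)


def anneal_distribution(num_items, num_nodes, steps=20000,
                        t0=2.0, alpha=0.9995, seed=0):
    """Assign all-to-all comparison tasks to nodes by simulated annealing.

    Assumed cost model (not taken from the paper): minimise the maximum
    per-node storage while keeping per-node task counts fixed.
    """
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(num_items), 2))
    # Round-robin start: task counts per node differ by at most one.
    assignment = {p: k % num_nodes for k, p in enumerate(pairs)}
    cost = storage_cost(assignment, num_nodes)
    best, best_cost = dict(assignment), cost
    temp = t0
    for _ in range(steps):
        a, b = rng.sample(pairs, 2)
        if assignment[a] != assignment[b]:
            # Swapping two tasks leaves every node's task count unchanged.
            assignment[a], assignment[b] = assignment[b], assignment[a]
            new_cost = storage_cost(assignment, num_nodes)
            delta = new_cost - cost
            # Metropolis rule: always accept improvements, sometimes accept
            # worse states, less often as the temperature falls.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = dict(assignment), cost
            else:
                # Rejected move: undo the swap.
                assignment[a], assignment[b] = assignment[b], assignment[a]
        temp *= alpha  # geometric cooling (an assumed schedule)
    return best, best_cost
```

Calling `anneal_distribution(8, 4)` returns a task-to-node mapping and its storage cost. Restricting moves to swaps is one simple way to hold the load fixed while the search reduces storage; a fuller treatment would also weigh load and storage in a single objective.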
Keywords: Simulated Annealing, Load Balance, Data Item, Task Schedule, Task Allocation
Author J. Gao would like to acknowledge support from the National Natural Science Foundation of China under Grant Number 61462070, and from the Inner Mongolia Government under Science and Technology Plan Grant Number 20130364.