Application of Simulated Annealing to Data Distribution for All-to-All Comparison Problems in Homogeneous Systems

  • Yi-Fan Zhang
  • Yu-Chu Tian
  • Wayne Kelly
  • Colin Fidge
  • Jing Gao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9491)


Distributed systems are widely used for solving large-scale and data-intensive computing problems, including all-to-all comparison (ATAC) problems. However, when used for ATAC problems, existing computational frameworks such as Hadoop focus on load balancing for allocating comparison tasks, without careful consideration of data distribution and storage usage. While Hadoop-based solutions provide users with simplicity of implementation, their inherent MapReduce computing pattern does not match the ATAC pattern. This leads to load imbalances and poor data locality when Hadoop’s data distribution strategy is used for ATAC problems. Here we present a data distribution strategy which considers data locality, load balancing and storage savings for ATAC computing problems in homogeneous distributed systems. A simulated annealing algorithm is developed for data distribution and task scheduling. Experimental results show a significant performance improvement for our approach over Hadoop-based solutions.
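The abstract does not give the algorithm itself, so the following is only a minimal sketch of how simulated annealing could be applied to this kind of problem, not the authors' formulation. It assumes that each pairwise comparison is assigned to exactly one node and that a node must hold both data items of every comparison it runs. It also assumes a hypothetical cost: the peak number of items stored on any node plus a weighted spread in per-node task counts. The names `cost`, `anneal`, `weight`, `t_start`, `t_end` and `alpha` are illustrative assumptions.

```python
import itertools
import math
import random


def cost(assign, tasks, n_nodes, weight=1.0):
    """Assumed cost: peak per-node storage plus weighted task-count spread."""
    stored = [set() for _ in range(n_nodes)]
    load = [0] * n_nodes
    for task, node in zip(tasks, assign):
        stored[node].update(task)  # both items of a pair must be local to the node
        load[node] += 1
    return max(len(s) for s in stored) + weight * (max(load) - min(load))


def anneal(n_items, n_nodes, t_start=5.0, t_end=0.01, alpha=0.995, seed=0):
    """Assign every pair of items to a node using simulated annealing."""
    rng = random.Random(seed)
    tasks = list(itertools.combinations(range(n_items), 2))  # all-to-all pairs
    current = [rng.randrange(n_nodes) for _ in tasks]        # random start
    current_cost = cost(current, tasks, n_nodes)
    best, best_cost = current[:], current_cost
    temp = t_start
    while temp > t_end:
        # Neighbour move: reassign one randomly chosen comparison task.
        k = rng.randrange(len(tasks))
        old_node = current[k]
        current[k] = rng.randrange(n_nodes)
        new_cost = cost(current, tasks, n_nodes)
        delta = new_cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current_cost = new_cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
        else:
            current[k] = old_node  # reject the move and restore the old node
        temp *= alpha              # geometric cooling schedule
    return tasks, best, best_cost
```

Calling `anneal(8, 4)` returns the list of item pairs, the node chosen for each pair, and the cost of that assignment. Because each comparison runs on a node that already stores both of its items, data locality holds by construction in this sketch, and the cost term trades storage use against load balance.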


Simulated Annealing; Load Balance; Data Item; Task Schedule; Task Allocation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Author J. Gao would like to acknowledge the support from the National Natural Science Foundation of China under the Grant Number 61462070, and the Inner Mongolia Government under the Science and Technology Plan Grant Number 20130364.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Yi-Fan Zhang (1)
  • Yu-Chu Tian (1, corresponding author)
  • Wayne Kelly (1)
  • Colin Fidge (1)
  • Jing Gao (2)

  1. School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane, Australia
  2. College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
