Application of Simulated Annealing to Data Distribution for All-to-All Comparison Problems in Homogeneous Systems

  • Conference paper
Neural Information Processing (ICONIP 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9491)

Abstract

Distributed systems are widely used to solve large-scale, data-intensive computing problems, including all-to-all comparison (ATAC) problems, in which every pair of data items in a data set must be compared. However, when applied to ATAC problems, existing computational frameworks such as Hadoop focus on load balancing when allocating comparison tasks, without careful consideration of data distribution and storage usage. Although Hadoop-based solutions are simple to implement, their inherent MapReduce computing pattern does not match the ATAC pattern. As a result, Hadoop's data distribution strategy leads to load imbalance and poor data locality when it is used for ATAC problems. Here we present a data distribution strategy that accounts for data locality, load balancing, and storage savings for ATAC computing problems in homogeneous distributed systems. A simulated annealing algorithm is developed for data distribution and task scheduling. Experimental results show a significant performance improvement of our approach over Hadoop-based solutions.
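The abstract does not state the paper's exact model, so the following is only a minimal sketch of the general idea under assumed simplifications: n data files, m identical nodes that each store a fixed number of files, one comparison task per file pair, and a task counted as local only when a single node holds both of its files. The objective (a large penalty per non-local task plus the busiest node's task count), the greedy scheduler, the single-file swap move, the geometric cooling schedule, and all function names (`comparison_tasks`, `schedule`, `cost`, `neighbour`, `anneal`) are hypothetical illustrations of standard simulated annealing, not the authors' algorithm.

```python
import itertools
import math
import random


def comparison_tasks(n_files):
    """All unordered pairs (i, j) with i < j: the n(n-1)/2 tasks of an ATAC run."""
    return list(itertools.combinations(range(n_files), 2))


def schedule(dist, tasks):
    """Assign each task to the least-loaded node that stores both of its files.

    Returns (tasks per node, number of tasks that no single node can run locally).
    """
    load = [0] * len(dist)
    uncovered = 0
    for i, j in tasks:
        hosts = [k for k, files in enumerate(dist) if i in files and j in files]
        if not hosts:
            uncovered += 1
            continue
        k = min(hosts, key=lambda h: load[h])  # ties go to the lowest node index
        load[k] += 1
    return load, uncovered


def cost(dist, tasks, penalty=1000):
    """Hypothetical objective: penalise non-local tasks, then the busiest node's load."""
    load, uncovered = schedule(dist, tasks)
    return penalty * uncovered + max(load)


def neighbour(dist, n_files, rng):
    """Swap one file on a random node for a file it lacks (storage per node is unchanged)."""
    new = [set(files) for files in dist]
    k = rng.randrange(len(new))
    missing = [f for f in range(n_files) if f not in new[k]]
    if missing and new[k]:
        new[k].remove(rng.choice(sorted(new[k])))
        new[k].add(rng.choice(missing))
    return new


def anneal(n_files, n_nodes, capacity, t0=10.0, alpha=0.995, steps=2000, seed=0):
    """Simulated annealing over file-to-node distributions with geometric cooling."""
    rng = random.Random(seed)
    tasks = comparison_tasks(n_files)
    # Random start: every node stores `capacity` distinct files.
    current = [set(rng.sample(range(n_files), capacity)) for _ in range(n_nodes)]
    cur_cost = cost(current, tasks)
    best, best_cost = current, cur_cost
    t = t0
    for _ in range(steps):
        cand = neighbour(current, n_files, rng)
        c = cost(cand, tasks)
        # Metropolis rule: accept improvements; accept worse moves with prob exp(-delta/t).
        if c <= cur_cost or rng.random() < math.exp((cur_cost - c) / t):
            current, cur_cost = cand, c
            if c < best_cost:
                best, best_cost = cand, c
        t = max(t * alpha, 1e-6)  # geometric cooling with a floor to avoid division by zero
    return best, best_cost
```

For example, `anneal(n_files=6, n_nodes=3, capacity=4)` searches for a placement of 6 files on 3 nodes holding 4 files each. For this instance the best achievable cost is 5: all 15 pairs run locally, 5 per node, as with the placement `{0, 1, 2, 3}`, `{0, 1, 4, 5}`, `{2, 3, 4, 5}`. Fixing the number of files per node keeps storage use constant, so in this sketch the search trades off only locality against load balance.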

Acknowledgments

Author J. Gao would like to acknowledge the support from the National Natural Science Foundation of China under the Grant Number 61462070, and the Inner Mongolia Government under the Science and Technology Plan Grant Number 20130364.

Author information

Corresponding author

Correspondence to Yu-Chu Tian.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Zhang, YF., Tian, YC., Kelly, W., Fidge, C., Gao, J. (2015). Application of Simulated Annealing to Data Distribution for All-to-All Comparison Problems in Homogeneous Systems. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science, vol 9491. Springer, Cham. https://doi.org/10.1007/978-3-319-26555-1_77

  • DOI: https://doi.org/10.1007/978-3-319-26555-1_77

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26554-4

  • Online ISBN: 978-3-319-26555-1

  • eBook Packages: Computer Science, Computer Science (R0)