The Journal of Supercomputing, Volume 62, Issue 3, pp 1305–1317

Analysis and improvement of map-reduce data distribution in read mapping applications

  • A. Espinosa (Email author)
  • P. Hernandez
  • J. C. Moure
  • J. Protasio
  • A. Ripoll

Abstract

The map-reduce paradigm has proven to be a simple and feasible way of filtering and analyzing large data sets in cloud and cluster systems. Algorithms designed for the paradigm must implement regular data distribution patterns to ensure appropriate use of resources. Good scalability and performance of Map-Reduce applications depend greatly on the design of regular intermediate data generation-consumption patterns in the map and reduce phases. We describe the data distribution patterns found in current Map-Reduce read mapping bioinformatics applications and present some data decomposition principles that greatly improve their scalability and performance.

Keywords

Bioinformatics, Read mapping, Map reduce, Scalability

Notes

Acknowledgements

We thank Eduard Ayguade, David Carrera and the staff at Barcelona Supercomputing Center (BSC) for their help and support in using the IBM Blade computer cluster.

This paper was supported by Consolider Project CSD2007-00050 of the Spanish Ministerio de Ciencia y Tecnologia.

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • A. Espinosa (1), Email author
  • P. Hernandez (1)
  • J. C. Moure (1)
  • J. Protasio (1)
  • A. Ripoll (1)

  1. Computer Architecture and Operating Systems Department, Universitat Autonoma de Barcelona, Bellaterra, Spain
