PAGANtec: OpenMP Parallel Error Correction for Next-Generation Sequencing Data

  • Markus Joppich
  • Dirk Schmidl
  • Anthony M. Bolger
  • Torsten Kuhlen
  • Björn Usadel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9342)


Next-generation sequencing techniques have rapidly reduced the cost of sequencing a genome, but they come with a relatively high error rate. Error correction of the data is therefore a necessary step before assembly can take place. Since the input data is huge and error correction is compute-intensive, parallelizing this work on a modern shared-memory system can help keep the runtime feasible. In this work we present PAGANtec, a tool for error correction of next-generation sequencing data based on the novel PAGAN graph structure. PAGANtec was parallelized with OpenMP, and its performance was analyzed and tuned. The analysis showed that OpenMP tasks are a more suitable paradigm for this work than traditional work-sharing.
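
To illustrate the two OpenMP paradigms contrasted in the abstract, the following C++ sketch processes a set of reads once with a work-sharing loop and once with one task per read. It is not taken from PAGANtec: `correct_read`, `make_skewed_reads`, `run_worksharing`, `run_tasks`, and `check_equivalence` are hypothetical names, the per-read workload is a placeholder, and the static schedule is an assumed baseline. Compile with OpenMP enabled (for example `-fopenmp` with GCC or Clang); without it the pragmas are ignored and both variants run serially with the same results.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for correcting one read. The cost grows with
// `difficulty`, so the work per read is deliberately uneven.
std::uint64_t correct_read(std::uint32_t difficulty) {
    std::uint64_t acc = 0;
    for (std::uint32_t i = 0; i < difficulty; ++i) {
        acc += (static_cast<std::uint64_t>(i) * 2654435761u) % 97;
    }
    return acc;
}

// Hypothetical input: most reads are cheap, every 16th read is expensive.
std::vector<std::uint32_t> make_skewed_reads(std::size_t n) {
    std::vector<std::uint32_t> reads(n, 100);
    for (std::size_t i = 0; i < n; i += 16) {
        reads[i] = 100000;
    }
    return reads;
}

// Variant 1: work-sharing loop. With a static schedule (assumed here),
// each thread receives a fixed block of iterations up front.
std::vector<std::uint64_t> run_worksharing(const std::vector<std::uint32_t>& reads) {
    std::vector<std::uint64_t> out(reads.size());
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < static_cast<long>(reads.size()); ++i) {
        out[i] = correct_read(reads[i]);
    }
    return out;
}

// Variant 2: one task per read. A single thread creates the tasks and
// the threads of the team execute them as they become free.
std::vector<std::uint64_t> run_tasks(const std::vector<std::uint32_t>& reads) {
    std::vector<std::uint64_t> out(reads.size());
    #pragma omp parallel
    #pragma omp single
    {
        for (std::size_t i = 0; i < reads.size(); ++i) {
            #pragma omp task firstprivate(i)
            out[i] = correct_read(reads[i]);
        }
    }  // the barrier ending the region guarantees all tasks have finished
    return out;
}

// Both variants must produce identical results; only the scheduling differs.
void check_equivalence(const std::vector<std::uint32_t>& reads) {
    assert(run_worksharing(reads) == run_tasks(reads));
}
```

With uneven per-read cost, a static loop schedule can leave some threads idle while others are still working on expensive reads, whereas tasks are picked up by whichever thread is free. Whether this is the load imbalance observed in PAGANtec is not stated in the abstract.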


Keywords (added by machine, not by the authors): Error Correction; Graph Structure; Transcriptome Assembly; Transactional Memory; Load Imbalance



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Markus Joppich (1, 2, 3), email author
  • Dirk Schmidl (1)
  • Anthony M. Bolger (2)
  • Torsten Kuhlen (1)
  • Björn Usadel (2)
  1. JARA – High-Performance Computing, IT Center, RWTH Aachen University, Aachen, Germany
  2. Institute for Botany and Molecular Genetics, RWTH Aachen University, Aachen, Germany
  3. Institute for Informatics, Ludwig-Maximilians-Universität Munich, Munich, Germany
