Supercomputing of reducing sequenced bases in de novo sequencing of the human genome

The Journal of Supercomputing

Abstract

DNA sequencing is an important sub-discipline of bioinformatics, with applications in medicine, history, demography, and archaeology. De novo sequencing is the most challenging problem in this field. It is used to sequence previously unknown genomes and unknown parts of a genome, such as those in cancer cells. To assemble a genome, a sequencing machine first sequences small fragments of the genome (called reads) that are located at random positions on the genome. The reads are then sent to a processing machine, which places them along the genome. To sequence the whole genome, the reads must cover it entirely. The minimum number of reads needed to cover the genome with high probability is given by the Lander–Waterman coverage bound. In this paper, we generalize the latter scheme to de novo sequencing and reduce the total number of bases required by the Lander–Waterman coverage bound. We evaluate the scheme in terms of the longest generated contig length, the execution time of the algorithm, and the probability of error in genome assembly, for different read lengths. We report the computational complexity and the parallel execution time of the algorithm on a 50,000-base region of the human genome. We also show that the proposed method can generate contigs spanning 90 percent of the genome length.
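The abstract refers to the Lander–Waterman coverage bound without stating it. As a minimal sketch, assuming the standard Poisson approximation of the Lander–Waterman model (reads of length L placed uniformly at random on a genome of length G, zero minimum overlap), the coverage depth is c = NL/G, the expected number of coverage gaps is roughly N·e^(−c), and covering the genome with probability about 1 − ε requires the smallest N with N·e^(−NL/G) ≤ ε. A small Monte Carlo simulation then places reads at random on a 50,000-base sequence (the region size mentioned in the abstract) and reports the covered fraction and the longest uninterrupted covered run, an optimistic proxy for contig length. The function names, the read length of 100 and ε = 0.01 are illustrative assumptions, not values from the paper, and the code does not implement the authors' reduced-base scheme.

```python
import math
import random


def lw_expected_gaps(n_reads: int, read_len: int, genome_len: int) -> float:
    """Expected number of coverage gaps under the Poisson (Lander-Waterman) model.

    Coverage depth c = N*L/G. With zero minimum overlap, a read is the last
    read of its island with probability ~exp(-c), so E[gaps] ~ N*exp(-c).
    """
    c = n_reads * read_len / genome_len
    return n_reads * math.exp(-c)


def lw_min_reads(read_len: int, genome_len: int, eps: float) -> int:
    """Smallest N, scanning up from G // L, with E[gaps] <= eps.

    P(full coverage) ~ exp(-E[gaps]) ~ 1 - eps. N*exp(-N*L/G) peaks at
    N = G/L (c = 1) and decreases afterwards; the approximation is only
    meaningful in that regime, so the scan starts there.
    """
    n = max(1, genome_len // read_len)
    while lw_expected_gaps(n, read_len, genome_len) > eps:
        n += 1
    return n


def simulate_coverage(n_reads: int, read_len: int, genome_len: int, seed: int = 0):
    """Place reads uniformly at random on a linear sequence (read_len <= genome_len).

    Returns (covered fraction, longest uninterrupted covered run). The longest
    run is an optimistic proxy for the longest contig: it assumes zero minimum
    overlap and that every overlap between reads is detected without error.
    """
    rng = random.Random(seed)
    covered = bytearray(genome_len)  # 1 = base covered by at least one read
    for _ in range(n_reads):
        start = rng.randrange(genome_len - read_len + 1)
        covered[start:start + read_len] = b"\x01" * read_len
    longest = run = 0
    for base in covered:
        run = run + 1 if base else 0
        longest = max(longest, run)
    return sum(covered) / genome_len, longest


if __name__ == "__main__":
    G, L, EPS = 50_000, 100, 0.01  # illustrative values (assumed, not from the paper)
    n_lw = lw_min_reads(L, G, EPS)
    print(f"Poisson/LW estimate: {n_lw} reads, {n_lw * L} sequenced bases")
    for n in (n_lw // 4, n_lw):
        frac, longest = simulate_coverage(n, L, G)
        print(f"N={n}: covered {frac:.4f} of G, longest covered run {longest} bases")
```

Running it prints the estimated number of reads and the total number of sequenced bases N·L for the chosen parameters. The simulated covered fraction falls slightly short of 1 even at the estimated depth because bases near the two ends of a linear sequence can be covered by fewer read positions than interior bases.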

Acknowledgements

The authors would like to thank the referees for their valuable comments. We would also like to express our gratitude for the support provided by Isfahan University, Isfahan, Iran.

Funding

The authors received no financial support for the research, authorship, and publication of this article.

Author information

Corresponding author

Correspondence to Ashkan Farazin.

Ethics declarations

Conflict of interest

No conflict of interest exists in the submission of this article, and the article was approved by all the authors.

Ethics approval

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Kavezadeh, S., Farazin, A. & Hosseinzadeh, A. Supercomputing of reducing sequenced bases in de novo sequencing of the human genome. J Supercomput 78, 14769–14793 (2022). https://doi.org/10.1007/s11227-022-04449-9

  • DOI: https://doi.org/10.1007/s11227-022-04449-9