Quantitative Biology, Volume 6, Issue 4, pp 359–368

WaveNano: a signal-level nanopore base-caller via simultaneous prediction of nucleotide labels and move labels through bi-directional WaveNets

  • Sheng Wang
  • Zhen Li
  • Yizhou Yu
  • Xin Gao
Methodology Article

Abstract

Background

The Oxford MinION nanopore sequencer is a third-generation genome sequencing device that has recently attracted attention because it is portable and no larger than a cellphone. Although MinION can sequence ultra-long reads in real time, the high error rate of existing base-calling methods, especially for indels (insertions and deletions), prevents its use in a variety of applications.

Methods

In this paper, we show that such indel errors are largely due to the segmentation process applied to the input electrical current signal from MinION. All existing methods conduct segmentation and nucleotide label prediction sequentially, so errors accumulated in the first step irreversibly influence the final base-calling. We further show that the indel issue can be significantly reduced by accurately assigning nucleotide labels and move labels directly from the raw signal; the two label types can then be learned simultaneously and efficiently by a bi-directional WaveNet model through feature sharing. Our bi-directional WaveNet model, with residual blocks and skip connections, is able to capture the extremely long dependencies in the raw signal. Taking the predicted moves as segmentation guidance, we employ Viterbi decoding to obtain the final base-calling results from the smoothed nucleotide probability matrix.
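
The decoding step described above can be illustrated with a minimal sketch. It is not the WaveNano implementation: the function name `decode_with_moves`, the input layout (one row of four nucleotide probabilities per signal sample, plus one move probability per sample), and the transition rule (staying keeps the current nucleotide, moving may start any nucleotide) are all assumptions made for illustration. The nucleotide probabilities are assumed to be already smoothed.

```python
import math

NUCLEOTIDES = "ACGT"


def decode_with_moves(nuc_probs, move_probs, eps=1e-9):
    """Viterbi-style decoding guided by move probabilities (illustrative only).

    nuc_probs:  list of T rows, each with 4 probabilities in A, C, G, T order.
    move_probs: list of T probabilities that a new base starts at that sample
                (the first entry is ignored, since sample 0 always starts a base).
    """
    n_samples = len(nuc_probs)
    if n_samples == 0:
        return ""

    def log(p):
        # Clamp to avoid log(0) for hard 0/1 probabilities.
        return math.log(max(p, eps))

    # score[k] = best log-probability of any path ending in nucleotide k.
    score = [log(p) for p in nuc_probs[0]]
    back = []  # back[t - 1][k] = (previous nucleotide index, moved?)

    for t in range(1, n_samples):
        log_move = log(move_probs[t])
        log_stay = log(1.0 - move_probs[t])
        best_prev = max(range(4), key=lambda j: score[j])
        new_score, pointers = [], []
        for k in range(4):
            stay = score[k] + log_stay          # same base continues
            move = score[best_prev] + log_move  # a new base begins
            if stay >= move:
                new_score.append(stay + log(nuc_probs[t][k]))
                pointers.append((k, False))
            else:
                new_score.append(move + log(nuc_probs[t][k]))
                pointers.append((best_prev, True))
        score = new_score
        back.append(pointers)

    # Backtrace: emit one base per move, plus the base that starts the read.
    state = max(range(4), key=lambda k: score[k])
    bases = []
    for t in range(n_samples - 1, 0, -1):
        prev, moved = back[t - 1][state]
        if moved:
            bases.append(NUCLEOTIDES[state])
        state = prev
    bases.append(NUCLEOTIDES[state])
    return "".join(reversed(bases))
```

Because a move is allowed to re-enter the same nucleotide, a run of identical bases can still be separated when the move probability spikes inside it. This is one way a move label can help where probability peaks alone would merge a homopolymer into a single base.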

Results

Our proposed base-caller, WaveNano, achieves good performance on real MinION sequencing data from Lambda phage.

Conclusions

The signal-level nanopore base-caller WaveNano achieves higher base-calling accuracy and generates fewer insertions/deletions in the base-called sequences.

Keywords

nanopore sequencing; bi-directional WaveNets; base-calling; third generation sequencing; deep learning

Notes

Acknowledgements

We thank Minh Duc Cao and Lachlan J. M. Coin for providing the nanopore sequencing data for the Lambda phage sample. We thank Haotian Teng for helpful discussions. This work was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award Nos. FCC/1/1976-04, URF/1/2601-01, URF/1/3007-01, URF/1/3412-01 and URF/1/3450-01.

Copyright information

© Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
  2. Department of Computer Science, University of Hong Kong, Hong Kong SAR, China
  3. School of Science and Engineering, Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, China
