
Low-complexity and highly robust barcodes for error-rich single molecular sequencing

  • Original Article
3 Biotech

Abstract

DNA barcodes are frequently corrupted by insertion, deletion, and substitution errors introduced during DNA synthesis, amplification, and sequencing, resulting in index hopping. In this paper, we propose a new DNA barcode construction scheme that combines a cyclic block code with a predetermined pseudo-random sequence bit by bit to form bit pairs and then converts the bit pairs to bases, which form the DNA barcode. We then present a barcode identification scheme for noisy sequencing reads that uses a combination of cyclic shifting and traditional dynamic programming to mark the insertion and deletion positions and then performs erasure-and-error-correction decoding on the corrupted codewords. Furthermore, we verify the identification error rate of the barcodes under multiple errors and evaluate their reliability in a DNA context. The method generalizes easily to the construction of long barcodes, which may be used in scenarios with severe errors. Simulation results show that the bit error rate after identifying insertions/deletions is greatly reduced when cyclic shifting is combined with dynamic programming, compared with dynamic programming alone, indicating that the proposed method effectively improves the accuracy of insertion/deletion estimation. The overall identification error rate of the proposed method is lower than \(10^{-5}\) when the per-base mutation probability is less than 0.1, which is the typical scenario in third-generation sequencing.
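The construction step described in the abstract (cyclic codeword plus pseudo-random sequence, paired bit by bit and mapped to bases) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the (7,4) cyclic code with generator polynomial \(g(x) = 1 + x + x^3\), the fixed sequence `PRS`, the table `PAIR_TO_BASE`, and all function names. The paper's actual code parameters, sequence, and mapping may differ.

```python
from itertools import product

# Assumption: a (7,4) binary cyclic code with generator polynomial
# g(x) = 1 + x + x^3 (coefficients listed lowest degree first).
G_POLY = [1, 1, 0, 1]
N, K = 7, 4

# Assumption: an arbitrary fixed "pseudo-random" sequence of length N.
PRS = [1, 0, 0, 1, 0, 1, 1]

# Assumption: mapping from (code bit, pseudo-random bit) to a base.
PAIR_TO_BASE = {(0, 0): "A", (0, 1): "C", (1, 0): "G", (1, 1): "T"}


def encode_cyclic(message):
    """Non-systematic cyclic encoding: c(x) = m(x) * g(x) over GF(2)."""
    codeword = [0] * N
    for i, m_bit in enumerate(message):
        if m_bit:
            for j, g_bit in enumerate(G_POLY):
                codeword[i + j] ^= g_bit
    return codeword


def to_barcode(codeword):
    """Pair each code bit with the sequence bit at the same position, then map to bases."""
    return "".join(PAIR_TO_BASE[(c, p)] for c, p in zip(codeword, PRS))


def cyclic_shift(bits, s):
    """Rotate a bit list to the right by s positions."""
    s %= len(bits)
    return bits[-s:] + bits[:-s] if s else list(bits)


# One barcode per 4-bit message: 16 barcodes of 7 bases each.
barcodes = [to_barcode(encode_cyclic(list(m))) for m in product([0, 1], repeat=K)]
```

Because the sequence bit at each position is the same for every barcode, two barcodes differ at exactly the positions where their codewords differ, so in this sketch the distance properties of the cyclic code carry over to the base sequences. The `cyclic_shift` helper only demonstrates the defining property of a cyclic code (every rotation of a codeword is again a codeword); it is not the paper's identification procedure.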




Acknowledgements

We thank the National Natural Science Foundation of China (61671324) and the Seed Foundation of Tianjin University (2019XZY-0038, 2019XYF-0005).

Author information

Contributions

W.C. designed the study. W.C., P.W., L.W., D.Z., and M.H. performed bioinformatic analyses. P.W. and L.W. performed the simulations and wrote the manuscript. L.W. and M.H. validated the results. W.C., D.Z., M.H., and L.S. supervised the results and revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Weigang Chen.

Ethics declarations

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

About this article

Cite this article

Chen, W., Wang, P., Wang, L. et al. Low-complexity and highly robust barcodes for error-rich single molecular sequencing. 3 Biotech 11, 78 (2021). https://doi.org/10.1007/s13205-020-02607-5

  • DOI: https://doi.org/10.1007/s13205-020-02607-5
