Sequencing barcode construction and identification methods based on block error-correction codes

  • Research Paper
  • Science China Life Sciences

Abstract

Multiplexed sequencing relies on sample-specific labels, called barcodes, to tag DNA fragments from different samples and to separate the sequencer output by sample. However, barcodes are often corrupted by insertion, deletion and substitution errors introduced during sequencing, which can lead to sample misassignment. In this paper, we propose a barcode construction method that combines a block error-correction code with a predetermined pseudorandom sequence to generate a base sequence for labeling each sample. To identify corrupted barcodes and assign reads to their respective samples, we also present a soft-decision identification method consisting of inner decoding and outer decoding. The inner decoder builds a hidden Markov model (HMM) that uses the pseudorandom sequence to estimate base insertions and deletions, and adapts the forward-backward (FB) algorithm to output soft information for each bit of the block code. The outer decoder uses this soft information to perform soft-decision decoding, which corrects multiple errors in the barcodes. Simulation results show that the proposed methods remain robust at high insertion, deletion and substitution error rates in the barcodes. In addition, compared with the inner decoding algorithm of watermark-based barcodes, the proposed inner decoding algorithm greatly reduces decoding complexity.
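The construction and the outer decoding step can be illustrated with a small sketch. It is not the paper's implementation: the abstract does not name the block code, the pseudorandom sequence, or the bit-to-base mapping, so the extended Hamming (8,4) code, the seeded generator, the XOR combination, the two-bits-per-base mapping, and all function names below are assumptions chosen for illustration. The HMM-based inner decoder is omitted; the soft decoder simply assumes per-bit probabilities are already available and picks the most likely codeword by exhaustive search, which is a stand-in for the soft-decision decoding used in the paper.

```python
import math
import random

# ASSUMPTION: extended Hamming (8,4) code in systematic form (minimum distance 4).
G = [
    [1, 0, 0, 0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 0, 1],
    [0, 0, 0, 1, 1, 1, 1, 0],
]
BASES = "ACGT"  # ASSUMPTION: 00->A, 01->C, 10->G, 11->T


def index_to_bits(sample):
    """Write a sample index in 0..15 as 4 message bits, most significant first."""
    return [(sample >> k) & 1 for k in (3, 2, 1, 0)]


def encode(msg):
    """Multiply the 4-bit message by G over GF(2) to get an 8-bit codeword."""
    return [sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G)]


def pseudorandom_bits(n, seed=2020):
    """Fixed pseudorandom sequence shared by encoder and decoder (hypothetical seed)."""
    rng = random.Random(seed)
    return [rng.randrange(2) for _ in range(n)]


def bits_to_bases(bits):
    """Map consecutive bit pairs to one base each."""
    return "".join(BASES[2 * hi + lo] for hi, lo in zip(bits[0::2], bits[1::2]))


def bases_to_bits(seq):
    """Inverse of bits_to_bases."""
    bits = []
    for base in seq:
        value = BASES.index(base)
        bits += [value >> 1, value & 1]
    return bits


def build_barcode(sample, prs):
    """Encode the sample index, XOR with the pseudorandom sequence, map to bases."""
    masked = [c ^ p for c, p in zip(encode(index_to_bits(sample)), prs)]
    return bits_to_bases(masked)


def soft_decode(p_one):
    """Return the sample whose codeword is most likely under per-bit soft information.

    p_one[i] is the probability that codeword bit i equals 1 (strictly between
    0 and 1), as an inner decoder might provide after removing the pseudorandom
    sequence. Exhaustive search over all 16 codewords.
    """
    best_sample, best_ll = None, -math.inf
    for sample in range(16):
        codeword = encode(index_to_bits(sample))
        ll = sum(math.log(p if c else 1.0 - p) for c, p in zip(codeword, p_one))
        if ll > best_ll:
            best_sample, best_ll = sample, ll
    return best_sample


PRS = pseudorandom_bits(8)
BARCODES = [build_barcode(s, PRS) for s in range(16)]
```

With these choices each of the 16 samples receives a distinct 4-base barcode, and `soft_decode` recovers the sample index even when the soft information for one bit points the wrong way. A practical design would use a longer code and the paper's HMM inner decoder to handle insertions and deletions, which this sketch does not model.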

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (61671324) and Seed Foundation of Tianjin University (2019XZY-0038, 2019XYF-0005).

Author information

Corresponding author

Correspondence to Weigang Chen.

Additional information

Compliance and ethics

The authors declare that they have a Chinese patent.

About this article

Cite this article

Chen, W., Wang, L., Han, M. et al. Sequencing barcode construction and identification methods based on block error-correction codes. Sci. China Life Sci. 63, 1580–1592 (2020). https://doi.org/10.1007/s11427-019-1651-3

  • DOI: https://doi.org/10.1007/s11427-019-1651-3