Inline index helped in cleaning up data contamination generated during library preparation and the subsequent steps

  • Original Article
  • Molecular Biology Reports

Abstract

Background

High-throughput sequencing involves library preparation and amplification steps, which may introduce contamination across samples or between samples and the environment.

Methods

We tested whether an inline-index strategy, in which 6-bp DNA indices are added to both ends of each insert at the ligation step of library preparation, can resolve the data contamination problem.
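The idea of filtering on inline indices can be illustrated with a minimal sketch. It is not the authors' code. The function name `filter_pair`, the example sequences, and the assumption that each read of a pair begins with the 6-bp inline index of its end of the insert are all hypothetical, chosen only to show the principle: a read pair is kept only when both ends carry the indices assigned to the sample, and the indices are then trimmed off.

```python
INDEX_LEN = 6  # inline index length given in the Methods (6 bp)


def filter_pair(read1, read2, expected):
    """Keep a read pair only if both ends start with the expected inline indices.

    `expected` is a hypothetical (index_for_read1, index_for_read2) tuple.
    Returns the two reads with their indices trimmed, or None if either
    end carries a different index (treated here as possible contamination).
    """
    idx1, idx2 = read1[:INDEX_LEN], read2[:INDEX_LEN]
    if (idx1, idx2) != tuple(expected):
        return None
    return read1[INDEX_LEN:], read2[INDEX_LEN:]


# Toy example with made-up sequences: both ends match, so the pair is kept.
kept = filter_pair("ACGTAC" + "TTTTGGGG", "ACGTAC" + "CCCCAAAA", ("ACGTAC", "ACGTAC"))
# One end carries a different index, so the pair is rejected.
rejected = filter_pair("GGATCC" + "TTTTGGGG", "ACGTAC" + "CCCCAAAA", ("ACGTAC", "ACGTAC"))
```

Exact matching is used here for simplicity; a real pipeline might tolerate a mismatch in the index, which is a design choice the abstract does not specify.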

Results

Our results showed that the contamination ranged from 0.29 to 1.25% in one experiment and from 0.83 to 27.01% in the other. We also found that contamination could originate from the environment or from reagents, in addition to cross-contamination between samples.
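A contamination percentage of this kind could be computed along the following lines. This is a hedged sketch, not the authors' definition: it assumes contamination is measured as the share of reads in a library whose inline index differs from the expected one, and it assumes that a foreign index belonging to another sample in the same batch indicates cross-contamination, while any other index indicates an environmental or reagent source. The function name `summarize_contamination` and the toy data are hypothetical.

```python
def summarize_contamination(observed, expected, batch_indices):
    """Summarize contamination from the inline indices observed in one library.

    observed      -- list of inline indices, one per read (toy input)
    expected      -- index assigned to this library
    batch_indices -- set of all indices used in the batch (assumption)
    Returns percentages of reads from other samples and from unknown sources.
    """
    total = len(observed)
    if total == 0:
        raise ValueError("no reads to summarize")
    foreign = [idx for idx in observed if idx != expected]
    cross = sum(1 for idx in foreign if idx in batch_indices)
    other = len(foreign) - cross
    return {
        "cross_sample_pct": 100.0 * cross / total,
        "other_source_pct": 100.0 * other / total,
    }


# Made-up data: 96 reads with the expected index, 3 with another sample's
# index, and 1 with an index not used in the batch.
reads = ["AAAAAA"] * 96 + ["CCCCCC"] * 3 + ["GGGGGG"]
summary = summarize_contamination(reads, "AAAAAA", {"AAAAAA", "CCCCCC"})
```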

Conclusions

The inline-index method is a useful experimental design for cleaning up data and addressing the contamination problem that has plagued high-throughput sequencing data in many applications.

Data availability

The raw reads have been uploaded to NCBI under BioProject accession numbers PRJNA749868 and PRJNA750567.

Code availability

The custom code can be found in the online version of this article.

Acknowledgements

We are grateful to Ms. Lifang Peng for her help in collecting water samples.

Funding

This work was supported by the Science and Technology Commission of Shanghai Municipality (19050501900) and the Shanghai Academy of Environmental Sciences.

Author information

Contributions

YW and CL conceived the research plan. YW and JH performed the experiments and collected the data. YW, HY and JH analyzed the data. YW and CL wrote the draft. All authors edited and approved the final version of the manuscript.

Corresponding author

Correspondence to Chenhong Li.

Ethics declarations

Conflict of interest

CL, YW and HY are authors on a patent applied for by Shanghai Ocean University that covers the inline index technology (201811406204.X), and the remaining author (JH) has no financial conflict of interest.

Ethical approval

We followed the guidelines approved by the Ethical Committee of Shanghai Ocean University, China.

Consent to participate

All authors consent to participate.

Consent for publication

All authors consent for publication.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (DOC, 44 KB)

About this article

Cite this article

Wang, Y., Yuan, H., Huang, J. et al. Inline index helped in cleaning up data contamination generated during library preparation and the subsequent steps. Mol Biol Rep 49, 385–392 (2022). https://doi.org/10.1007/s11033-021-06884-y

  • DOI: https://doi.org/10.1007/s11033-021-06884-y
