Skip to main content
Log in

A differential k-mer analysis pipeline for comparing RNA-Seq transcriptome and meta-transcriptome datasets without a reference

  • Original Article
  • Published:
Functional & Integrative Genomics Aims and scope Submit manuscript

Abstract

Next-generation DNA sequencing technologies, such as RNA-Seq, currently dominate genome-wide gene expression studies. A standard approach to analyse this data requires mapping sequence reads to a reference and counting the number of reads which map to each gene. However, for many transcriptome studies, a suitable reference genome is unavailable, especially for meta-transcriptome studies which assay gene expression from mixed populations of organisms. Where a reference is unavailable, it is possible to generate a reference by the de novo assembly of the sequence reads. However, the high cost of generating high-coverage data for de novo assembly hinders this approach and more importantly the accurate assembly of such data is challenging, especially for meta-transcriptome data, and resulting assemblies frequently suffer from collapsed regions or chimeric sequences. As an alternative to the standard reference mapping approach, we have developed a k-mer-based analysis pipeline (DiffKAP) to identify differentially expressed reads between RNA-Seq datasets without the requirement for a reference. We compared the DiffKAP approach with the traditional Tophat/Cuffdiff method using RNA-Seq data from soybean, which has a suitable reference genome. We subsequently examined differential gene expression for a coral meta-transcriptome where no reference is available, and validated the results using qRT-PCR. We conclude that DiffKAP is an accurate method to study differential gene expression in complex meta-transcriptomes without the requirement of a reference genome.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Similar content being viewed by others

Availability of data and materials

All DiffKAP resources are available at: http://www.appliedbioinformatics.com.au/index.php/DiffKAP

References

Download references

Acknowledgements

Support from the Australian Genome Research Facility (AGRF), the Queensland Cyber Infrastructure Foundation (QCIF) and the Australian Partnership for Advanced Computing (APAC) is gratefully acknowledged.

Funding

The authors received financial support from the Australian Research Council (Projects LP0882095, LP0883462 and DP0985953).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to David Edwards.

Electronic supplementary material

ESM 1

(XLSX 5761 kb)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Chan, CK.K., Rosic, N., Lorenc, M.T. et al. A differential k-mer analysis pipeline for comparing RNA-Seq transcriptome and meta-transcriptome datasets without a reference. Funct Integr Genomics 19, 363–371 (2019). https://doi.org/10.1007/s10142-018-0647-3

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10142-018-0647-3

Keywords

Navigation