Additive multiple k-mer transcriptome of the keelworm Pomatoceros lamarckii (Annelida; Serpulidae) reveals annelid trochophore transcription factor cassette
Recent advances in both next-generation sequencing and assembly programmes have made it feasible to construct transcriptome datasets for non-model species at low cost, yielding a wealth of information even for genes transcribed at low levels. Here we present the results of assemblies performed on a 51-bp paired-end Illumina dataset derived from a mixed larval sample of the annelid Pomatoceros lamarckii at 24, 48 and 72 h post-fertilization. We used Oases to assemble 36.5 million paired-end reads with k-mer sizes from 21 to 29, followed by amalgamation of the assemblies, redundancy removal with Vmatch and TGICL, and removal of contigs less than 500 bp in length. This resulted in a final assembly of 50,151 contigs, with a mean length of 1,221 bp and covering 61.3 Mbp. A total of 34,846 (69.4 %) of these contigs returned a BlastX hit above a cutoff of 1.0e−3, and 17,967 (35.8 %) were assigned at least one GO annotation using Blast2GO. We used the assembly to identify genes belonging to the homeobox superclass and the Fox, Sox and Tbx classes, recovering 37, 16, 4 and 3 genes, respectively. These included orthologues of genes previously unidentified in lophotrochozoans and protostomes. Our study illustrates the utility of such transcriptomic assembly methods as a gene discovery tool and greatly expands our knowledge of transcription factor genes in annelids in general and in this species in particular.
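The final filtering and summary step described above (discarding contigs shorter than 500 bp, then reporting contig count, mean length and total span) can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the function names `parse_fasta` and `filter_and_summarize` are hypothetical, and the input is assumed to be the merged, redundancy-reduced assembly in plain FASTA format.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_and_summarize(
    records: Iterable[Tuple[str, str]], min_len: int = 500
) -> Tuple[List[Tuple[str, str]], Dict[str, float]]:
    """Keep contigs of at least min_len bp and compute summary statistics.

    min_len=500 mirrors the cutoff stated in the abstract; everything else
    here is an assumption made for illustration.
    """
    kept = [(name, seq) for name, seq in records if len(seq) >= min_len]
    total_bp = sum(len(seq) for _, seq in kept)
    n = len(kept)
    stats = {
        "contigs": n,
        "total_bp": total_bp,
        "mean_len": total_bp / n if n else 0.0,
    }
    return kept, stats


if __name__ == "__main__":
    # Toy input: three contigs of 600, 499 and 500 bp.
    toy = [">a", "A" * 300, "A" * 300, ">b", "C" * 499, ">c", "G" * 500]
    kept, stats = filter_and_summarize(parse_fasta(toy))
    print([name for name, _ in kept], stats)
```

For a real assembly, the same functions could be applied to an open file handle in place of the toy list; an established parser such as Biopython's `SeqIO` would be a reasonable substitute for the hand-written `parse_fasta`.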
Keywords: Transcriptome; Pomatoceros lamarckii; Annelid; Hox; Sox; Fox; T-box
Sox: Sry-related HMG box
We thank the members of the Shimeld and Holland groups for their help and support in preparing this manuscript and two anonymous reviewers for their comments and suggestions. Sequencing was performed by the High-Throughput Genomics unit at the Wellcome Trust Centre for Human Genetics, Oxford. Supercomputing support was provided by the Oxford Supercomputing Center (http://www.oerc.ox.ac.uk/). We thank the Elizabeth Hannah Jenkinson Fund, which funded the sequencing, and the Clarendon Fund, which supported NJK in the course of this project.
- Andrews S (2011) FastQC—a quality control tool for high throughput sequence data. Babraham Bioinformatics. http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
- Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Statist Soc Ser B (Methodol) 57(1):289–300
- Brusca R, Brusca G (2002) Invertebrates, 2nd edn. Sinauer, Sunderland
- Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotech 29(7):644–652
- JGI genome website http://genome.jgi-psf.org/
- Kurtz S (2011) The Vmatch large scale sequence analysis software. http://www.vmatch.de/
- Robertson G, Schein J, Chiu R, Corbett R, Field M, Jackman SD, Mungall K, Lee S, Okada HM, Qian JQ, Griffith M, Raymond A, Thiessen N, Cezard T, Butterfield YS, Newsome R, Chan SK, She R, Varhol R, Kamoh B, Prabhu A-L, Tam A, Zhao Y, Moore RA, Hirst M, Marra MA, Jones SJM, Hoodless PA, Birol I (2010) De novo assembly and analysis of RNA-seq data. Nat Meth 7(11):909–912
- Schulz MH, Zerbino DR, Vingron M, Birney E (2012) Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics. doi: 10.1093/bioinformatics/bts094
- Segrove F (1941) The development of the Serpulid Pomatoceros triqueter L. Q J Microsc Sci 82:467–540
- Zerbino DR (2010) Using the Velvet de novo assembler for short-read sequencing technologies. Curr Protoc Bioinformatics Chapter 11:Unit 11 15