Marsupial Sequencing Projects and Bioinformatics Challenges

Marsupial Genetics and Genomics

Abstract

The arrival of next-generation (second-generation) sequencing has ushered in a new era of marsupial genomics, in which large-scale sequencing of marsupial transcriptomes, and perhaps soon genomes, is within the reach of many independent laboratories. This promises to reveal much about the biology of marsupial genomes and provides opportunities for comparison with eutherian genomes. These comparisons will highlight both the critical conserved features and the important differences where marsupials and eutherians have followed different evolutionary paths. Here we describe the current state of marsupial genomic sequencing projects and resources, including available genome and transcriptome sequences. We also survey a number of bioinformatics tools, particularly those that we have applied to marsupial, or sometimes monotreme, genomic data and found useful. Finally, we describe some of the challenges met in dealing with marsupial sequence data, most of it next-generation; we think this experience is also relevant to other non-model organisms.

Notes

  1. Sanger sequencing is capillary-based DNA sequencing that produces long (600–1,000 nt), high-quality reads.

  2. The Roche 454 is a next-generation sequencing platform that produces hundreds of thousands of reads 200–600 nt long.

  3. A transcriptome is the set of all transcripts, or expressed genes, in a tissue.

  4. The Illumina GA2 is a next-generation sequencing platform that produces millions of short reads (32–100 nt).

  5. RNA-seq is expression analysis based on sequencing transcripts and counting the reads that map to each gene.

  6. A flowgram is the signal intensity data from a Roche 454 sequencer. It is analogous to the chromatogram in Sanger sequencing. The signal intensity is proportional to the number of bases of the same type incorporated in each sequencing step.

  7. NCBI Reference Sequence collection (http://www.ncbi.nlm.nih.gov/refseq).

  8. The E-value is the number of hits expected by chance when searching a database of a particular size. Note that you should use proper scientific notation to write E-values in publications, not the common but dreadful computer shorthand (e.g. 1e-5).
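As an illustration of the last note, the conversion from computer shorthand to scientific notation is easy to automate. The sketch below is only one possible approach and is not taken from the chapter: the function name `format_evalue`, the `digits` parameter and the choice of Unicode superscripts are all assumptions made for this example. It also assumes that a bare exponent such as `e-105`, a form seen in some older BLAST reports, means 1 × 10⁻¹⁰⁵.

```python
# Hypothetical helper (not from the chapter): rewrite E-value shorthand
# such as "1e-5" in scientific notation for use in a manuscript.

# Map ASCII digits and the minus sign to Unicode superscript characters.
_SUPERSCRIPTS = str.maketrans("-0123456789", "⁻⁰¹²³⁴⁵⁶⁷⁸⁹")


def format_evalue(text: str, digits: int = 2) -> str:
    """Return `text` (e.g. '2.5e-30') as a string like '2.5 × 10⁻³⁰'.

    `digits` is the number of decimal places kept in the mantissa
    before trailing zeros are stripped.
    """
    text = text.strip().lower()
    # Assumption: a bare exponent ("e-105") stands for a mantissa of 1.
    if text.startswith("e"):
        text = "1" + text
    value = float(text)

    # Python's "e" presentation type always yields "<mantissa>e<exponent>".
    mantissa, exponent = f"{value:.{digits}e}".split("e")
    mantissa = mantissa.rstrip("0").rstrip(".")
    power = int(exponent)

    # Values with a zero exponent need no power of ten.
    if power == 0:
        return mantissa
    return f"{mantissa} × 10{str(power).translate(_SUPERSCRIPTS)}"


if __name__ == "__main__":
    for raw in ("1e-5", "2.5e-30", "e-105"):
        print(raw, "->", format_evalue(raw))
```

Superscript characters suit plain-text tables; for a typeset manuscript the same mantissa and exponent could instead be emitted as LaTeX or with the journal's own markup.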

Author information

Corresponding author

Correspondence to Anthony T. Papenfuss.

Copyright information

© 2010 Springer Science+Business Media B.V.

About this chapter

Cite this chapter

Papenfuss, A.T., Hsu, A., Wakefield, M. (2010). Marsupial Sequencing Projects and Bioinformatics Challenges. In: Deakin, J., Waters, P., Marshall Graves, J. (eds) Marsupial Genetics and Genomics. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9023-2_6
