Empirical Bayes Analysis of RNA-seq Data for Detection of Gene Expression Heterosis

  • Jarad Niemi
  • Eric Mittman
  • Will Landau
  • Dan Nettleton


An important type of heterosis, known as hybrid vigor, refers to the enhancement of the phenotype of hybrid progeny relative to their inbred parents. Although hybrid vigor is extensively utilized in agriculture, its molecular basis is still largely unknown. In an effort to understand phenotypic heterosis at the molecular level, researchers are measuring transcript abundance levels of thousands of genes in parental inbred lines and their hybrid offspring using RNA sequencing (RNA-seq) technology. The resulting data allow researchers to search for evidence of gene expression heterosis as one potential molecular mechanism underlying heterosis of agriculturally important traits. The null hypotheses of greatest interest in testing for gene expression heterosis are composite null hypotheses, which are difficult to test with standard statistical approaches for RNA-seq analysis. To address this difficulty, we develop a hierarchical negative binomial model and draw inferences using a computationally tractable empirical Bayes approach. We demonstrate improvements over alternative methods via a simulation study based on a maize experiment and then analyze that maize experiment with our newly proposed methodology.
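
To illustrate why a composite hypothesis of this kind is natural to assess with posterior draws, here is a minimal sketch. It assumes the usual definitions, which the abstract does not spell out: high-parent heterosis means the hybrid's mean expression exceeds that of both parents, and low-parent heterosis means it falls below both. The function name, argument names, and example draws are hypothetical, and this is not the authors' implementation.

```python
from typing import Dict, Sequence


def heterosis_probabilities(
    parent1: Sequence[float],
    parent2: Sequence[float],
    hybrid: Sequence[float],
) -> Dict[str, float]:
    """Estimate heterosis probabilities for one gene from paired draws.

    Each argument holds draws of the (log) mean expression for one
    genotype; element i of each sequence belongs to the same joint draw.
    Assumed definitions (not taken from the abstract):
      high-parent heterosis: hybrid > max(parent1, parent2)
      low-parent heterosis:  hybrid < min(parent1, parent2)
    """
    n = len(hybrid)
    if n == 0 or len(parent1) != n or len(parent2) != n:
        raise ValueError("all three sequences must be non-empty and equal in length")

    # Count the joint draws in which the hybrid lies outside the parental range.
    high = sum(h > max(a, b) for a, b, h in zip(parent1, parent2, hybrid))
    low = sum(h < min(a, b) for a, b, h in zip(parent1, parent2, hybrid))

    # The proportion of draws satisfying each condition is a Monte Carlo
    # estimate of the corresponding probability.
    return {"high_parent": high / n, "low_parent": low / n}


# Hypothetical draws for a single gene, for illustration only.
p1 = [2.0, 2.1, 1.9, 2.2]
p2 = [2.5, 2.4, 2.6, 2.3]
hy = [2.7, 2.3, 2.8, 2.6]
result = heterosis_probabilities(p1, p2, hy)
print(result)
```

In practice the draws would come from a fitted model and would number in the thousands; the four values here only show the bookkeeping.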

Supplementary materials accompanying this paper appear on-line.


Hierarchical model; Negative binomial; RNA-seq; Bayesian LASSO; Parallel computing; Hybrid vigor

Supplementary material

Supplementary material 1: 13253_2015_230_MOESM1_ESM.pdf (PDF, 55 KB)



Copyright information

© International Biometric Society 2015

Authors and Affiliations

  1. Department of Statistics, Iowa State University, Ames, USA
