MINED: An Efficient Mutual Information Based Epistasis Detection Method to Improve Quantitative Genetic Trait Prediction

Conference paper in: Bioinformatics Research and Applications (ISBRA 2015)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9096)

Abstract

Whole genome prediction of complex phenotypic traits using high-density genotyping arrays has attracted a great deal of attention because of its relevance to plant and animal breeding: more accurate prediction allows more effective breeding strategies. Most existing work considers only an additive model on single markers, or genotypes. In this work, we study the problem of epistasis detection for genetic trait prediction, where different alleles, or genes, can interact with each other. We develop a novel method, MINED, to detect the significant pairwise epistasis effects that contribute most to prediction performance. Dynamic thresholding and a sampling strategy make detection very efficient: MINED is generally 20 to 30 times faster than an exhaustive search. In our experiments on real plant data sets, MINED captures the pairwise epistasis effects that improve prediction, and it achieves better prediction accuracy than state-of-the-art methods. To our knowledge, MINED is the first algorithm to detect epistasis in the genetic trait prediction problem. We further propose a constrained version, Constrained-MINED, that converts the epistasis detection problem into a Weighted Maximum Independent Set problem, and we show that it improves prediction accuracy even more.
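To make the ideas in the abstract concrete, here is a minimal Python sketch of the exhaustive baseline that MINED is compared against: every marker pair is scored by the mutual information between its joint genotype and the trait. It is not the MINED algorithm itself, whose dynamic thresholding and sampling steps are not described in the abstract. All function names are hypothetical, the trait is assumed to have already been binned into discrete classes, and the conflict rule used in the greedy selection (two pairs conflict when they share a marker) is an assumption chosen only to illustrate a weighted independent set heuristic.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def score_pairs(genotypes, trait_classes):
    """Exhaustively score each marker pair by MI(joint genotype; trait)."""
    n_markers = len(genotypes[0])
    scores = {}
    for i, j in combinations(range(n_markers), 2):
        joint = [(row[i], row[j]) for row in genotypes]
        scores[(i, j)] = mutual_information(joint, trait_classes)
    return scores


def greedy_disjoint_pairs(scores):
    """Greedy heuristic for a weighted independent set over marker pairs.

    Assumption (not from the paper): two pairs conflict if they share a marker.
    Pairs are taken in descending score order, skipping any that conflict.
    """
    used, chosen = set(), []
    for (i, j), _ in sorted(scores.items(), key=lambda kv: -kv[1]):
        if i not in used and j not in used:
            chosen.append((i, j))
            used.update((i, j))
    return chosen


# Toy data: three binary markers; the trait is the XOR of markers 0 and 1,
# so neither marker is informative alone, but the pair is.
genotypes = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
trait = [a ^ b for a, b, _ in genotypes]

scores = score_pairs(genotypes, trait)
selected = greedy_disjoint_pairs(scores)
```

On this toy data the pair (0, 1) scores 1 bit while the other pairs score 0, which is the kind of purely interactive effect an additive single-marker model cannot represent. The exhaustive loop is quadratic in the number of markers, which is the cost that motivates faster detection strategies.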

Author information

Corresponding author

Correspondence to Dan He.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

He, D., Wang, Z., Parida, L. (2015). MINED: An Efficient Mutual Information Based Epistasis Detection Method to Improve Quantitative Genetic Trait Prediction. In: Harrison, R., Li, Y., Măndoiu, I. (eds) Bioinformatics Research and Applications. ISBRA 2015. Lecture Notes in Computer Science, vol 9096. Springer, Cham. https://doi.org/10.1007/978-3-319-19048-8_10

  • DOI: https://doi.org/10.1007/978-3-319-19048-8_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19047-1

  • Online ISBN: 978-3-319-19048-8

  • eBook Packages: Computer Science, Computer Science (R0)
