Statistical and Machine Learning Methods for eQTL Analysis

eQTL Analysis

Part of the book series: Methods in Molecular Biology ((MIMB,volume 2082))

Abstract

An immense amount of observable diversity exists for all traits and across global populations. In the post-genomic era, equipped with efficient sequencing capabilities and better genotyping methods, we can now more fully appreciate how the regulation of gene expression depends on an individual's genotype in both coding and non-coding DNA. Identifying the genetic loci that contribute to quantifiable variation in gene expression is critical to improving our understanding of the biological regulation of complex traits. Expression quantitative trait locus (eQTL) mapping studies provide a powerful suite of techniques for genome-wide detection of these regulatory effects. However, a typical eQTL analysis relies on a large number of samples with many genetic variants to achieve adequate statistical power for detection. As a result, eQTL analysis poses distinct computational and statistical challenges that require advanced methodological development to overcome. In recent years, many statistical and machine learning methods for eQTL analysis have been developed that offer a richer view of the relationships between genetic variation and gene expression. In this chapter, we provide a comprehensive review of these methods. We present machine learning methods based on regularization terms along with several other statistical analysis methods. Finally, we discuss prior knowledge integration and hyperparameter optimization.

Copyright information

© 2020 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Chen, J., Nodzak, C. (2020). Statistical and Machine Learning Methods for eQTL Analysis. In: Shi, X. (eds) eQTL Analysis. Methods in Molecular Biology, vol 2082. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0026-9_7

  • DOI: https://doi.org/10.1007/978-1-0716-0026-9_7

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-0025-2

  • Online ISBN: 978-1-0716-0026-9

  • eBook Packages: Springer Protocols
