Statistical and Machine Learning Methods for eQTL Analysis

Chen, Junjie; Nodzak, Conor

doi:10.1007/978-1-0716-0026-9_7

Part of the book series: Methods in Molecular Biology (MIMB, volume 2082)

Abstract

An immense amount of observable diversity exists for all traits and across global populations. In the post-genomic era, efficient sequencing and improved genotyping methods allow us to appreciate more fully how the regulation of gene expression depends on an individual's genotype in both coding and non-coding DNA. Identifying the genetic loci that contribute to quantifiable variation in gene expression is critical to improving our understanding of the biological regulation of complex traits. Expression quantitative trait locus (eQTL) mapping studies provide a powerful suite of techniques for detecting these regulatory effects genome-wide. However, a typical eQTL analysis needs a large number of samples, each genotyped at many genetic variants, to achieve adequate statistical power for detection. As a result, eQTL analysis poses distinct computational and statistical challenges that require advanced methodological development to overcome. In recent years, many statistical and machine learning methods for eQTL analysis have been developed that can capture more complex relationships between genetic variation and gene expression. In this chapter, we provide a comprehensive review of these statistical and machine learning methods. We present machine learning methods based on regularization terms, along with several other statistical analysis methods. Finally, we discuss the integration of prior knowledge and hyperparameter optimization.
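
As a rough illustration of what a regularization-based eQTL method with a tuned hyperparameter can look like, the sketch below fits a lasso (an L1-penalized regression) to simulated data and chooses the penalty strength on held-out samples. It is not taken from the chapter. The genotypes, the two causal SNPs (indices 3 and 17), the effect size, the penalty grid, and all function names are assumptions made up for the example, and the solver is a plain proximal gradient loop rather than any particular published implementation.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of the L1 norm: shrink each entry of z toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_ista(X, y, lam, n_iter=500):
    """Minimise (1 / 2n) * ||y - X b||^2 + lam * ||b||_1 by proximal gradient descent."""
    n, p = X.shape
    # Step size = 1 / Lipschitz constant of the smooth part's gradient.
    step = n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta


def simulate(n=200, p=50, causal=(3, 17), effect=1.0, seed=0):
    """Hypothetical toy data: p independent SNPs coded 0/1/2 and one expression trait."""
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.1, 0.5, size=p)            # assumed minor allele frequencies
    G = rng.binomial(2, maf, size=(n, p)).astype(float)
    X = (G - G.mean(axis=0)) / G.std(axis=0)       # standardise each SNP column
    beta_true = np.zeros(p)
    beta_true[list(causal)] = effect               # only the causal SNPs affect expression
    y = X @ beta_true + rng.normal(0.0, 1.0, size=n)
    return X, y - y.mean(), beta_true


def select_lambda(X, y, lambdas, n_train):
    """Pick the penalty with the lowest squared error on held-out samples."""
    X_tr, y_tr = X[:n_train], y[:n_train]
    X_va, y_va = X[n_train:], y[n_train:]

    def val_mse(lam):
        beta = lasso_ista(X_tr, y_tr, lam)
        return float(np.mean((y_va - X_va @ beta) ** 2))

    return min(lambdas, key=val_mse)


X, y, beta_true = simulate()
best_lam = select_lambda(X, y, lambdas=[0.01, 0.05, 0.1, 0.2, 0.5], n_train=150)
beta_hat = lasso_ista(X, y, best_lam)              # refit on all samples
print("chosen lambda:", best_lam)
print("SNPs with non-zero coefficients:", np.flatnonzero(beta_hat))
```

The L1 penalty drives most SNP coefficients to exactly zero, so the surviving non-zero entries act as candidate eQTLs for the simulated gene. The small grid with a single train/validation split stands in for the more careful hyperparameter searches (cross-validation, random search) that a real analysis would likely use.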

Author information

Authors and Affiliations

Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC, USA
Junjie Chen & Conor Nodzak

Editor information

Editors and Affiliations

Temple University, Philadelphia, PA, USA
Xinghua Mindy Shi

Cite this protocol

Chen, J., Nodzak, C. (2020). Statistical and Machine Learning Methods for eQTL Analysis. In: Shi, X. (ed.) eQTL Analysis. Methods in Molecular Biology, vol 2082. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0026-9_7

DOI: https://doi.org/10.1007/978-1-0716-0026-9_7
Published: 18 December 2019
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-0025-2
Online ISBN: 978-1-0716-0026-9
eBook Packages: Springer Protocols
