Abstract
Gene expression data are often collected from tissue samples that are composed of multiple cell types. Studies of cell type composition based on gene expression data from tissue samples have recently attracted increasing research interest and led to the development of new methods for cell type composition estimation. This new information on cell type composition can be associated with individual characteristics (e.g., genetic variants) or clinical outcomes (e.g., survival time). Such association analysis can be conducted for each cell type separately, followed by multiple testing correction. An alternative approach is to evaluate the association using the composition of all the cell types jointly, thus aggregating association signals across cell types. A key challenge of this approach is to account for the dependence across cell types. We propose a new method to quantify the distances between cell types while accounting for their dependencies, and use this information for association analysis. We demonstrate our method in two applied examples, assessing the association of immune cell type composition in tumor samples of colorectal cancer patients with survival time and with SNP genotypes. We found that immune cell composition has prognostic value, and that our distance metric leads to more accurate survival time prediction than other distance metrics that ignore cell type dependencies. In addition, survival time-associated SNPs are enriched among the SNPs associated with immune cell composition.
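To make the idea of a composition-wide association test concrete, the following is a minimal sketch of a generic distance-based permutation test (a PERMANOVA-style pseudo-F statistic). It is not the method proposed in this article: the Aitchison distance used here is a stand-in that treats cell types as unrelated, which is exactly the limitation the proposed distance is meant to address. The function names `aitchison_distance`, `pseudo_f`, and `permutation_test`, and the pseudo-count value, are assumptions made for illustration.

```python
import numpy as np


def aitchison_distance(comp, pseudo=1e-6):
    """Pairwise Aitchison distance between samples (rows of `comp`).

    Stand-in metric only: it ignores dependencies between cell types.
    """
    x = np.asarray(comp, dtype=float) + pseudo   # avoid log(0)
    x = x / x.sum(axis=1, keepdims=True)         # re-close to proportions
    logx = np.log(x)
    clr = logx - logx.mean(axis=1, keepdims=True)  # centred log-ratio
    diff = clr[:, None, :] - clr[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def pseudo_f(dist, covariate):
    """Pseudo-F statistic for one or more covariates given a distance matrix."""
    n = dist.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n
    gower = centre @ (-0.5 * dist ** 2) @ centre   # Gower-centred matrix
    design = np.column_stack([np.ones(n), covariate])
    hat = design @ np.linalg.pinv(design)          # projection onto the design
    rank = np.linalg.matrix_rank(design)
    resid = np.eye(n) - hat
    explained = np.trace(hat @ gower @ hat)
    residual = np.trace(resid @ gower @ resid)
    return (explained / (rank - 1)) / (residual / (n - rank))


def permutation_test(dist, covariate, n_perm=999, seed=0):
    """Permute the covariate to obtain a p-value for the pseudo-F statistic."""
    rng = np.random.default_rng(seed)
    covariate = np.asarray(covariate, dtype=float)
    observed = pseudo_f(dist, covariate)
    exceed = 0
    for _ in range(n_perm):
        shuffled = covariate[rng.permutation(len(covariate))]
        if pseudo_f(dist, shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

In use, `comp` would be a samples-by-cell-types matrix of estimated proportions and `covariate` a genotype or other per-sample variable. Swapping in a different distance matrix changes only the first step, so a metric that accounts for cell type dependencies could be plugged into the same test. Survival outcomes would need a different model than this sketch covers.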
Acknowledgements
This work is supported in part by NIH grants R01 CA176272, R01 GM105785, R01 CA222833, and R21 CA224026.
Cite this article
Huang, L., Little, P., Huyghe, J.R. et al. A Statistical Method for Association Analysis of Cell Type Compositions. Stat Biosci 13, 373–385 (2021). https://doi.org/10.1007/s12561-020-09293-0