A Statistical Method for Association Analysis of Cell Type Compositions

Statistics in Biosciences

Abstract

Gene expression data are often collected from tissue samples that are composed of multiple cell types. Studies of cell type composition based on gene expression data from tissue samples have recently attracted increasing research interest and have led to the development of new methods for estimating cell type composition. The resulting estimates of cell type composition can be associated with individual characteristics (e.g., genetic variants) or clinical outcomes (e.g., survival time). Such association analysis can be conducted for each cell type separately, followed by multiple testing correction. An alternative approach is to evaluate the association using the composition of all cell types jointly, thereby aggregating association signals across cell types. A key challenge of this approach is accounting for the dependence across cell types. We propose a new method that quantifies the distances between cell types while accounting for their dependencies, and we use this information for association analysis. We demonstrate our method in two applied examples, assessing the association of immune cell type composition in tumor samples from colorectal cancer patients with survival time and with SNP genotypes. We found that immune cell composition has prognostic value, and that our distance metric leads to more accurate survival time prediction than other distance metrics that ignore cell type dependencies. In addition, SNPs associated with survival time are enriched among the SNPs associated with immune cell composition.
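The joint, distance-based strategy described above can be illustrated with a short sketch. It is not the authors' method: as a hypothetical stand-in for their cell type distance, it measures similarity between cell types by the Pearson correlation of their estimated proportions across samples, uses that similarity to weight a distance between samples' compositions, and then applies a PERMANOVA-style pseudo-F permutation test of association with a single covariate. The function names, the correlation-based similarity, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def cell_type_similarity(P):
    """Hypothetical similarity between cell types: Pearson correlation of
    their proportions across samples (columns of P). Not the paper's metric."""
    return np.corrcoef(P, rowvar=False)


def sample_distance_sq(P, S):
    """Squared distance between samples i and l: (p_i - p_l)^T S (p_i - p_l).
    S must be positive semi-definite; S = identity gives squared Euclidean."""
    G = P @ S @ P.T                      # S-weighted inner products
    g = np.diag(G)
    D2 = g[:, None] + g[None, :] - 2.0 * G
    return np.maximum(D2, 0.0)           # clip tiny negatives from round-off


def permanova(D2, x, n_perm=999, seed=0):
    """PERMANOVA-style pseudo-F for one covariate x, with a permutation p-value."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ D2 @ J                # Gower-centered matrix

    def pseudo_f(xv):
        X = np.column_stack([np.ones(n), xv])   # intercept + covariate
        H = X @ np.linalg.pinv(X)              # projection onto span(X)
        R = np.eye(n) - H
        m = X.shape[1]
        return (np.trace(H @ G @ H) / (m - 1)) / (np.trace(R @ G @ R) / (n - m))

    rng = np.random.default_rng(seed)
    f_obs = pseudo_f(x)
    # Permuting x breaks any association while keeping the compositions fixed.
    f_perm = np.array([pseudo_f(rng.permutation(x)) for _ in range(n_perm)])
    p_value = (1 + np.sum(f_perm >= f_obs)) / (n_perm + 1)
    return f_obs, p_value


# Simulated example (hypothetical data): n samples, K cell types, and a
# covariate x that raises the expected share of cell type 0.
rng = np.random.default_rng(1)
n, K = 60, 5
x = rng.normal(size=n)
alpha = 2.0 * np.exp(np.outer(x, [1.0, 0.0, 0.0, 0.0, 0.0]))
P = np.vstack([rng.dirichlet(a) for a in alpha])   # rows sum to 1

S = cell_type_similarity(P)
D2 = sample_distance_sq(P, S)
f_obs, p = permanova(D2, x, n_perm=199)
print(f"pseudo-F = {f_obs:.2f}, permutation p = {p:.3f}")
```

Setting `S = np.eye(K)` reduces the sample distance to the ordinary squared Euclidean distance between composition vectors, i.e., a metric that ignores cell type dependencies; running the same test with both choices of `S` is one way to see what the dependence weighting changes.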

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Acknowledgements

This work is supported in part by NIH grants R01 CA176272, R01 GM105785, R01 CA222833, and R21 CA224026.

Author information

Corresponding authors

Correspondence to Polly A. Newcomb or Wei Sun.

Electronic supplementary material

Electronic supplementary material 1 (PDF 8066 kb)

About this article

Cite this article

Huang, L., Little, P., Huyghe, J.R. et al. A Statistical Method for Association Analysis of Cell Type Compositions. Stat Biosci 13, 373–385 (2021). https://doi.org/10.1007/s12561-020-09293-0

  • DOI: https://doi.org/10.1007/s12561-020-09293-0
