# Testing for Associations of Opposite Directionality in a Heterogeneous Population

## Abstract

In gene networks, patterns of gene co-expression may exist only in a subset of the sample. In studies of relationships between genotypes and gene expression over multiple tissues, there may be associations in some tissues but not in others. Despite the importance of the problem in genomic applications, it is challenging to identify relationships between two variables when the correlation may only exist in a subset of the sample. The situation becomes even less tractable when there exist two subsets in which correlations are in opposite directions. By ranking subset relationships according to Kendall’s tau, a tau-path can be derived to facilitate the identification of correlated subsets, if such subsets exist. However, the current tau-path methodology only considers the situation in which there is association in a subsample; the more complex scenario in which two subsets have associations of opposite directionality has not been addressed. Further, existing algorithms for finding tau-paths may be suboptimal given their greedy nature. In this paper, we extend the tau-path methodology to accommodate the situation in which the sample may be drawn from a heterogeneous population composed of subpopulations portraying positive and negative associations. We also propose the use of a cross entropy Monte Carlo procedure, CEMC\(_{tp}\), to obtain an optimal tau-path. The algorithm can not only detect positive and negative correlations simultaneously in the same sample, but also identify the subsamples that provide evidence for the detected associations. An extensive simulation study shows the aptness of CEMC\(_{tp}\) for detecting associations under various scenarios. Compared with two standard tests for detecting associations, CEMC\(_{tp}\) is seen to be more powerful when complex subset associations are indeed present, while type-I error rates remain well controlled.
We applied CEMC\(_{tp}\) to the NCI-60 gene expression data to illustrate its utility for uncovering network relationships that were missed with standard methods.
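
To make the motivating problem concrete, the following minimal sketch shows how Kendall’s tau computed on a whole sample can be close to zero even though two subsets carry perfect associations of opposite direction. It is only an illustration under stated assumptions, not the CEMC\(_{tp}\) procedure: the toy data, the helper `kendall_tau`, and the variable names are hypothetical, and the helper computes the simple tau-a statistic for data without ties.

```python
from itertools import combinations


def kendall_tau(points):
    """Kendall's tau-a for a list of (x, y) pairs, assuming no ties."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(points, 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(points) * (len(points) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical toy data: one subset with a perfectly increasing relationship
# and one with a perfectly decreasing relationship.
positive_subset = [(float(i), float(i)) for i in range(10)]
negative_subset = [(i + 0.5, -(i + 0.5)) for i in range(10)]
mixed_sample = positive_subset + negative_subset

tau_positive = kendall_tau(positive_subset)
tau_negative = kendall_tau(negative_subset)
tau_mixed = kendall_tau(mixed_sample)

print("tau, positive subset:", tau_positive)
print("tau, negative subset:", tau_negative)
print("tau, mixed sample:   ", round(tau_mixed, 3))
```

In this toy sample the two subset-level statistics are at their extremes while the pooled statistic is small in magnitude, which is the kind of heterogeneity a whole-sample test can miss.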

## Keywords

Cross entropy Monte Carlo (CEMC), Tau-path, Heterogeneous sample, Subset associations, Gene networks

## Notes

### Acknowledgments

The authors would like to thank the two anonymous reviewers for their constructive comments and suggestions. This work was supported in part by the National Science Foundation grant DMS-1220772. The authors would also like to acknowledge the allocation of computing time from the Ohio Supercomputer Center.
