An integrative U method for joint analysis of multi-level omic data
Abstract
Background
The advance of high-throughput technologies has made it cost-effective to collect diverse types of omic data in large-scale clinical and biological studies. While the vast amounts of multi-level omic data collected in these studies provide a great opportunity for genetic research, the high dimensionality of omic data and the complex relationships among multi-level omic data pose tremendous analytic challenges.
Results
To address these challenges, we develop an integrative U (IU) method for the design and analysis of multi-level omic data. While non-parametric methods make fewer model assumptions and are flexible for analyzing different types of phenotypes and omic data, they have been less developed for association analysis of omic data. The IU method is a nonparametric method that can accommodate various types of omic and phenotype data and account for interactive relationships among different levels of omic data. Through simulations and a real data application, we compare the IU test with commonly used variance component tests.
Conclusions
Results show that the proposed test attains more robust type I error performance and higher empirical power than variance component tests under various types of phenotypes and different underlying interaction effects.
Keywords
Non-parametric method; Functional data analysis; Integrative analysis
Abbreviations
- Adj-SKAT
Adjusted sequence kernel association test
- DE
Double exponential
- HTN
Hypertension
- IU
Integrative U test
- SAFDGS
San Antonio Family Diabetes/Gallbladder Study
- SAFHS
San Antonio Family Heart Study
- SKAT
Sequence kernel association test
- VCT
Variance component test
- WGS
Whole-genome sequencing
Background
With rapidly evolving high-throughput technologies and ever-decreasing costs, it has become feasible to systematically study diverse types of omic data in biological and clinical studies [1, 2]. The collection of multi-level omic data from these studies provides a great opportunity to integrate information from different levels of omic data into association analysis [3, 4, 5, 6]. Although omic-based association analysis holds great promise for discovering novel disease-associated biomarkers, the discovery process is hampered by the lack of appropriate statistical tools to consolidate and analyze multi-level omic data. Advanced statistical methods that address the analytical challenges of ongoing omic data analysis can enhance our ability to identify new disease-associated biomarkers.
Comprehensive reviews of integrative analysis of multi-level omic data can be found in [3, 5, 7] and the references therein. Most existing methods for integrative analysis are based on score-type tests or variance component tests. For instance, in the integrative analysis of single-nucleotide variants (SNVs) and transcript expression data, [6] used estimating equations to estimate the parameters of interest and then proposed a Wald test to evaluate the association between the outcome and a set of genetic variants, allowing for possible interactions. To efficiently test the joint effects of SNVs and gene expression on a binary phenotype, [8] developed a combined variance component test in the mixed model framework. Building on this work, [9] further investigated a variance component score test for modeling multiple genomic data, including SNVs, gene expression, and methylation data, each of which can come from different samples or studies. While these methods have attractive properties in various scenarios, most of them are parametric or semi-parametric and often rely on a distributional assumption (e.g., normality). When this assumption is violated, these methods are prone to false positive results and/or power loss [10]. Diagnostic assessments of human diseases can be of different types (e.g., binary, ordinal and continuous) and can follow known or unknown distributions, an issue that has received little attention in existing methods.
Moreover, the molecular complexity of human diseases manifests itself at the genomic, transcriptomic, epigenomic and proteomic levels [11, 12]. Different levels of omic data can interact in the disease process, so accounting for interactions between them can potentially enhance the power to detect disease-associated biomarkers. While some existing methods consider interactions between omic data [6, 8], they commonly assume a particular interaction model (e.g., a multiplicative model) and may perform suboptimally if the underlying model takes a different form (e.g., a threshold model).
To address these limitations, we propose the integrative U (IU) test, a non-parametric framework for association analysis using multi-level omic data. The IU test is a U-statistic-based test constructed from the pairwise omic and phenotype similarities of subjects. It has several notable features: 1) it makes no distributional assumptions and therefore provides robust and powerful performance when analyzing phenotypes and omic data with unknown distributions; 2) it provides a unified framework for analyzing various types of phenotypes and omic data (binary, ordinal and continuous); and 3) it accounts for interactions among different levels of omic data without imposing specific model assumptions.
The remainder of the paper is organized as follows. We describe the proposed integrative U method in detail in the “Methods” section, and then present simulation results for the IU method under different types of phenotypes and various genetic or interaction effects in the “Simulation” section. In the “An integrative analysis of gene and gene expression data of hypertension” section, we apply the proposed method to DNA sequencing and gene expression data from a hypertension study. The “Conclusion” section summarizes the advantages and limitations of the IU test. Details of the proofs of the main results are given in Additional file 1.
Methods
Suppose that we are interested in evaluating the joint association of M levels of omic data with a disease phenotype of interest. Without loss of generality, we illustrate the method with two levels of omic data (i.e., SNVs and gene expression data). The extension to more than two levels of omic data is discussed in the “Conclusion” section. Let Y_{i} be a continuous or discrete disease phenotype, S_{i} be a scalar gene expression variable, and G_{i}=(G_{i}(t_{1}),G_{i}(t_{2}),...,G_{i}(t_{p})) be the genotypes of p SNVs (e.g., coding variants in a gene) for the ith individual (i=1,…,n), where t_{j} is the location of the jth SNV and G_{i}(t_{j})∈{0,1,2} is coded as the number of minor alleles.
Genetic smoothing
In the recent literature, functional data analysis has often been applied to genetic data. For instance, [13] proposed a functional linear model for quantitative traits that uses B-spline basis functions to expand the genotype functions. Vsevolozhskaya et al. [14] proposed a functional analysis of variance method to test the association of sequence variants in a genomic region with a qualitative trait. Functional data analysis has also been developed for other types of traits and study purposes in genetic research. For instance, [15] developed a Cox proportional hazards model with functional regression for gene-based association analysis of survival traits. Moreover, [16] proposed a generalized functional linear model to perform a meta-analysis of multiple studies evaluating the association of genetic variants with dichotomous traits.
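The smoothing step can be illustrated with a minimal Python sketch. It rescales the SNV positions t_{j} to [0, 1] and fits each individual's genotype sequence with a penalized cubic spline in the truncated power basis (a ridge penalty on the knot terms only). This is a swapped-in, simpler smoother, not the natural cubic smoothing spline of [14] used in the simulations; the function name `smooth_genotypes` and the defaults for `lam`, `n_knots` and `grid_size` are hypothetical choices.

```python
import numpy as np

def smooth_genotypes(G, positions, lam=1e-3, n_knots=10, grid_size=201):
    """Turn each row of G (n x p minor-allele counts) into a smooth curve G_i(t) on [0, 1].

    Sketch only: penalized cubic spline in the truncated power basis, a stand-in
    for the natural cubic smoothing spline of [14]. Needs at least 4 distinct positions.
    """
    G = np.asarray(G, dtype=float)
    pos = np.asarray(positions, dtype=float)
    # Rescale physical SNV positions t_j to the unit interval.
    t = (pos - pos.min()) / (pos.max() - pos.min())
    # Interior knots at evenly spaced quantiles of the SNV positions.
    knots = np.quantile(t, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])

    def basis(x):
        poly = np.vander(x, 4, increasing=True)                     # 1, x, x^2, x^3
        trunc = np.clip(x[:, None] - knots[None, :], 0.0, None) ** 3  # (x - k)_+^3
        return np.hstack([poly, trunc])

    X = basis(t)
    # Penalize only the truncated-power coefficients, never the cubic polynomial.
    penalty = np.diag(np.r_[np.zeros(4), np.ones(knots.size)])
    coef = np.linalg.solve(X.T @ X + lam * penalty, X.T @ G.T)      # (4 + K) x n
    grid = np.linspace(0.0, 1.0, grid_size)
    curves = (basis(grid) @ coef).T                                  # n x grid_size
    return grid, curves
```

Because the polynomial part is unpenalized, a genotype sequence that is constant along the region is reproduced exactly, and larger `lam` pulls each curve toward a global cubic trend.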
Test statistic
With the assumptions of Y, G(t) and S mentioned above, we aim to test the hypotheses:
H_{0}: Y is independent of G(t) and S;
H_{a}: Y is associated with G(t) or S.
In addition to the cross product kernel, other kernels, such as those proposed in [10] and [17], can also be used.
From the above equation, the proposed test statistic is a U statistic defined on all possible pairs of subjects (i,j), where the genetic similarity of subjects i and j is defined as the inner product of the smooth curves of the stochastic process, i.e., \(\int _{0}^{1} G_{i} (t) G_{j} (t) dt\). The phenotype similarity and gene expression similarity between subjects i and j are simply the products of the two subjects’ phenotype values and gene expression values, respectively.
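As a sketch, the three pairwise similarities described above can be computed as n × n matrices, with the integral approximated by the trapezoidal rule on a common grid over [0, 1]. How the three similarities are combined into the IU kernel is given by the paper's displayed equation, which is not reproduced here, so the helper `u_average` (a hypothetical name, like `pairwise_similarities`) only averages a user-supplied pairwise kernel over all i ≠ j. Multiplying K_Y and K_G elementwise, for example, gives the SNV-only statistic U_G that appears later in this section.

```python
import numpy as np

def pairwise_similarities(Y, S, curves, grid):
    """Return (K_Y, K_S, K_G): phenotype, expression and genetic similarity matrices.

    Assumes each G_i(t) is stored as values on a common grid over [0, 1].
    """
    Y = np.asarray(Y, dtype=float)
    S = np.asarray(S, dtype=float)
    h = np.diff(grid)
    w = np.zeros_like(grid, dtype=float)   # trapezoidal quadrature weights
    w[:-1] += h / 2
    w[1:] += h / 2
    K_G = (curves * w) @ curves.T          # approx. int_0^1 G_i(t) G_j(t) dt
    K_Y = np.outer(Y, Y)                   # Y_i * Y_j
    K_S = np.outer(S, S)                   # S_i * S_j
    return K_Y, K_S, K_G

def u_average(K):
    """Average a pairwise kernel matrix over all ordered pairs i != j."""
    n = K.shape[0]
    return (K.sum() - np.trace(K)) / (n * (n - 1))
```

For instance, `u_average(K_Y * K_G)` evaluates U_G; excluding the diagonal is what makes this a U statistic rather than a V statistic.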
Asymptotic property
where m is the number of eigenvalues of Γ, (λ_{k},ϕ_{k}(t)) are eigenvalues and eigenfunctions of the covariance function Γ, \(\delta _{k} = \int \phi _{k}(t) \eta (t) dt\), \(\sigma _{Y}^{2}\) and \(\sigma _{S}^{2}\) are the population variances of Y and S.
Let \(s_{Y}^{2}\) and \(s_{S}^{2}\) be the sample variances of Y and S, \(\hat \lambda _{k}\) and \(\hat \phi _{k}\) be the eigenvalues and eigenfunctions of \(\hat \Gamma (s,t) = \frac {1}{n}\sum _{i=1}^{n} \left (G_{i}(t)-\bar {G}(t)\right)\left (G_{i}(s) - \bar {G}(s)\right)\). By letting \(\hat \delta _{k} = \int _{0}^{1} \hat \phi _{k}(t)\bar G(t)dt\), we can obtain the asymptotic distribution of the test statistic under H_{0}:
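The eigenvalues and eigenfunctions of \(\hat \Gamma\) can be approximated on a grid. A minimal sketch, assuming the curves are stored as values on a common grid over [0, 1]: the covariance operator is discretized with trapezoidal weights and symmetrized so that `numpy.linalg.eigh` applies, after which \(\hat \delta _{k}\) is a weighted inner product with \(\bar G(t)\). The function names are hypothetical. A useful check is the identity \(\sum _{k} \hat \lambda _{k} \hat \delta _{k}^{2} = \frac {1}{n}\sum _{i=1}^{n} \left (\int _{0}^{1} (G_{i}(t)-\bar G(t))\bar G(t)dt\right)^{2}\), which also holds exactly for the discretized operator.

```python
import numpy as np

def trapezoid_weights(grid):
    """Weights w such that sum(w * f) approximates the integral of f over the grid."""
    h = np.diff(grid)
    w = np.zeros_like(grid, dtype=float)
    w[:-1] += h / 2
    w[1:] += h / 2
    return w

def covariance_eigen(curves, grid):
    """Return (lambda_hat_k, phi_hat_k on the grid, delta_hat_k), largest eigenvalue first."""
    n = curves.shape[0]
    w = trapezoid_weights(grid)
    G_bar = curves.mean(axis=0)
    D = curves - G_bar                              # G_i(t) - Gbar(t)
    C = D.T @ D / n                                 # Gamma_hat(s, t) on the grid
    sw = np.sqrt(w)
    # Symmetrized discretization of: int Gamma_hat(s, t) phi(t) dt = lambda * phi(s)
    lam, U = np.linalg.eigh(sw[:, None] * C * sw[None, :])
    order = np.argsort(lam)[::-1]
    lam = np.clip(lam[order], 0.0, None)            # remove tiny negative round-off
    phi = U[:, order] / sw[:, None]                 # orthonormal in the weighted L2 sense
    delta = phi.T @ (w * G_bar)                     # delta_hat_k = int phi_k(t) Gbar(t) dt
    return lam, phi, delta
```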
Theorem 1.
We reject H_{0} if \(|{\sqrt {n}(U_{n} - \hat \mu _{0})}/{\hat \sigma }|>z_{\alpha /2}\), where z_{α/2} is the upper α/2 quantile of the standard normal distribution.
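The decision rule is a two-sided z-test. A short sketch using only the Python standard library (`statistics.NormalDist`); the function and argument names are hypothetical, and the inputs \(U_{n}\), \(\hat \mu _{0}\) and \(\hat \sigma\) are assumed to have been computed already.

```python
import math
from statistics import NormalDist

def iu_decision(U_n, mu0_hat, sigma_hat, n, alpha=0.05):
    """Return (z, two-sided p-value, reject?) for |sqrt(n)(U_n - mu0_hat)/sigma_hat| > z_{alpha/2}."""
    std_normal = NormalDist()
    z = math.sqrt(n) * (U_n - mu0_hat) / sigma_hat
    z_crit = std_normal.inv_cdf(1 - alpha / 2)     # upper alpha/2 quantile
    p_value = 2 * std_normal.cdf(-abs(z))          # equivalent p-value form
    return z, p_value, abs(z) > z_crit
```

Rejecting when the p-value is below α is the same rule as comparing |z| with \(z_{\alpha /2}\).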
Remark 1.
The assumption on the underlying stochastic process is very general. We do not need any specific condition on the pointwise distributions, such as the Gaussian assumption required in [14].
Remark 2.
The proposed test inherits the robustness property from U statistics, and is capable of handling both discrete and continuous phenotypes with various underlying distributions. Moreover, the proposed test does not need to specify any form of the regression function μ=E(Y|S,G), hence the test procedure is free of model assumptions.
The method can also be used for different study purposes. For instance, to test only the effect of SNVs (e.g., in a genetic association study), the integrative U test statistic simplifies to \(U_{G} = \frac {1}{n(n-1)}\sum \limits _{i\neq j} Y_{i} Y_{j} \int _{0}^{1} G_{i} (t) G_{j} (t) dt\), with \(\hat \mu _{G} = \bar Y^{2} ||\bar {G}(t)||^{2}\) and variance estimator \(\hat \sigma _{G}^{2} = 4\hat \mu _{Y}^{2} s^{2}_{Y} \sum \hat \lambda _{k} \hat \delta _{k}^{2} \).
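Because \(U_{G}\), \(\hat \mu _{G}\) and \(\hat \sigma _{G}^{2}\) are given explicitly, the SNV-only test can be sketched end to end. Assumptions: curves are evaluated on a common grid over [0, 1] with trapezoidal integration, \(\hat \mu _{Y}\) is taken to be \(\bar Y\), and \(s_{Y}^{2}\) uses the n - 1 divisor. The sum \(\sum \hat \lambda _{k} \hat \delta _{k}^{2}\) is computed through the equivalent quadratic form \(\frac {1}{n}\sum _{i}(\int (G_{i}-\bar G)\bar G dt)^{2}\), so no eigen-decomposition is needed. The function name `u_g_test` is hypothetical, and this is an illustration rather than the authors' implementation.

```python
import numpy as np
from statistics import NormalDist

def u_g_test(Y, curves, grid):
    """SNV-only test: return (U_G, mu_hat_G, sigma_hat_G^2, z, two-sided p-value).

    Y: (n,) phenotype with nonzero mean; curves: (n, m) smoothed G_i(t) on grid.
    """
    Y = np.asarray(Y, dtype=float)
    n = Y.size
    h = np.diff(grid)
    w = np.zeros_like(grid, dtype=float)
    w[:-1] += h / 2
    w[1:] += h / 2                                   # trapezoidal weights on [0, 1]

    K = np.outer(Y, Y) * ((curves * w) @ curves.T)   # Y_i Y_j * int G_i G_j dt
    U = (K.sum() - np.trace(K)) / (n * (n - 1))      # average over i != j

    y_bar = Y.mean()
    G_bar = curves.mean(axis=0)
    mu_hat = y_bar ** 2 * np.sum(w * G_bar ** 2)     # Ybar^2 * ||Gbar||^2
    s2_Y = Y.var(ddof=1)
    # sum_k lambda_k delta_k^2 equals the empirical variance of int (G_i - Gbar) Gbar dt
    lam_delta2 = np.mean(((curves - G_bar) @ (w * G_bar)) ** 2)
    sigma2 = 4.0 * y_bar ** 2 * s2_Y * lam_delta2

    z = float(np.sqrt(n) * (U - mu_hat) / np.sqrt(sigma2))
    p = 2.0 * NormalDist().cdf(-abs(z))
    return U, mu_hat, sigma2, z, p
```

Note that \(\hat \sigma _{G}^{2}\) vanishes when \(\bar Y = 0\), so the sketch presumes an uncentered phenotype, such as a 0/1 coding of case status.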
Power and sample size
While omic-based studies have become increasingly popular in human genetic research, few statistical tools are available for power and sample size calculation. In this section, we investigate the power of the proposed method under certain alternative hypotheses and provide a convenient way to calculate power and sample size.
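The section's own power formulas are not reproduced here. As a generic sketch only, suppose that under an alternative \(\sqrt {n}(U_{n}-\hat \mu _{0})\) is approximately normal with mean \(\sqrt {n}\Delta\) and standard deviation σ_a, while the test standardizes by σ_0 (the limit of \(\hat \sigma\)). The usual normal-approximation power and sample-size formulas then follow. Δ, σ_0 and σ_a are assumed inputs (for example, from pilot simulations), and the function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def approx_power(n, delta, sigma0, sigma_a=None, alpha=0.05):
    """P(|sqrt(n)(U_n - mu0_hat)/sigma0| > z_{alpha/2}) when sqrt(n)(U_n - mu0_hat) ~ N(sqrt(n)*delta, sigma_a^2)."""
    sigma_a = sigma0 if sigma_a is None else sigma_a
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = math.sqrt(n) * delta
    upper = _STD_NORMAL.cdf((shift - z * sigma0) / sigma_a)   # rejection in the upper tail
    lower = _STD_NORMAL.cdf((-shift - z * sigma0) / sigma_a)  # rejection in the lower tail
    return upper + lower

def approx_sample_size(delta, sigma0, power=0.8, sigma_a=None, alpha=0.05):
    """Smallest n with approximately the target power (ignoring the far tail)."""
    sigma_a = sigma0 if sigma_a is None else sigma_a
    z_a = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_b = _STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_a * sigma0 + z_b * sigma_a) / abs(delta)) ** 2)
```

For example, with Δ = 0.1, σ_0 = σ_a = 1 and α = 0.05, `approx_sample_size(0.1, 1.0)` gives 785 for 80% power.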
Results
Simulation
The genetic data were obtained from the 1000 Genomes Project [20]. Specifically, we used a 1-Mb region of the genome (chromosome 17: 7344328–8344327) from 1092 individuals in the 1000 Genomes Project. In each simulation replicate, SNVs were generated by randomly choosing a segment of p=100 consecutive SNVs from this region. Smooth genotype function curves were then constructed by applying functional data analysis to the SNV sequences, using the natural cubic spline smoothing with a penalty parameter introduced in [14]. Gene expression data were generated from a normal distribution, N(1,1.2^{2}). All type I error and empirical power results were calculated from 1000 simulated replicates.
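One simulation replicate can be sketched as follows. Because the 1000 Genomes data are not bundled here, the example builds a hypothetical genotype pool from binomial draws; drawing n individuals without replacement from the pool is also an assumption, since the text does not state how individuals were selected for the type I error runs. The function name `draw_replicate` is hypothetical.

```python
import numpy as np

def draw_replicate(geno_pool, positions, n, p=100, rng=None):
    """Return (G, pos, S): n x p genotypes from a random run of p consecutive SNVs,
    their positions, and gene expression S_i ~ N(1, 1.2^2)."""
    rng = np.random.default_rng(rng)
    N, P = geno_pool.shape
    start = rng.integers(0, P - p + 1)               # random segment of p consecutive SNVs
    rows = rng.choice(N, size=n, replace=False)      # individuals (assumed sampling scheme)
    G = geno_pool[np.ix_(rows, np.arange(start, start + p))]
    pos = positions[start:start + p]
    S = rng.normal(loc=1.0, scale=1.2, size=n)       # gene expression N(1, 1.2^2)
    return G, pos, S
```

The returned positions can be passed directly to a smoothing step that rescales them to [0, 1].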
Type I error performance
Type I error comparison of three methods for different types of phenotypes
Phenotype distribution | Bernoulli | Gaussian | T_{2} | T_{4} | DE |
---|---|---|---|---|---|
IU | 0.048 | 0.049 | 0.052 | 0.055 | 0.052 |
VCT | 0.046 | 0.053 | 0.091 | 0.068 | 0.075 |
Adj-SKAT | 0.035 | 0.055 | 0.129 | 0.062 | 0.064 |
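To read the type I error table, it helps to know how much an empirical rate based on 1000 replicates fluctuates by chance. The sketch below assumes the nominal level is 0.05 (the caption does not state it) and uses the normal approximation to the binomial to form a rough 95% band around that level; the values are copied from the table.

```python
import math

alpha, reps = 0.05, 1000
se = math.sqrt(alpha * (1 - alpha) / reps)     # binomial Monte Carlo standard error
lo, hi = alpha - 1.96 * se, alpha + 1.96 * se  # rough 95% band for an exact-size test

table = {
    "IU":       {"Bernoulli": 0.048, "Gaussian": 0.049, "T2": 0.052, "T4": 0.055, "DE": 0.052},
    "VCT":      {"Bernoulli": 0.046, "Gaussian": 0.053, "T2": 0.091, "T4": 0.068, "DE": 0.075},
    "Adj-SKAT": {"Bernoulli": 0.035, "Gaussian": 0.055, "T2": 0.129, "T4": 0.062, "DE": 0.064},
}
# Phenotype distributions whose empirical rate falls outside the band, per method.
outside = {m: [d for d, v in row.items() if not lo <= v <= hi] for m, row in table.items()}

print(f"95% band: [{lo:.4f}, {hi:.4f}]")
for method, dists in outside.items():
    print(method, dists)
```

Under this rough check, every IU entry lies inside the band [0.0365, 0.0635], whereas VCT falls outside it for T_{2}, T_{4} and DE, and Adj-SKAT for Bernoulli, T_{2} and DE.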
Type I error of IU for different sample sizes with nominal levels 0.05 and 0.01
Phenotype | Nominal α | n = 100 | n = 200 | n = 300 | n = 400 | n = 500 |
---|---|---|---|---|---|---|
Gaussian | 0.05 | 0.049 | 0.051 | 0.048 | 0.047 | 0.052 |
Gaussian | 0.01 | 0.010 | 0.011 | 0.009 | 0.011 | 0.010 |
Binary | 0.05 | 0.044 | 0.046 | 0.048 | 0.052 | 0.048 |
Binary | 0.01 | 0.011 | 0.012 | 0.009 | 0.011 | 0.012 |
Power performance
For the power comparison, we considered scenarios with and without an interaction between SNVs and gene expression. For the scenarios with an interaction, we studied the performance of the three methods under various interaction models. As in the type I error simulation, the genetic data were obtained from the 1000 Genomes Project and gene expression S_{i} was sampled from N(1,1.2^{2}). The binary response Y_{i} was then generated from a logistic regression model. In each simulation, we randomly chose 100 cases and 100 controls to form a balanced case-control sample. For continuous phenotypes, we simulated both Gaussian-distributed and T-distributed phenotypes.
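The balanced case-control design can be sketched by simulating a population from a logistic model and then sampling 100 cases and 100 controls without replacement. The model's exact form (equation (2)) is not reproduced in this text, so the linear predictor below, with hypothetical coefficients `beta0`, `beta_G` and `beta_S`, is illustrative only, as is the function name.

```python
import numpy as np

def case_control_sample(G, S, beta0, beta_G, beta_S, n_cases=100, n_controls=100, rng=None):
    """Simulate Y from a logistic model on a population, then draw a balanced sample.

    Returns (idx, Y_sample): row indices into G/S and the sampled 0/1 responses.
    """
    rng = np.random.default_rng(rng)
    eta = beta0 + G @ beta_G + beta_S * S          # hypothetical linear predictor
    prob = 1.0 / (1.0 + np.exp(-eta))              # logistic link
    Y = rng.binomial(1, prob)
    cases = rng.choice(np.flatnonzero(Y == 1), size=n_cases, replace=False)
    controls = rng.choice(np.flatnonzero(Y == 0), size=n_controls, replace=False)
    idx = np.concatenate([cases, controls])
    return idx, Y[idx]
```

The population must be large enough to contain at least `n_cases` cases and `n_controls` controls; otherwise `rng.choice` raises an error.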
Case 1: No interaction effect
where β_{G} and β_{S} were defined as in (2), ε_{i}∼N(0,1) and e_{i}∼T(2), a T-distribution with 2 degrees of freedom.
Similar to [8], we assume that β_{G} and γ are randomly generated from probability distributions with mean 0 and variances \(\sigma _{G}^{2}\) and \(\sigma _{\gamma }^{2}\). In this simulation, the genetic effects β_{G} were generated from a normal distribution, \(N(0,\sigma _{G}^{2})\), while the interaction effects γ were all set to zero (\(\sigma _{\gamma }^{2} = 0\)) in order to study the marginal effects of the genetic variables.
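For the continuous phenotypes in Case 1, a sketch under the random-effects setup described above: β_{G} is drawn from \(N(0,\sigma _{G}^{2})\), γ is fixed at zero, and the same mean is paired with N(0,1) errors and with T(2) errors. The additive mean and the G_{ij}S_{i} form of the (here inactive) interaction term are assumptions standing in for equation (2), and the function name is hypothetical.

```python
import numpy as np

def continuous_phenotypes(G, S, sigma_G, beta_S, rng=None):
    """Return (y_gauss, y_t2, beta_G) for Case 1 (no interaction)."""
    rng = np.random.default_rng(rng)
    n, p = G.shape
    beta_G = rng.normal(0.0, sigma_G, size=p)       # beta_G ~ N(0, sigma_G^2)
    gamma = np.zeros(p)                             # sigma_gamma^2 = 0: no interaction
    # Assumed mean: additive SNV and expression effects plus a G x S term (zero here).
    mean = G @ beta_G + beta_S * S + (G * S[:, None]) @ gamma
    y_gauss = mean + rng.normal(size=n)             # epsilon_i ~ N(0, 1)
    y_t2 = mean + rng.standard_t(2, size=n)         # e_i ~ T(2), heavy-tailed
    return y_gauss, y_t2, beta_G
```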
Case 2: Interaction effect
We then compared the performance of the three methods under a more complex scenario in which there is an interaction between SNVs and gene expression. In this simulation, we considered three types of interaction effects: multiplicative, threshold, and random interaction effects. As in the simulation with no interaction, we evaluated the methods under three different kinds of phenotypes.
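The text names the interaction types but not their formulas. The sketch below shows one hypothetical way to encode a multiplicative term \(\sum _{j}\gamma _{j} G_{ij} S_{i}\) and a threshold term in which the SNV effects act only when S_{i} exceeds a cut-off (the sample median by default, an arbitrary choice). A random interaction effect can be obtained by drawing γ from \(N(0,\sigma _{\gamma }^{2})\) before calling the function. None of these forms is claimed to be the one used in the paper.

```python
import numpy as np

def interaction_term(G, S, gamma, kind="multiplicative", threshold=None):
    """Illustrative SNV x expression interaction contributions, one value per subject."""
    snv_part = G @ gamma                              # sum_j gamma_j G_ij
    if kind == "multiplicative":
        return snv_part * S                           # scaled linearly by expression
    if kind == "threshold":
        c = np.median(S) if threshold is None else threshold
        return snv_part * (S > c)                     # switched on only above the cut-off
    raise ValueError(f"unknown interaction kind: {kind}")
```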
In summary, the power of the proposed IU test increases as the marginal or interaction effects increase. Unlike VCT or Adj-SKAT, which show higher power only under some specific models (e.g., the random effect or cross-product interaction models), the IU test shows more robust and stable performance across different phenotypes and various underlying models. These features make IU more appropriate when we have limited knowledge of the actual underlying model.
An integrative analysis of gene and gene expression data of hypertension
Hypertension is one of the most common chronic diseases and affects a large proportion of the human population worldwide. Despite decades of research, the genetic etiology of hypertension remains largely unknown. Successfully identifying genetic variants that predispose to hypertension promises a better understanding of its genetic etiology and may point to new therapeutic targets. In this application, we performed an integrative analysis of DNA sequencing and gene expression data from the San Antonio Family Heart Study (SAFHS) and the San Antonio Family Diabetes/Gallbladder Study (SAFDGS). SAFHS and SAFDGS include standardized diagnostic assessments of hypertension (i.e., case vs. control). Whole-genome sequencing (WGS) data were available for the odd-numbered autosomes. In addition, gene expression was measured in peripheral blood mononuclear cells collected at the first examination. In total, 260 subjects had WGS data, gene expression data, and the binary hypertension (HTN) phenotype.
Prior to the integrative analysis, we performed quality control and data preparation. In this process, we assembled SNVs into genes based on the Genome Reference Consortium human build 38 (GRCh38) and excluded genes without gene expression data. To deal with missing values in the genetic data, we imputed the genotype values from a multinomial distribution using the sample proportions as the generating probabilities. After data processing, 2389 genes and the corresponding gene expression remained for the integrative analysis. We then fitted a generalized linear mixed model to the binary HTN phenotype with covariates AGE, MEDS, SMOKE and SEX and the kinship matrix, to remove potential confounding effects and familial correlations. The residuals were used as the responses in the integrative analysis. Finally, the proposed IU test was applied to detect the joint effect of genes and gene expression.
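The imputation step can be sketched per SNV: estimate the proportions of genotypes 0, 1 and 2 among the observed values and draw each missing genotype from that multinomial distribution. Missing values are assumed to be coded as NaN, and the function name `impute_multinomial` is hypothetical.

```python
import numpy as np

def impute_multinomial(G, rng=None):
    """Return a copy of G (n x p, values 0/1/2 or NaN) with each NaN replaced by a draw
    from the observed genotype proportions of its own SNV (column)."""
    rng = np.random.default_rng(rng)
    G = np.array(G, dtype=float)                  # copy; the input is left untouched
    for j in range(G.shape[1]):
        col = G[:, j]                             # view: edits write back into G
        miss = np.isnan(col)
        if not miss.any():
            continue
        obs = col[~miss].astype(int)
        if obs.size == 0:
            raise ValueError(f"SNV {j} has no observed genotypes")
        probs = np.bincount(obs, minlength=3) / obs.size   # sample proportions of 0/1/2
        col[miss] = rng.choice(3, size=miss.sum(), p=probs)
    return G
```

Because each column is imputed from its own marginal proportions, this simple scheme ignores linkage disequilibrium between neighboring SNVs.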
Top 10 gene findings from the integrative analysis in a hypertension study
Name | Chromosome | Starting location | Ending location | # of SNVs | p-value |
---|---|---|---|---|---|
UBAC1 | 9 | 138823836 | 138854205 | 287 | 9.86×10^{−5} |
MEGF11 | 15 | 66186838 | 66546725 | 2989 | 2.14×10^{−4} |
IFI44L | 1 | 79085201 | 79112428 | 207 | 5.35×10^{−4} |
MFGE8 | 15 | 89440944 | 89457653 | 161 | 1.27×10^{−3} |
ANKDD1A | 15 | 65203490 | 65251983 | 464 | 1.29×10^{−3} |
PDZD2 | 5 | 31798110 | 32111928 | 3655 | 1.96×10^{−3} |
TBX4 | 17 | 59532864 | 59561970 | 221 | 2.15×10^{−3} |
IGSF3 | 1 | 117116060 | 117211147 | 429 | 2.42×10^{−3} |
TMEM61 | 1 | 55445562 | 55458886 | 170 | 3.90×10^{−3} |
FAM46B | 1 | 27330739 | 27340321 | 63 | 7.29×10^{−3} |
Conclusion
To facilitate the integrative analysis of omic data, we proposed a unified non-parametric method to detect the joint association of multi-level omic data with various types of phenotypes. There are three main contributions of the proposed IU method. First, it provides robust performance for various types of phenotypes, including binary, Gaussian and heavy-tailed distributions, due to the robustness of U statistics. Second, the proposed integrative U test achieves higher or comparable power compared to existing methods (e.g., VCT) under different types of interaction models. Finally, we also provide a simple sample size/power calculation to facilitate the design of multi-level omic studies.
The proposed method is connected to variance component tests in that all of these test statistics take a kernel-based quadratic form, as seen in the “Simulation” section. It is also connected to several other U-statistic-based methods [10, 27]. As a similarity-based test, the IU statistic is a non-degenerate U statistic, which is asymptotically normal. One advantage of using a non-degenerate U statistic is computational accuracy: p-values follow directly from the normal limit, with no need to approximate the null distribution. If the phenotype is centered, the statistic becomes a degenerate U statistic, whose limiting distribution is a mixture of chi-square distributions.
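Why \(\hat \sigma _{G}^{2}\) in the “Methods” section has the form \(4\hat \mu _{Y}^{2} s_{Y}^{2}\sum \hat \lambda _{k}\hat \delta _{k}^{2}\), and why centering Y makes the statistic degenerate, can be seen from a first-order (Hájek) expansion of U_{G} under H_{0}. This is a sketch, not the proof in Additional file 1; it writes \(\mu _{Y}=E(Y)\), \(\eta (t)=E\,G(t)\), \(X_{i}=\int _{0}^{1} G_{i}(t)\eta (t)dt\), and uses the independence of Y and G under H_{0}.

```latex
\begin{aligned}
U_G - \mu_G &\approx \frac{2}{n}\sum_{i=1}^{n}\bigl(\mu_Y Y_i X_i - \mu_G\bigr),
  \qquad \mu_G = \mu_Y^{2}\,\lVert \eta \rVert^{2},\\
\hat\mu_G - \mu_G &\approx 2\mu_Y \lVert \eta \rVert^{2}(\bar Y - \mu_Y)
  + 2\mu_Y^{2}\bigl(\bar X - \lVert \eta \rVert^{2}\bigr),\\
U_G - \hat\mu_G &\approx \frac{2\mu_Y}{n}\sum_{i=1}^{n}(Y_i - \mu_Y)\bigl(X_i - \lVert \eta \rVert^{2}\bigr),\\
\operatorname{Var}\bigl(\sqrt{n}\,(U_G - \hat\mu_G)\bigr) &\to 4\mu_Y^{2}\,\sigma_Y^{2}\operatorname{Var}(X_1)
  = 4\mu_Y^{2}\,\sigma_Y^{2}\sum_{k}\lambda_k\delta_k^{2}.
\end{aligned}
```

The last equality uses the Karhunen–Loève expansion \(G(t)-\eta (t)=\sum _{k}\xi _{k}\phi _{k}(t)\) with \(\operatorname {Var}(\xi _{k})=\lambda _{k}\), so that \(X-\lVert \eta \rVert ^{2}=\sum _{k}\xi _{k}\delta _{k}\). Plugging in \(\bar Y\), \(s_{Y}^{2}\), \(\hat \lambda _{k}\) and \(\hat \delta _{k}\) gives \(\hat \sigma _{G}^{2}\). When \(\mu _{Y}=0\) the linear term vanishes, and the limit is governed by the second-order term, which leads to the mixture of chi-square distributions.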
The choice of K_{3} is similar to that of K_{1} and K_{2}, as discussed in the “Methods” section. Following the same argument as for Theorem 1, we can show that this modified IU test is also asymptotically normal. In addition, with multiple genes (e.g., genes in a biological pathway) and the corresponding gene expression levels, the gene expression level S can also be modeled as a function. For this purpose, the similarity measure K_{2}(S_{i},S_{j}) can be replaced by \(\tilde K_{2}(S_{i}(t),S_{j}(t))\), where \(\tilde K_{2}(\cdot,\cdot)\) measures the similarity between two functions. The asymptotic properties can be derived by the same argument as for Theorem 1.
One potential limitation of this study is that gene expression is assumed to be independent of SNVs. One technical reason for this assumption is that, under the stochastic process setup, it is hard to model the association between the gene expression variable S and the underlying stochastic process SP(η(t),Γ(s,t)). Finding an appropriate way to model correlations among omic data is a challenging topic that merits further investigation. Nevertheless, if the data indicate correlations among different levels of omic data, one way to address this issue is to adopt the methods introduced in [27] and [28].
Notes
Acknowledgements
Not applicable.
Funding
This project was supported by the National Institute on Drug Abuse (Award No. R01DA043501) and the National Library of Medicine (Award No. R01LM012848).
Availability of data and materials
Data used in this article come from the Genetic Analysis Workshop.
Authors’ contributions
PG and QL participated in the design of the study. PG implemented the methods and drafted the manuscript. XT was involved in the data analysis. QL participated in the conception of the study and in editing the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
Author Qing Lu is currently acting as an Editorial Board Member for BMC Genetics. All other authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary material
References
- 1. Collins FS, Varmus H. A new initiative on precision medicine. N Engl J Med. 2015; 372(9):793–5.
- 2. Lappalainen T, Sammeth M, Friedlander MR, ‘t Hoen PA, Monlong J, Rivas MA, Gonzalez-Porta M, et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature. 2013; 501(7468):506–11.
- 3. Kristensen VN, Lingjærde OC, Russnes HG, Vollan HK, Frigessi A, Børresen-Dale A. Principles and methods of integrative genomic analyses in cancer. Nat Rev Cancer. 2014; 14(5):299–313.
- 4. Lin W, Feng R, Li H. Regularization Methods for High-Dimensional Instrumental Variables Regression With an Application to Genetical Genomics. J Am Stat Assoc. 2015; 110(509):270–88.
- 5. Ritchie MD, Holzinger ER, Li R, Pendergrass SA, Kim D. Methods of integrating data to uncover genotype-phenotype interactions. Nat Rev Genet. 2015; 16(2):85–97.
- 6. Zhao SD, Cai TT, Li H. More powerful genetic association testing via a new statistical framework for integrative genomics. Biometrics. 2014; 70(4):881–90.
- 7. Ainsworth HF, Shin S, Cordell HJ. A comparison of methods for inferring causal relationships between genotype and phenotype using additional biological measurements. Genet Epidemiol. 2017; 41(7):577–86.
- 8. Huang Y-T, Vanderweele TJ, Lin X. Joint analysis of SNP and gene expression data in genetic association studies of complex diseases. Ann Appl Stat. 2014; 8:352–76.
- 9. Huang Y-T. Integrative modeling of multiple genomic data from different types of genetic association studies. Biostatistics. 2014; 15(4):587–602.
- 10. Wei C, Li M, He Z, Vsevolozhskaya O, Schaid DJ, Lu Q. A weighted U-statistic for genetic association analyses of sequencing data. Genet Epidemiol. 38(8):699–708.
- 11. The Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature. 2012; 490(7418):61–70.
- 12. Hawkins RD, Hon GC, Ren B. Next-generation genomics: an integrative approach. Nat Rev Genet. 2010; 11(7):476–86.
- 13. Luo L, Zhu Y, Xiong M. Quantitative trait locus analysis for next-generation sequencing with the functional linear models. J Med Genet. 2012; 49(8):513–24.
- 14. Vsevolozhskaya OA, Zaykin DV, Greenwood MC, Wei C, Lu Q. Functional analysis of variance for association studies. PLOS ONE. 2014; 9(9):e105074.
- 15. Fan R, Wang Y, Boehnke M, Chen W, Li Y, Ren H, Lobach I, Xiong M. Gene level meta-analysis of quantitative traits by functional linear models. Genetics. 2015; 200(4):1089–104.
- 16. Fan R, Wang Y, Chiu CY, Chen W, Ren H, Li Y, Boehnke M, Amos CI, Moore JH, Xiong M. Meta-analysis of complex diseases at gene level by generalized functional linear models. Genetics. 2015; 202(2):457–70.
- 17. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet. 2011; 89:82–93.
- 18. Serfling RJ. Approximation theorems of mathematical statistics. Wiley Series in Probability and Statistics. Hoboken: Wiley; 1981.
- 19. Zhang J-T. Analysis of Variance for Functional Data. London: Chapman & Hall; 2013.
- 20. 1000 Genomes Project Consortium, Abecasis GR, et al. A map of human genome variation from population-scale sequencing. Nature. 2010; 467(7319):1061–73.
- 21. Kerley-Hamilton JS, Trask HW, Ridley CJ, Dufour E, Ringelberg CS, Nurinova N, Wong D, Moodie KL, Shipman SL, Moore JH, Korc M, Shworak NW, Tomlinson CR. Obesity is mediated by differential aryl hydrocarbon receptor signaling in mice fed a Western diet. Environ Health Perspect. 2012; 120(9):1252–9.
- 22. Han C, Wu W, Ale A, Kim MS, Cai D. Central Leptin and Tumor Necrosis Factor-α (TNFα) in Diurnal Control of Blood Pressure and Hypertension. J Biol Chem. 2016; 291(29):15131–42.
- 23. BrahmaNaidu P, Nemani H, Meriga B, Mehar SK, Potana S, Ramgopalrao S. Mitigating efficacy of piperine in the physiological derangements of high fat diet induced obesity in Sprague Dawley rats. Chem Biol Interact. 2014; 221:42–51.
- 24. Correa RJ, Malajian D, Shemer A, Rozenblit M, Dhingra N, Czarnowicki T, Khattri S, Ungar B, Finney R, Xu H, Zheng X, Estrada YD, Peng X, Suarez-Farinas M, Krueger JG, Guttman-Yassky E. Patients with atopic dermatitis have attenuated and distinct contact hypersensitivity responses to common allergens in skin. J Allergy Clin Immunol. 2015; 135(3):712–20.
- 25. Tchou-Wong KM, Kiok K, Tang Z, Kluz T, Arita A, Smith PR, Brown S, Costa M. Effects of nickel treatment on H3K4 trimethylation and gene expression. PLOS ONE. 2011; 6(3):e17728.
- 26. Yang AM, Bai YN, Pu HQ, Zheng TZ, Cheng N, Li JS, Li HY, Zhang YW, Ding J, Su H, Ren XW, Hu XB. Prevalence of metabolic syndrome in Chinese nickel-exposed workers. Biomed Environ Sci. 2014; 27(6):475–7.
- 27. Wei C, Elston RC, Lu Q. A weighted U statistic for association analyses considering genetic heterogeneity. Stat Med. 2016; 35(16):2802–14.
- 28. Jiang Y, Li N, Zhang H. Identifying Genetic Variants for Addiction via Propensity Score Adjusted Generalized Kendall’s Tau. J Am Stat Assoc. 2014; 109(507):905–30.
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.