Association Measures and Statistical Significance Measures

  • Steve Horvath
Chapter

Abstract

An association measure quantifies the relationship between two random variables. These variables may be numeric, categorical, or binary. Test statistics can often be used to define association measures. For example, several test statistics (Fisher Z, Student t-test, Hotelling) can be used to calculate a statistical significance level (p value) for a correlation coefficient. Multiple comparison correction (MCC) procedures are needed to protect against false positives due to multiple comparisons. The Bonferroni and Sidak corrections are very conservative MCC procedures. The q-value (local false discovery rate) MCC is often advantageous since it allows one to detect more significant variables. To calculate the false discovery rate (FDR), one considers the shape of the histogram of p values. MCC procedures can often be interpreted as transformations that increase the p value to account for the fact that multiple comparisons have been carried out. For example, the Bonferroni correction multiplies each p value by the number of comparisons. The q-value transformation is sometimes improper, i.e., it decreases significant p values. Both p values and q-values can be used to screen for significant variables. The WGCNA library contains several R functions that implement standard screening criteria for finding variables (e.g., gene expression profiles) associated with a sample trait y. In practice, many seemingly different gene screening methods turn out to be equivalent. p values (or q-values) can be used to formulate a statistical criterion for choosing the (hard) threshold τ when defining an unweighted correlation network. Many methods for defining unweighted networks on the basis of pairwise linear relationships between variables turn out to be equivalent.
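
The two steps named in the abstract, computing a p value for a correlation coefficient and then adjusting it for multiple comparisons, can be illustrated with a minimal Python sketch. It is not taken from the chapter or from the WGCNA library. The function names (`fisher_z_pvalue`, `bonferroni`, `sidak`) are hypothetical, and the Fisher Z p value uses the usual large-sample normal approximation, which is an assumption of this sketch.

```python
import math


def fisher_z_pvalue(r, n):
    """Two-sided p value for a Pearson correlation r observed in n samples.

    Assumes the Fisher Z approximation: atanh(r) * sqrt(n - 3) is treated
    as standard normal under the null hypothesis of zero correlation.
    Raises ValueError for n <= 3 (and, via math.atanh, for |r| >= 1).
    """
    if n <= 3:
        raise ValueError("Fisher Z needs more than 3 samples")
    z = math.atanh(r) * math.sqrt(n - 3)
    # For a standard normal Z: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def bonferroni(pvalues):
    """Multiply each p value by the number of comparisons, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def sidak(pvalues):
    """Sidak adjustment: 1 - (1 - p)^m for m comparisons."""
    m = len(pvalues)
    return [1.0 - (1.0 - p) ** m for p in pvalues]


if __name__ == "__main__":
    # Hypothetical correlations between three variables and a trait, n = 30.
    correlations = [0.5, 0.2, -0.35]
    raw = [fisher_z_pvalue(r, 30) for r in correlations]
    print("raw p values:       ", raw)
    print("Bonferroni adjusted:", bonferroni(raw))
    print("Sidak adjusted:     ", sidak(raw))
```

Both adjustments only ever increase a p value, which matches the view of MCC procedures as p value transformations. The Sidak-adjusted value is never larger than the Bonferroni-adjusted one.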

Keywords

False Discovery Rate; Significance Measure; Correlation Network; Association Measure; Multiple Comparison Correction
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc Ser B 57:289–300
  2. Hawkins DL (1989) Using U statistics to derive the asymptotic distribution of Fisher’s Z statistic. Am Stat 43(4):235–237
  3. Horvath S, Zhang B, Carlson M, Lu KV, Zhu S, Felciano RM, Laurance MF, Zhao W, Shu Q, Lee Y, Scheck AC, Liau LM, Wu H, Geschwind DH, Febbo PG, Kornblum HI, Cloughesy TF, Nelson SF, Mischel PS (2006) Analysis of oncogenic signaling networks in glioblastoma identifies ASPM as a novel molecular target. Proc Natl Acad Sci USA 103(46):17402–17407
  4. Li A, Horvath S (2007) Network neighborhood analysis with the multi-node topological overlap measure. Bioinformatics 23(2):222–231
  5. Sokal RR, Rohlf FJ (1981) Biometry: The principles and practice of statistics in biological research, 3rd edn. WH Freeman, New York
  6. Storey JD (2002) A direct approach to false discovery rates. J R Stat Soc Ser B 64:479–498
  7. Storey JD, Tibshirani R (2003) Statistical significance for genomewide studies. Proc Natl Acad Sci USA 100(16):9440–9445

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. University of California, Los Angeles, Los Angeles, USA