Iterative Piecewise Linear Regression to Accurately Assess Statistical Significance in Batch Confounded Differential Expression Analysis

  • Juntao Li
  • Kwok Pui Choi
  • R. Krishna Murthy Karuturi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7292)


Batch-dependent variation in microarray experiments may manifest as a systematic shift in expression measurements from batch to batch. Such a shift can be handled by using an appropriate model for differential expression analysis. However, it poses a greater challenge for the estimation of statistical significance and the false discovery rate (FDR) if the batches are confounded (collinear) with the biological groups of interest. The batch confounding problem occurs commonly in the analysis of time-course data or data from different laboratories. We demonstrate that batch confounding may lead to incorrect estimation of the expected statistics. In this paper, we propose an iterative piecewise linear regression (iPLR) method, a major extension of our previously published Stepped Linear Regression (SLR) method, in the context of Significance Analysis of Microarrays (SAM) to re-estimate the expected statistics and FDR. iPLR can be applied to tests based on one-sided or two-sided statistics. We demonstrate the efficacy of iPLR on both simulated and real microarray datasets. iPLR also provides a better interpretation of the linear model parameters.
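As a small illustration of what "confounded (collinear)" means here, the following Python sketch builds a design matrix for a hypothetical six-array experiment and checks its rank. It is not the iPLR method and does not come from the paper: the sample layout, the helper name `design_matrix`, and the treatment-style dummy coding are all assumptions made for this example. When every biological group is hybridized in its own batch, the batch columns duplicate the group columns, so the matrix loses rank and group and batch effects cannot be separated by an ordinary linear fit.

```python
import numpy as np


def design_matrix(groups, batches):
    """Hypothetical helper: intercept plus dummy columns for groups and batches.

    The first level of each factor is used as the reference level.
    """
    groups = np.asarray(groups)
    batches = np.asarray(batches)
    cols = [np.ones(len(groups))]
    for g in np.unique(groups)[1:]:
        cols.append((groups == g).astype(float))
    for b in np.unique(batches)[1:]:
        cols.append((batches == b).astype(float))
    return np.column_stack(cols)


# Assumed layout: 3 time points (biological groups), 2 replicates each.
groups = [0, 0, 1, 1, 2, 2]
confounded_batches = [0, 0, 1, 1, 2, 2]  # each time point run in its own batch
balanced_batches = [0, 1, 0, 1, 0, 1]    # every batch contains every time point

X_conf = design_matrix(groups, confounded_batches)
X_bal = design_matrix(groups, balanced_batches)

rank_conf = np.linalg.matrix_rank(X_conf)
rank_bal = np.linalg.matrix_rank(X_bal)

print("confounded:", X_conf.shape[1], "columns, rank", rank_conf)
print("balanced:  ", X_bal.shape[1], "columns, rank", rank_bal)
```

In the confounded layout the rank is lower than the number of columns, while the balanced layout has full column rank. This is the kind of time-course arrangement the abstract names as a common source of batch confounding.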


False Discovery Rate; Null Distribution; Biological Group; False Discovery Rate Estimation; Linear Model Parameter
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Juntao Li (1, 2)
  • Kwok Pui Choi (2)
  • R. Krishna Murthy Karuturi (1)
  1. Computational & Mathematical Biology, Genome Institute of Singapore, Singapore
  2. Department of Statistics and Applied Probability, National University of Singapore, Singapore
