Discovering Influential Variables: A General Computer Intensive Method for Common Genetic Disorders

Handbook of Statistical Bioinformatics

Abstract

We describe a general backward partition method for discovering which of a large number of possible explanatory variables influence a dependent variable Y. This method is a variant of one pioneered by Lo and Zheng, and it and related variations have been used successfully on several biological problems, some of which are discussed here. The problem is an example of feature, or variable, selection. Although the objective, understanding which variables are influential, often differs from that of classification, the method has also been applied successfully to classification.
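The abstract outlines the method only at a high level. The following is a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes the explanatory variables are discrete (for example, genotypes coded 0/1/2). A candidate subset of variables partitions the observations into cells by their joint values, and the subset is scored by an assumed influence measure Σ_j n_j² (Ȳ_j − Ȳ)², where n_j and Ȳ_j are the size and mean of Y in cell j and Ȳ is the overall mean. Variables are then dropped one at a time, each time removing the one whose removal leaves the highest score, and the best-scoring subset seen along that path is returned. The function names `influence_score` and `backward_drop` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def influence_score(X: Sequence[Sequence[int]], y: Sequence[float], subset: Sequence[int]) -> float:
    """Assumed influence measure: sum over cells j of n_j^2 * (mean_j - overall mean)^2.

    The cells are the partition of observations by the joint values of the
    variables in `subset` (X is assumed to hold discrete codes).
    """
    n = len(y)
    ybar = sum(y) / n
    cells: Dict[Tuple[int, ...], List[float]] = defaultdict(list)
    for row, yi in zip(X, y):
        cells[tuple(row[k] for k in subset)].append(yi)
    score = 0.0
    for values in cells.values():
        nj = len(values)
        score += nj ** 2 * (sum(values) / nj - ybar) ** 2
    return score


def backward_drop(X, y, start: Sequence[int]) -> Tuple[List[int], float]:
    """Greedy backward dropping from `start`; returns the best subset seen and its score."""
    current = list(start)
    best = (list(current), influence_score(X, y, current))
    while len(current) > 1:
        # Score every subset obtained by removing one variable.
        scored = []
        for drop in current:
            reduced = [v for v in current if v != drop]
            scored.append((influence_score(X, y, reduced), reduced))
        # Keep the removal that leaves the highest score (first one wins ties).
        score, current = max(scored, key=lambda pair: pair[0])
        if score > best[1]:
            best = (list(current), score)
    return best


if __name__ == "__main__":
    # Toy data: all 8 combinations of three binary variables; Y = X0 XOR X1, X2 is noise.
    X = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
    y = [float(a ^ b) for a, b, _ in X]
    print(influence_score(X, y, [0]))       # 0.0: X0 alone has no marginal effect
    print(backward_drop(X, y, [0, 1, 2]))   # ([0, 1], 4.0)
```

In this toy example neither X0 nor X1 alone shifts the mean of Y (each scores 0), yet backward dropping from {X0, X1, X2} returns {X0, X1} with score 4.0. Dropping the noise variable X2 doubles the score from 2.0, because splitting cells by an irrelevant variable tends to halve Σ_j n_j² (Ȳ_j − Ȳ)². This illustrates why a partition-based score can pick up interacting variables that have no marginal effect.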

This chapter is based on the materials presented at a Workshop on Detecting Influential Variables in High-Dimensional Data, held in June 2009 at National Taiwan University, Taipei. We thank Hung Chen and the Taida Institute of Mathematical Sciences for supporting and organizing this very successful workshop. All authors of this chapter participated in the workshop.

Notes

  1. They actually used more subjects.

References

  1. Barrett, J. C., Hansoul, S., Nicolae, D. L., Cho, J. H., Duerr, R. H., Rioux, J. D., Brant, S. R., Silverberg, M. S., Taylor, K. D., Barmada, M. M., Bitton, A., Dassopoulos, T., Datta, L. W., Green, T., Griffiths, A. M., Kistner, E. O., Murtha, M. T., Regueiro, M. D., Rotter, J. I., Schumm, L. P., Steinhart, A. H., Targan, S. R., Xavier, R. J., Libioulle, C., Sandor, C., Lathrop, M., Belaiche, J., Dewit, O., Gut, I., Heath, S., Laukens, D., Mni, M., Rutgeerts, P., Van Gossum, A., Zelenika, D., Franchimont, D., Hugot, J. P., de Vos, M., Vermeire, S., Louis, E., Cardon, L. R., Anderson, C. A., Drummond, H., Nimmo, E., Ahmad, T., Prescott, N. J., Onnie, C. M., Fisher, S. A., Marchini, J., Ghori, J., Bumpstead, S., Gwilliam, R., Tremelling, M., Deloukas, P., Mansfield, J., Jewell, D., Satsangi, J., Mathew, C. G., Parkes, M., Georges, M., & Daly, M. J. (2008). Genome-wide association defines more than 30 distinct susceptibility loci for Crohn’s disease. Nature Genetics, 40(8), 955–962.

  2. Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.

  3. Chernoff, H., Lo, S. H., & Zheng, T. (2009). Discovering influential variables: A method of partitions. Annals of Applied Statistics, 3(4), 1335–1369.

  4. Ding, Y., Cong, L., Ionita-Laza, I., Lo, S. H., & Zheng, T. (2007). Constructing gene association networks for rheumatoid arthritis using the backward genotype-trait association (BGTA) algorithm. BMC Proceedings, 1(Suppl 1), S13.

  5. Dudoit, S., Fridlyand, J., & Speed, T. P. (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97(457), 77–87.

  6. Efron, B., & Tibshirani, R. (2002). Empirical Bayes methods and false discovery rates for microarrays. Genetic Epidemiology, 23(1), 70–86.

  7. Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D., & Lander, E. S. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286(5439), 531–537.

  8. Hastie, T., Tibshirani, R., & Friedman, J. H. (2003). The elements of statistical learning (corrected ed.). New York, NY: Springer.

  9. Huang, C. H., Cong, L., Xie, J., Qiao, B., Lo, S. H., & Zheng, T. (2009). Rheumatoid arthritis-associated gene-gene interaction network for rheumatoid arthritis candidate genes. BMC Proceedings, 3(Suppl 7), S76.

  10. Hunter, D. J., Kraft, P., Jacobs, K. B., Cox, D. G., Yeager, M., Hankinson, S. E., Wacholder, S., Wang, Z., Welch, R., Hutchinson, A., Wang, J., Yu, K., Chatterjee, N., Orr, N., Willett, W. C., Colditz, G. A., Ziegler, R. G., Berg, C. D., Buys, S. S., McCarty, C. A., Feigelson, H. S., Calle, E. E., Thun, M. J., Hayes, R. B., Tucker, M., Gerhard, D. S., Fraumeni, J. F., Jr., Hoover, R. N., Thomas, G., & Chanock, S. J. (2007). A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. Nature Genetics, 39(7), 870–874.

  11. Ionita, I., & Lo, S. H. (2005). Multilocus linkage analysis of affected sib pairs. Human Heredity, 60(4), 227–240.

  12. Kerr, M. K., & Churchill, G. A. (2001). Statistical design and the analysis of gene expression microarray data. Genetical Research, 77(2), 123–128.

  13. Khan, J., Wei, J. S., Ringner, M., Saal, L. H., Ladanyi, M., Westermann, F., Berthold, F., Schwab, M., Antonescu, C. R., Peterson, C., & Meltzer, P. S. (2001). Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine, 7(6), 673–679.

  14. King, R. A., Rotter, J. I., & Motulsky, A. G. (2002). The genetic basis of common diseases (2nd ed.). New York, NY: Oxford University Press.

  15. Lee, Y., & Lee, C. K. (2003). Classification of multiple cancer types by multicategory support vector machines using gene expression data. Bioinformatics, 19(9), 1132–1139.

  16. Lo, S. H., Chernoff, H., Cong, L., Ding, Y., & Zheng, T. (2008). Discovering interactions among BRCA1 and other candidate genes associated with sporadic breast cancer. Proceedings of the National Academy of Sciences of the United States of America, 105(34), 12387–12392.

  17. Lo, S. H., & Zheng, T. (2002). Backward haplotype transmission association (BHTA) algorithm – a fast multiple-marker screening method. Human Heredity, 53(4), 197–215.

  18. Lo, S. H., & Zheng, T. (2004). A demonstration and findings of a statistical approach through reanalysis of inflammatory bowel disease data. Proceedings of the National Academy of Sciences of the United States of America, 101(28), 10386–10391.

  19. McKinney, B. A., Reif, D. M., Ritchie, M. D., & Moore, J. H. (2006). Machine learning for detecting gene-gene interactions: A review. Applied Bioinformatics, 5(2), 77–88.

  20. Pochet, N., De Smet, F., Suykens, J. A. K., & De Moor, B. L. R. (2004). Systematic benchmarking of microarray data classification: Assessing the role of non-linearity and dimensionality reduction. Bioinformatics, 20(17), 3185–3195.

  21. Qiao, B., Huang, C. H., Cong, L., Xie, J., Lo, S. H., & Zheng, T. (2009). Genome-wide gene-based analysis of rheumatoid arthritis-associated interaction with PTPN22 and HLA-DRB. BMC Proceedings, 3(Suppl 7), S132.

  22. Rioux, J. D., Silverberg, M. S., Daly, M. J., Steinhart, A. H., McLeod, R. S., Griffiths, A. M., Green, T., Brettin, T. S., Stone, V., Bull, S. B., Bitton, A., Williams, C. N., Greenberg, G. R., Cohen, Z., Lander, E. S., Hudson, T. J., & Siminovitch, K. A. (2000). Genomewide search in Canadian families with inflammatory bowel disease reveals two novel susceptibility loci. American Journal of Human Genetics, 66(6), 1863–1870.

  23. Ritchie, M. D., Hahn, L. W., & Moore, J. H. (2003). Power of multifactor dimensionality reduction for detecting gene-gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity. Genetic Epidemiology, 24(2), 150–157.

  24. Ritchie, M. D., Hahn, L. W., Roodi, N., Bailey, L. R., Dupont, W. D., Parl, F. F., & Moore, J. H. (2001). Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. American Journal of Human Genetics, 69(1), 138–147.

  25. Sotiriou, C., Neo, S. Y., McShane, L. M., Korn, E. L., Long, P. M., Jazaeri, A., Martiat, P., Fox, S. B., Harris, A. L., & Liu, E. T. (2003). Breast cancer classification and prognosis based on gene expression profiles from a population-based study. Proceedings of the National Academy of Sciences of the United States of America, 100(18), 10393–10398.

  26. Tusher, V. G., Tibshirani, R., & Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences of the United States of America, 98(9), 5116–5121.

  27. van ’t Veer, L. J., Dai, H. Y., van de Vijver, M. J., He, Y. D. D., Hart, A. A. M., Mao, M., Peterse, H. L., van der Kooy, K., Marton, M. J., Witteveen, A. T., Schreiber, G. J., Kerkhoven, R. M., Roberts, C., Linsley, P. S., Bernards, R., & Friend, S. H. (2002). Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415(6871), 530–536.

  28. Wang, H., Lo, S. H., Zheng, T., & Hu, I. (2009). A new classification method incorporating interactions among variables for high-dimensional data. Working paper.

  29. Zhang, H., Yu, C. Y., & Singer, B. (2003). Cell and tumor classification using gene expression data: Construction of forests. Proceedings of the National Academy of Sciences of the United States of America, 100(7), 4168–4172.

  30. Zhang, H. H., Ahn, J., Lin, X., & Park, C. (2006). Gene selection using support vector machines with non-convex penalty. Bioinformatics, 22(1), 88–95.

  31. Zheng, T., Wang, H., & Lo, S. H. (2006). Backward genotype-trait association (BGTA)-based dissection of complex traits in case-control designs. Human Heredity, 62(4), 196–212.

Acknowledgements

This research was partially supported by NSF Grant DMS 0714669, NIH Grant R01 GM070789 and ARRA supplement 3R01GM070789-05S1, and Hong Kong RGC Grant 642207. We thank our graduate students Chien Hsun Huang and Haitian Wang (Maggie) for their diligence in the research that led to several of the publications discussed in this chapter, and for their help during the workshop.

Author information

Correspondence to Shaw-Hwa Lo.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Zheng, T., Chernoff, H., Hu, I., Ionita-Laza, I., Lo, SH. (2011). Discovering Influential Variables: A General Computer Intensive Method for Common Genetic Disorders. In: Lu, HS., Schölkopf, B., Zhao, H. (eds) Handbook of Statistical Bioinformatics. Springer Handbooks of Computational Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16345-6_5