Improving the Accuracy of Gene Expression Profile Classification with Lorenz Curves and Gini Ratios

Part of the Advances in Experimental Medicine and Biology book series (AEMB, volume 696)


Microarrays are a new technology with great potential to provide accurate medical diagnostics, help find the right treatment for many diseases such as cancers, and provide a detailed genome-wide molecular portrait of cellular states. In this chapter, we show how Lorenz curves and Gini ratios can be modified to improve the accuracy of gene expression profile classification. We evaluate the method on the task of classifying lung adenocarcinomas from gene expression data, using several classification algorithms combined with additional accuracy-improving techniques: principal component analysis, correlation-based feature subset selection, and consistency subset evaluation. The experimental results show that our method selects better genes than SAM (Significance Analysis of Microarrays).
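For readers unfamiliar with the two underlying measures, the sketch below computes a standard Lorenz curve and Gini ratio. It is the textbook construction, not the modified version developed in this chapter, whose details the abstract does not give. It assumes non-negative expression values (for example raw intensities rather than log ratios), and the helpers `lorenz_curve`, `gini_ratio`, and `rank_genes_by_gini`, along with the toy gene names, are hypothetical illustrations. The Gini ratio is one minus twice the area under the Lorenz curve, with that area computed by the trapezoid rule.

```python
from typing import Dict, List, Sequence, Tuple


def lorenz_curve(values: Sequence[float]) -> List[float]:
    """Return Lorenz curve ordinates L_0 = 0, L_1, ..., L_n.

    L_k is the share of the total held by the k smallest values.
    Assumption: values are non-negative with a positive sum.
    """
    xs = sorted(values)
    total = sum(xs)
    if not xs or total <= 0 or xs[0] < 0:
        raise ValueError("need non-negative values with a positive sum")
    points = [0.0]
    running = 0.0
    for x in xs:
        running += x
        points.append(running / total)
    return points


def gini_ratio(values: Sequence[float]) -> float:
    """Standard Gini ratio: 1 - 2 * (area under the Lorenz curve).

    The area is computed with the trapezoid rule over n equal-width steps;
    the line of perfect equality has area 1/2, so G = 0 means all values
    are equal. For n values the maximum is (n - 1) / n.
    """
    points = lorenz_curve(values)
    n = len(points) - 1
    area = sum((points[k - 1] + points[k]) / (2 * n) for k in range(1, n + 1))
    return 1.0 - 2.0 * area


def rank_genes_by_gini(
    expression: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Hypothetical helper: sort genes by the Gini ratio of their expression
    across samples, most unequal first. Illustration only; this is not the
    chapter's modified Lorenz/Gini gene-selection procedure."""
    scored = [(gene, gini_ratio(vals)) for gene, vals in expression.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy = {
        "GENE_A": [2.0, 2.0, 2.0, 2.0],  # flat expression: Gini 0
        "GENE_B": [0.0, 0.0, 0.0, 8.0],  # all signal in one sample: 0.75
        "GENE_C": [1.0, 1.0, 3.0, 3.0],  # moderate inequality: 0.25
    }
    for gene, g in rank_genes_by_gini(toy):
        print(f"{gene}: {g:.3f}")
    # prints GENE_B: 0.750, then GENE_C: 0.250, then GENE_A: 0.000
```

An unsupervised score like this ignores class labels, so in a classification setting it would at most serve as a pre-filter before a supervised step such as the correlation-based or consistency-based subset selection mentioned above.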


Microarray data mining 


Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Department of Computer Science, Lamar University, Beaumont, USA
