Generation of Comprehensible Hypotheses from Gene Expression Data

  • Yuan Jiang
  • Ming Li
  • Zhi-Hua Zhou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3916)


Machine learning techniques have been recognized as powerful tools for the analysis of gene expression data. However, most learning techniques used for class prediction in gene expression analysis in recent years generate black-box models. Although the prediction accuracy of these models can be very high, they provide little insight into the underlying biology. This paper takes the position that a more reasonable role for machine learning techniques is to generate hypotheses that human experts can verify or refine, rather than to make decisions for them. Based on this position, a general approach to generating comprehensible hypotheses from gene expression data is described and applied to human acute leukemias as a test case. The results demonstrate the feasibility of using machine learning techniques to help form hypotheses about the relationship between genes and certain diseases.


Keywords (added by machine, not by the authors): Support Vector Machine; Acute Myeloid Leukemia; Acute Lymphoblastic Leukemia; Gene Expression Data; Linear Discriminant Analysis




Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Yuan Jiang (1)
  • Ming Li (1)
  • Zhi-Hua Zhou (1)
  1. National Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
