Abstract:
Different research groups have conducted independent gene expression studies on tissue samples from human lung adenocarcinomas [Bhattacharjee et al. 2001; Beer et al. 2002]. In this paper we (a) investigate methods to integrate data obtained from independent studies, (b) experiment with different gene selection methods to find genes whose expression differs significantly among tumor stages, (c) study the performance of neural network classifiers with correlated weights, and (d) compare the performance of classifiers based on neural networks and their variants on gene expression data. Raw cell intensity data were preprocessed for our analyses. Affymetrix array comparison spreadsheets were used to extract the overlapping probe sets for the data integration study. We considered neural network classifiers whose weights were initialized at random from a univariate normal distribution and then optimized using Bayesian methods. The performance of the neural network was further enhanced using ensemble techniques such as bagging and boosting. All the resulting classifiers were compared on the Michigan and Harvard data sets from the CAMDA website. Three gene selection methods were used to find significant genes that could discriminate between the various stages of lung cancer. Significant genes, which were mined from the Gene Ontology (GO) database using the GoMiner and AmiGO packages, were found to be involved in apoptosis, angiogenesis, and cell growth and differentiation. Neural networks enhanced with bagging exhibited the best performance among all the classifiers we tested.
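The bagging idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: a single-layer perceptron stands in for the neural network, no Bayesian optimization is performed, and the two-feature data are synthetic values invented purely for illustration, not expression measurements from either study. All function names (`train_perceptron`, `bagging_fit`, `bagging_predict`, `make_toy_data`) are hypothetical.

```python
import random
from collections import Counter


def predict_perceptron(w, x):
    """Return class 1 if the weighted sum plus bias (last weight) is positive, else 0."""
    s = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1 if s > 0 else 0


def train_perceptron(X, y, rng, epochs=20, lr=0.1):
    """Train a single-layer perceptron (a stand-in for a neural network)."""
    n_features = len(X[0])
    # Initial weights drawn from a univariate normal distribution;
    # the final entry is the bias term.
    w = [rng.gauss(0.0, 0.01) for _ in range(n_features + 1)]
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = yi - predict_perceptron(w, xi)  # -1, 0, or +1
            for j in range(n_features):
                w[j] += lr * err * xi[j]
            w[-1] += lr * err
    return w


def bagging_fit(X, y, n_models=15, seed=0):
    """Fit one base learner per bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw n samples with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        models.append(train_perceptron(Xb, yb, rng))
    return models


def bagging_predict(models, x):
    """Aggregate the base learners by majority vote."""
    votes = Counter(predict_perceptron(w, x) for w in models)
    return votes.most_common(1)[0][0]


def make_toy_data(n_per_class=10, seed=42):
    """Synthetic, well-separated two-class data (illustration only)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, -3.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append([center + rng.uniform(-0.5, 0.5) for _ in range(2)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    models = bagging_fit(X, y)
    print(bagging_predict(models, [3.0, 3.0]))
    print(bagging_predict(models, [-3.0, -3.0]))
```

An odd number of base learners avoids tied votes in the two-class case. Boosting, also mentioned in the abstract, differs in that it reweights training samples between rounds instead of resampling them uniformly.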
REFERENCES
Ando, T., M. Suguro, T. Hanai, T. Kobayashi, H. Honda and M. Seto (2002). “Fuzzy neural network applied to gene expression profiling for predicting the prognosis of diffuse large B-cell lymphoma.” Japanese Journal of Cancer Research 93(11): 1207–12.
Ashburner, M., C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K. Dolinski, S. S. Dwight, J. T. Eppig, M. A. Harris, D. P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J. C. Matese, J. E. Richardson, M. Ringwald, G. M. Rubin and G. Sherlock (2000). “Gene Ontology: tool for the unification of biology.” Nature Genetics 25: 25–29.
Beer, D. G., S. L. R. Kardia, C.-C. Huang, T. J. Giordano, A. M. Levin, D. E. Misek, L. Lin, G. Chen, T. G. Gharib, D. G. Thomas, M. L. Lizyness, R. Kuick, S. H. Hayasaka, J. M. G. Taylor, M. D. Iannettoni, M. B. Orringer and S. Hanash (2002). “Gene-expression profiles predict survival of patients with lung adenocarcinoma.” Nature Medicine 8(8): 816–24.
Bhattacharjee, A., W. G. Richards, J. Staunton, C. Li, S. Monti, P. Vasa, C. Ladd, J. Beheshti, R. Bueno, M. Gillette, M. Loda, G. Weber, E. J. Mark, E. S. Lander, W. Wong, B. E. Johnson, T. R. Golub, D. J. Sugarbaker and M. Meyerson (2001). “Expression profiling reveals distinct adenocarcinoma subclasses.” PNAS 98(24): 13790–13795.
Breiman, L. (1996). “Bagging predictors.” Machine Learning J. 24(2): 123–40.
Grey, S., S. Dlay, B. Leone, F. Cajone and G. Sherbet (2003). “Prediction of nodal spread of breast cancer by using artificial neural network-based analyses of S100A4, nm23 and steroid receptor expression.” Clin Exp Metastasis 20(6): 507–14.
Irizarry, R., B. Hobbs, F. Collin, Y. Beazer-Barclay, K. Antonellis, U. Scherf and T. Speed (2003). “Exploration, normalization, and summaries of high density oligonucleotide array probe level data.” Biostatistics 4(2): 249–264.
Japkowicz, N. (2000). Class imbalance problem: significance and strategies. International Conference on Artificial Intelligence (IC-AI’2000): Special Track on Inductive Learning, Las Vegas.
Khan, J., J. S. Wei, M. Ringner, L. H. Saal, M. Ladanyi, F. Westermann, F. Berthold, M. Schwab, C. R. Antonescu, C. Peterson and P. S. Meltzer (2001). “Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks.” Nat Med 7(6): 673–9.
Li, C. and W. H. Wong (2001). “Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection.” PNAS 98(1): 31–36.
Mateos, A., J. Herrero, J. Tamames and J. Dopazo (2002). Supervised Neural Networks for Clustering Conditions in DNA Array Data after Reducing Noise by Clustering Gene Expression Profiles. Methods of Microarray Data Analysis II. S. M. Lin and K. F. Johnson. Boston, Kluwer Academic Publishers.
Schapire, R. E. (1990). “The strength of weak learnability.” Machine Learning J. 5(2): 197–227.
Singhal, S., C. G. Kyvernitis, S. W. Johnson, L. R. Kaiser, M. N. Liebman and S. M. Albelda (2003). “MicroArray Data Simulator For Improved Selection of Differentially Expressed Genes.” Cancer Biology & Therapy 2(4): 383–391.
Tusher, V. G., R. Tibshirani and G. Chu (2001). “Significance analysis of microarrays applied to the ionizing radiation response.” PNAS 98(9): 5116–5121.
Zeeberg, B. R., W. Feng, G. Wang, M. D. Wang, A. T. Fojo, M. Sunshine, S. Narasimhan, D. W. Kane, W. C. Reinhold, S. Lababidi, K. J. Bussey, J. Riss, J. C. Barrett and J. N. Weinstein (2003). “GoMiner: A Resource for Biological Interpretation of Genomic and Proteomic Data.” Genome Biology 4(4): R28.
Copyright information
© 2005 Springer Science + Business Media, Inc. Boston
About this chapter
Cite this chapter
Zheng, G., Olusegun George, E., Narasimhan, G. (2005). Microarray Data Analysis Using Neural Network Classifiers and Gene Selection Methods. In: Shoemaker, J.S., Lin, S.M. (eds) Methods of Microarray Data Analysis. Springer, Boston, MA. https://doi.org/10.1007/0-387-23077-7_16
DOI: https://doi.org/10.1007/0-387-23077-7_16
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-23074-0
Online ISBN: 978-0-387-23077-1
eBook Packages: Biomedical and Life Sciences (R0)