Microarray Data Mining: Selecting Trustworthy Genes with Gene Feature Ranking

Franco, Ubaudi A.; Paul, J. Kennedy; Daniel, R. Catchpoole; Dachuan, Guo; Simeon, J. Simoff

doi:10.1007/978-0-387-79420-4_11

Gene expression datasets used in biomedical data mining frequently share two characteristics: they have many thousands of attributes but relatively few sample points, and their measurements are noisy. In other words, individual expression measurements may be untrustworthy. Gene Feature Ranking (GFR) is a feature selection methodology that addresses these domain-specific characteristics by selecting features (i.e., genes) based on two criteria: (i) how well the gene discriminates between classes of patients and (ii) the trustworthiness of the microarray data associated with the gene. An example from the pediatric cancer domain demonstrates the use of GFR and compares its performance with that of a feature selection method that does not explicitly address the trustworthiness of the underlying data.
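
The two selection criteria named in the abstract can be illustrated with a minimal sketch. The abstract does not specify how GFR measures discrimination or trustworthiness, so everything below is an assumption made for illustration: the signal-to-noise score, the per-measurement quality values in [0, 1], the `min_trust` threshold, and the function names `discrimination_score` and `rank_genes` are hypothetical and are not taken from the chapter.

```python
from statistics import mean, stdev


def discrimination_score(values, labels):
    """Absolute signal-to-noise ratio between two classes.

    Assumed stand-in metric; the chapter's actual measure may differ.
    Each class needs at least two samples so that stdev is defined.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    a = [v for v, lab in zip(values, labels) if lab == classes[0]]
    b = [v for v, lab in zip(values, labels) if lab == classes[1]]
    gap = abs(mean(a) - mean(b))
    spread = stdev(a) + stdev(b)
    if spread == 0:
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def rank_genes(expression, quality, labels, min_trust=0.8):
    """Return (gene, discrimination, trust) tuples for trusted genes, best first.

    expression: dict mapping gene -> expression values, one per sample
    quality:    dict mapping gene -> hypothetical quality scores in [0, 1]
    min_trust:  assumed cut-off on the mean quality of a gene's measurements
    """
    ranked = []
    for gene, values in expression.items():
        trust = mean(quality[gene])
        if trust < min_trust:
            continue  # criterion (ii): drop genes whose data is not trusted
        ranked.append((gene, discrimination_score(values, labels), trust))
    # criterion (i): order the remaining genes by how well they separate classes
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked


# Toy data, invented for illustration only.
labels = ["A", "A", "A", "B", "B", "B"]
expression = {
    "gene1": [1.0, 1.2, 0.8, 3.0, 3.2, 2.8],
    "gene2": [1.0, 1.2, 0.8, 5.0, 5.2, 4.8],
    "gene3": [2.0, 2.1, 1.9, 2.0, 2.1, 1.9],
}
quality = {
    "gene1": [0.9] * 6,
    "gene2": [0.4, 0.5, 0.3, 0.6, 0.5, 0.4],
    "gene3": [0.95] * 6,
}
ranked = rank_genes(expression, quality, labels)
for gene, score, trust in ranked:
    print(gene, round(score, 3), round(trust, 3))
```

In this toy data, gene2 separates the classes most strongly but is excluded because its measurements have low assumed quality, which is the kind of trade-off a trust-aware selection method has to handle. A method that ignored trustworthiness would rank gene2 first.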

Author information

Authors and Affiliations

Faculty of IT, University of Technology, Sydney
Ubaudi A. Franco, J. Kennedy Paul, R. Catchpoole Daniel, Guo Dachuan & J. Simoff Simeon

Corresponding author

Correspondence to Ubaudi A. Franco.

Editor information

Editors and Affiliations

School of Software Faculty of Engineering and Information Technology, University of Technology, PO Box 123, Sydney, Broadway, NSW 2007, Australia
Longbing Cao & Huaifeng Zhang
Department of Computer Science, University of Illinois at Chicago, 851 S. Morgan St., Chicago, IL, 60607
Philip S. Yu
Centre for Quantum Computation and Intelligent Systems Faculty of Engineering and Information Technology, University of Technology, PO Box 123, Sydney, Broadway, NSW 2007, Australia
Chengqi Zhang

About this chapter

Cite this chapter

Franco, U.A., Paul, J.K., Daniel, R.C., Dachuan, G., Simeon, J.S. (2009). Microarray Data Mining: Selecting Trustworthy Genes with Gene Feature Ranking. In: Cao, L., Yu, P.S., Zhang, C., Zhang, H. (eds) Data Mining for Business Applications. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-79420-4_11

DOI: https://doi.org/10.1007/978-0-387-79420-4_11
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-79419-8
Online ISBN: 978-0-387-79420-4
eBook Packages: Computer Science, Computer Science (R0)
