Use of Classification Algorithms in Noise Detection and Elimination

  • André L. B. Miranda
  • Luís Paulo F. Garcia
  • André C. P. L. F. Carvalho
  • Ana C. Lorena
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5572)


Data sets in Bioinformatics usually present a high level of noise. Various processes involved in biological data collection and preparation may introduce this noise, such as the imprecision inherent to the laboratory experiments that generate these data. Using noisy data in the induction of classifiers through Machine Learning techniques may harm the classifiers' prediction performance. At the same time, the predictions of these classifiers may be used to guide noise detection and removal. This work compares three approaches for the elimination of noisy data from Bioinformatics data sets using Machine Learning classifiers: the first is based on the removal of the detected noisy examples, the second tries to reclassify these data, and the third, named hybrid, combines the two previous approaches.
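
The three approaches can be illustrated with a minimal sketch. The abstract does not say which classifiers or which hybrid rule the authors use, so everything below is an assumption made for illustration: a leave-one-out k-nearest-neighbour vote stands in for the noise-detecting classifier, and the hypothetical `threshold` parameter decides whether the hybrid relabels a flagged example (confident vote) or removes it (uncertain vote). The function names are likewise invented here.

```python
from collections import Counter


def knn_votes(X, y, i, k=3):
    """Class votes of the k nearest neighbours of example i (example i excluded)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
        for j in range(len(X))
        if j != i
    )
    return Counter(y[j] for _, j in dists[:k])


def detect_noise(X, y, k=3):
    """Flag examples whose label disagrees with the neighbours' majority vote.

    Returns {index: (predicted_label, vote_fraction)}.
    """
    flagged = {}
    for i in range(len(X)):
        label, count = knn_votes(X, y, i, k).most_common(1)[0]
        if label != y[i]:
            flagged[i] = (label, count / k)
    return flagged


def remove_noisy(X, y, k=3):
    """Approach 1: drop every flagged example."""
    flagged = detect_noise(X, y, k)
    keep = [i for i in range(len(X)) if i not in flagged]
    return [X[i] for i in keep], [y[i] for i in keep]


def relabel_noisy(X, y, k=3):
    """Approach 2: keep every example, replacing flagged labels with the prediction."""
    flagged = detect_noise(X, y, k)
    return list(X), [flagged[i][0] if i in flagged else y[i] for i in range(len(X))]


def hybrid(X, y, k=3, threshold=1.0):
    """Approach 3 (assumed rule): relabel confident detections, remove the rest."""
    flagged = detect_noise(X, y, k)
    new_X, new_y = [], []
    for i in range(len(X)):
        if i not in flagged:
            new_X.append(X[i])
            new_y.append(y[i])
        elif flagged[i][1] >= threshold:
            new_X.append(X[i])
            new_y.append(flagged[i][0])
        # Otherwise the vote is not confident enough: the example is dropped.
    return new_X, new_y


if __name__ == "__main__":
    # Toy one-dimensional data: two clusters plus two suspicious labels.
    X_toy = [[0.0], [0.1], [0.2], [0.3], [2.0], [5.0], [5.1], [5.2], [5.3]]
    y_toy = ["a", "a", "a", "b", "b", "b", "b", "b", "b"]
    print(detect_noise(X_toy, y_toy))
    print(hybrid(X_toy, y_toy)[1])
```

On the toy data, the example at 0.3 is surrounded only by class "a" and is relabelled by the hybrid, while the example at 2.0 receives a split vote and is removed. With real gene expression data, any classifier evaluated on held-out examples could play the role of `detect_noise`.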


Keywords: Noise, Machine Learning, Gene Expression, Classification





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • André L. B. Miranda (1)
  • Luís Paulo F. Garcia (1)
  • André C. P. L. F. Carvalho (1)
  • Ana C. Lorena (2)
  1. Instituto de Ciências Matemáticas e Computação, Universidade de São Paulo (USP), São Carlos, Brazil
  2. Centro de Matemática, Computação e Cognição, Universidade Federal do ABC (UFABC), Santo André, Brazil
