On the Homogenization of Data from Two Laboratories Using Genetic Programming

  • Jose G. Moreno-Torres
  • Xavier Llorà
  • David E. Goldberg
  • Rohit Bhargava
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6471)

Abstract

In experimental sciences, diversity tends to hinder the proper generalization of predictive models across data provided by different laboratories. Thus, training on a data set produced by one lab and testing on data provided by another lab usually results in low classification accuracy. Even when the same protocols are followed, variability in measurements can introduce unforeseen variations that affect the quality of the model. This paper proposes a Genetic Programming-based approach in which a transformation of the data from the second lab is evolved, driven by classifier performance. A real-world problem, prostate cancer diagnosis, is presented as an example in which the proposed approach was capable of repairing the fracture between the data of two different laboratories.
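
The idea of evolving a transformation of the second lab's data, with classifier performance as the fitness signal, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the synthetic two-class data, the constant shift that stands in for inter-lab variation, the nearest-centroid classifier, the function set (+, -, *), and the mutation-only search with truncation selection are all hypothetical choices.

```python
import random

# Hypothetical function set for the evolved expressions.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def random_tree(rng, n_features, depth=2):
    """Grow a random expression tree over the input features."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_features))
        return ("c", rng.uniform(-5.0, 5.0))
    return (rng.choice(sorted(OPS)),
            random_tree(rng, n_features, depth - 1),
            random_tree(rng, n_features, depth - 1))

def evaluate(tree, x):
    """Evaluate an expression tree on one sample x."""
    if tree[0] == "x":
        return x[tree[1]]
    if tree[0] == "c":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))

def mutate(tree, rng, n_features):
    """Replace a randomly chosen subtree with a fresh random one."""
    if tree[0] in ("x", "c") or rng.random() < 0.3:
        return random_tree(rng, n_features)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng, n_features), right)
    return (op, left, mutate(right, rng, n_features))

def centroids(X, y):
    """Train a nearest-centroid classifier (a stand-in classifier)."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model

def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda c: sum((a - b) * (a - b) for a, b in zip(x, model[c])))

def fitness(individual, model, X, y):
    """Accuracy of the lab-A classifier on transformed lab-B samples."""
    hits = 0
    for x, t in zip(X, y):
        z = [evaluate(tree, x) for tree in individual]
        hits += predict(model, z) == t
    return hits / len(X)

def evolve(model, X, y, rng, pop_size=30, generations=20):
    """Mutation-only search; one tree per output feature."""
    n = len(X[0])
    identity = [("x", i) for i in range(n)]  # seed: leave the data unchanged
    pop = [identity] + [[random_tree(rng, n) for _ in range(n)]
                        for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, model, X, y), reverse=True)
        parents = pop[: pop_size // 2]  # the best half survives unchanged
        children = [[mutate(t, rng, n) for t in rng.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=lambda ind: fitness(ind, model, X, y))

def sample(rng, n, shift):
    """Synthetic two-class data; `shift` mimics a systematic lab offset."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        centre = 4.0 * label
        X.append([rng.gauss(centre, 1.0) + shift, rng.gauss(centre, 1.0) + shift])
        y.append(label)
    return X, y

rng = random.Random(0)
XA, yA = sample(rng, 60, 0.0)    # lab A: data the classifier is trained on
XB, yB = sample(rng, 60, -3.0)   # lab B: same classes, shifted measurements
model = centroids(XA, yA)
best = evolve(model, XB, yB, rng)
baseline = fitness([("x", 0), ("x", 1)], model, XB, yB)  # untransformed lab B
repaired = fitness(best, model, XB, yB)
print(f"accuracy before: {baseline:.2f}, after: {repaired:.2f}")
```

Because the untransformed (identity) individual is seeded into the initial population and the best half always survives, the accuracy of the returned transformation in this sketch can never fall below the untransformed baseline. Note that the classifier is never retrained: only the lab-B data is transformed.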

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Jose G. Moreno-Torres (1)
  • Xavier Llorà (2)
  • David E. Goldberg (3)
  • Rohit Bhargava (4)

  1. Department of Computer Science and Artificial Intelligence, Universidad de Granada, Granada, Spain
  2. National Center for Supercomputing Applications (NCSA), University of Illinois at Urbana-Champaign, Urbana, USA
  3. Illinois Genetic Algorithms Laboratory (IlliGAL), University of Illinois at Urbana-Champaign, Urbana, USA
  4. Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, USA
