Unleashing Pearson Correlation for Faithful Analysis of Biomedical Data

  • Marc Strickert
  • Frank-Michael Schleif
  • Thomas Villmann
  • Udo Seiffert
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5400)

Abstract

Pearson correlation is one of the standard measures for comparing data in biomedical analyses, yet it still has untapped potential. Substantial value is added by transferring Pearson correlation into the framework of adaptive similarity measures and by exploiting the properties of its mathematical derivatives. This opens access to optimization-based data models for attribute characterization, clustering, classification, and visualization. Modern high-throughput measuring equipment creates high demand for the analysis of extensive biomedical data, including spectra and high-resolution gel-electrophoretic images. In this study, cDNA arrays are considered as the data source of interest. Recent computational methods are presented for the characterization and analysis of these very high-dimensional data sets.
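The abstract's central technical point is that Pearson correlation is differentiable, so it can serve as the objective of gradient-based models. As a minimal sketch (our own plain-Python illustration, not the authors' implementation), the closed-form gradient of r with respect to the first vector is dr/dx = (ỹ/||ỹ|| - r·x̃/||x̃||) / ||x̃||, where x̃ and ỹ are the mean-centred vectors. The function names `pearson`, `pearson_grad_x` and `finite_difference_grad_x` are hypothetical, and the code assumes both vectors have non-zero variance.

```python
import math
from typing import List, Sequence, Tuple


def _centered(v: Sequence[float]) -> List[float]:
    """Subtract the arithmetic mean from every component."""
    mean = sum(v) / len(v)
    return [vi - mean for vi in v]


def _norm(v: Sequence[float]) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(vi * vi for vi in v))


def _centered_stats(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[List[float], List[float], float, float, float]:
    """Return centred vectors, their norms and the correlation r."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length >= 2")
    xc, yc = _centered(x), _centered(y)
    nx, ny = _norm(xc), _norm(yc)
    if nx == 0.0 or ny == 0.0:
        raise ValueError("Pearson correlation is undefined for constant vectors")
    r = sum(a * b for a, b in zip(xc, yc)) / (nx * ny)
    return xc, yc, nx, ny, r


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation r = <xc, yc> / (|xc| * |yc|)."""
    return _centered_stats(x, y)[4]


def pearson_grad_x(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Analytic gradient dr/dx = (yc/|yc| - r * xc/|xc|) / |xc|."""
    xc, yc, nx, ny, r = _centered_stats(x, y)
    return [(b / ny - r * a / nx) / nx for a, b in zip(xc, yc)]


def finite_difference_grad_x(
    x: Sequence[float], y: Sequence[float], h: float = 1e-6
) -> List[float]:
    """Central-difference approximation, used only to check pearson_grad_x."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((pearson(xp, y) - pearson(xm, y)) / (2.0 * h))
    return grad
```

Because the gradient sums to zero and is orthogonal to x̃, a small step x + η·dr/dx leaves the mean of x unchanged and its spread unchanged to first order, which reflects the shift and scale invariance of r. Such a step is the kind of building block a correlation-based prototype or centroid update could use; the step size η is an assumed free parameter here.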

Keywords

high-dimensional data mining, feature rating, clustering, data visualization, parametric correlation measure

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Marc Strickert (1)
  • Frank-Michael Schleif (2)
  • Thomas Villmann (2)
  • Udo Seiffert (3)
  1. Data Inspection Group, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany
  2. Research group Computational Intelligence, University of Leipzig, Germany
  3. Biosystems Engineering, Fraunhofer Institute for Factory Operation and Automation (IFF), Magdeburg, Germany