Finding High-Order Correlations in High-Dimensional Biological Data

  • Xiang Zhang
  • Feng Pan
  • Wei Wang


In many emerging real-life problems, the number of dimensions in a data set can range from thousands to millions. Such a large number of features poses a great challenge to existing high-dimensional data analysis methods. One particular issue is that latent patterns may exist only in subspaces of the full-dimensional space. In this chapter, we discuss the problem of finding correlations hidden in feature subspaces. Both linear and nonlinear cases are covered, and we present efficient algorithms for finding such correlated feature subsets.


Keywords: Principal Component Analysis; Correlation Dimension; Feature Subset; Local Correlation; Subspace Cluster



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, USA
