Machine Learning Techniques for Multimedia, pp. 91–112

Part of the Cognitive Technologies book series (COGTECH)

Dimension Reduction

  • Pádraig Cunningham

When the data objects being analysed with machine learning techniques are described by a large number of features (i.e. the data are high-dimensional), it is often beneficial to reduce the dimension of the data. Dimension reduction can be beneficial not only for reasons of computational efficiency but also because it can improve the accuracy of the analysis. The techniques available for dimension reduction can be partitioned in two important ways: into techniques that apply to supervised or to unsupervised learning, and into techniques that entail either feature selection or feature extraction. This chapter presents an overview of dimension reduction techniques organised along these lines and describes the important techniques in each category.
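To make this taxonomy concrete, the short sketch below contrasts the two styles of technique on the same data: supervised feature selection retains a subset of the original features, while unsupervised feature extraction derives new features as combinations of the originals. It is a minimal illustration assuming scikit-learn is available; the particular estimators (a mutual-information filter and PCA) are illustrative choices only, not methods the chapter prescribes.

    # Feature selection vs. feature extraction on the same data.
    # Assumes scikit-learn; the estimator choices are illustrative only.
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.feature_selection import SelectKBest, mutual_info_classif

    X, y = load_digits(return_X_y=True)        # 64 pixel features per image

    # Supervised feature selection: keep the 10 original features that
    # share the most mutual information with the class labels.
    X_selected = SelectKBest(mutual_info_classif, k=10).fit_transform(X, y)

    # Unsupervised feature extraction: replace the original features with
    # projections onto the first 10 principal components.
    X_extracted = PCA(n_components=10).fit_transform(X)

    print(X_selected.shape, X_extracted.shape)  # both (1797, 10)

Both routes return ten columns, but the selected columns are ten of the original pixel features, whereas the extracted columns are linear combinations of all 64.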


Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Pádraig Cunningham, University College Dublin, Ireland
