Extracting Continuous Relevant Features

  • Amir Globerson
  • Gal Chechik
  • Naftali Tishby
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


Abstract

The problem of extracting the relevant aspects of data, in the face of multiple conflicting structures, is inherent to the modeling of complex data. Extracting continuous structures in one random variable that are relevant for another variable has recently been addressed via the method of sufficient dimensionality reduction. However, such auxiliary variables often contain both structures that are relevant and others that are irrelevant to the task at hand. In the context of clustering, identifying the relevant structures was shown to be considerably improved by minimizing the information about another, irrelevant, variable. In this paper we address the problem of extracting continuous relevant structures and derive its formal, as well as algorithmic, solution. Its operation is demonstrated on a synthetic example and in a real-world application to face images, showing its superiority over current methods such as oriented principal component analysis.
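As a rough illustration of the trade-off the abstract describes, the sketch below scores a candidate feature T by rewarding the mutual information it carries about a relevant variable Y+ while penalising the information it carries about an irrelevant variable Y-. The names (mutual_information, relevance_score), the trade-off weight gamma, and the toy joint distributions are illustrative assumptions, not the paper's notation; the paper's actual method optimises continuous features under an exponential-form model rather than evaluating fixed discrete joints.

import numpy as np

def mutual_information(joint):
    """I(A;B) in nats for a joint distribution given as a 2-D array."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()                # normalise to a proper joint
    pa = joint.sum(axis=1, keepdims=True)      # marginal of the row variable
    pb = joint.sum(axis=0, keepdims=True)      # marginal of the column variable
    nz = joint > 0                             # skip zero cells in the log
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pa * pb)[nz])))

def relevance_score(joint_t_relevant, joint_t_irrelevant, gamma=1.0):
    """Score a candidate feature T: reward I(T;Y+) for the relevant
    variable, penalise I(T;Y-) for the irrelevant one (gamma is assumed)."""
    return (mutual_information(joint_t_relevant)
            - gamma * mutual_information(joint_t_irrelevant))

# Toy usage: T is informative about Y+ but nearly independent of Y-.
p_t_yplus  = np.array([[0.40, 0.10],
                       [0.10, 0.40]])
p_t_yminus = np.array([[0.24, 0.26],
                       [0.26, 0.24]])
print(relevance_score(p_t_yplus, p_t_yminus, gamma=1.0))  # ~0.19 nats, positive

With gamma = 0 the score reduces to plain relevance to Y+, an SDR-like objective; raising gamma trades relevant information against the removal of irrelevant structure, which is the principle the paper carries over to continuous features.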


Keywords: face recognition, face image, side information, neural information processing system, continuous structure




References

  1. CHECHIK, G. and TISHBY, N. (2003): Extracting relevant structures with side information. In: S. Becker, S. Thrun, and K. Obermayer (Eds.): Advances in Neural Information Processing Systems 15. MIT Press, Cambridge, MA.
  2. COVER, T.M. and THOMAS, J.A. (1991): Elements of Information Theory. John Wiley, New York.
  3. DARROCH, J.N. and RATCLIFF, D. (1972): Generalized iterative scaling for log-linear models. Annals of Mathematical Statistics, 43, 1470–1480.
  4. DELLA PIETRA, S., DELLA PIETRA, V., and LAFFERTY, J.D. (1997): Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4), 380–393.
  5. DIAMANTARAS, K.I. and KUNG, S.Y. (1996): Principal Component Neural Networks: Theory and Applications. John Wiley, New York.
  6. FISHER, R.A. (1922): On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society, A, 222, 309–368.
  7. GLOBERSON, A. and TISHBY, N. (2003): Sufficient dimensionality reduction. Journal of Machine Learning Research, 3, 1307–1331.
  8. LEBANON, G. and LAFFERTY, J. (2002): Boosting and maximum likelihood for exponential models. In: T.G. Dietterich, S. Becker, and Z. Ghahramani (Eds.): Advances in Neural Information Processing Systems 14. MIT Press, Cambridge, MA.
  9. MALOUF, R. (2002): A comparison of algorithms for maximum entropy parameter estimation. In: Sixth Conference on Natural Language Learning, 49–55.
  10. MARTINEZ, A.M. and BENAVENTE, R. (1998): The AR face database. Technical Report 24, Computer Vision Center.
  11. MIKA, S., RÄTSCH, G., WESTON, J., SCHÖLKOPF, B., SMOLA, A., and MÜLLER, K. (2000): Invariant feature extraction and classification in kernel space. In: S.A. Solla, T.K. Leen, and K.R. Müller (Eds.): Advances in Neural Information Processing Systems 12. MIT Press, Cambridge, MA, 526–532.
  12. WEINSHALL, D., SHENTAL, N., HERTZ, T., and PAVEL, M. (2002): Adjustment learning and relevant component analysis. In: 7th European Conference on Computer Vision (ECCV 2002), Volume IV, Lecture Notes in Computer Science, 776–792.
  13. SHANNON, C.E. (1948): A mathematical theory of communication. The Bell System Technical Journal, 27, 379–423, 623–656.
  14. WYNER, A.D. and ZIV, J. (1976): The rate-distortion function for source coding with side information at the decoder. IEEE Transactions on Information Theory, 22(1), 1–10.
  15. XING, E.P., NG, A.Y., JORDAN, M.I., and RUSSELL, S. (2003): Distance metric learning, with application to clustering with side-information. In: S. Becker, S. Thrun, and K. Obermayer (Eds.): Advances in Neural Information Processing Systems 15. MIT Press, Cambridge, MA.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Amir Globerson¹
  • Gal Chechik¹
  • Naftali Tishby¹

  1. School of Computer Science and Engineering and Interdisciplinary Center for Neural Computation, The Hebrew University, Jerusalem, Israel
