Transformation Invariance in Pattern Recognition – Tangent Distance and Tangent Propagation

  • Patrice Y. Simard
  • Yann A. LeCun
  • John S. Denker
  • Bernard Victorri
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7700)


In pattern recognition, statistical modeling, and regression, the amount of available data is a critical factor affecting performance. With unlimited data and computational resources, even trivial algorithms converge to the optimal solution. In practice, however, data and resources are limited, and satisfactory performance requires sophisticated methods that regularize the problem by introducing a priori knowledge. Invariance of the output with respect to certain transformations of the input is a typical example of such a priori knowledge. In this chapter, we introduce the concept of tangent vectors, which compactly represent the essence of these transformation invariances, and two classes of algorithms, “tangent distance” and “tangent propagation”, which make use of these invariances to improve performance.
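The core idea can be illustrated numerically. A small transformation (here, translation of a 1-D signal) moves a pattern along a curve in input space; the tangent vector to that curve is the signal's derivative, and the one-sided tangent distance measures how far a second pattern lies from the tangent line through the first. The following numpy sketch is illustrative only: the Gaussian test signal and variable names are assumptions, not taken from the chapter.

```python
import numpy as np

# A Gaussian bump and a slightly translated copy of it.
t = np.linspace(-5.0, 5.0, 201)
x = np.exp(-t**2)
y = np.exp(-(t - 0.2)**2)  # x translated by 0.2

# Tangent vector for translation: x(t - d) ~ x(t) - d * x'(t),
# so the derivative of x (finite differences) spans the tangent direction.
T = np.gradient(x, t).reshape(-1, 1)

# One-sided tangent distance: min over a of ||x + T a - y||, i.e. the
# distance from y to the tangent line through x (closed-form least squares).
a, *_ = np.linalg.lstsq(T, y - x, rcond=None)
tangent_dist = np.linalg.norm(x + T @ a - y)
euclid_dist = np.linalg.norm(y - x)

print(euclid_dist, tangent_dist)  # tangent distance is the smaller of the two
```

Because the translated copy lies almost exactly along the tangent direction, the tangent distance is far smaller than the Euclidean distance, which is the property that makes it useful as an invariant metric for nearest-neighbor classification.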







Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Patrice Y. Simard (1)
  • Yann A. LeCun (1)
  • John S. Denker (1)
  • Bernard Victorri (2)

  1. Image Processing Services Research Lab, AT&T Labs - Research, Red Bank, USA
  2. CNRS, ELSAP, ENS, Montrouge, France
