Abstract
In this paper, we extend several existing methods that apply distance function learning to regression problems. We show that these methods can be viewed as approximating a matrix of desired distances among all training samples. Based on this understanding, we propose an iterative framework in which outlier samples are corrected by their neighbors via asymptotically increasing the correlation coefficient between the desired distances and the distances between sample labels. Moreover, using this framework, we find that most existing methods iterate only once. As another extension, we adopt a nonlinear distance function and approximate it with a neural network. For a fair comparison, we conduct experiments on age estimation from face images as a regression problem; the results are comparable to the state of the art.
Notes
Manifold learning assumes that data are homogeneously sampled [3]; in other words, the data lie on or close to a low-dimensional manifold embedded in the ambient space. For most applications, the data are generated by continuously varying a set of parameters.
However, non-trivial extensions are possible; e.g., Taylor et al. [32] extended Neighborhood Component Analysis (NCA) to the regression setting.
Strictly speaking, the distance functions proposed in [3, 17] cannot be referred to as metrics, because they do not satisfy the triangle inequality, one of the metric axioms. Instead, they should be called non-metric distances or semi-metrics, in keeping with most existing literature such as [31]. We discuss this further in Section 4.
Other NN topologies of similar size lead to only slight performance differences. Investigating the optimal topology is a pure machine learning problem and is beyond the scope of this work. Here we only present a good network configuration; its optimality is not guaranteed.
It is referred to as Mean Square Error (MSE) in the context of NN.
Also, the training labels are integers owing to limitations of dataset collection, but intermediate label values and the final predicted labels are real numbers.
Here, the dimensionality of a regressor refers to its Vapnik–Chervonenkis (VC) dimension-based complexity; see [4] for details.
\( \widehat{d}(i,j)=\left(\frac{|L(i,j)|}{C-|L(i,j)|}\right)^{p}\times d(i,j) \), where L(i,j) is the label difference between two samples, C is a constant greater than any label value in the training set (ensuring the denominator is positive), p is selected to be 2 to make the data easier to discriminate, and d(i,j) is the Euclidean distance between samples \(X_i\) and \(X_j\).
\( \widehat{d}(i,j)=\left(\frac{|L(i,j)|+\gamma}{C-|L(i,j)|}\right)^{p}\times d(i,j) \), where L(i,j) is the absolute label difference between two samples. γ models the labeling noise; more specifically, a human face image labeled as 7 years actually ranges within 7–8 years, so the labeling noise is 1 year in this case. C = max L(i,j) + ε, with ε > 0 ensuring the denominator is nonzero. p = 2 is selected to make the data easier to discriminate. The meaning of d(i,j) is the same as in Eq.(27) in [17] (see the previous footnote).
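For concreteness, this weighting can be computed for an entire training set at once. The following NumPy sketch is ours, not the paper's; the function and parameter names are illustrative, and scalar labels with a Euclidean d(i,j) are assumed:

```python
import numpy as np

def desired_distances(X, labels, gamma=1.0, eps=1e-6, p=2):
    """Illustrative sketch of the desired-distance weighting above."""
    # |L(i,j)|: absolute label difference between samples i and j
    L = np.abs(labels[:, None] - labels[None, :])
    # C = max L(i,j) + eps, keeping the denominator strictly positive
    C = L.max() + eps
    # d(i,j): Euclidean distance between samples X_i and X_j
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return ((L + gamma) / (C - L)) ** p * d
```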
\( \widehat{d}(i,j)=\begin{cases}\frac{\alpha(L(i,j))}{C-L(i,j)}\times d(i,j), & L(i,j)\neq 0\\ 0, & L(i,j)=0\end{cases} \), where the function α(∙) is directly proportional to the label distance (in this case, the pose distance). The meanings of L(i,j), d(i,j) and C are the same as in Eq.(2) in [3] (see the previous footnote).
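Under the additional assumption α(L) = L (the note above only states that α(∙) is proportional to the label distance), a minimal sketch of this piecewise variant could look as follows; the names are again illustrative:

```python
import numpy as np

def desired_distances_piecewise(X, labels, eps=1e-6):
    """Illustrative sketch, assuming alpha(L) = L."""
    L = np.abs(labels[:, None] - labels[None, :])
    C = L.max() + eps
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    out = (L / (C - L)) * d  # alpha(L(i,j)) / (C - L(i,j)) * d(i,j)
    out[L == 0] = 0.0        # second branch: zero desired distance for equal labels
    return out
```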
An obvious counterexample is to combine two three-point metrics, both with d(a,b) = 1, d(b,c) = 1, and d(a,c) = 2.
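Assuming that "combine" means taking the pointwise product of the two metrics, as the label-weighted distances above effectively do with the Euclidean metric, a quick numeric check confirms the violation:

```python
# Each three-point metric satisfies the triangle inequality (2 <= 1 + 1
# holds with equality), but their pointwise product does not (4 > 1 + 1).
d1 = {('a', 'b'): 1, ('b', 'c'): 1, ('a', 'c'): 2}
d2 = dict(d1)  # the second, identical three-point metric
prod = {k: d1[k] * d2[k] for k in d1}

assert d1[('a', 'c')] <= d1[('a', 'b')] + d1[('b', 'c')]       # 2 <= 2: metric
assert prod[('a', 'c')] > prod[('a', 'b')] + prod[('b', 'c')]  # 4 > 2: semi-metric
```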
It is scaled so that the mean of the Euclidean distances equals the mean of \( id_{ij} \). Note that distance itself is a first-order quantity, so we do not need to scale it according to its variance.
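This rescaling amounts to a single multiplicative factor; a hypothetical one-liner (the names are ours):

```python
import numpy as np

def scale_to_match_mean(euclid, id_dist):
    """Rescale Euclidean distances so their mean equals the mean of id_ij."""
    return euclid * (np.mean(id_dist) / np.mean(euclid))
```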
In particular, δ(dd,ad) = δ(dd,ad′) implies that the NN is not updated and ad = ad′. Denoting by dd* the desired distance in the next iteration, we then have dd* = dd; that is, the iterative algorithm has already converged to ad′.
References
Balasubramanian VN, Ye J, Panchanathan S (2007) Biased manifold embedding: A framework for person-independent head pose estimation, Proc. CVPR, pp.1–7
Bar-Hillel A, Weinshall D (2007) Learning distance function by coding similarity, Proc. ICML, pp.65–72
Castillo E, Berdinas BG, Romero OF, Betanzos AA (2006) A very fast learning method for neural networks based on sensitivity analysis. J Mach Learn Res 7:1159–1182
Cherkassky V, Shao X, Mulier FM, Vapnik VN (1999) Model complexity control for regression using VC generalization bounds. IEEE Trans Neural Netw 10(5):1075–1089
Chopra S, Hadsell R, LeCun Y (2005) Learning a similarity metric discriminatively, with application to face verification, Proc. CVPR, pp.539–546
Cootes TF, Edwards GJ, Taylor CJ (2001) Active appearance models. IEEE Trans PAMI 23(6):681–685
Davis JV, Kulis B, Jain P, Sra S, Dhillon IS (2007) Information-theoretic metric learning, Proc. ICML, pp.209–216
Fan N (2011) Learning nonlinear distance functions using neural network for regression with application to robust human age estimation, Proc. ICCV, pp.249–254
FG-NET Aging Database, http://www.fgnet.rsunit.com
Geng X, Miles KS, Zhou ZH (2008) Facial age estimation by nonlinear aging pattern subspace, Proc. ACM Multimedia, pp.721–724
Geng X, Zhou ZH, Miles KS (2007) Automatic age estimation based on facial aging patterns. IEEE Trans PAMI 29(12):2234–2240
Goldberger J, Roweis S, Hinton G, Salakhutdinov R (2005) Neighbourhood components analysis, Proc. NIPS, pp.513–520
Guo GD, Mu G, Fu Y, Dyer C, Huang TS (2009) A study on automatic age estimation using a large database, Proc. ICCV, pp.1–8
Guo GD, Mu G, Fu Y, Huang TS (2009) Human age estimation using bio-inspired features, Proc. CVPR, pp.1–8
He X, Ma WY, Zhang HJ (2004) Learning an image manifold for retrieval, Proc. ACM Multimedia, pp.17–23
Huang YZ, Long YJ (2008) Demosaicking recognition with applications in digital photo authentication based on a quadratic pixel correlation model, Proc. CVPR, pp.1–8
Jin C, Long YJ (2010) On label information incorporated metric learning for regressions. Int J Comput Intell Appl 9(4):339–351
Lanitis A, Draganova C, Christodoulou C (2004) Comparing different classifiers for automatic age estimation. IEEE Trans SMC-B 34(1):621–628
Long YJ, Huang YZ (2006) Image based source camera identification using demosaicking, Proc. 8th IEEE Workshop on Multimedia Signal Processing, pp.419–424
Macskassy SA, Hirsh H, Banerjee A, Dayanik AA (2003) Converting numerical classification into text classification. Artif Intell 143(1):51–77
McCullagh P (1980) Regression models for ordinal data. J R Stat Soc Ser B 42(2):109–142
Min R, van der Maaten LJP, Yuan Z, Bonner A, Zhang Z (2010) Deep supervised t-distributed embedding, Proc. ICML, pp.791–798
Møller MF (1993) A scaled conjugate gradient algorithm for fast supervised learning. Neural Netw 6(4):525–533
Pan (2010) Human age estimation by metric learning for regression problems, Proc. EMMCVPR, pp.455–465
Ramanathan N, Chellappa R, Biswas S (2009) Age progression in human faces: a survey. J Vis Lang Comput 20:131–144
Salakhutdinov R, Hinton G (2007) Learning a nonlinear embedding by preserving class neighbourhood structure, Proc. AI and Statistics, pp. 412–419
Shalev-Shwartz S, Singer Y, Ng AY (2004) Online and batch learning of pseudo-metrics, Proc. ICML, pp.743–750
Shental N, Hertz T, Weinshall D, Pavel M (2002) Adjustment learning and relevant component analysis, Proc. ECCV, pp.776–792
Smith L (2002) A tutorial on Principal Components Analysis. http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
Stanley KO (2007) Compositional pattern producing networks: a novel abstraction of development. Genet Program Evolvable Mach 8(2):131–162
Tan X, Chen S, Li J, Zhou Z (2006) Learning non-metric partial similarity based on maximal margin criterion, Proc. CVPR, pp.138–145
Taylor G, Fergus R, Williams G, Spiro I, Bregler C (2010) Pose-sensitive embedding by nonlinear NCA regression, Proc. NIPS
Weinberger K, Blitzer J, Saul L (2006) Distance metric learning for large margin nearest neighbor classification, Proc. NIPS, pp.1475–1482
Xing E, Ng A, Jordan MI, Russell S (2002) Distance metric learning with application to clustering with side-information, Proc. NIPS, pp.505–512
Yan S, Wang H, Huang TS, Tang X (2007) Ranking with uncertain labels, Proc. ICME, pp.96–99
Yan S, Wang H, Tang X, Huang T (2007) Learning auto-structured regressor from uncertain nonnegative labels. Proc. ICCV, pp.1–8
Yan S, Zhou X, Liu M, Johnson MH, Huang TS (2008) Regression from patch-kernel, Proc. CVPR, pp.1–8
Yang L, Jin R (2006) Distance metric learning: a comprehensive survey, Technical report, Michigan State University. http://www.cs.cmu.edu/~liuy/frame_survey_v2.pdf
Yeung DY, Chang H (2007) A kernel approach for semi-supervised metric learning. IEEE Trans Neural Netw 18(1):141–149
Cite this article
Chen, J., Zeng, H. & Fan, N. Nonlinear distance function learning using neural network: an iterative framework. Multimed Tools Appl 74, 671–688 (2015). https://doi.org/10.1007/s11042-014-1944-z