
Manifold Based Local Classifiers: Linear and Nonlinear Approaches

Journal of Signal Processing Systems

Abstract

When the data samples are insufficient in high-dimensional classification problems, the sparse scatter of samples tends to contain many ‘holes’: regions that have few or no nearby training samples from the class. When such regions lie close to inter-class boundaries, the nearest neighbors of a query may lie in the wrong class, leading to errors in the Nearest Neighbor classification rule. The K-local hyperplane distance nearest neighbor (HKNN) algorithm tackles this problem by approximating each class with a smooth nonlinear manifold, which is considered to be locally linear. The method takes advantage of the local linearity assumption by using the distances from a query sample to the affine hulls of the query’s nearest neighbors for decision making. However, HKNN is limited to the Euclidean distance metric, which is a significant limitation in practice. In this paper we reformulate HKNN in terms of subspaces and propose a variant, the Local Discriminative Common Vector (LDCV) method, that is better suited to classification tasks where the classes have similar intra-class variations. We then extend both methods to the nonlinear case by mapping the nearest neighbors into a higher-dimensional space where the linear manifolds are constructed. This procedure allows a wide variety of distance functions to be used, while computing distances between the query sample and the nonlinear manifolds remains straightforward owing to the linear nature of the manifolds in the mapped space. We tested the proposed methods on several classification tasks, obtaining better results than both the Support Vector Machines (SVMs) and their local counterpart SVM-KNN on the USPS and Image Segmentation databases, and outperforming the local SVM-KNN on the Caltech visual recognition database.
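As an illustration of the local-hyperplane rule summarized above, the following minimal NumPy sketch computes the Euclidean distance from a query to the affine hull spanned by its K nearest neighbors within each class and assigns the query to the class whose local hyperplane is closest. This is not the authors' implementation; the function names, the default value of K, and the least-squares parameterization of the affine hull are illustrative assumptions.

import numpy as np

def local_hyperplane_distance(query, neighbors):
    # Distance from `query` to the affine hull of the rows of `neighbors`.
    # The hull is parameterized around the neighbor mean; the closest point
    # is found by unconstrained least squares in the centered coordinates.
    mu = neighbors.mean(axis=0)
    M = (neighbors - mu).T                        # columns span the local hyperplane directions
    beta, *_ = np.linalg.lstsq(M, query - mu, rcond=None)
    return np.linalg.norm((query - mu) - M @ beta)

def hknn_classify(query, X, y, K=5):
    # Assign `query` to the class whose local affine hull, built from its K
    # nearest same-class training samples, is closest in Euclidean distance.
    best_label, best_dist = None, np.inf
    for label in np.unique(y):
        Xc = X[y == label]
        idx = np.argsort(np.linalg.norm(Xc - query, axis=1))[:min(K, len(Xc))]
        dist = local_hyperplane_distance(query, Xc[idx])
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

For a test vector x and training data (X_train, y_train), hknn_classify(x, X_train, y_train, K=5) returns the predicted label; smaller values of K keep the hull approximation local, while larger values move it toward a global linear model of each class.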



References

  1. Simard, P., Le Cun, Y., Denker, J., & Victorri, B. (1998). Transformation invariance in pattern recognition—tangent distance and tangent propagation. Lecture Notes in Computer Science (vol. 1524, pp. 239–274). Berlin: Springer.


  2. Peng, J., Heisterkamp, D. R., & Dai, H. K. (2003). LDA/SVM driven nearest neighbor classification. IEEE Trans Neural Netw, 14, 940–942. doi:10.1109/TNN.2003.813835.


  3. Hastie, T., & Tibshirani, R. (1996). Discriminant adaptive nearest neighbor classification. IEEE Trans. PAMI, 18(6), 607–616.


  4. Vincent, P., & Bengio, Y. (2001). K-local hyperplane and convex distance nearest neighbor algorithms. Adv Neural Inf Process Syst, 14, 985–992.


  5. Domeniconi, C., & Gunopulos, D. (2002). Efficient local flexible nearest neighbor classification. In Proceedings of the 2nd SIAM International Conference on Data Mining.

  6. Zhang, H., Berg, A. C., Maire, M., & Malik, J. (2006). SVM-KNN: Discriminative nearest neighbor classification for visual category recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2126–2136).

  7. Peng, J., Heisterkamp, D. R., & Dai, H. K. (2004). Adaptive quasiconformal kernel nearest neighbor classification. IEEE Trans Pattern Anal Mach Intell, 28, 656–661. doi:10.1109/TPAMI.2004.1273978.


  8. Domeniconi, C., Peng, J., & Gunopulos, D. (2002). Locally adaptive metric nearest-neighbor classification. IEEE Trans Pattern Anal Mach Intell, 24, 1281–1285. doi:10.1109/TPAMI.2002.1033219.


  9. Okun, O. (2004). Protein fold recognition with K-local hyperplane distance nearest neighbor algorithm. In Proceedings of the 2nd European Workshop on Data Mining and Text Mining in Bioinformatics (pp. 51–57).

  10. Hinton, G. E., Dayan, P., & Revow, M. (1997). Modeling the manifolds of images of handwritten digits. IEEE Trans Neural Netw, 18, 65–74. doi:10.1109/72.554192.


  11. Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326. doi:10.1126/science.290.5500.2323.


  12. Verbeek, J. (2006). Learning non-linear image manifolds by global alignment of local linear models. IEEE Trans PAMI, 28, 1236–1250.


  13. Cevikalp, H., Neamtu, M., & Wilkes, M. (2005). Discriminative common vectors for face recognition. IEEE Trans PAMI, 27, 4–13.


  14. Kim, T.-K., & Kittler, J. (2005). Locally linear discriminant analysis for multimodally distributed classes for face recognition with a single model image. IEEE Trans PAMI, 27, 318–327.


  15. Fitzgibbon, A. W., & Zisserman, A. (2003). Joint manifold distance: a new approach to appearance based clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  16. Zhang, J., Marszalek, M., Lazebnik, S., & Schmid, C. (2006). Local features and kernels for classification of texture and object categories: A comprehensive study. In Proceedings of the Computer Vision and Pattern Recognition Workshop.

  17. Tenenbaum, J. B., de Silva, V., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290, 2319–2323. doi:10.1126/science.290.5500.2319.


  18. Gulmezoglu, M. B., Dzhafarov, V., & Barkana, A. (2001). The common vector approach and its relation to principal component analysis. IEEE Trans Speech Audio Process, 9(6), 655–662. doi:10.1109/89.943343.


  19. Boyd, S., & Vandenberghe, L. (2004). Convex optimization (pp. 399–401). Cambridge, UK: Cambridge University Press.

  20. Schölkopf, B., Smola, A. J., & Muller, K. R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput, 10, 1299–1319. doi:10.1162/089976698300017467.


  21. Cevikalp, H., Neamtu, M., & Wilkes, M. (2006). Discriminative common vector method with kernels. IEEE Trans Neural Netw, 17, 1550–1565. doi:10.1109/TNN.2006.881485.


  22. Xu, J., & Zikatanov, L. (2002). The method of alternating projections and the method of subspace corrections in Hilbert space. J Am Math Soc, 15, 573–597. doi:10.1090/S0894-0347-02-00398-3.


  23. Fei-Fei, L., Fergus, R., & Perona, P. (2004). Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. In Proceedings of the IEEE CVPR Workshop on Generative Model Based Vision.

  24. USPS dataset of handwritten characters created by the US Postal Service. Retrieved from ftp://ftp.kyb.tuebingen.mpg.de/pub/bs/data.

  25. Keysers, D., Dahmen, J., Theiner, T., & Ney, H. (2000). Experiments with an extended tangent distance. In Proceedings of the 15th International Conference on Pattern Recognition (vol. 2, pp. 38–42).

  26. C codes for computing tangent distances. Retrieved from http://www-i6.informatik.rwth-aachen.de/∼keysers/td/.

  27. Golub, G. H., & Van Loan, C. F. (1996). Matrix computations (3rd ed.). Baltimore, MD: Johns Hopkins University Press.


  28. UCI—benchmark repository—a huge collection of artificial and real world data sets. University of California Irvine. Retrieved from http://www.ics.edu/∼mlearn/MLRepository.html.

  29. Csurka, G., Dance, C., Fan, L., Willamowski, J., & Bray, C. (2004). Visual categorization with bags of keypoints. In Proceedings of the ECCV Workshop on Statistical Learning for Computer Vision.

  30. Lazebnik, S., Schmid, C., & Ponce, J. (2005). A sparse texture representation using local affine regions. IEEE Trans PAMI, 27(8), 1265–1278.


  31. Fowlkes, C., Belongie, S., Chung, F., & Malik, J. (2004). Spectral grouping using the Nyström method. IEEE Trans PAMI, 26, 1–12.


  32. Saul, L. K., & Roweis, S. T. (2003). Think globally, fit locally: unsupervised learning of low dimensional manifolds. J Mach Learn Res, 4, 119–155.


  33. Levina, E., & Bickel, P. J. (2005). Maximum likelihood estimation of intrinsic dimension. In L. K. Saul, Y. Weiss, & L. Bottou (Eds.), Advances in neural information processing systems 17 (pp. 777–784). Cambridge, MA: MIT Press.


  34. Camastra, F., & Vinciarelli, A. (2002). Estimating the intrinsic dimension of data with a fractal-based method. IEEE Trans Pattern Anal Mach Intell, 24(10), 1404–1407. doi:10.1109/TPAMI.2002.1039212.


  35. Fukunaga, K., & Olsen, D. R. (1971). An algorithm for finding intrinsic dimensionality of data. IEEE Trans Comput, C-20, 176–183. doi:10.1109/T-C.1971.223208.



Author information


Corresponding author

Correspondence to Hakan Cevikalp.

Appendix

  1. Theorem 1:

    Let P and \(P_{{\text{NS}}}^{\left( i \right)} \) be the projection matrices of the subspaces \(R\left( {S_T^K } \right)\) and \(N\left( {S_i^K } \right)\), \(i = 1, \ldots ,C\), respectively. Then P and \(P_{{\text{NS}}}^{\left( i \right)} \) commute, i.e.:

    $$P_{{\text{NS}}}^{\left( i \right)} P = PP_{{\text{NS}}}^{\left( i \right)} ,\quad i = 1,...,C.$$

    Proof of the theorem is omitted since it can be derived as in the proof of Theorem 1 in [21].

  2. Theorem 2:

    Assume that there are C classes in the training set. For a query sample \(x_q \), the inequality \(\left\| {P_{{\text{NS}}}^{\left( i \right)} \left( {x_q - \mu _i } \right)} \right\| \leqslant \left\| {P_{{\text{NS}}}^{\left( j \right)} \left( {x_q - \mu _j } \right)} \right\|\) implies that \(\left\| {P_{{\text{int}}}^{\left( i \right)} \left( {x_q - \mu _i } \right)} \right\| \leqslant \left\| {P_{{\text{int}}}^{\left( j \right)} \left( {x_q - \mu _j } \right)} \right\|\) for \(i,j = 1, \ldots ,C\) and \(i \ne j\).

Proof: We first recall several facts from [13] (see Lemma 1 of [13]). For each \(i = 1, \ldots ,C\), we have \(N\left( {S_T^K } \right) \subset N\left( {S_i^K } \right)\), where N(A) denotes the null space of a matrix A. Consequently, \(N\left( {S_T^K } \right)\) and \(R\left( {S_i^K } \right)\) are orthogonal, where \(R\left( {S_i^K } \right)\) denotes the range of \(S_i^K \). This implies the identity \(\left( {I - P} \right)\left( {I - P_{{\text{NS}}}^{\left( i \right)} } \right) = 0\), or equivalently \(\left( {I - P} \right) = \left( {I - P} \right)P_{{\text{NS}}}^{\left( i \right)} \).

Thus, since the ranges of P and I − P are orthogonal, we can write:

$$\begin{array}{*{20}c} {\left\| {P_{{\text{NS}}}^{\left( i \right)} \left( {x_q - \mu _i } \right)} \right\|^2 = \left\| {PP_{{\text{NS}}}^{\left( i \right)} \left( {x_q - \mu _i } \right) + \left( {I - P} \right)P_{{\text{NS}}}^{\left( i \right)} \left( {x_q - \mu _i } \right)} \right\|^2 = \left\| {PP_{{\text{NS}}}^{\left( i \right)} \left( {x_q - \mu _i } \right)} \right\|^2 + } \\ {\left\| {\left( {I - P} \right)P_{{\text{NS}}}^{\left( i \right)} \left( {x_q - \mu _i } \right)} \right\|^2 = \left\| {PP_{{\text{NS}}}^{\left( i \right)} \left( {x_q - \mu _i } \right)} \right\|^2 + \left\| {\left( {I - P} \right)\left( {x_q - \mu _i } \right)} \right\|^2 ,} \\ \end{array} $$
(25)

where the last equality uses the identity \(\left( {I - P} \right) = \left( {I - P} \right)P_{{\text{NS}}}^{\left( i \right)} \) established above.

We now note that the vector \(\left( {I - P} \right)\left( {x_q - \mu _i } \right)\) is the same for each class (i.e., it does not depend on the class index i), since we have shown in [21] that \(\left( {I - P} \right)\mu _i \) is a so-called common vector for the class consisting of all samples in \(V = \left\{ {x_m^i } \right\}_{m = 1,i = 1}^{K,C} \), and that in fact \(\left( {I - P} \right)x\) is the same vector for all x in the affine hull of V.

Thus, we have shown that:

$$\left\| {P_{{\text{NS}}}^{\left( i \right)} \left( {x_q - \mu _i } \right)} \right\|^2 = \left\| {PP_{{\text{NS}}}^{\left( i \right)} \left( {x_q - \mu _i } \right)} \right\|^2 + \left\| v \right\|^2 ,$$
(26)

for some vector v independent of the class index i. The assertion of Theorem 2 now immediately follows from this fact. □
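
Both the commutation property of Theorem 1 and the decomposition in Eqs. 25–26 can be checked numerically on synthetic data. The sketch below is illustrative only; the random data, the rank tolerance, and the variable names are assumptions and not part of the paper. It forms the scatter matrices of a small pooled neighbor set, builds the projectors P and P_NS^(i), and verifies that they commute, that (I − P)(x_q − μ_i) is the same vector for every class, and that the squared-norm identity of Eq. 26 holds.

import numpy as np

rng = np.random.default_rng(0)

def range_projector(S, tol=1e-10):
    # Orthogonal projector onto the range (column space) of a symmetric PSD matrix S.
    U, s, _ = np.linalg.svd(S)
    B = U[:, s > tol * s.max()]
    return B @ B.T

# Synthetic local data: K neighbors per class in d dimensions, with d larger
# than the total sample count so that the null spaces are nontrivial.
d, K, C = 20, 4, 3
X = [rng.normal(size=(K, d)) for _ in range(C)]
x_q = rng.normal(size=d)

mu = [Xi.mean(axis=0) for Xi in X]
S_i = [(Xi - m).T @ (Xi - m) for Xi, m in zip(X, mu)]      # class scatters S_i^K
pooled = np.vstack(X)
mu_all = pooled.mean(axis=0)
S_T = (pooled - mu_all).T @ (pooled - mu_all)               # total scatter S_T^K

P = range_projector(S_T)                                     # projector onto R(S_T^K)
P_NS = [np.eye(d) - range_projector(S) for S in S_i]         # projectors onto N(S_i^K)

# Theorem 1: P and P_NS^(i) commute.
for Q in P_NS:
    assert np.allclose(Q @ P, P @ Q, atol=1e-8)

# The residual (I - P)(x_q - mu_i) is the same common vector v for every class i.
v = [(np.eye(d) - P) @ (x_q - m) for m in mu]
for r in v[1:]:
    assert np.allclose(r, v[0], atol=1e-8)

# Eq. 26: ||P_NS^(i)(x_q - mu_i)||^2 = ||P P_NS^(i)(x_q - mu_i)||^2 + ||v||^2.
for Q, m in zip(P_NS, mu):
    z = Q @ (x_q - m)
    assert np.isclose(z @ z, (P @ z) @ (P @ z) + v[0] @ v[0], atol=1e-8)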


About this article

Cite this article

Cevikalp, H., Larlus, D., Neamtu, M. et al. Manifold Based Local Classifiers: Linear and Nonlinear Approaches. J Sign Process Syst 61, 61–73 (2010). https://doi.org/10.1007/s11265-008-0313-4

