Boosting k-NN for Categorization of Natural Scenes

Nock, Richard; Piro, Paolo; Nielsen, Frank; Bel Haj Ali, Wafa; Barlaud, Michel

doi:10.1007/s11263-012-0539-2


Published: 04 July 2012

Volume 100, pages 294–314, (2012)

International Journal of Computer Vision

Richard Nock¹,
Paolo Piro²,
Frank Nielsen³,⁴,
Wafa Bel Haj Ali⁵ &
Michel Barlaud⁵


Abstract

The k-nearest neighbors (k-NN) classification rule has proven extremely successful in countless computer vision applications. For example, image categorization often relies on uniform voting among the nearest prototypes in the space of descriptors. In spite of its good generalization properties and its natural extension to multi-class problems, the classic k-NN rule suffers from high variance when dealing with sparse prototype datasets in high dimensions. A few techniques have been proposed to improve k-NN classification; they rely either on deforming the nearest neighborhood relationship by learning a distance function, or on modifying the input space by means of subspace selection. From the computational standpoint, many methods have been proposed for speeding up nearest neighbor retrieval, both for multidimensional vector spaces and for nonvector spaces induced by computationally expensive distance measures.

In this paper, we propose a novel boosting approach for generalizing the k-NN rule, by providing a new k-NN boosting algorithm, called UNN (Universal Nearest Neighbors), for the induction of leveraged k-NN. We emphasize that UNN is a formal boosting algorithm in the original boosting terminology. Our approach consists of redefining the voting rule as a strong classifier that linearly combines predictions from the k closest prototypes. Thus, the k nearest neighbor examples act as weak classifiers, and their weights, called leveraging coefficients, are learned by UNN so as to minimize a surrogate risk that upper-bounds the empirical misclassification rate over the training data. These leveraging coefficients allow us to distinguish the most relevant prototypes for a given class. Note that UNN does not alter the k-nearest neighborhood relationship; rather, it acts on top of k-NN search.

We carried out experiments comparing UNN to k-NN, support vector machines (SVM) and AdaBoost on the categorization of natural scenes, using state-of-the-art image descriptors (Gist and Bag-of-Features) on real images from Oliva and Torralba (Int. J. Comput. Vis. 42(3):145–175, 2001), Fei-Fei and Perona (IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 524–531, 2005), and Xiao et al. (IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3485–3492, 2010). The results show that UNN competes with or beats the other contenders while requiring comparatively small training and testing times.
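The leveraged voting rule described in the abstract can be illustrated with a small sketch. This is not the authors' UNN software (which, per the Acknowledgements, is available on request); it assumes one-vs-all labels y_jc in {-1, +1}, brute-force Euclidean search, and leveraging coefficients `alpha[j][c]` that are simply given, since learning them is UNN's job. The function names are hypothetical. With every coefficient set to 1, the rule reduces to the uniform k-NN vote.

```python
import math
from typing import List, Sequence


def k_nearest(prototypes: Sequence[Sequence[float]], x: Sequence[float], k: int) -> List[int]:
    """Indices of the k prototypes closest to x (brute-force Euclidean search)."""
    order = sorted(range(len(prototypes)), key=lambda j: math.dist(prototypes[j], x))
    return order[:k]


def leveraged_knn_predict(prototypes, labels, alpha, x, k):
    """Return argmax_c h_c(x), where h_c(x) = sum over the k nearest prototypes j of alpha[j][c] * y_jc.

    labels[j]   : class index of prototype j; y_jc = +1 if labels[j] == c, else -1 (assumed encoding)
    alpha[j][c] : leveraging coefficient of prototype j for class c (assumed already learned)
    """
    num_classes = len(alpha[0])
    neighbors = k_nearest(prototypes, x, k)
    scores = [
        sum(alpha[j][c] * (1.0 if labels[j] == c else -1.0) for j in neighbors)
        for c in range(num_classes)
    ]
    return max(range(num_classes), key=scores.__getitem__)


prototypes = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = [0, 0, 1, 1, 1]
uniform = [[1.0, 1.0] for _ in prototypes]   # all coefficients 1: plain k-NN vote
leveraged = [[1.0, 1.0] for _ in prototypes]
leveraged[2] = [1.0, 5.0]                     # prototype 2 made very relevant for class 1
print(leveraged_knn_predict(prototypes, labels, uniform, (0.2, 0.3), k=3))    # 0
print(leveraged_knn_predict(prototypes, labels, leveraged, (0.2, 0.3), k=3))  # 1
```

The same three neighbors vote in both calls; only the coefficients change, which mirrors the point that UNN acts on top of k-NN search rather than on the neighborhood relationship.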


Notes

1. A surrogate is a function that is a suitable upper bound for another function (here, the non-convex, non-differentiable empirical risk).
2. The implementation by the authors is available at http://people.csail.mit.edu/torralba/code/spatialenvelope/sceneRecognition.m.
3. The MAP was computed by averaging the per-category classification rates (the diagonal of the confusion matrix), and then averaging these values over 10 repetitions of each experiment on different folds.
4. Code available at http://www.vlfeat.org/.
5. Code available at http://www.irisa.fr/texmex/people/jegou/src.php.
6. For AdaBoost, we used the code available at http://www.mathworks.com/matlabcentral/fileexchange/22997-multiclass-gentleadaboosting.
7. Recall Young's inequality: for any Hölder conjugates p, q (p > 1, 1/p + 1/q = 1), we have \(yy' \le \frac{y^{p}}{p} + \frac{y'^{q}}{q}\) for all \(y, y' \ge 0\).
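The MAP computation described in the note on classification rates can be sketched as follows. This is a minimal illustration, assuming each run is summarized by a confusion matrix whose entry `[i][j]` counts test images of true category i predicted as j, and that every category has at least one test image; the function names are hypothetical.

```python
from statistics import mean


def per_category_rates(confusion):
    """Per-category classification rates: the diagonal of the row-normalized confusion matrix."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


def mean_rate_over_runs(confusions):
    """Average the per-category rates within each run, then average over runs (e.g. 10 folds)."""
    return mean(mean(per_category_rates(c)) for c in confusions)


runs = [
    [[8, 2], [1, 9]],   # run 1: rates 0.8 and 0.9
    [[10, 0], [5, 5]],  # run 2: rates 1.0 and 0.5
]
print(mean_rate_over_runs(runs))  # approximately 0.8
```

Averaging per category first keeps large categories from dominating the score, which matters for unbalanced collections such as the SUN categories listed in Table 6.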

References

Amores, J., Sebe, N., & Radeva, P. (2006). Boosting the distance estimation: application to the k-nearest neighbor classifier. Pattern Recognition Letters, 27(3), 201–209.
Athitsos, V., Alon, J., Sclaroff, S., & Kollios, G. (2008). BoostMap: an embedding method for efficient nearest neighbor retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(1), 89–104.
Bartlett, P., & Traskin, M. (2007). AdaBoost is consistent. Journal of Machine Learning Research, 8, 2347–2368.
Bartlett, P., Jordan, M., & McAuliffe, J. (2006). Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101, 138–156.
Bel Haj Ali, W., Piro, P., Crescence, L., Giampaglia, D., Ferhat, O., Darcourt, J., Pourcher, T., & Barlaud, M. (2012). Changes in the subcellular localization of a plasma membrane protein studied by bioinspired UNN learning classification of biologic cell images. In International conference on computer vision theory and applications (VISAPP).
Boutell, M. R., Luo, J., Shen, X., & Brown, C. M. (2004). Learning multi-label scene classification. Pattern Recognition, 37(9), 1757–1771.
Brighton, H., & Mellish, C. (2002). Advances in instance selection for instance-based learning algorithms. Data Mining and Knowledge Discovery, 6, 153–172.
Cucala, L., Marin, J. M., Robert, C. P., & Titterington, D. M. (2009). A Bayesian reassessment of nearest-neighbor classification. Journal of the American Statistical Association, 104(485), 263–273.
Dudani, S. (1976). The distance-weighted k-nearest-neighbor rule. IEEE Transactions on Systems, Man and Cybernetics, 6(4), 325–327.
Escolano Ruiz, F., Suau Pérez, P., & Bonev, B. I. (2009). Information theory in computer vision and pattern recognition. Berlin: Springer.
Fei-Fei, L., & Perona, P. (2005). A Bayesian hierarchical model for learning natural scene categories. In IEEE computer society conference on computer vision and pattern recognition (CVPR) (pp. 524–531).
Fukunaga, K., & Flick, T. (1984). An optimal global nearest neighbor metric. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(3), 314–318.
García-Pedrajas, N., & Ortiz-Boyer, D. (2009). Boosting k-nearest neighbor classifier by means of input space projection. Expert Systems with Applications, 36(7), 10570–10582.
Gionis, A., Indyk, P., & Motwani, R. (1999). Similarity search in high dimensions via hashing. In Proc. international conference on very large databases (pp. 518–529).
Grauman, K., & Darrell, T. (2005). The pyramid match kernel: discriminative classification with sets of image features. In IEEE international conference on computer vision (ICCV) (pp. 1458–1465).
Gupta, L., Pathangay, V., Patra, A., Dyana, A., & Das, S. (2007). Indoor versus outdoor scene classification using probabilistic neural network. EURASIP Journal on Applied Signal Processing, 2007(1), 123.
Hart, P. E. (1968). The condensed nearest neighbor rule. IEEE Transactions on Information Theory, 14, 515–516.
Hastie, T., & Tibshirani, R. (1996). Discriminant adaptive nearest neighbor classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(6), 607–616.
Holmes, C. C., & Adams, N. M. (2003). Likelihood inference in nearest-neighbour classification models. Biometrika, 90, 99–112.
Hsu, C. W., Chang, C. C., & Lin, C. J. (2003). A practical guide to support vector classification. Tech. rep.
Jégou, H., Douze, M., & Schmid, C. (2011). Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1), 117–128.
Kakade, S., Shalev-Shwartz, S., & Tewari, A. (2009). Applications of strong convexity–strong smoothness duality to learning with matrices. Tech. rep.
Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In IEEE computer society conference on computer vision and pattern recognition (CVPR) (pp. 2169–2178).
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Masip, D., & Vitrià, J. (2006). Boosted discriminant projections for nearest neighbor classification. Pattern Recognition, 39(2), 164–170.
Nguyen, X., Wainwright, M. J., & Jordan, M. I. (2009). On surrogate loss functions and f-divergences. Annals of Statistics, 37, 876–904.
Nock, R., & Nielsen, F. (2009a). Bregman divergences and surrogates for learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(11), 2048–2059.
Nock, R., & Nielsen, F. (2009b). On the efficient minimization of classification calibrated surrogates. In Advances in neural information processing systems 21 (NIPS) (pp. 1201–1208).
Nock, R., & Sebban, M. (2001). An improved bound on the finite-sample risk of the nearest neighbor rule. Pattern Recognition Letters, 22(3/4), 407–412.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: a holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.
Paredes, R. (2006). Learning weighted metrics to minimize nearest-neighbor classification error. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(7), 1100–1110.
Payne, A., & Singh, S. (2005). Indoor vs. outdoor scene classification in digital photographs. Pattern Recognition, 38(10), 1533–1545.
Piro, P., Nock, R., Nielsen, F., & Barlaud, M. (2012). Leveraging k-NN for generic classification boosting. Neurocomputing, 80, 3–9.
Quattoni, A., & Torralba, A. (2009). Recognizing indoor scenes. In IEEE computer society conference on computer vision and pattern recognition (CVPR).
Schapire, R. E., & Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37, 297–336.
Serrano, N., Savakis, A. E., & Luo, J. B. (2004). Improved scene classification using efficient low-level features and semantic cues. Pattern Recognition, 37, 1773–1784.
Shakhnarovich, G., Darrell, T., & Indyk, P. (2006). Nearest-neighbor methods in learning and vision. Cambridge: MIT Press.
Sivic, J., & Zisserman, A. (2003). Video Google: a text retrieval approach to object matching in videos. In IEEE international conference on computer vision (ICCV) (Vol. 2, pp. 1470–1477).
Swain, M. J., & Ballard, D. H. (1991). Color indexing. International Journal of Computer Vision, 7, 11–32.
Torralba, A., Murphy, K., Freeman, W., & Rubin, M. (2003). Context-based vision system for place and object recognition. In IEEE international conference on computer vision (ICCV) (pp. 273–280).
Vedaldi, A., & Fulkerson, B. (2008). VLFeat: an open and portable library of computer vision algorithms. http://www.vlfeat.org.
Vogel, J., & Schiele, B. (2007). Semantic modeling of natural scenes for content-based image retrieval. International Journal of Computer Vision, 72(2), 133–157.
Xiao, J., Hays, J., Ehinger, A., Oliva, A., & Torralba, A. (2010). SUN database: large-scale scene recognition from abbey to zoo. In IEEE conference on computer vision and pattern recognition (CVPR), June 2010 (pp. 3485–3492).
Yu, K., Ji, L., & Zhang, X. (2002). Kernel nearest-neighbor algorithm. Neural Processing Letters, 15(2), 147–156.
Yuan, M., & Wegkamp, M. (2010). Classification methods with reject option based on convex risk minimization. Journal of Machine Learning Research, 11, 111–130.
Zhang, H., Berg, A. C., Maire, M., & Malik, J. (2006). SVM-KNN: discriminative nearest neighbor classification for visual category recognition. In IEEE computer society conference on computer vision and pattern recognition (CVPR) (pp. 2126–2136).
Zhang, M. L., & Zhou, Z. H. (2007). ML-KNN: a lazy learning approach to multi-label learning. Pattern Recognition, 40(7), 2038–2048.
Zhu, J., Rosset, S., Zou, H., & Hastie, T. (2009). Multi-class AdaBoost. Statistics and Its Interface, 2, 349–360.
Zuo, W., Zhang, D., & Wang, K. (2008). On kernel difference-weighted k-nearest neighbor classification. Pattern Analysis & Applications, 11(3–4), 247–257.


Acknowledgements

The authors would like to thank the reviewers for stimulating comments and discussions about our results, which helped to significantly improve the paper, and Dario Giampaglia and John Tassone for their help with the experiments. The UNN software is available from Michel Barlaud upon request.

Author information

Authors and Affiliations

CEREGMIA, Université Antilles-Guyane, Campus de Schoelcher, Martinique, France
Richard Nock
Istituto Italiano di Tecnologia, Via Morego 30, 16163, Genoa, Italy
Paolo Piro
Sony Computer Science Laboratories, Inc., Tokyo, Japan
Frank Nielsen
LIX Department, Ecole Polytechnique, Palaiseau, France
Frank Nielsen
University of Nice-Sophia Antipolis/CNRS, 2000 route des Lucioles, 06903, Sophia Antipolis, France
Wafa Bel Haj Ali & Michel Barlaud


Corresponding author

Correspondence to Paolo Piro.

Appendix

Generic UNN algorithm

The general version of UNN is shown in Algorithm 2. This algorithm induces the leveraged k-NN rule (9) for the broad class of surrogate losses meeting the conditions of Bartlett et al. (2006), thus generalizing Algorithm 1. Namely, we constrain ψ to meet the following conditions: (i) \(\mathrm{im}(\psi)=\mathbb{R}_{+}\), (ii) \(\nabla_{\psi}(0)<0\) (\(\nabla_{\psi}\) is the conventional derivative of the loss function ψ), and (iii) ψ is strictly convex and differentiable. Conditions (i) and (ii) imply that ψ is classification-calibrated: roughly speaking, its local minimization is tied to that of the empirical risk (Bartlett et al. 2006). Condition (iii) yields convenient algorithmic properties for the minimization of the surrogate risk (Nock and Nielsen 2009b). Three common examples were given in (6)–(5).
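Conditions (i) to (iii) can be spot-checked numerically on candidate losses. The sketch below assumes the textbook forms of the three surrogates named in this appendix, \(\psi^{\mathrm{exp}}(x)=e^{-x}\), \(\psi^{\mathrm{log}}(x)=\log(1+e^{-x})\) and \(\psi^{\mathrm{squ}}(x)=(1-x)^2\); the paper's own definitions are in Sect. 2.2 and are not reproduced here. The helper `check_conditions` is hypothetical and only samples a grid of margins, so it is a sanity check rather than a proof. It also reports \(\psi(0)>0\), which the proof sketch below derives from (i) and (ii).

```python
import math


# Assumed standard forms of the three surrogates (see Sect. 2.2 of the paper for the originals).
def psi_exp(x):
    return math.exp(-x)


def psi_log(x):
    return math.log1p(math.exp(-x))


def psi_squ(x):
    return (1.0 - x) ** 2


def check_conditions(psi, h=1e-3, grid=None):
    """Spot-check the loss conditions on a grid of margins.

    Returns (nonnegative, slope_negative_at_0, strictly_convex, positive_at_0).
    """
    if grid is None:
        grid = [i / 10.0 for i in range(-50, 51)]
    nonnegative = all(psi(x) >= 0.0 for x in grid)            # part of condition (i)
    slope_at_0 = (psi(h) - psi(-h)) / (2.0 * h)                # central difference for (ii)
    strictly_convex = all(                                     # positive second differences for (iii)
        psi(x + h) - 2.0 * psi(x) + psi(x - h) > 0.0 for x in grid
    )
    return nonnegative, slope_at_0 < 0.0, strictly_convex, psi(0.0) > 0.0


for name, psi in [("exp", psi_exp), ("log", psi_log), ("squ", psi_squ)]:
    print(name, check_conditions(psi))  # all four checks pass for each loss
```

By contrast, the hinge loss max(0, 1 − x) passes the first, second and fourth checks but fails strict convexity, so it falls outside the class handled by Algorithm 2.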

The main bottleneck of UNN is step [I.1], since (29) is non-linear; however, (29) always has a solution, which is finite under mild assumptions (Nock and Nielsen 2009b). In our case, \(\delta_j\) is guaranteed to be finite unless, for the class at hand, example j's membership either agrees with the memberships of all its reciprocal neighbors or disagrees with all of them. The second column of Table 5 contains the solutions to (29) for the surrogate losses mentioned in Sect. 2.2. These solutions are always exact for the exponential loss (\(\psi^{\mathrm{exp}}\)) and the squared loss (\(\psi^{\mathrm{squ}}\)); for the logistic loss (\(\psi^{\mathrm{log}}\)), the solution is exact when the weights in the reciprocal neighborhood of j are all the same, and approximate otherwise. Since all starting weights are equal, exactness can be guaranteed during a large number of inner rounds, depending on the order in which the examples are chosen. Table 5 also makes the finiteness condition on \(\delta_j\) precise: when either sum of weights in (28) is zero, the solutions in the first and third lines of Table 5 are not finite. A simple strategy to cope with the numerical problems arising in such situations is the one proposed by Schapire and Singer (1999) (see Sect. 2.4). Finally, Table 5 shows how the weight update rule (30) specializes for these losses.
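For the exponential loss with ±1 memberships, a closed form in the style of AdaBoost solves (29): \(\delta_j = \frac{1}{2}\log(W_j^{+}/W_j^{-})\), where \(W_j^{+}\) (resp. \(W_j^{-}\)) sums the current weights of the reciprocal neighbors of j whose membership agrees (resp. disagrees) with that of j. The sketch below assumes this closed form and a ±1 one-vs-all encoding rather than reproducing Table 5, and all names in it are hypothetical. It makes the finiteness condition visible: \(\delta_j\) is infinite exactly when one of the two sums is zero. Passing `eps > 0` adds ε to both sums, in the spirit of the Schapire and Singer (1999) smoothing; with `eps = 0` the sketch simply skips such a j, which is our own choice.

```python
import math


def reciprocal_neighbors(knn, j):
    """Indices i whose k-NN list contains j, i.e. the examples that j votes for."""
    return [i for i, nn in enumerate(knn) if j in nn]


def unn_exp_one_class(knn, y, rounds=1, eps=0.0):
    """Hypothetical sketch of UNN leveraging for one class c with the exponential loss.

    knn[i] : indices of the k nearest training examples of example i (i excluded)
    y[i]   : +1 if example i belongs to class c, else -1 (assumed encoding)
    eps    : smoothing added to both weight sums (0 means no smoothing)
    Returns the leveraging coefficients alpha[j] and the final weights w[i].
    """
    m = len(y)
    alpha = [0.0] * m
    w = [1.0] * m  # w_i = exp(-margin_i); every margin starts at 0
    for _ in range(rounds):
        for j in range(m):
            recip = reciprocal_neighbors(knn, j)
            r = {i: y[i] * y[j] for i in recip}  # r_ij = +1 iff memberships agree
            w_plus = sum(w[i] for i in recip if r[i] > 0)
            w_minus = sum(w[i] for i in recip if r[i] < 0)
            if eps == 0.0 and (w_plus == 0.0 or w_minus == 0.0):
                continue  # total match or mismatch: delta_j would be infinite, so skip j
            delta = 0.5 * math.log((w_plus + eps) / (w_minus + eps))
            alpha[j] += delta
            for i in recip:
                w[i] *= math.exp(-delta * r[i])  # margin_i grows by delta * r_ij
    return alpha, w


knn = [[1, 2], [0, 2], [3, 0], [2, 1]]
y = [1, 1, -1, -1]
alpha, w = unn_exp_one_class(knn, y)
print(alpha, w)
```

In this toy run only example 2 receives a nonzero coefficient, \(\frac{1}{2}\log\frac{1}{2}\); example 3 is skipped because its only reciprocal neighbor agrees with it. Right after the update for j, the sum of \(r_{ij} w_i\) over the reciprocal neighbors of j is zero, which is the equality the proof sketch below attributes to (29).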

Table 5 Three common loss functions and the corresponding solutions \(\delta_j\) of (29) and \(w_i\) of (30). (Vector \(\boldsymbol{r}^{(c)}_{j}\) designates column j of \(R^{(c)}\) and \(\|\cdot\|_1\) is the \(L_1\) norm.) The rightmost column indicates whether the expression is (A)lways the exact solution, or only when the weights of the reciprocal neighbors of j are the (S)ame


Table 6 Number of images for each of the first 100 categories of the SUN database


Proof sketch of Theorem 3

We add the iteration t and the class c to the weight notation, so that \(w_{ti}^{(c)}\) denotes the weight of example \(x_i\) prior to iteration t for class c in UNN (inside the “for c” loop of Algorithm 2), with \(w_0\) denoting the initial value of w. To save space in some of the computations below, we also use the shorthand:

(31)

ψ being ω-strongly smooth is equivalent to \(\tilde{\psi}\) being strongly convex with parameter \(\omega^{-1}\) (Kakade et al. 2009), that is,

\[ \tilde{\psi}(x) - \frac{x^{2}}{2\omega} \qquad (32) \]

is convex. Here, we have used the following notation: \(\tilde{\psi}(x) \stackrel{\mathrm{.}}{=} \psi^{\star}(-x)\), where \(\psi^{\star}(x) \stackrel{\mathrm{.}}{=} x\nabla_{\psi}^{-1}(x) - \psi(\nabla^{-1}_{\psi}(x))\) is the Legendre conjugate of ψ. Since a convex function h satisfies \(h(w') \ge h(w) + \nabla_h(w)(w'-w)\), applying this inequality with h the function in (32) yields, for all \(t=1,2,\ldots,T\), \(i=1,2,\ldots,m\) and \(c=1,2,\ldots,C\):

(33)

where we recall that \(D_{\psi}\) denotes the Bregman divergence with generator ψ (21). On the other hand, the Cauchy-Schwarz inequality yields:

(34)

The equality in (34) holds because \(\sum_{i: j \sim_{k} i} {\mathrm{r}^{(c)}_{ij}w^{(c)}_{(t+1)i}} = 0\), which is exactly (29). We obtain:

(35)

(36)

(37)

Here, (35) follows from (33), (36) follows from (34), and (37) follows from (19). Adding (37) for c=1,2,…,C and t=1,2,…,T, and then dividing by C, we obtain:

(38)
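The strong-convexity fact behind (32) can be checked on a concrete loss. The sketch assumes the logistic surrogate \(\psi(x)=\log(1+e^{-x})\) (our assumption for the form of \(\psi^{\mathrm{log}}\)). Its second derivative never exceeds 1/4, so it is ω-strongly smooth with ω = 1/4; the conjugate formula then gives \(\tilde{\psi}(x) = x\log x + (1-x)\log(1-x)\) on (0, 1), whose second derivative \(1/(x(1-x))\) is at least \(4 = \omega^{-1}\). The code evaluates \(\tilde{\psi}\) through the Legendre-conjugate definition and checks both facts numerically; the helper names are hypothetical.

```python
import math

OMEGA = 0.25  # logistic loss: psi''(x) = e^x / (1 + e^x)^2 <= 1/4


def psi_log(x):
    """Assumed logistic surrogate psi(x) = log(1 + exp(-x))."""
    return math.log1p(math.exp(-x))


def grad_psi_inv(u):
    """Inverse of psi'(x) = -1 / (1 + exp(x)); defined for u in (-1, 0)."""
    return math.log(-(1.0 + u) / u)


def psi_star(u):
    """Legendre conjugate psi*(u) = u * grad^{-1}(u) - psi(grad^{-1}(u))."""
    x = grad_psi_inv(u)
    return u * x - psi_log(x)


def psi_tilde(x):
    """psi~(x) = psi*(-x), defined for x in (0, 1)."""
    return psi_star(-x)


def neg_binary_entropy(x):
    return x * math.log(x) + (1.0 - x) * math.log(1.0 - x)


def second_difference(f, x, h=1e-3):
    return f(x + h) - 2.0 * f(x) + f(x - h)


def shifted(x, omega=OMEGA):
    """The function of (32): psi~(x) - x^2 / (2 * omega)."""
    return psi_tilde(x) - x * x / (2.0 * omega)


grid = [i / 100.0 for i in range(5, 96)]
max_gap = max(abs(psi_tilde(x) - neg_binary_entropy(x)) for x in grid)
min_curv = min(second_difference(shifted, x) for x in grid)
print(max_gap < 1e-12, min_curv > -1e-12)  # True True
```

The curvature of the shifted function touches zero at x = 1/2, so ω = 1/4 is the tightest constant here: any smaller ω, such as 0.2, makes the function of (32) non-convex near x = 1/2.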

We now bound the large parenthesized term, which depends solely on the examples. We have:

(39)

(40)

(41)

Here, (39) follows from the arithmetic-geometric-harmonic mean inequality, and (40) is Young's inequality (Footnote 7) with p = q = 2. Plugging (41) into (38), we obtain:

(42)
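For reference, the two classical inequalities invoked for (39) and (40) read as follows in their general form; how they are instantiated on the UNN weights is not shown in this excerpt. The p = q = 2 case of Young's inequality reduces to the nonnegativity of a square:

```latex
% Young's inequality with p = q = 2, for y, y' >= 0
y y' \le \frac{y^{2}}{2} + \frac{y'^{2}}{2}
\iff 0 \le \frac{y^{2} - 2 y y' + y'^{2}}{2} = \frac{(y - y')^{2}}{2}

% Arithmetic-geometric-harmonic mean inequality, for a_1, ..., a_n > 0
\frac{1}{n}\sum_{i=1}^{n} a_i \;\ge\; \Big(\prod_{i=1}^{n} a_i\Big)^{1/n} \;\ge\; \frac{n}{\sum_{i=1}^{n} 1/a_i}
```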

Now, UNN meets the following property (Piro et al. 2012, A.2), which can easily be shown to hold with our class encoding as well:

(43)

Adding (43) for t=0,1,…,T−1 and c=1,2,…,C, we obtain:

(44)

Plugging (42) into (44), we obtain:

(45)

But the following inequality holds between the average surrogate risk and the empirical risk of the leveraged k-NN rule \(\boldsymbol{h}^{\ell}_{T}\), because of (i):

(46)

so that, putting together (45) and (46) and using the fact that ψ(0) > 0 because of (i)–(ii), we obtain after T rounds of boosting for each class:

(47)

It remains to compute the minimal value of T for which the right-hand side of (47) is no greater than some user-fixed τ ∈ [0,1], which yields the bound in (22).


About this article

Cite this article

Nock, R., Piro, P., Nielsen, F. et al. Boosting k-NN for Categorization of Natural Scenes. Int J Comput Vis 100, 294–314 (2012). https://doi.org/10.1007/s11263-012-0539-2


Received: 21 October 2009
Accepted: 17 May 2012
Published: 04 July 2012
Issue Date: December 2012
DOI: https://doi.org/10.1007/s11263-012-0539-2
