Optimization Letters, Volume 11, Issue 2, pp 329–341

Clustering and maximum likelihood search for efficient statistical classification with medium-sized databases

Original Paper

DOI: 10.1007/s11590-015-0948-6

Cite this article as:
Savchenko, A.V. Optim Lett (2017) 11: 329. doi:10.1007/s11590-015-0948-6


This paper addresses the insufficient performance of statistical classification with medium-sized databases (thousands of classes). Each object is represented as a sequence of independent segments, and each segment is defined as a random sample of independent features whose distribution is of multivariate exponential type. To speed up classification by the optimal Kullback–Leibler minimum information discrimination principle, we cluster the training set and apply an approximate nearest neighbor search for the input object in the set of cluster medoids. Using the asymptotic properties of the Kullback–Leibler divergence, we propose the maximal likelihood search procedure. In this method, the next medoid to check is selected from the cluster with the maximal joint density (likelihood) of the distances to the previously checked medoids. Experimental results in image recognition with an artificially generated dataset and the Essex facial database show that the proposed approach is much more effective than an exhaustive search and the known approximate nearest neighbor methods from the FLANN and NonMetricSpace libraries.
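As a rough illustration of the baseline that the abstract aims to accelerate, the sketch below performs an exhaustive minimum-divergence search over a small set of cluster medoids. It is not the proposed maximal likelihood search procedure. It rests on several assumptions that do not come from the paper: objects are reduced to single discrete histograms, the divergence is taken in the direction D(query || medoid), and the names `kl_divergence` and `nearest_medoid` as well as the toy data are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) of two discrete distributions.

    eps is an assumed smoothing constant that guards against log(0).
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def nearest_medoid(x, medoids):
    """Exhaustively scan the medoids; return (index, divergence) of the closest."""
    best_idx, best_div = -1, math.inf
    for idx, m in enumerate(medoids):
        d = kl_divergence(x, m)
        if d < best_div:
            best_idx, best_div = idx, d
    return best_idx, best_div


# Toy data (hypothetical): three medoid histograms and one query histogram.
medoids = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
]
query = [0.6, 0.3, 0.1]

idx, div = nearest_medoid(query, medoids)
print(idx, div)
```

The scan costs one divergence evaluation per medoid. An approximate search of the kind described in the abstract would instead try to choose which medoid to check next so that most of these evaluations can be skipped.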


Statistical classification; Approximate nearest neighbor method; Image recognition; Kullback–Leibler discrimination; Exponential family

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, Nizhny Novgorod, Russia
