The influence of image descriptors’ dimensions’ value cardinalities on large-scale similarity search
Abstract
In this empirical study, we evaluate the impact of the dimensions’ value cardinality (DVC) of image descriptors on the performance of large-scale similarity search. DVCs are inherent characteristics of image descriptors, defined for each dimension as the number of distinct values the descriptors take in that dimension, thus expressing the dimension’s discriminative power. In our experiments with six publicly available datasets of image descriptors of different dimensionality (64–5,000 dim) and size (240 K–1 M), (a) we show that DVC varies, due to the existence of several extraction methods using different quantization and normalization techniques; (b) we also show that each image descriptor extraction strategy tends to follow the same DVC distribution function family; therefore, similarity search strategies can exploit image descriptors’ DVCs, irrespective of the sizes of the datasets; (c) based on a canonical correlation analysis, we demonstrate that image descriptors’ DVCs have a significant impact on the performance of the baseline LSH method [8] and three state-of-the-art hashing methods, SKLSH [28], PCA-ITQ [10] and SPH [12], as well as on the performance of the MSIDX method [34], which exploits the DVC information; (d) we experimentally demonstrate the influence of DVCs on both sequential search and the aforementioned similarity search methods and discuss the advantages of our findings. We hope that our work will motivate researchers to consider DVC analysis as a tool for the design of similarity search strategies in image databases.
Keywords
Dimensions’ value cardinalities · Indexing · Content-based image retrieval · Approximate similarity search

1 Introduction
This work presents an empirical study of dimensions’ value cardinality (DVC), defined as the number of distinct values in each dimension of image descriptor vectors. Through our analysis and experiments, we examine the influence of DVCs on the performance of approximate similarity search algorithms as well as on the sequential search in image databases.
Many hashing techniques [8, 12, 13, 19, 20, 28, 31, 38, 39] have been proposed to provide efficient methods for high-dimensional indexing of low-level descriptor vectors of multimedia, such as video or still images. The mapping of low-level descriptor vectors into the Hamming space [5] using appropriate hashing functions ensures the scalability of similarity search algorithms to large-scale datasets, due to the compactness of the data and the fast Hamming distance computations. The goal of the hashing functions is to map similar (i.e. adjacent in the Euclidean space) high-dimensional image descriptors to neighboring binary codes in the Hamming space. Similarity search is then performed by comparing the binary codes. However, hashing methods often fail to keep neighboring vectors adjacent in the Hamming space and thus have low accuracy. The performance of similarity search methods is usually measured in terms of mean Average Precision (mAP), expressing how well the methods preserve the Euclidean neighbors of sequential search. Especially when the hashing functions are selected independently of the data, or when a short binary code length is selected, hashing methods achieve limited mAP. Moreover, for long binary code lengths, a significant preprocessing time is required and the speedup factor (SF) of similarity search is highly reduced. Hashing methods are categorized as data-dependent or data-independent, based on the method followed to generate the hashing functions. Efficiency improvements of data-dependent methods over data-independent ones have been shown in several studies [19, 39] for the case where limited hash code sizes are employed; this is attributed to the increasing independence between the hash functions as their number grows.
For example, spectral hashing [39] outperforms many data-independent methods for small code sizes, but is outperformed by the data-independent method of shift-invariant kernel hashing [28] for sizes over 64 bits. Moreover, all data-dependent hashing methods often incur a significant preprocessing cost for learning from the selected training dataset and for generating the binary codes. While in most hashing methods the usual technique for assigning the binary codes is to partition the metric space of the projected image descriptor points with appropriate hyperplanes and assign a different code to each side, in the recent approach of spherical hashing (SPH) [12] the partition of the data points for computing the binary codes is based on hyperspheres. According to the experimental evaluation of [12], SPH outperforms other state-of-the-art hashing methods.
In the work of [2], the authors developed an analytic model to describe the operation of hash table-based multimedia fingerprint databases. In their analysis, they show that their model can predict the performance of a search through a hash-based database as a function of both the statistical distribution of the fingerprints and the actual values of the database design parameters. The main idea is to exploit the notion of “weak” bits. When extracting the fingerprint of the query multimedia object, each bit of the query fingerprint is assigned a probability value, which describes the likelihood that the respective bit would change if the query object were modified. The bits assigned a high probability of change are called “weak” bits. In their algorithm, a stability score is assigned to each bit. Less stable (weaker) bits are toggled to generate multiple pseudoqueries from a single query. The results from all the generated queries are then aggregated. However, the algorithm fails unless the stable bits of the query are correctly identified. Moreover, it is limited to multimedia fingerprint databases.
Apart from hashing strategies, the recently proposed MSIDX method [34] exploits the correlation between (a) the value cardinality of each dimension of the descriptor vector and (b) the discriminative power of that dimension, assuming that dimensions with high value cardinalities have more discriminative power. The key idea of MSIDX is to reorder the storage positions of image descriptors according to the value cardinalities of their dimensions, by performing a multiple-sort algorithm. This sorting aims to increase the probability that two similar images are stored in positions that do not differ by more than a specific global constant range, which is calculated as a percentage of the dataset size and denoted by the parameter \(w\). As was experimentally shown, MSIDX outperforms current state-of-the-art hashing methods in terms of both mAP and SF.
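The reordering idea behind MSIDX can be illustrated with a minimal sketch (a simplification for illustration, not the full algorithm of [34]): dimensions are ranked by descending DVC, and the descriptors are then sorted lexicographically on the reordered dimensions, so that descriptors agreeing on the most discriminative dimensions land in nearby storage positions.

```python
import numpy as np

def msidx_order(X):
    """Sketch of the MSIDX storage reordering: rank dimensions by
    descending value cardinality, then sort descriptors lexicographically
    on the reordered dimensions (a simplified stand-in for the
    multiple-sort step of MSIDX)."""
    # DVC per dimension: number of distinct values in each column
    dvc = np.array([len(np.unique(X[:, j])) for j in range(X.shape[1])])
    dim_order = np.argsort(-dvc)              # highest-DVC dimensions first
    # np.lexsort treats the LAST key as the primary key, so reverse the order
    keys = tuple(X[:, j] for j in reversed(dim_order))
    return np.lexsort(keys)                   # storage positions of descriptors

X = np.array([[0, 3, 1],
              [0, 1, 1],
              [1, 2, 0]])
positions = msidx_order(X)   # → [1, 2, 0]: rows ordered by the high-DVC column
```

Similar descriptors then sit within a bounded range of positions, which is the window that the parameter \(w\) controls in the actual method.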
1.1 Contribution and Layout
- (C1)
In six publicly available datasets, it is experimentally shown that image descriptors’ DVCs vary due to the existence of different extraction techniques, and that dimensions with relatively high DVCs have relatively higher discriminative power.
- (C2)
It is shown that each descriptor extraction method tends to produce similar DVC distributions for different dataset sizes. Thus, similarity search strategies that exploit image descriptors’ DVCs can scale, since the DVC distributions over the dimensions are preserved, irrespective of the dataset sizes.
- (C3)
It is verified that the values of the image descriptors’ DVCs have a strong impact on the similarity search performance, in terms of both mAP and SF. The correlations between a set of variables describing the DVCs of each image descriptor dataset and a set of variables describing the performance of the similarity search strategy were calculated using canonical correlation analysis (CCA). The CCA approach considers the performance variables, mAP and SF, as a set and not separately, since both mAP and SF play a crucial role in similarity search.
2 Analysis of image descriptors’ DVC
2.1 Impact of images’ descriptor extraction strategies on DVC and search performance
A wide set of factors is known to affect the retrieval performance of image descriptor extraction algorithms. The actual image characteristics that each algorithm selects to identify and encode are tightly correlated with the semantic definition of similarity adopted by the algorithm designer. The ability of the algorithm to correctly model image characteristics such as color, texture, illumination and resolution variations plays a very important role in the final retrieval performance of the image descriptor. However, given the retrieval performance of an image descriptor algorithm, the aim of our analysis is to study how the different dimensions of a descriptor vector contribute to the overall performance in the case of hashing and other approximate similarity search strategies. In this section, image descriptor extraction techniques are discussed to identify how different methodologies influence DVC values.
For each dimension, the DVC is the number of distinct values found in that dimension throughout a dataset of image descriptors. Descriptor vectors are integer- or real-valued vector representations of the characteristics of either a part of an image (i.e. local descriptors) or the whole image (i.e. global descriptors). These are typically histograms or other vector representations of image characteristics such as color, texture, edges, illumination and their spatial distribution in the examined area. A parameter that varies among descriptor extraction techniques is the number of dimensions of the descriptor vector. In the case of local image descriptor vectors, the number of dimensions depends on the selected number of attributes or the binning resolution chosen for producing the histograms of the local attributes. In the case of generating global image descriptors from local ones, the typical procedure is a “bag-of-words” technique, where local descriptors are assigned, either by soft or hard assignment, to a predefined number of centroids; a histogram of these assignments is then constructed. The number of local descriptor vectors used to extract the global descriptor may vary across images due to different image sizes, sampling strategies (e.g. dense grid, interest points, pyramidal decompositions) or the selected extraction density [34].
According to [34], such variations limit the performance of the global descriptor vectors and thus a post-processing phase that normalizes the values in each dimension is required [23, 36, 41]. However, this step turns the descriptor values into real numbers, which further increases the algorithm’s complexity, leading to high processing time and storage requirements, especially for large-scale datasets. The typical way to address this drawback is quantization of the values in each dimension, which, however, is generally a lossy process and thus introduces a trade-off between retrieval accuracy and computational cost. In practice, though, a very limited number of dimensions reach the quantization bounds, while most of them are highly repetitive and thus restricted to a lower DVC bound.
2.2 Calculation of image descriptors’ DVC
The value cardinality of each dimension is the number of distinct values that exist in this dimension throughout the dataset [34]. In the case of integer values, this is well defined.
In the case of real values, a finite number of decimal digits must be selected. However, real values are calculated either by normalizing integers to the real-valued \([0,1]\) range, and are thus already restricted to the original discrete values, or are real numbers with restricted decimal accuracy due to the memory and time bounds that the algorithms and current computers introduce. As a result, descriptor vectors in all datasets of the experiments have a limited number of decimals, usually not exceeding 8 decimal digits. Following [34], we should also note that in our experiments no value quantization was applied to the examined datasets.
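The DVC computation described above can be sketched as follows; the `decimals` rounding parameter is our own illustrative knob reflecting the limited decimal accuracy of real-valued descriptors (no quantization of the datasets is implied):

```python
import numpy as np

def dimension_value_cardinalities(X, decimals=8):
    """DVC per dimension: the number of distinct values observed in each
    column of the descriptor matrix X (rows = images, columns = dimensions).
    Real-valued descriptors are rounded to a fixed number of decimal digits,
    mirroring the limited decimal accuracy discussed in the text."""
    Xr = np.round(X, decimals)
    return np.array([len(np.unique(Xr[:, j])) for j in range(Xr.shape[1])])

X = np.array([[0.0, 0.5, 0.25],
              [0.0, 0.5, 0.75],
              [1.0, 0.5, 0.25]])
dvc = dimension_value_cardinalities(X)   # → [2, 1, 2]
```

A dimension whose column holds a single repeated value (DVC = 1) carries no discriminative information, while columns with many distinct values discriminate strongly between descriptors.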
2.3 Evaluation datasets
The evaluation datasets used in our experiments are the datasets used in [34] with the additional C-SIFT dataset. The collection of datasets contains both local and global descriptors from different collections and has not been subjected to any additional preprocessing steps.
From the ImageClef image collection, we have the CIME 64d-240K, CEDD 144d-240K and SURF 5000d-240K datasets. The CIME 64d-240K dataset features CIME descriptors [32] of 64 dimensions with integer values \(\in \{0,\ldots,63\}\). The CEDD 144d-240K dataset contains global CEDD descriptors [4] of 144 dimensions with integer values \(\in \{0,\ldots,7\}\), and the SURF 5000d-240K dataset uses a 5,000-dimensional codebook to extract global vectors from the local SURF descriptors [3], with normalized real values \(\in [0,1]\).
From the TEXMEX collection, we used the SIFT 128d-1M and GIST 960d-1M datasets, each featuring 1 million image descriptors. SIFT 128d-1M consists of local SIFT descriptors [23] of 128 dimensions with integer values \(\in \{0,\ldots,255\}\), while GIST 960d-1M holds global GIST descriptors [27] with real values \(\in [0, 1.0929]\).
Finally, the C-SIFT 1019d-700K dataset features 738,418 (\(N\) = 700K) images, crawled through Flickr’s Web Services by posing 50 random queries. Local C-SIFT descriptors [36] were extracted and a codebook of 1,019 dimensions was computed by clustering the local descriptors. Then, the typical “bag-of-words” approach was followed to compute a global C-SIFT descriptor vector of normalized real values \(\in [0,1]\) for each image.
2.4 DVC in evolving datasets’ sizes
Our goal in this section is to evaluate the correlation of DVCs with the performance of similarity search strategies and to confirm that DVC characteristics can be exploited in the design of scalable similarity search strategies, irrespective of the datasets’ sizes.
For each evaluation dataset, the Kolmogorov–Smirnov test [25] was performed between all possible pairs of the DVC distributions of different dataset sizes, and it was found that, for each descriptor extraction methodology, the DVC distributions of different dataset sizes come from the same distribution function (\(p<0.01\)). This means that each descriptor extraction technique tends to produce the same distribution function family. Since, for each descriptor extraction strategy, the cumulative distributions of DVC come from the same distribution family, the relative differences between the DVCs of the datasets are preserved, irrespective of the dataset sizes. Based on this finding, similarity search strategies can exploit image descriptors’ DVCs irrespective of the dataset sizes (contribution C2 in Sect. 1.1).
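Such a pairwise comparison of DVC distributions across dataset sizes can be sketched with a two-sample Kolmogorov–Smirnov test; the Poisson-distributed descriptors below are a synthetic stand-in, not one of the evaluation datasets:

```python
import numpy as np
from scipy.stats import ks_2samp

def dvc(X):
    """Number of distinct values per dimension (column) of X."""
    return np.array([len(np.unique(X[:, j])) for j in range(X.shape[1])])

rng = np.random.default_rng(0)
# Synthetic stand-in for one extraction method at two dataset sizes:
# a full dataset and a 20 % down-sampled version of it.
X_full = rng.poisson(lam=50, size=(2000, 128))
X_sub = X_full[rng.choice(2000, size=400, replace=False)]

# Two-sample KS test between the DVC samples of the two sizes; a small
# statistic / large p-value is consistent with a common distribution family.
stat, p = ks_2samp(dvc(X_full), dvc(X_sub))
```

`ks_2samp` returns the maximum distance between the two empirical CDFs and the associated p-value, which is the quantity tested in the analysis above.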
In addition, based on the experimental results of Fig. 2, we observe that in the case of the 20 % down-sampled high-dimensional dataset of SURF, the majority of the DVCs are in the range [1, 1,000], with \(F(1{,}000)\thickapprox 0.8\) (80 %). According to (1), this means that 80 % of the dimensions have DVCs lower than or equal to 1,000, whereas the remaining 20 % of the dimensions have DVCs in the range [1,000, 5,000]. Analogously, the same happens for the high-dimensional GIST ([2,000, 6,000]) and C-SIFT ([1, 16,000]) datasets, with the majority (80 %) of the dimensions having DVCs lower than or equal to 3,000 and 4,000, respectively, whereas the remaining 20 % of the dimensions have DVCs in the ranges [3,000, 6,000] and [4,000, 16,000]. In the case of the low-dimensional CIME ([10, 60]), SIFT ([140, 220]) and CEDD ([1, 8]) datasets, however, the majority (80 %) of the dimensions have DVCs lower than or equal to 50, 170 and 8, respectively, while the remaining 20 % of the dimensions have DVCs in the ranges [50, 60], [170, 220] and equal to 8. This can be attributed to the fact that high-dimensional descriptors tend to produce high DVCs for fewer dimensions, in comparison with low-dimensional image descriptors. In the high-dimensional evaluation datasets of SURF 5000d-240K, C-SIFT 1019d-700K and GIST 960d-1M, there are few dimensions with high discriminative power, specifically those with high DVCs, while the rest have low DVCs. This effect is attributed to the fact that high-dimensional descriptors tend to be more sparse, with some dimensions frequently holding zero or being highly repetitive.
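The cumulative fraction \(F(\cdot)\) used above is simply the empirical CDF of the per-dimension DVC values; a minimal sketch with made-up DVC values:

```python
import numpy as np

def dvc_cdf(dvcs, threshold):
    """Empirical CDF of the DVC values: the fraction of dimensions whose
    DVC is lower than or equal to the given threshold."""
    dvcs = np.asarray(dvcs)
    return float(np.mean(dvcs <= threshold))

# Hypothetical DVCs of a 5-dimensional descriptor dataset
dvcs = np.array([120, 800, 950, 1000, 4200])
f = dvc_cdf(dvcs, 1000)   # → 0.8 (4 of 5 dimensions at or below 1,000)
```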
3 Similarity search strategies
3.1 Algorithms
- 1.
Locality-sensitive hashing (LSH) [8] is the baseline hashing method used in our experiments. LSH projects the data onto randomly generated directions, drawn from a Gaussian distribution, and thresholds the projections to generate the binary codes. The source code of LSH is publicly available at [30].
- 2.
Shift-invariant kernel hashing (SKLSH) [28] is based on random projections constructed in such a way that the Hamming distance between the binary codes of two vectors is related to the value of a shift-invariant kernel between the vectors. The source code of SKLSH is publicly available at [16].
- 3.
Iterative quantization for learning binary codes (PCA-ITQ) [10] minimizes the quantization error by rotating zero-centered PCA projected data. The PCA-ITQ method generates the binary codes in two steps: (a) PCA dimensionality reduction and (b) iterative quantization. In the first step, PCA is performed and a projection matrix is computed for dimensionality reduction. The aim is to produce efficient binary codes, in which the variance of each bit is maximized and the bits are pairwise uncorrelated. Next, in the iterative quantization step, a rotation matrix is computed for the training set so as to minimize the quantization error by preserving the locality structure of the projected data. The source code of PCA-ITQ is publicly available at [16].
- 4.
Spherical hashing (SPH) [12] is a hashing method based on a hypersphere binary embedding technique (see Sect. 1). Based on the experimental results of [12], in our experiments we used the spherical Hamming distance, since it achieves higher mAP than the baseline Hamming distance, while the SF of SPH remains comparable to that of the baseline LSH method. The source code of SPH is publicly available at [30].
- 5.
Multi-sort indexing based on DVC (MSIDX) [34] performs a multiple-sort algorithm based on image descriptors’ DVCs (see Sect. 1). The source code of MSIDX is publicly available at [15].
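To make the hashing pipeline concrete, the following is a minimal LSH-style sketch in the spirit of [8]: random Gaussian projections thresholded at zero (which assumes roughly zero-centered data), with similarity search reduced to comparing binary codes by Hamming distance. It is an illustration, not the reference implementation of [30].

```python
import numpy as np

def lsh_codes(X, n_bits, rng):
    """Minimal LSH sketch: project the data onto random Gaussian
    directions and threshold each projection at zero, yielding one bit
    per direction (assumes roughly zero-centered data)."""
    W = rng.normal(size=(X.shape[1], n_bits))   # random projection matrix
    return (X @ W >= 0).astype(np.uint8)        # binary codes, shape (n, n_bits)

def hamming(a, b):
    """Hamming distance between two binary codes."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))                  # 100 synthetic 64-dim descriptors
codes = lsh_codes(X, n_bits=32, rng=rng)
d = hamming(codes[0], codes[1])                 # search compares codes, not raw vectors
```

The compactness of the codes and the cheap bitwise distance are what make the Hamming-space mapping discussed in Sect. 1 scale to large datasets.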
3.2 Evaluation benchmark
3.3 Parameter settings and performance
Several experiments were conducted with different configurations of the number of bits and hash tables for the examined methods. To make a fair comparison of the hashing methods, a condition of \(\mathrm{SF}\ge 1\) was applied, since large #bits and #hash tables increase mAP while reducing SF. For the relatively low-dimensional datasets (CIME 64d-240K, SIFT 128d-1M and CEDD 144d-240K) we varied the #bits from 1 to 1,024 with a step of 4 bits, to increase the number of observations that satisfy the condition, and fixed the #hash tables to 1. For the high-dimensional datasets (GIST 960d-1M, C-SIFT 1019d-700K, SURF 5000d-240K) we varied the #bits over the set \(\left\{ {64, 128, 256, 512, 1{,}024}\right\} \). In the high-dimensional datasets of GIST 960d-1M and C-SIFT 1019d-700K the maximum #hash tables was 5, whereas in the extremely high-dimensional dataset of SURF 5000d-240K the maximum #hash tables was 15.
4 Canonical correlation analysis (CCA): impact of DVC on similarity search strategies
4.1 Preliminaries of CCA
Canonical correlation analysis (CCA) [14] has been applied in many machine learning methods. CCA generates a multivariate statistical model that facilitates the study of interrelationships among sets of multiple dependent variables and multiple independent variables. According to [14], the goal of CCA is to maximize \(R_c\), the linear correlation between two sets of metric/categorical variables. If the generated model is not statistically significant, as measured by Wilks’ \(\Lambda \) statistic [24], then the two sets of variables are not linearly correlated; alternatively, non-linear CCA or kernel CCA have been proposed [21].
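Following the definition \(R_c=\sqrt{\lambda_1}\) used later in the text, a minimal CCA sketch on synthetic data sharing one latent factor; the matrix \(K\) below mirrors the construction the text references as (9), and the data are stand-ins for the DVC and Performance sets, not experimental values:

```python
import numpy as np

def first_canonical_correlation(X, Y):
    """Minimal CCA sketch: R_c is the square root of the largest
    eigenvalue of K = Sxx^{-1} Sxy Syy^{-1} Syx, built from the sample
    covariance blocks of the two (column-centered) variable sets."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)
    K = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    return float(np.sqrt(np.max(np.linalg.eigvals(K).real)))

rng = np.random.default_rng(1)
Z = rng.normal(size=(200, 1))                        # shared latent factor
X = np.hstack([Z + 0.1 * rng.normal(size=(200, 1)),  # stand-in for {mu2, mu3, mu4}
               rng.normal(size=(200, 2))])
Y = np.hstack([Z + 0.1 * rng.normal(size=(200, 1)),  # stand-in for {mAP, SF}
               rng.normal(size=(200, 1))])
r = first_canonical_correlation(X, Y)                # close to 1: strong linear link
```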
4.2 DVC and performance sets
To evaluate the impact of the DVCs of the image descriptors on the performance of the five examined similarity search strategies (described in Sect. 3.1), we performed CCA on two sets of variables: the DVC set, as the set of independent variables and the Performance set as the set of dependent variables.
The three central moments \(\mu _2\), \(\mu _3\), \(\mu _4\) of DVC in the six evaluation datasets
| | \(\mu _2\) | \(\mu _3\) | \(\mu _4\) |
|---|---|---|---|
| CIME 64d-240K | 81.826 | \(-\)660.770 | 17,691 |
| SIFT 128d-1M | 189.090 | 3,316.6 | 1.619e+05 |
| CEDD 144d-240K | 1.006 | \(-\)0.959 | 2.607 |
| GIST 960d-1M | 2.973e+05 | 1.718e+08 | 3.515e+11 |
| C-SIFT 1019d-700K | 4.673e+06 | 7.922e+09 | 8.176e+13 |
| SURF 5000d-240K | 1.451e+05 | 1.666e+09 | 3.250e+13 |
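The three DVC-set variables of the table are plain central moments of the per-dimension DVC values; a small sketch on made-up DVCs:

```python
import numpy as np

def central_moments(dvcs):
    """Second, third and fourth central moments of a dataset's DVC values,
    i.e. the {mu2, mu3, mu4} variables of the DVC set."""
    d = np.asarray(dvcs, dtype=float)
    m = d.mean()
    return tuple(float(np.mean((d - m) ** k)) for k in (2, 3, 4))

mu2, mu3, mu4 = central_moments([1, 2, 3, 4])   # → (1.25, 0.0, 2.5625)
```

\(\mu_2\) captures the spread of the DVCs across dimensions, \(\mu_3\) their asymmetry and \(\mu_4\) their “peakedness”, which is how these terms are used in the discussion of the CCA results.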
The Performance set \(= \{\mathrm{mAP}, \mathrm{SF}\}\) consists of the mAP and SF variables of the examined LSH, SKLSH, PCA-ITQ, SPH and MSIDX similarity search strategies, expressing how well each similarity search strategy preserves the Euclidean neighbors of sequential search and the method’s speedup factor compared to the linear time of sequential search, calculated according to (2) and (3), respectively.
4.3 Canonical correlations
The canonical correlation coefficient \(R_c\) measures the strength of the overall relationship between the linear composites (canonical variates) of the \(\{\mathrm{mAP}, \mathrm{SF}\}\) dependent and the \(\{\mu _2, \mu _3, \mu _4\}\) independent variables. In effect, it represents the bivariate correlation between the two canonical variates and, according to [14], is equal to the square root of the first eigenvalue of the matrix \(K\) defined in (9), \(R_c=\sqrt{\lambda _1}\), capturing a large amount of the variance of the examined variables.
Canonical variates, \(\mathrm{DVC}_s\) and \(\mathrm{Perf}_s\), are synthetic variables: linear combinations representing the weighted sum of two or more variables, which can be defined for either the dependent or the independent variables. In our case, we have two variates: \(\mathrm{DVC}_s\) for the DVC set and \(\mathrm{Perf}_s\) for the Performance set.
Canonical loadings \(r\) measure the simple linear correlation between each of the independent \(\{ \mu _2,\mu _3, \mu _4\}\) or dependent \(\{\mathrm{mAP}, \mathrm{SF}\}\) variables and its corresponding canonical variate (i.e. \(\mathrm{DVC}_s\) or \(\mathrm{Perf}_s\)). For example, the canonical loading \(r_\mathrm{mAP}\) represents the correlation of the mAP variable with the \(\mathrm{Perf}_s\) canonical variate, of which it is a member. The larger the coefficient, the more important the variable is in deriving the canonical variate.
Canonical cross-loadings \(rc\) measure the correlation of each observed independent or dependent variable with the opposite canonical variate. For example, \({rc}_{\mu _2-\mathrm{Perf}_s}\) encodes the correlation of the independent \(\mu _2\) variable with the dependent \(\mathrm{Perf}_s\) canonical variate.
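Loadings and cross-loadings are plain Pearson correlations between an observed variable and a variate; a sketch with synthetic stand-ins for mAP, \(\mathrm{Perf}_s\) and \(\mathrm{DVC}_s\) (none of these values come from the experiments):

```python
import numpy as np

def loading(var, variate):
    """Canonical (cross-)loading: the simple Pearson correlation between
    an observed variable and a canonical variate."""
    return float(np.corrcoef(var, variate)[0, 1])

rng = np.random.default_rng(2)
mAP = rng.normal(size=50)                   # stand-in observed variable
perf_s = mAP + 0.5 * rng.normal(size=50)    # stand-in Perf_s variate
dvc_s = rng.normal(size=50)                 # stand-in DVC_s variate

r_mAP = loading(mAP, perf_s)    # loading: mAP vs its own variate
rc_mAP = loading(mAP, dvc_s)    # cross-loading: mAP vs the opposite variate
```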
5 Experimental results
5.1 Roadmap
In this section, we examine the notion of DVC from different perspectives. In Sect. 5.2, the results of our CCA are discussed. Section 5.3 presents the impact of eliminating the low-DVC dimensions and finally, in Sect. 5.4 an energy-based study of DVC is presented to further support our findings.
5.2 CCA results
- 1.
The generated statistical models of the LSH, SKLSH, PCA-ITQ, SPH and MSIDX similarity search strategies based on CCA were statistically significant according to Wilks’ \(\Lambda \) statistic [24]. Accordingly, we can reject the null hypothesis that there is no relationship between the two variable sets in all five models of LSH, SKLSH, PCA-ITQ, SPH and MSIDX. This result, which answers contribution C3, can be interpreted as follows: there is a linear correlation between the image descriptors’ DVCs and the performance of the similarity search strategies, expressed by the \(R_c\) canonical correlation coefficient between the \(\mathrm{DVC}_s\) and \(\mathrm{Perf}_s\) variates.
- 2.
In all five statistical models, the \(\mu _2,\mu _3,\mu _4\) variables are highly correlated to the \(\mathrm{DVC}_s\) and \(\mathrm{Perf}_s\) variates, denoted by \(r_{\mu _2}\), \(r_{\mu _3}\), \(r_{\mu _4}>0\) and \({rc}_{\mu _2-\mathrm{Perf}_{s}}\), \({rc}_{\mu _3-\mathrm{Perf}_{s}}\), \({rc}_{\mu _4-\mathrm{Perf}_{s}}>0\). This observation reflects the high contribution of \(\mu _2\), \(\mu _3\), \(\mu _4\) central moments to the overall performance of each similarity search strategy.
- 3.
The mAP and SF variables are oppositely correlated with the \(\mathrm{Perf}_s\) variate, denoted by \(r_\mathrm{mAP}>0\) and \(r_\mathrm{SF}<0\) for the hashing methods and \(r_\mathrm{mAP}<0\) and \(r_\mathrm{SF}>0\) for MSIDX, expressing the trade-off between SF and mAP of the similarity search strategies.
- 4.
For all five similarity search strategies, the \(|rc_{\mathrm{SF-DVC}_s}|\) canonical cross-loading is low, which means that there is a weak correlation between the SF variable and the \(\mathrm{DVC}_s\) variate. This happens because the choices of (a) #bits and #hash tables for the hashing methods and (b) the parameter \(w\) for MSIDX solely influence the performance of the similarity search strategies in terms of SF and are not affected by the image descriptors’ DVCs. However, since SF is strongly correlated with the mAP variable, it is important to retain SF in our analysis.
- 5.
For LSH, SKLSH, PCA-ITQ and SPH, it holds that \(rc_{\mathrm{mAP-DVC}_s}>0\), which means that the NN search accuracy (mAP) of the examined hashing methods is positively correlated with the DVC set of variables. This happens because the efficient encoding of image data into binary codes through hash functions depends on the high covariance, asymmetry and “peakedness” of the image descriptors’ DVCs.
- 6.
For MSIDX, the \(rc_{\mathrm{mAP-DVC}_s}=-0.476\) canonical cross-loading indicates that there is a negative correlation between MSIDX’s mAP and the DVC set of variables. Based on this observation, we conclude that in the MSIDX method mAP is increased when the DVCs of image descriptors have low covariance, asymmetry and “peakedness”, which can be achieved by a dataset of very high DVC values in all its dimensions. This happens because MSIDX does not perform any projection of the image descriptors’ dimensions into lower dimensional space, as the hashing methods do, and thus all dimensions contribute to the final performance.
Experimental results of CCA for evaluating the performance of LSH, SKLSH, PCA-ITQ, SPH and MSIDX on the six evaluation datasets
| | LSH | SKLSH | PCA-ITQ | SPH | MSIDX |
|---|---|---|---|---|---|
| Wilks’ \(\Lambda \) | 0.121 (\(p<0.001\)) | 0.018 (\(p<0.001\)) | 0.187 (\(p<0.001\)) | 0.822 (\(p<0.004\)) | 0.525 (\(p<0.001\)) |
| Can. Load. \(r\) (\(\mathrm{DVC}_s\)) | | | | | |
| \(r_{\mu _2}\) | 0.475 | 0.431 | 0.558 | 0.656 | 0.440 |
| \(r_{\mu _3}\) | 0.306 | 0.502 | 0.196 | 0.867 | 0.595 |
| \(r_{\mu _4}\) | 0.061 | 0.036 | 0.167 | 0.965 | 0.732 |
| Can. Cross-Load. \(rc\) (\(\mathrm{DVC}_s\)–\(\mathrm{Perf}_s\)) | | | | | |
| \({rc}_{\mu _2-\mathrm{Perf}_s}\) | 0.445 | 0.427 | 0.503 | 0.239 | 0.303 |
| \({rc}_{\mu _3-\mathrm{Perf}_s}\) | 0.286 | 0.497 | 0.177 | 0.316 | 0.410 |
| \({rc}_{\mu _4-\mathrm{Perf}_s}\) | 0.057 | 0.036 | 0.150 | 0.351 | 0.504 |
| Can. Load. \(r\) (\(\mathrm{Perf}_s\)) | | | | | |
| \(r_\mathrm{mAP}\) | 0.973 | 0.996 | 0.946 | 0.894 | \(-\)0.678 |
| \(r_\mathrm{SF}\) | \(-\)0.074 | \(-\)0.014 | \(-\)0.072 | \(-\)0.182 | 0.018 |
| Can. Cross-Load. \(rc\) (\(\mathrm{Perf}_s\)–\(\mathrm{DVC}_s\)) | | | | | |
| \(rc_{\mathrm{mAP-DVC}_s}\) | 0.911 | 0.987 | 0.852 | 0.325 | \(-\)0.476 |
| \({rc}_{\mathrm{SF-DVC}_s}\) | \(-\)0.069 | \(-\)0.014 | \(-\)0.065 | \(-\)0.066 | 0.012 |
| Can. Correlation Coefficient | | | | | |
| \(R_c\) | 0.936 | 0.991 | 0.900 | 0.364 | 0.688 |
5.3 Elimination of low-DVC dimensions
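The elimination protocol evaluated below can be sketched as follows; the fixed `fraction` is a simplified stand-in for the exact per-dataset selection, which in the experiments is tuned so that sequential search preserves over 90 % of its mAP:

```python
import numpy as np

def drop_low_dvc_dimensions(X, fraction=0.5):
    """Sketch of the elimination experiment: remove the given fraction of
    dimensions with the lowest DVC and keep the rest, so that similarity
    search runs on the reduced descriptors."""
    dvc = np.array([len(np.unique(X[:, j])) for j in range(X.shape[1])])
    n_drop = int(fraction * X.shape[1])
    keep = np.sort(np.argsort(dvc)[n_drop:])   # indices of the higher-DVC dims
    return X[:, keep], keep

X = np.array([[0, 5, 1, 7],
              [0, 3, 1, 2],
              [1, 4, 1, 9]])
X_red, kept = drop_low_dvc_dimensions(X, fraction=0.5)   # keeps dims 1 and 3
```

The mAPDif values in the tables below then report how much the retrieval accuracy changes when each method operates on the reduced descriptors instead of the full ones.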
mAPDif (%) for the hashing methods, after eliminating the number of low-DVC dimensions (per dataset) that preserves the mAP of sequential search above 90 % (dashed line in Fig. 6)
| | 8 bits (%) | 16 bits (%) | 32 bits (%) | 64 bits (%) | 128 bits (%) | 256 bits (%) | 512 bits (%) | 1,024 bits (%) |
|---|---|---|---|---|---|---|---|---|
| LSH | | | | | | | | |
| CIME 64d-240K | 0.76 | \(-\)0.51 | 0.68 | \(-\)1.19 | 0.38 | 0.73 | 1.31 | 0.65 |
| SIFT 128d-1M | 0.10 | \(-\)0.49 | \(-\)1.26 | 2.02 | 0.19 | 0.08 | 0.48 | 0.74 |
| CEDD 144d-240K | 0.30 | 1.11 | \(-\)0.29 | \(-\)0.05 | \(-\)1.27 | 0.41 | \(-\)0.25 | \(-\)0.03 |
| GIST 960d-1M | 0.07 | \(-\)0.13 | \(-\)0.14 | \(-\)0.34 | 0.16 | \(-\)0.29 | 0.16 | 0.39 |
| C-SIFT 1019d-700K | 0.14 | 0.30 | \(-\)0.13 | 0.10 | 0.67 | \(-\)0.11 | \(-\)0.09 | 0.40 |
| SURF 5000d-240K | 0.10 | \(-\)0.20 | 0.46 | \(-\)0.03 | 0.28 | 0.10 | \(-\)0.60 | \(-\)0.36 |
| SKLSH | | | | | | | | |
| CIME 64d-240K | 0.44 | \(-\)0.13 | \(-\)0.08 | \(-\)0.39 | \(-\)0.09 | 0.07 | 0.01 | \(-\)0.03 |
| SIFT 128d-1M | 0.04 | 0.00 | \(-\)0.07 | \(-\)0.03 | 0.00 | 0.01 | \(-\)0.03 | 0.05 |
| CEDD 144d-240K | 0.09 | 0.02 | \(-\)0.01 | 0.03 | 0.03 | \(-\)0.01 | 0.00 | \(-\)0.10 |
| GIST 960d-1M | 0.25 | \(-\)0.10 | 0.64 | 0.22 | \(-\)0.47 | \(-\)0.97 | \(-\)1.58 | \(-\)0.29 |
| C-SIFT 1019d-700K | 0.05 | 0.30 | \(-\)0.13 | 0.06 | 0.32 | 1.76 | \(-\)2.45 | 0.48 |
| SURF 5000d-240K | 0.16 | \(-\)0.08 | 0.15 | 0.02 | 0.04 | 0.23 | \(-\)0.21 | 0.47 |
| PCA-ITQ | | | | | | | | |
| CIME 64d-240K | 0.02 | 0.56 | 0.74 | | | | | |
| SIFT 128d-1M | \(-\)0.01 | 0.24 | 0.54 | \(-\)0.45 | | | | |
| CEDD 144d-240K | \(-\)0.34 | 0.73 | \(-\)0.39 | 0.06 | | | | |
| GIST 960d-1M | \(-\)0.10 | 0.11 | 0.04 | 0.19 | 0.51 | 0.72 | 1.03 | |
| C-SIFT 1019d-700K | \(-\)0.45 | \(-\)0.10 | 0.36 | 0.99 | 0.94 | 1.01 | 1.14 | |
| SURF 5000d-240K | 0.04 | \(-\)0.04 | \(-\)0.01 | \(-\)0.13 | 0.10 | 0.05 | \(-\)0.06 | 0.00 |
| SPH | | | | | | | | |
| CIME 64d-240K | \(-\)0.10 | \(-\)0.22 | 0.25 | 0.62 | \(-\)0.03 | 1.15 | 1.68 | 1.48 |
| SIFT 128d-1M | 0.19 | \(-\)0.29 | 0.78 | 0.25 | \(-\)0.15 | 1.39 | 0.67 | 0.80 |
| CEDD 144d-240K | \(-\)0.19 | 0.64 | \(-\)0.18 | 0.11 | \(-\)0.39 | 0.80 | 0.89 | 0.99 |
| GIST 960d-1M | 0.41 | 0.16 | 0.44 | 0.40 | 2.50 | 0.30 | 0.08 | 0.58 |
| C-SIFT 1019d-700K | 0.41 | \(-\)0.16 | \(-\)0.66 | 1.02 | 0.46 | 1.98 | 1.07 | 0.96 |
| SURF 5000d-240K | \(-\)0.35 | \(-\)0.99 | 0.18 | 0.14 | \(-\)0.54 | \(-\)0.70 | 0.34 | 0.05 |
mAPDif (%) for MSIDX, after eliminating the number of low-DVC dimensions (per dataset) that preserves the mAP of sequential search above 90 %
| | 2.5 W (%) | 5 W (%) | 7.5 W (%) | 10 W (%) | 12.5 W (%) | 15 W (%) | 17.5 W (%) | 20 W (%) | 22.5 W (%) | 25 W (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| CIME 64d-240K | \(-\)0.08 | 0.10 | 0.02 | 0.19 | 0.34 | 0.56 | 0.92 | 1.29 | 1.80 | 2.46 |
| SIFT 128d-1M | 0.01 | 0.02 | 0.06 | 0.17 | 0.34 | 0.65 | 1.15 | 1.79 | 2.56 | 3.34 |
| CEDD 144d-240K | 0.00 | 0.03 | 0.20 | 0.40 | 0.56 | 0.62 | 0.94 | 1.21 | 1.74 | 2.74 |
| GIST 960d-1M | 0.00 | 0.02 | 0.07 | 0.17 | 0.42 | 0.86 | 1.31 | 1.82 | 2.30 | 2.88 |
| C-SIFT 1019d-700K | 0.04 | 0.09 | 0.16 | 0.24 | 0.34 | 0.45 | 0.58 | 0.74 | 0.94 | 1.21 |
| SURF 5000d-240K | 0.44 | 0.49 | 0.51 | 0.51 | 0.56 | 0.73 | 0.90 | 1.14 | 1.41 | 1.74 |
mAPDif (%) for the hashing methods, after eliminating 50 % of the low-DVC dimensions
8 bits (%) | 16 bits (%) | 32 bits (%) | 64 bits (%) | 128 bits (%) | 256 bits (%) | 512 bits (%) | 1,024 bits (%) | |
---|---|---|---|---|---|---|---|---|
LSH | ||||||||
CIME 64d-240K | -0.07 | 1.77 | -0.89 | -0.14 | 0.50 | 3.72 | 5.26 | 6.64 |
SIFT 128d-1M | -0.91 | -0.53 | -1.00 | 2.58 | 4.52 | 9.44 | 14.94 | 19.37 |
CEDD 144d-240K | -0.74 | -0.48 | -0.94 | -2.13 | 0.25 | -0.02 | 1.60 | 2.40 |
GIST 960d-1M | 0.17 | -0.16 | 0.05 | 0.28 | 0.57 | 0.60 | 1.33 | 2.34 |
C-SIFT 1019d-700K | -0.18 | -0.39 | -0.43 | -0.18 | -0.35 | 0.32 | 0.69 | 1.09 |
SURF 5000d-240K | 0.02 | -0.22 | -0.36 | -0.44 | -0.68 | -0.90 | -0.99 | -0.82 |
SKLSH | ||||||||
CIME 64d-240K | -0.50 | 0.24 | 0.05 | 0.10 | 0.10 | 0.08 | 0.00 | -0.20 |
SIFT 128d-1M | 0.01 | 0.02 | 0.01 | 0.03 | 0.02 | -0.03 | 0.01 | 0.01 |
CEDD 144d-240K | -0.02 | -0.03 | -0.07 | -0.15 | -0.23 | -0.53 | -0.74 | -1.16 |
GIST 960d-1M | 1.96 | -0.16 | 0.39 | -0.65 | -0.93 | -1.02 | -1.61 | 0.83 |
C-SIFT 1019d-700K | 0.09 | 0.07 | 0.03 | -1.25 | -0.02 | 0.58 | 1.10 | 2.71 |
SURF 5000d-240K | -0.01 | 0.07 | -0.17 | 0.05 | -0.53 | 0.17 | -0.32 | -0.05 |
PCA-ITQ | ||||||||
CIME 64d-240K | 0.31 | 0.82 | 1.61 | |||||
SIFT 128d-1M | 0.51 | 1.40 | 3.81 | 6.02 | ||||
CEDD 144d-240K | -0.13 | 0.43 | 0.35 | 1.08 | ||||
GIST 960d-1M | -0.06 | 0.40 | 0.61 | 1.05 | 1.61 | 1.56 | ||
C-SIFT 1019d-700K | -0.24 | 0.51 | 0.74 | 1.15 | 2.97 | 2.88 | ||
SURF 5000d-240K | -0.06 | 0.02 | 0.22 | -0.03 | 0.64 | 0.88 | 0.77 | 2.35 |
SPH | ||||||||
CIME 64d-240K | -0.23 | 1.37 | 1.21 | 2.74 | 4.65 | 6.08 | 8.14 | 9.48 |
SIFT 128d-1M | -0.13 | 0.13 | 1.26 | 3.51 | 6.32 | 9.40 | 12.77 | 13.08 |
CEDD 144d-240K | 0.22 | 0.76 | 0.29 | 1.63 | 2.02 | 3.48 | 2.93 | 3.99 |
GIST 960d-1M | 0.12 | 0.17 | 0.07 | -0.20 | -0.22 | 0.00 | -0.40 | -0.59 |
C-SIFT 1019d-700K | 0.19 | 1.11 | 1.01 | 1.84 | 2.58 | 2.91 | 3.60 | 4.18 |
SURF 5000d-240K | 0.13 | 0.12 | 0.29 | 0.37 | -0.37 | -0.22 | 1.09 | 0.12 |
mAPDif (%) for MSIDX after eliminating 50 % of the low-DVC dimensions
2.5 W (%) | 5 W (%) | 7.5 W (%) | 10 W (%) | 12.5 W (%) | 15 W (%) | 17.5 W (%) | 20 W (%) | 22.5 W (%) | 25 W (%) | |
---|---|---|---|---|---|---|---|---|---|---|
CIME 64d-240K | 0.28 | 0.74 | 1.67 | 3.10 | 5.08 | 7.01 | 9.24 | 11.47 | 14.32 | 16.88 |
SIFT 128d-1M | 0.61 | 3.37 | 7.84 | 12.92 | 18.17 | 22.99 | 27.30 | 31.11 | 34.17 | 36.87 |
CEDD 144d-240K | 0.47 | 1.48 | 2.54 | 3.50 | 4.55 | 5.63 | 6.64 | 7.55 | 8.55 | 9.63 |
GIST 960d-1M | 0.07 | 0.56 | 1.62 | 2.94 | 4.50 | 6.10 | 7.38 | 8.43 | 9.79 | 11.00 |
C-SIFT 1019d-700K | 0.06 | 0.19 | 0.43 | 0.85 | 1.45 | 2.27 | 3.28 | 4.46 | 5.67 | 7.03 |
SURF 5000d-240K | 1.81 | 2.31 | 3.18 | 4.40 | 5.99 | 7.70 | 9.89 | 11.77 | 13.81 | 15.79 |
Finally, we calculated the Pearson correlation between DVC and variance and found them positively correlated (\(p<0.05\)) in all six evaluation datasets; that is, dimensions with high (low) DVC also have high (low) variance. To rule out the possibility that the low mAPDif values of the hashing methods in Table 5 (the case of eliminating 50 % of the low-DVC dimensions) were caused by the low variance of those dimensions rather than by their low DVC, we repeated the experiments on two additional synthetic datasets. In particular, to decorrelate DVC and variance, we generated two synthetic datasets of 100 K vectors with 512 (SYNTH 512d-100K) and 1,024 dimensions (SYNTH 1,024d-100K), keeping the variance similar across all dimensions while monotonically increasing the DVC values across dimensions, i.e. \(\forall \) dimensions \(i>j\), \(\mathrm{DVC}_i>\mathrm{DVC}_j\). The two synthetic datasets were generated as follows. The values in each dimension were limited to the range [0, 1]. An equal-width binning was then performed per dimension, with different dimensions having different numbers of bins and the number of bins of the \(i\)-th dimension equal to its DVC value. Finally, the values of the 100 K vectors were assigned to the bins (\(=\) DVC) of each dimension using a uniform distribution function, ensuring that all dimensions have similar variance but different DVCs. In doing so, DVC and variance were not correlated at the 0.05 level in either synthetic dataset. In both datasets the variance lay in the range [0.080–0.086], whereas the DVC lay in [73–584] for SYNTH 512d-100K and in [53–1,076] for SYNTH 1,024d-100K.
On both synthetic datasets we repeated the experiment of Table 5, eliminating 50 % of the low-DVC dimensions, and observed that mAP was preserved, i.e. the mAPDif values were small and similar to those reported in Table 5. For instance, for LSH with {8, 16, 32, 64, 128, 256, 512, 1,024} bits, the mAPDif values were (-0.06, 0.03, 0.05, 0.11, 0.40, 1.05, 2.85, 6.32 %) for SYNTH 512d-100K and (-0.02, 0.03, 0.03, -0.05, 0.18, 0.30, 1.05, 2.61 %) for SYNTH 1,024d-100K. We therefore conclude that, even when DVC and variance are decorrelated, low-DVC dimensions contribute little to the overall performance of the hashing methods.
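A minimal sketch of this generation procedure (scaled down for illustration; the exact dataset sizes and DVC ranges of the paper are not reproduced, and NumPy is assumed):

```python
import numpy as np

def make_synthetic(n_vectors, n_dims, min_bins, max_bins, seed=0):
    """Generate vectors in [0, 1] whose per-dimension DVC increases
    monotonically while the per-dimension variance stays roughly equal."""
    rng = np.random.default_rng(seed)
    bins = np.linspace(min_bins, max_bins, n_dims).astype(int)  # DVC targets
    data = np.empty((n_vectors, n_dims))
    for i, b in enumerate(bins):
        # Equal-width bin centres in [0, 1]; drawing them uniformly keeps
        # the variance near that of U(0, 1), i.e. 1/12 ~ 0.083, for every b.
        centres = (np.arange(b) + 0.5) / b
        data[:, i] = rng.choice(centres, size=n_vectors)
    return data

def dvc(data):
    """Dimensions' value cardinality: number of distinct values per dimension."""
    return np.array([len(np.unique(data[:, i])) for i in range(data.shape[1])])

# Scaled-down illustration (the paper used 100 K vectors and 512/1,024 dims).
X = make_synthetic(n_vectors=10_000, n_dims=64, min_bins=50, max_bins=600)
print(dvc(X)[0], dvc(X)[-1])                     # DVC grows with the index
print(X.var(axis=0).min(), X.var(axis=0).max())  # variances stay near 1/12
```

With enough vectors, every bin of every dimension is almost surely hit, so the realised DVC of dimension \(i\) matches its bin count while all variances remain close to 1/12.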
- 1.
Storage: Current multimedia similarity search systems are required to store and handle huge volumes of content. Eliminating e.g. 50 % of the descriptors' dimensions (see Tables 3, 4, 5, 6) may reduce the dataset sizes by up to 50 % and thus increase the capacity of the system.
- 2.
Preprocessing: Preprocessing high volumes of multi-dimensional data requires extremely high processing power. Moreover, when similarity search methods must be retrained periodically (e.g. because the dataset has grown significantly), the preprocessing time becomes prohibitive. In our experiments, we observed that eliminating 50 % of the low-DVC SURF dimensions reduced SPH's preprocessing time to approximately 50 % of the cost that SPH required for the initial SURF dataset. This happens because eliminating low-DVC dimensions yields smaller datasets and thus avoids unnecessary computations.
- 3.
Speedup: In the case of hashing methods, the query-time SF is preserved, since the size of the binary codes is maintained. For MSIDX, however, this is not the case: MSIDX performs distance computations among the \(2\times w\) image descriptors, so reducing the descriptors' dimensionality significantly decreases the computation time and thus increases the SF.
5.4 Energy-based study of DVC
The results of Fig. 8 explain the outcomes of our previous experiments, since the higher the energy, the more information is preserved. An interesting observation is that the curves for randomly eliminated dimensions always fall between the low-DVC and high-DVC curves, with the high-DVC elimination curves preserving the least energy. One may interpret this as follows: by removing the high-DVC dimensions from the dataset, the most informative features (dimensions) are lost.^{6}
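The behaviour of Fig. 8 can be reproduced in miniature on a toy dataset. Here "energy" is taken to be the fraction of the total sum of squared values retained by the surviving dimensions, which is an assumption about the paper's definition, and the data generator (where both the DVC and the value range grow with the dimension index, mimicking the positive DVC–variance correlation observed in the real datasets) is purely illustrative:

```python
import numpy as np

def retained_energy(data, keep_idx):
    """Fraction of the dataset's total energy (sum of squared values)
    that the dimensions in keep_idx preserve."""
    return np.sum(data[:, keep_idx] ** 2) / np.sum(data ** 2)

rng = np.random.default_rng(1)
n, d = 5_000, 100
levels = np.linspace(4, 256, d).astype(int)  # quantisation levels (the DVC)
scale = np.linspace(0.2, 1.0, d)             # value range grows with the DVC
X = np.empty((n, d))
for i in range(d):
    X[:, i] = np.floor(rng.random(n) * levels[i]) / levels[i] * scale[i]

dvc = [len(np.unique(X[:, i])) for i in range(d)]
order = np.argsort(dvc)                      # lowest-DVC dimensions first
half = d // 2
low_cut = retained_energy(X, order[half:])   # drop 50 % lowest-DVC dims
high_cut = retained_energy(X, order[:half])  # drop 50 % highest-DVC dims
rand_cut = retained_energy(X, rng.choice(d, half, replace=False))
print(low_cut, rand_cut, high_cut)  # low-DVC elimination keeps the most energy
```

Under these assumptions, eliminating the low-DVC half retains the most energy, random elimination falls in between, and eliminating the high-DVC half retains the least, matching the ordering of the curves in Fig. 8.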
6 Conclusions and discussion
Our aim in this paper was to introduce DVC to the multimedia community and to motivate researchers to consider DVC in the design and evaluation of large-scale similarity search strategies, due to the following highly desirable characteristics: (a) descriptor extraction methods tend to produce DVC distributions from the same distribution family irrespective of the datasets' sizes, so similarity search strategies that exploit DVC can scale; (b) as our CCA experimentally showed, the DVCs of image descriptors have a strong impact on similarity search strategies; and (c) eliminating the low-DVC dimensions of a descriptor vector has only a minor impact on the mAP performance of similarity search strategies.
6.1 A practical guide
As a general, practical guide for interested researchers, we recommend the following steps. First, a training set or the full multi-dimensional dataset is required to compute the DVC of each dimension, as explained in Sect. 2.2. The output of this first stage is a vector of the same length as the descriptor vectors. Next, the entries of the DVC vector should be sorted in descending order, and the sorting index can then be used as a priority index for the dimensions.
Then, since most similarity search techniques follow an approximate approach, the dimensions with high priority, i.e. high DVC values, should either be weighted more heavily or quantized with extra bits; in general, they should preserve more information than the dimensions with low DVC values do. The appropriate level of weighting between high-DVC and low-DVC dimensions is an open research issue that depends on the specific application.
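The first stage of this guide, computing the per-dimension DVCs and turning them into a priority index, can be sketched as follows (a minimal NumPy illustration; the function name is ours, not from the paper):

```python
import numpy as np

def dvc_priority(descriptors):
    """Compute the DVC of each dimension and return the dimensions sorted
    by descending DVC, i.e. a priority index (most discriminative first)."""
    dvc = np.array([len(np.unique(descriptors[:, i]))
                    for i in range(descriptors.shape[1])])
    priority = np.argsort(-dvc)  # high-DVC dimensions first
    return dvc, priority

# Toy example: dimension 1 has the most distinct values, dimension 2 the fewest.
X = np.array([[0.0, 0.1, 1.0],
              [0.0, 0.2, 1.0],
              [0.5, 0.3, 1.0],
              [0.5, 0.4, 1.0]])
dvc, priority = dvc_priority(X)
print(dvc)       # [2 4 1]
print(priority)  # [1 0 2]
```

The priority index can then drive whichever downstream choice fits the method at hand, e.g. keeping only the first half of the dimensions in `priority`, or allocating more quantization bits to them.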
6.2 The use of DVC in other applications of image descriptors
Apart from similarity search, image descriptors are used in a wide range of applications, such as image clustering [1, 9, 26, 42], image annotation [22, 29, 44], image registration [7, 11, 33] and object recognition [6, 35, 45]. Examining the impact of DVCs on such applications requires further in-depth analysis; here, we outline some key considerations for future research.
All of the aforementioned applications use a distance or similarity metric to determine whether two or more image descriptors represent the same object, belong to the same cluster or are otherwise related. As such, the DVC characteristics of the selected image descriptor, and specifically of each of its dimensions, contribute to the measured distances and thus to the overall performance of the application. The contribution of each dimension to the final performance depends on the processing steps that each method follows.
Applications of image descriptors may be classified as approximate or exact. For applications that compute exact distances on the descriptors, without any indexing technique or other approach that prunes the search space, knowledge of the DVC characteristics offers no direct benefit. On the other hand, for applications that follow approximate approaches, such as indexing with hashing methods, spectral clustering or vector quantization, exploiting DVCs may have clear benefits. The influence of DVCs on such applications is therefore worth examining.
6.3 Future work
Apart from the hashing methods and MSIDX, many other strategies have been proposed in the literature for efficient large-scale similarity search in image databases:
Vantage indexing for large-scale similarity search, such as the work of [37], aims to increase the SF by selecting a small set of reference/vantage multimedia objects, such as images, against which the remaining objects are compared in order to retrieve the \(k\) most similar results. By avoiding the all-to-all comparison, the search space is pruned, which increases the SF. To achieve high mAP, the key idea is the accurate definition of criteria for assessing the quality of the selected vantage objects. However, these criteria have not yet considered the DVCs of image descriptors.
Dimensionality reduction methods, such as the work of [18], map the original data into a much lower dimensional subspace; an index can then be built on the subspace to further facilitate image similarity search. The main idea is to transform data from a high-dimensional space into a lower dimensional one without losing much information. Many dimensionality reduction methods have been proposed, both global and local. Global methods map the dataset as a whole down to a suitable lower dimensional subspace. Local methods first divide the whole dataset into correlated clusters, each of which is then reduced to its own subspace by classical PCA or other methods. These methods reduce the image descriptors' dimensionality by removing insignificant dimensions; since the reduced descriptors do not preserve the complete information, nearest neighbor search accuracy may be compromised. The demand for high mAP often limits the number of dimensions that can be removed, thus limiting the achievable performance gain. A possible future direction for dimensionality reduction methods is to consider the DVCs of image descriptors, assuming that high-DVC dimensions have more discriminative power and thus contain more valuable information.
Data co-reduction methods, such as the work of [17], achieve simultaneous data reduction in both dataset size and image descriptor dimensionality. This is possible by assuming that a subset of dimensions may have very close values for a subset of image descriptors and, similarly, that a subset of image descriptors may have very similar values along their dimensions. However, data co-reduction methods still omit the impact of the image descriptors' DVCs.
We hope that our analysis will help researchers in the design of large-scale similarity search strategies in image and other multimedia databases, as well as in other applications of image descriptors.
Footnotes
- 1.
- 2.
- 3.
- 4.
In the PCA-ITQ method, due to PCA's eigen-decomposition, we also enforced the condition #bits \(< d\), where \(d\) is the dimensionality of each evaluation dataset.
- 5.
The first central moment \(\mu _1\), taken about the mean \(\mu \), is discarded in our analysis because by definition it is always equal to 0; thus, based on Wilks' \(\Lambda \) statistic [24], \(\mu _1\) generates a statistically insignificant CCA model for the examined methods.
- 6.
Acknowledgments
This work was partially supported by the EC FP7 funded project CUBRIK, ICT- 287704 (http://www.cubrikproject.eu).
References
- 1. Agrawal R, Wu C, Grosky WI, Fotouhi F (2007) Image clustering using visual and text keywords. In: International symposium on computational intelligence in robotics and automation (CIRA 2007), pp 49–54
- 2. Bauer C, Radhakrishnan R, Jiang W (2010) Optimal configuration of hash table based multimedia fingerprint databases using weak bits. In: Proceedings of IEEE international conference on multimedia and expo (ICME), pp 1672–1677
- 3. Bay H, Ess A, Tuytelaars T, Van Gool L (2008) SURF: speeded up robust features. Comput Vis Image Underst (CVIU) 110(3):346–359
- 4. Chatzichristofis SA, Boutalis YS (2008) CEDD: color and edge directivity descriptor: a compact descriptor for image indexing and retrieval. In: ICVS, vol 5008 of Lecture Notes in Computer Science, Springer, pp 312–322
- 5. Daintith J, Wright E (2008) Hamming space. In: A dictionary of computing. Oxford University Press. Retrieved 30 Oct 2014, from http://www.oxfordreference.com/view/10.1093/acref/9780199234004.001.0001/acref-9780199234004-e-2303
- 6. Due Trier Ø, Jain AK, Taxt T (1996) Feature extraction methods for character recognition–a survey. Pattern Recogn 29(4):641–662
- 7. Fan B, Wu F, Hu Z (2012) Rotationally invariant descriptors using intensity order pooling. IEEE Trans Pattern Anal Mach Intell 34(10):2031–2045
- 8. Gionis A, Indyk P, Motwani R (1999) Similarity search in high dimensions via hashing. In: Proceedings of international conference on very large data bases (VLDB), pp 518–529
- 9. Goldberger J, Gordon S, Greenspan H (2006) Unsupervised image-set clustering using an information theoretic framework. IEEE Trans Image Process 15(2):449–458
- 10. Gong Y, Lazebnik S, Gordo A, Perronnin F (2013) Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval. IEEE Trans PAMI 35(12):2916–2929
- 11. Griffith EJ, Yuan C, Jump M, Ralph JF (2013) Equivalence of BRISK descriptors for the registration of variable bit-depth aerial imagery. In: IEEE international conference on systems, man, and cybernetics (SMC), pp 2587–2592
- 12. Heo JP, Lee Y, He J, Chang S, Yoon S (2012) Spherical hashing. In: Proceedings of CVPR, pp 2957–2964
- 13. He J, Radhakrishnan R, Chang S-F, Bauer C (2011) Compact hashing with joint optimization of search accuracy and time. In: Proceedings of CVPR, pp 753–760
- 14. Hotelling H (1936) Relations between two sets of variates. Biometrika 28:321–377
- 15.
- 16.
- 17. Huang Z, Shen HT, Liu J, Zhou X (2011) Effective data co-reduction for multimedia similarity search. In: Proceedings of ACM SIGMOD, pp 1021–1032
- 18. Huang Z, Shen HT, Shao J, Ruger SM, Zhou X (2008) Locality condensation: a new dimensionality reduction method for image retrieval. In: Proceedings of ACM Multimedia, pp 219–228
- 19. Jegou H, Douze M, Schmid C (2011) Product quantization for nearest neighbor search. IEEE Trans PAMI 33(1):117–128
- 20. Joly A, Buisson O (2011) Random maximum margin hashing. In: Proceedings of CVPR, pp 873–880
- 21. Lai PL, Fyfe C (2000) Kernel and nonlinear canonical correlation analysis. Int J Neural Syst 10(5):365–377
- 22. Liu C, Yuen J, Torralba A (2009) Nonparametric scene parsing: label transfer via dense scene alignment. In: Proceedings of CVPR, pp 1972–1979
- 23. Lowe D (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60:91–110
- 24. Mardia KV, Kent JT, Bibby JM (1979) Multivariate analysis. Academic Press
- 25. Massey FJ (1951) The Kolmogorov–Smirnov test for goodness of fit. J Am Stat Assoc 46(253):68–78
- 26. Ng AY, Jordan MI, Weiss Y (2002) On spectral clustering: analysis and an algorithm. In: Proceedings of NIPS
- 27. Oliva A, Torralba A (2001) Modeling the shape of the scene: a holistic representation of the spatial envelope. Int J Comput Vis 42(3):145–175
- 28. Raginsky M, Lazebnik S (2009) Locality-sensitive binary codes from shift-invariant kernels. In: Proceedings of NIPS, pp 1509–1517
- 29. Russell BC, Torralba A, Liu C, Fergus R, Freeman WT (2007) Object recognition by scene alignment. In: NIPS
- 30. sglab.kaist.ac.kr_Hashing/
- 31. Song J, Yang Y, Huang Z, Shen H-T, Hong R (2011) Multiple feature hashing for real-time large scale near-duplicate video retrieval. In: Proceedings of ACM Multimedia (MM '11), pp 423–432
- 32. Stehling RO, Nascimento MA, Falcao AX (2002) A compact and efficient image retrieval approach based on border/interior pixel classification. In: Proceedings of CIKM
- 33. Szeliski R (2006) Image alignment and stitching: a tutorial. Found Trends Comput Graph Comput Vis 2(1)
- 34. Tiakas E, Rafailidis D, Dimou A, Daras P (2013) MSIDX: multi-sort indexing for efficient content-based image search and retrieval. IEEE Trans Multimed 15(6):1415–1430
- 35. Uijlings JRR, van de Sande KEA, Gevers T, Smeulders AWM (2013) Selective search for object recognition. Int J Comput Vis 104(2):154–171
- 36. Van De Sande KEA, Gevers T, Snoek CGM (2010) Evaluating color descriptors for object and scene recognition. IEEE Trans PAMI 32(9):1582–1596
- 37. Van Leuken RH, Veltkamp RC (2011) Selecting vantage objects for similarity indexing. ACM TOMCCAP 7(3):16
- 38. Wang J, Kumar S, Chang S-F (2010) Semi-supervised hashing for scalable image retrieval. In: Proceedings of CVPR, pp 3424–3431
- 39. Weiss Y, Torralba A, Fergus R (2008) Spectral hashing. In: Proceedings of NIPS, pp 1753–1760
- 40. Yan J, Liu N, Yan S, Yang Q, Fan W, Wei W, Chen Z (2011) Trace-oriented feature analysis for large-scale text data dimension reduction. IEEE Trans Knowl Data Eng 23(7):1103–1117
- 41. Yang J, Jiang YG, Hauptmann AG, Ngo CW (2007) Evaluating bag-of-visual-words representations in scene classification. In: Proceedings of ACM MIR, pp 197–206
- 42. Yan D, Huang L, Jordan MI (2009) Fast approximate spectral clustering. In: Proceedings of ACM SIGKDD international conference on knowledge discovery and data mining (KDD '09), pp 907–916
- 43. Yan J, Liu N, Zhang B, Yan S, Chen Z, Cheng Q, Fan W, Ma W-Y (2005) OCFS: optimal orthogonal centroid feature selection for text categorization. In: Proceedings of ACM SIGIR '05, pp 122–129
- 44. Zhang D, Islam MM, Lu G (2012) A review on automatic image annotation techniques. Pattern Recogn 45(1):346–362. http://dx.doi.org/10.1016/j.patcog.2011.05.013
- 45. Zitová B, Flusser J (2003) Image registration methods: a survey. Image Vis Comput 21(11):977–1000