Intrinsic spatial pyramid matching for deformable 3D shape retrieval
Abstract
In this paper, we present an intrinsic spatial pyramid matching approach for 3D shape retrieval. Motivated by the fact that the second eigenfunction of the Laplace–Beltrami operator not only captures the global topological structure but is also intrinsic, we propose to adopt its level sets as cuts to perform surface partition. The resulting matching scheme is able to consistently estimate the approximate global geometric correspondence among 3D shapes. In particular, we can leverage recent developments in intrinsic shape analysis and perform intrinsic spatial pyramid matching based on dense spectral shape descriptors such as the scale-invariant heat kernel signature. Our experiments demonstrate a significant improvement of 3D shape retrieval on two standard benchmarks.
Keywords
Intrinsic partition, Diffusion geometry, Eigenfunction, Shape retrieval
1 Introduction
State-of-the-art image recognition algorithms usually adopt a local patch based, multiple-layer pipeline to obtain a good representation. These methods start from local image patches using either normalized raw pixel density or descriptors such as the scale-invariant feature transform (SIFT) [1] or the histogram of oriented gradients (HOG) [2], and encode them into an overcomplete representation using various algorithms such as \(k\)-means or sparse coding. After coding, global image representations are formed by spatially pooling the coded local descriptors. The methods following such a pipeline have achieved competitive performance on image classification tasks [3]. During the whole procedure, the spatial pooling step brings a substantial performance improvement. One significant milestone in the construction of this arsenal of tools is the spatial pyramid matching (SPM) introduced in [4], which partitions the image into increasingly fine subregions and then computes histograms of local features found inside each subregion. The empirical success of this technique stems from the fact that the spatial cue is integrated, and an approximate geometric matching is actually performed when multiple resolutions are combined in a principled way.
The codebook model, as a simplified version of such a pipeline without spatial pooling, has also been considered for 3D shapes. The basic idea of using a codebook to represent a shape as histograms of occurrences of visual words is commonly referred to in the literature as the Bag-of-Words (BoW) or Bag-of-Features (BoF) approach. Several authors have introduced such BoF approaches for 3D shape retrieval. Indeed, early research has mainly dealt with the global Euclidean transformations (rigid motion) [5] and multiple views [6]. By defining the visual words on the segmented shape regions, Toldo et al. [7] obtained encouraging shape categorization and retrieval results. Darom et al. [8] achieved state-of-the-art retrieval accuracy by designing local vertex-wise features, which are robust to scale changes and partial mesh matching. The codebook model has been shown to be a promising method for partial shape retrieval [8, 9]. Recent efforts have also focused on finding the deformation invariance for non-rigid shapes by replacing the Euclidean metric with its geodesic counterpart [10]. The geodesic distance, however, suffers from strong sensitivity to topological noise, which limits its usefulness in real applications.
This problem is well handled by the tools from the emerging field of diffusion geometry, which provides a generic framework for many intrinsic methods in the analysis of geometric shapes. Diffusion geometry formulates the heat diffusion processes on manifolds. Coifman and Lafon [11] introduced invariant metrics known as diffusion distances, which correspond to the \(L_{2}\)-norm difference of energy distribution between two points initialized with unit impulse functions after a given time. The diffusion distance is more robust to topological noise than the geodesic one. Reuter et al. [12] adopted the eigenvalues of the Laplace–Beltrami (LB) operator to construct a global shape descriptor, called ShapeDNA. Based on the theoretical works in [13], Lévy [14] showed that the eigenfunctions of the LB operator can be well adapted to the geometry and the topology of an object. Later, several spectral descriptors were proposed to characterize the geometric features of a 3D surface [15, 16, 17]. By aggregating these spectral descriptors, the Shape Google algorithm [18, 19] was proposed as a classical method for deformable shape retrieval. It uses the multiscale diffusion heat kernels as “geometric words”, and constructs a compact and informative shape representation by means of the codebook approach.
More recently, there have been several attempts to adapt 2D planar shape contexts [20], popular image feature detectors [21] and descriptors [22], to 3D surfaces. This line of work partially inspires our proposed approach. Another inspiration is due in part to the great success of SPM in the image domain. Spatially enhanced techniques for 3D shape recognition were explored earlier in [23, 24], but these works are not intrinsic, i.e., shape deformations affect the descriptors. “Geometric expressions” [18] was an earlier work that exploited intrinsic geometry, but the authors only dealt with the local relative spatial position, by considering the diffusion distance between pairwise vertices. Our approach, on the other hand, models the global absolute spatial positions, which allows us to retain and exploit the information contained in the whole 3D shape.
Our contributions are threefold: (1) we propose to adopt the second eigenfunction of the LB operator in a bid to construct a global surface coordinate system, which is insensitive to shape deformation, (2) we develop a proper generalization of the SPM for surfaces and show a numerical way to construct it, and (3) we experimentally demonstrate that introducing the global spatial context significantly improves the discriminative power of the descriptor in 3D matching and retrieval.
The rest of this paper is organized as follows. Section 2 provides a brief background on the LB operator, its discretization and eigenanalysis, followed by the codebook model. In Sect. 3, we propose the intrinsic spatial pyramid matching (ISPM) approach. Experimental results on two 3D datasets are presented in Sect. 4. Finally, we conclude and point out future work directions in Sect. 5.
2 Background
2.1 Laplace–Beltrami operator
2.2 Discretization
Assume that the surface \(\mathbb{M }\) is approximated by a triangular mesh. A triangle mesh \(\mathbb{M }\) may be defined as \(\mathbb{M }=(\mathcal{V },\mathcal{E })\) or \(\mathbb{M }=(\mathcal{V },\mathcal{T })\), where \(\mathcal{V }=\{{\varvec{v}}_{1},\ldots ,{\varvec{v}}_{m}\}\) is the set of vertices, \(\mathcal{E }=\{e_{ij}\}\) is the set of edges, and \(\mathcal{T }=\{{\varvec{t}}_{1},\ldots ,{\varvec{t}}_{n}\}\) is the set of triangles. Each edge \(e_{ij}\) (denoted by \([{\varvec{v}}_{i},{\varvec{v}}_{j}]\) or simply \([i,j]\)) connects a pair of vertices \(\{{\varvec{v}}_{i},{\varvec{v}}_{j}\}\). Two distinct vertices \({\varvec{v}}_{i},{\varvec{v}}_{j}\in \mathcal{V }\) are adjacent (denoted by \({\varvec{v}}_{i}\sim {\varvec{v}}_{j}\) or simply \(i\sim j\)) if they are connected by an edge, i.e., \(e_{ij}\in \mathcal{E }\). The neighborhood (1-ring) of a vertex \({\varvec{v}}_{i}\) is the set \({\varvec{v}}_{i}^{\star }=\{{\varvec{v}}_{j}\in \mathcal{V }: i\sim j\}\).
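As an illustration of the definitions above, the following minimal sketch recovers the 1-ring neighborhoods from a triangle list. The function name `one_ring_neighbors` and the toy mesh are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def one_ring_neighbors(triangles):
    """Map each vertex index i to the set {j : i ~ j} of adjacent vertices."""
    neighbors = defaultdict(set)
    for i, j, k in triangles:
        # Each triangle contributes its three edges [i, j], [j, k], [k, i].
        for a, b in ((i, j), (j, k), (k, i)):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


# Toy mesh (an assumption for illustration): two triangles sharing the edge [1, 2].
triangles = [(0, 1, 2), (1, 3, 2)]
rings = one_ring_neighbors(triangles)
```

Here vertices 0 and 3 are not adjacent, since no triangle contains both of them.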
2.3 Eigenanalysis and spectral signatures
Based on the obtained eigenfunctions and eigenvalues, several spectral signatures have been proposed in the literature to describe a single vertex on a surface. Sun et al. [15] introduced the heat kernel signature (HKS) based on the fundamental solution of the heat equation (heat kernel). Its scale-invariant version (SIHKS) was developed in [17]. Another physically inspired descriptor is the wave kernel signature (WKS), which was proposed in [16]. Unlike the HKS, the WKS separates influences of different frequencies, treating all frequencies equally. These descriptors have been shown to achieve an excellent performance in 3D shape analysis and recognition.
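For concreteness, the HKS of [15] at a vertex \(x\) and time \(t\) is commonly written as \(\sum _{i} e^{-\lambda _{i} t}\varphi _{i}(x)^{2}\). The sketch below evaluates this sum from a truncated set of eigenpairs. It assumes the eigenvalues and eigenvectors have already been computed, applies no further normalization, and uses the hypothetical function name `heat_kernel_signature`.

```python
import numpy as np


def heat_kernel_signature(eigenvalues, eigenvectors, times):
    """Return an (m, T) array holding one HKS row per vertex.

    eigenvalues:  (n,) LB eigenvalues lambda_i
    eigenvectors: (m, n) eigenfunctions phi_i sampled at the m vertices
    times:        (T,) diffusion times t
    """
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    eigenvectors = np.asarray(eigenvectors, dtype=float)
    times = np.asarray(times, dtype=float)
    # decay[i, s] = exp(-lambda_i * t_s)
    decay = np.exp(-np.outer(eigenvalues, times))
    # HKS(x, t_s) = sum_i exp(-lambda_i * t_s) * phi_i(x)^2
    return (eigenvectors ** 2) @ decay


# Toy eigenpairs (an assumption for illustration, not taken from a real mesh).
evals = np.array([0.0, 1.0])
evecs = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
hks = heat_kernel_signature(evals, evecs, times=[0.0, 1.0])
```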
2.4 Bag-of-features model
Given a set of local pointwise signatures densely computed on each vertex on the mesh surface, we quantize the signature space to obtain a compact histogram representation of the shape using the codebook model approach. The geometric word vocabulary in the codebook model may be constructed in various ways, e.g., by approximate \(k\)-means [30] or hierarchical \(k\)-means [31]. We use the simple \(k\)-means method, which is also used in the Shape Google algorithm [19]. Thus, the “geometric words” of a vocabulary \(P=\{{\varvec{p}}_k, k = 1, 2,\ldots ,K\}\) are obtained as the \(K\) centroids of \(k\)-means clustering in the signature space. From any shape, a specific type of local spectral descriptor \(S=\{{\varvec{s}}_t, t=1,2,\ldots , T \}\) is used for comparison. Each local descriptor \({\varvec{s}}_t\) (represented as a vector) is associated with its nearest geometric word \( NN ({\varvec{s}}_t)\) in the codebook. By a certain vector coding technique, such as hard counting or ambiguity modeling, each shape will be described by a histogram \(H\). Since the number of vertices usually differs among meshed shapes, an appropriate normalization technique is essential for the codeword-cumulative histogram representation.
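The hard-counting variant described above can be sketched as follows. The choice of dividing by the number of descriptors (an \(L_1\) normalization) is an assumption made here to remove the dependence on vertex count, and `bof_histogram` is a hypothetical name.

```python
import numpy as np


def bof_histogram(descriptors, vocabulary):
    """Hard-count BoF histogram of a shape.

    descriptors: (T, d) local signatures s_t
    vocabulary:  (K, d) geometric words p_k
    """
    descriptors = np.asarray(descriptors, dtype=float)
    vocabulary = np.asarray(vocabulary, dtype=float)
    # Squared Euclidean distance from every descriptor to every word, shape (T, K).
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # index of NN(s_t) for each descriptor
    counts = np.bincount(nearest, minlength=len(vocabulary)).astype(float)
    # Assumed normalization: divide by the number of descriptors.
    return counts / counts.sum()


# Toy data (assumption): four 2D descriptors and a three-word vocabulary.
S = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]]
P = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
H = bof_histogram(S, P)
```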
3 Intrinsic spatial pyramid matching
3.1 Isocontours
The eigenvalues and eigenfunctions have a nice physical interpretation: the square roots of the eigenvalues \(\sqrt{\lambda _{i}}\) are the eigenfrequencies of the membrane and \(\varphi _{i}(x)\) are the corresponding amplitudes at \(x\). In particular, the second eigenvalue \(\lambda _2\) corresponds to the sound we hear the best [33]. On the other hand, Uhlenbeck [34] showed that the eigenfunctions of the LB operator are Morse functions on the interior of the domain of the operator. Consequently, this generic property of the eigenfunctions gives rise to the construction of the associated intrinsic isocurves.
3.2 Intrinsic spatial partition
The level sets of the second eigenfunction have been previously used to extract curve skeletons of nonrigid shapes [35], which is a vivid clue that these isocontours capture the global topological structure of shapes.
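A minimal sketch of how level sets of the second eigenfunction could cut a surface into bands is given below. It assumes equally spaced cut values over the range of the eigenfunction and \(2^{L}\) bands at level \(L\), which matches the partition counts reported in the experiments but is not necessarily the exact construction used. The name `intrinsic_partition` is hypothetical.

```python
import numpy as np


def intrinsic_partition(phi2, level):
    """Assign each vertex to one of 2**level bands of the second eigenfunction.

    phi2:  (m,) values of the second LB eigenfunction at the vertices
    level: pyramid level L; level 0 keeps the whole surface as one band
    """
    phi2 = np.asarray(phi2, dtype=float)
    n_bands = 2 ** level
    # Interior cut values (assumed equally spaced between min and max).
    cuts = np.linspace(phi2.min(), phi2.max(), n_bands + 1)[1:-1]
    # Band index = number of cut values that are <= phi2 at the vertex.
    return np.searchsorted(cuts, phi2, side="right")


# Toy eigenfunction values (assumption for illustration).
phi2 = [-1.0, -0.4, 0.1, 0.6, 1.0]
bands_l1 = intrinsic_partition(phi2, level=1)
bands_l2 = intrinsic_partition(phi2, level=2)
```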
3.3 Matching by intrinsic spatial partition
4 Experimental results
The performance of our proposed intrinsic spatial pyramid was evaluated on two datasets, namely the SHREC 2011 Benchmark [36] and the TOSCA-based robust shape retrieval database used in [19]. The first dataset is used to validate the discriminative power of ISPM between different shape categories, and the second one is used to test the robustness of ISPM.
4.1 SHREC 2011 database
We performed ISPM based on HKS and SIHKS dense descriptors. These descriptors showed excellent performance with the codebook model in the Shape Google algorithm. The first 150 eigenvalues and eigenvectors of the LB operator on each shape are used. We experimentally select the best parameters for HKS and SIHKS on the SHREC 2011 dataset. For HKS, we formulate the diffusion time as \(t = t_0 \alpha ^{\tau }\), where \(\tau \) is sampled from 0 to a given scale \(T\) with a resolution of \(1/4\). \(T = 5\), \(t_0 = 0.01\) and \(\alpha = 4\) are set in our case. In order to construct the SIHKS, we use \(t = \alpha ^{\tau }\), where \(\tau \) ranges from 1 to a given scale \(T\) with finer increments of \(1/16\). \(T = 25\) and \(\alpha = 2\) are chosen. After applying the logarithm, derivative and Fourier transform, all the frequencies are used to obtain the best result.
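The sampling of diffusion times described above can be reproduced with a few lines. The helper `sample_times` is a hypothetical name, and both endpoints of the \(\tau \) range are assumed to be included.

```python
def sample_times(t0, alpha, tau_start, tau_end, step):
    """Return t = t0 * alpha**tau for tau = tau_start, tau_start + step, ..., tau_end."""
    n = round((tau_end - tau_start) / step) + 1  # number of samples, endpoints included
    return [t0 * alpha ** (tau_start + i * step) for i in range(n)]


# HKS: t = t0 * alpha**tau with t0 = 0.01, alpha = 4, tau in [0, 5], step 1/4.
hks_times = sample_times(t0=0.01, alpha=4.0, tau_start=0.0, tau_end=5.0, step=0.25)

# SIHKS: t = alpha**tau with alpha = 2, tau in [1, 25], step 1/16.
sihks_times = sample_times(t0=1.0, alpha=2.0, tau_start=1.0, tau_end=25.0, step=1 / 16)
```

Under these assumptions the two lists contain 21 and 385 samples, which agrees with the HKS and SIHKS dimensions quoted for the vocabulary computation.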
The computation of the vocabulary is performed offline in advance. To reduce the risk of a poor local optimum, the clustering is repeated three times, each with a new set of initial cluster centroid positions. The solution with the lowest value for the sum of distances is returned. The running time depends on the number of the descriptors (number of vertices), the dimension of the descriptor, and the vocabulary size (the number of clusters). Since we simplify our mesh to 2,000 faces for each shape, we have a set of approximately \(6\times 10^5\) descriptors. The vocabulary size is fixed as \(32\), the dimension of HKS is \(21\) and the dimension of SIHKS is \(385\). It is important to point out that we performed the BoF experiments on SHREC 2011 with various codeword sizes of 8, 12, 16, 24, 32, 48, 64, 80 and 200. It turns out that different descriptors attain the best results with size 32, and the results change only slightly afterwards. As a result, we fixed the codeword size as 32 for SHREC 2011. The running times to obtain the vocabulary for HKS and SIHKS are 1,043 and 9,033 s, respectively.
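The restart strategy described above can be sketched with a plain Lloyd-style \(k\)-means. This is an illustrative toy, not the implementation used in the experiments: it assumes the sum of squared Euclidean distances as the cost, random data points as initial centroids, and a dense distance matrix that would be too large for \(6\times 10^5\) descriptors. The names `kmeans_once` and `kmeans_restarts` are hypothetical.

```python
import numpy as np


def kmeans_once(X, k, rng, n_iter=100):
    """One k-means run from random initial centroids; returns (centroids, cost)."""
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster is empty.
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    cost = d2.min(axis=1).sum()  # assumed cost: sum of squared distances
    return centroids, cost


def kmeans_restarts(X, k, n_restarts=3, seed=0):
    """Repeat k-means with new initial centroids and keep the lowest-cost run."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    runs = [kmeans_once(X, k, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[1])


# Toy descriptors (assumption): two well-separated groups in 2D.
X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
vocabulary, cost = kmeans_restarts(X, k=2)
```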
Table 1 Performance (DCG) comparison of ISPM and single-level partition on SHREC 2011
Spectral descriptor  Level \(L\) (partitions)  Traditional codebook, single  Traditional codebook, pyramid  Codeword uncertainty, single  Codeword uncertainty, pyramid
HKS  1 (2)  0.8345  0.8277  0.8497  0.8431
HKS  4 (16)  0.8766  0.8721  0.8892  0.8870
HKS  7 (128)  0.8893  0.8878  0.8892  0.8903
HKS  9 (512)  0.8911  0.8902  0.8885  0.8891
SIHKS  1 (2)  0.8457  0.8436  0.8714  0.8661
SIHKS  4 (16)  0.8688  0.8671  0.8862  0.8853
SIHKS  7 (128)  0.8778  0.8771  0.8887  0.8890
SIHKS  9 (512)  0.8798  0.8793  0.8888  0.8890
Next, let us examine the behavior of the ISPM. For completeness, Table 1 lists the performance achieved using just the highest level of the pyramid (the “single” columns) as well as the performance of the complete matching scheme using multiple levels (the “pyramid” columns). For both HKS and SIHKS, the results improve considerably as we go from \(L = 1\) to a multilevel setup. We do not display the results for \(L=0\) because its highest single level coincides with its pyramid. Although matching at the highest pyramid level seems to account for most of the improvement, using all the levels together helps provide stable results. For HKS with codeword uncertainty, single-level performance actually drops as we go from \(L=7\) to \(L=9\). This means that the highest level of the \(L=9\) pyramid is too finely subdivided, with individual bins yielding few matches. Despite the diminished discriminative power of the highest level, the performance of the entire \(L=9\) pyramid remains essentially identical to that of the \(L=7\) pyramid. Thus, the main advantage of the intrinsic spatial pyramid representation stems from combining multiple resolutions in a principled fashion, and it is robust to failures at individual levels.
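To make the combination of levels concrete, the sketch below builds per-level histograms over bands of the second eigenfunction and compares two shapes with a weighted histogram intersection. The weights \(1/2^{L}\) for level 0 and \(1/2^{L-l+1}\) for level \(l\ge 1\) are borrowed from the SPM kernel of [4]; that ISPM uses exactly these weights, equally spaced bands, and hard-count histograms is an assumption of this example. The names `pyramid_histograms` and `pyramid_match` are hypothetical.

```python
import numpy as np


def pyramid_histograms(words, phi2, n_words, max_level):
    """Return a list of (2**l, n_words) histograms for l = 0..max_level.

    words: (m,) index of the nearest geometric word at each vertex
    phi2:  (m,) second eigenfunction values (assumed not constant)
    """
    words = np.asarray(words, dtype=int)
    phi2 = np.asarray(phi2, dtype=float)
    u = (phi2 - phi2.min()) / (phi2.max() - phi2.min())  # rescaled to [0, 1]
    levels = []
    for l in range(max_level + 1):
        n_bands = 2 ** l
        band = np.minimum((u * n_bands).astype(int), n_bands - 1)
        hist = np.zeros((n_bands, n_words))
        np.add.at(hist, (band, words), 1.0)  # count word occurrences per band
        levels.append(hist / len(words))     # each level sums to 1
    return levels


def pyramid_match(levels_a, levels_b):
    """Weighted sum of per-level histogram intersections (SPM-style weights)."""
    L = len(levels_a) - 1
    score = 0.0
    for l, (ha, hb) in enumerate(zip(levels_a, levels_b)):
        weight = 1.0 / 2 ** L if l == 0 else 1.0 / 2 ** (L - l + 1)
        score += weight * np.minimum(ha, hb).sum()
    return score


# Toy shapes (assumption): same word proportions, opposite spatial layout.
phi2 = [0.0, 0.1, 0.9, 1.0]
shape_a = pyramid_histograms([0, 0, 1, 1], phi2, n_words=2, max_level=1)
shape_b = pyramid_histograms([1, 1, 0, 0], phi2, n_words=2, max_level=1)
self_score = pyramid_match(shape_a, shape_a)
cross_score = pyramid_match(shape_a, shape_b)
```

In this toy case the two shapes have identical orderless histograms at level 0, so a plain BoF comparison cannot separate them, whereas the level-1 bands do.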
In Fig. 5, we show two examples of the top nine retrieval results for different methods. Many examples demonstrate that our proposed ISPM method improves on Shape Google; we show two to illustrate the idea. Between each pair of blue lines in that figure, the upper row is our approach, while the bottom row is Shape Google. For the first query, alien, SIHKS confuses it with spider, while HKS confuses it with Santa. This is because these objects also have several long, thin pipe-like parts and flat globular parts, and the proportions are similar. The spatial partition separates pipe-like parts and globular parts into different subhistograms according to the global spatial position, thus resulting in a more descriptive representation. In particular, the Santa and alien models share a similar body shape, but the shapes of Santa’s hat and the alien’s horns are spatially inverted, even though these two parts are similar in terms of the proportion of primitive geometric elements. For the second query, dinosaur, ISPM successfully removes the incorrect results of the gorilla and woman models. The error with armadillo still remains, however, and turns out to be a case where ISPM fails. This is understandable, since even humans may misrecognize it at first glance. In terms of global shape structures, the armadillo and dinosaur models are almost isometric. So ISPM considers the semantically corresponding parts as good matches and compares them by their corresponding regions. Because of the similar geometric details of the four legs and the tail, ISPM is still not able to distinguish between these two shapes.
In addition to comparing shapes at their coarsest level (shape-to-shape), ISPM is also able to quantitatively tell the difference at certain detailed levels (patch-to-patch).
4.2 TOSCA database
4.3 Strengths and weaknesses of the proposed approach

Strengths: (1) The main advantage of ISPM over BoF is its integration of spatial information in a principled way. (2) ISPM provides a coarse correspondence of shapes.

Weaknesses: (1) A major drawback of ISPM is the difficulty of determining an appropriate number of partitions. Such a limitation also exists in the original paper of Lazebnik et al. [4] on SPM. Using too many partitions on the surface tends to degrade the performance of the proposed algorithm, largely because of mismatching. (2) Unlike graph-based methods, ISPM may still lose topological information, which is critical in distinguishing between shapes from different classes.
5 Conclusion and future work
We developed an intrinsic version of the SPM, making it suitable for the analysis of deformable 3D shapes. Our construction is based on the isocontours of the second eigenfunction of the LB operator on Riemannian manifolds. The proposed partitioning can capture the global shape topological information and provide a deformation invariant representation. Furthermore, the ISPM is able to establish a global correspondence among shapes. It can be used in combination with any dense shape descriptor, e.g., the heat kernel signature or the scale-invariant heat kernel signature, and consistently achieves a notable improvement over the BoF model, which only encodes orderless local information.
We plan to extend this work in two directions. First, the spatial pyramid framework offers insights into the success of the different dense shape descriptors in our experiments. Therefore, performing a spatial partition-based investigation on all the recent spectral descriptors, such as the wave kernel signature, may provide practical guidance for further applications. Second, using the proposed global system of coordinates, intrinsic versions of many other aggregation-based compact representations popular in image analysis, such as the Fisher vector, can be designed. We intend to explore these constructions in our future work.
References
1. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
2. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Proceedings of the IEEE computer vision and pattern recognition (CVPR’05), vol 1, pp 886–893
3. Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vis 88(2):303–338
4. Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In: Proceedings of the IEEE computer vision and pattern recognition (CVPR’06), vol 2, pp 2169–2178
5. Fehr J, Streicher A, Burkhardt H (2009) A bag of features approach for 3D shape retrieval. In: Proceedings of the international symposium on advances in visual computing (ISVC’09), pp 34–43
6. Lian Z, Godil A, Sun X (2010) Visual similarity based 3D shape retrieval using bag-of-features. In: Proceedings of the IEEE international conference on shape modeling and applications (SMI’10), pp 25–36
7. Toldo R, Castellani U, Fusiello A (2010) The bag of words approach for retrieval and categorization of 3D objects. Vis Comput 26(10):1257–1268
8. Darom T, Keller Y (2012) Scale invariant features for 3D mesh models. IEEE Trans Image Process 5(21):2758–2769
9. Liu Y, Zha H, Qin H (2006) Shape topics: a compact representation and new algorithms for 3D partial shape retrieval. In: Proceedings of the IEEE computer vision and pattern recognition (CVPR’06), vol 2, pp 2025–2032
10. Tabia H, Colot O, Daoudi M, Vandeborre JP (2011) Non-rigid 3D shape classification using bag-of-feature techniques. In: Proceedings of the IEEE international conference on multimedia and expo (ICME’11), pp 1–6
11. Coifman RR, Lafon S (2006) Diffusion maps. Appl Comput Harmon Anal 21(1):5–30
12. Reuter M, Wolter F, Peinecke N (2006) Laplace-Beltrami spectra as ’Shape-DNA’ of surfaces and solids. Comput Aid Des 38(4):342–366
13. Bérard P, Besson G, Gallot S (1994) Embedding Riemannian manifolds by their heat kernel. Geom Funct Anal 4(4):373–398
14. Lévy B (2006) Laplace-Beltrami eigenfunctions: Towards an algorithm that “understands” geometry. In: Proceedings of the IEEE international conference shape modeling and applications, p 13
15. Sun J, Ovsjanikov M, Guibas L (2009) A concise and provably informative multiscale signature based on heat diffusion. Comput Graph Forum 28(5):1383–1392
16. Aubry M, Schlickewei U, Cremers D (2011) The wave kernel signature: a quantum mechanical approach to shape analysis. In: Proceedings of the computational methods for the innovative design of electrical devices, pp 1626–1633
17. Kokkinos I, Bronstein MM, Yuille A (2012) Dense scale invariant descriptors for images and surfaces. Technical report, INRIA
18. Bronstein AM, Bronstein MM, Guibas LJ, Ovsjanikov M (2011) Shape Google: geometric words and expressions for invariant shape retrieval. ACM Trans Graph 30(1)
19. Ovsjanikov M, Bronstein AM, Bronstein MM, Guibas LJ (2009) Shape Google: a computer vision approach to isometry invariant shape retrieval. In: Proceedings of the international conference on computer vision workshops (ICCVW’09), pp 320–327
20. Kokkinos I, Bronstein MM, Litman R, Bronstein AM (2012) Intrinsic shape context descriptors for deformable shapes. In: Proceedings of the IEEE computer vision and pattern recognition (CVPR’12). IEEE, pp 159–166
21. Sipiran I, Bustos B (2010) A robust 3D interest points detector based on Harris operator. In: Proceedings of the eurographics workshop on 3D object retrieval (3DOR’10), pp 7–14
22. Zaharescu A, Boyer E, Varanasi K, Horaud R (2009) Surface feature detection and description with applications to mesh matching. In: Proceedings of the IEEE computer vision and pattern recognition (CVPR’09). IEEE, pp 373–380
23. Li X, Godil A (2009) Exploring the Bag-of-Words method for 3D shape retrieval. In: Proceedings of the IEEE international conference image processing (ICIP’09), pp 437–440
24. Redondo-Cabrera C, López-Sastre RJ, Acevedo-Rodriguez J, Maldonado-Bascón S (2012) SURFing the point clouds: selective 3D spatial pyramids for category-level object recognition. In: Proceedings of the IEEE computer vision and pattern recognition (CVPR’12), pp 3458–3465
25. Bronstein AM, Bronstein MM, Kimmel R (2008) Numerical geometry of non-rigid shapes. Springer, Berlin
26. Rosenberg S (1997) The Laplacian on a Riemannian manifold. Cambridge University Press, Cambridge
27. Meyer M, Desbrun M, Schröder P, Barr A (2003) Discrete differential-geometry operators for triangulated 2-manifolds. Vis Math III 3(7):35–57
28. Wardetzky M, Mathur S, Kälberer F, Grinspun E (2007) Discrete Laplace operators: no free lunch. In: Proceedings of the eurographics symposium on geometry processing (SGP’07), pp 33–37
29. Rustamov RM (2007) Laplace-Beltrami eigenfunctions for deformation invariant shape representation. In: Proceedings of the eurographics symposium on geometry processing (SGP’07), pp 225–233
30. Philbin J, Chum O, Isard M, Sivic J, Zisserman A (2007) Object retrieval with large vocabularies and fast spatial matching. In: Proceedings of the IEEE computer vision and pattern recognition (CVPR’07), pp 1–8
31. Nister D, Stewenius H (2006) Scalable recognition with a vocabulary tree. In: Proceedings of the IEEE computer vision and pattern recognition (CVPR’06), vol 2, pp 2161–2168
32. van Gemert JC, Veenman CJ, Smeulders AWM, Geusebroek JM (2010) Visual word ambiguity. IEEE Trans Pattern Anal Mach Intel 32(7):1271–1283
33. Kac M (1966) Can one hear the shape of a drum? Am Math Mon 73(4):1–23
34. Uhlenbeck K (1976) Generic properties of eigenfunctions. Am J Math 98(4):1059–1078
35. Shi Y, Lai R, Krishna S, Sicotte N, Dinov I, Toga AW (2008) Anisotropic Laplace-Beltrami eigenmaps: Bridging Reeb graphs and skeletons. In: Proceedings of the IEEE computer vision and pattern recognition workshops (CVPRW’08), pp 1–7
36. Lian Z, Godil A, Bustos B, Daoudi M, Hermans J, Kawamura S, Kurita Y, Lavoué G, Van Nguyen H, Ohbuchi R, Ohkita Y, Ohishi Y, Porikli F, Reuter M, Sipiran I, Smeets D, Suetens P, Tabia H, Vandermeulen D (2013) A comparison of methods for non-rigid 3D shape retrieval. Pattern Recogn 46(1):449–461
37. Järvelin K, Kekäläinen J (2000) IR evaluation methods for retrieving highly relevant documents. In: Proceedings of the international conference research and development in information retrieval (SIGIR’00), pp 41–48