Spectral Shape Recovery and Analysis Via Data-driven Connections

We introduce a novel learning-based method to recover shapes from their Laplacian spectra, based on establishing and exploring connections in a learned latent space. The core of our approach consists of a cycle-consistent module that maps between a learned latent space and sequences of eigenvalues. This module provides an efficient and effective link between the shape geometry, encoded in a latent vector, and its Laplacian spectrum. Our proposed data-driven approach replaces the need for ad-hoc regularizers required by prior methods, while providing more accurate results at a fraction of the computational cost. Moreover, these latent space connections enable novel applications for both analyzing and controlling the spectral properties of deformable shapes, especially in the context of a shape collection. Our learning model and the associated analysis apply without modifications across different dimensions (2D and 3D shapes alike), representations (meshes, contours and point clouds), nature of the latent space (generated by an auto-encoder or a parametric model), as well as across different shape classes, and admit arbitrary resolution of the input spectrum without affecting complexity. The increased flexibility allows us to address notoriously difficult tasks in 3D vision and geometry processing within a unified framework, including shape generation from spectrum, latent space exploration and analysis, mesh super-resolution, shape exploration, style transfer, spectrum estimation for point clouds, segmentation transfer and non-rigid shape matching.

Supplementary Information: The online version contains supplementary material available at 10.1007/s11263-021-01492-6.


Introduction
Constructing compact encodings of geometric shapes lies at the heart of 2D and 3D computer vision. While earlier approaches have concentrated on handcrafted representations, with the advent of geometric deep learning (Masci et al. 2016), data-driven learned feature encodings have gained prominence. A desirable property in many applications, such as shape exploration and synthesis, is the ability to recover the shape from its (latent) encoding or to control object deformations in a parametric fashion. Various data-driven parametric models (Loper et al. 2015; Zuffi et al. 2017; Romero et al. 2017; Pavlakos et al. 2019) and auto-encoder architectures (Achlioptas et al. 2018; Litany et al. 2018; Mo et al. 2019; Gao et al. 2019) have been designed to solve this problem. Despite significant progress in this area, the structure of the latent vectors remains difficult to control. For example, the dimensions of the latent vectors typically lack a canonical ordering, while invariance to various geometric deformations is often only learned by data augmentation or complex constraints on the intermediate features.
At the same time, a classical approach in spectral geometry is to encode a shape using the set of increasingly ordered eigenvalues (spectrum) of its Laplacian operator. This representation is useful since: (1) it does not require any training, (2) it can be computed on various data representations, such as point clouds or meshes, regardless of sampling density, (3) it enjoys well-known theoretical properties such as a natural ordering of its elements and invariance to isometries, and (4) as shown recently (Cosmo et al. 2019; Rampini et al. 2019), alignment of eigenvalues often promotes near-isometries, which is useful in multiple tasks such as non-rigid shape retrieval and matching problems.

Fig. 1 Our spectral reconstruction enables correspondence-free style transfer. Given pose and style "donors" (left and middle columns respectively), we synthesize a new shape with the pose of the former and the style of the latter. The generation is driven by a learning-based eigenvalues alignment (rightmost plots). Our approach handles different resolutions (middle row) and representations (bottom row; the surface underlying the point cloud is for visualization purposes only).
Unfortunately, although encoding shapes via their Laplacian spectra can be straightforward (at least for meshes), the inverse problem of recovering the shape is very difficult. Indeed, it is well known that certain pairs of non-isometric shapes can have the same spectrum; in other words, "one cannot hear the shape of a drum" (Gordon et al. 1992). At the same time, recent evidence suggests that such cases are pathological and that in practice it might be possible to recover a shape from its spectrum (Cosmo et al. 2019). Nevertheless, existing approaches (Cosmo et al. 2019), while able to deform a shape into another with a given spectrum, can produce highly unrealistic shapes with strong artifacts, and fail in a large number of cases.
In this paper, we combine the strengths of data-driven latent representations with those of spectral methods. Our key idea is to construct a connection between the space of Laplacian spectra and a learned latent space. This connection allows us to synthesize shapes from either their learned latent codes or their Laplacian eigenvalues, and further provides us with a way to explore the latent space by an intuitive manipulation of eigenvalues. Moreover, we demonstrate that this process endows the latent space with certain desirable properties that are missing in standard auto-encoder architectures. Our shape-from-spectrum solution is very efficient, since it requires a single pass through a trained network, in contrast to iterative optimization-based approaches (Rampini et al. 2019). Furthermore, our trainable module acts as a proxy for differentiable eigendecomposition, while encouraging geometric consistency within the network. Overall, our key contributions can be summarized as follows:
- We propose the first learning-based model to robustly recover shapes from Laplacian spectra in a single pass;
- For the first time, we provide a bidirectional connection between learned latent spaces and spectral geometric properties of 3D shapes, giving rise to new tools for the analysis of geometric data;
- Our model is general, in that it applies with no modifications to different shape classes, even across different geometric representations and dimensions, and generalizes to representations not available at training time;
- Our connections can be applied to different kinds of latent representations, such as those provided by auto-encoders or by parametric models;
- We showcase our approach in multiple applications (e.g., Fig. 1), and show significant improvement over the state of the art; see Fig. 2 for an example.

Related Work
Spectral quantities, and in particular the eigenvalues of the Laplace-Beltrami operator, provide an informative summary of the intrinsic geometry. For example, closed-form estimates and analytical bounds for surface area, genus and curvature in terms of the Laplacian eigenvalues have been obtained (Chavel 1984). Given these properties, spectral shape analysis has been exploited in many computer vision and computer graphics tasks such as shape retrieval (Reuter et al. 2005), description and matching (Aubry et al. 2011; Bronstein et al. 2011; Ovsjanikov et al. 2012), mesh segmentation (Reuter 2010), sampling (Öztireli et al. 2010) and compression (Karni and Gotsman 2000), among many others. Typically, the intrinsic properties of the shape are computed from its explicit representation and are used to encode compact geometric features invariant to isometric deformations.
Recently, several works have started to address the inverse problem: namely, recovering an extrinsic embedding from the intrinsic encoding (Boscaini et al. 2015; Cosmo et al. 2019). This is closely related to the fundamental theoretical question of "hearing the shape of a drum" (Kac 1966; Gordon et al. 1992). Although counter-examples have been proposed to show that in certain scenarios multiple shapes might have the same spectrum, recent work proposes effective practical solutions to this problem. Boscaini et al. (2015) proposed the shape-from-operator method, which obtains the extrinsic shape from a Laplacian matrix by first estimating the Riemannian metric in terms of edge lengths and then recovering the 3D reconstruction. In Corman et al. (2017), the intrinsic and extrinsic relations of geometric objects have been extensively defined and evaluated from both theoretical and practical aspects. The authors revised the framework of functional shape differences (Rustamov et al. 2013) to account for extrinsic structure, extending the reconstruction task to non-isometric shapes and models obtained from physical simulation and animation. Several works have also been proposed to recover shapes purely from Laplacian eigenvalues (Chu and Golub 2005; Aasen et al. 2013; Panine and Kempf 2016) or with mild additional information such as excitation amplitude in the case of musical key design (Bharaj et al. 2015). Most closely related to ours in this area is the recent isospectralization approach introduced in Cosmo et al. (2019), which aims to directly estimate the 3D shape from the spectrum. This approach works well in the vicinity of a good solution, but it is computationally expensive and, as we show below, can quickly produce unrealistic instances, failing in a large number of cases in 3D (see Fig. 2 for two examples).
In this paper we contribute to this line of work and propose, for the first time, to replace the heuristics used in previous methods, such as Cosmo et al. (2019), with a purely data-driven approach. Our key idea is to design a deep neural network that both constrains the space of solutions based on the set of shapes given at training time and allows us to solve the isospectralization problem with a single forward pass, thus avoiding expensive and error-prone optimization.
We note that a related idea has been recently proposed in Huang et al. (2019) via the so-called OperatorNet architecture. However, that work is based on shape difference operators (Rustamov et al. 2013) and as such requires a fixed source shape and functional maps to each shape in the dataset to properly synthesize a shape. Our approach is based on Laplacian eigenvalues alone, and thus is completely correspondence-free.
Our approach also builds upon the recent work on learning generative shape models. A range of techniques have been proposed using volumetric representations (Wu et al. 2016), parametric models (Loper et al. 2015; Pavlakos et al. 2019; Zuffi et al. 2017), point cloud auto-encoders (Aumentado-Armstrong et al. 2019; Achlioptas et al. 2018), generative models based on meshes and implicit functions (Sinha et al. 2017; Groueix et al. 2018; Litany et al. 2018; Kostrikov et al. 2018; Chen and Zhang 2019), and part structures (Li et al. 2017; Mo et al. 2019; Gao et al. 2019; Wu et al. 2019), among many others. Although generative models, and auto-encoders in particular, have shown impressive performance, the structure of the latent space is typically difficult to control or analyze directly. To address this problem, some methods proposed a disentanglement of the latent space (Aumentado-Armstrong et al. 2019) to split it into more semantically meaningful components. Perhaps most closely related to ours in this domain is the work of Aumentado-Armstrong et al. (2019), where the shape spectrum is used to promote disentanglement of the latent space into intrinsic and extrinsic components that can be controlled separately. Nevertheless, the resulting network does not allow synthesizing shapes from their spectra.
Extending these approaches, our work provides the first way to connect the learned latent space to the spectral one, thus inheriting the benefits of both and the versatility of moving across the two representations. This allows our network to synthesize shapes from their spectra, and also to relate shapes with very different input structures (e.g., meshes and point clouds) across a wide range of sampling densities, enabling several novel applications.
This paper is an extended version of the work presented in Marin et al. (2020). Compared to the former version, our contributions are as follows: (i) we investigate different types of latent spaces, including those generated by an auto-encoder model as well as parametric spaces associated with morphable models, and study different parametrizations thereof; (ii) we include human bodies among the classes of analyzed shapes; (iii) we further develop the tools provided by our model for a meaningful exploration of the latent space, showing how the spectral prior contributes to the interpretability of latent codes, and enabling the disentanglement of intrinsic and extrinsic geometry as a novel application (Sect. 6); (iv) we introduce non-rigid matching as a new application of the shape-from-spectrum paradigm (Sect. 7).

Background
We model shapes as connected 2-dimensional Riemannian manifolds $X$ embedded in $\mathbb{R}^3$, possibly with boundary $\partial X$, equipped with the standard metric. On each shape $X$ we consider its positive semi-definite Laplace-Beltrami operator $\Delta_X$, generalizing the classical notion of Laplacian from the Euclidean setting to curved surfaces.

Fig. 3 Reconstruction examples of our shape-from-spectrum pipeline. We show the results obtained with two different inputs: the eigenvalues of the Laplacian discretized with linear FEM, and those of the cubic FEM discretization. The heatmap encodes point-wise reconstruction error, growing from white to dark red.

Laplacian spectrum. $\Delta_X$ admits an eigendecomposition $\Delta_X \varphi_i = \lambda_i \varphi_i$ into eigenvalues $\{\lambda_i\}$ and associated eigenfunctions $\{\varphi_i\}$. The Laplacian eigenvalues of $X$ (its spectrum) form a discrete set, which is canonically ordered into a non-decreasing sequence

$$\mathrm{Spec}(X) = (\lambda_1, \lambda_2, \lambda_3, \dots), \qquad 0 \le \lambda_1 \le \lambda_2 \le \lambda_3 \le \cdots$$

In the special case where $X$ is an interval in $\mathbb{R}$, the eigenvalues $\lambda_i$ correspond to the (squares of) oscillation frequencies of the Fourier basis functions $\varphi_i$. This provides us with a connection to classical Fourier analysis, and with a natural notion of hierarchy induced by the ordering of the eigenvalues. In light of this analogy, in practice one is usually interested in a limited bandwidth consisting of the first $k$ eigenvalues; typical values in geometry processing applications range from $k = 30$ to $100$.
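To make the Fourier analogy concrete, the following sketch (our own illustration, not part of the method) discretizes the Laplacian on an interval of length $\ell$ with finite differences and free (Neumann) ends, and compares its lowest eigenvalues with the squared frequencies $(\pi i/\ell)^2$ of the cosine basis $\cos(\pi i x/\ell)$. The grid size and the boundary condition are assumptions chosen only for illustration.

```python
import numpy as np


def interval_laplacian_spectrum(n: int, length: float = 1.0, k: int = 6) -> np.ndarray:
    """First k eigenvalues of a finite-difference Laplacian on [0, length].

    The interval is split into n cells of width h = length / n. The matrix is the
    positive semi-definite path-graph Laplacian (free, i.e. Neumann, ends) scaled by 1 / h**2.
    """
    h = length / n
    main = np.full(n, 2.0)
    main[0] = main[-1] = 1.0  # end nodes have a single neighbour (Neumann boundary)
    off = -np.ones(n - 1)
    lap = (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / h**2
    return np.linalg.eigvalsh(lap)[:k]  # eigvalsh returns eigenvalues in ascending order


def fourier_squared_frequencies(length: float = 1.0, k: int = 6) -> np.ndarray:
    """Continuous Neumann eigenvalues (pi * i / length)**2 of cos(pi * i * x / length)."""
    return (np.pi * np.arange(k) / length) ** 2


if __name__ == "__main__":
    print(interval_laplacian_spectrum(n=400))
    print(fourier_squared_frequencies())
```

The discrete values approach the continuous ones as the grid is refined, and the ordering of the eigenvalues mirrors the ordering of the basis functions by oscillation frequency.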
Furthermore, the spectrum is isometry-invariant, i.e., it does not change under deformations of the shape that preserve geodesic distances (e.g., changes in pose).

Discretization. In the discrete setting, we represent shapes as triangle meshes $X = (V, T)$ with $n$ vertices $V$ and $m$ triangular faces $T$; depending on the application, we will also consider unorganized point clouds. Vertex coordinates in both cases are represented by a matrix $X \in \mathbb{R}^{n \times 3}$.
The Laplace-Beltrami operator $\Delta_X$ is discretized as an $n \times n$ matrix via the finite element method (FEM) (Ciarlet 2002). In the simplest setting (i.e., linear finite elements), this discretization corresponds to the cotangent Laplacian (Pinkall and Polthier 1993); however, in this paper we also consider quadratic and cubic FEM (see, e.g., Reuter 2010, Sec. 4.1, for a clear treatment). These yield a more accurate discretization, as shown in Fig. 3. Unlike in Rampini et al. (2019), this comes at virtually no additional cost for our pipeline, as we will show. On point clouds, $\Delta_X$ can be discretized using the approach described in Clarenz et al. (2004).
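As a minimal sketch of the linear-FEM case only (the pipeline described here relies on cubic FEM, which this snippet does not implement), the code below assembles the cotangent stiffness matrix $W$ and a lumped (barycentric) mass matrix $M$, and solves the generalized eigenproblem $W\varphi = \lambda M\varphi$ with SciPy. Mass lumping, the dense matrices, the function names and the octahedron test mesh are our own illustrative assumptions. The checks illustrate invariance to rigid motions (the simplest isometries) and the $1/s^2$ scaling of the eigenvalues under uniform scaling by $s$.

```python
import numpy as np
from scipy.linalg import eigh


def cotan_laplacian(V: np.ndarray, F: np.ndarray):
    """Linear-FEM stiffness (cotangent) matrix W and lumped mass matrix M.

    V: (n, 3) vertex coordinates; F: (m, 3) triangle vertex indices.
    Dense matrices are used for clarity; real meshes need scipy.sparse.
    """
    n = V.shape[0]
    W = np.zeros((n, n))
    vertex_area = np.zeros(n)
    for tri in F:
        for a in range(3):
            # Edge (i, j) and the vertex o opposite to it in this triangle.
            i, j, o = tri[a], tri[(a + 1) % 3], tri[(a + 2) % 3]
            u, w = V[i] - V[o], V[j] - V[o]
            cot = np.dot(u, w) / np.linalg.norm(np.cross(u, w))
            W[i, j] -= 0.5 * cot
            W[j, i] -= 0.5 * cot
        area = 0.5 * np.linalg.norm(np.cross(V[tri[1]] - V[tri[0]], V[tri[2]] - V[tri[0]]))
        vertex_area[tri] += area / 3.0  # lump one third of each triangle onto its vertices
    W[np.diag_indices(n)] = -W.sum(axis=1)  # rows sum to zero, so constants are in the kernel
    return W, np.diag(vertex_area)


def laplacian_spectrum(V, F, k=None):
    """Eigenvalues of W phi = lambda M phi in ascending order (optionally the first k)."""
    W, M = cotan_laplacian(V, F)
    lam = eigh(W, M, eigvals_only=True)
    return lam if k is None else lam[:k]


def octahedron():
    """A small closed triangle mesh used only for testing."""
    V = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
    F = np.array([[0, 2, 4], [2, 1, 4], [1, 3, 4], [3, 0, 4],
                  [2, 0, 5], [1, 2, 5], [3, 1, 5], [0, 3, 5]])
    return V, F


if __name__ == "__main__":
    print(laplacian_spectrum(*octahedron()))
```

Because eigenvalues are computed once per shape outside the training loop, swapping this routine for a higher-order FEM discretization would change accuracy but not the cost of the learned model.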

Method
Our main contribution is a deep learning model for recovering shapes from Laplacian eigenvalues. Our model operates in an end-to-end fashion: given a spectrum as input, it directly yields a shape with a single forward pass, thus avoiding expensive test-time optimization.

Motivation. Our rationale lies in the observation that shape semantics can be learned from the data, rather than encoded through ad-hoc regularizers (Cosmo et al. 2019), which often result in unrealistic reconstructions. For example, a sheet of paper can be isometrically crumpled or folded into an airplane (see inset figure). Since both embeddings have exactly the same eigenvalues, the desired reconstruction must be imposed as a prior. By taking a data-driven approach, we make our method aware of the "space of realistic shapes", yielding a dramatic improvement in both accuracy and efficiency, and enabling new interactive applications.

Latent Space Connections for Auto-encoders
Our first key contribution is to construct an auto-encoder (AE) neural network architecture, augmented by explicitly modeling the connections between the latent space of the AE and the Laplacian spectrum of the input shape; see Fig. 4 for an illustration of this learning model.
Loosely speaking, our approach can be seen as implementing a coupling between two latent spaces: a learned one that operates on the shape embedding X , and the one provided by the eigenvalues Spec(X ). In the former case, the encoder E is trainable, whereas the mapping X → Spec(X ) is provided via the eigen-decomposition and fixed a priori. Further, we introduce the two coupling mappings π, ρ, trained with a bidirectional loss, to both enable communication across the latent spaces and to tune the learned space by endowing it with structure contained in Spec(X ).
We phrase our overall training loss as follows:

$$\ell = \ell_X + \alpha\,\ell_\lambda, \qquad \ell_X = \|X - D(E(X))\|_F^2,$$

$$\ell_\lambda = \|\pi(\lambda) - E(X)\|_2^2 + \|\rho(E(X)) - \lambda\|_2^2, \tag{6}$$

where $\lambda$ is a vector containing the first $k$ (positive) eigenvalues in Spec(X), $X$ is the matrix of point coordinates, $E$ is the encoder, $D$ is the decoder (Fig. 4), $\|\cdot\|_F$ denotes the Frobenius norm, and $\alpha = 10^{-4}$ controls the relative strengths of the reconstruction loss $\ell_X$ and the spectral term $\ell_\lambda$. The blocks $D$, $E$, $\pi$ and $\rho$ are learnable and parametrized by neural networks (see the supplementary material for the implementation details). Eq. (6) enforces $\rho \approx \pi^{-1}$; in other words, $\pi$ and $\rho$ form a translation block between the latent vector and the spectral encoding of the shape. At test time, we recover a shape from a given spectrum Spec(X) simply via the composition $D(\pi(\mathrm{Spec}(X)))$ (Sect. 5). For additional applications we refer to Sect. 8.

Fig. 4 Our network model. The input shape $X$ and its Laplacian spectrum Spec(X) are passed, respectively, through an AE enforcing $X \approx \tilde{X}$, and an invertible module $(\pi, \rho)$ mapping the eigenvalue sequence to a latent vector $v$. The two branches are trained simultaneously, forcing $v$ to be updated accordingly. The trained model allows recovering the shape purely from its eigenvalues via the composition $D(\pi(\mathrm{Spec}(X))) \approx X$.

Shape representation. We consider two different settings: triangle meshes in point-to-point correspondence at training time (typical in graphics and geometry processing), and unorganized point clouds without a consistent vertex labeling (typical in 3D computer vision).

Auto-encoder architecture. Our model can be built with potentially any AE. In our applications we chose relatively simple ones to deal with meshes and unorganized point clouds, although more powerful generative methods would be equally possible.

Remark. Our architecture takes Spec(X) as an input, i.e., the eigenvalues are precomputed rather than computed during training. By learning an invertible mapping to the latent space, we avoid expensive backpropagation steps through the spectral decomposition of the Laplacian $\Delta_X$.
In this sense, the mapping ρ acts as an efficient proxy for differentiable eigendecomposition, which we exploit in several applications below.
Since eigenvalue computation is only incurred as an offline cost, it can be performed with arbitrary accuracy (we use cubic FEM, see Fig. 3 and Table 2) without sacrificing efficiency. We refer to the supplementary material for details about the architecture, both in the case of meshes and point clouds.
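The training loss of Eq. (6) and the surrounding definitions can be written down directly. In this sketch, `E`, `D`, `pi` and `rho` are arbitrary Python callables standing in for the trained networks (the actual architectures are in the supplementary material and are not reproduced here); the toy identity maps used in the checks are purely hypothetical and only exercise the arithmetic of $\ell = \ell_X + \alpha\,\ell_\lambda$ with $\alpha = 10^{-4}$.

```python
import numpy as np


def shape_from_spectrum_loss(X, lam, E, D, pi, rho, alpha=1e-4):
    """Total loss l = l_X + alpha * l_lambda for a single shape.

    X:   (n, 3) vertex coordinates.
    lam: (k,) first k positive eigenvalues of the shape.
    E, D, pi, rho: callables standing in for the four networks (hypothetical stand-ins).
    """
    v = E(X)                                    # latent code of the shape
    l_X = np.sum((X - D(v)) ** 2)               # squared Frobenius norm of the reconstruction error
    l_lam = np.sum((pi(lam) - v) ** 2) + np.sum((rho(v) - lam) ** 2)  # bidirectional spectral term
    return l_X + alpha * l_lam


def shape_from_spectrum(lam, D, pi):
    """Test-time recovery with a single forward pass: D(pi(lam))."""
    return D(pi(lam))
```

In an actual training loop this quantity would be averaged over a mini-batch and minimized jointly over the parameters of all four blocks with an autodiff framework; NumPy is used here only to make the structure of the loss explicit.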

Latent Space Connections for Parametric Models
Our second key idea is to connect the Laplacian spectrum with the space of parameters of a given morphable model. We illustrate this construction in Fig. 5. This approach is similar to the previous one, with two important differences: (1) there is no encoder involved in the loop; (2) the latent space is also given as input, i.e., it is not learned during training. As before, we establish the connection between the two given representations by training the networks π and ρ with a bidirectional loss, similar to Eq. (6):

$$\ell_\lambda = \|\pi(\lambda) - v\|_2^2 + \|\rho(v) - \lambda\|_2^2,$$

where $v$ is the given parametric code of the shape and all other symbols have the same meaning as in the previous losses. The equation above can be obtained from Eq. (6) by replacing $E(X)$ with $v$, i.e., by replacing the learned encoded representation with a fixed one.

Parametric models. We consider two different parametric models, namely the seminal model SMPL (Loper et al. 2015) and its updated version SMPL-X (Pavlakos et al. 2019). Despite dealing with similar data (human bodies), these two models have very different parametric spaces.
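To illustrate why the parametric variant only needs to train π and ρ, the following sketch replaces the two networks with affine maps (a simplification we introduce here; the method itself uses neural networks). Because the two terms of the bidirectional loss share no parameters, fitting them on fixed codes $v$ reduces to two ordinary least-squares problems. The synthetic codes and spectra are hypothetical, with $k$ equal to the code dimension as in the ablation study.

```python
import numpy as np


def _fit_affine(A, B):
    """Least-squares affine map x -> [x, 1] @ W sending rows of A to rows of B."""
    A1 = np.hstack([A, np.ones((A.shape[0], 1))])  # append a bias column
    W, *_ = np.linalg.lstsq(A1, B, rcond=None)

    def apply(x):
        x = np.atleast_2d(x)
        return np.hstack([x, np.ones((x.shape[0], 1))]) @ W

    return apply


def fit_linear_connection(codes, spectra):
    """Fit pi: spectrum -> code and rho: code -> spectrum on fixed parametric codes.

    codes: (N, d) fixed morphable-model parameters; spectra: (N, k) eigenvalue vectors.
    With affine pi and rho, minimizing the summed bidirectional loss splits into
    two independent least-squares problems.
    """
    pi = _fit_affine(spectra, codes)
    rho = _fit_affine(codes, spectra)
    return pi, rho
```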

Results and analysis
In this section we report the results on our core application of shape-from-spectrum recovery, together with an analysis of the various parameters and timing. For each resolution, we independently compute the Laplacian spectrum and use these spectra to recover the shape.

Comparison. We compared our method in terms of reconstruction accuracy to the state-of-the-art isospectralization method of Cosmo et al. (2019), as well as to a nearest-neighbor baseline, which picks the training shape whose spectrum is closest to the target one. In addition, we trained two separate architectures (with and without the ρ block) and compared them; the variant without ρ is an ablation that validates the importance of the invertible module connecting the spectral encoding to the learned latent codes. The quantitative results are reported in Table 1 as the mean squared error between the reconstructed shape and the ground truth. Figures 2 and 6 further show qualitative comparisons with the different baselines on different shape representations. In Fig. 6, for the sake of illustration and similarly to Cosmo et al. (2019) and Rampini et al. (2019), we also include 2D contours discretized as regular cycle graphs. As the results suggest, the ρ block both helps reduce the reconstruction error and enables novel applications (we explore them in depth in Sect. 8). Our method achieves a significant improvement over nearest neighbors in terms of accuracy, and an order-of-magnitude improvement over isospectralization (Cosmo et al. 2019). Moreover, the latter approach consists of an expensive optimization which requires hours to run, while our method is instantaneous at test time.
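The nearest-neighbor baseline and the error measure are simple enough to state in code. This is a sketch under two assumptions: all shapes share the same vertex ordering, and the error is averaged over all vertex coordinates (the exact normalization behind Table 1 is not specified in this section). The function names are our own.

```python
import numpy as np


def nearest_neighbor_baseline(lam_query, train_spectra, train_shapes):
    """Return the training shape whose spectrum is closest (Euclidean) to lam_query.

    lam_query: (k,); train_spectra: (N, k); train_shapes: (N, n, 3).
    """
    dists = np.linalg.norm(train_spectra - lam_query[None, :], axis=1)
    return train_shapes[int(np.argmin(dists))]


def reconstruction_mse(X_pred, X_gt):
    """Mean squared error over all vertex coordinates (assumes shared vertex ordering)."""
    return float(np.mean((X_pred - X_gt) ** 2))
```

Unlike the learned model, this baseline can only return shapes already present in the training set, which bounds its accuracy on unseen subjects.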
We perform further experiments on the human body category, by training our model on a set of 3014 shapes (in T-pose) from the SURREAL dataset (Varol et al. 2017). The quantitative evaluation on different test sets is reported in Table 2.

Finally, in Fig. 9, we test our model on shapes that lie outside the training distribution. The first row shows two target human shapes selected from the SHREC19 benchmark; the second row shows an animal shape from SHREC20 (Dyke et al. 2020). Even if the input geometry is far from the training distribution, our model provides meaningful results that respect the main semantic features of the target shape. For example, with the hippo shape in the bottom row, several features of the target shape are missing in our result, but we are able to retrieve the global geometry and the correct class among those present in the training set. We remark that these shapes are very challenging, since they come from different datasets, represent different subjects in different poses, and are discretized with completely different meshes.

Ablation Study
We conducted an in-depth ablation study on the human body category, for which we can easily compare across the different latent spaces introduced in the previous section. In Table 2 we compare different variants of our learning models:
- $O_k$ is our AE-based model (Fig. 4) for meshes;
- $P_k$ is the same as $O_k$, but for point clouds;
- VAE is a probabilistic variant of our AE-based model, obtained by replacing the deterministic AE with a variational autoencoder with the same architecture;
- $S_k$ and $S^X_k$ are based on the parametric models SMPL and SMPL-X, respectively (Fig. 5);
- $NN$ is the baseline; for every input spectrum, it outputs the training shape with the most similar spectrum (we use the Euclidean distance).

Table 2 $O_k$ is based on an AE for meshes, $P_k$ adopts the AE for point clouds, VAE exploits a variational AE, and $S_k$ and $S^X_k$ use the latent space of the parametric models SMPL and SMPL-X, respectively. Parameter $k$ is the dimension of the latent space and the number of strictly positive eigenvalues. The indices 1 and 2 denote linear and quadratic FEM, respectively; otherwise we use cubic FEM. $NN$ is the baseline that returns the shape in the training set whose spectrum is most similar to the input one. All results are scaled by $10^5$ for easier reading.

Fig. 7 Shape-from-spectrum reconstruction of a test shape from the SURREAL dataset (Varol et al. 2017). The subject was not seen at training time, and has a different discretization than the training shapes.

Fig. 8 Shape-from-spectrum reconstruction of a test shape from the FAUST dataset (Bogo et al. 2014). The input mesh has different connectivity as well as a different pose from the shapes in the training set.

Fig. 9 Shape-from-spectrum reconstruction of test shapes outside the training distribution. First row: two human shapes from SHREC19; second row: a hippo from SHREC20. From left to right in each block: the target shape with its overlaid mesh, the target surface, and the output of our model.
Parameter $k$ denotes the dimension of the latent space (equal to the number of nonzero eigenvalues). The superscripts 1 and 2 denote whether the eigenvalues are computed with linear or quadratic FEM, respectively; in all other cases, we use cubic FEM. The main difference between the two morphable models is the dimension of the parametric space: 10 for SMPL and 400 for SMPL-X. For this reason, we can only select $k = 10$ for SMPL ($S_{10}$), and different values of $k$ for SMPL-X ($S^X_{15}$, $S^X_{30}$, $S^X_{60}$). We report the performance of these models in the last 4 columns of Table 2, and refer to the supplementary material for further details. These comparisons serve to motivate our choice of a fully data-driven approach over more straightforward parametric alternatives. The parametric space provided by the morphable models is fixed, rather than learned jointly with the maps ρ and π. Moreover, in this case, the decoding consists of a linear operation, in contrast to the non-linear decoder of our network. The lower performance of the parametric model-based solutions on the SURREAL test sets shows that non-linear operations achieve better results, and that it is preferable to learn the latent space together with the bidirectional link to the space of spectra. While the training set is fixed, we consider different test sets with an increasing level of difficulty:
- SURREAL: 755 shapes from the SURREAL dataset with the same pose and connectivity as the training shapes, but unseen subjects;
- SURREAL rem: remeshed version of the former, ranging from 25% to 70% of the original number of vertices (see Fig. 7 for an example);
- SURREAL uni: remeshed version with uniform density, causing loss of detail for several thin subparts (see the top left shape of Fig. 16 for an example).
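The linear decoding of the morphable models mentioned above can be sketched as a mean template plus a linear combination of displacement fields. The template and basis below are random placeholders rather than actual SMPL or SMPL-X data, and pose-dependent skinning is omitted; the sketch only shows that the map from code to vertex coordinates is affine, unlike the non-linear decoder $D$ of the AE-based models.

```python
import numpy as np


def linear_shape_decoder(v, template, basis):
    """Decode a parametric code linearly: X(v) = template + sum_i v[i] * basis[i].

    v: (d,) code; template: (n, 3) mean shape; basis: (d, n, 3) displacement fields.
    The template and basis are hypothetical placeholders, not a real morphable model.
    """
    return template + np.tensordot(v, basis, axes=1)
```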
In these test sets, all the shapes are in the same pose and the ground truth is available. We measure the mean squared error between the 3D coordinates of the ground-truth vertices and those of the shape recovered from the spectrum.

Number of eigenvalues. The comparison in

We also test a probabilistic version of our pipeline involving a basic variational autoencoder (VAE). The resulting model is easily comparable with the other architectures proposed in the paper. Our VAE shares the architecture of the AE with latent space of size $k = 30$, and we used cubic FEM to compute the eigenvalues of the training set. In this case, the training loss additionally includes the term $KL = D_{KL}(Q(v|X)\,\|\,P(v))$, the Kullback-Leibler divergence that promotes a Gaussian distribution in the latent space, where $Q(v|X)$ is the posterior distribution given an input shape $X$ and $P(v)$ is the Gaussian prior. In the last column of Table 2, we report the results obtained with this model. We note a slight improvement of the reconstruction error on all the considered benchmarks. This result suggests that more complex probabilistic generative models (e.g., exploiting the mesh hierarchy) and additional refinement of our method for applications requiring a high level of accuracy are promising directions for further investigation.

Generalization to different data. Finally, we tested on the FAUST dataset (Bogo et al. 2014), whose data distribution lies outside that of the SURREAL training data. Also in this case, we generated three different test sets: FAUST, FAUST rem and FAUST uni (last 3 rows of Table 2). These shapes are registrations of real human bodies, and are far from the ones seen at training time in terms of pose and subject (see Fig. 8 for an example). The task here is to evaluate the generalization capabilities of our model: given as input the eigenvalues of a FAUST shape in arbitrary pose, we aim to recover the FAUST shape in T-pose by using our model trained on SURREAL data. For the evaluation, we are given the ground-truth correspondence between the shapes from FAUST and SURREAL, and use it to compute the metric distortion between the two. This different error measure explains the different error scales in the last three rows of Table 2. However, qualitatively the reconstructions are still accurate, as shown in Fig. 8. This set of experiments shows that an AE-based model trained on SURREAL does not generalize as well to out-of-distribution data.
In fact, the last 4 columns for the FAUST experiments show better reconstruction accuracy than the others, meaning that our learning model based on a parametric latent space ($S$ and $S^X$) is preferable in an out-of-distribution scenario.
On the other hand, the AE-based model is more appropriate whenever the input spectra are sampled from the same distribution as the training data, as is characteristic of encoder-decoder models. This is confirmed by the SURREAL tests in Table 2, where $O_{30}$ outperforms all the SMPL-X-based models by a large gap.
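For the VAE variant discussed above, the KL term has a well-known closed form when $Q(v|X)$ is a diagonal Gaussian and $P(v)$ is the standard normal prior; we assume this common parametrization here, since the section does not spell it out. The sketch also shows the reparameterization trick that is typically used to sample $v$ during training; the function names are our own.

```python
import numpy as np


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * float(np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var))


def sample_latent(mu, log_var, rng):
    """Draw v ~ Q(v | X) with the reparameterization trick: v = mu + sigma * eps."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps
```

Writing the sample as a deterministic function of $(\mu, \log\sigma^2)$ plus independent noise is what lets gradients flow through the sampling step in an autodiff framework.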

Timing and Implementation Details
The experiments were run on an i9-9820X 3.30 GHz CPU, with 32 GB of RAM and an RTX 2080 Ti GPU. In general, the runtime depends on the number of vertices; for the data we used in our tests, we observed that an epoch requires 20 to 30 seconds on average. We used fewer vertices for the PointNet version of the network to compensate for the computational cost of the Chamfer distance. In our configuration, a full training requires 10 to 12 hours without any ad-hoc optimization (e.g., early stopping). Our code is publicly available at https://git.io/JGJWE.

Application: Disentanglement
Our model naturally provides a tool to investigate the relationship between intrinsic and extrinsic geometric properties of the shapes being analyzed. In particular, given a latent vector v representing a shape, our model provides two differentiable maps taking v as input (Fig. 4): -the decoder D between v and the extrinsic geometry of the shape, represented as vertex coordinates V ; -the network ρ, that maps v to the Laplacian spectrum, which is an intrinsic quantity widely used as a proxy for the shape metric.
These two maps allow us to locally separate extrinsic and intrinsic shape information. Specifically, we can seek shape deformations directly in the latent space, driven by either D or ρ. We first illustrate this mathematically, and then give concrete examples. Starting from any given latent vector v, we can deform the corresponding shape X by moving v in the direction d that minimizes (or maximizes) the variation in the Laplacian spectrum. This is done by considering the Jacobian matrix of the network ρ, which we call $J_\rho$: to first order, $\rho(v + \alpha d) - \rho(v) \approx \alpha J_\rho d$, and over unit vectors d the norm $\|J_\rho d\|$ is extremized by the right-singular vectors of $J_\rho$. The direction d of minimum (maximum) variation of Spec(X) is therefore given by the right-singular vector of $J_\rho$ corresponding to the smallest (largest) singular value, as explained in Section 7 of the Supplementary material. Thus, we can take an infinitesimal step along d with the update rule $v \to v + \alpha d$, for a small step size α.
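A minimal sketch of this direction computation, assuming ρ is available as a black-box function of the latent code: the Jacobian is approximated with central differences (an autodiff framework would normally be used instead), and the extreme right-singular vectors give the directions of smallest and largest first-order spectral change. The linear toy ρ in the checks is hypothetical. Singular vectors are defined only up to sign, so both d and −d are valid directions.

```python
import numpy as np


def numerical_jacobian(f, v, eps=1e-6):
    """Central-difference Jacobian of f: R^d -> R^k at v, shape (k, d)."""
    cols = []
    for i in range(v.size):
        e = np.zeros_like(v)
        e[i] = eps
        cols.append((f(v + e) - f(v - e)) / (2.0 * eps))
    return np.stack(cols, axis=1)


def spectral_direction(rho, v, mode="min"):
    """Unit latent direction of minimum ("min") or maximum ("max") spectral variation.

    Right-singular vectors of J_rho are the rows of Vt; with full_matrices=True the
    last row is the smallest-singular-value (or null-space) direction.
    """
    J = numerical_jacobian(rho, v)
    _, _, Vt = np.linalg.svd(J, full_matrices=True)
    return Vt[-1] if mode == "min" else Vt[0]


def latent_step(v, d, step=1e-2):
    """One small update v -> v + step * d (the step size is called alpha in the text)."""
    return v + step * d
```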
For deformable shapes such as those in CoMA (Ranjan et al. 2018), this makes it possible to continuously deform a shape while keeping its metric approximately unchanged, i.e., to generate (near-)isometries. Figure 10 shows examples of shapes generated according to this criterion. Minimizing the spectral variation leads to approximately isometric deformations, resulting in a change of facial expression. Maximizing the spectral variation instead induces a change in both pose and identity.
Alternatively, we can look for deformations of X that change the intrinsic metric while keeping the extrinsic distortion small. This amounts to updating v so as to maximize the spectral variation while keeping the decoded vertices V as constant as possible. Conversely, we can enhance the extrinsic distortion of isometric deformations, to obtain more pronounced changes of pose than those in Figure 10. As in the previous case, both deformations can be obtained by considering $J_\rho$ together with the Jacobian of the decoder D; see the Supplementary material for details. The spectral prior therefore enables two additional types of latent space exploration paths: maximum spectral variation with minimum extrinsic variation, and vice versa. Figure 11 shows examples of these explorations on CoMA. The former should correspond to a change of identity, and the latter to a change of pose. We stress that such paths emulate a change of identity or pose only approximately, and are not expected to produce high-quality shape animations. We move in the latent space by taking small steps around the latent vector of an initial shape, with no guarantee of starting in the vicinity of a good solution. Further post-processing in the vertex space might yield more visually pleasing results.
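The exact formulation of these mixed criteria is given in the Supplementary material and is not reproduced here. One plausible instance, shown below as a sketch, maximizes the penalized Rayleigh quotient $\|J_a d\|^2 - \beta \|J_b d\|^2$ over unit vectors d. Its maximizer is the top eigenvector of $J_a^\top J_a - \beta J_b^\top J_b$. The weight β, the function name `mixed_direction` and the toy Jacobians are all assumptions.

```python
import numpy as np

def mixed_direction(J_rho, J_dec, beta=1.0, intrinsic=True):
    """Unit d maximizing ||J_a d||^2 - beta * ||J_b d||^2 (hypothetical formulation).

    intrinsic=True:  large spectral change, small vertex displacement.
    intrinsic=False: large vertex displacement, small spectral change.
    """
    J_a, J_b = (J_rho, J_dec) if intrinsic else (J_dec, J_rho)
    M = J_a.T @ J_a - beta * (J_b.T @ J_b)
    # eigh returns the eigenvalues of the symmetric matrix M in ascending order,
    # so the last eigenvector maximizes d^T M d over unit vectors.
    _, U = np.linalg.eigh(M)
    return U[:, -1]

# Toy Jacobians with a 2-D latent space: rows of J_rho index eigenvalues,
# rows of J_dec index vertex coordinates.
J_rho = np.array([[2.0, 0.0],
                  [0.0, 1.0]])
J_dec = np.array([[3.0, 0.0],
                  [0.0, 0.1],
                  [0.0, 0.0]])
d_identity = mixed_direction(J_rho, J_dec, intrinsic=True)   # mostly along the 2nd axis
d_pose = mixed_direction(J_rho, J_dec, intrinsic=False)      # mostly along the 1st axis
```

The penalized form avoids inverting $J_b^\top J_b$, which may be singular. A ratio-based criterion (a generalized eigenproblem) would be an equally reasonable alternative.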

Application: Shape correspondence
An important application in 3D shape analysis is establishing point-to-point correspondences between objects. In particular, given two shapes X and Y, we aim to find a map $T_{XY}: X \to Y$ that associates each point of the first shape with a point of the second. Here we exploit two of the main advantages of our method: the capability to recover a geometry from its spectrum, and the natural ordering of points provided by the decoder. Given two input shapes X and Y with spectra $\lambda_X$ and $\lambda_Y$, we approximate them by computing $S_X = D(\pi(\lambda_X))$ and $S_Y = D(\pi(\lambda_Y))$. Since the outputs of our network share the discretization of a common template, we naturally obtain a correspondence between $S_X$ and $S_Y$. Given this correspondence, we can solve for $T_{XY}$ indirectly:
(1) we estimate $T_{XS_X}$ and $T_{S_YY}$, which are easier to compute;
(2) we take $T_{S_XS_Y}$, which is given by construction;
(3) the composition $T_{S_YY} \circ T_{S_XS_Y} \circ T_{XS_X}$ yields the desired correspondence.
We consider two settings for this problem. In single-pose matching, the two objects share the same pose as the shapes reconstructed by our model. In multi-pose matching, the two geometries have poses different from the one seen at training time. We show how our approach helps in both settings.

Single-pose matching In this setting, X, Y, $S_X$ and $S_Y$ are all in the same pose and location in 3D space. We can therefore map each input to its reconstruction via nearest-neighbor assignment in 3D. Then, exploiting the common discretization of $S_X$ and $S_Y$, we obtain a sparse correspondence between the two original shapes. For meshes, we extend the sparse matching to the whole surface using the functional maps framework (Ovsjanikov et al. 2012). For point clouds, we simply propagate it by nearest neighbors. Note that the correspondence is obtained automatically from the spectra of the shapes. We perform a quantitative evaluation on SMAL (Zuffi et al. 2017), testing on 100 non-isometric pairs of animals from different classes. As a baseline, we use ICP (Besl and McKay 1992) to rigidly align the two shapes (100 iterations), followed by nearest-neighbor assignment to obtain a correspondence. Two applications that benefit from our approach are texture and segmentation transfer. We test them on animals and on segmented ShapeNet (Yi et al. 2017), respectively. See Fig. 12 and the supplementary material for further details.

Fig. 10 Paths obtained by moving along the direction of minimum spectral variation. Shapes undergo isometries (change of facial expression). On the right, the eigenvalues recomputed on each shape of the row are reported.

Fig. 11 Latent space exploration using the spectral prior. The first row is obtained by maximizing the intrinsic variation while keeping the vertices as constant as possible, resulting in a change of identity. Conversely, the second row maximizes the extrinsic variation while minimizing the variation of the eigenvalues, inducing a change of pose.

Table 3 Quantitative evaluation for the non-rigid shape matching application, averaged over 10 shape pairs.

Fig. 12 Left: quantitative evaluation of matching (Kim et al. 2011) between 100 pairs of animals. Right: qualitative comparison on texture and segmentation transfer.

Fig. 13 Qualitative comparison of non-rigid shape matching. Top row: the source shape is a different subject than the target and is in a different pose; the last two columns show the matching results obtained with standard methods. Bottom row: the source shape is recovered from the spectrum of the target shape using our model $O_{30}$, making the correspondence problem easier to solve. The geodesic error (in cm) is encoded by color, growing from white to dark red.

Multi-pose matching We now consider two shapes that do not share the same spatial pose, have different connectivity, and are also affected by non-rigid deformations. To find a correspondence, we again use the functional maps framework (Ovsjanikov et al. 2012) (FMAP). This framework relies entirely on the intrinsic geometry of the shapes. It is therefore robust to nearly isometric changes of the subject, but it suffers in the presence of non-isometric deformations. Here we consider 10 shape pairs (X, Y), where X is one of the 10 different human identities from FAUST and Y is the SMPL template. Each shape X is non-isometric to Y and in a different pose. With our model, we compute $S_X$ as a mesh with the same connectivity and pose as SMPL that is approximately isometric to X, and we let $S_Y = Y$ be the SMPL template. We then compute the correspondence between X and $S_X$ via the FMAP implementation of Nogneng and Ovsjanikov (2017), and obtain a matching between X and Y by composition as explained above. We perform this test while varying two important parameters of FMAP: the number of ground-truth landmarks used as probe functions (2 or 5), and the dimension of the functional correspondence matrix (20 or 100).
To highlight the benefits of our approach, we compare against a baseline that applies the framework of Nogneng and Ovsjanikov (2017) directly to the shape pair (X, Y). The second and third columns of Table 3 report the results of our method and of the baseline, respectively. Producing a more isometric template yields a significant improvement in performance. The last two columns report the results obtained with the ZoomOut refinement algorithm, applied with the parameters proposed in the original paper. This procedure promotes isometric maps, so providing a nearly isometric template makes our contribution even more crucial. Fig. 13 depicts a qualitative comparison.
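The composition used in the single-pose setting can be sketched with index arrays. Each map is stored as an array that assigns to every source point an index in the target, so composing maps reduces to indexing. This minimal sketch assumes brute-force nearest neighbors and toy point sets. The function names are hypothetical, and it omits the functional-maps extension used for meshes.

```python
import numpy as np

def nearest_neighbor_map(src, dst):
    """For each point of src (n x 3), the index of its nearest point in dst (m x 3)."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def compose_through_template(X, S_X, S_Y, Y):
    """Map every point of X to a point of Y through the template-aligned reconstructions."""
    T_X_SX = nearest_neighbor_map(X, S_X)   # X -> S_X
    # S_X and S_Y share the template discretization, so T_{S_X S_Y} is the identity
    # on vertex indices and needs no explicit array.
    T_SY_Y = nearest_neighbor_map(S_Y, Y)   # S_Y -> Y
    return T_SY_Y[T_X_SX]                   # index composition: X -> S_X = S_Y -> Y

# Toy example: two reconstructions on a shared 20-vertex "template", and inputs
# that contain the same points in an unknown order.
rng = np.random.default_rng(0)
template = rng.normal(size=(20, 3))
S_X = template
S_Y = template + np.array([0.0, 0.0, 0.1])   # hypothetical second reconstruction
perm_x, perm_y = rng.permutation(20), rng.permutation(20)
X, Y = S_X[perm_x], S_Y[perm_y]
T_XY = compose_through_template(X, S_X, S_Y, Y)
```

The pairwise-distance matrix makes this quadratic in the number of points. A spatial index such as a k-d tree would be the natural replacement for dense shapes.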

Additional applications
Our general model enables several additional applications by exploiting the connection between spectral properties and shape generation. Due to space constraints, the supplementary materials collect the details of the training and test sets and the parameters used in our experiments.

Shape exploration
The results of Sects. 5 and 6 suggest that eigenvalues can be used to drive the exploration of the AE's latent space toward a desired direction. Another possibility is to regard the eigenvalues themselves as a parametric model for isometry classes, and to explore the "space of spectra" as is typically done with latent spaces. Our bi-directional coupling between spectra and latent codes makes this exploration feasible, as remarked by the following property: since eigenvalues change continuously with the manifold metric (Bando and Urakawa 1983), a small variation in the spectrum gives rise to a small change in the geometry. We can visualize such variations in shape directly. We first deform a given spectrum (e.g., by a simple linear interpolation between two spectra) to obtain a new eigenvalue sequence μ, and then compute D(π(μ)).

Fig. 15 Exploring the space of shapes in real time via manipulation of the spectrum. The low-pass modification (middle) decreases the first 12 eigenvalues of the input shape; the band-pass modification (right) amplifies the last 12 eigenvalues. Damping the low eigenvalues leads to more pronounced geometric features (e.g., longer legs and snout), while amplifying mid-range eigenvalues affects the high-frequency details (e.g., the ears and fingers); see the supplementary video for a wall-clock demo.
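As a sketch of the spectrum deformation step, the linear interpolation below produces intermediate sequences μ(t). Each would then be decoded as D(π(μ)); the trained networks π and D are not reproduced here. A convenient side effect is that a convex combination of two non-decreasing sequences is itself non-decreasing, so every μ(t) is a valid ordered eigenvalue sequence. We assume both spectra are truncated to the same length, and the example values are made up.

```python
import numpy as np

def interpolate_spectra(lam_a, lam_b, t):
    """Linear interpolation mu(t) = (1 - t) * lam_a + t * lam_b, for t in [0, 1]."""
    lam_a = np.asarray(lam_a, dtype=float)
    lam_b = np.asarray(lam_b, dtype=float)
    return (1.0 - t) * lam_a + t * lam_b

# Two made-up truncated spectra (non-decreasing, starting at the zero eigenvalue).
lam_a = [0.0, 1.2, 3.5, 4.0]
lam_b = [0.0, 0.8, 2.0, 5.1]
path = [interpolate_spectra(lam_a, lam_b, t) for t in np.linspace(0.0, 1.0, 5)]
# Each mu in path would be decoded as D(pi(mu)) by the trained networks.
```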
Fig. 14 shows a related experiment. Here we train the network on 4,430 animal meshes generated with the SMAL parametric model following the official protocol (Zuffi et al. 2017). Given four low-resolution shapes $X_i$ as input, we first compute their spectra Spec($X_i$) and map these to the latent space via π(Spec($X_i$)). We then perform a bilinear interpolation of the resulting latent vectors and reconstruct the corresponding shapes. We perform the same experiment on the human body category using the model $O_{30}$. In Fig. 16, we consider two meshes from the SURREAL test set and two shapes from the FAUST dataset, all remeshed with different densities. The linear interpolation of the latent vectors obtained through π produces meaningful intermediate steps that capture the main intrinsic variations between the subjects involved. Note that pose variations of a human shape are close to isometric deformations and therefore barely affect the Laplacian spectrum. For this reason, the pose of a human body cannot be retrieved from its spectrum. Accordingly, we trained our model only on shapes in T-pose, which explains the pose of the interpolation steps in Fig. 16. Furthermore, our method is robust to changes in connectivity, extrinsic pose and embedding (note the rigid rotation between the initial and final input shapes in the second row).

Fig. 17 Examples of style transfer. The target style (middle) is applied to the target pose (left) by solving problem (9) and then decoding the resulting latent vector, yielding the result shown on the right. For each example we also report the corresponding eigenvalue alignment (rightmost plots). The black dotted line is the image of ρ. The numbers in the legend denote the distance from the target "style" spectrum to the source pose and to our generated shape; a small number suggests that the generated shape is a near-isometry of the style target.
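The bilinear interpolation of four latent codes in the Fig. 14 experiment can be sketched as follows. The corner codes stand for π(Spec($X_i$)), and each grid entry would be decoded with D. The corner placement, the grid size and the function name are illustrative assumptions.

```python
import numpy as np

def bilinear_latent_grid(v00, v10, v01, v11, n=5):
    """n x n grid of bilinear blends of four latent codes placed at the grid corners."""
    corners = [np.asarray(c, dtype=float) for c in (v00, v10, v01, v11)]
    s = np.linspace(0.0, 1.0, n)
    grid = np.empty((n, n, corners[0].size))
    for i, a in enumerate(s):          # a: weight along the first grid axis
        for j, b in enumerate(s):      # b: weight along the second grid axis
            grid[i, j] = ((1 - a) * (1 - b) * corners[0] + a * (1 - b) * corners[1]
                          + (1 - a) * b * corners[2] + a * b * corners[3])
    return grid

# Toy 2-D "latent codes" at the four corners; in the experiment these would be
# the codes pi(Spec(X_i)) of the four input shapes.
grid = bilinear_latent_grid([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], n=5)
```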
Finally, in Fig. 15 we show an example of interactive spectrum-driven shape exploration for the animals class. Given a shape and its Laplacian eigenvalues as input, we navigate the space of shapes by directly modifying different frequency bands with the aid of a simple user interface. The modified spectra are then decoded by our network in real time. The interactive nature of this application is enabled by the efficiency of our shape from spectrum recovery (obtained in a single forward pass) and would not be possible with previous methods (Cosmo et al. 2019) that rely on costly test-time optimization. We refer to the accompanying video and the supplementary materials for additional illustrations.
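The band modifications described in the caption of Fig. 15 amount to rescaling a contiguous range of eigenvalues before decoding. The sketch below assumes a simple multiplicative factor per band; the factors are illustrative, and the actual interface may use a different scheme. For nonnegative, sorted eigenvalues, two operations preserve the ordering: shrinking a prefix by a factor in (0, 1), and amplifying a suffix by a factor greater than 1.

```python
import numpy as np

def modify_band(lam, start, stop, factor):
    """Return a copy of lam with the eigenvalues lam[start:stop] multiplied by factor."""
    mu = np.array(lam, dtype=float)
    mu[start:stop] *= factor
    return mu

lam = np.linspace(0.0, 29.0, 30)                      # made-up sorted spectrum, 30 values
low = modify_band(lam, 0, 12, 0.8)                     # damp the first 12 eigenvalues
high = modify_band(lam, lam.size - 12, lam.size, 1.2)  # amplify the last 12 eigenvalues
# Each modified spectrum would then be decoded in a single forward pass as D(pi(mu)).
```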

Style transfer
As shown in Fig. 1, we can use our trained network to transfer the style of a shape $X_{\mathrm{style}}$ to another shape $X_{\mathrm{pose}}$ that differs from it in both style and pose. This is done by a search in the latent space, phrased as:

$$\min_{v} \; \|\rho(v) - \mathrm{Spec}(X_{\mathrm{style}})\|_2^2 + w\,\|v - E(X_{\mathrm{pose}})\|_2^2 \qquad (9)$$

where w > 0 balances the two terms. The first term seeks a latent vector whose associated spectrum aligns with the eigenvalues of $X_{\mathrm{style}}$. In other words, we regard style as an intrinsic property of the shape, and exploit the fact that the Laplacian spectrum is (nearly) invariant to pose deformations, which are approximately isometric. The second term keeps the latent vector close to that of the input pose (we initialize with $v_{\mathrm{init}} = E(X_{\mathrm{pose}})$). We solve the optimization problem by back-propagating the gradient of the cost function in Eq. (9) with respect to v through ρ.
The sought shape is then obtained by a forward pass of the decoder on the resulting minimizer. Fig. 17 shows four examples, and more can be found in the supplementary material. We emphasize that the style is purely encoded in the input eigenvalues. The method therefore does not rely on the test shapes being in point-to-point correspondence with the training set. This leads to the following: Property 2 Our method can be used in a correspondence-free scenario. By taking eigenvalues as input, it enables applications that traditionally require a correspondence, while side-stepping this requirement.
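The sketch below illustrates the latent search without a trained network. It minimizes the two-term objective $\|A v - \lambda_{\mathrm{style}}\|^2 + w\,\|v - v_{\mathrm{init}}\|^2$ by gradient descent, starting from $v_{\mathrm{init}}$. Here A is a hypothetical linear stand-in for ρ, which puts the gradient in closed form and lets us check the result against the normal equations. In practice ρ is nonlinear and the gradient comes from back-propagation. The weight w, the step size, the iteration count and all numbers are illustrative assumptions.

```python
import numpy as np

def style_transfer_latent(A, lam_style, v_init, w=0.1, lr=0.05, steps=2000):
    """Gradient descent on ||A v - lam_style||^2 + w ||v - v_init||^2, starting at v_init."""
    A = np.asarray(A, dtype=float)
    lam_style = np.asarray(lam_style, dtype=float)
    v_init = np.asarray(v_init, dtype=float)
    v = v_init.copy()                       # initialize at the pose code
    for _ in range(steps):
        # Analytic gradient of the quadratic objective (valid only for linear A).
        grad = 2 * A.T @ (A @ v - lam_style) + 2 * w * (v - v_init)
        v = v - lr * grad
    return v

A = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.3],
              [0.2, 0.0, 1.0],
              [0.0, 0.4, 0.6]])             # hypothetical linear stand-in for rho
lam_style = np.array([0.0, 1.0, 2.5, 3.0])  # made-up "style" spectrum
v_init = np.array([0.2, -0.1, 0.4])         # made-up code standing in for E(X_pose)
v_star = style_transfer_latent(A, lam_style, v_init)

# Closed-form minimizer of the same quadratic, for comparison.
w = 0.1
v_closed = np.linalg.solve(A.T @ A + w * np.eye(3), A.T @ lam_style + w * v_init)
```

The step size is chosen well below the stability limit set by the largest eigenvalue of $A^\top A + wI$. With a nonlinear ρ, an adaptive optimizer such as Adam would be a common choice.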
This observation was also made for other spectrum-based approaches (Cosmo et al. 2019; Rampini et al. 2019). However, the data-driven nature of our method makes it more robust, efficient and accurate, which greatly improves its practical utility.

Super-resolution
A key feature that emerges from the experiment in Fig. 14 is the perfect reconstruction of the low-resolution shapes once their eigenvalues are mapped to the latent space via π. This brings us to a fundamental property of our approach: Property 3 Since eigenvalues are largely insensitive to mesh resolution and sampling, so is our trained network. This is especially evident with a cubic FEM discretization, which we use in all our tests. Cubic FEM approximates the continuous setting more closely and is thus much less affected by the surface discretization. Remark. Existing methods could employ cubic FEM as well. However, this quickly becomes prohibitively expensive, because their optimization requires differentiating the spectral decomposition (Cosmo et al. 2019; Rampini et al. 2019).
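Property 3 can be checked on the simplest possible example. The sketch below uses a linear finite-difference Laplacian on a closed curve of length 2π, whose continuous spectrum is 0, 1, 1, 4, 4, 9, 9, …. This discretization is a swap-in for illustration, not the cubic FEM used in our experiments. The low eigenvalues at a coarse and a fine sampling agree closely with each other and with the continuous values. Higher-order elements such as cubic FEM are expected to shrink the remaining gap further.

```python
import numpy as np

def circle_laplacian_spectrum(n, k=7):
    """Smallest k eigenvalues of the finite-difference Laplacian on a closed curve
    of length 2*pi sampled at n uniformly spaced points."""
    h = 2 * np.pi / n
    L = np.zeros((n, n))
    for i in range(n):
        L[i, i] = 2.0 / h**2
        L[i, (i - 1) % n] = -1.0 / h**2   # periodic neighbors: the curve is closed
        L[i, (i + 1) % n] = -1.0 / h**2
    return np.linalg.eigvalsh(L)[:k]      # eigvalsh returns ascending eigenvalues

coarse = circle_laplacian_spectrum(40)    # low-resolution sampling
fine = circle_laplacian_spectrum(400)     # 10x higher resolution
exact = np.array([0.0, 1.0, 1.0, 4.0, 4.0, 9.0, 9.0])
```

With 40 samples, the first seven eigenvalues are already within about 2% of the continuous ones. With 400 samples, the error drops below 0.1%.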
These properties allow us to use our network for mesh super-resolution. Given a low-resolution mesh as input, we aim to recover a higher-resolution counterpart of it. The input mesh has arbitrary resolution and is unknown to the network, and no correspondence with the training models is given. An additional desideratum is nonetheless for the new shape to be in dense point-to-point correspondence with the models of the training set. We achieve this in a single shot, by predicting the decoded shape as:

$$X_{\mathrm{hires}} = D(\pi(\mathrm{Spec}(X_{\mathrm{lowres}})))$$

This simple approach exploits the resolution-independent geometric information encoded in the spectrum, along with the power of a data-driven generative model.

Fig. 19 Qualitative and quantitative evaluation of point cloud spectrum estimation. Left: qualitative comparison for different samplings on three classes (animals, human faces and objects); the estimated eigenvalues are shown alongside the input point cloud (depicted as surface samplings), together with the ground-truth spectrum (in red). Last two columns: average cumulative error curves evaluated on the FLAME dataset for the two different distributions (F1 and F2) and on ShapeNet (S).
Fig. 18 shows a comparison with two alternatives. The first is a nearest-neighbor baseline in eigenvalue space, which retrieves the training shape with the closest spectrum. The second is the isospectralization method of Cosmo et al. (2019). Since we can exploit cubic FEM, which is less sensitive to changes in resolution, our solution closely reproduces the high-resolution target. Isospectralization correctly aligns the eigenvalues, but it recovers unrealistic shapes due to ineffective regularization. This phenomenon highlights the following: Property 4 Our data-driven approach replaces ad-hoc regularizers, which are difficult to model axiomatically, with realistic priors learned from examples. This is especially important for deformable objects, since shapes falling into the same isometry class are often hard to disambiguate without geometric priors.
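The nearest-neighbor baseline can be sketched as a retrieval in spectrum space. We assume a plain Euclidean distance on the first k eigenvalues; the normalization used for the actual baseline may differ. The function name and the data are hypothetical.

```python
import numpy as np

def nearest_spectrum(query, train_spectra):
    """Index of the training shape whose truncated spectrum is closest (L2) to query."""
    train = np.asarray(train_spectra, dtype=float)
    q = np.asarray(query, dtype=float)
    k = min(q.size, train.shape[1])       # compare only the common leading eigenvalues
    dists = np.linalg.norm(train[:, :k] - q[:k], axis=1)
    return int(dists.argmin())

train_spectra = [[0.0, 1.0, 2.0],
                 [0.0, 2.0, 5.0],
                 [0.0, 3.0, 8.0]]         # made-up training spectra
idx = nearest_spectrum([0.0, 2.1, 4.9, 7.0], train_spectra)
```

The retrieved training mesh is returned as is, so this baseline cannot produce shapes outside the training set.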

Estimating point cloud spectra
As an additional experiment, we show how our network can directly predict Laplacian eigenvalues for unorganized point clouds. This task is particularly challenging because the point set lacks structure. Existing approaches such as Clarenz et al. (2004) and Belkin et al. (2009) often fail to accurately approximate the eigenvalues of the underlying surface.
The difficulty is even more pronounced when the point sets are irregularly sampled, as we show empirically here. In our case, estimating the spectrum boils down to a single forward pass:

$$\mathrm{Spec}(X) = \rho(E(X)) \qquad (11)$$

To address this task, we train our network on unorganized point clouds as input, paired with the spectra computed from the corresponding meshes, which are available at training time. As described in the supplementary materials, in this setting we use a PointNet (Qi et al. 2017) encoder and a fully connected decoder. We also replace the reconstruction loss of Eq. (5) with the Chamfer distance. This application highlights the generality of our model, which can accommodate different representations of geometric data.
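For reference, a symmetric Chamfer distance between two point sets can be sketched as follows. This is one common variant, using mean squared nearest-neighbor distances in both directions. The exact variant used in training is described in the supplementary materials and may differ. The brute-force pairwise computation is quadratic in the number of points, which fits the need for fewer vertices mentioned in the timing details.

```python
import numpy as np

def chamfer_distance(P, Q):
    """Symmetric Chamfer distance between point sets P (n x d) and Q (m x d).

    Sums the mean squared nearest-neighbor distance from P to Q and from Q to P;
    other variants (sums instead of means, unsquared distances) are also common.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=-1)   # n x m squared distances
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

P = np.array([[0.0, 0.0, 0.0]])
Q = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])
cd = chamfer_distance(P, Q)   # 1.0 (P -> Q) + (1.0 + 4.0) / 2 (Q -> P) = 3.5
```

The loss does not depend on the order of the points, which is why it suits unorganized point clouds.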
We consider two types of point clouds: (1) with point density and regularity similar to the training set (shown in the supplementary materials), and (2) with randomized non-uniform sampling. We compare the spectrum estimated via ρ(E(X)) to the axiomatic methods of Clarenz et al. (2004) and Belkin et al. (2009), and to the NN baseline applied in the latent space; see Fig. 19. The qualitative results are obtained by training on SMAL (Zuffi et al. 2017) (left), CoMA (Ranjan et al. 2018) (middle) and watertight ShapeNet (Huang et al. 2018) (right). To assess generalization, we test the network trained on CoMA on point clouds from the FLAME dataset. On ShapeNet, we consider 4 different classes (airplanes, boats, screens and chairs). We compute cumulative error curves of the distance between the estimated eigenvalues and those computed from the meshes corresponding to the test point clouds. The legend also reports the mean error across all test sets. Our method yields a significant improvement over the closest state-of-the-art baseline (Belkin et al. 2009).
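The cumulative error curves can be computed as the fraction of test shapes whose spectrum error falls below each threshold. The per-shape error below is a hypothetical choice for illustration: the mean relative error over nonzero eigenvalues. The exact metric behind Fig. 19 may differ.

```python
import numpy as np

def spectrum_error(estimated, reference):
    """Mean relative error between estimated and reference eigenvalues (hypothetical metric)."""
    est = np.asarray(estimated, dtype=float)
    ref = np.asarray(reference, dtype=float)
    mask = ref > 0          # skip the zero eigenvalue to avoid dividing by zero
    return np.mean(np.abs(est[mask] - ref[mask]) / ref[mask])

def cumulative_error_curve(errors, thresholds):
    """Fraction of test shapes whose error is <= each threshold."""
    errors = np.asarray(errors, dtype=float)
    return np.array([(errors <= t).mean() for t in thresholds])

err = spectrum_error([0.0, 1.1, 1.8], [0.0, 1.0, 2.0])                # 0.1
curve = cumulative_error_curve([0.1, 0.2, 0.3, 0.4], [0.0, 0.15, 0.35, 1.0])
```

A curve that rises faster toward 1 indicates more accurate estimates over the test set.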

Conclusions
We introduced the first data-driven method for shape generation from Laplacian spectra. Our approach enriches a standard AE with a pair of cycle-consistent maps, which associate ordered sequences of eigenvalues with latent codes and vice versa. This explicit coupling brings key advantages of spectral methods to generative models, enabling novel applications and a significant improvement over existing approaches. These maps provide an effective tool for a geometrically meaningful exploration of the latent space, and further allow us to disentangle the intrinsic from the extrinsic information of the shapes. Our main limitation, shared with other spectral methods, is the need for a robust Laplacian discretization. Adopting the recent approach of Sharp et al. (2019) for such borderline cases is a promising possibility. Further, while the Laplacian is a classical choice due to its Fourier-like properties, the spectra of other operators with different properties may enable further applications. Finally, incorporating more complex and structured generative models (e.g., probabilistic or hierarchical ones (Gao et al. 2019)) into our pipeline is a direction for further investigation.