Abstract
In recent years, significant advances have been made in robotic manipulation, but the handling of non-rigid objects, such as cloth garments, remains an open problem. Physical interaction with non-rigid objects is uncertain and complex to model. Thus, extracting useful information from sample data can considerably improve modeling performance. However, training such models is challenging due to the high dimensionality of the state representation. In this paper, we propose Controlled Gaussian Process Dynamical Models (CGPDMs) for learning high-dimensional, nonlinear dynamics by embedding them in a low-dimensional manifold. A CGPDM is constituted by a low-dimensional latent space, with an associated dynamics on which external control variables can act, and a mapping to the observation space. The parameters of both maps are marginalized out by considering Gaussian Process priors. Hence, a CGPDM projects a high-dimensional state space into a lower-dimensional latent space, in which it is feasible to learn the system dynamics from training data. The modeling capacity of CGPDM has been tested in both a simulated and a real scenario, where it proved capable of generalizing over a wide range of movements and confidently predicting the cloth motions produced by previously unseen sequences of control actions.
1 Introduction
Robotic cloth manipulation has a wide range of applications, from the textile industry to assistive robotics [5, 8, 14, 19, 23, 29]. However, the complexity of cloth behavior results in high uncertainty in the state transition given a certain action. This uncertainty is what makes manipulating cloth much more challenging than handling rigid objects. Intuitively, learning the cloth’s dynamics is the way to reduce such uncertainty. In the literature, we can find several cloth models that simulate the internal cloth state [3, 25, 30]. They represent the cloth as a mesh of material points and simulate its behavior taking physical constraints into account. However, fitting those models to real data can be a complex task. Moreover, such models must not only behave sufficiently like the real garment, but also have a tractable dimensionality, for computational reasons. As an example, an \(8\times 8\) mesh representing a square towel results in a 192-dimensional state space. Such dimensionality is unmanageable, not only in terms of computational costs, but also for building a tractable state-action space policy. Such is the case of [4], where simulated results are obtained only after hours of computation.
Hence, Dimensionality Reduction (DR) methods can be very beneficial. In [11], linear DR techniques were used for learning cloth manipulation by biasing the latent space projection with each execution’s performance. Nonlinear methods, such as Gaussian Process Latent Variable Models (GPLVMs) [20], have also been applied for this purpose. In [18], a GPLVM was employed to project task-specific motor skills of the robot onto a much smaller state representation, whereas in [13] a GPLVM was also used to represent a robot manipulation policy in a latent space, taking contextual features into account. However, these approaches focus the dimensionality reduction on the robot action characterization, rather than on the manipulated object’s dynamics. Instead, in [17] a GPLVM learns a latent representation of the cloth state from point clouds. However, this approach did not consider the dynamics of the cloth handling task, limiting its application to quasi-static manipulations.
In this paper, we assume that data from several cloth motions have been recorded as a time-varying mesh of points. To fit such data into a tractable dynamical model, we consider Gaussian Process Dynamical Models (GPDMs), first introduced in [32], which are an extension of the GPLVM structure explicitly oriented to the analysis of high-dimensional time series. GPDMs have been applied in several different fields, from human motion tracking [31, 33] to dynamic texture modeling [35]. In the context of cloth manipulation, GPDMs were adopted in [16] to learn a latent model of the dynamics of a cloth handling task. However, this framework, as it stands, lacks a fundamental component for correctly describing the dynamics of a system, namely control actions, which limits its generalization capacity.
Therefore, we propose here an extension of the GPDM structure that takes into account the influence of external control actions on the modeled dynamics. We call it Controlled Gaussian Process Dynamical Model (CGPDM). In this new version, control actions directly affect the dynamics in the latent space. Thus, a CGPDM trained on a sufficiently diverse set of interactions is able to predict the effects of control actions never experienced before inside a space of reduced dimension, and then reconstruct high-dimensional motions by projecting the latent state trajectories into the observation space. CGPDM has proved capable of fitting different types of cloth movements, in both a simulated and a real cloth manipulation scenario, and of predicting the results of control actions never seen during training (an example is reported in Fig. 1). Finally, we compared two possible CGPDM parameterizations. The first is a straightforward extension of the standard GPDM, whereas in the second we propose to employ squared exponential (SE) kernels with automatic relevance determination (ARD) [24] and inhomogeneous linear kernels, together with tunable dynamical map scaling factors, obtaining better accuracy and generalization, especially in the low-data regime.
To summarize, the main contributions of this article are:
-
The proposal of the CGPDM structure, an extension of the GPDM capable of taking into account the presence of exogenous inputs.
-
The definition of a richer parameterization, able to achieve better accuracy and generalization w.r.t. the standard structure previously employed in the GPDM context.
-
The successful application of the proposed CGPDM to (both simulated and real) dynamic robotic cloth manipulation problems.
The remainder of the paper is structured as follows. Section 2 provides the details of the proposed CGPDM approach. Results obtained by CGPDM in cloth dynamics modeling are described in Sect. 3, both in simulation and in a real case scenario. Finally, the obtained results are discussed in Sect. 4 and conclusions are drawn in Sect. 5.
2 Methods
This section thoroughly describes the proposed method. We start by providing some background notions about the models we build upon: GPs, GPLVMs, and GPDMs (Sect. 2.1). Then, we present the CGPDM (Sect. 2.2), detailing the structure of its latent and dynamics maps. In particular, we present two alternative CGPDM structures: naive and advanced. The first is a straightforward inclusion of exogenous inputs into the standard GPDM, while the second is the proposed CGPDM, characterized by a richer parameterization. Finally, we conclude by describing the model training and prediction procedures (Sect. 2.3).
2.1 Background: From GP to GPDM
GPs [27] are the infinite-dimensional generalization of multivariate Gaussian distributions. They are defined as stochastic processes such that, for any finite set of input locations \({\textbf{x}}_1, ..., {\textbf{x}}_n\), the random variables \(f({\textbf{x}}_1), ..., f({\textbf{x}}_n)\) follow a joint Gaussian distribution. A GP is defined by its mean function \(m({\textbf{x}})\) and kernel \(k({\textbf{x}}, {\textbf{x}}')\), which must be a symmetric and positive semi-definite function. Usually, GPs are denoted as \(f({\textbf{x}}) \sim \mathcal{G}\mathcal{P}(m({\textbf{x}}), k({\textbf{x}}, {\textbf{x}}'))\).
GPs can be used in regression models of the form \(y = f({\textbf{x}}) + \varepsilon \), with \(\varepsilon \) i.i.d. Gaussian noise, as they provide closed-form formulae to predict a new target \(y^*\), given a new input \({\textbf{x}}^*\). GP regression has been widely applied as a data-driven tool for dynamical system identification [15], usually describing each state component by its own GP. Nevertheless, such an approach struggles to scale to high-dimensional systems. Thus, DR strategies must be considered.
GPLVMs [20, 22] emerged as feature extraction methods that can be used as multiple-output GP regression models. Under a DR perspective, these models associate and learn low-dimensional representations of higher-dimensional observed data, assuming that the observed variables are determined by the latent ones. As the result of an optimization, GPLVMs provide a mapping from the latent space to the observation space, together with a set of latent variables representing the observed values. However, GPLVMs are not explicitly designed to deal with time series, where a dynamics relates the values observed at consecutive time steps.
Thus, [32] first introduced Gaussian Process Dynamical Models (GPDM), an extension of the GPLVM structure explicitly oriented to the analysis of high-dimensional time series. A GPDM entails essentially two stages: (i) a latent mapping that projects high-dimensional observations to a low-dimensional latent space; (ii) a discrete-time Markovian dynamics that captures the evolution of the time series inside the reduced latent space. GPs are used to model both maps.
2.2 Controlled GPDM
Let us consider a system governed by an unknown dynamics. At each time step t, \({\varvec{u}}_t \in {\mathbb {R}}^E\) represents the applied control action and \({\varvec{y}}_t \in {\mathbb {R}}^D\) the observation. For high-dimensional observation spaces, it could be unfeasible to directly model the evolution of a sequence of observations in response to a series of inputs. For instance, in the case of a robot moving a piece of cloth, we can consider as control actions \({\varvec{u}}_t\) the instantaneous movements of the end-effector, while the observations \({\varvec{y}}_t\) could be the coordinates of a mesh of material points representing the cloth configuration. In this context, it can be convenient to capture the dynamics of the system in a low-dimensional latent space \({\mathbb {R}}^d\), with \(d \ll D\). Let \({\varvec{x}}_t \in {\mathbb {R}}^d\) be the latent state associated with \({\varvec{y}}_t\). We propose a variation of the GPDM that takes into account the influence of control actions, while maintaining the dimensionality reduction properties of the original model. We call it Controlled Gaussian Process Dynamical Model (CGPDM).
A CGPDM consists of a latent map (1) projecting observations \({\varvec{y}}_t\) into latent states \({\varvec{x}}_t\), and a dynamics map (2) that describes the evolution of \({\varvec{x}}_t\), subject to \({\varvec{u}}_t\). We denote the two maps as,
where \({\varvec{n}}_{y,t}\) and \({\varvec{n}}_{x,t}\) are two zero-mean isotropic Gaussian noise processes, while g and h are two unknown functions. Differently from the original GPDM, here the latent transition function (2) is also influenced by exogenous control inputs \({\textbf{u}}_t\). Note that we consider \({\varvec{x}}_{t+1} - {\varvec{x}}_t\) to be the output of the CGPDM dynamic map; [33] suggested that this choice can improve the smoothness of the latent trajectories. In the following, we report how we modeled (1) and (2) by means of GPs, while Fig. 2 illustrates the relation assumed by CGPDM between the latent, input, and output spaces along N time steps.
2.2.1 Latent variable mapping
Each component of the observation vector \({\varvec{y}}_t = [y_t^{(1)}, \dots , y_t^{(D)}]^T\) can be modeled a priori as a zero-mean GP that takes as input \({\varvec{x}}_t\), for \(t=1,\dots ,N\). Let \({\textbf{Y}} = [ {\varvec{y}}_1,\dots , {\varvec{y}}_N]^T \in {\mathbb {R}}^{N \times D}\) be the matrix that collects the set of N observations, and \({\textbf{X}} = [ {\varvec{x}}_1,\dots , {\varvec{x}}_N]^T \in {\mathbb {R}}^{N \times d}\) be the matrix of associated latent states. We denote with \({\textbf{Y}}_{:,j}\) the vector containing the j-th components of all the N observations. Then, if we assume that the D observation components are independent variables, the probability over the whole set of observations can be expressed by the product of the D GPs. In addition, if we choose the same kernel function \(k_y(\cdot ,\cdot )\) for each GP, differentiated only through a variable scaling factor \(w_{y,j}^{-2}\), with \(j=1,\dots ,D\), the joint likelihood over the whole set of observations is given by
where \({\textbf{W}}_y=\text {diag}(w_{y,1},\dots , w_{y,D})\) and \({\textbf{K}}_y({\textbf{X}})\) is the covariance matrix defined element-wise by \(k_y(\cdot ,\cdot )\). The independence assumption may be relaxed by applying coregionalization models [1], at the cost of greater computational demands. In previous GPDM works [31,32,33], the GPs of the latent map were equipped with an isotropic SE kernel,
with parameters \(\beta _1\) and \(\beta _2\) (with \(\delta ({\varvec{x}}_r,{\varvec{x}}_s)\) we indicate the Kronecker delta). Instead here, we adopt a richer ARD structure for the SE kernel, characterized by a different length-scale for each latent state component:
\(\varvec{\Lambda }_y^{-1} = \text {diag}(\lambda _{y,1}^{-2},\dots ,\lambda _{y,d}^{-2})\) is a positive definite diagonal matrix, which weights the norm used in the SE function, and \(\sigma _y^2\) is the variance of the isotropic noise in (1). The trainable hyper-parameters of the latent map model are then \(\varvec{\theta }_y = \left[ w_{y,1},\dots , w_{y,D}, \lambda _{y,1},\dots ,\lambda _{y,d}, \sigma _y\right] ^T\).
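A possible NumPy sketch of this ARD parameterization is given below. Since Eq. (5) is not reproduced here, the exact constants follow the usual ARD-SE convention, which is an assumption; function names are illustrative:

```python
import numpy as np

def ard_se_kernel(X1, X2, lengthscales):
    """ARD squared-exponential kernel: one length-scale per latent dimension,
    i.e. exp(-0.5 * (x - x')^T Lambda^{-1} (x - x')) with
    Lambda^{-1} = diag(1 / lengthscales**2)."""
    S1 = X1 / lengthscales          # rescale each latent component
    S2 = X2 / lengthscales
    d2 = ((S1[:, None, :] - S2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2)

def latent_map_covariance(X, lengthscales, sigma_y):
    """Shared covariance K_y(X) for all D outputs, including the isotropic
    noise term; output j then uses w_{y,j}^{-2} * K_y as its covariance."""
    return ard_se_kernel(X, X, lengthscales) + sigma_y**2 * np.eye(len(X))
```

A large learned length-scale \(\lambda_{y,i}\) effectively switches off the i-th latent component, which is the automatic relevance determination effect.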
2.2.2 Dynamics mapping
Similarly to Sect. 2.2.1, we can model a priori each component of the latent state difference \({\varvec{x}}_{t+1}-{\varvec{x}}_t = [x_{t+1}^{(1)}-x_t^{(1)}, \dots , x_{t+1}^{(d)}-x_t^{(d)} ]^T\) as a zero-mean GP that takes as input the pair \(({\varvec{x}}_t,{\varvec{u}}_t)\), for \(t=1,\dots ,N-1\).
Let \({\textbf{X}} = [ {\varvec{x}}_1,\dots , {\varvec{x}}_N]^T \in {\mathbb {R}}^{N\times d}\) be the matrix collecting the set of N latent states. We denote by \({\textbf{X}}_{r:s,i}\) the vector of the i-th components from time step r to time step s, with \(r,s=1,\dots ,N\). We indicate the vector of differences between consecutive latent states along their i-th component with \({\varvec{{\Delta }}}_{:,i} = ({\textbf{X}}_{2:N,i} - {\textbf{X}}_{1:N-1,i})\in {\mathbb {R}}^{N-1}\), while \({\varvec{{\Delta }}} = [{\varvec{{\Delta }}}_{:,1},\dots ,{\varvec{{\Delta }}}_{:,d} ]\in {\mathbb {R}}^{(N-1)\times d}\) is the matrix that collects the differences along all the components.
Finally, we compactly represent the GP input of the dynamic model as \(\tilde{{\varvec{x}}}_t = [{\varvec{x}}_t^T, {\varvec{u}}_t^T]^T \in {\mathbb {R}}^{d+E}\), and refer to the matrix collecting \(\tilde{{\varvec{x}}}_t\) for \(t=1,\dots ,N-1\) with \(\tilde{{\textbf{X}}} = \left[ \tilde{{\varvec{x}}}_1,\dots , \tilde{{\varvec{x}}}_{N-1}\right] ^T \in {\mathbb {R}}^{(N-1) \times (d+E)}\). With assumptions similar to the ones made for the latent map, and denoting the common kernel function for all the GPs with \(k_x(\cdot ,\cdot )\) and the different scaling factors with \(w_{x,i}\), for \(i=1,\dots ,d\), the joint likelihood is given by
where \({\textbf{W}}_x=\text {diag}(w_{x,1},\dots ,w_{x,d})\) and \({\textbf{K}}_x(\tilde{{\textbf{X}}})\) is the covariance matrix defined by \(k_x(\cdot ,\cdot )\). In the standard GPDM [32], the dynamic mapping GPs were proposed with constant scaling factors \(w_{x,i}=1\), for \(i=1,\dots ,d\), and equipped with a naive kernel resulting from the sum of an isotropic SE and a homogeneous linear function, with only four trainable parameters:
Analogously to the latent mapping case, we decided to adopt the following kernel function,
\(\varvec{\Lambda }_{x}^{-1} = \text {diag}(\lambda _{x,1}^{-2},\dots ,\lambda _{x,d+E}^{-2})\) is a positive definite diagonal matrix, which weights the norm used in the SE component of the kernel. Similarly, \(\varvec{\Phi } = \text {diag}(\phi _{1}^2,\dots ,\phi _{d+E+1}^2) \) is a positive definite diagonal matrix that describes the linear component, and \(\sigma _x^2\) is the variance of the isotropic noise in (2). In comparison with (7), the adopted kernel weights the various components of the input differently in both the SE and linear parts, where the GP input is also extended as \(\left[ \tilde{{\varvec{x}}}_s^T, 1\right] ^T\). The trainable hyper-parameters of the dynamic map model are then \(\varvec{\theta }_x = \left[ w_{x,1},\dots , w_{x,d}, \lambda _{x,1},\dots ,\lambda _{x,d+E}, \phi _{1},\dots ,\phi _{d+E+1}, \sigma _x\right] ^T\).
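The structure of this richer dynamics kernel can be sketched as follows. Eq. (8) itself is not reproduced above, so the combination of an ARD-SE term with an inhomogeneous linear term over the extended input \(\left[ \tilde{{\varvec{x}}}^T, 1\right] ^T\) is reconstructed from the textual description and standard kernel conventions:

```python
import numpy as np

def advanced_dyn_kernel(Xt1, Xt2, lengthscales, phi, add_noise=False, sigma_x=0.0):
    """Advanced CGPDM dynamics kernel sketch: ARD-SE term on the
    (d+E)-dimensional input plus an inhomogeneous linear term, obtained by
    extending the input with a constant 1 and weighting with Phi = diag(phi**2).
    lengthscales has length d+E, phi has length d+E+1."""
    S1, S2 = Xt1 / lengthscales, Xt2 / lengthscales
    K_se = np.exp(-0.5 * ((S1[:, None, :] - S2[None, :, :]) ** 2).sum(-1))
    E1 = np.hstack([Xt1, np.ones((len(Xt1), 1))])   # extended input [x_tilde; 1]
    E2 = np.hstack([Xt2, np.ones((len(Xt2), 1))])
    K_lin = (E1 * phi**2) @ E2.T                    # inhomogeneous linear term
    K = K_se + K_lin
    if add_noise:
        K = K + sigma_x**2 * np.eye(len(Xt1))
    return K
```

The linear term lets the model capture global, approximately linear trends in the latent dynamics, while the ARD-SE term accounts for local nonlinearities with a per-component sensitivity.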
In the following, we use naive CGPDM to refer to the model that straightforwardly extends the standard GPDM structure from [32], using its same kernels, (4) and (7), and constant scaling factors; we instead denote with advanced CGPDM the proposed model, characterized by kernels (5) and (8) and trainable scaling factors in the dynamical map. Although ARD kernels are commonly adopted in GP regression [27], they had not been tested before in GPDMs. Trainable scaling factors also constitute a novelty for this kind of model.
2.2.3 Working with multiple sequences
It is possible to easily extend the CGPDM formulation to P multiple sequences of observations, \({\textbf{Y}}^{(1)}, \dots , {\textbf{Y}}^{(P)}\), and control inputs, \({\textbf{U}}^{(1)}, \dots , {\textbf{U}}^{(P)}\). Let the length of each sequence p, for \(p=1,\dots ,P\), be equal to \(N_p\), with \(\sum _{p=1}^PN_p = N\). Define the latent states associated with each sequence as \({\textbf{X}}^{(1)}, \dots , {\textbf{X}}^{(P)}\). Following the notation of Sect. 2.2.2, define \(\tilde{{\textbf{X}}}^{(1)}, \dots , \tilde{{\textbf{X}}}^{(P)}\), as the sequence of the aggregated matrices of latent states and control inputs, and \({\varvec{\Delta }}^{(1)}, \dots , {\varvec{\Delta }}^{(P)}\) as the difference matrices. Hence, model joint likelihoods can be calculated by using the following concatenated matrices inside (3) and (6): \({\textbf{Y}} = [{\textbf{Y}}^{(1)T}\vert \dots \vert {\textbf{Y}}^{(P)T}]^T\), \({\textbf{X}} = [{\textbf{X}}^{(1)T}\vert \dots \vert {\textbf{X}}^{(P)T}]^T\), \({\varvec{\Delta }} = [{\varvec{\Delta }}^{(1)T}\vert \dots \vert {\varvec{\Delta }}^{(P)T}]^T\) and \(\tilde{{\textbf{X}}} = [\tilde{{\textbf{X}}}^{(1)T}\vert \dots \vert \tilde{{\textbf{X}}}^{(P)T}]^T\). Note that, when dealing with multiple sequences, the number of data points in the dynamic mapping becomes \(N-P\), and expression (6) must change accordingly.
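The bookkeeping described above can be sketched in a few lines of NumPy (function and variable names are illustrative):

```python
import numpy as np

def aggregate_sequences(Y_list, X_list, U_list):
    """Build the concatenated training matrices of Sect. 2.2.3 from P sequences.
    The last step of each sequence has no successor, so the dynamics-map data
    (Delta, X_tilde) contain N - P rows in total."""
    Y = np.vstack(Y_list)
    X = np.vstack(X_list)
    Delta = np.vstack([Xp[1:] - Xp[:-1] for Xp in X_list])
    X_tilde = np.vstack([np.hstack([Xp[:-1], Up[: len(Xp) - 1]])
                         for Xp, Up in zip(X_list, U_list)])
    return Y, X, Delta, X_tilde
```

Concatenating after (rather than before) forming the per-sequence differences is what prevents spurious transitions from the end of one sequence to the start of the next.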
2.3 CGPDM training and prediction
Training a CGPDM entails using numerical optimization techniques to estimate the unknowns in the model, i.e., latent states \({\textbf{X}}\) and the hyper-parameters \(\varvec{\theta }_x,\varvec{\theta }_y\). Latent coordinates \({\textbf{X}}\) are initialized by means of PCA [6], selecting the first d principal components of \({\textbf{Y}}\). A natural approach for training CGPDMs is to maximize the joint log-likelihood \(\text {ln}\;p({\textbf{Y}}\vert {\textbf{X}}) +\text {ln}\;p({\varvec{\Delta }}\vert \tilde{{\textbf{X}}})\) w.r.t. \(\{{\textbf{X}}, \varvec{\theta }_x,\varvec{\theta }_y\}\). To do so, in this work, we adopted the L-BFGS algorithm [9].
The overall loss to be optimized can be written as \({\mathcal {L}} = {\mathcal {L}}_y+ {\mathcal {L}}_x\), with \({\mathcal {L}}_y\) and \({\mathcal {L}}_x\) defined as
In case the CGPDM is trained on multiple sequences of inputs and observations, the aggregated matrices defined in Sect. 2.2.3 must be employed when computing the loss functions (9)-(10). It is also necessary to use the factor \(N-P\) instead of \(N-1\) inside the \({\mathcal {L}}_x\) expression. The overall training procedure is represented schematically in Fig. 3.
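For reference, a common GPDM-style form of each loss term is \(\frac{D}{2}\ln \vert {\textbf{K}}\vert + \frac{1}{2}\text {tr}({\textbf{K}}^{-1}{\textbf{Y}}{\textbf{W}}^2{\textbf{Y}}^T) - N\ln \vert {\textbf{W}}\vert \). Since Eqs. (9)-(10) are not reproduced here, the following NumPy sketch of such a term is an assumption based on the GPDM literature rather than the paper's exact expressions:

```python
import numpy as np

def multi_output_gp_nll(K, Y, w):
    """Negative log-likelihood of D outputs sharing kernel matrix K, each
    scaled by w_j (GPDM-style, additive constants dropped):
    D/2 ln|K| + 1/2 tr(K^{-1} Y W^2 Y^T) - N ln|W|."""
    N, D = Y.shape
    L = np.linalg.cholesky(K)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    Yw = Y * w                                # scale each output column by w_j
    quad = np.sum(Yw * np.linalg.solve(K, Yw))
    return 0.5 * D * logdet + 0.5 * quad - N * np.sum(np.log(w))
```

In practice, the total loss would sum one such term for the latent map (with the observations \({\textbf{Y}}\)) and one for the dynamics map (with the differences \({\varvec{\Delta }}\)), minimized jointly over latent coordinates and hyper-parameters with L-BFGS, e.g. via `scipy.optimize.minimize(method="L-BFGS-B")` or `torch.optim.LBFGS`.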
A trained CGPDM can be used to fulfill two different purposes: (i) map a given new latent state \({\varvec{x}}_t^*\) to the corresponding \({\varvec{y}}_t^*\) in observation space, (ii) predict the evolution of the latent state at the next time step \({\varvec{x}}_{t+1}^*\), given \({\varvec{x}}_{t}^*\) and a certain control \({\varvec{u}}_{t}^*\). The two processes, together, can predict the observations produced by a given series of control actions.
2.3.1 Latent prediction
Given \({\varvec{x}}_t^*\), its corresponding \({\varvec{y}}_t^*\) is distributed as \(p({\varvec{y}}_t^*\vert {\varvec{x}}_t^*, {\textbf{X}}, \varvec{\theta }_y) = {\mathcal {N}}(\varvec{\mu }_y({\varvec{x}}_t^*),v_y({\varvec{x}}_t^*){\textbf{W}}_y^{-2})\), with
where \({\varvec{k}}_y({\varvec{x}}_t^*,{\textbf{X}}) = \left[ k_y({\varvec{x}}_t^*,{\varvec{x}}_1),\dots , k_y({\varvec{x}}_t^*,{\varvec{x}}_N)\right] ^T\).
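The latent-to-observation prediction can be sketched as follows, here with a fixed isotropic SE kernel standing in for the trained \(k_y\) (kernel choice and noise level are illustrative):

```python
import numpy as np

def se_kernel(X1, X2, lengthscale=0.5):
    """Illustrative stand-in for the trained latent-map kernel k_y."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def predict_observation(x_star, X, Y, sigma2=1e-4):
    """Posterior mean mu_y(x*) over all D outputs and shared scalar variance
    v_y(x*) (Sect. 2.3.1); output j then has variance v_y(x*) * w_{y,j}^{-2}."""
    K = se_kernel(X, X) + sigma2 * np.eye(len(X))
    k_star = se_kernel(X, x_star[None, :])[:, 0]
    alpha = np.linalg.solve(K, k_star)
    mean = Y.T @ alpha                        # mu_y(x*), shape (D,)
    var = 1.0 + sigma2 - k_star @ alpha       # k_y(x*, x*) = 1 for this kernel
    return mean, var
```

Note that a single kernel solve yields the mean of all D observation components at once, which is what makes the shared-kernel structure efficient for \(D=192\)-dimensional meshes.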
2.3.2 Dynamics prediction
Given \({\varvec{x}}_t^*\) and \({\varvec{u}}_t^*\), let us define \(\tilde{{\varvec{x}}}_t^*=[{\varvec{x}}_t^{*T},{\varvec{u}}_t^{*T}]^T\). The probability density of the latent state at the next time step \({\varvec{x}}_{t+1}^*\) is \(p({\varvec{x}}_{t+1}^*\vert \tilde{{\varvec{x}}}_t^*, {\textbf{X}}, \varvec{\theta }_x) = {\mathcal {N}}(\varvec{\mu }_x({\varvec{x}}_t^*),v_x({\varvec{x}}_t^*){\textbf{W}}_x^{-2})\), with
with \({\varvec{k}}_x(\tilde{{\varvec{x}}}_t^*,\tilde{{\textbf{X}}})=\left[ k_x(\tilde{{\varvec{x}}}_t^*,\tilde{{\varvec{x}}}_1)\dots k_x(\tilde{{\varvec{x}}}_t^*,\tilde{{\varvec{x}}}_{N-1})\right] ^T\).
2.3.3 Trajectory prediction
Starting from an initial latent state \({\varvec{x}}_1^*\), one can predict the system evolution over a desired horizon of length \(N_d\), when subject to a given sequence of control actions \({\varvec{u}}_1^*,\dots ,{\varvec{u}}_{N_d-1}^*\). At each time step \(t=1,\dots ,N_d-1\), \({\varvec{x}}_{t+1}^*\) can be sampled from the normal distribution \(p({\varvec{x}}_{t+1}^*\vert \tilde{{\varvec{x}}}_t^*, {\textbf{X}}, \varvec{\theta }_x)\) defined in Sect. 2.3.2. Hence, the generated trajectory in the latent space \({\varvec{x}}_1^*,\dots ,{\varvec{x}}_{N_d}^*\) can be mapped into the associated sequence of observations \({\varvec{y}}_1^*,\dots ,{\varvec{y}}_{N_d}^*\) by considering the previously defined probability distribution \(p({\varvec{y}}_t^*\vert {\varvec{x}}_t^*, {\textbf{X}}, \varvec{\theta }_y)\).
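A mean-propagation version of this rollout (propagating predictive means instead of sampling, an illustrative simplification) can be sketched as:

```python
import numpy as np

def rollout(x1, U, predict_delta, predict_obs):
    """Rollout sketch (Sect. 2.3.3): starting from x1, iterate the dynamics
    map over the control sequence U, then map every latent state to
    observation space. predict_delta(x, u) returns the mean of x_{t+1} - x_t;
    predict_obs(x) returns the mean observation. An uncertainty-aware rollout
    would instead sample from the predictive Gaussians at each step."""
    X_star = [np.asarray(x1, dtype=float)]
    for u in U:
        X_star.append(X_star[-1] + predict_delta(X_star[-1], u))
    Y_star = np.array([predict_obs(x) for x in X_star])
    return np.array(X_star), Y_star
```

Because all predictions stay in the d-dimensional latent space until the final mapping, a long-horizon rollout remains cheap even for high-dimensional observations.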
3 Results
We employed the proposed CGPDM to model the high-dimensional dynamics that characterizes the motion of a piece of cloth held by a robotic system. This section reports the results obtained in two sets of experiments: a simulated session (Sect. 3.1) and one conducted on a real setup (Sect. 3.2). We exploited simulation to assess the performance of CGPDM over a wide set of scenarios (different amount of training data, motion ranges, and model structure), while the real-world experiment served as validation over non-synthetic data. The objective of the experiments was to learn the high-dimensional cloth dynamics using CGPDM, in order to make predictions about cloth movements in response to sequences of actions that were not seen during training. In particular, we aimed to evaluate how model prediction accuracy is affected by:
-
the amount of data used for training,
-
the oscillation range of the cloth movements,
-
the use of advanced or naive CGPDM structures (as defined in Sect. 2.2).
Such a high-dimensional task would be infeasible to model by standard GP regression without DR. CGPDMs were implemented in Python,\(^{1}\) employing PyTorch [26].
3.1 Simulated cloth experiment
In the simulated scenario, we considered a bimanual robot moving a square piece of cloth by holding its two upper corners, as shown in Fig. 4. The cloth was modeled as an \(8\times 8\) mesh of material points. We made the assumption that the two upper corner points are attached to the robot’s end-effectors, while the other points move freely, following the dynamical model proposed in [12].
In this context, the observation vector is given by the Cartesian coordinates of all the points in the mesh (measured in meters); hence \({\varvec{y}}_t\in {\mathbb {R}}^D\) with \(D=192\). We assumed exact control of the two robot arms in the operational space, keeping the same orientation and relative distance between the two end-effectors and producing oscillations in the Y-Z plane. Thus, we considered as control actions the differences between consecutive commanded end-effector positions in the Y and Z directions, resulting in \({\varvec{u}}_t\in {\mathbb {R}}^E\) with \(E=2\).
3.1.1 Data collection
Training and test data were obtained by recording the mesh trajectories associated with several types of cloth oscillation, obtained by applying different sequences of control actions. All the considered trajectories start from the same cloth configuration and last 5 seconds. Observations were recorded at 20 Hz; hence, each sequence consists of \(N=100\) steps.
The robot end-effectors move in a coordinated fashion, drawing oscillations on the Y-Z plane. Let \({\varvec{u}}_t = \left[ \Delta ee^Y_t, \Delta ee^Z_t\right] ^T\), where \(\Delta ee^Y_t\) and \(\Delta ee^Z_t\) indicate the difference between consecutive end-effector commanded positions along the Y and Z axes. Specifically, their values were given by the two following periodic expressions:
Such controls make the end-effectors oscillate on the Y-Z plane of the operational space. The maximum displacement is regulated by A, which we set to 0.01 meters. Parameter \(\gamma \) can be interpreted as the inclination of \({\varvec{u}}_1\) w.r.t. the horizontal, and it loosely defines the direction of the oscillation. \(f_Y\) and \(f_Z\) define the frequencies of the oscillations along the Y and Z axes. If they are similar, the end-effectors move mostly along the direction defined by \(\gamma \); if not, they sweep a broader space.
In order to obtain a heterogeneous set of trajectories for the composition of training and test sets, we collected several movements obtained by randomly choosing the control parameters \(\gamma \), \(f_Y\) and \(f_Z\). Angles \(\gamma \) were uniformly sampled inside a variable range \([-\frac{R}{2},\frac{R}{2}]\) (deg); in the following, we refer to this range by its angular width R (deg). Frequencies \(f_Y\) and \(f_Z\) were instead uniformly sampled inside the fixed interval [0.3, 0.6] (Hz). We considered four movement ranges of increasing width, namely \(R\in \{30{^\circ },60{^\circ },90{^\circ },120{^\circ }\}\) (Fig. 5), and collected a specific dataset \({\mathcal {D}}_R\) associated with each range. Every set contains 50 cloth trajectories obtained by applying control actions of the form (15) with 50 different random choices of the parameters \(\gamma \), \(f_Y\) and \(f_Z\). From each \({\mathcal {D}}_R\), 10 trajectories were extracted and used as test sets \({\mathcal {D}}_R^{test}\) for the corresponding movement range, while several training sets \({\mathcal {D}}_R^{train}\) were built by randomly picking from the remaining sequences.
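The parameter sampling described above can be sketched as follows (function name and seeding are illustrative; the control waveform (15) itself is applied downstream and not reproduced here):

```python
import numpy as np

def sample_control_params(R_deg, n=50, f_range=(0.3, 0.6), seed=None):
    """Sample the parameters of the oscillatory controls for one dataset D_R:
    gamma uniform in [-R/2, R/2] (deg), f_Y and f_Z uniform in f_range (Hz).
    The resulting triples parameterize the control expression (15)."""
    rng = np.random.default_rng(seed)
    gamma = rng.uniform(-R_deg / 2.0, R_deg / 2.0, size=n)
    f_Y = rng.uniform(f_range[0], f_range[1], size=n)
    f_Z = rng.uniform(f_range[0], f_range[1], size=n)
    return gamma, f_Y, f_Z
```

Fixing the seed per dataset makes each \({\mathcal {D}}_R\) reproducible while still covering the range of oscillation directions and frequencies.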
3.1.2 Model training
In all the models, we adopted a latent space of dimension \(d=3\), resulting in a dimensionality reduction factor of \(D/d=64\). This value of d was chosen empirically after preliminary tests, and it allows us to easily visualize the behavior of the latent variables in a three-dimensional space, see for instance Fig. 1. Other choices are possible, but such a sensitivity analysis is beyond the scope of this experimental analysis.
The objective of the experiment was to evaluate CGPDM prediction accuracy at different movement ranges, and for different amounts of training data. Moreover, we wanted to observe whether the use of the proposed advanced CGPDM structure yields a substantial difference in terms of accuracy when compared to the naive model. Consequently, for each considered movement range R, we trained two different sets of CGPDMs, adopting in one the naive structure and in the other the advanced one. Each model in the two sets was trained employing an increasing number of sequences randomly picked from \({\mathcal {D}}_R^{train}\). Specifically, we used five different random combinations of 5, 10, 15 and 20 sequences for each oscillation range (varying the random seed each time). In this way, we were able to reduce the dependence on the specific training trajectories considered, and to average prediction accuracy over different possible sets of training data.
3.1.3 Model prediction
We used each learned CGPDM to predict the cloth movements when subject to the control actions observed for each test sequence inside \({\mathcal {D}}_R^{test}\), with \(R\in \{30{^\circ },60{^\circ },90{^\circ },120{^\circ }\}\). Let \({\varvec{y}}_t^{(R,k)}\) and \({\varvec{u}}_t^{(R,k)}\) denote, respectively, the observation and control action at time step t of the k-th test trajectory in \({\mathcal {D}}_R^{test}\) (with \(k=1,\dots ,10\)).
For each considered range R, one can follow the procedure of Sect. 2.3.3 and employ the trained CGPDMs to predict the trajectories resulting from the application of \(\{{\varvec{u}}_t^{(R,k)}\}_{t=1}^{N-1}\), for \(k=1,\dots ,10\). Let \({\varvec{x}}_t^{*(R,k)}\) be the predicted latent state at time t, and \({\varvec{y}}_t^{*(R,k)}\) the corresponding predicted observation. As an example, in Fig. 6 we show a sequence of true and predicted cloth configurations for one of the considered test trajectories. Please refer to the video\(^{2}\) for a clearer visualization of the obtained results.
For every predicted trajectory, we measured the average distance between the real and the predicted mesh points. Figure 7 represents the observed errors by means of boxplots, indicating also the statistical significance of the naive-advanced difference in each experiment configuration (T-test performed using the open-source library Statannotations\(^{3}\)). Moreover, Table 1 reports the average distances between true and predicted mesh points obtained on the test sets by the different CGPDM configurations in all the movement ranges. Results are expressed in terms of means and 95% confidence intervals obtained by averaging over the different training sets adopted (all the experiments were repeated five times, using a randomly composed \({\mathcal {D}}_R^{train}\)).
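The reported error metric can be sketched as below, under the assumption (not stated explicitly in the text) that each observation row stores the mesh as flattened (x, y, z) triplets:

```python
import numpy as np

def mean_point_distance(Y_true, Y_pred):
    """Average Euclidean distance between true and predicted mesh points.
    Assumes each observation row stores the mesh as flattened (x, y, z)
    triplets, i.e. D = 3 * number_of_points (the ordering is an assumption)."""
    P_true = Y_true.reshape(Y_true.shape[0], -1, 3)
    P_pred = Y_pred.reshape(Y_pred.shape[0], -1, 3)
    return float(np.linalg.norm(P_true - P_pred, axis=2).mean())
```

Averaging over both points and time steps yields a single scalar per trajectory, which is what the boxplots and Table 1 aggregate.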
3.2 Real cloth experiment
In this second set of experiments, we tested CGPDM on data collected in a real cloth manipulation scenario. For this purpose, we used a Barrett WAM Arm,\(^{4}\) whose end-effector consists of a coat rack that can firmly grip a piece of cloth by its corners. The overall setup is depicted in Fig. 8. We controlled the robot’s end-effector in position, recording the resulting movement of the cloth through a motion capture system based on information extracted from an RGBD camera. We combined object detection, image processing, and point cloud processing for segmenting cloth-like objects,\(^{5}\) following [7, 28] and [34].
3.2.1 Data collection
As in the simulated scenario, we captured the cloth as an 8\(\times \)8 mesh of points, whose spatial coordinates constitute the observation vector \({\varvec{y}}_t\in {\mathbb {R}}^D\) with \(D=192\).
Control actions were defined again following expression (15) and commanded to the robot at 100 Hz. Parameters \(f_Y\) and \(f_Z\) were uniformly sampled within [0.2, 0.5] (Hz) and A was set to 0.004 meters. In this experiment, we considered only the \(R=30^\circ \), \(R=60^\circ \), and \(R=90^\circ \) oscillation ranges (\(R=120^\circ \) was excluded because of robot workspace limitations).
The motion capture system could work only at rates lower than 100 Hz, with no guaranteed sampling interval length. Thus, it was necessary to post-process the data to make them suitable for modeling. First, motion capture data were smoothed by a moving average filter. Then, we interpolated the positions of both the end-effector and the cloth mesh, to obtain two synchronized sequences of observations and control actions, sampled at 20 Hz. For each of the three ranges, we collected 10 trajectories, each lasting 3 seconds.
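A minimal sketch of this post-processing step, with an illustrative moving-average window and linear interpolation (the paper's exact filter parameters are not stated):

```python
import numpy as np

def smooth_and_resample(t, signal, t_new, window=5):
    """Post-processing sketch: moving-average smoothing of each coordinate of
    an irregularly sampled signal, followed by linear interpolation onto a
    uniform target grid t_new (e.g. 20 Hz). Window length and interpolation
    scheme are illustrative choices."""
    kernel = np.ones(window) / window
    out = np.empty((len(t_new), signal.shape[1]))
    for j in range(signal.shape[1]):
        smoothed = np.convolve(signal[:, j], kernel, mode="same")
        out[:, j] = np.interp(t_new, t, smoothed)
    return out
```

Resampling both the mesh and the end-effector streams onto the same 20 Hz grid is what synchronizes observations and control actions for training.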
3.2.2 Model training & prediction
For every considered oscillation range, we trained two sets of CGPDMs, one using the naive and one the advanced model structure. Since each set of trajectories comprises only 10 sequences, we followed a leave-one-out cross-validation procedure for training and testing the models: at every range, we trained the models on all the sequences but one, held out for testing, and repeated the procedure ten times, varying the test sequence each time.
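The leave-one-out procedure can be sketched as follows; `train_fn` and `error_fn` are hypothetical placeholders standing in for the actual CGPDM training and prediction-error routines.

```python
def leave_one_out(sequences, train_fn, error_fn):
    """Leave-one-out cross-validation over a list of trajectories.

    train_fn(train_seqs) -> model
    error_fn(model, test_seq) -> float (e.g., mean mesh-point distance)
    Returns the list of per-fold test errors.
    """
    errors = []
    for i in range(len(sequences)):
        # Hold out the i-th sequence, train on the rest
        train_seqs = sequences[:i] + sequences[i + 1:]
        model = train_fn(train_seqs)
        errors.append(error_fn(model, sequences[i]))
    return errors
```

With 10 sequences per range, this yields 10 folds per model structure, matching the procedure described above.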
The models were used to predict the cloth movements obtained in response to the control actions of each test trajectory, measuring the average distance between the real and the predicted mesh points. In Fig. 9, we provide a visual representation of the cloth movements, showing the true and predicted trajectories of a subset of mesh points in one of the test cases. Please refer to the video (footnote 2) for a better visualization of the obtained results. As in the simulated experiments, Fig. 10 shows the observed errors as boxplots, and the last row of Table 1 reports the mean distances between true and predicted mesh points obtained in all the considered movement ranges.
4 Discussion
The experimental results obtained in simulation confirm the capacity of CGPDM to capture the cloth dynamics of oscillations along the Y and Z axes. When trained with a sufficient amount of data, CGPDMs obtained satisfactory results over a variety of movement ranges. Training with only five sequences seems insufficient to properly capture the considered dynamics. When moving from 10 to 15 training sequences, the observed errors diminish significantly, whereas working with 20 training trajectories produces minor signs of over-fitting.
For smaller movement ranges (\(R=30^\circ \) or \(R=60^\circ \)), the reconstructed trajectories of the mesh points appear similar to the true ones. Conversely, for wider ranges (\(R=90^\circ \) or \(R=120^\circ \)), discrepancies between true and predicted points become more evident, but the CGPDMs are still able to capture the overall movement of the cloth.
Moreover, the proposed advanced CGPDM structure significantly improves the accuracy and consistency of the results in the majority of cases, when compared to the naive model. This effect is clearest in low-data regimes and when dealing with wide oscillation ranges.
Finally, the results obtained in the real-world experiments confirm the trends observed in simulation. The advanced CGPDM structure drastically outperforms the naive model, which seems unable to cope with the high noise affecting the real experimental setup.
5 Conclusion
We presented CGPDM, a modeling framework for high-dimensional dynamics governed by control actions. Essentially, this model projects observations into a low-dimensional latent space, where dynamical relations are easier to infer. CGPDMs were applied to a robotic cloth manipulation task, where the observations are the coordinates of the cloth mesh points. We tested CGPDMs in both simulated and real experiments. The results empirically demonstrate that the proposed advanced CGPDM structure can capture complex, high-dimensional cloth dynamics from a small number of trajectories, leveraging the data efficiency that characterizes GP-based methods.
In future work, we aim to apply CGPDM within model-based reinforcement learning algorithms (such as [2, 10]) to automatically learn control policies for high-dimensional systems. Moreover, the CGPDM formulation could be extended through the introduction of back constraints [21], to preserve local distances and obtain an explicit mapping from the observation space to the latent space. Finally, the integration of context variables within the CGPDM formulation could enable generalization over different types of cloth fabric.
Data availability
The datasets generated and analyzed during the current study are available in the open-source code repository cgpdm_lib, https://github.com/fabio-amadio/cgpdm_lib.
Notes
1. Code publicly available at https://github.com/fabio-amadio/cgpdm_lib
2. Videos of the experiments (simulated and real) are available at https://youtu.be/JnqkelnP5-E
3. Statannotations library available at https://github.com/trevismd/statannotations
4. Barrett WAM Arm: https://advanced.barrett.com/wam-arm-1
5. Code publicly available at https://github.com/MiguelARD/cloth_point_cloud_segmentation
References
Alvarez MA, Rosasco L, Lawrence ND et al (2012) Kernels for vector-valued functions: a review. Found Trends Mach Learn 4(3):195–266
Amadio F, Dalla Libera A, Antonello R et al (2022) Model-based policy search using Monte Carlo gradient estimation with real systems application. IEEE Trans Robot. https://doi.org/10.1109/TRO.2022.3184837
Baraff D, Witkin A (1998) Large steps in cloth simulation. In: Proceedings of the 25th annual conference on Computer graphics and interactive techniques, pp 43–54
Baraff D, Witkin A (2016) Dexterous manipulation of cloth. Comput Graph Forum 35(2):523–532
Bersch C, Pitzer B, Kammel S (2011) Bimanual robotic cloth manipulation for laundry folding. In: 2011 IEEE/RSJ International conference on intelligent robots and systems, pp 1413–1419, https://doi.org/10.1109/IROS.2011.6095109
Bishop CM (2006) Pattern recognition and machine learning. Springer, Berlin
Bochkovskiy A, Wang CY, Liao HYM (2020) YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934
Borràs J, Alenyà G, Torras C (2020) A grasping-centered analysis for cloth manipulation. IEEE Trans Rob 36(3):924–936. https://doi.org/10.1109/TRO.2020.2986921
Byrd RH, Lu P, Nocedal J et al (1995) A limited memory algorithm for bound constrained optimization. SIAM J Sci Comput 16(5):1190–1208
Chatzilygeroudis K, Mouret JB (2018) Using parameterized black-box priors to scale up model-based policy search for robotics. In: 2018 IEEE international conference on robotics and automation (ICRA), pp 5121–5128, https://doi.org/10.1109/ICRA.2018.8461083
Colomé A, Torras C (2018) Dimensionality reduction for dynamic movement primitives and application to bimanual manipulation of clothes. IEEE Trans Rob 34(3):602–615. https://doi.org/10.1109/TRO.2018.2808924
Coltraro F, Amorós J, Alberich-Carramiñana M et al (2022) An inextensible model for the robotic manipulation of textiles. Appl Math Model 101:832–858. https://doi.org/10.1016/j.apm.2021.09.013
Delgado-Guerrero JA, Colomé A, Torras C (2020) Contextual policy search for micro-data robot motion learning through covariate Gaussian process latent variable models. In: 2020 IEEE/RSJ international conference on intelligent robots and systems, pp 5511–5517
Garcia-Camacho I, Lippi M, Welle MC et al (2020) Benchmarking bimanual cloth manipulation. IEEE Robot Autom Lett 5(2):1111–1118. https://doi.org/10.1109/LRA.2020.2965891
Kocijan J, Girard A, Banko B et al (2005) Dynamic systems identification with Gaussian processes. Math Comput Model Dyn Syst 11(4):411–424. https://doi.org/10.1080/13873950500068567
Koganti N, Ngeo JG, Tomoya T, et al (2015) Cloth dynamics modeling in latent spaces and its application to robotic clothing assistance. In: 2015 IEEE/RSJ International conference on intelligent robots and systems (IROS), IEEE, pp 3464–3469
Koganti N, Tamei T, Ikeda K et al (2017) Bayesian nonparametric learning of cloth models for real-time state estimation. IEEE Trans Rob 33(4):916–931
Koganti N, Shibata T, Tamei T et al (2019) Data-efficient learning of robotic clothing assistance using Bayesian Gaussian process latent variable model. Adv Robot 33(15–16):800–814
Lakshmanan K, Sachdev A, Xie Z, et al (2013) A constraint-aware motion planning algorithm for robotic folding of clothes. In: Experimental Robotics, Springer, pp 547–562
Lawrence N, Hyvärinen A (2005) Probabilistic non-linear principal component analysis with Gaussian process latent variable models. J Mach Learn Res 6(11):1783–1816
Lawrence ND, Quinonero-Candela J (2006) Local distance preservation in the GP-LVM through back constraints. In: Proceedings of the 23rd international conference on Machine learning, pp 513–520
Li P, Chen S (2016) A review on Gaussian process latent variable models. CAAI Trans Intell Technol 1(4):366–376
Miller S, van den Berg J, Fritz M et al (2012) A geometric approach to robotic laundry folding. Int J Robot Res 31(2):249–267. https://doi.org/10.1177/0278364911430417
Neal RM (2012) Bayesian learning for neural networks. Springer, Berlin
Nealen A, Müller M, Keiser R, et al (2006) Physically based deformable models in computer graphics. In: Computer graphics forum, Wiley Online Library, pp 809–836
Paszke A, Gross S, Massa F, et al (2019) PyTorch: an imperative style, high-performance deep learning library. In: Advances in neural information processing systems, 32
Rasmussen CE, Williams CKI (2006) Gaussian processes for machine learning. MIT Press, Cambridge
Rother C, Kolmogorov V, Blake A (2004) GrabCut: interactive foreground extraction using iterated graph cuts. ACM Trans Graph 23(3):309–314. https://doi.org/10.1145/1015706.1015720
Sanchez J, Corrales Ramon JA, Bouzgarrou BC et al (2018) Robotic manipulation and sensing of deformable objects in domestic and industrial applications: a survey. Int J Robot Res 37:688–716. https://doi.org/10.1177/0278364918779698
Terzopoulos D, Platt J, Barr A, et al (1987) Elastically deformable models. In: Proceedings of the 14th annual conference on Computer graphics and interactive techniques, pp 205–214
Urtasun R, Fleet DJ, Fua P (2006) 3D people tracking with Gaussian process dynamical models. In: 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR’06), IEEE, pp 238–245
Wang JM, Hertzmann A, Fleet DJ (2005) Gaussian process dynamical models. Adv Neural Inf Process Syst 18:1441–1448
Wang JM, Fleet DJ, Hertzmann A (2007) Gaussian process dynamical models for human motion. IEEE Trans Pattern Anal Mach Intell 30(2):283–298
Zhan Q, Liang Y, Xiao Y (2009) Color-based segmentation of point clouds. Laser Scan 38(3):155–161
Zhu Z, You X, Yu S et al (2016) Dynamic texture modeling and synthesis using multi-kernel gaussian process dynamic model. Signal Process 124:63–71
Acknowledgements
We would like to thank Adriá Luque Acera for his help with the data collection in the real-world experiment, and Ce Xu for his useful feedback during code development.
Funding
Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work was partially developed in the context of the project CLOTHILDE (“CLOTH manIpulation Learning from DEmonstrations”), which has received funding from the ERC under the European Union’s Horizon 2020 research and innovation program (Advanced Grant agreement No. 741930). This work has also received funding from the project CHLOE-GRAPH (PID2020-118649RB-I00), funded by MCIN/AEI/10.13039/501100011033.
Author information
Contributions
FA, Juan ADG and AC conceived the presented idea. FA developed the theory, implemented the code and carried out the numerical experiments. FA took the lead in writing the manuscript. CT supervised the project. All authors provided critical feedback and helped shape the research, analysis and manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Amadio, F., Delgado-Guerrero, J.A., Colomé, A. et al. Controlled Gaussian process dynamical models with application to robotic cloth manipulation. Int. J. Dynam. Control 11, 3209–3219 (2023). https://doi.org/10.1007/s40435-023-01205-6