Kernel representations for evolving continuous functions
Tobias Glasmachers, Jan Koutník, Jürgen Schmidhuber
- Cite this article as:
- Glasmachers, T., Koutník, J. & Schmidhuber, J. Evol. Intel. (2012) 5: 171. doi:10.1007/s12065-012-0070-y
Abstract
To parameterize continuous functions for evolutionary learning, we use kernel expansions in nested sequences of function spaces of growing complexity. This approach is particularly powerful when dealing with non-convex constraints and discontinuous objective functions. Kernel methods offer a number of beneficial properties for parameterizing continuous functions, such as smoothness and locality, which make them attractive as a basis for mutation operators. Beyond such practical considerations, kernel methods make heavy use of inner products in function space and offer a well established regularization framework. We show how evolutionary computation can profit from these properties. Searching function spaces of iteratively increasing complexity allows the solution to evolve from a simple first guess to a complex and highly refined function. At transition points where the evolution strategy is confronted with the next level of functional complexity, the kernel framework can be used to project the search distribution into the extended search space. The feasibility of the method is demonstrated on challenging trajectory planning problems where redundant robots have to avoid obstacles.
Keywords
Trajectory planning · Collision avoidance · Robotics · Kernels · Nested function spaces · Evolution strategy

1 Introduction
This paper tackles the problem of learning continuous functions in the presence of discontinuous objective functions or non-trivial task constraints. This is a very general learning problem. For example, it arises naturally when dealing with robotic plants, typically controlled in joint angle space (configuration space), under constraints formulated in Cartesian task space. In this setup we consider the problem of planning a trajectory towards a target position while avoiding obstacles.
Collision-free movement of robots from an initial position to a goal state in an environment with obstacles is one of the challenging problems in robotics. It can be decomposed into two parts: the kinematics of the movement, and control. In situations where a strict separation of these two aspects is feasible (when movements are relatively slow), most planning problems involving obstacle avoidance are due to kinematics only, while the problem of trajectory following can be decoupled. When planning a trajectory in a typical robotic application, joint and link positions need to be expressed in joint angle (or translation) coordinates. From a dynamics point of view, these positions are functions of joint torques (or forces) applied to the joints by a controller. In contrast, when decoupling planning and control, the plan can be computed first in a much simpler kinematic model, and the job of the controller is reduced to following the planned trajectory as closely as possible. This allows solving the inverse kinematics and inverse dynamics problems separately.
In many approaches the two problems are merged into one, where no explicit motion plan is generated first and the dynamic control is executed based on the inputs from the environment and the goal [9, 12, 14, 19, 23, 32]. Several authors focus just on planning [1] and leave the control problem to well established algorithms for optimal control [21, 22]. The present study is in line with this approach.
Solving inverse kinematics optimally becomes intractable when the environment contains obstacles: the complexity of the problem is exponential in the number of configuration space degrees of freedom. Both problems become even harder when the environment changes over time (e.g., there are moving targets or obstacles). In such cases, rather than solving the inverse problems analytically, various heuristic approaches are used [5, 7], such as roadmaps [2, 17], cell decomposition [20], potential fields [18], or other dynamical systems approaches [16]. Hayashi [15] presents a method in which the joint angles are represented by a continuous function of time and the joint index, generating smooth movement plans. In such approaches overfitting is a potential issue, and low solution complexity becomes an important goal.
Here, kernel expansions are used to represent the trajectory explicitly as a continuous function of time. Kernel-based learning algorithms [4, 26, 30] have found widespread use in machine learning, especially inspired by the support vector machine algorithm [6, 29]. The general methodology has been extended from binary classification to a wide variety of learning problems, ranging from supervised and unsupervised learning to elaborate statistical tests. Kernel methods offer easy-to-handle representations of continuous functions. At the same time, regularization is omnipresent in kernel approaches with the principal role of overfitting avoidance. These methods typically come with convex loss functions. Thus, efficient training schemes are available, and the uniqueness of the solution allows for a thorough mathematical analysis.
In the present study we leave this solid ground and explore learning of kernel representations under less standard conditions, particularly in non-convex learning problems with discontinuous objective functions, where most established training algorithms break down. The problem is resolved by employing evolutionary algorithms, a class of methods known to handle such challenges well. When dealing with kernel methods, evolution strategies (ES) are of special interest, because they are designed for search and optimization in real-valued parameter spaces. State-of-the-art evolution strategies for black box optimization [3, 8, 13, 24, 31] provide robust and scalable techniques for pursuing global optima.
Modern, flexible, and easy to implement natural evolution strategies (NES) [10, 11, 25, 27, 28, 31] are used as search algorithms for kernel representations of continuous functions of a single variable (time). The efficiency of the approach is demonstrated on the problem of finding kinematic plan representations for the movements of redundant robot arms. Such arms are controlled in joint-angle space, which has a highly non-trivial, non-linear relationship to the three-space configuration of the different limbs and the end-effector.
This paper is organized as follows. The next section provides a short introduction to ES, with an emphasis on a modern class of algorithms called natural ES. The use of kernel methods in machine learning is discussed in Sect. 3. A new method for kernel-based evolutionary learning of trajectories is introduced in Sect. 4. The method comprises an incremental search scheme based on increasing the kernel representation complexity. It relies on an algorithm that gradually extends the covariance matrix of the search distribution. The method generalizes to evolutionary learning of any continuous function. Experiments with a redundant arm in Euclidean two-space and a human-like 7 DOF arm in Euclidean three-space are described in Sect. 5. The paper concludes with Sect. 7.
2 Evolution strategies
Evolution strategies are evolutionary algorithms specialized for search and optimization in real vector spaces. There exists a wide variety of ES in the literature, refer to [3, 8, 24] for a comprehensive introduction. In canonical form, ES generate their offspring by Gaussian mutations, but other parametric families of search distributions have also been investigated. The different strategies vary widely w.r.t. their selection and recombination schemes. In population-based variants (μ, λ) selection is common, and recombination is often performed globally among all surviving individuals, such as in CMA-ES [13]. However, elitism in the form of hill-climbers and steady-state selection are common as well.
Typical ES are well suited to follow a ‘global trend’ of the fitness function \({f : {\mathbb{R}}^d \to {\mathbb{R}}}\). They identify (local) optima efficiently and with arbitrarily high precision. This requires a continuous adaptation of the search strategy parameters to the local characteristics of the problem at hand. The most important parameters to adapt are center and scale of the search distribution, while adaptation of the full covariance matrix has become state-of-the-art [13, 31]. Strategy adaptation, making mutations that have proven advantageous more probable in the future, is performed either actively or passively by means of mutation and selection.
Important aspects of ES are self-adaptation of the search strategy and invariance under certain transformations of the search space and the fitness function. Most ES are (up to initialization) invariant under translation and scaling, but often also rotations, or even all affine linear transformations. In addition, rank-based selection makes modern ES invariant under monotone fitness transformations.
Natural evolution strategies [10, 11, 25, 27, 28, 31] have been derived as population-based variants and as hill-climbers from the simple and powerful principle to adapt the search distribution in order to optimize (here, minimize) expected fitness by means of natural gradient descent. This general paradigm can be applied to all kinds of search distributions. Gaussians with different subsets of adaptive parameters have been treated in the literature, such as adaptation of the full covariance matrix [10, 11, 27, 28, 31] and diagonal covariance matrices [25].
Fitness shaping [31] is used to normalize the fitness values by shaping them into rank-based utility values \({u_k \in {\mathbb{R}}}\), \(k\in\{1,\ldots,\lambda\}\). For this purpose the individuals are ordered by fitness, with z_{1:λ} denoting the best and z_{λ:λ} denoting the worst offspring.
The xNES algorithm

Input: \(d \in {\mathbb{N}}\), \(F : {\mathbb{R}}^d \to {\mathbb{R}}\), \({\boldsymbol{\mu}} \in {\mathbb{R}}^d\), \(\sigma > 0\), \({\bf B} \in {\mathbb{R}}^{d \times d}\) with \(\det({\bf B})=1\)

while stopping condition not met do
  for \(i \in \{1, \ldots, \lambda\}\) do
    \({\bf s}_i \leftarrow \mathcal{N}({\bf 0}, {\mathbb{I}})\)
    \({\bf z}_i \leftarrow {\boldsymbol{\mu}} + \sigma {\bf B} \cdot {\bf s}_i\)
  end
  sort \(\{({\bf s}_i, {\bf z}_i)\}\) with respect to \(F({\bf z}_i)\)
  \({\bf g}_{\boldsymbol{\delta}} \leftarrow \sum_{i=1}^{\lambda} u_i \cdot {\bf s}_i\)
  \({\bf G}_{\bf M} \leftarrow \sum_{i=1}^{\lambda} u_i \cdot ({\bf s}_i {\bf s}_i^T - {\mathbb{I}})\)
  \(g_{\sigma} \leftarrow \operatorname{tr}({\bf G}_{\bf M})/d\)
  \({\bf G}_{\bf B} \leftarrow {\bf G}_{\bf M} - g_{\sigma} \cdot {\mathbb{I}}\)
  \({\boldsymbol{\mu}} \leftarrow {\boldsymbol{\mu}} + \eta_{\mu} \cdot \sigma {\bf B} \cdot {\bf g}_{\boldsymbol{\delta}}\)
  \(\sigma \leftarrow \sigma \cdot \exp(\eta_{\sigma}/2 \cdot g_{\sigma})\)
  \({\bf B} \leftarrow {\bf B} \cdot \exp(\eta_B/2 \cdot {\bf G}_{\bf B})\)
end
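As a minimal illustration of these natural-gradient updates, the sketch below implements the separable (diagonal-covariance) simplification of the scheme, in the spirit of the diagonal NES variant [25]. It omits the full covariance factor B and the matrix exponential; the function and parameter names are ours, and the defaults follow commonly used NES settings.

```python
import math
import random

def snes_minimize(f, d, mu=None, sigma=1.0, iters=300, seed=0):
    """Separable NES sketch: diagonal covariance, rank-based utilities."""
    rng = random.Random(seed)
    lam = 4 + int(3 * math.log(d))                  # default population size
    eta_mu = 1.0
    eta_sigma = (3.0 + math.log(d)) / (5.0 * math.sqrt(d))
    mu = list(mu) if mu is not None else [0.0] * d
    sig = [sigma] * d
    # fitness shaping: rank-based utilities u_1 >= ... >= u_lam, summing to zero
    raw = [max(0.0, math.log(lam / 2.0 + 1.0) - math.log(k + 1.0))
           for k in range(lam)]
    total = sum(raw)
    u = [r / total - 1.0 / lam for r in raw]
    for _ in range(iters):
        samples = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(lam)]
        zs = [[mu[j] + sig[j] * s[j] for j in range(d)] for s in samples]
        order = sorted(range(lam), key=lambda i: f(zs[i]))  # best first
        g_mu = [sum(u[k] * samples[order[k]][j] for k in range(lam))
                for j in range(d)]
        g_sig = [sum(u[k] * (samples[order[k]][j] ** 2 - 1.0)
                     for k in range(lam)) for j in range(d)]
        mu = [mu[j] + eta_mu * sig[j] * g_mu[j] for j in range(d)]
        sig = [sig[j] * math.exp(eta_sigma / 2.0 * g_sig[j]) for j in range(d)]
    return mu
```

On a smooth unimodal function such as the sphere, this sketch converges to the optimum with high precision; the full xNES additionally rotates the coordinate system via B.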
The corresponding hill-climber variant (1+1)-xNES primarily differs in its step size adaptation rule [10]. Success-based utility values (which are rank-based in the sense of (1+1) selection) are used to implement a success-based self-adaptation rule, resulting in a behavior close to Rechenberg’s 1/5-rule [24].
Both the population-based xNES and the hill-climber (1+1)-xNES come with good default settings for population size and learning rates, depending only on the search space dimension. In this sense, they are completely parameter free and applicable to black box problems.
3 Kernel-based machine learning
A kernel-based learning algorithm operates on an architecture that is linear in its parameters: Let \(\{f_1,\ldots,f_n\}\) be a fixed set of basis functions, then \(f=\sum\nolimits_{i=1}^n \alpha_i \cdot f_i\) is linear in the coefficients \({{\varvec{\alpha}}\in {\mathbb{R}}^n}\), while suitably chosen basis functions allow to add the necessary level of complexity to the model.
Typically, classic linear algorithms like linear regression or principal component analysis can be extended to this type of linear function architecture. Besides standard vector space operations, most such algorithms require only an inner product. The resulting learning problem is often convex in \({\varvec{\alpha}}\), allowing for its efficient solution even in high-dimensional cases. All Hilbert space operations on f (addition, scalar multiplication, and inner products) can be expressed in terms of linear combinations of inner products of the basis functions f_{i}. Thus, only a finite number of “basis” inner products needs to be known; these are collected in the Gram matrix \({{\bf K}\in {\mathbb{R}}^{n \times n}}\) with entries \(K_{ij}=\langle f_i, f_j \rangle\). The resulting learning algorithm is then formulated for the coefficients \({\varvec{\alpha}}\), and it depends on the basis functions only in terms of the positive semi-definite “kernel” Gram matrix K.
The basis functions f_{i} usually originate from a kernel function \({k : X \times X \to {\mathbb{R}}}\) given ‘training’ data \(\{x_1,\ldots,x_n\}\), by defining \(f_i(\cdot)=k(\cdot,x_i)\). For an input space X and a positive definite kernel function \({k:X \times X \to {\mathbb{R}}}\), Mercer’s theorem ensures the existence of an only implicitly defined feature (Hilbert) space \(\mathcal{H}\) and a feature map \(\phi:X \to \mathcal{H}\), such that the kernel \(k(x, x')= \langle \phi(x),\phi(x')\rangle\) computes the inner product of the features. Alternatively one can start from an embedding into a Hilbert space with given inner product and define the kernel function as \(k(x,x'):=\langle \phi(x), \phi(x') \rangle\). In this study we are interested in learning functions, and thus an only implicitly defined (function) feature space is not helpful. Thus, we will rely on the latter approach starting from a given set of basis functions with explicitly defined inner product thereon.
Kernel methods are typically non-parametric, since the set of basis functions in use scales naturally with the data at hand. In learning, this high flexibility carries the danger of overfitting the training data. The standard countermeasure is regularization, which is most commonly achieved by augmenting the objective function of the learning problem with a so-called complexity penalty. The penalty is introduced by minimizing the squared norm \(\|f\|^2\) of the function f in the Hilbert space. For a fixed set of basis functions with kernel matrix K and a function f with coefficient vector \({\varvec{\alpha}}\), the squared norm can be written as \(\|f\|^2={\varvec{\alpha}}^T {\bf K} {\varvec{\alpha}}\). This relation also demonstrates how computations in the infinite dimensional function Hilbert space can be expressed by finite computations based on pairwise inner products of basis functions.
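These finite computations can be sketched in a few lines. The snippet below builds the Gram matrix for Gaussian kernels centered on the unit interval and evaluates the squared norm \({\varvec{\alpha}}^T {\bf K} {\varvec{\alpha}}\); the kernel width `gamma` and all names are illustrative assumptions, not the paper's own parameters.

```python
import math

def gauss_kernel(t, s, gamma=10.0):
    # Gaussian kernel on the unit interval; gamma is an assumed width parameter
    return math.exp(-gamma * (t - s) ** 2)

def gram_matrix(centers, gamma=10.0):
    # Gram matrix K with K[i][j] = <f_i, f_j> = k(c_i, c_j)
    return [[gauss_kernel(c1, c2, gamma) for c2 in centers] for c1 in centers]

def squared_norm(alpha, K):
    # ||f||^2 = alpha^T K alpha for f = sum_i alpha_i k(., c_i)
    n = len(alpha)
    return sum(alpha[i] * K[i][j] * alpha[j]
               for i in range(n) for j in range(n))
```

Since the Gaussian kernel is positive definite, `squared_norm` is non-negative and vanishes only for the zero coefficient vector, which is what makes it usable as a complexity penalty.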
4 Learning continuous functions
4.1 Indirect kernel encoding
Why should the function be represented as a kernel expansion? In contrast to many other evolutionary search and learning tasks there is no direct encoding of the problem available, since (1) the search space is infinite dimensional and would thus require an infinitely long chromosome, and (2) even if this was possible, continuity of the solution would be hard to maintain in such a representation. This forces the use of an indirect encoding. Many such encodings are well established in the machine learning literature, such as feed-forward neural networks and kernel expansions, which are both known as universal function approximators. Other simple choices are polynomials or Fourier expansions.
The resulting evolutionary search works in epochs, corresponding to the different subspaces. During the search, the sequence of spaces is traversed, starting with relatively simple low-dimensional \(\mathcal{F}_{\ell}\) for small ℓ. The low dimensionality n_{ℓ} of the initial search spaces greatly simplifies the job of the evolution strategy to come up with a coarse initial guess and to put the ES on track towards the optimum. Then, during the process, the algorithm transitions iteratively to more and more complex search spaces, increasing the epoch index ℓ in order to reach a sufficient level of detail. This added flexibility allows for a controlled iterative refinement process, guiding the evolutionary search along a path of solutions of increasing complexity towards the optimum.
Sometimes the right task granularity is known in advance. Then it may be feasible to use this fixed representation instead. However, the beauty of our approach is that it eventually reaches the necessary complexity in any case.
Starting the search in relatively low-dimensional search spaces has a strong regularization effect. However, as the dimension of the search space grows it becomes more important to further regularize the solution. We use the standard two-norm penalty \(\sum\nolimits_{i=1}^d \|f^{(i)}\|^2\) over all components of the function for this purpose.
The choice of basis functions f_{i} is arbitrary to some extent. Natural choices are polynomials (either monomials or a Legendre basis), or Fourier basis functions. However, these choices are global in the sense that changing the corresponding coefficient affects the resulting function in the whole unit interval. For the purpose of evolving first a coarse approximation of the function and adding details in later stages it is more intuitive to use a basis that allows for local modifications. Therefore Gaussian kernels are employed, ranging from relatively broad to arbitrarily peaked (and therefore localized) function primitives. This property of Gaussians is important in this context, because the effect of coefficient mutation is local, and often advantageous.
With this choice the number of basis functions grows exponentially over the epochs. At first glance this may seem to cause computational problems. However, note that other families with polynomial growth of the number of basis functions can be used. Moreover, fast growth of the number of bases allows the algorithm to arrive quickly at the required accuracy. An equivalent point of view is that it takes only a logarithmic number of epochs to arrive at the number of basis functions required for solving a given task.
The above formula (5) together with the basis functions (4) is employed in all experiments presented in the remainder of this paper.
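For concreteness, a basis cardinality schedule consistent with the example in Sect. 4.2 (where epoch five yields \(2^5 + 5 - 1 = 36\) coefficients per component) can be written as follows. Formulas (4) and (5) themselves are not reproduced here, so treat this as an assumed instance of the exponential-growth idea rather than the paper's exact definition.

```python
def n_basis(level):
    # assumed cardinality schedule: exponential growth over the epochs,
    # matching the worked example 2**5 + 5 - 1 = 36 in Sect. 4.2
    return 2 ** level + level - 1
```

Because the count roughly doubles per epoch, only a logarithmic number of epochs is needed to reach any required number of basis functions.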
4.2 An illustrative example
The constraint f(0) = x_{0} could be handled by a further term in the fitness function. However, since the initial configuration is known it is easier to encode this term into the basis functions. W.l.o.g. let us assume that x_{0} = (0, 0). Then we modify all basis functions f_{i}(t) by subtracting a constant, resulting in the new basis functions \(\tilde f_i(t) = f_i(t) - f_i(0)\). This way we obtain the property \(\tilde f_i(0) = 0\), and thus f(0) = 0. There are two reasons for not handling the goal position in a similar way: First, an encoding that enforces reaching the target all the time will result in a very large fraction of all solutions to be infeasible, and second, in the more involved robotic setups in the next section this would require solving the inverse kinematics problem.
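The basis shift can be sketched as follows, assuming Gaussian primitives on the unit interval; the width parameter `gamma` and the function names are illustrative assumptions.

```python
import math

def f_basis(t, center, gamma=10.0):
    # assumed Gaussian basis function centered at `center`
    return math.exp(-gamma * (t - center) ** 2)

def f_shifted(t, center, gamma=10.0):
    # shifted basis: f~_i(t) = f_i(t) - f_i(0), so f~_i(0) = 0 and hence
    # every expansion f = sum_i alpha_i f~_i satisfies f(0) = 0 exactly
    return f_basis(t, center, gamma) - f_basis(0.0, center, gamma)
```

Any linear combination of the shifted basis functions passes through the origin, so the constraint f(0) = 0 holds by construction for every individual the ES can generate.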
The constraints (avoiding the walls) can be handled in many different ways. A simple solution is the “death penalty” strategy, which amounts to discarding (and thus never selecting) infeasible individuals. For elitist selection this is equivalent to adding a huge (or infinite) penalty term to the fitness function, which turns the fitness into a discontinuous function.
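A minimal sketch of the death-penalty strategy as a fitness wrapper (for minimization with elitist selection); the names are ours, and the feasibility oracle is assumed to be supplied by the task, e.g., by a collision checker.

```python
INFEASIBLE = float('inf')

def death_penalty(fitness, feasible):
    # infeasible points receive infinite fitness and are therefore
    # never selected under elitist (minimizing) selection
    def wrapped(z):
        return fitness(z) if feasible(z) else INFEASIBLE
    return wrapped
```

Note that this wrapper makes the effective fitness discontinuous even when the underlying objective is smooth, which is precisely the setting the evolutionary approach is meant to handle.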
It takes the algorithm five epochs to solve the problem with high accuracy. This amounts to 2^{5} + 5 − 1 = 36 coefficients per function, and \(2 \cdot 36 = 72\) coefficients in total. The result is shown in Fig. 4 (right). The final trajectory plan is obviously more complex than the coarse initial plan, but it is not over-complicated. Note that the objective term \(\|f(1) - {\bf x}_{\rm target}\|\) does not keep the trajectory from evolving all kinds of bumps in the time interval [0, 1), just by chance, as a by-product of the randomized search. However, the regularization term is an elegant way of avoiding such correct but overly complex solutions.
Now let us pretend that we knew beforehand that five epochs, or 36 basis functions, are sufficient for this task. What happens if we run only epoch number five of the algorithm, starting from scratch? We observe two properties of the resulting trajectories: First, the trajectories found are indeed much more complex, since the process does not profit from the implicit regularization of starting with the best solution from the previous epoch. This issue could be fixed by increasing the complexity penalty λ in the fitness function. More importantly, however, the algorithm converges to a local optimum in about 50 % of the runs, that is, it does not reach the target location. Thus, iteratively increasing the solution complexity stabilizes the search process.
4.3 Covariance matrix extension
Search in a fixed subspace \(\mathcal{F}_{\ell}\) is relatively straightforward, once an indirect kernel-based function encoding is fixed. At the end of each epoch the evolution strategy has identified a (local) optimum in this space. If the corresponding function is not a sufficiently good solution to the problem at hand then the search space needs to be extended to the next subspace \(\mathcal{F}_{\ell+1}\). Since the subspaces are nested we can reuse previously accumulated knowledge by performing a “warm-start” of the search. Projecting individuals and even the mean of the search distribution from \(\mathcal{F}_{\ell}\) into the extended subspace \(\mathcal{F}_{\ell+1}\) is rather trivial; the corresponding coefficient vectors can simply be padded with zeros.
From an evolutionary algorithms point of view the transition from a set \(\{f_1,\ldots,f_n\}\) to an extended set \(\{f_1,\ldots,f_n,f_{n+1},\ldots,f_N\}\) of basis functions is a non-trivial step. In this situation the coefficient vector \({\varvec{\alpha}}=(\alpha_1,\ldots,\alpha_n)\) is extended to \({\varvec{\alpha}}^{\prime}=(\alpha_1,\ldots,\alpha_N)\), and thus the xNES search distribution needs to be extended to the new coefficients. Let \({{\varvec{\mu}} \in {\mathbb{R}}^n}\), σ > 0 and \({{\bf B} \in {\mathbb{R}}^{n \times n}}\) denote distribution mean, standard deviation, and normalized covariance factor of the search distribution over \({\varvec{\alpha}}\), and let \({{\varvec{\mu}}^{\prime} \in {\mathbb{R}}^N}\), σ′, and \({{\bf B}^{\prime} \in {\mathbb{R}}^{N \times N}}\) denote the extended parameters, describing a search distribution over the extended coefficient vector \({\varvec{\alpha}}^{\prime}\).
Three different approaches for extending the search distribution are considered. The seemingly most naïve approach is to reset the search distribution to a radially symmetrical shape, which amounts to setting the normalized covariance factor B′ to the identity matrix of dimension N. This approach can be reasonable under the assumption that the information contained in the search distribution by the end of the previous epoch is of no or very limited use, for example, because the search distribution encoded by B was tailored too much towards fine tuning. However, the scale σ can be preserved, or truncated to plausible limits.
Alternatively, the evolved covariance matrix for the old coefficients can be kept, which leaves us with the problem of filling in the new entries of the matrix B′. One seemingly canonical way of extending the search distribution is to pick the parameters that minimize the Kullback–Leibler divergence (or any other distance measure) between the old and the extended search distribution. However, because the old distribution can be represented exactly with the new set of coefficients this would amount to not using the new components at all, which does not result in additional flexibility.
1. The distribution center (encoding the currently best solution in (1+1)-xNES) should not change.
2. The covariance of the extended distribution, projected to the subspace spanned by the previously available directions, should also be preserved.
3. In the newly available subspace orthogonal to the previously available directions the covariance matrix should be a multiple of the unit matrix, such as \(\sigma^2 {\mathbb{I}}\).
The matrices \(\Pi\) and \(\pi\) allow us to represent the two different subspaces involved in properties 2 and 3. The old search space is the image of the projection, and the component of a vector orthogonal to the old subspace is the vector itself minus its projection.
This covariance matrix extension is the canonical choice in the sense of properties 2 and 3. It can be computed relatively straightforwardly with simple matrix operations involving the kernel Gram matrix K, the old covariance factor B, and the scaled unit matrix \(\sigma^2 {\mathbb{I}}\) on the new components.
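As an illustration, the special case of an orthonormal basis (Gram matrix K equal to the identity) reduces the extension to a block-diagonal construction: the old factor B is kept unchanged (property 2), and the new directions receive the identity, i.e., covariance \(\sigma^2 {\mathbb{I}}\) after scaling by the global step size (property 3); the mean is simply zero-padded (property 1). This sketch does not reproduce the general K-dependent formula (7), and the names are ours.

```python
def extend_B(B, new_dims):
    # block-diagonal extension of the normalized covariance factor,
    # valid under the simplifying assumption of an orthonormal basis:
    # B' = [[B, 0], [0, I]]; the identity block also preserves det(B') = det(B)
    n = len(B)
    N = n + new_dims
    Bp = [[0.0] * N for _ in range(N)]
    for i in range(n):
        for j in range(n):
            Bp[i][j] = B[i][j]
    for i in range(n, N):
        Bp[i][i] = 1.0
    return Bp
```

In the general non-orthonormal case the off-diagonal couplings between old and new basis functions are determined by K, which is why the full method needs the Gram matrix.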
When adapting multiple functions simultaneously, there is no direct structural connection between coefficients of kernels corresponding to different components. Thus, when evolving a function \({f: [0,1] \to {\mathbb{R}}^d}\) for d > 1, the kernel matrix K may be replaced by a d-fold block diagonal matrix with the kernel Gram matrix in each block. However, depending on the task, the connection between different components may be clear, in which case this general scheme should be modified accordingly.
4.4 Switching between epochs and representations
A principled ES for search in the nested spaces \(\mathcal{F}_{\ell}\) should provide an automated way of deciding when to switch from an underlying representation to the next more complex one. Intuitively, this should happen whenever better progress can be made on an extended set of basis functions, and at the latest when the evolution strategy converges.
The only reasonable switching criteria that can be derived from the sequence of fitness values, success rates, or the internal state of the ES are related to convergence. Such criteria are typically used as stopping conditions. For example, the algorithm is stopped when the step size σ of xNES falls below some value σ_{min}, or the number of fitness evaluations exceeds a threshold N_{0}. When dealing with nested search spaces the algorithm moves on to the next representation instead of stopping.
The resulting switching behavior is not necessarily the best possible. It may be advantageous to switch the representation earlier in order to avoid the possibly long convergence phase. Thus, we want the algorithm to switch to the next more complex representation as soon as this pays off. It is virtually impossible to find a good switching point by just looking at the history of fitness values or success rates. Instead one can use a trial-and-error procedure, which comes at the cost of a few additional fitness evaluations: Every T generations of the ES we sample a set of λ_{ext} points from the current search distribution, but extended to the next set of basis functions. We obtain fitness values \(\{f^{\rm ext}_1, \ldots, f^{\rm ext}_{\lambda_{\rm ext}}\}\) for the extended representation, and \(\{f^{\rm cur}_1, \ldots, f^{\rm cur}_{\lambda_{\rm ext}}\}\) for the current search distribution, by projecting the search points back to the current set of basis functions. The question whether the extended fitness values systematically outperform the current ones can be answered based on the confidence score of a one-sided Mann-Whitney U-test (also known as Wilcoxon rank sum test), with null hypothesis f^{ext} ≥ f^{cur}. The algorithm decides to switch to the extended set of basis functions if the null hypothesis is rejected at a confidence level of p_{0}, indicating that the extended representation makes consistently better progress, or if the global step size σ falls below a critical threshold σ_{0}, indicating convergence of the ES.
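The statistical part of this switching rule can be sketched as follows, using the normal approximation to the one-sided rank-sum statistic (without tie correction). This is a simplified stand-in for a full Mann-Whitney U-test implementation; all names are ours.

```python
import math

def mann_whitney_p(ext, cur):
    # one-sided rank-sum test, normal approximation, no tie correction;
    # a small p-value supports "ext fitness values are smaller (better)"
    n1, n2 = len(ext), len(cur)
    pooled = sorted([(v, 'e') for v in ext] + [(v, 'c') for v in cur])
    # rank sum of the extended sample (ranks 1..n1+n2)
    r1 = sum(i + 1 for i, (v, g) in enumerate(pooled) if g == 'e')
    u = r1 - n1 * (n1 + 1) / 2.0          # U statistic of the ext sample
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    # P(U <= u) under the null; small when ext values rank low (better)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

The switch would then be triggered when `mann_whitney_p(f_ext, f_cur) < p_0`, i.e., when the extended representation consistently produces better (lower) fitness values.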
Epoch-based search with iterative refinement of the representation

Input: sequence of basis functions \((f_i)_{i \in {\mathbb{N}}}\),
sequence of cardinalities \((n_\ell)_{\ell \in {\mathbb{N}}}\),
solution dimension d,
initial search point \({\boldsymbol{\mu}} \in {\mathbb{R}}^{d \cdot n_\ell}\),
initial step size σ,
fitness function F,
convergence criterion, e.g., a step size threshold,
covariance matrix extension mechanism Extend, e.g., equation (7).

epoch counter: \(\ell \leftarrow 1\)
\({\bf B} \leftarrow {\mathbb{I}}\)
while stopping condition not met do
  iterate xNES: \(({\boldsymbol{\mu}}, \sigma, {\bf B}) \leftarrow \operatorname{xNES}({\boldsymbol{\mu}}, \sigma, {\bf B}, F)\)
  if convergence (e.g., \(\sigma <\) threshold) then
    // start next epoch:
    \(\ell \leftarrow \ell + 1\)
    \({\boldsymbol{\mu}} \leftarrow \left(\begin{array}{c}{\boldsymbol{\mu}}\\ {\bf 0}\end{array}\right) \in {\mathbb{R}}^{d \cdot n_\ell}\)
    optionally truncate the range of σ: \(\sigma \leftarrow \max\{\sigma, \sigma_0\}\)
    extend the covariance matrix factor: \({\bf B} \leftarrow \operatorname{Extend}({\bf B})\)
  end
end
return \({\boldsymbol{\mu}}\)
4.5 Comparison to alternative methods
Depending on the context there are many different ways to represent and learn continuous functions. The method of choice depends in particular on properties of the objective function and of eventual constraints. Our approach is particularly useful if alternative, more efficient techniques such as analytic solutions of regression problems, gradient-based methods, or convex optimization are not applicable. Thus, our method is targeted at non-smooth or even discontinuous objective functions and at non-convex constraints. It is even applicable if no analytic expression for the constraints is available at all. Simple constraint handling techniques for evolutionary algorithms, such as resampling and the death-penalty strategy, require only that the feasibility of a search point can be queried (e.g., by a call to a function returning a boolean, similar to the fitness function returning a real number). In such situations most standard methods are not applicable at all, leaving direct search as the only option.
The problem of trajectory planning while avoiding collisions with arbitrary obstacles is of exactly this type. The feasible region is in general non-convex, which can be expressed either with non-convex constraints or with a discontinuous objective function. A robot arm configuration is a function of the joint angles, and therefore trajectory planning naturally takes place in joint angle coordinates. On the other hand, obstacles are represented in task coordinates (typically three-space), and no analytic expression is available in joint angle (configuration) space. The two spaces are connected by the kinematics of the robot, which is a unique and simple-to-compute mapping in the forward direction (joint angles to task space configuration), but inverse kinematics is in general non-unique and much harder to compute. Also, collision detection in task-space is typically conceptually simple and computationally cheap. Our approach requires only forward kinematics and collision detection.
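To make these minimal prerequisites concrete, here is a planar sketch of forward kinematics and a deliberately crude collision query that tests only the joint positions against an axis-aligned box. Unit link lengths and all names are assumptions; a real planner would test the full link segments against the obstacle geometry.

```python
import math

def forward_kinematics(angles, link_length=1.0):
    # planar kinematic chain starting at the origin; each joint angle is
    # relative to the previous link, so theta accumulates along the chain
    pts = [(0.0, 0.0)]
    theta = 0.0
    for a in angles:
        theta += a
        x, y = pts[-1]
        pts.append((x + link_length * math.cos(theta),
                    y + link_length * math.sin(theta)))
    return pts

def collides(points, box):
    # box = (xmin, ymin, xmax, ymax); checks joint positions only --
    # a production planner would also intersect the link segments
    xmin, ymin, xmax, ymax = box
    return any(xmin <= x <= xmax and ymin <= y <= ymax for x, y in points)
```

Note that the forward direction is a few lines of trigonometry, whereas inverting it (finding angles for a desired end-effector position) is non-unique for redundant chains, which is exactly why the method avoids inverse kinematics.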
Dynamical systems approaches, such as the potential field method, are a popular alternative for trajectory planning. One advantage is that such methods can naturally work online. The potential field dynamics can be defined either in configuration space or in task space. When planning in configuration space, the obstacles need to be transformed into joint angle representations, such that appropriate repellers can be placed. In a non-trivial robotics scenario this is generally infeasible. Dynamics in task space are easy to define. This has the disadvantage that a large share of the complexity of path planning is left to a sub-module that translates movements of the end-effector in task space into joint angles. This amounts to continuously solving (at least a linearized version of) the inverse kinematics problem.
Another conceptual distinction from online methods is as follows: online methods find exactly one trajectory, by computing the best control output in each step, typically without ever making a “global” movement plan. Our offline method does make such a global movement plan, at the expense of searching a possibly large number of trajectories. These offline simulated trajectories can be thought of as “mental trials”, and execution starts only once a sufficiently good plan is identified. Of course, this offline planning paradigm requires a certain level of continuity of the environment. An online method is more suitable in scenarios that require continuous adaptation of the movement plan during the movement due to quickly or abruptly changing circumstances.
In the light of this discussion our method compares favorably for its minimal prerequisites. However, depending on the application, it may be limited by its computational needs for simulating a possibly large number of candidate trajectories in a forward kinematics model, including collision detection, and by its offline character.
5 Experiments
In this section, our method is evaluated on two different robot planning tasks. Redundant robot arms consisting of a number of linked segments are placed in environments with obstacles. The tasks challenge the algorithm with the difficulty of handling the constraints, a problem that can equivalently be expressed by a discontinuous fitness function, making evolution strategies interesting candidates. The method generates solutions with arbitrary precision, from coarse functions represented by few kernels to fine solutions constructed from a large number of kernels.
The complexity of the task can be tuned by adjusting the length of the kinematic chain of the arm as well as by placing an arbitrary number of obstacles in the environment. The obstacles can make the problem considerably harder by forming a narrow, curved corridor in a high-dimensional search space that leads to the optimal solution. This is the case in particular in joint space coordinates. The mapping from joint angles to Cartesian task space is highly non-linear, especially for long kinematic chains. Thus, even obstacles with a relatively simple structure in Cartesian space, such as boxes, result in highly curved and typically non-convex boundaries of the feasible region in joint space.
The following sections introduce the robot models, the parameterization of trajectories, collision detection, and the form of the fitness function. Then the actual experiments and results are presented.
5.1 Arm models
The first benchmark uses a kinematics-only variant of an octopus-like highly redundant arm [33]. This two-dimensional plant allows for easy design of interesting benchmarks, and both kinematics and collision detection are easy to implement and very fast to compute. As already mentioned in the introduction, we focus on evolution of a kinematic plan, under the assumption that the actual control, involving the dynamics of the arm, can be decoupled from this problem. Using only the kinematic simulation reduces the complexity of the simulation and speeds up the objective function evaluation significantly.
In a second experiment this setup is extended to a redundant humanoid robot arm with seven degrees of freedom (7-DOF). A nice side effect of planning in joint space is that our method works unchanged in three-space, without even affecting its complexity.
The three-dimensional 7-DOF arm follows the typical humanoid setup of three DOF in the shoulder, and two DOF in the elbow and the wrist, respectively, with plausible human-like joint limits. In the (initial) zero position the arm is spread out sideways, with the elbow slightly angled. Its kinematics can be computed along the chain with the same recursive procedure as for the two-space arm, for example, by concatenating parameterized 4 × 4 affine transformation matrices.
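The recursive kinematics computation described above can be sketched as follows. This is a minimal illustration for a planar chain with revolute joints about the z-axis; the function names and the NumPy representation are ours, not taken from the paper, and a full 7-DOF humanoid arm would use joint-specific rotation axes and limits.

```python
import numpy as np

def rot_z(theta):
    """4x4 homogeneous rotation about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T

def translate(x, y, z):
    """4x4 homogeneous translation."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T

def forward_kinematics(angles, lengths):
    """Joint positions of a chain, computed by concatenating parameterized
    homogeneous transforms along the chain (rotate, then move along the link)."""
    T = np.eye(4)
    positions = [T[:3, 3].copy()]
    for theta, length in zip(angles, lengths):
        T = T @ rot_z(theta) @ translate(length, 0.0, 0.0)
        positions.append(T[:3, 3].copy())
    return positions
```

The same concatenation applies unchanged in two-space and three-space, which is why planning in joint angle coordinates leaves the method's complexity unaffected by the dimensionality of the task space.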
5.2 Trajectory parameterization
Trajectories are encoded as tuples of functions, with each joint angle being a function of time \(t \in [0, 1]\). This amounts to eight functions (eight joints) for the octopus arm and seven functions for the 7-DOF humanoid arm.
This property also allows for a simple encoding of the constraint that the trajectory needs to start in the given initial arm configuration. It is reflected by the following change in the basis functions: instead of the Gaussians \(f_i\), the shifted Gaussians \(f_i'(t) = f_i(t) - f_i(0)\) are used, automatically fulfilling the property \(f(0) = \sum_{i=1}^{n} \alpha_i f_i'(0) = 0\) for all coefficient vectors \({\varvec{\alpha}}\).
5.3 Collision detection
The search aims for movement plans without arm self-collisions and collisions with obstacles. The arm as well as all obstacles are modeled as unions of simple geometries. In the two-dimensional case of the octopus these are line segments, and all geometries in the three-dimensional case are represented as capsules (cylinders with rounded ends). Let S be the number of segments of the robot arm, and let O be the number of simple geometries describing the obstacles. Then a brute-force collision test takes \(\mathcal{O}(S^2 + O \cdot S)\) operations. The number S is bounded a priori, and the test scales linearly in the number O of obstacles. In our experiments this number is low enough such that the brute-force test of all possibly colliding pairs is feasible (see also Fig. 5).
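A brute-force test of this kind can be sketched as follows. The segment-segment distance routine is a standard closest-points computation (not from the paper), and the function names, the `(start, end)` segment representation, and the shared capsule `radius` are our own assumptions. Note that adjacent arm segments share a joint and are therefore skipped in the self-collision check.

```python
import numpy as np

def seg_seg_dist(p1, q1, p2, q2):
    """Minimum distance between segments [p1,q1] and [p2,q2] (any dimension)."""
    d1, d2, r = q1 - p1, q2 - p2, p1 - p2
    a, e, f = d1 @ d1, d2 @ d2, d2 @ r
    b, c = d1 @ d2, d1 @ r
    denom = a * e - b * b
    s = np.clip((b * f - c * e) / denom, 0.0, 1.0) if denom > 1e-12 else 0.0
    t = (b * s + f) / e if e > 1e-12 else 0.0
    if t < 0.0:       # clamp t, then recompute s on the clamped endpoint
        t = 0.0
        s = np.clip(-c / a, 0.0, 1.0) if a > 1e-12 else 0.0
    elif t > 1.0:
        t = 1.0
        s = np.clip((b - c) / a, 0.0, 1.0) if a > 1e-12 else 0.0
    return np.linalg.norm((p1 + s * d1) - (p2 + t * d2))

def collides(arm_segments, obstacle_segments, radius):
    """Brute-force O(S^2 + O*S) test: non-adjacent arm segment pairs
    plus all arm-obstacle pairs, with capsule radius `radius`."""
    S = len(arm_segments)
    for i in range(S):
        for j in range(i + 2, S):            # skip adjacent segments
            if seg_seg_dist(*arm_segments[i], *arm_segments[j]) < 2 * radius:
                return True
    for seg in arm_segments:
        for obs in obstacle_segments:
            if seg_seg_dist(*seg, *obs) < 2 * radius:
                return True
    return False
```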
Time is discretized into 100 steps, and thus into 101 configurations at times \(\mathcal{T}=\{0, 0.01, 0.02, \ldots, 1\}\). This discretization allows us to check collisions only at fixed configurations, instead of performing much more complex continuous-time collision detection, which greatly increases computational efficiency and simplifies the implementation.
5.4 Fitness function
Robot simulation and fitness evaluation for the octopus arm:

\(F \leftarrow 0\)
for \(s \in \{1, \ldots, S\}\) do
&nbsp;&nbsp;&nbsp;&nbsp;\(f^{(s)}(t)=\sum\limits_{i=1}^n \alpha_{i,s} \cdot \left[ f_i(t)- f_i(0)\right]\)
&nbsp;&nbsp;&nbsp;&nbsp;\(\sigma_s(f)= \left\{\begin{array}{ll} (\xi_0)_s +((\xi_0)_s - (L_l)_s) \cdot \tanh \left( \frac{f}{(\xi_0)_s - (L_l)_s} \right) & \hbox{for } f \leq 0,\\ (\xi_0)_s + ((L_u)_s - (\xi_0)_s) \cdot \tanh \left( \frac{f}{(L_u)_s - (\xi_0)_s} \right) & \hbox{for } f \geq 0. \end{array}\right.\)
&nbsp;&nbsp;&nbsp;&nbsp;\(F \leftarrow F + \gamma_c \cdot {\varvec{\alpha}}_s^T {\bf K} {\varvec{\alpha}}_s\)
end
for \(t \in \{0, 0.01, 0.02, \ldots, 1\}\) do
&nbsp;&nbsp;&nbsp;&nbsp;\(\beta \leftarrow 0\)
&nbsp;&nbsp;&nbsp;&nbsp;for \(s \in \{1, \ldots, S\}\) do
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\(\xi^{(s)}(t) \leftarrow \sigma_s \left( f^{(s)}(t) \right)\)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\(\beta \leftarrow \beta + \xi^{(s)}(t)\)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\({\bf x}_s \leftarrow {\bf x}_{s-1} + \Uplambda_s \cdot \left(\begin{array}{l}\cos(\beta) \\ \sin(\beta)\end{array}\right)\)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\(A_s \leftarrow \hbox{line segment } [{\bf x}_{s-1}(t),{\bf x}_s(t)]\)
&nbsp;&nbsp;&nbsp;&nbsp;end
&nbsp;&nbsp;&nbsp;&nbsp;\(F \leftarrow F + \gamma_{\rm target} \cdot {\bf 1}_{[t_{\rm target}, 1]}(t) \cdot \left\|{\bf x}_S(t)-{\bf x}_{\rm target}(t)\right\|\)
&nbsp;&nbsp;&nbsp;&nbsp;\(F \leftarrow F + \sum\limits_{s_1=1}^S \left(\sum\limits_{s_2=1}^{s_1-2} C_a\left(d(A_{s_1}(t), A_{s_2}(t))\right)+ \sum\limits_{s_2=s_1+2}^S C_a\left(d(A_{s_1}(t), A_{s_2}(t))\right)\right)\)
&nbsp;&nbsp;&nbsp;&nbsp;\(F \leftarrow F + \sum\limits_{s=1}^S \sum\limits_{o=1}^O C_o\left(d(M_o, A_s(t))\right)\)
end
return \(F\)
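The joint-limit squashing \(\sigma_s\) used in the algorithm can be written directly from its definition. This is a sketch for a single scalar joint; the argument names (`xi0` for the initial angle \((\xi_0)_s\), `lo`/`hi` for the limits \((L_l)_s, (L_u)_s\)) are ours.

```python
import numpy as np

def squash(f, xi0, lo, hi):
    """tanh squashing of an unbounded kernel expansion value f into the
    joint limits [lo, hi], anchored at the initial angle xi0:
    squash(0) == xi0, and the output approaches lo as f -> -inf
    and hi as f -> +inf (both branches agree at f = 0)."""
    span = (xi0 - lo) if f <= 0.0 else (hi - xi0)
    return xi0 + span * np.tanh(f / span)
```

Because the sigmoid is anchored at the initial configuration and saturates at the joint limits, any unconstrained coefficient vector maps to a trajectory that respects the limits, so no explicit limit constraints need to be handled during the search.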
5.5 Setup
The tasks for the two robot arms, consisting of initial configuration, obstacles, and target location, are illustrated in Fig. 5.
The humanoid 7-DOF arm is confronted with the task of reaching a target location that is hidden behind two large wall-like obstacles. The only way to reach the goal location is to carefully reach through a narrow slit in between the two obstacles. The slit is in no way aligned with the robot’s joints. Thus, this task requires highly coordinated movements of different joints.
5.6 Results
In both experiments we observed differences in fitness between trials due to random effects introduced by the evolution strategy. The experiments indicate that these effects exceed the impact of the methods used for switching between epochs and extending the covariance matrix. We found nearly no impact of the different strategies for extending the covariance matrix, and any sufficiently conservative epoch switching strategy results in qualitatively equivalent trajectories. Thus, in accordance with Occam’s razor, we rely on the simplest such strategy in our experiments, as follows: We switch to the next epoch only when the evolution strategy converges, by checking whether the step size parameter falls below a small threshold, in this case \(10^{-14}\). For the next epoch we reset this parameter to a more reasonable value of \(10^{-8}\) and extend the covariance matrix factor with the unit matrix.
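The epoch switching rule just described amounts to a few lines of bookkeeping. The sketch below uses our own function name and argument conventions (covariance factor `A` as a square matrix, old and new search space dimensions `n_old`/`n_new`); the thresholds are the values stated above, and padding with the unit matrix is the simplest of the extension strategies considered.

```python
import numpy as np

def maybe_switch_epoch(sigma, A, n_old, n_new,
                       converge_thresh=1e-14, reset_sigma=1e-8):
    """Conservative epoch switch: once the step size sigma has collapsed
    below converge_thresh, reset it and pad the covariance matrix factor A
    (n_old x n_old) with the unit matrix on the new dimensions.
    Returns (sigma, A, switched)."""
    if sigma >= converge_thresh:
        return sigma, A, False            # keep optimizing in this epoch
    A_new = np.eye(n_new)
    A_new[:n_old, :n_old] = A             # carry over the learned factor
    return reset_sigma, A_new, True
```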
6 Discussion
The results have a number of implications. First of all, we have shown that iteratively increasing the solution complexity results in more robust solutions, compared to starting with a (fixed) high complexity. Furthermore, increasing the solution complexity in an iterative manner allows the method to figure out the ‘right’ level of complexity for the problem at hand by itself.
The seemingly complex trajectory planning problem for the 7-DOF humanoid arm can be solved with only two coefficients per joint. This illustrates the power of our parameterization of squashing a Gaussian kernel expansion with a sigmoid function between the joint limits. For complex tasks the method indeed profits from an increase in solution complexity.
We have found that the epoch switch and the covariance matrix extension methods can be conservative and simple in practice. This relieves us of tuning sensitive hyper-parameters. We conjecture the reasons are as follows: Switching too late is better than switching too early, which is why a conservative switching strategy is more important than an elaborate one. Furthermore, the self-adaptation capability of the evolution strategy can make up for possible sub-optimal covariance matrix extensions. This indicates that for robustness and simplicity of implementation, relatively simple strategies should be used in practice.
The xNES algorithm is designed to follow the global trend of the fitness landscape and to quickly identify a (local) optimum. However, the trajectory planning problem often results in highly multi-modal landscapes, where the arm can get trapped behind an obstacle or in a self-collision. Thus, in principle, our method should be combined with a technique dealing with local optima. In the tasks investigated in this study this turns out to be superfluous, and in general a simple restart strategy should be sufficient as long as the success probability is sufficiently high.
Our method is a powerful tool for trajectory planning under constraints, such as obstacle avoidance. The experiment with a humanoid 7-DOF arm model meets real-time constraints, mostly because the solution can be well approximated at the first level of complexity. However, being based on an evolutionary algorithm, the method can easily be parallelized. Parallelism is even easier to exploit in a multi-start strategy.
7 Conclusion
We propose a principled framework for evolutionary learning in the infinite dimensional search space of continuous functions on the unit interval. This is a very general setting, with many potential applications. Evolutionary algorithms allow for searching in the presence of discontinuous objective functions and/or non-trivial task constraints.
The proposed method of exhausting (a dense subspace of) the space of continuous functions with a nested sequence of finite dimensional sub-spaces makes evolutionary search tractable without loss of conceptual flexibility provided by the full function space.
The method profits from the kernel framework in two ways: First, the squared norm in function space is a natural regularizer avoiding over-complex solutions. Second, the kernel framework provides a canonical way of extending the evolutionary search distribution when iterating over a sequence of nested sub-spaces.
The trajectory planning experiments demonstrate the advantage of iteratively evolving gradually more complex function representations. For a flexible, redundant arm in two-space and a 7-DOF robot arm in three-space the resulting evolution strategy successfully copes with difficult trajectory learning problems involving moving targets and obstacle avoidance.
Acknowledgments
This work was funded through the 7th Framework Programme of the EU under grant number 231576 (STIFF project) and SNF grant 200020-125038/1.