Significant portions of this chapter also appear in [Shkolnik et al., 2009].
5.1
RRT Planning Under Differential Constraints
This section describes the performance of the RRT in the presence of kinodynamic
constraints, with an emphasis on planning for underactuated systems. in state space.
5.1.1
Basic RRT Operation
Recall the basic RRT algorithm discussed in section 1.3.1 and shown in Algorithm 1. The
pattern by which nodes are selected for expansion in an RRT is a function of both the
sampling distribution and the nearest-neighbor distance metric.
Typically, samples are
drawn uniformly over the state space while the most common metric for the nearest-neighbor
selection is the Euclidean distance between points. In this case, the expansion pattern of the
tree is modeled by the Voronoi diagram over the nodes within the tree. The probability of
a node being expanded is directly proportional to the volume of its corresponding Voronoi
region. Nodes that have a larger Voronoi region are more likely to be chosen for expansion
and are referred to as major nodes. In the case of the Euclidean metric, these nodes tend
to lie on the outside of the tree during the initial exploration. Conversely, inner or minor
nodes have smaller Voronoi regions and often lie on the inside of the tree. Once the tree has
explored the state space, these become major nodes as the algorithm begins to fill in the
space. This phenomenon of favoring some nodes over others is referred to as the Voronoi bias,
and yields an initial preference towards the exploration of the state space. Over time, the
Voronoi regions become more uniform in volume and, in the case of planning for holonomic
systems, the likelihood of expanding a node tends toward the sampling distribution [Kuffner
et al., 2000].
5.1.2
Performance with Kinodynamic Constraints
The efficiency by which the vanilla RRT algorithm is able to grow the tree and explore the
space is highly sensitive to the distance metric. In the presence of kinematic constraints
(e.g., joint limits, obstacle collisions) and dynamic constraints (including torque limits, and
the equations of motion), widely-used metrics like the Euclidean distance are a very poor
representation for the true distance between points in the space. For a given state, the
equations of motion and actuator limitations prevent the moving in arbitrary directions in
state space. As a result, the RRT will repeatedly attempt to extend the same nodes without
growing the tree any closer to the sampled region.
Consider the swing-up task for a torque-limited pendulum. The two-dimensional state
space consists of the joint angle and the angular velocity, x = θ, ˙
θ
. The task is to bring
74
Figure 5-1: Example state space trajectory for the torque-limited, single link pendulum. Due
to the torque limits at the shoulder, a swing up phase is necessary to drive the pendulum
from the vertical downward position to the stationary upright position. The two black dots
connected by a segment shows that the Euclidean metric is not a good measure of cost-to-go.
c 2009 IEEE [Shkolnik et al., 2009].
the pendulum from a stationary down position, x
init
= −π/2, 0
, to the fixed upright goal
pose, x
goal
= π/2, 0
. If the torque-limits are severe, then a swing up phase is required
in order to drive the system to most states in the state space, including the goal pose. The
spiral state trajectories in Figure 5-1 demonstrate this behavior as the pendulum is swung
back and forth to reach the goal.
The Euclidean distance metric is ignorant to the fact that the differential constraints on
the pendulum dynamics require a swing up phase to reach most of the state space. Consider
the pair of points connected with a black line, shown in Figure 5-1. The Euclidean distance
metric would suggest that this pair is very close to each other, but the pendulum is unable
to transition between these two parts of state space without swing up. While RRT-based
planners have successfully solved swing up control for the underactuated pendulum [Branicky
and Curtiss, 2002] using the Euclidean metric, it comes at a cost of expanding a large number
of fruitless nodes. This might be tolerable for low-dimensional systems like the single-link
pendulum, but can become intractable for higher-dimensional systems for which sample-
based planners are typically well-suited.
75
5.1.3
RRT Modifications
The motion planning literature contains a number of attempts to improve the performance
in constrained systems. One solution is to use a metric other than the Euclidean distance
that is better-suited to the problem. Perhaps an ideal metric would be the optimal cost-
to-go in terms of the time or energy required to drive the system between a pair of states
[Frazzoli et al., 2002, Glassman and Tedrake, 2010]. Unfortunately, computing the ideal
cost is intractable for most higher-order systems since it is equivalent to optimally solving
the original motion planning problem [LaValle, 2006]. Instead, RRT-based planners can
utilize a sub-optimal heuristic function that has been tuned to the particular problem as
the distance measure. The metric is not as accurate as the optimal cost-to-go, but typically
performs much better than the Euclidean distance. Unfortunately, these metrics rely upon
domain-specific knowledge and are not generalizable across different systems.
Rather than design a system specific metric for planning, another option is to adaptively
learn a suitable metric during planning. Such is the approach of Cheng [Cheng, 2005, Ch.
6], who scores nodes according to their consistency with constraints as well as that of their
children in the tree. These values are updated while building the tree and used to select
nodes during exploration. The RRT extension proposed in [Kim et al., 2005] is similar, in
which the algorithm keeps the history of expansion attempts on nodes, considering failed
attempts as both collisions, as well as repetition of previous expansions. Furthermore, in
place of the Euclidean distance metric, this algorithm calculates the cost-to-go from each
node in the tree toward the sample using a linear approximation of the dynamics for each
node. The approach alleviates some of the problems with the Euclidean distance metric, but
can be computationally expensive to compute, and is sensitive to how well the linearization
approximates the actual cost-to-go.
The dynamic-domain RRT [Yershova et al., 2005, Yershova, 2008] combats futile
oversampling by altering the size of the Voronoi regions for those nodes that are close to
obstacles. By lowering the bias associated with these nodes, the dynamic-domain RRT
reduces the likelihood of drawing samples from regions of the state space that are more
likely to induce collisions. More specifically, the algorithm identifies boundary nodes as
those for which a collision occurred while expanding a sample. The domain of samples
that can then be matched with a boundary node is then restricted to a sphere of fixed
radius in the state space. One limitation of the dynamic-domain RRT is that it must first
encounter collisions before modifying the Voronoi bias. Additionally, it currently supports
only spherical domains centered about the boundary nodes, which do not necessarily reflect
the best sampling regions.
Other modifications to the RRT assume a discretized set of actions. In the RRT-Blossom
algorithm [Kalisiak, 2008], whenever a node is chosen for expansion, all possible (discrete)
actions are attempted, while sparse expansion is encouraged by weeding out any expansions
of the tree if the expanded node is closer to other nodes in the tree other than the parent.
Once a node is visited once, it never needs to be visited again, so redundant exploration
is a non-issue. However, in some problems (for example the pendulum) it does not make
sense to remove child nodes in this way since child nodes will be near other nodes in the
76
tree, unless the integration time step ∆t is very small, which will increase the size of the
resulting tree exponentially. Another variant that assumes discrete actions, but was only
demonstrated on path planning problems without differential constraints, is the Rapidly-
Exploring Random Leafy Tree (RRLT) [Morgan et al., 2004]. Leaf nodes, corresponding to
nodes that are reachable in one time step from the current tree are stored in the tree. The
algorithm picks the closest leaf node to a random sample, converts it to a main node in tree,
and then generates new reachable leaf nodes from this new node.
5.2
Reachability-Guided RRT
The previous section presented an analysis of the shortcomings of existing RRT-based
planners in dealing with systems with differential constraints. In particular, there was an
emphasis on the sensitivity of RRTs to the nearest neighbor metric and the implications that
this has for building the tree.
Figure 5-2:
The reachable set, R(x), for the underactuated pendulum consists of an
approximately linear segment, the endpoints of which are defined by the points in state
space that are achieved by applying the minimum and maximum torques at the shoulder.
c 2009 IEEE [Shkolnik et al., 2009].
77
In this section the Reachability-Guided Rapidly-exploring Random Tree (RG-RRT) is
presented, as an alternative planning strategy for general systems that are subject to
differential constraints. The RG-RRT takes the form of a modified RRT that explicitly
accounts for the limitations of the system dynamics to alter the Voronoi bias so as to
emphasize nodes within the tree that exhibit the greatest contribution towards exploring
the state space. The RG-RRT alleviates the sensitivity to the distance metric and, in turn,
does not require a system-specific metric heuristic. The result is an expansion of the tree
that makes efficient use of the system dynamics to more rapidly reach the goal.
5.2.1
Problem Formulation
Consider the problem of motion planning for a system subject to differential constraints.
Let X denote the state space that applies to kinodynamic systems and let C represent the
general configuration space. The objective is to find a time-varying control input, u(t) ∈
U for t ∈ [0, T ] that drives the system from some initial state, x
init
, to an arbitrary goal
state, x
goal
, in finite time, T . The resulting trajectory, x(t), must be free of collisions, i.e.
x(t) ∈ X
free
where X
free
denotes free space, and satisfy the differential constraints according
to the state transition function,
˙x = f (x, u).
(5.1)
The problem becomes significantly more challenging when considering systems that are
additionally constrained to be underactuated or nonholonomic.
5.2.2
The RG-RRT Algorithm
The Reachability-Guided RRT is derived from the premise that sampling is relatively
inexpensive compared to the process of searching for collision-free trajectories from a node
while satisfying differential constraints. The RG-RRT provides a means for choosing nodes
that have a better chance of yielding consistent trajectories without having to redefine
the metric function.
Instead, the RG-RRT actively constrains the set of nodes under
consideration for nearest-neighbor pairing with a sample to those that are actually able
to expand towards the given sample. Effectively, it applies the Euclidean metric to nodes for
which it is a valid indication of the approximate cost-to-go under the differential constraints.
The RG-RRT does so by considering the reachable region of the state space associated with
each node.
Definition 6. For a state x
0
∈ X , we define its reachable set, R
∆t
(x
0
), to be the set of all
points that can be achieved from x
0
in bounded time, ∆t ∈ [∆T
low
, ∆T
high
], according to the
state equations (5.1) and the set of available control inputs, U .
In the case of the torque-limited single-link pendulum, the reachable set for a state,
assuming a constant time step, is a curved segment. For simplicity, this segment can be
78
(a)
(b)
Figure 5-3: Two consecutive steps in a single iteration of the RG-RRT, as demonstrated for
the underactuated pendulum. In (a), a random sample is drawn from the state space and
paired with the nearest node in the tree and the closest point in its reachable set according to
the Euclidean distance metric. Shown in (b), the algorithm then expands the node towards
the point in the reachable set and adds that point to the tree. c 2009 IEEE [Shkolnik et al.,
2009].
79
approximated as a straight segment, the bounds of which are found by integrating the
dynamics while applying the maximum negative and positive torque, −u
max
and u
max
.
Figure 5-2 depicts the reachable set approximation for a node in the tree. Any state along
the segment can be (approximately) achieved in time ∆t by an allowable control input,
|u| ≤ u
max
.
Algorithm 8 T ← Build-RG-RRT(x
init
)
1:
T ← InitializeTree(x
init
);
2:
R ← ApproxR([ ], T , x
init
);
3:
for k = 1 to K do
4:
RejectSample ← true;
5:
while RejectSample do
6:
x
rand
← RandomState();
7:
[ ], dist
T
← NearestState(x
rand
, T );
8:
r
near
, dist
R
← NearestState(x
rand
, R);
9:
if dist
R
< dist
T
then
10:
RejectSample ← f alse;
11:
x
near
← P arentN ode(r
near
, R, T );
12:
end if
13:
end while
14:
u ← SolveControl(x
near
, x
rand
);
15:
[x
new
, isF easible] ← NewState(x
near
, u);
16:
if isFeasible then
17:
T ← InsertNode(x
new
, T );
18:
R ← ApproxR(R, T , x
new
);
19:
if ReachedGoal(x
new
) then
20:
return T
21:
end if
22:
end if
23:
end for
24:
return [ ]
The structure of the RG-RRT algorithm is outlined in Algorithm 8. Given an initial point
in state space, x
init
, the first step is to initialize the tree with this state as the root node.
Each time a node is added to the tree, the ApproxR() function solves for R
∆t
(x
new
), an
approximation of the set of reachable points in the state space that are consistent with the
differential constraints (5.1). The approximated reachable set for the whole tree is stored in
a tree-like structure, R
∆t
(T ), or simply R for shorthand notation, which contains reachable
set information as well as pointers to the parent nodes for each node in the corresponding
tree, T . For many systems of interest, the approximate bounds of the reachable set, R
hull
,
can be generated by sampling over the limits in the action space, and then integrating
the corresponding dynamics forward. For these systems, assuming a relatively short action
time step, ∆t, integrating actions between the limits results in states that are within, or
80
reasonably close to being within, the convex hull of R
hull
. In such cases it is sufficient
to consider only the bounds of the action set to generate the reachability approximation.
When the dimension of the action space becomes large, it may become more efficient to
approximate the reachable set with a simple geometric function, such as an ellipsoid. We
found that the reachable set approximation does not need to be complete, and even a crude
approximation of the set can vastly improve planner performance in systems with dynamics.
Another benefit of approximating the reachable set by sampling actions is that the resulting
points can be efficiently tested for collisions before they are added to the Reachable set
approximation. This reduces the likelihood of trajectories leaving free space as part of the
exploration phase.
After a node and its corresponding reachable set is added to the tree, a random state-
space sample, x
rand
, is drawn. The NearestState(x
rand
, [T , R]) function compares the
distance from the random sample to either, the nodes of the tree, T , or to the reachable
set of the tree, R. The function returns the closest node, and distance from the node to
the sample. Samples that are closer to the tree, T , rather than the reachable set, R, are
rejected. Sampling continues until a sample is closer to R, at which point, the P arentN ode
function returns the node in T , which is the parent of the closest reachable point in R.
Figure 5-3 demonstrates this process for the pendulum example. The figure shows a
tree grown from the initial state corresponding to the pendulum pointing down, with zero
rotational velocity. The nodes and edges comprising the tree are in blue, while the magenta
points that emanate from each node correspond to the approximation of the Tree’s reachable
set. In Figure 5-3(a), a sample was drawn from the state space. The Euclidean distance
from this sample to each node in the Tree, and to each Reachable node was computed.
This sample was identified as being closest to the Reachable set. The parent of the closest
Reachable node (shown with the dashed green line) is then selected for expansion.
The result of this policy of rejecting samples for which the nearest node in the Tree is
closer than its Reachable set is a change in the Voronoi bias. The RG-RRT allows samples
only from Voronoi regions for which the differential constraints permit the expansion of the
node towards the sample. While samples may be drawn from anywhere within the state
space, only a subset of regions are actually used to grow the tree. This has the effect of
modifying the Voronoi bias to emphasize nodes that are better suited to explore the state
space while growing the tree. Figure 5-4 depicts this phenomenon for the pendulum. The
plot presents the Voronoi regions for each of the nodes in {R, T } and identifies in green the
regions for which drawn samples are deemed as valid. Many of the large Voronoi regions,
associated with outer nodes of the tree, have appropriately been identified as not suitable
for expansion, since it is impossible to grow the tree into these regions in time ∆t under
the differential constraints. The nodes may still be expanded upon, but towards samples
drawn towards the inside of the tree. These nodes, which would otherwise serve as major
nodes with standard RRT-based planners using the Euclidean distance, are instead treated
as minor nodes.
81
(a) 30 nodes
(b) 60 nodes
Figure 5-4: RG-RRT Voronoi diagrams for a pendulum. The blue diagram in (a) corresponds
to the tree after 30 nodes have been added while that in (b) corresponds to the tree with 60
nodes. Magenta dots are discretely sampled reachable points affiliated with the tree. Green
regions are areas where samples are ‘allowed’, and correspond to Voronoi areas associated
with the reachable set. Samples that fall in any grey areas are discarded. Note that these
regions are constantly changing as the tree grows. c 2009 IEEE [Shkolnik et al., 2009].
82
(a) Standard RRT
(b) RG-RRT
Figure 5-5: The trees for the underactuated pendulum after adding 360 nodes for (a) the
standard RRT-based kinodynamic planner and (b) the RG-RRT planner. While the RG-
RRT algorithm has reached the goal state, the RRT-based planner has yet to explore much
of the state space. The RG-RRT converged in 3 seconds.
c 2009 IEEE [Shkolnik et al.,
2009].
83
Upon identifying a suitable node for expansion, the RG-RRT extends the tree from the
node. The SolveControl(x
near
, x
rand
) function defines an action that (with probability
> 0) drives the system from the state towards the sample. Figure 5-3(b) demonstrates this
step where the node has been extended. The new point, x
new
, is added as a node to the tree,
and its reachable set, R x
new
, is computed. The RG-RRT then continues with the next
iteration of the algorithm.
5.3
Results
This section describes results of implementing the RG-RRT on two different systems in
simulation. The first application is to find swing up trajectories for the single-link, torque-
limited pendulum. The second application is to solve for collision-free trajectories for a
simple nonholonomic car driving amongst obstacles. For comparison, both simulations were
also compared against the standard RRT. The two planners used a uniform distribution over
the state space to generate samples and the Euclidean metric to identify nodes for expansion.
Figure 5-6: The final tree generated with the standard kinodynamic RRT-based planner. The
tree consists of over 2300 nodes and required 75 seconds to generate. c 2009 IEEE [Shkolnik
et al., 2009].
84
5.3.1
Swing Up Control for an Underactuated Pendulum
The RG-RRT was utilized to solve for a swing up controller for the underactuated pendulum.
The control authority at the shoulder was limited in magnitude. The system parameters were
set to a mass of m = 1, a length of l = 0.5, a damping coefficient of b = 0.1, gravitational
constant of 9.8, and a maximum torque of |u| ≤ 0.1.
Figure 5-5 compares the trees for the standard RRT and the RG-RRT after 360 nodes
have been expanded. At this point, the RG-RRT algorithm has identified a path to the
goal that is consistent with the differential constraints. In contrast, the planner based upon
the standard RRT has yet to explore much of the state space. Instead, one can see the
aforementioned sensitivity to the Euclidean metric, which has resulted in the expansion of
many major (outer) nodes for which the torque limits allow the tree to only grow inwards.
This is evident in the large number of overlapping branches that extend inward rather than
extending towards the larger Voronoi regions. Referring back to the Voronoi diagrams in
Figure 5-4(a), note that the only valid Voronoi regions for these nodes would be on the inside
of the tree where sampling is consistent with the ability to grow the tree inwards.
Dostları ilə paylaş: |