II INTERNATIONAL SCIENTIFIC CONFERENCE OF YOUNG RESEARCHERS
Qafqaz University
18-19 April 2014, Baku, Azerbaijan
USING PARTIALLY OBSERVABLE MARKOV DECISION PROCESS FOR
INDOOR PLANNING AND NAVIGATION IN ROBOTICS
Nadir HAJIYEV, Umit ILHAN
Qafqaz University
n.hajiyev@ictsrcqu.org, umit.ilhan@ictsrcqu.org
AZERBAIJAN
Working with automated systems that enable a wide range of useful applications has become part of everyday life. A field that attracted our attention for the development of such applications is robotics. Robotics has applications in many different environments, but in this thesis we consider only indoor applications. The purpose of this thesis is to present the results of our research on using a POMDP for the indoor navigation and planning problem in robotics. As an outcome of this research we are going to develop an application for the indoor navigation problem of the AzerRobot project.
Planning and navigation are fundamental problems in robotics, and many solutions have already been proposed. Projects such as Dervish [1] and Xavier [2] have shown good results and had a large impact on the search for optimal solutions to this problem. However, using a robot in an indoor environment raises additional difficulties, such as environments with very similar parameters, e.g. offices and corridors. Moreover, noisy sensor data, a changing environment, and the inability to fully observe the environment make it hard to estimate future steps efficiently. Recent research in several labs shows that using a POMDP for such problems gives very promising results.
Before describing the proposed solution, let us first define a POMDP for our environment. POMDP stands for Partially Observable Markov Decision Process; it is formally defined by four parameters S, A, T and R, and for each new iteration i+1 the values S_{i+1}, A_{i+1}, T_{i+1}, R_{i+1} depend only on S_i, A_i, T_i, R_i, where
S – state space; the current state of the robot, perceived through sensor data (also known as the belief state)
T – transition matrix; the set of possible state-transition probabilities pbts(s'|s,a)
A – action; a process executed according to the state space and transition matrix
R – reward; the gain from the last action (can be positive or negative)
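The four parameters above can be sketched as plain data structures. The following is a minimal illustration, assuming a hypothetical two-state indoor world ("corridor" and "door"); the state names, actions and numbers are invented for the example and are not taken from the AzerRobot project.

```python
# Hypothetical two-state indoor world; all names and numbers are
# illustrative only.
states = ["corridor", "door"]
actions = ["forward", "enter"]

# T[a][s][s2] = pbts(s2|s, a): probability of reaching s2 from s via a.
T = {
    "forward": {"corridor": {"corridor": 0.8, "door": 0.2},
                "door":     {"corridor": 0.6, "door": 0.4}},
    "enter":   {"corridor": {"corridor": 1.0, "door": 0.0},
                "door":     {"corridor": 0.0, "door": 1.0}},
}

# R[s][a]: reward for taking action a in state s; entering at a door is
# rewarded, trying to enter through a corridor wall is penalised.
R = {"corridor": {"forward": -1, "enter": -10},
     "door":     {"forward": -1, "enter": 10}}

# Sanity check: every row of T must be a probability distribution.
for a in actions:
    for s in states:
        assert abs(sum(T[a][s].values()) - 1.0) < 1e-9
```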
In each iteration, or time step, the robot initially has a state space S and is able to observe the possible transitions T; by taking an action A according to T, the robot updates its belief and gains a reward R.
As stated in [3], acting in a partially observable environment can be decomposed into a state estimator and a policy, where the state estimator takes the last belief state S_i, the most recent action A_i and the most recent observation as input, and returns an updated belief state S_{i+1}.
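The state estimator described above is essentially a Bayes filter over the belief state. The following sketch shows one common form of that update, b'(s') ∝ O(o|s',a) · Σ_s T(s'|s,a) b(s); the observation model O and the two-state example models are assumptions for illustration, not part of the four parameters listed in this paper.

```python
def update_belief(belief, action, observation, T, O):
    """State estimator: b'(s2) is proportional to
    O[observation][s2] * sum over s of T[action][s][s2] * belief[s]."""
    new_belief = {}
    for s2 in belief:
        prior = sum(T[action][s][s2] * belief[s] for s in belief)
        new_belief[s2] = O[observation][s2] * prior
    norm = sum(new_belief.values())
    if norm == 0:
        raise ValueError("observation impossible under current belief")
    return {s: p / norm for s, p in new_belief.items()}

# Illustrative two-state models (invented for this sketch).
T = {"forward": {"corridor": {"corridor": 0.8, "door": 0.2},
                 "door":     {"corridor": 0.6, "door": 0.4}}}
O = {"see_door": {"corridor": 0.1, "door": 0.9},
     "see_wall": {"corridor": 0.9, "door": 0.1}}

b0 = {"corridor": 0.5, "door": 0.5}
b1 = update_belief(b0, "forward", "see_door", T, O)
# After moving forward and seeing a door, "door" becomes the more
# likely state.
```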
As mentioned above, some problems remain. Besides the POMDP algorithm itself, most of them can be solved by using appropriate sensors. Sonar sensors are generally used for distance detection. They have some noise, but that is not the actual problem. The actual problem with sonars and other kinds of distance sensors is that they cannot perceive the environment visually, so we received a lot of negative reward in our application. Since sonars only measure distance, the robot cannot distinguish a corridor from an open door. Solving this problem requires visual data of the environment and a simple object-recognition algorithm. It is possible to use 2D or 3D cameras for recognition, but since only doors need to be recognized, this can be done with any 2D camera and PCA to reduce the image dimensionality [4].
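The PCA dimensionality-reduction step can be sketched as below. This is a generic SVD-based PCA, not the specific pipeline of [4]; the image size, number of frames, and use of random data as a stand-in for camera frames are assumptions for the example.

```python
import numpy as np

def pca_project(images, k):
    """Project flattened grayscale images onto the top-k principal
    components, yielding a low-dimensional feature vector per image."""
    X = images - images.mean(axis=0)           # centre the data
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    components = Vt[:k]                        # (k, n_pixels)
    return X @ components.T                    # (n_images, k) features

# Synthetic stand-in for camera frames: 10 "images" of 64 pixels each.
rng = np.random.default_rng(0)
frames = rng.random((10, 64))
features = pca_project(frames, k=3)
assert features.shape == (10, 3)
```

The k-dimensional feature vectors would then feed a simple classifier that labels each frame as "door" or "not door".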
Our solution combines the two approaches described above: first, using a camera or another sensor to obtain visual data of the environment, and second, processing that data with the POMDP algorithm. The POMDP algorithm has already been implemented in simulation, and we are going to implement it in the AzerRobot project. As a first stage of the project, however, we use the NAO robot from Aldebaran Robotics and are going to test our algorithm on it.