Software: Symbolic Perseus
Author: Pascal Poupart (Copyright 2007)
Release Log
-----------
2008-03-17 Release
* Tracing/executing policies: 3 new functions (tracePOMDPpolicyGraph,
tracePOMDPpolicyStationary and tracePOMDPpolicyNonStationary) have been
added to ease the process of executing/tracing policies. Previously,
executing a policy meant looping through instructions that required
manually encoding an unfriendly obsConfig matrix. The new functions
automate this loop and provide a user-friendly interface for entering
the observations (a sketch of the kind of loop they replace appears
below).
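A minimal sketch of the interactive trace loop these functions wrap,
using illustrative flat-matrix names rather than the factored ADD
structures Symbolic Perseus actually manipulates (T is |S|x|S|x|A|
with T(s,s',a) = Pr(s'|s,a), O is |S|x|O|x|A| with
O(s',o,a) = Pr(o|s',a), and selectAction is a hypothetical helper
standing in for the traced policy):

    b = b0;                      % initial belief (column vector)
    for step = 1:horizon
        a = selectAction(b);     % hypothetical: follow the traced policy
        fprintf('step %d: execute action %d\n', step, a);
        o = input('enter observed symbol index: ');
        % Bayesian belief update:
        % b(s') proportional to Pr(o|s',a) * sum_s Pr(s'|s,a) * b(s)
        b = O(:, o, a) .* (T(:, :, a)' * b);
        b = b / sum(b);
    end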
2008-02-27 Release
* solvePOMDP.m: all solver parameters can now be set by passing
appropriate arguments to the solvePOMDP function. Type "help solvePOMDP"
for a summary of the parameters and their default values.
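A hedged invocation sketch; the argument list shown here is an
assumption, so consult "help solvePOMDP" for the authoritative
parameters:

    % Assumed call pattern: solve the bundled coffee problem with
    % default parameter values (run "help solvePOMDP" to confirm).
    solvePOMDP('problems/coffee3po.txt');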
* Belief points can now be sampled with several initial policies:
  - MDP: the MDP policy is simulated by choosing actions based on the
    true underlying state
  - QMDP: the QMDP policy with look-ahead search
  - random: actions are chosen uniformly at random
It is also possible to mix the above policies with random actions by
specifying a non-zero exploration probability (see the sketch after
this list). The default policy is QMDP with a look-ahead depth of 1
and an exploration probability of 0.
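A sketch of the exploration mixing, with hypothetical names
(basePolicyAction stands in for whichever initial policy was
selected):

    % With probability exploreProb take a uniformly random action,
    % otherwise follow the chosen initial policy (MDP, QMDP or random).
    if rand < exploreProb
        a = ceil(nActions * rand);   % uniform random action in 1..nActions
    else
        a = basePolicyAction(b);     % hypothetical helper, e.g. 1-step QMDP
    end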
* It is possible to specify a number of "rounds" such that the set of
belief points is re-sampled after each round: the policy obtained at
the end of the previous round is executed (with some exploration
probability) to re-sample the belief points (see the loop sketch
below). By default, only one round is performed.
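Conceptually, the rounds work as in the loop below; the helper names
are hypothetical, not functions from the Symbolic Perseus source:

    policy = initialPolicy;                       % e.g., QMDP (round 1)
    for round = 1:nRounds
        % re-sample belief points by executing the current policy,
        % occasionally taking random actions (exploration probability)
        B = sampleBeliefs(policy, exploreProb);   % hypothetical helper
        policy = runPointBasedBackups(B, policy); % hypothetical helper
    end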
* The value function is now initialized (by default) to the alpha vectors
corresponding to pure strategies (i.e., policies that always execute the
same action).
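The alpha vector of the pure strategy "always execute action a"
satisfies alpha = R(:,a) + gamma * T(:,:,a) * alpha, so it can be
obtained by solving a linear system. A flat-matrix sketch (Symbolic
Perseus itself works with a factored ADD representation):

    % T(:,:,a) is the |S|x|S| transition matrix of action a,
    % R(:,a) the immediate reward vector, gamma in [0,1).
    nS = size(R, 1);
    alphaPure = zeros(nS, nActions);
    for a = 1:nActions
        % solve (I - gamma * T_a) * alpha = R_a
        alphaPure(:, a) = (eye(nS) - gamma * T(:, :, a)) \ R(:, a);
    end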
* 3 policies can be evaluated (the alpha-vector selection step is
sketched after this list):
  - evalPOMDPpolicyGraph: selects actions based on the policy graph
    computed by Symbolic Perseus. The value of the initial belief
    state reported by Symbolic Perseus is an estimate of the value of
    this policy graph. The policy graph is the fastest to execute
    since there is no need to find the best alpha vector at each
    belief.
  - evalPOMDPpolicyNonStationary: selects actions by finding the best
    alpha vector computed at each iteration of Symbolic Perseus. This
    requires a lot of memory since the value functions of all
    iterations must be kept in memory. However, this policy is
    strictly better than the policy graph.
  - evalPOMDPpolicyStationary: selects actions by finding the best
    alpha vector of the last iteration of Symbolic Perseus. Hence only
    the value function of the last iteration needs to be loaded in
    memory, which reduces memory requirements. However, the value of
    this policy may be arbitrarily lower than that of the
    non-stationary policy or the policy graph.
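The two alpha-vector evaluators share the selection step sketched
below (flat-matrix names for illustration; Symbolic Perseus stores
alpha vectors as algebraic decision diagrams). The stationary variant
applies it to the final alpha-vector set, the non-stationary variant
to the set of the current iteration, while the policy graph avoids
the maximization altogether by following edges indexed by
observations.

    % alphaVecs: |S| x K matrix of alpha vectors, actOfVec: 1 x K
    % action labels, b: current belief (column vector).
    [val, k] = max(alphaVecs' * b);  % best alpha vector at belief b
    a = actOfVec(k);                 % action associated with it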
* Various speed-ups have been incorporated.
* Typos have been corrected in the example "problems/coffee3po.txt".
2007-04-20 Release
* Java code now compiles without any warnings under javac 1.5.0.
* Bug fix in dd_reachableBeliefRegion.m: the set of belief points used
to be sampled by always executing action 1 (which is fine for the
handwashing problem cppo3.txt, but not for other problems). Belief
points are now sampled by executing a random policy that selects an
action uniformly at random at each step. Thanks to Jesse Hoey for
pointing out this problem.
* Inclusion of a second POMDP example: the classic coffee
problem. Thanks to Jesse Hoey for encoding the problem and dutifully
documenting the encoding. It provides an intuitive example of how to
encode POMDPs to be solved by Symbolic Perseus.
2007-04-17 Release
* Initial Release