Software: Symbolic Perseus
Author: Pascal Poupart (Copyright 2007)

Release Log
-----------

2008-02-27 Release

* solvePOMDP.m: all parameters of the code can now be set by passing
  appropriate arguments to the solvePOMDP function. Type "help solvePOMDP"
  for a summary of the parameters and their default values.

* Belief points can now be sampled with several initial policies:
    - MDP: the MDP policy is simulated by choosing actions based on the
      true underlying state
    - QMDP: the QMDP policy with look-ahead search (a flat-state sketch of
      QMDP action selection is appended at the end of this log)
    - random: actions are chosen at random (uniformly)
  It is also possible to mix the above policies with random actions by
  specifying a non-zero exploration probability. The default policy is QMDP
  with a look-ahead depth of 1 and an exploration probability of 0.

* It is possible to specify a number of "rounds" such that the set of belief
  points is re-sampled after each round. The policy obtained at the end of
  the previous round is executed (with some exploration probability) to
  re-sample the belief points. By default, only one round is performed.

* The value function is now initialized (by default) to the alpha vectors
  corresponding to pure strategies, i.e., policies that always execute the
  same action (a sketch of this initialization is appended at the end of
  this log).

* Three policies can be evaluated:
    - evalPOMDPpolicyGraph: selects actions based on the policy graph
      computed by Symbolic Perseus. The value of the initial belief state
      reported by Symbolic Perseus is an estimate of the value of this
      policy graph. The policy graph is the fastest to execute since there
      is no need to find the best alpha vector at each belief.
    - evalPOMDPpolicyNonStationary: selects actions by finding the best
      alpha vector computed at each iteration of Symbolic Perseus. This
      requires a lot of memory since the value function computed at each
      iteration must be loaded into memory. However, this policy is at
      least as good as (and typically better than) the policy graph.
    - evalPOMDPpolicyStationary: selects actions by finding the best alpha
      vector of the last iteration of Symbolic Perseus. Hence only the
      value function of the last iteration needs to be loaded into memory,
      which reduces memory requirements. However, this policy may have a
      value arbitrarily lower than that of the non-stationary policy or the
      policy graph.
  (A flat-state sketch of alpha-vector action selection is appended at the
  end of this log.)

* Various speed-ups have been incorporated.

* Typos have been corrected in the example "problems/coffee3po.txt".

2007-04-20 Release

* The Java code now compiles without any warnings with javac 1.5.0.

* Bug fix in dd_reachableBeliefRegion.m: the set of belief points used to
  be sampled by always executing action 1 (which is fine for the
  handwashing problem cppo3.txt, but not for other problems). Belief points
  are now sampled by executing a random policy that selects an action at
  random at each step. Thanks to Jesse Hoey for pointing out this problem.

* Inclusion of a second POMDP example: the classic coffee problem. Thanks
  to Jesse Hoey for encoding the problem and dutifully documenting the
  encoding. It provides an intuitive example of how to encode POMDPs to be
  solved by Symbolic Perseus.

2007-04-17 Release

* Initial Release
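
Code Sketches
-------------

The MATLAB snippets below are minimal flat-state sketches of the ideas
referenced in the release notes above. Variable names, data layouts, and
function signatures are assumptions for illustration only; the actual code
works with factored representations (algebraic decision diagrams), not flat
transition matrices.

QMDP action selection, i.e., the idea behind the default policy used to
sample belief points (shown here without the look-ahead and exploration
options mentioned above):

    % Value iteration on the underlying MDP, then the greedy action at a
    % belief b.  T{a} is the |S|x|S| transition matrix of action a, R is
    % |S|x|A|, and gamma is the discount factor (hypothetical names).
    function act = qmdpAction(b, T, R, gamma)
      [nS, nA] = size(R);
      Q = zeros(nS, nA);
      for iter = 1:200                       % fixed number of sweeps
        V = max(Q, [], 2);
        for a = 1:nA
          Q(:, a) = R(:, a) + gamma * T{a} * V;
        end
      end
      [~, act] = max(b' * Q);                % QMDP(b,a) = sum_s b(s) Q(s,a)
    end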
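
Pure-strategy alpha vectors, i.e., the idea behind the default
initialization of the value function (one vector per policy that always
executes the same action):

    % Each vector is the fixed point alpha_a = R(:,a) + gamma * T{a} * alpha_a,
    % obtained here by solving the corresponding linear system.
    function alphaVecs = pureStrategyAlphas(T, R, gamma)
      [nS, nA] = size(R);
      alphaVecs = zeros(nS, nA);
      for a = 1:nA
        alphaVecs(:, a) = (eye(nS) - gamma * T{a}) \ R(:, a);
      end
    end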
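
Alpha-vector action selection, i.e., the step performed at each belief by
the stationary and non-stationary evaluation policies (the |S| x K matrix
of vectors and the 1 x K vector of action labels are assumed data layouts):

    % Pick the alpha vector with the highest value at belief b and return
    % the action attached to it.
    function act = bestAlphaAction(b, alphaVecs, alphaActions)
      [~, k] = max(b' * alphaVecs);   % values of all K vectors at b
      act = alphaActions(k);
    end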