Brief instructions to install and run symbolicPerseus
-----------------------------------------------------
Author: Pascal Poupart (ppoupart@cs.uwaterloo.ca)
Reference: Chapter 5 of Poupart's PhD thesis (http://www.cs.uwaterloo.ca/~ppoupart/publications/publications.html#phd-thesis)
Installation:
-------------
1) Untar and unzip symbolicPerseus.tgz
tar -xvzf symbolicPerseus.tgz
2) Compile the java code with Sun's java SDK. Use the same version as
Matlab's java virtual machine. To check the java version used by
Matlab, type "version -java" at the Matlab prompt. The java code has
been compiled with java 1.3.1 to 1.4.2 without any problem. It also
compiles with java 1.5.0 with some warnings that can be ignored.
cd symbolicPerseus/dd_javaClasses
javac *.java
Launching Matlab
----------------
Launch Matlab in the symbolicPerseus directory. This is critical
since Matlab automatically reads the files startup.m and
java.opts, which tell Matlab's java virtual machine where to find
the java classes (startup.m) and how much memory to allocate
(java.opts).
cd symbolicPerseus
matlab
N.B. Do not launch Matlab with the -nojvm option because that would
prevent the java virtual machine from starting up. The code has been
tested with Matlab 7.
Solving POMDPs:
---------------
Run the "solvePOMDP" function by typing
>> [valueFunction,policy] = solvePOMDP('problems/cppo3.txt')
It first parses the POMDP 'cppo3.txt', which is in the "explicit
SPUDD format". The explicit SPUDD format requires that all values of
all variables be specified (i.e., no shortcuts allowed). Then some
reachable belief states are sampled with dd_reachableBelRegion.m.
This may take a few minutes. Then the POMDP is solved by running the
dd_boundedPerseus.m function. Again, this may take a few minutes per
iteration. The value function and policy are automatically saved in
the file 'problems/cppo3_1000bel_30iter.mat', whose name indicates
that 1000 belief states were sampled and 30 Perseus iterations were
performed.
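The iterations of dd_boundedPerseus.m roughly follow the point-based
Perseus scheme: repeatedly pick a sampled belief whose value has not yet
improved, back it up against the current set of alpha-vectors, and keep
either the new vector or the old best one, so that no belief's value ever
decreases. Below is a rough flat-matrix sketch on a toy random POMDP; all
names, sizes, and the matrix representation are illustrative assumptions,
since the actual solver works with algebraic decision diagrams and adds
further bounding machinery described in the thesis.

```python
# Toy point-based (Perseus-style) value iteration sketch.
# Assumption: a flat tabular POMDP; the real solver is symbolic.
import numpy as np

rng = np.random.default_rng(0)
nS, nA, nO, gamma = 4, 3, 2, 0.95

# T[a][s, s'] transitions, O[a][s', o] observation probabilities,
# R[a][s] nonnegative rewards (so the all-zero vector below is a
# valid initial lower bound on the value function).
T = [rng.dirichlet(np.ones(nS), size=nS) for _ in range(nA)]
O = [rng.dirichlet(np.ones(nO), size=nS) for _ in range(nA)]
R = [rng.uniform(0.0, 1.0, size=nS) for _ in range(nA)]

def backup(b, V):
    """Point-based backup: best alpha-vector for belief b given set V."""
    best = None
    for a in range(nA):
        g = R[a].copy()
        for o in range(nO):
            # alpha_ao_i(s) = sum_s' T(s'|s,a) O(o|s',a) alpha_i(s')
            cand = [T[a] @ (O[a][:, o] * alpha) for alpha in V]
            g = g + gamma * max(cand, key=lambda v: float(v @ b))
        if best is None or float(g @ b) > float(best @ b):
            best = g
    return best

def perseus_sweep(B, V):
    """One Perseus iteration: improve the value at every belief in B."""
    Vnew, todo = [], list(B)
    while todo:
        b = todo[rng.integers(len(todo))]
        alpha = backup(b, V)
        if float(alpha @ b) >= max(float(v @ b) for v in V):
            Vnew.append(alpha)        # the backup improved this belief
        else:                         # otherwise keep the old best vector
            Vnew.append(max(V, key=lambda v: float(v @ b)))
        # drop the beliefs whose value the new set already matches
        todo = [b2 for b2 in todo
                if max(float(v @ b2) for v in Vnew)
                < max(float(v @ b2) for v in V)]
    return Vnew

B = [rng.dirichlet(np.ones(nS)) for _ in range(20)]  # sampled beliefs
V = [np.zeros(nS)]                                   # initial value function
for _ in range(5):
    V = perseus_sweep(B, V)
```

Because each sweep either keeps a belief's old best vector or replaces it
with an improving backup, the value at every sampled belief is
monotonically non-decreasing across iterations.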
Evaluating a policy:
--------------------
To evaluate a policy, run the evalPOMDPpolicy function by typing
>> avRew = evalPOMDPpolicy('problems/cppo3.txt', 'problems/cppo3_1000bel_30iter.mat')
It averages the total discounted reward earned over 500 runs of 50 steps
each. The running average over the runs completed so far is reported
after each run.
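The statistic being reported is the per-run total discounted reward,
sum over steps t of gamma^t * r_t, averaged over the runs completed so
far. A minimal sketch of that bookkeeping follows; the function names
and the stand-in simulator are illustrative assumptions, not the actual
internals of evalPOMDPpolicy.

```python
# Sketch of averaging discounted returns over many simulated runs.
import random

def discounted_return(rewards, gamma=0.95):
    """Total discounted reward of one run: sum_t gamma^t * r_t."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

def average_return(simulate_run, n_runs=500, n_steps=50, gamma=0.95):
    """Mean discounted return over n_runs runs; the real function
    also reports the running average (total / run) after each run."""
    total = 0.0
    for run in range(1, n_runs + 1):
        total += discounted_return(simulate_run(n_steps), gamma)
    return total / n_runs

# Stand-in simulator: one run is just a list of random per-step rewards.
random.seed(0)
avg = average_return(lambda n: [random.random() for _ in range(n)],
                     n_runs=100)
```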
Executing a policy:
-------------------
1) Parse a POMDP file
>> ddPOMDP = parsePOMDP('problems/cppo3.txt');
2) Load a policy and value function
>> load problems/cppo3_1000bel_30iter.mat;
3) Query a policy
>> [actId,actName] = queryPolicy(belState, valueFunction, policy, ddPOMDP)
The argument "ddPOMDP" comes from parsing the POMDP file in step 1.
The arguments "valueFunction" and "policy" are obtained in step 2 when
loading a policy file. The argument "belState" can be set to
ddPOMDP.initialBelState (initial belief state) the first time or to the
updated belief state computed in step 4.
4) Update a belief state
>> nextBelState = beliefUpdate(ddPOMDP, belState, actId, obsConfig);
The argument "ddPOMDP" comes from parsing the POMDP file in step 1.
The argument "belState" is the current belief state. The argument
"actId" is the action selected in step 3. The argument "obsConfig"
is an observation, encoded as a 2xN matrix whose first row lists
the ids of the observation variables and whose second row lists
their values. For example, [4,5;1,2] indicates that observation
variable 4 has value 1 and observation variable 5 has value 2.
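At its core, the belief update in step 4 is a Bayes filter: the next
belief assigns to each state s' a probability proportional to
Pr(o|s',a) * sum_s Pr(s'|s,a) * b(s), then normalizes. A minimal
flat-vector sketch with toy matrices follows; the matrices and the
function name here are illustrative assumptions, since the real code
represents the POMDP with decision diagrams and takes the observation
in the 2xN id/value format described above.

```python
# Bayes-filter belief update sketch on a toy 2-state POMDP.
import numpy as np

def belief_update(b, T_a, O_ao):
    """b: current belief over states; T_a[s, s']: transition
    probabilities for the chosen action; O_ao[s']: probability of
    the received observation in each next state s'."""
    b_next = (b @ T_a) * O_ao   # predict, then weight by observation
    norm = b_next.sum()
    if norm == 0.0:
        raise ValueError("observation has zero probability under b")
    return b_next / norm

# Toy example: 2 states, one action, one observation.
T_a = np.array([[0.9, 0.1],
                [0.2, 0.8]])
O_ao = np.array([0.7, 0.3])     # Pr(obs | s', a) for each next state
b = np.array([0.5, 0.5])
b_next = belief_update(b, T_a, O_ao)
```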