For this example, consider a 5-by-5 grid world with the following rules:
- A 5-by-5 grid world bounded by borders, with 4 possible actions (North = 1, South = 2, East = 3, West = 4).
- The agent begins from cell [2,1] (second row, first column).
- The agent receives reward +10 if it reaches the terminal state at cell [5,5] (blue).
- The environment contains a special jump from cell [2,4] to cell [4,4] with +5 reward.
- The agent is blocked by obstacles in cells [3,3], [3,4], [3,5] and [4,3] (black cells).
- All other actions result in -1 reward.
First, create a GridWorld object using the createGridWorld function.
GW = createGridWorld(5,5)
GW =
GridWorld with properties:
    GridSize: [5 5]
    CurrentState: '[1,1]'
    States: [25x1 string]
    Actions: [4x1 string]
    T: [25x25x4 double]
    R: [25x25x4 double]
    ObstacleStates: [0x1 string]
    TerminalStates: [0x1 string]
    ProbabilityTolerance: 8.8818e-16
Now, set the initial, terminal, and obstacle states.
GW.CurrentState = '[2,1]';
GW.TerminalStates = '[5,5]';
GW.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];
Update the state transition matrix for the obstacle states and set the jump rule over the obstacle states.
updateStateTranstionForObstacles(GW)
GW.T(state2idx(GW,'[2,4]'),:,:) = 0;
GW.T(state2idx(GW,'[2,4]'),state2idx(GW,'[4,4]'),:) = 1;
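To see what the two assignments to GW.T do, here is a minimal sketch in base MATLAB that works on a stand-in array rather than the actual GridWorld object. It assumes that T(s,s',a) holds the probability of moving from state s to state s' under action a, and that cells are numbered column by column, so that cell [r,c] has index sub2ind([5 5],r,c). Both are assumptions to check against state2idx on your installation, and all variable names are illustrative.

```matlab
gridSize = [5 5];
nS = prod(gridSize);      % 25 states
nA = 4;                   % North, South, East, West

% Assumed column-wise numbering of cells (stand-in for state2idx).
fromIdx = sub2ind(gridSize,2,4);   % cell [2,4]
toIdx   = sub2ind(gridSize,4,4);   % cell [4,4]

% Stand-in transition array: every action keeps the agent in place.
T = repmat(eye(nS),1,1,nA);

% Jump rule: clear all transitions out of [2,4], then send every
% action from [2,4] to [4,4] with probability 1.
T(fromIdx,:,:) = 0;
T(fromIdx,toIdx,:) = 1;

% For each action, the row for [2,4] should still sum to 1.
rowSums = squeeze(sum(T(fromIdx,:,:),2));
disp(rowSums')
```

Zeroing the row first matters in this sketch: without it, the existing transition out of [2,4] would remain and the row would no longer sum to 1.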
Next, define the rewards in the reward transition matrix.
nS = numel(GW.States);
nA = numel(GW.Actions);
GW.R = -1*ones(nS,nS,nA);
GW.R(state2idx(GW,'[2,4]'),state2idx(GW,'[4,4]'),:) = 5;
GW.R(:,state2idx(GW,GW.TerminalStates),:) = 10;
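One way to sanity-check the reward assignments is to read back a few entries. The sketch below rebuilds the reward array from the statements in this example and then looks up three transitions. It requires Reinforcement Learning Toolbox, assumes that R(s,s',a) is the reward for moving from state s to state s' under action a, and uses lookupReward, a hypothetical helper defined here for illustration and not a toolbox function.

```matlab
% Rebuild the reward array as in the example above.
GW = createGridWorld(5,5);
GW.TerminalStates = '[5,5]';
nS = numel(GW.States);
nA = numel(GW.Actions);
GW.R = -1*ones(nS,nS,nA);
GW.R(state2idx(GW,'[2,4]'),state2idx(GW,'[4,4]'),:) = 5;
GW.R(:,state2idx(GW,GW.TerminalStates),:) = 10;

% Hypothetical helper (not part of the toolbox): reward for moving
% from one cell to another under action a.
lookupReward = @(from,to,a) GW.R(state2idx(GW,from),state2idx(GW,to),a);

rJump = lookupReward('[2,4]','[4,4]',1);   % special jump
rGoal = lookupReward('[4,5]','[5,5]',2);   % step into the terminal state
rStep = lookupReward('[1,1]','[1,2]',3);   % ordinary move
```

Because the default of -1 is assigned first and the two special cases overwrite it afterward, the order of the three assignments to GW.R matters.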
Now, use rlMDPEnv to create a grid world environment using the GridWorld object GW.
env = rlMDPEnv(GW)
rlMDPEnv with properties:
Model: [1×1 rl.env.GridWorld]
ResetFcn: []
You can visualize the grid world environment using the plot function.
plot(env)
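The environment above is created with an empty ResetFcn. If you want every episode to start from cell [2,1], as the rules state, one option is to assign a reset function that returns the index of that state. The following is a minimal sketch that requires Reinforcement Learning Toolbox and assumes the reset function is expected to return a state index. The variable startIdx is illustrative.

```matlab
% Recreate a plain 5-by-5 environment so that the snippet is self-contained.
GW = createGridWorld(5,5);
env = rlMDPEnv(GW);

% Look up the index of the start cell instead of hard-coding it.
startIdx = state2idx(GW,'[2,1]');

% Reset function with no inputs that returns the initial state index.
env.ResetFcn = @() startIdx;

% Resetting the environment should now place the agent at [2,1].
obs = reset(env);
```

Looking up the index with state2idx avoids relying on how the toolbox numbers the cells.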