The Bellman Equation in reinforcement learning expresses the value of a state in terms of the immediate reward and the values of the states that follow it. In this article we study the prerequisites of the Bellman equation, i.e., the main factors that make it up.
There are three main factors that shape the Bellman equation: the discount factor, the value function, and the policy. Refer to the diagram below for an overview of all three.
1) DISCOUNT FACTOR
There are two types of tasks in reinforcement learning:
- Episodic- A task with a known starting point and a known termination point; in other words, a finite sequence of states. For example, a racing game that ends when the race finishes.
- Continuous- A task with no known termination point; the interaction goes on indefinitely. For example, an ongoing learning process.
The discount factor plays a vital role in continuous tasks. It weighs the rewards the agent collects, which fall into two parts:
1) Immediate rewards- The reward the agent receives right after an action is performed. In the Bellman equation, a discount factor close to 0 makes the agent care almost only about immediate rewards.
2) Future rewards- Rewards the agent receives after further actions are performed. A discount factor close to 1 makes the agent weigh future rewards almost as heavily as immediate ones.
Note: In practical implementations of the Bellman equation, the value of the discount factor always lies between 0 and 1, with values such as 0.9 to 0.99 being common choices.
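The split between immediate and discounted future rewards can be sketched as a short function. This is a minimal illustration; the reward list and discount value below are invented for the example, not taken from the article.

```python
def discounted_return(rewards, gamma):
    """Sum a sequence of rewards, weighting each future reward by gamma^t.

    The first reward (t = 0) is the immediate reward and is not discounted;
    every later reward counts for less as gamma shrinks toward 0.
    """
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

# Three rewards of 1.0 with gamma = 0.5: 1 + 0.5 + 0.25
print(discounted_return([1.0, 1.0, 1.0], 0.5))  # 1.75
```

With gamma near 0 the result is dominated by the first (immediate) reward; with gamma near 1 all three rewards count almost equally.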
2) VALUE FUNCTION
The value function says how good a particular state is for the agent: it is the total discounted reward the agent can expect to collect starting from that state. Its value therefore depends directly on the actions the agent performs from that state onward.
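A value function can be computed by repeatedly applying the Bellman backup V(s) = max over actions of [reward + gamma * V(next state)]. The sketch below runs this on a tiny made-up 3-state chain (the states "A", "B", the terminal state "T", and the rewards are all invented for illustration).

```python
GAMMA = 0.9  # discount factor, assumed value for the example

# transitions[state][action] = (next_state, reward); "T" is terminal.
transitions = {
    "A": {"right": ("B", 0.0)},
    "B": {"right": ("T", 1.0), "left": ("A", 0.0)},
    "T": {},
}

# Start with all state values at zero, then iterate the Bellman backup.
V = {s: 0.0 for s in transitions}
for _ in range(50):
    for s, acts in transitions.items():
        if acts:  # terminal state keeps value 0
            V[s] = max(r + GAMMA * V[s2] for s2, r in acts.values())

print(V)  # V["B"] converges to 1.0, V["A"] to 0.9
```

State B is "better" than state A because the reward of 1.0 is one step closer, and the discount factor shrinks A's value accordingly.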
3) POLICY
This is the third and also an important factor of the Bellman equation. A policy is the rule by which an agent chooses an action in a given state. The policy is denoted by pi (π).