"""
Deep Reinforcement Learning (DRL) has achieved impressive results on complex decision-making tasks, but it still struggles in environments with sparse rewards, high-dimensional state spaces, and long-horizon credit assignment. This project introduces the Teleport Markov Decision Process (TMDP) framework, which enhances the exploration capabilities of RL agents through a teleportation mechanism and thereby supports more effective curriculum learning.
A Teleport MDP extends the standard Markov Decision Process (MDP) with a teleportation mechanism that can relocate the agent to another state at any step of an episode. Teleportation is governed by two quantities:
- Teleport rate (τ): The per-step probability that a teleportation occurs
- State teleport probability distribution (ξ): The distribution over destination states when a teleportation occurs
A TMDP-based curriculum starts with a high teleport rate to encourage wide exploration and then gradually reduces it, so that the learning problem becomes progressively harder and converges to the original formulation.
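To make the mechanism concrete, the following minimal Python sketch simulates a single TMDP transition in a tabular setting. The function and argument names (tmdp_step, env_step) are illustrative rather than taken from the project, and the assumption that teleport transitions yield zero reward is ours.

```python
import numpy as np

def tmdp_step(state, action, env_step, tau, xi, rng=None):
    """One transition of a Teleport MDP (illustrative sketch).

    With probability tau the agent is teleported to a state drawn from xi;
    otherwise the base dynamics env_step(state, action) -> (next_state, reward)
    are applied. xi is a probability vector over the |S| states.
    """
    rng = rng if rng is not None else np.random.default_rng()
    if rng.random() < tau:
        # Teleportation: next state sampled from xi, independent of (state, action).
        # Assumption: teleport transitions carry no reward.
        return int(rng.choice(len(xi), p=xi)), 0.0
    # Regular transition under the original MDP dynamics.
    return env_step(state, action)
```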
A TMDP is defined by the tuple M=⟨S,A,P,R,γ,μ,τ,ξ⟩, where:
- S: State space
- A: Action space
- P(s′∣s,a): Transition probability model
- R(s,a): Reward function
- γ: Discount factor
- μ: Initial state distribution
- τ: Teleport rate
- ξ: Teleport state probability distribution
The transition model of a TMDP is defined as:
P_τ(s′ ∣ s, a) = (1 − τ) P(s′ ∣ s, a) + τ ξ(s′)
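In the tabular case this is simply a convex combination of the base transition tensor and ξ. A minimal NumPy sketch (the array shapes and names are our own assumptions):

```python
import numpy as np

def teleport_transition_model(P, tau, xi):
    """Return P_tau(s' | s, a) = (1 - tau) * P(s' | s, a) + tau * xi(s').

    P   : array of shape (|S|, |A|, |S|), base transition probabilities
    tau : teleport rate in [0, 1]
    xi  : array of shape (|S|,), teleport state distribution
    """
    P_tau = (1.0 - tau) * P + tau * xi[None, None, :]   # broadcast xi over (s, a)
    assert np.allclose(P_tau.sum(axis=-1), 1.0)         # rows remain valid distributions
    return P_tau
```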
We developed several algorithms that integrate teleport-based curricula; a schematic comparison of static and dynamic teleport-rate schedules is sketched after the list:
- Teleport Model Policy Iteration (TMPI)
- Static Teleport (S-T)
- Dynamic Teleport (D-T)
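The S-T and D-T update rules are not detailed here, so the sketch below is only a hypothetical illustration of the two flavors of curriculum: a static schedule anneals τ on a fixed grid, while a dynamic one lowers τ in response to the agent's progress. The linear decay and the performance threshold are assumptions, not the rules used in the experiments.

```python
def static_tau_schedule(tau_0, n_phases):
    """Static curriculum (hypothetical): anneal tau linearly from tau_0 to 0
    over a fixed number of learning phases."""
    steps = max(n_phases - 1, 1)
    return [tau_0 * (1.0 - k / steps) for k in range(n_phases)]

def dynamic_tau_update(tau, avg_return, target_return, decay=0.5):
    """Dynamic curriculum (hypothetical): shrink tau only once the agent's
    average return under the current teleport rate reaches a target."""
    return tau * decay if avg_return >= target_return else tau
```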
We conducted experiments using two RL environments:
- Frozen Lake
- River Swim
Results demonstrated that TMDP-based algorithms consistently outperformed their vanilla counterparts in both environments.
The Teleport MDP framework offers a flexible and effective approach to curriculum design in reinforcement learning, reducing reliance on domain-specific expertise and improving learning efficiency.
This research was conducted in collaboration with:
- Prof. Marcello Restelli
- Dr. Alberto Maria Metelli
- Dr. Luca Sabbioni