Random Latent Exploration for Deep Reinforcement Learning

Srinath Mahankali, Zhang-Wei Hong, Ayush Sekhari, Alexander Rakhlin, Pulkit Agrawal

Improbable AI Lab

Computer Science and Artificial Intelligence Laboratory (CSAIL)

Massachusetts Institute of Technology (MIT)

International Conference on Machine Learning (ICML), 2024



Video



Abstract


Exploration is a longstanding challenge in reinforcement learning (RL), which is typically addressed through either noise-based or bonus-based exploration strategies. Noise-based methods are easy to implement but struggle with deep exploration, while bonus-based methods, though more complex, excel in tasks requiring deep exploration. Since both approaches perform similarly on average, noise-based exploration is commonly used in deep RL algorithms due to its simplicity. In this paper, we introduce Random Latent Exploration (RLE), a simple yet effective exploration strategy that outperforms both methods on average. The core idea is to motivate the agent to explore different parts of the environment by pursuing randomly sampled goals. The agent's policy is conditioned on randomly sampled vectors that serve as goals, and these vectors provide varying rewards at each state, encouraging the agent to explore different areas of the environment. RLE is as simple as noise-based methods, as it avoids complex bonus calculations. Our experiments show that RLE improves performance on average in both discrete (e.g., Atari) and continuous control tasks (e.g., Isaac Gym), enhancing exploration while remaining a simple and general plug-in for existing RL algorithms.


Paper


Random Latent Exploration for Deep Reinforcement Learning
Srinath Mahankali, Zhang-Wei Hong, Ayush Sekhari, Alexander Rakhlin, Pulkit Agrawal

International Conference on Machine Learning (ICML), 2024

paper / project page / bibtex


Overview


Generating diverse trajectories is hard with common exploration strategies


Action noise leads to similar trajectories within a single training iteration.
Parameter noise leads to slightly more diverse trajectories, but still not enough.
An exploration bonus leads to diverse experience, but only over multiple training iterations (see the sketch below for how these strategies differ).
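For concreteness, here is a minimal PyTorch sketch of these common strategies, written against a generic policy module for continuous actions. The function names, noise scales, and the RND-style bonus are illustrative assumptions, not the exact baselines used in the paper.

import torch

def act_with_action_noise(policy, obs, noise_std=0.1):
    """Action noise: perturb the policy's action with i.i.d. Gaussian noise.
    Every rollout uses the same underlying policy, so trajectories collected
    within one training iteration tend to look alike."""
    with torch.no_grad():
        action = policy(obs)
    return action + noise_std * torch.randn_like(action)

def perturb_parameters(policy, noise_std=0.01):
    """Parameter noise: perturb the policy weights once per rollout, which
    gives somewhat more behavioral diversity than action noise."""
    with torch.no_grad():
        for p in policy.parameters():
            p.add_(noise_std * torch.randn_like(p))
    return policy

def rnd_bonus(predictor, target, obs):
    """Bonus-based exploration (an RND-style sketch): the prediction error of a
    trained network against a fixed random target network is added to the task
    reward, so novel states earn larger bonuses -- but the resulting diversity
    only emerges over multiple training iterations."""
    with torch.no_grad():
        target_feat = target(obs)
    return (predictor(obs) - target_feat).pow(2).mean(dim=-1)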

RLE formulation and implementation

Random Latent Exploration leads to diverse trajectories within a single training iteration.
Implementation of RLE (a minimal code sketch follows).
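Below is a minimal Python sketch of the core idea: a latent vector is sampled at the start of each episode, a fixed random feature network turns it into a state-dependent reward, and the policy is conditioned on the sampled latent. The network sizes, the dot-product reward form, the mixing coefficient beta, and the gymnasium-style environment interface are illustrative assumptions rather than the exact published implementation.

import torch
import torch.nn as nn

class RandomFeatureNet(nn.Module):
    """Fixed, randomly initialized network mapping observations to features.
    It is never trained; its only job is to make each latent z reward a
    different region of the state space."""
    def __init__(self, obs_dim, latent_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, obs):
        return self.net(obs)

def sample_latent(latent_dim):
    """Sample a random goal vector (one per episode) on the unit sphere."""
    z = torch.randn(latent_dim)
    return z / z.norm()

def latent_reward(feature_net, obs, z):
    """Intrinsic reward: alignment between the state's random features and z.
    Different latents reward different states, so pursuing freshly sampled
    latents pushes the agent toward diverse trajectories."""
    phi = feature_net(obs)
    phi = phi / (phi.norm(dim=-1, keepdim=True) + 1e-8)
    return (phi * z).sum(dim=-1)

def collect_episode(env, policy, feature_net, latent_dim, beta=0.1):
    """One rollout: the policy is conditioned on z so it can pursue the goal."""
    z = sample_latent(latent_dim)
    obs, _ = env.reset()
    done, transitions = False, []
    while not done:
        obs_t = torch.as_tensor(obs, dtype=torch.float32)
        with torch.no_grad():
            action = policy(obs_t, z)  # latent-conditioned policy
        next_obs, task_reward, terminated, truncated, _ = env.step(action.numpy())
        reward = task_reward + beta * latent_reward(feature_net, obs_t, z).item()
        transitions.append((obs, action, reward, next_obs, z))
        obs, done = next_obs, terminated or truncated
    return transitions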

Results


Improved state space coverage with RLE


RLE improves performance across Atari (57 tasks) and IsaacGym (9 tasks)

Figures: Atari results and IsaacGym results.




Author Contributions


Srinath Mahankali ran the initial experiments investigating the benefit of random rewards, which informed the eventual formulation of RLE. He then compared RLE against baseline methods on the Atari and IsaacGym environments and helped with paper writing.

Zhang-Wei Hong conceived the possibility of using random rewards for exploration. He was involved in research discussions, helped scale experiments, played a significant role in paper writing, and advised Srinath.

Ayush Sekhari was involved in research discussions and helped shape the overall formulation of RLE. He played a significant role in paper writing and advised Srinath and Zhang-Wei.

Alexander Rakhlin was involved in research discussions and advising.

Pulkit Agrawal was involved in research discussions, overall advising, paper writing, and positioning of the work.



Website made using this template.