Random Latent Exploration for Deep Reinforcement Learning

Srinath Mahankali, Zhang-Wei Hong, Ayush Sekhari, Alexander Rakhlin, Pulkit Agrawal

Improbable AI Lab

Computer Science and Artificial Intelligence Laboratory (CSAIL)

Massachusetts Institute of Technology (MIT)

International Conference on Machine Learning (ICML), 2024



Video



Abstract


We introduce Random Latent Exploration (RLE), a simple yet effective exploration strategy in reinforcement learning (RL). On average, RLE outperforms noise-based methods, which perturb the agent's actions, and bonus-based exploration, which rewards the agent for attempting novel behaviors. The core idea of RLE is to encourage the agent to explore different parts of the environment by pursuing randomly sampled goals in a latent space. RLE is as simple as noise-based methods, as it avoids complex bonus calculations but retains the deep exploration benefits of bonus-based methods. Our experiments show that RLE improves performance on average in both discrete (e.g., Atari) and continuous control tasks (e.g., Isaac Gym), enhancing exploration while remaining a simple and general plug-in for existing RL algorithms.
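
To make this idea concrete, the following is a minimal sketch of one way such a latent-goal reward could be computed. The names feature_net, obs_dim, latent_dim, and the unit-sphere normalization are illustrative assumptions, not necessarily the paper's exact implementation.

# A minimal sketch of the core RLE idea: reward alignment between a state
# feature and a randomly sampled latent goal. Illustrative assumptions only.
import torch
import torch.nn as nn

obs_dim = 4        # illustrative observation size
latent_dim = 8     # illustrative latent-goal dimension

# Randomly initialized feature network mapping observations into the latent space.
feature_net = nn.Sequential(
    nn.Linear(obs_dim, 64),
    nn.ReLU(),
    nn.Linear(64, latent_dim),
)

def sample_latent_goal() -> torch.Tensor:
    """Draw a random latent goal z and project it onto the unit sphere."""
    z = torch.randn(latent_dim)
    return z / z.norm()

def latent_reward(obs: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """Reward for pursuing goal z: alignment between the state feature and z."""
    with torch.no_grad():
        phi = feature_net(obs)
    return phi @ z

# During training, the agent maximizes task reward plus this latent reward,
# with a fresh z sampled at the start of every episode.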


Paper


Random Latent Exploration for Deep Reinforcement Learning
Srinath Mahankali, Zhang-Wei Hong, Ayush Sekhari, Alexander Rakhlin, Pulkit Agrawal

International Conference on Machine Learning (ICML), 2024

paper / project page / bibtex


Overview


Generating diverse trajectories is hard with common exploration strategies


Action noise leads to similar trajectories within a single training iteration (a brief sketch of typical action noise follows below).
Parameter noise leads to slightly more diverse trajectories, but still not enough.
An exploration bonus leads to diverse experience, but only over multiple training iterations.
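
For reference, in discrete-action settings action noise typically amounts to epsilon-greedy selection: the current policy is followed most of the time, so trajectories collected within one iteration look alike. A minimal sketch, with q_values and epsilon as illustrative placeholders:

# Minimal sketch of action-noise exploration (epsilon-greedy), shown for
# contrast with RLE; q_values and epsilon are illustrative placeholders.
import random

def epsilon_greedy(q_values, epsilon: float = 0.1) -> int:
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])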

RLE formulation and implementation

Random Latent Exploration leads to diverse trajectories within a single training iteration.
Implementation of RLE
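
Below is a sketch of how the per-episode latent goal might be wired into a rollout loop, reusing sample_latent_goal and latent_reward from the earlier sketch. The Gymnasium-style environment interface, the z-conditioned policy callable, and the mixing coefficient beta are assumptions for illustration, not the paper's exact training setup.

# Sketch of plugging an RLE-style reward into an on-policy rollout loop.
# Reuses sample_latent_goal and latent_reward from the earlier sketch;
# env, policy, and beta are illustrative placeholders.
import torch

def collect_rollout(env, policy, num_steps: int, beta: float = 1.0):
    """Collect one rollout, resampling the latent goal z at every episode start."""
    obs, _ = env.reset()
    z = sample_latent_goal()                      # new goal for the new episode
    transitions = []
    for _ in range(num_steps):
        obs_t = torch.as_tensor(obs, dtype=torch.float32)
        action = policy(obs_t, z)                 # policy can condition on z
        next_obs, task_reward, terminated, truncated, _ = env.step(action)
        shaped = task_reward + beta * latent_reward(obs_t, z).item()
        transitions.append((obs, action, shaped, next_obs, z))
        obs = next_obs
        if terminated or truncated:
            obs, _ = env.reset()
            z = sample_latent_goal()              # fresh goal for the next episode
    return transitions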

Results


Improved state space coverage with RLE


RLE improves performance on average across Atari (57 tasks) and IsaacGym (9 tasks)

Atari results
IsaacGym results




Author Contributions


Srinath Mahankali ran the initial experiments investigating the benefit of random rewards, which informed the eventual formulation of RLE. He then compared RLE against baseline methods on Atari and IsaacGym environments and helped with paper writing.

Zhang-Wei Hong conceived the possibility of using random rewards for exploration. He was involved in research discussions, helped scale experiments, played a significant role in paper writing, and advised Srinath.

Ayush Sekhari was involved in research discussions and helped set the overall formulation of RLE. He played a significant role in paper writing, and advised Srinath and Zhang-Wei.

Alexander Rakhlin was involved in research discussions and advising.

Pulkit Agrawal was involved in research discussions, overall advising, paper writing, and positioning of the work.



Website made using this template.