Maximizing Quadruped Velocity by Minimizing Energy

Srinath Mahankali*, Chi-Chang Lee*, Gabriel B. Margolis, Zhang-Wei Hong, Pulkit Agrawal

Improbable AI Lab

Computer Science and Artificial Intelligence Laboratory (CSAIL)

Massachusetts Institute of Technology (MIT)

International Conference on Robotics and Automation (ICRA), 2024



Video



Abstract


Reinforcement Learning (RL) has been a powerful tool for training robots to acquire agile locomotion skills. Learning locomotion commonly requires additional reward-shaping terms, such as an energy-minimization term, to guide an algorithm like Proximal Policy Optimization (PPO) toward good performance. Prior works rely on hyper-parameter tuning of the weights of these reward-shaping terms to obtain satisfactory task performance. To avoid the effort of tuning these weights, we adopt the Extrinsic-Intrinsic Policy Optimization (EIPO) framework. The key idea of EIPO is to cast the problem as constrained optimization, with a primary objective of maximizing task performance and a secondary objective of minimizing energy consumption. It seeks a policy that minimizes energy consumption within the set of policies that are optimal for the task. This ensures that the learned policy excels at the task while conserving energy, without requiring manual adjustment of the weights of the two objectives. Our experiments evaluate EIPO on various quadruped locomotion tasks, showing that policies trained with EIPO consistently achieve higher task performance than PPO baselines while maintaining comparable energy consumption. Furthermore, EIPO exhibits superior task performance in real-world evaluations compared to PPO.
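As a rough mathematical sketch of the constrained formulation above (notation ours, not taken from the paper): writing J_task(π) for the task objective and J_energy(π) for the energy consumed by policy π, the constrained problem can be stated as

    \pi^\star \in \arg\min_{\pi \in \Pi^\star} J_{\mathrm{energy}}(\pi), \qquad \Pi^\star = \arg\max_{\pi} J_{\mathrm{task}}(\pi),

i.e., among the policies that already maximize task performance, select one that consumes the least energy.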


Paper


Maximizing Quadruped Velocity by Minimizing Energy
Srinath Mahankali*, Chi-Chang Lee*, Gabriel B. Margolis, Zhang-Wei Hong, Pulkit Agrawal

International Conference on Robotics and Automation (ICRA), 2024

paper / project page / bibtex


Overview


Tuning rewards is tedious but essential in sim-to-real RL


Automatically tuning the auxiliary reward coefficient with EIPO
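Below is a hypothetical sketch of how an auxiliary-reward coefficient could be adapted automatically in the spirit of the constrained formulation above; the function and variable names are illustrative and do not reflect the paper's implementation.

    def update_energy_weight(weight, task_return_regularized, task_return_task_only, lr=0.01):
        # Constraint: adding the energy term should not degrade task performance
        # relative to a task-reward-only reference policy.
        slack = task_return_regularized - task_return_task_only
        # Dual-style update (assumed, not the paper's exact rule): grow the energy
        # weight while the constraint holds, shrink it as soon as energy
        # minimization starts hurting the task.
        return max(0.0, weight + lr * slack)

During training, the shaped reward would then take the form r_task - weight * energy, with the weight updated from rollout statistics rather than hand-tuned.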


Results


EIPO consistently outperforms PPO in simulation


Omnidirectional velocity tracking with EIPO and an extremely simple auxiliary reward




Author Contributions


Srinath Mahankali designed the energy efficiency reward function, ran experiments in simulation, helped run real-world deployment experiments, and helped with paper writing.

Chi-Chang Lee implemented EIPO, helped run experiments in simulation, and helped with paper writing.

Gabriel B. Margolis was involved in research discussions and paper writing, helped run real-world deployment experiments, and advised Srinath and Chi-Chang.

Zhang-Wei Hong was involved in research discussions, played a significant role in paper writing, and advised Srinath and Chi-Chang.

Pulkit Agrawal was involved in research discussions, overall advising, paper writing, and positioning of the work.



Website made using this template.