003 · 2024

Bipedal Walker

TQC, PPO, and TRPO agents trained for hardcore-mode terrain.

Box2D
Stable-Baselines3 / SB3-Contrib
PyTorch
OpenAI Gym

01/ Problem

Learn a stable bipedal gait that survives BipedalWalker hardcore-mode terrain: pits, stairs, stumps, and uneven ground.

02/ Approach

Train and benchmark TRPO (~25M steps), PPO (~25M steps), and TQC (5M steps) on continuous body, joint, and LIDAR observations with joint-velocity actions. Custom reward-shaping wrappers steer the agents toward stable, balanced gaits without opening the door to reward hacking.
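
A minimal sketch of a balance-oriented shaping wrapper, assuming the documented 24-value BipedalWalker observation layout (hull angle at index 0, hull angular velocity at index 1, horizontal velocity at index 2). The class name `BalanceShapingWrapper`, its coefficients, and the specific penalty terms are hypothetical illustrations, not the wrappers used in this project. It wraps any Gymnasium-style env that exposes `reset()` and a five-value `step()`, so it runs without extra dependencies.

```python
from typing import Any, Sequence, Tuple

# Assumed BipedalWalker observation layout (24 floats):
# [0] hull angle, [1] hull angular velocity, [2] horizontal velocity,
# [3] vertical velocity, [4:8] leg-1 joints, [8] leg-1 contact,
# [9:13] leg-2 joints, [13] leg-2 contact, [14:24] LIDAR readings.
HULL_ANGLE, HULL_ANG_VEL, VEL_X = 0, 1, 2


class BalanceShapingWrapper:
    """Hypothetical reward-shaping wrapper around a Gymnasium-style env.

    Penalizes hull tilt and hull spin and adds a small, capped bonus for
    forward velocity. All coefficients are illustrative assumptions.
    """

    def __init__(self, env: Any, tilt_coef: float = 0.5,
                 spin_coef: float = 0.1, vel_coef: float = 0.05,
                 vel_bonus_cap: float = 0.05) -> None:
        self.env = env
        self.tilt_coef = tilt_coef
        self.spin_coef = spin_coef
        self.vel_coef = vel_coef
        self.vel_bonus_cap = vel_bonus_cap

    def shaping_term(self, obs: Sequence[float]) -> float:
        # Penalties are never positive, so they only ever discourage wobbling.
        tilt_penalty = self.tilt_coef * abs(obs[HULL_ANGLE])
        spin_penalty = self.spin_coef * abs(obs[HULL_ANG_VEL])
        # Forward-only velocity bonus, capped so it cannot dominate the env reward.
        vel_bonus = min(self.vel_coef * max(obs[VEL_X], 0.0), self.vel_bonus_cap)
        return vel_bonus - tilt_penalty - spin_penalty

    def reset(self, **kwargs: Any) -> Tuple[Any, dict]:
        return self.env.reset(**kwargs)

    def step(self, action: Any) -> Tuple[Any, float, bool, bool, dict]:
        obs, reward, terminated, truncated, info = self.env.step(action)
        shaping = self.shaping_term(obs)
        # Keep the raw env reward in `info` so evaluation can ignore shaping.
        info = dict(info, env_reward=reward, shaping=shaping)
        return obs, reward + shaping, terminated, truncated, info
```

Because the tilt and spin penalties are never positive and the forward-velocity bonus is capped, standing still earns at most zero shaping reward; that is one simple way to keep shaping from becoming its own objective. In a full training setup the same logic would more naturally subclass `gymnasium.Wrapper` around `BipedalWalkerHardcore-v3`.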

03/ Hardware

Simulation only (Box2D, via the Gym BipedalWalkerHardcore environment); no physical hardware. The workflow is transferable to Isaac Sim / Isaac Lab for sim-to-real work.

04/ Result

TQC produced a balanced, natural gait with 5M steps, about one fifth of the ~25M-step budget used for PPO and TRPO. PPO and TRPO both cleared hardcore mode after reward shaping. Training convergence time was 35% shorter than the baseline.
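
The card does not define convergence time. One plausible way to measure it is sketched here, assuming convergence means the rolling mean of the last `window` episode returns first reaches 300, the score Gym documents as solving BipedalWalker. `steps_to_convergence`, `percent_reduction`, and the 100-episode default window are hypothetical choices, not necessarily how the 35% figure was computed.

```python
from collections import deque
from typing import Optional, Sequence


def steps_to_convergence(
    episode_returns: Sequence[float],
    episode_end_steps: Sequence[int],
    threshold: float = 300.0,
    window: int = 100,
) -> Optional[int]:
    """Env step at which the rolling mean return first reaches `threshold`.

    Hypothetical definition: 'converged' means the mean of the last `window`
    episode returns is >= threshold. Returns None if that never happens.
    """
    recent: deque = deque(maxlen=window)  # oldest return drops out automatically
    for ret, step in zip(episode_returns, episode_end_steps):
        recent.append(ret)
        if len(recent) == window and sum(recent) / window >= threshold:
            return step
    return None


def percent_reduction(baseline_steps: int, new_steps: int) -> float:
    """Relative reduction of `new_steps` versus `baseline_steps`, in percent."""
    return 100.0 * (baseline_steps - new_steps) / baseline_steps
```

With hypothetical numbers, a baseline converging at 1,000,000 steps against a run converging at 650,000 steps gives `percent_reduction(1_000_000, 650_000) == 35.0`.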