Unitree GO1 Locomotion#
Unitree GO1 is a quadruped robot platform. This example demonstrates how to train GO1 to walk with a stable gait on flat terrain.
Task Description#
The GO1 quadruped robot has 12 degrees of freedom (3 joints per leg) and learns coordinated gait control through deep reinforcement learning. The task is defined as follows:
State Space: 48-dimensional, comprising base linear velocity, base angular velocity, posture (local gravity direction), joint angles, joint velocities, previous actions, and velocity commands
Action Space: 12-dimensional; each component sets the target position of one joint, which a PD controller converts to a torque
Reward Function: Composite reward combining velocity tracking, posture stability, energy efficiency, and other components
Termination Conditions: The robot's trunk contacts the ground, or the robot enters another unstable state
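The conversion from a position-target action to joint torques can be illustrated with a standard PD law. This is a minimal sketch, not the example's actual controller: the gains `KP` and `KD`, the `ACTION_SCALE` factor, and the function name `pd_torques` are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical values; the real gains and scaling are not given in this document.
KP = 20.0            # proportional gain (assumed)
KD = 0.5             # derivative gain (assumed)
ACTION_SCALE = 0.25  # maps policy output to a joint-angle offset (assumed)


def pd_torques(action, default_angles, joint_pos, joint_vel):
    """Convert a 12-D action into 12 joint torques with a PD law."""
    # The action is interpreted as an offset from the default joint angles.
    target = default_angles + ACTION_SCALE * action
    # Pull toward the target position while damping joint velocity.
    return KP * (target - joint_pos) - KD * joint_vel


default_angles = np.zeros(12)
action = np.ones(12)
torques = pd_torques(action, default_angles, np.zeros(12), np.zeros(12))
```

Interpreting actions as offsets from a default pose is a common design choice because a zero action then corresponds to a standing posture.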
Usage Guide#
1. Environment Preview#
uv run scripts/view.py --env go1-flat-terrain-walk
2. Start Training#
uv run scripts/train.py --env go1-flat-terrain-walk
3. View Training Progress#
uv run tensorboard --logdir runs/go1-flat-terrain-walk
4. Test Training Results#
uv run scripts/play.py --env go1-flat-terrain-walk
Reward Function Design#
GO1’s reward function combines multiple components, each weighted by a scale factor. Positive scales reward a behavior and negative scales penalize it:
Main Reward Components#
reward_config.scales = {
    "tracking_lin_vel": 1.0,   # Linear velocity tracking reward
    "tracking_ang_vel": 0.5,   # Angular velocity tracking reward
    "feet_air_time": 1.0,      # Foot air time reward
    "lin_vel_z": -2.0,         # Z-axis linear velocity penalty
    "ang_vel_xy": -0.05,       # XY-axis angular velocity penalty
    "orientation": -0.0,       # Posture deviation penalty (zero weight, so it contributes nothing)
    "torques": -0.00001,       # Torque consumption penalty
    "dof_acc": -2.5e-7,        # Joint acceleration penalty
    "action_rate": -0.001,     # Action change rate penalty
    "hip_pos": -1,             # Hip joint position penalty
    "calf_pos": -0.3,          # Calf joint position penalty
}
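One plausible way such a scale table is used is a weighted sum over the raw reward terms. The sketch below assumes that convention; the function name `total_reward` and the example term values are hypothetical and not taken from the example's source.

```python
import numpy as np


def total_reward(scales, terms):
    """Sum each raw reward term multiplied by its scale.

    Terms whose scale is zero are skipped entirely.
    """
    total = 0.0
    for name, scale in scales.items():
        if scale == 0.0:  # also true for -0.0
            continue
        total = total + scale * terms[name]
    return total


# Illustrative raw term values for two parallel environments.
scales = {"tracking_lin_vel": 1.0, "lin_vel_z": -2.0, "orientation": -0.0}
terms = {
    "tracking_lin_vel": np.array([0.8, 0.5]),
    "lin_vel_z": np.array([0.1, 0.0]),
    "orientation": np.array([9.0, 9.0]),
}
reward = total_reward(scales, terms)
```

Under this convention, penalty terms return non-negative magnitudes and the negative scale supplies the sign.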
Key Reward Functions#
Velocity Tracking Reward#
# Track linear velocity commands (xy plane)
def _reward_tracking_lin_vel(self, data, commands):
    ...

# Track angular velocity commands (yaw)
def _reward_tracking_ang_vel(self, data, commands):
    ...
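A form often used for velocity tracking rewards in legged locomotion is an exponential of the squared tracking error. The sketch below assumes that form; the standalone function signatures, the `TRACKING_SIGMA` value, and the array layouts are illustrative assumptions and may differ from the methods listed above.

```python
import numpy as np

TRACKING_SIGMA = 0.25  # width of the exponential kernel (assumed)


def tracking_lin_vel(base_lin_vel, commands, sigma=TRACKING_SIGMA):
    """Reward in (0, 1] for matching the commanded xy linear velocity.

    base_lin_vel: (num_envs, 3) base velocity in the local frame.
    commands:     (num_envs, 3) rows of [vx, vy, vyaw].
    """
    err = np.sum(np.square(commands[:, :2] - base_lin_vel[:, :2]), axis=1)
    return np.exp(-err / sigma)


def tracking_ang_vel(base_ang_vel, commands, sigma=TRACKING_SIGMA):
    """Reward in (0, 1] for matching the commanded yaw rate."""
    err = np.square(commands[:, 2] - base_ang_vel[:, 2])
    return np.exp(-err / sigma)


commands = np.array([[1.0, 0.0, 0.5]])
perfect = tracking_lin_vel(np.array([[1.0, 0.0, 0.0]]), commands)
slow = tracking_lin_vel(np.array([[0.5, 0.0, 0.0]]), commands)
yaw = tracking_ang_vel(np.array([[0.0, 0.0, 0.5]]), commands)
```

The reward equals 1 when the command is tracked exactly and decays smoothly as the error grows, which gives a bounded, well-shaped learning signal.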
Foot Air Time Reward#
def _reward_feet_air_time(self, commands, info):
    ...
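A foot air time reward typically encourages longer swing phases by rewarding each foot at touchdown in proportion to how long it was airborne. The following is a sketch of that idea under stated assumptions: the target air time, the minimum command speed, and the standalone signature are hypothetical and not taken from the example's source.

```python
import numpy as np

AIR_TIME_TARGET = 0.5    # seconds; air time above this is rewarded (assumed)
MIN_COMMAND_SPEED = 0.1  # m/s; no reward when commanded to stand still (assumed)


def feet_air_time(air_time, contact, commands, dt):
    """Return (reward, updated_air_time).

    air_time: (num_envs, 4) seconds each foot has been airborne.
    contact:  (num_envs, 4) bool, True if the foot touches the ground.
    commands: (num_envs, 3) rows of [vx, vy, vyaw].
    """
    # A foot that was airborne and is now in contact has just touched down.
    first_contact = (air_time > 0.0) & contact
    air_time = air_time + dt
    reward = np.sum((air_time - AIR_TIME_TARGET) * first_contact, axis=1)
    # Only reward stepping when the robot is commanded to move.
    moving = np.linalg.norm(commands[:, :2], axis=1) > MIN_COMMAND_SPEED
    reward = reward * moving
    # Reset the timer for every foot currently on the ground.
    air_time = air_time * ~contact
    return reward, air_time


air_time = np.array([[0.58, 0.0, 0.2, 0.0]])
contact = np.array([[True, True, False, True]])
commands = np.array([[1.0, 0.0, 0.0]])
reward, new_air_time = feet_air_time(air_time, contact, commands, dt=0.02)
```

In this call only the first foot has just landed, after 0.60 s in the air, so the reward is about 0.10, and the third foot keeps accumulating air time.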
Observation Space Composition#
GO1’s observation space is 48-dimensional, containing the following information:
obs = np.hstack([
    noisy_linvel,        # 3D: Linear velocity in the local coordinate system
    noisy_gyro,          # 3D: Gyroscope data (angular velocity)
    local_gravity,       # 3D: Gravity direction in the local coordinate system
    noisy_joint_angle,   # 12D: Joint angles (relative to default values)
    noisy_joint_vel,     # 12D: Joint velocities
    last_actions,        # 12D: Actions from the previous step
    command,             # 3D: Velocity commands [vx, vy, vyaw]
])
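As a check on the dimensions, the seven parts can be assembled from placeholder values for a single robot. This sketch assumes uniform sensor noise; the `add_noise` helper, the noise levels, and the placeholder readings are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def add_noise(values, level):
    """Add uniform noise in [-level, level] to each element (assumed noise model)."""
    return values + level * rng.uniform(-1.0, 1.0, size=values.shape)


# Placeholder readings for one robot (illustrative values only).
linvel = np.zeros(3)
gyro = np.zeros(3)
local_gravity = np.array([0.0, 0.0, -1.0])
joint_angle_offset = np.zeros(12)    # joint angles minus default angles
joint_vel = np.zeros(12)
last_actions = np.zeros(12)
command = np.array([0.5, 0.0, 0.0])  # [vx, vy, vyaw]

obs = np.hstack([
    add_noise(linvel, 0.1),
    add_noise(gyro, 0.2),
    local_gravity,
    add_noise(joint_angle_offset, 0.03),
    add_noise(joint_vel, 1.5),
    last_actions,
    command,
])
```

The parts sum to 3 + 3 + 3 + 12 + 12 + 12 + 3 = 48, matching the stated observation size.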
Motion Velocity Command Generation#
Random velocity commands are generated during training so that the agent learns to track a range of movement speeds:
def resample_commands(self, num_envs: int):
    ...
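One simple way to generate such commands is to draw each component uniformly from a fixed range. The sketch below assumes that approach; the ranges, the standalone signature, and the explicit `rng` argument are illustrative and not taken from the example's source.

```python
import numpy as np

# Hypothetical command ranges for [vx, vy, vyaw]; the real limits are not given here.
COMMAND_LOW = np.array([-1.0, -0.5, -1.0])
COMMAND_HIGH = np.array([1.0, 0.5, 1.0])


def resample_commands(num_envs, rng):
    """Draw one [vx, vy, vyaw] command per environment, uniformly within the ranges."""
    return rng.uniform(COMMAND_LOW, COMMAND_HIGH, size=(num_envs, 3))


rng = np.random.default_rng(42)
commands = resample_commands(4, rng)
```

Sampling a fresh command per environment exposes the policy to many target velocities within a single training batch.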
Expected Training Results#
A stable quadruped walking gait on flat terrain
Accurate tracking of the commanded linear and angular (yaw) velocities