3D Humanoid Robot#

The 3D Humanoid Robot (Humanoid) is a classic bipedal locomotion task from DeepMind Control Suite. The goal is to train a simulated 3D humanoid robot to achieve standing, walking, and running by controlling joint torques.

Task Description#

Humanoid is a 3D bipedal humanoid robot task. The robot consists of a head, torso, two arms, and two legs, with 21 controlled joints (actuators). The agent controls the robot by applying torques to these joints to achieve standing balance, forward walking, or fast running. This task requires coordinated bipedal gait, balance control, and 3D spatial posture stability.

Action Space#

Item	Details
Type	`Box(-1.0, 1.0, (21,), float32)`
Dim	21

Actions correspond to the following:

Index	Action (Joint Torque)	Min	Max	XML Name
0	Abdomen Y-axis rotation torque	-1.0	1.0	`abdomen_y`
1	Abdomen Z-axis rotation torque	-1.0	1.0	`abdomen_z`
2	Abdomen X-axis rotation torque	-1.0	1.0	`abdomen_x`
3	Right hip X-axis torque	-1.0	1.0	`right_hip_x`
4	Right hip Z-axis torque	-1.0	1.0	`right_hip_z`
5	Right hip Y-axis torque	-1.0	1.0	`right_hip_y`
6	Right knee torque	-1.0	1.0	`right_knee`
7	Right ankle X-axis torque	-1.0	1.0	`right_ankle_x`
8	Right ankle Y-axis torque	-1.0	1.0	`right_ankle_y`
9	Left hip X-axis torque	-1.0	1.0	`left_hip_x`
10	Left hip Z-axis torque	-1.0	1.0	`left_hip_z`
11	Left hip Y-axis torque	-1.0	1.0	`left_hip_y`
12	Left knee torque	-1.0	1.0	`left_knee`
13	Left ankle X-axis torque	-1.0	1.0	`left_ankle_x`
14	Left ankle Y-axis torque	-1.0	1.0	`left_ankle_y`
15	Right shoulder 1 torque	-1.0	1.0	`right_shoulder1`
16	Right shoulder 2 torque	-1.0	1.0	`right_shoulder2`
17	Right elbow torque	-1.0	1.0	`right_elbow`
18	Left shoulder 1 torque	-1.0	1.0	`left_shoulder1`
19	Left shoulder 2 torque	-1.0	1.0	`left_shoulder2`
20	Left elbow torque	-1.0	1.0	`left_elbow`

Observation Space#

Item	Details
Type	`Box(-inf, inf, (73,), float32)`
Dim	73

The observation space of the Humanoid environment consists of the following components (in order):

Component	Description	Dim	Notes
joint_angles	Joint angles (excluding root’s 7 DOF)	22	Angles of 22 joints
head_height	Head height	1	Height of head relative to ground
extremities	Extremity positions (relative to torso)	12	Left hand, left foot, right hand, right foot (3D each, in that order)
torso_vertical	Torso vertical direction vector	3	Vertical direction in local coordinates
com_vel	Center of mass linear velocity	3	Linear velocity of torso subtree
qvel	Velocities of all joints and root	29	Including root’s 6 DOF
target_local	Target direction (local coordinates)	3	Target direction in torso local frame

Reward Function Design#

The Humanoid reward function varies according to task type (standing, walking, running), but all include the following core components:

Posture Reward#

# Head height reward: keep head above target height (95% of stand_height, ~1.33m)
stand_reward = tolerance(head_height, bounds=(stand_height * 0.95, inf), margin=0.5)

# Torso upright reward: keep torso upright
upright_reward = tolerance(torso_upright, bounds=(0.9, inf), sigmoid="linear", margin=0.9)

# Pelvis height reward: keep pelvis at reasonable height (60% of stand_height, ~0.84m)
pelvis_height_reward = tolerance(pelvis_height, bounds=(stand_height * 0.6, inf), sigmoid="linear", margin=stand_height * 0.6)

# Posture reward = head height reward × torso upright reward × pelvis height reward
posture_reward = stand_reward * upright_reward * pelvis_height_reward

Speed Reward#

The speed reward calculation differs by task type:

Standing Task (move_speed <= 0):

# Speed reward: maintain near-zero speed
speed_reward = tolerance(actual_speed, bounds=(0, 0), margin=1.0, value_at_margin=0.01)

Walking Task (0 < move_speed <= 3.0):

# Speed reward: achieve target speed (default 1.0 m/s) in target direction (positive X-axis)
actual_speed = dot(com_vel[:2], target_direction[:2])  # Projection of velocity onto target direction
speed_reward = tolerance(actual_speed, bounds=(move_speed, move_speed), margin=move_speed, sigmoid="linear")

Running Task (move_speed > 3.0):

# Speed reward: achieve target speed (default 10.0 m/s) or above in target direction
actual_speed = dot(com_vel[:2], target_direction[:2])
speed_reward = tolerance(actual_speed, bounds=(move_speed, inf), margin=move_speed, sigmoid="linear")

Energy Reward#

energy_reward = exp(-energy_coef * mean(ctrls ^ 2))

Gait Reward#

# Torso heading reward: torso faces target direction
torso_heading_reward = tolerance(dot(torso_forward, target_dir), bounds=(0.9, 1.0), margin=0.3, sigmoid="linear")

# Head heading reward: head faces target direction
head_heading_reward = tolerance(dot(head_forward, target_dir), bounds=(0.9, 1.0), margin=0.3, sigmoid="linear")

# Pelvis yaw reward: pelvis faces target direction
pelvis_yaw_reward = tolerance(dot(pelvis_forward, target_dir), bounds=(0.9, 1.0), margin=0.3, sigmoid="linear")

# Pelvis level reward: pelvis remains level
pelvis_level_reward = tolerance(pelvis_up, bounds=(0.9, 1.0), margin=0.3, sigmoid="linear")

# Feet height reward: feet stay close to ground
feet_height_reward = tolerance(max_foot_height, bounds=(0.0, 0.3), margin=0.5, sigmoid="quadratic")

# Gait reward = product of all heading and posture rewards
gait_reward = torso_heading_reward * head_heading_reward * pelvis_yaw_reward * pelvis_level_reward * feet_height_reward

Total Reward#

total_reward = posture_reward * speed_reward * energy_reward * gait_reward

Initial State#

Robot Position: Torso initial height is 1.33 meters (95% of standard stand height)
Robot Orientation: Torso remains upright, quaternion set to (1.0, 0.0, 0.0, 0.0)
Joint Angles: Randomly initialized within joint limits
- Torso/hip base joints: Randomized in small range (±15 degrees)
- Leg joints: Symmetrically initialized, ensuring left and right legs are independently randomized, knees initially bent
- Arm joints: Symmetrically initialized, using middle 80% of joint limit ranges
Initial Velocities: All joint velocities and linear velocities initialized to small random values near zero (-0.01 to 0.01)
Initial Controls: All actuator controls initialized to small random values near zero (-0.02 to 0.02)

Episode Termination Conditions#

Robot state observations contain abnormal values (NaN or Inf)
Head too low: Head height below 50% of standard stand height (0.7 meters)
Torso too tilted: Torso vertical component less than 0.2 (severe torso tilt)
Extreme velocity: Absolute value of any joint velocity exceeds 200.0 rad/s or m/s
Maximum episode duration: 25 seconds

Usage Guide#

1. Environment Preview#

uv run scripts/view.py --env dm-humanoid-stand
uv run scripts/view.py --env dm-humanoid-walk
uv run scripts/view.py --env dm-humanoid-run

2. Start Training#

uv run scripts/train.py --env dm-humanoid-stand
uv run scripts/train.py --env dm-humanoid-walk
uv run scripts/train.py --env dm-humanoid-run

3. View Training Progress#

uv run tensorboard --logdir runs/dm-humanoid-walk

4. Test Training Results#

uv run scripts/play.py --env dm-humanoid-stand
uv run scripts/play.py --env dm-humanoid-walk
uv run scripts/play.py --env dm-humanoid-run

Expected Training Results#

Standing Task (dm-humanoid-stand)#

Head height maintained in 1.3-1.5m range
Torso upright angle deviation less than 15 degrees
Able to stand stably without falling
Speed near zero, no significant movement

Walking Task (dm-humanoid-walk)#

Actual walking speed close to 1.0 m/s
Coordinated gait, no obvious falls
Able to walk continuously and stably
Torso and head facing target direction

Running Task (dm-humanoid-run)#

Running speed reaches 5.0-10.0 m/s
Flight phase appears (both feet off ground simultaneously)
Coordinated and stable gait
Able to maintain high-speed running posture