Quadruped Robot#
The Quadruped robot is a classic continuous control task in the DeepMind Control Suite. In MotrixLab, the motrix_envs/src/motrix_envs/basic/quadruped directory currently registers four directly trainable tasks: flat-ground walking dm-quadruped-walk, flat-ground running dm-quadruped-run, rough-terrain escape dm-quadruped-escape, and flat-ground ball pushing dm-quadruped-fetch.
Task Preview#
Walk#
Run#
Escape#
Fetch#
Task Overview#
Environment ID |
Task Goal |
Model File |
Target Speed |
Observation Dimension |
|---|---|---|---|---|
|
Walk forward stably on flat ground while maintaining heading |
|
0.5 m/s |
54 |
|
Run at high speed on flat ground while maintaining stable posture |
|
5.0 m/s |
54 |
|
Escape outward from the origin area as quickly as possible on uneven terrain |
|
3.0 m/s |
57 |
|
Push a ball into the target area on flat ground |
|
2.0 m/s |
66 |
Task Description#
Quadruped is a 3D quadruped robot task. The robot consists of one torso and four legs, and each leg has control dimensions related to yaw, lift, and extension. The underlying XML defines the hip, knee, and ankle joint structure, while the action layer uses a coupled actuator design with 3 actuators per leg:
yaw: controls leg yawlift: controls leg lifting through tendon couplingextend: controls leg extension/retraction through tendon coupling
walk and run use the same flat-ground model, with the main difference being the target speed. escape uses quadruped_escape.xml with a heightfield terrain and requires the robot to move away from the world origin quickly while maintaining an upright torso and stable locomotion. fetch uses quadruped_fetch.xml, which adds a free ball and a target region to the scene, requiring the robot to first approach a suitable position and then push the ball toward the goal.
Action Space#
Item |
Details |
|---|---|
Type |
|
Dimension |
12 |
Actions are arranged leg by leg, and each leg contains three actuators: yaw / lift / extend.
Index |
Action Meaning |
Min Value |
Max Value |
Corresponding Actuator |
|---|---|---|---|---|
0 |
Front-left leg yaw control |
-1.0 |
1.0 |
|
1 |
Front-left leg lift control |
-1.0 |
1.1 |
|
2 |
Front-left leg extension control |
-0.8 |
0.8 |
|
3 |
Front-right leg yaw control |
-1.0 |
1.0 |
|
4 |
Front-right leg lift control |
-1.0 |
1.1 |
|
5 |
Front-right leg extension control |
-0.8 |
0.8 |
|
6 |
Rear-right leg yaw control |
-1.0 |
1.0 |
|
7 |
Rear-right leg lift control |
-1.0 |
1.1 |
|
8 |
Rear-right leg extension control |
-0.8 |
0.8 |
|
9 |
Rear-left leg yaw control |
-1.0 |
1.0 |
|
10 |
Rear-left leg lift control |
-1.0 |
1.1 |
|
11 |
Rear-left leg extension control |
-0.8 |
0.8 |
|
Observation Space#
Environment |
Details |
|---|---|
|
|
|
|
|
|
|
|
All four tasks share nearly the same proprioceptive observations. escape adds 3 task-related dimensions associated with the origin, while fetch adds ball state and target position information:
Part |
Description |
Dimension |
|
|
|
|---|---|---|---|---|---|
egocentric dof pos |
Body generalized position state |
16 |
Yes |
Yes |
Yes |
egocentric dof vel |
Body generalized velocity state |
16 |
Yes |
Yes |
Yes |
actuator ctrl |
Current 12-dimensional actuator controls |
12 |
Yes |
Yes |
Yes |
torso velocity |
Torso linear velocity sensor |
3 |
Yes |
Yes |
Yes |
torso upright |
Scalar representing torso uprightness |
1 |
Yes |
Yes |
Yes |
imu |
IMU acceleration and angular velocity |
6 |
Yes |
Yes |
Yes |
origin |
World origin position in the body frame |
3 |
No |
Yes |
No |
ball state |
Ball position, relative linear velocity, angular velocity |
9 |
No |
No |
Yes |
target |
Relative target position in the body frame |
3 |
No |
No |
Yes |
The XML also defines foot force/torque sensors and a center-of-mass sensor, but these values are not directly concatenated into the default observation in the current implementation.
Reward Function Design#
All four tasks use torso uprightness as the core constraint. In the implementation, upright_reward is computed first from torso_upright, encouraging the robot to keep the body close to upright.
Walk / Run#
dm-quadruped-walk and dm-quadruped-run use the same reward structure, with different target speeds:
walktracks0.5 m/sruntracks5.0 m/s
The total reward is composed of:
# Speed reward: reach the target forward speed
# Posture reward: keep the torso upright
# Auxiliary rewards: height, lateral stability, heading alignment, action smoothness
# Penalties: backward motion, excessive vertical speed, excessive roll/pitch angular velocity, deviation from default posture
total_reward = upright_reward * move_reward + shaping_terms - penalty_terms
The main shaping and penalty terms include:
height_reward: encourages the torso to stay near the standing heightlateral_reward: suppresses excessive lateral velocityheading_reward: encourages forward motion along the +X directionsmooth_reward: penalizes large action changes between consecutive timestepsbackward_penalty: suppresses backward movementlin_vel_z_penaltyandang_vel_xy_penalty: suppress vertical bouncing and excessive torso roll/pitchsimilar_to_default_penalty: encourages joint posture to stay reasonably close to the default standing pose
Escape#
dm-quadruped-escape adds a task term for escaping away from the origin area on top of the locomotion reward. This task uses the heightfield terrain in quadruped_escape.xml:
# Base locomotion reward
# + reward for getting farther from the origin
# + reward for outward radial speed
total_reward = locomotion_reward + upright_reward * escape_reward + radial_speed_reward
Additional task terms include:
escape_reward: rewards the robot based on its distance from the origin arearadial_speed_reward: encourages acceleration along the outward direction away from the origin
This makes escape require not only fast locomotion, but also correct outward motion on rough terrain.
Fetch#
dm-quadruped-fetch uses a task-specific shaping structure centered on positioning and ball pushing. In the current implementation, the reward is mainly based on the geometric relationship between the robot, the ball, and the target:
# Positioning stage: encourage the robot to move behind or slightly behind the ball
# Ready stage: encourage facing the ball, getting close to it, and aligning with the ball-target line
# Pushing stage: encourage the ball to roll toward the target and eventually enter the target area
# Penalties: moving in the wrong direction, pushing the ball away from the target, getting the legs too close to the ball
total_reward = stage_terms + ready_terms + push_terms - penalty_terms
The main terms include:
stage_move: encourages movement toward the current stage waypointstage_reach: encourages reaching a suitable waypoint behind or to the side of the ballbehind_align: encourages the robot to position itself behind the ball relative to the targetface_ball: encourages the torso heading to point toward the ballnear_ball: encourages the robot to approach the ballreadyandready_gate: combine position, orientation, and distance into a readiness signal for active pushingfetch: encourages the ball to get closer to the target regionpush: encourages the ball to move along the target directionbackward: penalizes moving opposite to the current stage targetaway: penalizes pushing the ball away from the targetleg_ball: penalizes leg geometry getting too close to the ball, reducing ball trapping and squeezing behavior
In addition, fetch uses a stability gate on torso uprightness and torso height so the agent cannot easily exploit obviously collapsed poses to collect task reward.
Initial State#
walk,run, andescapereset from the default quadruped standing pose defined in the XMLFor these three tasks, the root orientation is fixed to the initial heading instead of being randomly rotated
fetchrandomizes the robot position and yaw on the plane, and also randomizes the ball position on the groundAll tasks initialize joint velocities, and in
fetchalso ball velocities, to zeroDuring reset, the robot is automatically lifted until there is no initial penetration/contact with the ground
Episode Termination Conditions#
Maximum episode duration is 20 seconds
The episode terminates when
NaNappears in the observationwalk,run, andescapedo not currently define a separate fall termination conditionfetchterminates early when the robot has clearly fallen, based on low torso uprightness or low torso heightThe current implementation does not yet define a separate success termination condition for “ball enters the target area”
Usage Guide#
1. Environment Preview#
uv run scripts/view.py --env dm-quadruped-walk
uv run scripts/view.py --env dm-quadruped-run
uv run scripts/view.py --env dm-quadruped-escape
uv run scripts/view.py --env dm-quadruped-fetch
2. Start Training#
uv run scripts/train.py --env dm-quadruped-walk
uv run scripts/train.py --env dm-quadruped-run
uv run scripts/train.py --env dm-quadruped-escape
uv run scripts/train.py --env dm-quadruped-fetch
3. View Training Progress#
uv run tensorboard --logdir runs/dm-quadruped-walk
uv run tensorboard --logdir runs/dm-quadruped-run
uv run tensorboard --logdir runs/dm-quadruped-escape
uv run tensorboard --logdir runs/dm-quadruped-fetch
4. Test Training Results#
uv run scripts/play.py --env dm-quadruped-walk
uv run scripts/play.py --env dm-quadruped-run
uv run scripts/play.py --env dm-quadruped-escape
uv run scripts/play.py --env dm-quadruped-fetch
Expected Training Results#
Walking Task (dm-quadruped-walk)#
Maintain a stable forward speed close to
0.5 m/sKeep body posture stable with small lateral sway
Sustain walking along the +X direction
Running Task (dm-quadruped-run)#
Increase speed to near or above
5.0 m/sProduce larger stride length and more explosive motions
Maintain good torso stability under high-speed locomotion
Escape Task (dm-quadruped-escape)#
Move away from the origin region quickly
Maintain stable footholds on the heightfield terrain without easily tipping over
Move primarily outward rather than spinning in place
Fetch Task (dm-quadruped-fetch)#
First move into a reasonable position along the ball-target line instead of randomly colliding with the ball from the side
Push the ball toward the target area consistently instead of kicking it away or repeatedly sending it off course
Maintain better body stability during pushing, with fewer collapsed poses, flips, or leg-ball entanglement behaviors