Single-Leg Hopping Robot#

Hopper is a classic single-leg hopping control task in dm-control, simulating a 2D single-leg hopping robot.

Task Description#

The 2D robot consists of four body segments: torso, pelvis, thigh, calf, and foot. Actions are generated through four articulated joints, including: waist, hip, knee, and ankle. Each joint is driven by a motor with different gear ratios, enabling behaviors such as standing, balancing, and hopping forward.

Action Space#

Item	Details
Type	`Box(-1.0, 1.0, (3,), float32)`
Dimension	3

Joint mapping:

Index	Action Description	Min	Max	XML Name
0	Torque applied on the thigh rotor	-1	1	thigh_joint
1	Torque applied on the leg rotor	-1	1	leg_joint
2	Torque applied on the foot rotor	-1	1	foot_joint

Observation Space#

Item	Details
Type	`Box(-inf, inf, (13,), float64)`
Dimension	13

Component	Description	Dim	Notes
qpos	Joint angles and torso height	5	torso x-position excluded by default
qvel	Joint and torso velocities	6	Velocity as derivative of position
contact sensors	Toe and heel ground sensors	2	Normalized by `log1p`

The observation vector consists of joint positions (qpos), velocities (qvel), and contact force sensors. The full dimension is 13, including two contact sensors:

Index	Observation	XML Name	Joint Type	Physical Meaning
0	torso z-position	`rootz`	slide	torso height
1	torso angle	`rooty`	hinge	body pitch angle
2	thigh joint angle	`thigh_joint`	hinge	thigh rotation
3	leg joint angle	`leg_joint`	hinge	calf rotation
4	foot joint angle	`foot_joint`	hinge	foot rotation
5	torso x-velocity	`rootx`	slide	forward velocity
6	torso z-velocity	`rootz`	slide	vertical velocity
7	torso angular velocity	`rooty`	hinge	torso angular velocity
8	thigh joint angular velocity	`thigh_joint`	hinge	thigh angular velocity
9	leg joint angular velocity	`leg_joint`	hinge	calf angular velocity
10	foot joint angular velocity	`foot_joint`	hinge	foot angular velocity
11	toe touch sensor	`touch_toe`	sensor	toe ground contact force
12	heel touch sensor	`touch_heel`	sensor	heel ground contact force

Reward Function Design#

The Hopper reward consists of the following terms:

Stand Task#

# Stand reward: maintain stable height

Hop Task#

# Stand reward: maintain stable height
# Hopping reward: achieve target forward velocity
# Leg movement reward: encourage moderate leg motion
# Knee extension reward: encourage proper knee extension
# Foot contact reward: encourage proper ground reaction forces
# Total reward = stand_reward + hop_reward + leg_motion_reward + knee_reward + contact_reward

Initial State#

Randomize joint angles within allowed ranges during reset

Episode Termination Conditions#

Observation values contain invalid numerical values (NaN)

Usage Guide#

1. Environment Preview#

uv run scripts/view.py --env dm-hopper-stand
uv run scripts/view.py --env dm-hopper-hop

2. Start Training#

uv run scripts/train.py --env dm-hopper-stand
uv run scripts/train.py --env dm-hopper-hop

3. View Training Progress#

uv run tensorboard --logdir runs/dm-hopper-stand

4. Test Training Results#

uv run scripts/play.py --env dm-hopper-stand
uv run scripts/play.py --env dm-hopper-hop

Expected Training Results#

Maintain stable standing behavior
Achieve a target hopping speed of 2.0