Two-Joint Robotic Arm Control#

Reacher is a classic robotic arm control task, simulating a robotic arm composed of two links. The goal is to bring the end effector (fingertip) as close as possible to a randomly generated target point.

Task Description#

The Reacher arm consists of two links connected by two hinge joints. The objective is to move the fingertip at the end of the arm to the target position, which is randomly sampled at the beginning of each episode.

Action Space#

Item	Details
Type	`Box(-1.0, 1.0, (2,), float32)`
Dimension	2

The actions correspond to:

Index	Action Description	Min Control	Max Control	XML Name	Joint Type
0	Torque applied to the first joint (root link)	-1	1	joint0	hinge
1	Torque applied to the second joint (middle link)	-1	1	joint1	hinge
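As an illustration of the action bounds above, the following minimal sketch clamps a two-element torque command into the `[-1, 1]` control range before it is passed to the environment. The helper name `clip_action` is hypothetical, and whether the environment also clips actions internally is not stated here.

```python
def clip_action(action, low=-1.0, high=1.0):
    """Clamp each torque command into the [low, high] control range.

    Hypothetical helper: the bounds mirror the Box(-1.0, 1.0, (2,)) action space.
    """
    if len(action) != 2:
        raise ValueError("Reacher expects exactly 2 torque values (joint0, joint1)")
    return [min(max(float(a), low), high) for a in action]


# Out-of-range torque for joint0 is clamped; joint1 is left unchanged.
clipped = clip_action([1.7, -0.25])
```

Clamping on the agent side keeps the commanded torques consistent with the declared action space, regardless of how the policy output is scaled.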

Observation Space#

Item	Details
Type	`Box(-inf, inf, (6,), float32)`
Dimension	6

The observation vector contains the following parts (in order):

qpos: 2 joint angles
fingertip-to-target difference vector: x and y components
qvel: 2 joint angular velocities

Index	Observation	Min	Max	XML Name	Joint	Unit
0	First joint angle	-inf	inf	joint0_pos	hinge	rad
1	Second joint angle	-inf	inf	joint1_pos	hinge	rad
2	fingertip - target x difference	-inf	inf	NA	slide	m
3	fingertip - target y difference	-inf	inf	NA	slide	m
4	First joint angular velocity	-inf	inf	joint0_vel	hinge	rad/s
5	Second joint angular velocity	-inf	inf	joint1_vel	hinge	rad/s
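The layout in the table above can be unpacked into named parts. The sketch below is only an illustration: `ReacherObs` and `split_observation` are hypothetical names, and the index ranges are taken directly from the table.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class ReacherObs(NamedTuple):
    joint_angles: Tuple[float, float]           # indices 0-1, rad
    fingertip_target_diff: Tuple[float, float]  # indices 2-3, m
    joint_velocities: Tuple[float, float]       # indices 4-5, rad/s


def split_observation(obs: Sequence[float]) -> ReacherObs:
    """Split a flat 6-element observation into its three documented parts."""
    if len(obs) != 6:
        raise ValueError("expected a 6-element observation")
    return ReacherObs(tuple(obs[0:2]), tuple(obs[2:4]), tuple(obs[4:6]))


parts = split_observation([0.1, 0.2, 0.03, -0.04, 1.0, -1.0])
# Euclidean fingertip-target distance from the x and y differences.
distance = math.hypot(*parts.fingertip_target_diff)
```

The distance computed from indices 2 and 3 is the quantity the reward described below depends on.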

Reward Function Design#

The reward for this task is based on the distance between the fingertip and the target:

Distance Reward (tolerance reward)#

reward = tolerance(|| fingertip - target ||)

The smaller the distance between the fingertip and the target, the higher the reward.
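The exact form of `tolerance` is not specified here. As a rough sketch, assume the reward is 1 when the fingertip is within a small radius of the target and decays linearly to 0 over a margin beyond that radius. The `radius` and `margin` values and the linear falloff are assumptions chosen for illustration, not the environment's actual parameters.

```python
import math


def tolerance(distance, radius=0.05, margin=0.2):
    """Illustrative tolerance: 1 inside `radius`, linear decay to 0 over `margin`.

    Both defaults are assumed values, not taken from the environment.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    if distance <= radius:
        return 1.0
    if margin <= 0:
        return 0.0
    return max(0.0, 1.0 - (distance - radius) / margin)


def distance_reward(fingertip, target):
    """reward = tolerance(|| fingertip - target ||) for 2D positions."""
    d = math.hypot(fingertip[0] - target[0], fingertip[1] - target[1])
    return tolerance(d)
```

With a nonzero margin the reward stays informative away from the target, which matches the statement that a smaller distance yields a higher reward.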

Initial State#

The initial state is sampled from random distributions:

Joint angles: sampled from a uniform distribution
Joint angular velocities: small random values
Target position: a random point within a circular area
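The sampling scheme above could look roughly like the following. The angle range, velocity scale, and target radius are not given here, so the values used (`[-pi, pi]`, `0.01`, `0.2`) are placeholders, and `sample_initial_state` is a hypothetical name.

```python
import math
import random


def sample_initial_state(rng, target_radius=0.2, vel_scale=0.01):
    """Sample joint angles, joint velocities, and a target position.

    All ranges are assumed for illustration only.
    """
    # Joint angles: uniform over a full revolution (assumed range).
    angles = [rng.uniform(-math.pi, math.pi) for _ in range(2)]
    # Joint angular velocities: small random values (assumed scale).
    velocities = [rng.uniform(-vel_scale, vel_scale) for _ in range(2)]
    # Target: uniform over a disk; sqrt keeps the area density uniform.
    r = target_radius * math.sqrt(rng.random())
    phi = rng.uniform(0.0, 2.0 * math.pi)
    target = (r * math.cos(phi), r * math.sin(phi))
    return angles, velocities, target


angles, velocities, target = sample_initial_state(random.Random(0))
```

Taking the square root of the uniform sample avoids clustering targets near the center of the circle, which a plain uniform radius would cause.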

Episode Termination Conditions#

Termination#

The episode terminates if any NaN value appears in the observation.

Termination Handling#

reward = 0
terminated = True
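The termination rule and its handling can be sketched as a small check applied after each step. `handle_termination` is a hypothetical helper, not a function of the environment itself.

```python
import math


def handle_termination(obs, reward):
    """Return (reward, terminated) following the NaN rule described above."""
    if any(math.isnan(x) for x in obs):
        # Invalid simulation state: zero the reward and end the episode.
        return 0.0, True
    return reward, False


ok = handle_termination([0.0] * 6, 0.7)
bad = handle_termination([0.0, float("nan"), 0.0, 0.0, 0.0, 0.0], 0.7)
```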

Usage Guide#

1. Environment Preview#

uv run scripts/view.py --env dm-reacher

2. Start Training#

uv run scripts/train.py --env dm-reacher

3. View Training Progress#

uv run tensorboard --logdir runs/dm-reacher

4. Test Training Results#

uv run scripts/play.py --env dm-reacher

Expected Training Results#

After training, the robotic arm should reach the target point quickly and accurately.