Point Mass Environment#

The Point Mass environment is a simple yet fundamental 2D navigation task where an agent controls a point mass to reach a target position. This environment serves as an excellent introduction to reinforcement learning concepts and continuous action spaces.

Task Description#

The Point Mass environment is a 2D navigation task. The agent needs to control a point mass by applying forces to move it to a randomly generated target position. This task requires the agent to learn efficient navigation strategies to reach the target with minimal control cost.

Action Space (Action Space)#

Item	Details
Type	`Box(-1.0, 1.0, (2,), float32)`
Dimension	2

Actions correspond to:

Index	Action Meaning (Applied Force)	Min	Max	XML Name
0	x-direction force	-1	1	`x_force`
1	y-direction force	-1	1	`y_force`

Observation Space#

Item	Details
Type	`Box(-inf, inf, (9,), float32)`
Dimension	9

The observation space of the Point Mass environment consists of the following components (in order):

Component	Description	Dimension
Position	Point mass x, y coordinates	2
Velocity	Point mass x, y velocities	2
Target	Target x, y coordinates	2
Distance	Distance vector to target	2
Distance	Euclidean distance to target	1

Reward Function Design#

The Point Mass environment’s reward function consists of the following components:

Distance Reward#

# Exponential distance reward - stronger as agent gets closer
distance_reward = np.exp(-10 * dist_to_target)

Target Arrival and Stay Reward#

# Large bonus for reaching target
target_bonus = 100.0 * in_target

# Continuous reward for staying in target
continuous_reward = 30.0 * in_target

Control and Path Optimization#

# Penalty for distance from target center when inside target
center_penalty = np.where(in_target, 10.0 * dist_to_target, 0.0)

# Control penalty to encourage smooth movement
control_penalty = 0.1 * vel_magnitude

# Path optimization reward for straight-line movement
path_reward = 0.5 * direction_alignment

Total Reward Calculation#

# Combine all reward components
rwd = distance_reward + target_bonus + continuous_reward + path_reward - center_penalty - control_penalty

Initial State#

Point mass position randomly initialized within [-1.0, 1.0]
Target position randomly initialized within [-1.5, 1.5]
Point mass velocity initialized to 0

Episode Termination Conditions#

Point mass reaches target and stays for 0.5 seconds
Simulation time reaches 10 seconds
Observation contains abnormal values (NaN)

Usage Guide#

1. Environment Preview#

uv run scripts/view.py --env point_mass

2. Start Training#

uv run scripts/train.py --env point_mass

3. View Training Progress#

uv run tensorboard --logdir runs/point_mass

4. Test Training Results#

uv run scripts/play.py --env point_mass