Point Mass Environment#
The Point Mass environment is a simple yet fundamental 2D navigation task where an agent controls a point mass to reach a target position. This environment serves as an excellent introduction to reinforcement learning concepts and continuous action spaces.
Task Description#
The agent controls a point mass by applying forces to move it to a randomly generated target position. It must learn to reach the target efficiently while keeping control cost low.
Action Space#
| Item | Details |
|---|---|
| Type | |
| Dimension | 2 |
Actions correspond to:
| Index | Action Meaning (Applied Force) | Min | Max | XML Name |
|---|---|---|---|---|
| 0 | x-direction force | -1 | 1 | |
| 1 | y-direction force | -1 | 1 | |
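The bounds in the table can be enforced before an action is sent to the simulator. The following is a minimal sketch, assuming the policy may emit values outside [-1, 1]; `clip_action` is a hypothetical helper and not part of the environment's API, and whether the environment clips internally is not stated here.

```python
import numpy as np

# Bounds taken from the action table: each force component lies in [-1, 1].
ACTION_LOW = -1.0
ACTION_HIGH = 1.0
ACTION_DIM = 2


def clip_action(raw_action):
    """Clip a raw policy output to the documented bounds (hypothetical helper)."""
    action = np.asarray(raw_action, dtype=np.float32)
    if action.shape[-1] != ACTION_DIM:
        raise ValueError(f"expected last dimension {ACTION_DIM}, got {action.shape}")
    return np.clip(action, ACTION_LOW, ACTION_HIGH)


# The x-force exceeds the upper bound and is saturated; the y-force is unchanged.
clipped = clip_action([1.7, -0.25])
```

Checking the last dimension rather than the full shape lets the same helper handle a single action or a batch of actions.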
Observation Space#
| Item | Details |
|---|---|
| Type | |
| Dimension | 9 |
The observation space of the Point Mass environment consists of the following components (in order):
| Component | Description | Dimension | Notes |
|---|---|---|---|
| Position | Point mass x, y coordinates | 2 | |
| Velocity | Point mass x, y velocities | 2 | |
| Target | Target x, y coordinates | 2 | |
| Distance | Distance vector to target | 2 | |
| Distance | Euclidean distance to target | 1 | |
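To make the layout concrete, the sketch below assembles a 9-dimensional vector in the order listed in the table. `build_observation` is a hypothetical helper, and the sign convention of the distance vector (target minus position) is an assumption; the environment may use the opposite sign.

```python
import numpy as np


def build_observation(pos, vel, target):
    """Assemble the 9-dim observation in the documented order (hypothetical helper)."""
    pos = np.asarray(pos, dtype=np.float32)
    vel = np.asarray(vel, dtype=np.float32)
    target = np.asarray(target, dtype=np.float32)
    # Assumed sign convention: vector pointing from the point mass to the target.
    to_target = target - pos
    # Euclidean distance, kept as a length-1 array so it can be concatenated.
    dist = np.linalg.norm(to_target, axis=-1, keepdims=True)
    return np.concatenate([pos, vel, target, to_target, dist], axis=-1)


obs = build_observation(pos=[0.0, 0.0], vel=[0.0, 0.0], target=[3.0, 4.0])
# Layout: obs[0:2] position, obs[2:4] velocity, obs[4:6] target,
#         obs[6:8] distance vector, obs[8] Euclidean distance.
```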
Reward Function Design#
The Point Mass environment’s reward function consists of the following components:
Distance Reward#
```python
# Exponential distance reward - stronger as agent gets closer
distance_reward = np.exp(-10 * dist_to_target)
```
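Because of the factor of 10 in the exponent, this term falls off quickly: it equals 1 at the target, roughly 0.37 at a distance of 0.1, and is close to zero beyond a distance of about 0.5. The short sketch below evaluates the same expression at a few sample distances; the distances themselves are chosen only for illustration.

```python
import numpy as np

# Sample distances chosen for illustration only.
dists = np.array([0.0, 0.1, 0.5, 1.0])
distance_reward = np.exp(-10 * dists)

for d, r in zip(dists, distance_reward):
    print(f"dist={d:.1f}  distance_reward={r:.5f}")
```

The steep falloff means this term gives almost no gradient signal far from the target, which is one plausible reason the reward also includes the path alignment term described below.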
Target Arrival and Stay Reward#
```python
# Large bonus for reaching target
target_bonus = 100.0 * in_target
# Continuous reward for staying in target
continuous_reward = 30.0 * in_target
```
Control and Path Optimization#
```python
# Penalty for distance from target center when inside target
center_penalty = np.where(in_target, 10.0 * dist_to_target, 0.0)
# Speed penalty (scaled by velocity magnitude) to encourage smooth movement
control_penalty = 0.1 * vel_magnitude
# Path optimization reward for straight-line movement
path_reward = 0.5 * direction_alignment
```
Total Reward Calculation#
```python
# Combine all reward components
rwd = distance_reward + target_bonus + continuous_reward + path_reward - center_penalty - control_penalty
```
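The snippets above leave `in_target`, `vel_magnitude`, and `direction_alignment` undefined. The sketch below puts the components into one self-contained function under stated assumptions: `in_target` is taken to mean the distance is below a hypothetical `TARGET_RADIUS`, `vel_magnitude` is the norm of the velocity, and `direction_alignment` is the cosine between the velocity and the direction to the target. None of these definitions are confirmed by the environment's source, so treat this as an illustration of how the terms combine rather than the actual implementation.

```python
import numpy as np

TARGET_RADIUS = 0.1  # hypothetical value; the target size is not documented here


def compute_reward(pos, vel, target, target_radius=TARGET_RADIUS):
    """Combine the documented reward terms (sketch with assumed helper quantities)."""
    pos, vel, target = (np.asarray(a, dtype=np.float64) for a in (pos, vel, target))
    to_target = target - pos
    dist_to_target = np.linalg.norm(to_target, axis=-1)
    vel_magnitude = np.linalg.norm(vel, axis=-1)
    in_target = dist_to_target < target_radius  # assumed membership test

    # Assumed definition: cosine of the angle between velocity and target direction.
    eps = 1e-8  # avoids division by zero when stationary or exactly at the target
    direction_alignment = np.sum(vel * to_target, axis=-1) / (
        vel_magnitude * dist_to_target + eps
    )

    distance_reward = np.exp(-10 * dist_to_target)
    target_bonus = 100.0 * in_target
    continuous_reward = 30.0 * in_target
    center_penalty = np.where(in_target, 10.0 * dist_to_target, 0.0)
    control_penalty = 0.1 * vel_magnitude
    path_reward = 0.5 * direction_alignment

    return (distance_reward + target_bonus + continuous_reward + path_reward
            - center_penalty - control_penalty)


# Resting exactly on the target: 1 + 100 + 30 with no penalties.
at_center = compute_reward(pos=[0.0, 0.0], vel=[0.0, 0.0], target=[0.0, 0.0])
# Moving straight at a target one unit away at unit speed.
approaching = compute_reward(pos=[0.0, 0.0], vel=[1.0, 0.0], target=[1.0, 0.0])
```

Under these assumptions, `target_bonus` and `continuous_reward` are both paid on every step spent inside the target, so the in-target terms dominate the total by about two orders of magnitude.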
Initial State#
- Point mass position: randomly initialized within [-1.0, 1.0]
- Target position: randomly initialized within [-1.5, 1.5]
- Point mass velocity: initialized to 0
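The reset described above can be sketched as follows. The sampling distribution is not specified, so this example assumes independent uniform sampling on each axis; `sample_initial_state` is a hypothetical helper.

```python
import numpy as np


def sample_initial_state(rng):
    """Sample a start state within the documented ranges (assumes uniform sampling)."""
    pos = rng.uniform(-1.0, 1.0, size=2)     # point mass x, y
    target = rng.uniform(-1.5, 1.5, size=2)  # target x, y
    vel = np.zeros(2)                        # the point mass starts at rest
    return pos, vel, target


rng = np.random.default_rng(seed=0)
pos, vel, target = sample_initial_state(rng)
```

Because the target range is wider than the position range, the target can lie outside the region where the point mass starts.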
Episode Termination Conditions#
- The point mass reaches the target and stays there for 0.5 seconds
- Simulation time reaches 10 seconds
- The observation contains abnormal values (NaN)
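The three conditions above can be checked in a single function. This is a minimal sketch: `check_termination` and its arguments are hypothetical, and it assumes the caller tracks how long the point mass has been continuously inside the target.

```python
import numpy as np

HOLD_TIME = 0.5   # seconds inside the target required for success
MAX_TIME = 10.0   # episode time limit in seconds


def check_termination(obs, sim_time, time_in_target):
    """Return (done, reason) for the documented conditions (hypothetical helper)."""
    if np.any(np.isnan(obs)):
        return True, "nan_observation"
    if time_in_target >= HOLD_TIME:
        return True, "success"
    if sim_time >= MAX_TIME:
        return True, "timeout"
    return False, ""


done, reason = check_termination(np.zeros(9), sim_time=3.2, time_in_target=0.5)
```

The NaN check comes first so that a corrupted observation is never reported as a success or a timeout.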
Usage Guide#
1. Environment Preview#
```bash
uv run scripts/view.py --env point_mass
```
2. Start Training#
```bash
uv run scripts/train.py --env point_mass
```
3. View Training Progress#
```bash
uv run tensorboard --logdir runs/point_mass
```
4. Test Training Results#
```bash
uv run scripts/play.py --env point_mass
```
Expected Training Results#
Learning Progress#
- Rapid initial learning phase as the agent discovers basic navigation
- Gradual refinement of the control strategy
- Stable performance across different target positions
Behavior Characteristics#
- Efficient path planning towards the target
- Smooth approach to the target center
- Minimal overshooting or oscillatory behavior