Pendulum#
Pendulum is a single-joint swing-up and balance task. The goal is to swing the pendulum up and keep it inverted using a single motor torque.
Task Description#
A single-link pendulum with one hinge joint is driven by a single motor (configurable gear). The motor's torque rotates the rod in a plane, so the task covers swing-up from an arbitrary initial angle followed by sustained balance at the inverted position. Torque is limited by the actuator ctrlrange; by modulating its magnitude and direction, the policy must accumulate enough energy to swing up, then stabilize near the inverted position while damping residual oscillations.
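The torque limit described above can be pictured as a clamp followed by a gear scaling, which is how a MuJoCo-style motor actuator with control limiting enabled behaves. The sketch below is illustrative only: the function name `motor_torque` is hypothetical, `gear=5` is taken from the comment in the environment configuration, and the `ctrlrange` of (-1, 1) is an assumption, since the actual range lives in the MJCF model.

```python
def motor_torque(
    ctrl: float,
    gear: float = 5.0,                             # from the "gear=5" config comment
    ctrlrange: tuple[float, float] = (-1.0, 1.0),  # assumed; defined in the MJCF model
) -> float:
    """Clamp the control signal to ctrlrange, then scale it by the gear."""
    lo, hi = ctrlrange
    clamped = min(max(ctrl, lo), hi)
    return gear * clamped
```

Under these assumptions the policy can never command more than `gear * ctrlrange` of torque, which is why a swing-up may need several back-and-forth swings to build up energy.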
Action Space#
| Item | Details |
|---|---|
| Type | |
| Dimension | 1 |
Observation Space#
| Item | Details |
|---|---|
| Type | |
| Dimension | 3 |
Order: cos(theta), sin(theta), angular velocity.
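As a minimal sketch of how such an observation could be assembled from the joint state, the hypothetical helper below (`make_observation` is not a name from the source) returns the three entries in the stated order.

```python
import math


def make_observation(theta: float, ang_vel: float) -> list[float]:
    """Return [cos(theta), sin(theta), angular velocity]."""
    return [math.cos(theta), math.sin(theta), ang_vel]
```

Encoding the angle as a cosine/sine pair keeps the observation continuous across the wrap-around at ±π, so two physically identical poses (for example `theta` and `theta + 2π`) map to the same observation.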
Reward Function Design#
Upright reward: encourages angle near π (inverted)
Energy shaping: target energy near inverted position
Penalties: ang_vel^2, ctrl^2, and (ctrl - prev_ctrl)^2, to reduce oscillation and aggressive actuation
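The terms listed above could be combined roughly as in the following sketch. It is not the environment's actual reward: the function name, all weights, the point-mass energy model, and the physical parameters (`mass`, `length`, `gravity`) are assumptions chosen for illustration. The only conventions taken from the text are that theta = π is inverted and that the three squared penalties are subtracted.

```python
import math


def pendulum_reward(
    theta: float,
    ang_vel: float,
    ctrl: float,
    prev_ctrl: float,
    *,
    mass: float = 1.0,       # assumed point mass at the rod tip
    length: float = 1.0,     # assumed rod length
    gravity: float = 9.81,
    w_upright: float = 1.0,  # all weights below are hypothetical
    w_energy: float = 0.1,
    w_vel: float = 0.01,
    w_ctrl: float = 0.001,
    w_rate: float = 0.001,
) -> float:
    # Upright term: 0 when hanging (theta = 0), 1 when inverted (theta = pi).
    upright = 0.5 * (1.0 - math.cos(theta))

    # Energy shaping: compare total mechanical energy with the energy of the
    # motionless inverted pose (potential energy is zero when hanging).
    kinetic = 0.5 * mass * length**2 * ang_vel**2
    potential = mass * gravity * length * (1.0 - math.cos(theta))
    target_energy = 2.0 * mass * gravity * length
    energy_term = -abs(kinetic + potential - target_energy) / target_energy

    # Penalties on speed, effort, and control change.
    penalties = (
        w_vel * ang_vel**2
        + w_ctrl * ctrl**2
        + w_rate * (ctrl - prev_ctrl) ** 2
    )
    return w_upright * upright + w_energy * energy_term - penalties
```

With these illustrative weights, a motionless inverted pendulum with zero control scores 1.0, while the hanging rest pose scores -0.1, so the shaping terms point toward swing-up even before the upright term becomes large.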
Initial State#
Angle randomized in [-pi, pi]
Angular velocity: small random noise (if configured)
Control history (prev_ctrl) reset to zero
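A reset following the three rules above might look like the sketch below. The helper name `sample_initial_state`, the `vel_noise` parameter, and the uniform noise distribution are assumptions; the source only states that the velocity noise is small and optional.

```python
import math
import random


def sample_initial_state(rng: random.Random, vel_noise: float = 0.0) -> dict[str, float]:
    """Sample an initial state; vel_noise = 0 disables velocity noise."""
    theta = rng.uniform(-math.pi, math.pi)
    # Assumed noise model: uniform in [-vel_noise, vel_noise].
    ang_vel = rng.uniform(-vel_noise, vel_noise) if vel_noise > 0.0 else 0.0
    return {"theta": theta, "ang_vel": ang_vel, "prev_ctrl": 0.0}
```

Resetting `prev_ctrl` to zero means the control-change penalty on the first step of an episode depends only on the first action.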
Episode Termination Conditions#
No fall/angle termination; only NaN check
Episode length limited by max_episode_seconds
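The two conditions above can be sketched as a pair of flags. The helper `episode_done` and the terminated/truncated naming are assumptions borrowed from common RL conventions, and the 20.0 default mirrors `max_episode_seconds` in the environment configuration.

```python
import math


def episode_done(
    obs: list[float],
    elapsed_seconds: float,
    max_episode_seconds: float = 20.0,
) -> tuple[bool, bool]:
    """Return (terminated, truncated) for one environment."""
    # Only a numerical failure ends the episode early; no angle is a "fall".
    terminated = any(math.isnan(x) for x in obs)
    # Otherwise the episode runs until the time limit.
    truncated = elapsed_seconds >= max_episode_seconds
    return terminated, truncated
```

Because there is no angle-based termination, the pendulum may pass through the hanging position any number of times within one episode.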
1. Environment Preview#
uv run scripts/view.py --env pendulum
2. Start Training#
# Train with default parameters
uv run scripts/train.py --env pendulum
# Customize parallel environments
uv run scripts/train.py --env pendulum --num-envs 1024
# Enable rendering during training
uv run scripts/train.py --env pendulum --render
3. View Training Progress#
uv run tensorboard --logdir runs/pendulum
4. Test Training Results#
# Auto-discover best policy (recommended)
uv run scripts/play.py --env pendulum
# Manually specify a policy file
uv run scripts/play.py --env pendulum --policy runs/pendulum/nn/best_policy.pickle
Tip: Policies are auto-selected from runs/pendulum/. You can override this with --policy.
Configuration Parameters#
Environment Configuration#
@dataclass
class PendulumEnvCfg(EnvCfg):
    model_file: str = ".../pendulum.xml"  # MJCF model (gear=5)
    max_episode_seconds: float = 20.0
    sim_dt: float = 0.0125
    ctrl_dt: float = 0.025
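To see what these timing values imply, the short calculation below assumes that `sim_dt` is the physics step and `ctrl_dt` is the interval between policy actions, which is the usual reading of these names but is not stated explicitly here. The helper names are hypothetical.

```python
def physics_substeps(sim_dt: float, ctrl_dt: float) -> int:
    """Physics steps simulated per policy action (assumed interpretation)."""
    return round(ctrl_dt / sim_dt)


def max_control_steps(max_episode_seconds: float, ctrl_dt: float) -> int:
    """Policy actions in one full-length episode (assumed interpretation)."""
    return round(max_episode_seconds / ctrl_dt)


substeps = physics_substeps(sim_dt=0.0125, ctrl_dt=0.025)
steps = max_control_steps(max_episode_seconds=20.0, ctrl_dt=0.025)
print(substeps, steps)
```

Under that reading, each action is held for 2 physics steps, and a full 20-second episode contains 800 policy actions.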
Training Configuration (PPO example)#
@rlcfg("pendulum")
@dataclass
class PendulumPPO(PPOCfg):
    seed: int = 42
    max_env_steps: int = 10_000_000
    num_envs: int = 1024
    learning_rate: float = 3e-4
    rollouts: int = 32
    learning_epochs: int = 5
    mini_batches: int = 4
    policy_hidden_layer_sizes: tuple[int, ...] = (64, 64)
    value_hidden_layer_sizes: tuple[int, ...] = (64, 64)
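The batch sizes implied by this configuration can be worked out as below. The calculation rests on several assumptions that the source does not confirm: that `rollouts` is the number of steps collected per environment before each update, that `mini_batches` splits each collected batch evenly, and that `max_env_steps` counts transitions summed over all environments. The function name is hypothetical.

```python
def ppo_batch_summary(
    num_envs: int,
    rollouts: int,
    mini_batches: int,
    learning_epochs: int,
    max_env_steps: int,
) -> dict[str, int]:
    """Derive batch sizes under the assumptions stated in the lead-in."""
    batch_size = num_envs * rollouts  # transitions collected per update
    return {
        "batch_size": batch_size,
        "mini_batch_size": batch_size // mini_batches,
        "gradient_steps_per_update": learning_epochs * mini_batches,
        "full_updates": max_env_steps // batch_size,
    }


summary = ppo_batch_summary(
    num_envs=1024,
    rollouts=32,
    mini_batches=4,
    learning_epochs=5,
    max_env_steps=10_000_000,
)
print(summary)
```

If those assumptions hold, each update uses 32,768 transitions split into mini-batches of 8,192, takes 20 gradient steps, and the full run contains 305 complete updates.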
Expected Training Results#
The pendulum can swing up and stay near the inverted position
Oscillation around upright is reduced by angular-velocity and control-change penalties