ANYmal-C Navigation#
Overview#
This document describes in detail the navigation task environment based on the ANYmal-C quadruped robot. This environment is part of the navigation task collection in the MotrixLab project, providing a complete implementation for training quadruped robots to navigate to target positions and orientations using reinforcement learning.
Environment Description#
The ANYmal-C navigation task environment is built based on the real ANYmal-C quadruped robot, designed to train robots to navigate to specified target positions and orientations on flat terrain. This environment uses the MotrixSim physics engine for simulation, providing high-fidelity dynamic simulation.
Robot Structure#
ANYmal-C is a quadruped robot composed of the following main parts:
Base: The core torso of the robot, containing sensor modules such as IMU, camera, and lidar
Four Legs: Each leg contains three joints
HAA (Hip Abduction/Adduction): Hip abduction/adduction joint
HFE (Hip Flexion/Extension): Hip flexion/extension joint
KFE (Knee Flexion/Extension): Knee flexion/extension
Four Feet: Spherical contact geometries that generate frictional contact with the ground
Task Objective#
The robot needs to complete the following navigation objectives:
Position Navigation: Move to the specified target position (XY plane coordinates)
Orientation Control: Adjust robot orientation to the target heading angle (yaw angle)
Stable Stop: Maintain stable standing after reaching the target, with linear and angular velocities approaching zero
The environment provides visualization markers:
Green Arrow: Indicates target position and orientation
Green Arrow Above Robot: Current actual movement direction
Blue Arrow Above Robot: Desired movement direction
Action Space#
The action space is Box(-1.0, 1.0, (12,), float32), representing position control commands applied to 12 joints (offsets relative to the default standing posture).
Control Mode#
The environment uses position control mode. Actions are converted to joint target positions as follows:
Target Joint Angle = Default Joint Angle + (Action Value × Action Scale)
Where the action scale is specified by the configuration parameter control_config.action_scale.
Action Dimension Details#
Index |
Action Description |
Control Range |
Joint Name |
Joint Type |
|---|---|---|---|---|
0 |
Left Front Hip HAA |
-1.0 ~ 1.0 |
LF_HAA |
hinge |
1 |
Left Front Hip HFE |
-1.0 ~ 1.0 |
LF_HFE |
hinge |
2 |
Left Front Knee KFE |
-1.0 ~ 1.0 |
LF_KFE |
hinge |
3 |
Right Front Hip HAA |
-1.0 ~ 1.0 |
RF_HAA |
hinge |
4 |
Right Front Hip HFE |
-1.0 ~ 1.0 |
RF_HFE |
hinge |
5 |
Right Front Knee KFE |
-1.0 ~ 1.0 |
RF_KFE |
hinge |
6 |
Left Hind Hip HAA |
-1.0 ~ 1.0 |
LH_HAA |
hinge |
7 |
Left Hind Hip HFE |
-1.0 ~ 1.0 |
LH_HFE |
hinge |
8 |
Left Hind Knee KFE |
-1.0 ~ 1.0 |
LH_KFE |
hinge |
9 |
Right Hind Hip HAA |
-1.0 ~ 1.0 |
RH_HAA |
hinge |
10 |
Right Hind Hip HFE |
-1.0 ~ 1.0 |
RH_HFE |
hinge |
11 |
Right Hind Knee KFE |
-1.0 ~ 1.0 |
RH_KFE |
hinge |
PD Control Parameters#
The underlying system uses position actuators with PD control parameters defined in the XML file:
kp (Proportional Gain): 200
kv (Derivative Gain): 1
Torque Limit: -140 N·m ~ 140 N·m
Observation Space#
The observation space is Box(-inf, inf, (54,), float32), containing the robot’s proprioceptive information, task-related information, and action history.
Observation Components#
The observation vector consists of the following parts (in order):
Proprioceptive State (33 dimensions)
Base Linear Velocity (3 dim): Linear velocity of robot base in world frame [vx, vy, vz]
Angular Velocity (3 dim): Angular velocity from gyroscope [ωx, ωy, ωz]
Projected Gravity (3 dim): Gravity vector projected in robot body frame
Joint Angles (12 dim): 12 joint angle offsets relative to default standing posture
Joint Velocities (12 dim): 12 joint angular velocities
Action History (12 dimensions)
Action executed at previous timestep
Velocity Commands (3 dimensions)
Desired Linear Velocity XY (2 dim): Desired linear velocity calculated from position error
Desired Angular Velocity Z (1 dim): Desired angular velocity calculated from orientation error
Task State (6 dimensions)
Position Error Vector (2 dim): XY plane error vector to target position (normalized)
Orientation Error (1 dim): Angle difference to target orientation (normalized to [-1, 1])
Distance (1 dim): Euclidean distance to target (normalized)
Arrival Flag (1 dim): Whether both position and orientation arrival conditions are satisfied (0 or 1)
Stop Ready Flag (1 dim): Whether stop criteria are met (arrived and angular velocity near zero)
Observation Details#
Index |
Observation Content |
Min |
Max |
Normalization Coefficient |
Unit |
|---|---|---|---|---|---|
0-2 |
Base Linear Velocity (vx, vy, vz) |
-inf |
inf |
normalization.lin_vel |
m/s |
3-5 |
Angular Velocity (ωx, ωy, ωz) |
-inf |
inf |
normalization.ang_vel |
rad/s |
6-8 |
Projected Gravity (gx, gy, gz) |
-1 |
1 |
1.0 |
Dimensionless |
9-20 |
Joint Angle Offsets (12 joints) |
-inf |
inf |
normalization.dof_pos |
rad |
21-32 |
Joint Angular Velocities (12 joints) |
-inf |
inf |
normalization.dof_vel |
rad/s |
33-44 |
Previous Action |
-1 |
1 |
1.0 |
Dimensionless |
45-47 |
Velocity Commands (vx_cmd, vy_cmd, ωz_cmd) |
-inf |
inf |
commands_scale |
m/s, rad/s |
48-49 |
Position Error Vector (Δx, Δy) |
-inf |
inf |
1/5.0 |
m |
50 |
Orientation Error |
-1 |
1 |
1/π |
rad |
51 |
Distance to Target |
0 |
1 |
1/5.0 (clipped) |
m |
52 |
Arrival Flag |
0 |
1 |
1.0 |
Boolean |
53 |
Stop Ready Flag |
0 |
1 |
1.0 |
Boolean |
Sensor Information#
The environment uses the following sensors to obtain state:
framelinvel (name: base_linvel): Base linear velocity sensor
gyro (name: base_gyro): Gyroscope sensor, mounted at IMU site
Reward Function#
The reward function uses a composite design, employing different reward strategies based on whether the robot has reached the target.
Rewards Before Reaching Target#
Total Reward = Velocity Tracking Reward + Approach Reward - Penalty Terms
Main Reward Terms:
Linear Velocity Tracking Reward (Weight: 1.5)
Formula:
1.5 × exp(-||v_xy - v_cmd||² / 0.25)Encourages robot to track desired XY plane linear velocity
Angular Velocity Tracking Reward (Weight: 0.3)
Formula:
0.3 × exp(-(ωz - ωz_cmd)² / 0.25)Encourages robot to track desired yaw angular velocity
Approach Reward
Formula:
clip((Historical Minimum Distance - Current Distance) × 4.0, -1.0, 1.0)Rewards robot for progress when getting closer to target
Penalty Terms:
Z-axis Linear Velocity Penalty (Weight: 2.0):
-2.0 × vz²XY-axis Angular Velocity Penalty (Weight: 0.05):
-0.05 × (ωx² + ωy²)Torque Penalty (Weight: 0.00001):
-0.00001 × ||τ||²Action Rate Penalty (Weight: 0.001):
-0.001 × ||Δa||²
Rewards After Reaching Target#
Total Reward = Stop Reward + First Arrival Reward - Penalty Terms
Main Reward Terms:
Stop Base Reward
Formula:
2 × [0.8 × exp(-(v_xy/0.2)²) + 1.2 × exp(-(ωz/0.1)⁴)]Encourages robot to maintain low velocity and angular velocity after arrival
Zero Angular Velocity Reward (Extra Reward: 6.0)
Condition: Arrived at target and |ωz| < 0.05 rad/s
Encourages robot to completely stop rotation
First Arrival Reward (One-time: 10.0)
Condition: First time both position and orientation arrival conditions are satisfied
Provides clear signal for reaching target
Penalty Terms: (Same as before arrival)
Termination Condition Penalties#
An additional penalty of -20.0 is applied in the following cases:
Joint velocity exceeds limit (exceeds
max_dof_velconfiguration value)Joint velocity is NaN or Inf
Robot base contacts ground
Robot rollover (tilt angle exceeds 75°)
Arrival Criteria#
Position Arrival: Distance to target < 0.3 meters
Orientation Arrival: Orientation error < 15°
Complete Arrival: Both position arrival and orientation arrival conditions satisfied
Stop Ready: Complete arrival AND |ωz| < 0.05 rad/s
info Return Content#
The info dictionary returned each step contains the following debug information:
pose_commands: Current target position and orientation [x, y, yaw]last_actions: Previous actioncurrent_actions: Current actionsteps: Current episode step countever_reached: Whether target has ever been reachedmin_distance: Historical minimum distance (used to calculate approach reward)
Initial State#
Robot Initialization#
Position Initialization:
The robot’s initial position in world coordinates is randomly sampled within the range defined by configuration parameter init_state.pos_randomization_range:
X coordinate: Uniform random sampling in [x_min, x_max]
Y coordinate: Uniform random sampling in [y_min, y_max]
Z coordinate: Fixed at 0.56 meters (to avoid falling sensation)
Orientation Initialization:
Robot orientation (quaternion): Initialized to unit quaternion [0, 0, 0, 1], indicating forward orientation
No random noise added to quaternion (ensures initial stability)
Joint Initialization:
Joint angles are set to default standing posture, defined by configuration parameter init_state.default_joint_angles. No random noise added to joint angles (ensures stable standing initially).
Velocity Initialization:
All linear and angular velocities are initialized to zero, ensuring robot starts from stationary state.
Target Generation#
Target Position:
Target position is generated relative to robot’s initial position:
Target Position = Robot Initial Position + Random Offset
Random offset is sampled within the range defined by configuration parameter commands.pose_command_range:
X direction offset: [pose_command_range[0], pose_command_range[3]]
Y direction offset: [pose_command_range[1], pose_command_range[4]]
Target Orientation:
Target orientation (yaw angle) is randomly generated in absolute reference frame:
Orientation angle: [pose_command_range[2], pose_command_range[5]]
Visualization Marker Initialization#
Target Marker (green arrow): Set to target position and orientation
Movement Direction Arrows: Initialized at 0.76 meters above robot
Episode Termination#
Termination Conditions#
The environment terminates an episode when any of the following conditions are met:
Timeout Termination
Condition: Episode reaches
max_episode_secondconfiguration valueDescription: Prevents infinite episodes
Joint Velocity Anomaly
Condition: Absolute value of any joint velocity exceeds
max_dof_velCondition: Joint velocity is NaN, Inf, or exceeds 1e6
Description: Prevents numerical divergence and physical instability
Base Contacts Ground
Condition: Robot base (geometries defined by
terminate_after_contacts_onconfiguration parameter) contacts groundDescription: Robot fell or pose failure
Rollover
Condition: Robot tilt angle exceeds 75°
Calculation: Tilt angle calculated via projected gravity vector
arctan2(||g_xy||, |g_z|)Description: Robot severely rolled over
Success Conditions#
Although the environment does not terminate upon success, task success is defined as:
Robot reaches target position and orientation (both position threshold < 0.3m and orientation threshold < 15° satisfied)
Robot maintains stable stop (linear velocity < 0.05 m/s, angular velocity < 0.05 rad/s)