ANYmal-C Navigation#

Overview#

This document describes in detail the navigation task environment based on the ANYmal-C quadruped robot. This environment is part of the navigation task collection in the MotrixLab project, providing a complete implementation for training quadruped robots to navigate to target positions and orientations using reinforcement learning.

Environment Description#

The ANYmal-C navigation task environment is built based on the real ANYmal-C quadruped robot, designed to train robots to navigate to specified target positions and orientations on flat terrain. This environment uses the MotrixSim physics engine for simulation, providing high-fidelity dynamic simulation.

Robot Structure#

ANYmal-C is a quadruped robot composed of the following main parts:

Base: The core torso of the robot, containing sensor modules such as IMU, camera, and lidar
Four Legs: Each leg contains three joints
- HAA (Hip Abduction/Adduction): Hip abduction/adduction joint
- HFE (Hip Flexion/Extension): Hip flexion/extension joint
- KFE (Knee Flexion/Extension): Knee flexion/extension
Four Feet: Spherical contact geometries that generate frictional contact with the ground

Task Objective#

The robot needs to complete the following navigation objectives:

Position Navigation: Move to the specified target position (XY plane coordinates)
Orientation Control: Adjust robot orientation to the target heading angle (yaw angle)
Stable Stop: Maintain stable standing after reaching the target, with linear and angular velocities approaching zero

The environment provides visualization markers:

Green Arrow: Indicates target position and orientation
Green Arrow Above Robot: Current actual movement direction
Blue Arrow Above Robot: Desired movement direction

Action Space#

The action space is Box(-1.0, 1.0, (12,), float32), representing position control commands applied to 12 joints (offsets relative to the default standing posture).

Control Mode#

The environment uses position control mode. Actions are converted to joint target positions as follows:

Target Joint Angle = Default Joint Angle + (Action Value × Action Scale)

Where the action scale is specified by the configuration parameter control_config.action_scale.

Action Dimension Details#

Index	Action Description	Control Range	Joint Name	Joint Type
0	Left Front Hip HAA	-1.0 ~ 1.0	LF_HAA	hinge
1	Left Front Hip HFE	-1.0 ~ 1.0	LF_HFE	hinge
2	Left Front Knee KFE	-1.0 ~ 1.0	LF_KFE	hinge
3	Right Front Hip HAA	-1.0 ~ 1.0	RF_HAA	hinge
4	Right Front Hip HFE	-1.0 ~ 1.0	RF_HFE	hinge
5	Right Front Knee KFE	-1.0 ~ 1.0	RF_KFE	hinge
6	Left Hind Hip HAA	-1.0 ~ 1.0	LH_HAA	hinge
7	Left Hind Hip HFE	-1.0 ~ 1.0	LH_HFE	hinge
8	Left Hind Knee KFE	-1.0 ~ 1.0	LH_KFE	hinge
9	Right Hind Hip HAA	-1.0 ~ 1.0	RH_HAA	hinge
10	Right Hind Hip HFE	-1.0 ~ 1.0	RH_HFE	hinge
11	Right Hind Knee KFE	-1.0 ~ 1.0	RH_KFE	hinge

PD Control Parameters#

The underlying system uses position actuators with PD control parameters defined in the XML file:

kp (Proportional Gain): 200
kv (Derivative Gain): 1
Torque Limit: -140 N·m ~ 140 N·m

Observation Space#

The observation space is Box(-inf, inf, (54,), float32), containing the robot’s proprioceptive information, task-related information, and action history.

Observation Components#

The observation vector consists of the following parts (in order):

Proprioceptive State (33 dimensions)
- Base Linear Velocity (3 dim): Linear velocity of robot base in world frame [vx, vy, vz]
- Angular Velocity (3 dim): Angular velocity from gyroscope [ωx, ωy, ωz]
- Projected Gravity (3 dim): Gravity vector projected in robot body frame
- Joint Angles (12 dim): 12 joint angle offsets relative to default standing posture
- Joint Velocities (12 dim): 12 joint angular velocities
Action History (12 dimensions)
- Action executed at previous timestep
Velocity Commands (3 dimensions)
- Desired Linear Velocity XY (2 dim): Desired linear velocity calculated from position error
- Desired Angular Velocity Z (1 dim): Desired angular velocity calculated from orientation error
Task State (6 dimensions)
- Position Error Vector (2 dim): XY plane error vector to target position (normalized)
- Orientation Error (1 dim): Angle difference to target orientation (normalized to [-1, 1])
- Distance (1 dim): Euclidean distance to target (normalized)
- Arrival Flag (1 dim): Whether both position and orientation arrival conditions are satisfied (0 or 1)
- Stop Ready Flag (1 dim): Whether stop criteria are met (arrived and angular velocity near zero)

Observation Details#

Index	Observation Content	Min	Max	Normalization Coefficient	Unit
0-2	Base Linear Velocity (vx, vy, vz)	-inf	inf	normalization.lin_vel	m/s
3-5	Angular Velocity (ωx, ωy, ωz)	-inf	inf	normalization.ang_vel	rad/s
6-8	Projected Gravity (gx, gy, gz)	-1	1	1.0	Dimensionless
9-20	Joint Angle Offsets (12 joints)	-inf	inf	normalization.dof_pos	rad
21-32	Joint Angular Velocities (12 joints)	-inf	inf	normalization.dof_vel	rad/s
33-44	Previous Action	-1	1	1.0	Dimensionless
45-47	Velocity Commands (vx_cmd, vy_cmd, ωz_cmd)	-inf	inf	commands_scale	m/s, rad/s
48-49	Position Error Vector (Δx, Δy)	-inf	inf	1/5.0	m
50	Orientation Error	-1	1	1/π	rad
51	Distance to Target	0	1	1/5.0 (clipped)	m
52	Arrival Flag	0	1	1.0	Boolean
53	Stop Ready Flag	0	1	1.0	Boolean

Sensor Information#

The environment uses the following sensors to obtain state:

framelinvel (name: base_linvel): Base linear velocity sensor
gyro (name: base_gyro): Gyroscope sensor, mounted at IMU site

Reward Function#

The reward function uses a composite design, employing different reward strategies based on whether the robot has reached the target.

Rewards Before Reaching Target#

Total Reward = Velocity Tracking Reward + Approach Reward - Penalty Terms

Main Reward Terms:

Linear Velocity Tracking Reward (Weight: 1.5)
- Formula: 1.5 × exp(-||v_xy - v_cmd||² / 0.25)
- Encourages robot to track desired XY plane linear velocity
Angular Velocity Tracking Reward (Weight: 0.3)
- Formula: 0.3 × exp(-(ωz - ωz_cmd)² / 0.25)
- Encourages robot to track desired yaw angular velocity
Approach Reward
- Formula: clip((Historical Minimum Distance - Current Distance) × 4.0, -1.0, 1.0)
- Rewards robot for progress when getting closer to target

Penalty Terms:

Z-axis Linear Velocity Penalty (Weight: 2.0): -2.0 × vz²
XY-axis Angular Velocity Penalty (Weight: 0.05): -0.05 × (ωx² + ωy²)
Torque Penalty (Weight: 0.00001): -0.00001 × ||τ||²
Action Rate Penalty (Weight: 0.001): -0.001 × ||Δa||²

Rewards After Reaching Target#

Total Reward = Stop Reward + First Arrival Reward - Penalty Terms

Main Reward Terms:

Stop Base Reward
- Formula: 2 × [0.8 × exp(-(v_xy/0.2)²) + 1.2 × exp(-(ωz/0.1)⁴)]
- Encourages robot to maintain low velocity and angular velocity after arrival
Zero Angular Velocity Reward (Extra Reward: 6.0)
- Condition: Arrived at target and |ωz| < 0.05 rad/s
- Encourages robot to completely stop rotation
First Arrival Reward (One-time: 10.0)
- Condition: First time both position and orientation arrival conditions are satisfied
- Provides clear signal for reaching target

Penalty Terms: (Same as before arrival)

Termination Condition Penalties#

An additional penalty of -20.0 is applied in the following cases:

Joint velocity exceeds limit (exceeds max_dof_vel configuration value)
Joint velocity is NaN or Inf
Robot base contacts ground
Robot rollover (tilt angle exceeds 75°)

Arrival Criteria#

Position Arrival: Distance to target < 0.3 meters
Orientation Arrival: Orientation error < 15°
Complete Arrival: Both position arrival and orientation arrival conditions satisfied
Stop Ready: Complete arrival AND |ωz| < 0.05 rad/s

info Return Content#

The info dictionary returned each step contains the following debug information:

pose_commands: Current target position and orientation [x, y, yaw]
last_actions: Previous action
current_actions: Current action
steps: Current episode step count
ever_reached: Whether target has ever been reached
min_distance: Historical minimum distance (used to calculate approach reward)

Initial State#

Robot Initialization#

Position Initialization:

The robot’s initial position in world coordinates is randomly sampled within the range defined by configuration parameter init_state.pos_randomization_range:

X coordinate: Uniform random sampling in [x_min, x_max]
Y coordinate: Uniform random sampling in [y_min, y_max]
Z coordinate: Fixed at 0.56 meters (to avoid falling sensation)

Orientation Initialization:

Robot orientation (quaternion): Initialized to unit quaternion [0, 0, 0, 1], indicating forward orientation
No random noise added to quaternion (ensures initial stability)

Joint Initialization:

Joint angles are set to default standing posture, defined by configuration parameter init_state.default_joint_angles. No random noise added to joint angles (ensures stable standing initially).

Velocity Initialization:

All linear and angular velocities are initialized to zero, ensuring robot starts from stationary state.

Target Generation#

Target Position:

Target position is generated relative to robot’s initial position:

Target Position = Robot Initial Position + Random Offset

Random offset is sampled within the range defined by configuration parameter commands.pose_command_range:

X direction offset: [pose_command_range[0], pose_command_range[3]]
Y direction offset: [pose_command_range[1], pose_command_range[4]]

Target Orientation:

Target orientation (yaw angle) is randomly generated in absolute reference frame:

Orientation angle: [pose_command_range[2], pose_command_range[5]]

Visualization Marker Initialization#

Target Marker (green arrow): Set to target position and orientation
Movement Direction Arrows: Initialized at 0.76 meters above robot

Episode Termination#

Termination Conditions#

The environment terminates an episode when any of the following conditions are met:

Timeout Termination
- Condition: Episode reaches max_episode_second configuration value
- Description: Prevents infinite episodes
Joint Velocity Anomaly
- Condition: Absolute value of any joint velocity exceeds max_dof_vel
- Condition: Joint velocity is NaN, Inf, or exceeds 1e6
- Description: Prevents numerical divergence and physical instability
Base Contacts Ground
- Condition: Robot base (geometries defined by terminate_after_contacts_on configuration parameter) contacts ground
- Description: Robot fell or pose failure
Rollover
- Condition: Robot tilt angle exceeds 75°
- Calculation: Tilt angle calculated via projected gravity vector arctan2(||g_xy||, |g_z|)
- Description: Robot severely rolled over

Success Conditions#

Although the environment does not terminate upon success, task success is defined as:

Robot reaches target position and orientation (both position threshold < 0.3m and orientation threshold < 15° satisfied)
Robot maintains stable stop (linear velocity < 0.05 m/s, angular velocity < 0.05 rad/s)