# Shadow Hand Cube Repose
## Overview
This document describes the Shadow Hand in-hand cube reorientation environment. It is a classic robotic-manipulation benchmark: the robot must reorient a cube held in its hand to match a randomly sampled target orientation.
## Environment Description
The task uses a model of the 24-DOF Shadow Hand and trains a policy to rotate a cube in-hand to a target pose through fine multi-finger coordination.
### Robot Structure
The Shadow Hand is a high-fidelity 24-DOF dexterous hand composed of the following main parts:
- Palm: the base structure of the hand, fixed to the base
- Five fingers:
  - Thumb: 5 DOF, including the CMC rotation, MCP, and IP joints
  - Index: 4 DOF (MCP, PIP, and DIP joints)
  - Middle: 4 DOF (MCP, PIP, and DIP joints)
  - Ring: 4 DOF (MCP, PIP, and DIP joints)
  - Little: 4 DOF (MCP, PIP, and DIP joints)
- Actuator configuration: 20 actuated joints; the remaining 4 are coupled joints driven by other joints
### Finger Joint Details
Each finger except the thumb contains:
- MCP joint (metacarpophalangeal): 2 DOF (abduction/adduction and flexion)
- PIP joint (proximal interphalangeal): 1 DOF (flexion)
- DIP joint (distal interphalangeal): 1 DOF (flexion)

The thumb contains:
- CMC joint (carpometacarpal): 2 DOF
- MCP joint (metacarpophalangeal): 1 DOF
- IP joint (interphalangeal): 1 DOF
### Environment Objects
- Cube: 50 mm × 50 mm × 50 mm
  - Mass: approximately 0.028 kg
  - Friction coefficient: 1.2
  - Initial position: above the palm center, at (0.33, 0.00, 0.295) m
- Target visualization: a semi-transparent indicator of the target pose (a mocap body)
## Task Goals
The robot must:
- Maintain the grasp: keep a stable grip on the cube in-hand.
- Perceive the goal: the target pose is part of the observation and is also shown by the visualization indicator.
- Manipulate finely: rotate the cube through multi-finger coordination.
- Align the pose: rotate the cube until its orientation is within 0.1 rad of the target.
## Action Space
The action space is `Box(-1, 1, (20,), float32)`: normalized position-control commands for the 20 actuated joints.
### Control Mode
The environment uses position control. Each action is converted to joint target positions as follows:
1. Scale from [-1, 1] to the actuator control range.
2. Optionally smooth the targets with an exponential moving average (`act_moving_average`).
3. Clip to the joint limits.
4. Apply the targets to the simulator actuators.
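### Action Processing Flow
```python
import numpy as np

def process_actions(actions, prev_targets, lower_limits, upper_limits, act_moving_average=1.0):
    # 1. Scale from [-1, 1] to the actuator control range
    targets = lower_limits + 0.5 * (actions + 1.0) * (upper_limits - lower_limits)
    # 2. Optional smoothing: blend with the previous (already scaled) targets
    if act_moving_average < 1.0:
        alpha = act_moving_average
        targets = alpha * targets + (1.0 - alpha) * prev_targets
    # 3. Clip to the limits
    targets = np.clip(targets, lower_limits, upper_limits)
    # 4. The returned targets are written to the simulator's actuator controls
    return targets
```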
### Action Dimension Details
| Index | Finger | Joint | DOF | Description |
|---|---|---|---|---|
| 0-4 | Thumb | J0-J4 | 5 | CMC, MCP, IP |
| 5-8 | Index | J0-J3 | 4 | MCP, PIP, DIP |
| 9-12 | Middle | J0-J3 | 4 | MCP, PIP, DIP |
| 13-16 | Ring | J0-J3 | 4 | MCP, PIP, DIP |
| 17-20 | Little | J0-J3 | 4 | MCP, PIP, DIP |
## Observation Space
The observation space is `Box(-inf, inf, (157,), float32)`. It contains the hand's proprioceptive state, the cube state, the goal state, the fingertip states, and the previous action.
### Observation Components
The observation vector consists of the following parts, in order:
#### 1. Hand Joint State (48 dimensions)
- Joint position (24 dims): raw, unscaled joint angles
- Joint velocity (24 dims): joint angular velocities scaled by 0.2
#### 2. Cube State (13 dimensions)
- Position (3 dims): cube position in world coordinates (x, y, z)
- Orientation (4 dims): quaternion (x, y, z, w)
- Linear velocity (3 dims): cube linear velocity
- Angular velocity (3 dims): cube angular velocity scaled by 0.2
- Normalization: the joint velocities and the cube angular velocity are multiplied by `vel_obs_scale = 0.2`; the cube linear velocity is not scaled
#### 3. Goal State (11 dimensions)
- Goal position (3 dims): fixed at (0.33, 0.00, 0.295) m
- Goal orientation (4 dims): randomly sampled target quaternion
- Relative rotation (4 dims): relative quaternion from the cube to the goal
#### 4. Fingertip State (65 dimensions)
State of the 5 fingertips, 13 dimensions each:
- Position (3 dims): fingertip position in Cartesian space
- Orientation (4 dims): fingertip quaternion
- Velocity (6 dims): linear and angular velocity

Fingertip link names:
- `rh_ffdistal`: index fingertip
- `rh_mfdistal`: middle fingertip
- `rh_rfdistal`: ring fingertip
- `rh_lfdistal`: little fingertip
- `rh_thdistal`: thumb fingertip
#### 5. Action History (20 dimensions)
The previous action, which gives the policy temporal context.
### Observation Details
| Index | Observation content | Dimension | Range | Unit |
|---|---|---|---|---|
| 0-23 | Hand joint position (unscaled) | 24 | Joint limits | rad |
| 24-47 | Hand joint velocity (×0.2) | 24 | ±π/2 | rad/s |
| 48-50 | Cube position | 3 | Unbounded | m |
| 51-54 | Cube orientation (quat w,x,y,z) | 4 | Unit norm | - |
| 55-57 | Cube linear velocity | 3 | Unbounded | m/s |
| 58-60 | Cube angular velocity (×0.2) | 3 | Unbounded | rad/s |
| 61-63 | Goal position | 3 | Fixed | m |
| 64-67 | Goal orientation (quat w,x,y,z) | 4 | Unit norm | - |
| 68-71 | Relative rotation (quat w,x,y,z) | 4 | Unit norm | - |
| 72-136 | Fingertip state (5 × 13) | 65 | - | - |
| 137-156 | Previous action | 20 | [-1, 1] | - |
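Because the components are concatenated in a fixed order, the index ranges in the table follow directly from their sizes. The NumPy sketch below assembles a flat observation in that order and recomputes the slice boundaries. The component names (`joint_pos`, `cube_quat`, and so on) and the helpers `obs_slices` and `build_obs` are hypothetical, chosen for illustration rather than taken from the environment's code.

```python
import numpy as np

VEL_OBS_SCALE = 0.2  # vel_obs_scale

# (name, size, scaled by VEL_OBS_SCALE?) in the documented order
OBS_LAYOUT = [
    ("joint_pos", 24, False),
    ("joint_vel", 24, True),
    ("cube_pos", 3, False),
    ("cube_quat", 4, False),
    ("cube_linvel", 3, False),
    ("cube_angvel", 3, True),
    ("goal_pos", 3, False),
    ("goal_quat", 4, False),
    ("rel_quat", 4, False),
    ("fingertip_state", 5 * 13, False),  # 5 fingertips x 13 values
    ("prev_action", 20, False),
]


def obs_slices():
    """Return {name: slice} into the flat observation and the total size."""
    slices, start = {}, 0
    for name, size, _ in OBS_LAYOUT:
        slices[name] = slice(start, start + size)
        start += size
    return slices, start


def build_obs(parts):
    """Flatten, scale, and concatenate raw components into one float32 vector."""
    chunks = []
    for name, size, scaled in OBS_LAYOUT:
        x = np.asarray(parts[name], dtype=np.float32).reshape(-1)
        if x.size != size:
            raise ValueError(f"{name}: expected {size} values, got {x.size}")
        chunks.append(np.float32(VEL_OBS_SCALE) * x if scaled else x)
    return np.concatenate(chunks)


slices, total = obs_slices()
print(total)               # 157
print(slices["cube_pos"])  # slice(48, 51, None)
```

Keeping the layout in one list means the 157-dimension total and every slice boundary are derived rather than hard-coded, so resizing one component cannot silently shift the others.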
## Reward Function
The reward combines several reward and penalty terms.
### Main Reward Terms
- Rotation alignment reward (core objective): `rot_reward = rot_reward_scale / (|rot_dist| + rot_eps)`
  - Scaling factor (`rot_reward_scale`): 1.0
  - Epsilon (`rot_eps`): 0.1
  - Rotation distance (`rot_dist`): the angle of the rotation between the cube orientation and the goal orientation, computed from their quaternions
  - Incentive: the reward is inversely proportional to `|rot_dist| + rot_eps`, so it grows as the cube orientation approaches the target
- Position distance penalty: `dist_reward = dist_reward_scale * goal_dist`
  - Scaling factor (`dist_reward_scale`): -10.0
  - Distance (`goal_dist`): Euclidean distance from the cube to the goal position
  - Incentive: keeps the cube near the goal position and discourages dropping it
- Action regularization penalty: `action_penalty = action_penalty_scale * ||actions||^2`
  - Scaling factor (`action_penalty_scale`): -0.0002
  - Purpose: encourages smooth, energy-efficient motion
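The three dense terms can be sketched in NumPy as follows. The sketch assumes the terms are simply summed and computes `rot_dist` as the angle of the relative quaternion `q_cube * conj(q_goal)`, a common choice that this page does not spell out. It stores quaternions in (w, x, y, z) order; the observation section lists both (x, y, z, w) and (w, x, y, z), so adjust the indexing to match the environment. All function names are hypothetical.

```python
import numpy as np


def quat_mul(a, b):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ])


def quat_conj(q):
    """Conjugate (the inverse, for a unit quaternion)."""
    w, x, y, z = q
    return np.array([w, -x, -y, -z])


def rotation_distance(q_cube, q_goal):
    """Angle in radians (0 to pi) of the rotation between two unit quaternions."""
    q_diff = quat_mul(q_cube, quat_conj(q_goal))
    # |vector part| = sin(angle / 2); using it makes q and -q give the same angle
    return 2.0 * np.arcsin(np.clip(np.linalg.norm(q_diff[1:]), 0.0, 1.0))


def dense_reward(cube_pos, q_cube, goal_pos, q_goal, actions,
                 dist_reward_scale=-10.0, rot_reward_scale=1.0,
                 rot_eps=0.1, action_penalty_scale=-0.0002):
    """Sum of the position, rotation, and action terms (defaults from this page)."""
    goal_dist = np.linalg.norm(np.asarray(cube_pos) - np.asarray(goal_pos))
    rot_dist = rotation_distance(q_cube, q_goal)
    dist_reward = dist_reward_scale * goal_dist
    rot_reward = rot_reward_scale / (abs(rot_dist) + rot_eps)
    action_penalty = action_penalty_scale * np.sum(np.square(actions))
    return dist_reward + rot_reward + action_penalty, goal_dist, rot_dist
```

With the cube exactly at the goal, `rot_reward` reaches its maximum of `1.0 / 0.1 = 10`; at the largest possible rotation distance of π rad it falls to about 0.31. `rot_eps` therefore bounds the reward and sets how sharply it peaks near the target.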
### Conditional Rewards
- Success reward: `if |rot_dist| <= success_tolerance: reward += reach_goal_bonus`
  - Reward value (`reach_goal_bonus`): 2.0
  - Tolerance (`success_tolerance`): 0.1 rad (approximately 5.7°)
  - Purpose: sparse reward for achieving goal alignment
- Drop penalty: `if goal_dist >= fall_dist: reward += fall_penalty; terminated = True`
  - Penalty value (`fall_penalty`): 0.0 (the episode terminates; no additional penalty is applied)
  - Distance threshold (`fall_dist`): 0.24 m
  - Purpose: terminate the episode when the cube is dropped
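A minimal sketch of the two conditional terms with the defaults listed above. `conditional_terms` is a hypothetical helper, and adding the drop penalty after the success bonus is an assumed ordering (with `fall_penalty = 0.0` the order does not change the value).

```python
import math


def conditional_terms(rot_dist, goal_dist, success_tolerance=0.1,
                      reach_goal_bonus=2.0, fall_dist=0.24, fall_penalty=0.0):
    """Return (bonus, success, dropped) for one step."""
    bonus = 0.0
    success = abs(rot_dist) <= success_tolerance
    if success:
        bonus += reach_goal_bonus  # sparse reward for reaching the goal orientation
    dropped = goal_dist >= fall_dist
    if dropped:
        bonus += fall_penalty      # 0.0 by default: dropping only ends the episode
    return bonus, success, dropped


print(round(math.degrees(0.1), 2))  # 5.73, the success tolerance in degrees
```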
## Initial State
### Hand Initialization
- Position: the palm is fixed in the world coordinate system; its position is set by the model file.
- Joint angles: start from the model's default joint positions and add uniform random noise in [-0.2, 0.2] rad to all 24 hand DOFs.
- Velocities: all joint velocities are initialized to zero.
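Joint initialization amounts to the default angles plus independent uniform noise per joint. In this sketch `init_hand_state` is a hypothetical helper, `default_qpos` stands in for the model's default joint positions, and the optional clipping to joint limits is an assumption that the text does not state.

```python
import numpy as np


def init_hand_state(default_qpos, rng, noise=0.2, limits=None):
    """Default joint angles plus U(-noise, noise) per joint; zero joint velocities."""
    default_qpos = np.asarray(default_qpos, dtype=float)
    qpos = default_qpos + rng.uniform(-noise, noise, size=default_qpos.shape)
    if limits is not None:
        # Assumption: the docs do not say whether noisy angles are clipped.
        lower, upper = limits
        qpos = np.clip(qpos, lower, upper)
    qvel = np.zeros_like(qpos)  # joint velocity noise defaults to 0.0
    return qpos, qvel


rng = np.random.default_rng(0)
qpos, qvel = init_hand_state(np.zeros(24), rng)  # zeros stand in for the model defaults
```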
### Cube Initialization
- Position: fixed at (0.33, 0.00, 0.295) m, above the palm center, plus uniform random noise in [-0.01, 0.01] m (±1 cm).
- Orientation: a random quaternion generated with Shoemake's method, so orientations are sampled uniformly over SO(3).
- Velocities: all linear and angular velocities are initialized to zero.
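Shoemake's method maps three uniform samples u1, u2, u3 in [0, 1) to a unit quaternion whose distribution is uniform over SO(3). The sketch below implements it and applies the position noise described above. Returning (w, x, y, z) order and drawing the ±1 cm noise independently per axis are assumptions, and `shoemake_quat` and `sample_cube_pose` are hypothetical names. The same sampler applies to the goal orientation.

```python
import numpy as np

CUBE_INIT_POS = np.array([0.33, 0.00, 0.295])  # meters, above the palm center


def shoemake_quat(rng):
    """Uniformly distributed unit quaternion (Shoemake), returned as (w, x, y, z)."""
    u1, u2, u3 = rng.random(3)
    r1, r2 = np.sqrt(1.0 - u1), np.sqrt(u1)
    return np.array([
        r2 * np.cos(2.0 * np.pi * u3),
        r1 * np.sin(2.0 * np.pi * u2),
        r1 * np.cos(2.0 * np.pi * u2),
        r2 * np.sin(2.0 * np.pi * u3),
    ])


def sample_cube_pose(rng, pos_noise=0.01):
    """Nominal position plus uniform noise (assumed per axis) and a random orientation."""
    pos = CUBE_INIT_POS + rng.uniform(-pos_noise, pos_noise, size=3)
    return pos, shoemake_quat(rng)


rng = np.random.default_rng(42)
cube_pos, cube_quat = sample_cube_pose(rng)
goal_quat = shoemake_quat(rng)  # goal orientations are sampled the same way
```

Unit norm holds by construction, since (1 - u1) + u1 = 1. Sampling Euler angles uniformly instead would cluster orientations near the poles, which is why a method like Shoemake's is used.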
### Goal Initialization
- Position: fixed at (0.33, 0.00, 0.295) m, the same as the nominal cube initial position.
- Orientation: a random target quaternion generated with Shoemake's method (uniform over SO(3)), resampled at every reset.
## Termination Conditions
An episode terminates under any of the following conditions:
- Drop: the cube's distance from the goal position is ≥ `fall_dist` (0.24 m).
- Timeout: the episode reaches `max_episode_steps` (default 1000 steps).
- NaN protection: the rotation distance or the position distance is NaN.
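The three conditions can be checked in one place. In the sketch below, `termination_reason` is a hypothetical helper; the check order and the step-counting convention are assumptions. The NaN check runs first because every comparison with NaN is false: a NaN `goal_dist` would never satisfy `goal_dist >= fall_dist` and would otherwise slip past the drop check.

```python
import math


def termination_reason(goal_dist, rot_dist, step, fall_dist=0.24, max_episode_steps=1000):
    """Return "nan", "drop", or "timeout" if the episode should end, else None."""
    # NaN first: comparisons with NaN are always False, so a NaN distance
    # would never trigger the drop check below.
    if math.isnan(goal_dist) or math.isnan(rot_dist):
        return "nan"
    if goal_dist >= fall_dist:
        return "drop"
    if step >= max_episode_steps:  # assumes `step` counts completed steps
        return "timeout"
    return None
```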
### Success Holding Mechanism
The environment uses a consecutive-success counter:
- Each step in which the rotation tolerance is satisfied increments the counter.
- When the counter reaches `max_consecutive_successes` (50), the episode terminates with success and the goal is reset.
- Rotation tolerance: 0.1 rad
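A sketch of the success counter. It assumes that "consecutive" means the count resets whenever the tolerance is missed (exposed as the hypothetical `reset_on_miss` flag) and that the counter restarts after each success. `SuccessCounter` is an illustrative class, not the environment's API.

```python
class SuccessCounter:
    """Counts steps within the rotation tolerance; signals success at the limit."""

    def __init__(self, tolerance=0.1, max_consecutive_successes=50, reset_on_miss=True):
        self.tolerance = tolerance
        self.max_consecutive_successes = max_consecutive_successes
        self.reset_on_miss = reset_on_miss  # assumption: "consecutive" means reset on a miss
        self.count = 0

    def update(self, rot_dist):
        """Advance one step; return True when the episode should end with success."""
        if abs(rot_dist) <= self.tolerance:
            self.count += 1
        elif self.reset_on_miss:
            self.count = 0
        if self.count >= self.max_consecutive_successes:
            self.count = 0  # restart for the next (resampled) goal
            return True
        return False


counter = SuccessCounter()
results = [counter.update(0.05) for _ in range(50)]
print(results.index(True))  # 49: the 50th in-tolerance step triggers success
```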
## Usage
### Training
```bash
uv run scripts/train.py --env shadow-hand-repose
```
### Policy Evaluation
```bash
uv run scripts/play.py --env shadow-hand-repose
```
### Environment Visualization
```bash
uv run scripts/view.py --env shadow-hand-repose
```
### TensorBoard
```bash
uv run tensorboard --logdir runs/shadow-hand-repose
```
## Configuration Parameters
### Environment Parameters
| Parameter | Default | Description |
|---|---|---|
| | 10.0 | Maximum episode length (s) |
| | 0.01 | Control timestep (s) |
| `max_episode_steps` | 1000 | Maximum episode steps |
| | 24 | Total hand DOFs |
| | 20 | Number of actuated joints |
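### Reward Parameters
| Parameter | Default | Description |
|---|---|---|
| `dist_reward_scale` | -10.0 | Position distance reward scale |
| `rot_reward_scale` | 1.0 | Rotation alignment reward scale |
| `rot_eps` | 0.1 | Rotation reward epsilon |
| `action_penalty_scale` | -0.0002 | Action regularization scale |
| `success_tolerance` | 0.1 | Success tolerance (rad) |
| `reach_goal_bonus` | 2.0 | Success reward |
| `fall_dist` | 0.24 | Drop distance threshold (m) |
| `fall_penalty` | 0.0 | Drop penalty |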
### Reset Noise Parameters
| Parameter | Default | Description |
|---|---|---|
| | 0.01 | Cube position noise (m) |
| | 0.2 | Joint position noise (rad) |
| | 0.0 | Joint velocity noise (rad/s) |
### Observation Scaling Parameters
| Parameter | Default | Description |
|---|---|---|
| `vel_obs_scale` | 0.2 | Velocity observation scale |
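The defaults are mutually consistent: a 10.0 s episode at a 0.01 s control timestep gives 10.0 / 0.01 = 1000 steps, matching the maximum episode steps. A hypothetical frozen dataclass can keep these values together and derive the step limit instead of storing it separately. The reward field names follow the symbols used in the reward formulas, while `episode_length_s` and `control_dt` are invented names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReposeConfig:
    # Hypothetical names for the timing parameters
    episode_length_s: float = 10.0
    control_dt: float = 0.01
    # Reward parameters, named after the symbols in the reward formulas
    dist_reward_scale: float = -10.0
    rot_reward_scale: float = 1.0
    rot_eps: float = 0.1
    action_penalty_scale: float = -0.0002
    success_tolerance: float = 0.1  # rad
    reach_goal_bonus: float = 2.0
    fall_dist: float = 0.24         # m
    fall_penalty: float = 0.0
    vel_obs_scale: float = 0.2

    @property
    def max_episode_steps(self) -> int:
        # round() absorbs floating-point error in the division
        return round(self.episode_length_s / self.control_dt)


print(ReposeConfig().max_episode_steps)  # 1000
```

Deriving the step count means changing the timestep or the episode length cannot leave the two out of sync.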
## References
This environment builds on the following works:
- OpenAI Dactyl (2018): first successful sim-to-real transfer of learned in-hand manipulation
- Isaac Gym (2021): high-performance GPU-accelerated physics simulation
- Isaac Lab (2023): modular robot learning framework