Shadow Hand Cube Repose#

Overview#

This document describes in detail the Shadow Hand dexterous manipulation cube reorientation task environment. This is a classic benchmark test in the field of robotic manipulation, requiring the robot to reorient a cube in-hand to match a randomly sampled target orientation.

Environment Description#

The Shadow Hand cube reorientation task is built based on the realistic Shadow Hand 24-DOF dexterous hand, designed to train robots to rotate a cube in-hand to a target pose through fine multi-finger coordination.

Robot Structure#

Shadow Hand is a high-fidelity 24-DOF dexterous hand composed of the following main parts:

Palm: The base structure of the hand, fixed to the base
5 Fingers:
- Thumb: 5 degrees of freedom, including CMC rotation, MCP, IP joints
- Index: 4 degrees of freedom, including MCP, PIP, DIP joints
- Middle: 4 degrees of freedom, including MCP, PIP, DIP joints
- Ring: 4 degrees of freedom, including MCP, PIP, DIP joints
- Little: 4 degrees of freedom, including MCP, PIP, DIP joints
Actuator Configuration: 20 actuated joints (4 coupled joints driven by other joints)

Finger Joint Details#

Each finger (except thumb) contains:

MCP Joint: Metacarpophalangeal joint, 2 degrees of freedom (abduction/adduction + flexion)
PIP Joint: Proximal interphalangeal joint, 1 degree of freedom (flexion)
DIP Joint: Distal interphalangeal joint, 1 degree of freedom (flexion)

Thumb contains:

CMC Joint: Carpometacarpal joint, 2 degrees of freedom
MCP Joint: Metacarpophalangeal joint, 1 degree of freedom
IP Joint: Interphalangeal joint, 1 degree of freedom

Environment Objects#

Cube: 50mm * 50mm * 50mm cube
- Mass: approximately 0.028 kg
- Friction coefficient: 1.2
- Initial position: above palm center (0.33, 0.00, 0.295) meters
Target Visualization: Semi-transparent target pose indicator (mocap body)

Task Goals#

The robot needs to complete the following operation goals:

Maintain Grasp: Maintain stable grip of the cube in-hand
Perceive Goal: Observe target pose (visualization indicator)
Fine Manipulation: Rotate cube through multi-finger coordination
Pose Alignment: Rotate cube pose to target pose (tolerance �0.1 radian)

Action Space#

Action space is Box(-1, 1, (20,), float32), representing position control commands applied to 20 actuated joints (normalized).

Control Mode#

The environment uses position control mode, actions are converted to joint target positions through:

Scale from [-1, 1] to actuator control range
Optional: action smoothing (moving average filter)
Clip to joint limits
Apply to simulator actuators

Action Processing Flow#

# 1. Scale to actuator limits
targets = scale(actions, lower_limits, upper_limits)

# 2. Action smoothing (optional)
if act_moving_average < 1.0:
    targets = α * targets + (1-α) * prev_actions

# 3. Clip to limits
targets = clip(targets, lower_limits, upper_limits)

# 4. Apply control
actuator_ctrls = targets

Action Dimension Details#

Index	Finger	Joint	DOF	Description
0-4	Thumb	J0-J4	5	CMC, MCP, IP
5-8	Index	J0-J3	4	MCP, PIP, DIP
9-12	Middle	J0-J3	4	MCP, PIP, DIP
13-16	Ring	J0-J3	4	MCP, PIP, DIP
17-20	Little	J0-J3	4	MCP, PIP, DIP

Observation Space#

Observation space is Box(-inf, inf, (157,), float32), containing robot proprioceptive information, cube state, target state, and fingertip state.

Observation Components#

Observation vector consists of the following parts (in order):

1. Hand Joint State (48 dimensions)#

Joint Position (24 dims): Unscaled raw joint angles
Joint Velocity (24 dims): Joint angular velocities scaled by 0.2

2. Cube State (17 dimensions)#

Position (3 dims): Cube position in world coordinates (x, y, z)
Orientation (4 dims): Quaternion (x, y, z, w)
Linear Velocity (3 dims): Cube linear velocity
Angular Velocity (3 dims): Angular velocity scaled by 0.2
Normalization Factor: Velocity observations multiplied by vel_obs_scale = 0.2

3. Goal State (11 dimensions)#

Goal Position (3 dims): Fixed at (0.33, 0.00, 0.295)
Goal Orientation (4 dims): Randomly sampled target quaternion
Relative Rotation (4 dims): Relative quaternion from cube to goal

4. Fingertip State (65 dimensions)#

State of 5 fingertips, 13 dimensions per fingertip:

Position (3 dims): Fingertip position in Cartesian space
Orientation (4 dims): Fingertip quaternion
Velocity (6 dims): Linear and angular velocities

Fingertip Link Names:

rh_ffdistal: Index fingertip
rh_mfdistal: Middle fingertip
rh_rfdistal: Ring fingertip
rh_lfdistal: Little fingertip
rh_thdistal: Thumb fingertip

5. Action History (20 dimensions)#

Previous action values, for temporal context in the policy

Observation Details#

Index	Observation Content	Dimension	Range	Unit
0-23	Hand joint position (unscaled)	24	Joint limits	rad
24-47	Hand joint velocity (0.2)	24	±π/2	rad/s
48-50	Cube position	3	Real	m
51-54	Cube orientation (quat w,x,y,z)	4	Unit norm	-
55-57	Cube linear velocity	3	Real	m/s
58-60	Cube angular velocity (0.2)	3	Real	rad/s
61-63	Goal position	3	Fixed	m
64-67	Goal orientation (quat w,x,y,z)	4	Unit norm	-
68-71	Relative rotation (quat w,x,y,z)	4	Unit norm	-
72-136	Fingertip state (5*13)	65	-	-
137-156	Previous action	20	[-1, 1]	-

Reward Function#

The reward function uses a composite design with multiple reward and penalty terms.

Main Reward Terms#

Rotation Alignment Reward (core objective)
```
rot_reward = rot_reward_scale / (|rot_dist| + rot_eps)
```
- Scaling factor: 1.0
- Epsilon: 0.1
- Rotation distance calculation: Using quaternion rotation distance formula
- Incentive: Reward grows inversely as cube orientation approaches target
Position Distance Penalty
```
dist_reward = dist_reward_scale � goal_dist
```
- Scaling factor: -10.0
- Distance calculation: Euclidean distance from cube to goal position
- Incentive: Prevent cube from dropping, keep near target position
Action Regularization Penalty
```
action_penalty = action_penalty_scale � ||actions||�
```
- Scaling factor: -0.0002
- Purpose: Encourage smooth, energy-efficient motion

Conditional Rewards#

Success Reward
```
if |rot_dist| d success_tolerance:
    reward += reach_goal_bonus
```
- Reward value: 2.0
- Tolerance: 0.1 radian (approximately 5.7�)
- Purpose: Sparse reward for achieving goal alignment
Drop Penalty
```
if goal_dist e fall_dist:
    reward += fall_penalty
    terminated = True
```
- Penalty value: 0.0 (termination only, no additional penalty)
- Distance threshold: 0.24 meters
- Purpose: Terminate episode when cube is dropped

Initial State#

Hand Initialization#

Position Initialization:

The palm is fixed in the world coordinate system, position determined by the model file.

Joint Angle Initialization:

Use model default joint positions
Add uniform random noise: [-0.2, 0.2] radians
Range: All 24 hand degrees of freedom

Velocity Initialization:

All joint velocities are initialized to zero.

Cube Initialization#

Position Initialization:

Fixed position: (0.33, 0.00, 0.295) above palm center
Add uniform random noise: [-0.01, 0.01] (�1cm)

Orientation Initialization:

Use Shoemake method to generate uniformly distributed random quaternions
Ensure uniform sampling on SO(3) space

Velocity Initialization:

All linear and angular velocities are initialized to zero.

Goal Initialization#

Position Initialization:

Fixed position: (0.33, 0.00, 0.295) (same as cube initial position)

Orientation Initialization:

Use Shoemake method to generate uniformly distributed random target quaternions
Resampled each reset

Termination Conditions#

Episodes terminate under the following conditions:

Drop Termination: Cube distance from goal position e fall_dist (0.24m)
Timeout Termination: Reaching max_episode_steps (default 1000 steps)
NaN Protection: Detecting rotation distance or position distance as NaN

Success Holding Mechanism#

The environment uses a consecutive success counter:

When rotation tolerance is satisfied, counter increments
When max_consecutive_successes (50) is reached, trigger success termination and reset goal
Rotation tolerance: 0.1 radian

Usage#

Training#

uv run scripts/train.py --env shadow-hand-repose

Policy Evaluation#

uv run scripts/play.py --env shadow-hand-repose

Environment Visualization#

uv run scripts/view.py --env shadow-hand-repose

TensorBoard#

uv run tensorboard --logdir runs/shadow-hand-repose

Configuration Parameters#

Environment Parameters#

Parameter	Default	Description
`max_episode_seconds`	10.0	Maximum episode length (s)
`ctrl_dt`	0.01	Control timestep (s)
`max_episode_steps`	1000	Maximum episode steps
`num_hand_dofs`	24	Total hand DOFs
`num_actuators`	20	Number of actuated joints

Reward Parameters#

Parameter	Default	Description
`dist_reward_scale`	-10.0	Position distance reward
`rot_reward_scale`	1.0	Rotation alignment reward
`rot_eps`	0.1	Rotation reward epsilon
`action_penalty_scale`	-0.0002	Action regularization
`success_tolerance`	0.1	Success tolerance (rad)
`reach_goal_bonus`	2.0	Success reward
`fall_dist`	0.24	Drop distance threshold
`fall_penalty`	0.0	Drop penalty

Reset Noise Parameters#

Parameter	Default	Description
`reset_position_noise`	0.01	Cube position noise (m)
`reset_dof_pos_noise`	0.2	Joint position noise (rad)
`reset_dof_vel_noise`	0.0	Joint velocity noise (rad/s)

Observation Scaling Parameters#

Parameter	Default	Description
`vel_obs_scale`	0.2	Velocity observation scale

References#

This environment is based on the following classic works:

OpenAI Dactyl (2018): First successful in-hand manipulation sim-to-real transfer
Isaac Gym (2021): High-performance GPU-accelerated physics simulation
Isaac Lab (2023): Modular robot learning framework