Franka Open Cabinet#
Overview#
This document describes in detail the cabinet opening task environment based on the Franka Emika Panda robotic arm.
Environment Description#
The Franka open cabinet task environment models the real Franka Emika Panda 7-DOF robotic arm. It is designed to train a policy to approach the cabinet drawer handle, grasp it, and pull the drawer open.
Robot Structure#
Franka Emika Panda is a 7-DOF robotic arm composed of the following main parts:
Base: Robot base fixed to the ground
7 Joints:
joint1 ~ joint4: Shoulder and arm rotation joints
joint5 ~ joint7: Wrist rotation joints
Gripper: Two-finger gripper, containing two finger joints
finger_joint1: Left finger joint, with contact pad (left_finger_pad)
finger_joint2: Right finger joint, with contact pad (right_finger_pad)
End Effector (TCP): Center point of gripper, used for grasping operations
Environment Objects#
Cabinet: Contains one openable drawer
Drawer Handle (drawer_top_handle): Target part the robot needs to grasp
Drawer Joint (drawer_top_joint): Sliding joint of drawer, 1 DOF
Task Objective#
The robot needs to complete the following operation objectives:
Approach Handle: Move from initial position to drawer handle position
Pose Alignment: Adjust end-effector pose to align with handle
Grasp Handle: Close gripper to grasp drawer handle
Open Drawer: Pull backward to open drawer
Action Space#
The action space is Box(-inf, inf, (8,), float32). The first 7 entries are position control commands for the arm joints (offsets relative to the current joint positions), and the last entry is the gripper command.
Control Mode#
The environment uses position control mode. Actions are converted to joint target positions as follows:
Target Joint Angle = Current Joint Angle + Action Value
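The conversion above can be sketched as follows. This is a minimal illustration, not the environment's actual implementation: the function name `joint_targets` is hypothetical, and it assumes the first seven action entries are the arm offsets (as the action table lists) and that no extra clipping is applied.

```python
import numpy as np

def joint_targets(current_q: np.ndarray, action: np.ndarray) -> np.ndarray:
    """Target joint angle = current joint angle + action value (arm joints only)."""
    arm_offsets = np.asarray(action, dtype=np.float32)[:7]  # entry 7 is the gripper command
    return np.asarray(current_q, dtype=np.float32)[:7] + arm_offsets

# Example: arm at zero, small offsets on joint1, joint2 and joint6.
current_q = np.zeros(7, dtype=np.float32)
action = np.array([0.1, -0.2, 0.0, 0.0, 0.0, 0.05, 0.0, 1.5], dtype=np.float32)
targets = joint_targets(current_q, action)
```

Because actions are offsets rather than absolute angles, a zero action holds the current arm pose.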
Action Dimension Details#
| Index | Action Description | Control Range | Joint Name | Joint Type |
|---|---|---|---|---|
| 0 | Joint 1 Offset | -inf ~ inf | joint1 | hinge |
| 1 | Joint 2 Offset | -inf ~ inf | joint2 | hinge |
| 2 | Joint 3 Offset | -inf ~ inf | joint3 | hinge |
| 3 | Joint 4 Offset | -inf ~ inf | joint4 | hinge |
| 4 | Joint 5 Offset | -inf ~ inf | joint5 | hinge |
| 5 | Joint 6 Offset | -inf ~ inf | joint6 | hinge |
| 6 | Joint 7 Offset | -inf ~ inf | joint7 | hinge |
| 7 | Gripper Action (Prob) | -inf ~ inf | finger_joint* | hinge |
Gripper Control#
The gripper action uses probabilistic control:
Sigmoid Mapping: Map action value to probability in [0, 1] interval
p = 1 / (1 + exp(-action))
Bernoulli Sampling: Random sampling based on probability p
Sample result < p: Gripper closes (0.0)
Sample result >= p: Gripper opens (0.04)
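The two steps above can be sketched as below. This is an illustrative reading of the description, not the environment's code: the names `sample_gripper_target`, `GRIPPER_CLOSED`, and `GRIPPER_OPEN` are hypothetical, and the uniform sample in [0, 1) is an assumption about how the Bernoulli draw is made.

```python
import numpy as np

GRIPPER_CLOSED = 0.0   # finger target when the gripper closes
GRIPPER_OPEN = 0.04    # finger target when the gripper opens

def sample_gripper_target(gripper_action: float, rng: np.random.Generator) -> float:
    """Map the raw gripper action to a close probability, then sample a finger target."""
    p = 1.0 / (1.0 + np.exp(-gripper_action))  # sigmoid mapping to [0, 1]
    sample = rng.random()                       # uniform sample in [0, 1)
    return GRIPPER_CLOSED if sample < p else GRIPPER_OPEN

# Example: a positive action closes the gripper most of the time.
rng = np.random.default_rng(0)
targets = [sample_gripper_target(2.0, rng) for _ in range(1000)]
close_fraction = targets.count(GRIPPER_CLOSED) / len(targets)
```

Under this reading, larger action values make closing more likely, and an action of 0 closes or opens the gripper with equal probability.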
Observation Space#
The observation space is Box(-5, 5, (25,), float32), containing the robot’s proprioceptive information, task-related information, and drawer state.
Observation Components#
The observation vector consists of the following parts (in order):
Joint Angles (8 dimensions)
Angles of 8 joints (normalized to [-1, 1])
Normalization formula:
2 × (Joint Angle - Lower Bound) / (Upper Bound - Lower Bound) - 1
Joint Velocities (8 dimensions)
Angular velocities of 8 joints (divided by 2 for scaling)
Target Relative Pose (7 dimensions)
Position Offset (3 dim): Handle position - End-effector position [Δx, Δy, Δz]
Orientation Offset (4 dim): Handle orientation - End-effector orientation (quaternion)
Drawer Joint Position (1 dimension)
Current open distance of drawer
Drawer Joint Velocity (1 dimension)
Current opening velocity of drawer
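The components listed above can be assembled roughly as follows. This is a sketch under stated assumptions: `build_observation` and all of its argument names are hypothetical, the orientation offset is read literally as an element-wise quaternion difference, and the joint limits in the example are placeholder values rather than the Panda's real limits.

```python
import numpy as np

def build_observation(q, q_lower, q_upper, qd,
                      handle_pos, ee_pos, handle_quat, ee_quat,
                      drawer_pos, drawer_vel):
    """Concatenate the 25 observation entries in the documented order."""
    q_norm = 2.0 * (q - q_lower) / (q_upper - q_lower) - 1.0  # 8 values in [-1, 1]
    qd_scaled = qd / 2.0                                       # 8 values, velocities halved
    pos_offset = handle_pos - ee_pos                           # 3 values
    quat_offset = handle_quat - ee_quat                        # 4 values (literal reading)
    obs = np.concatenate(
        [q_norm, qd_scaled, pos_offset, quat_offset, [drawer_pos], [drawer_vel]]
    )
    return np.clip(obs, -5.0, 5.0).astype(np.float32)          # documented clipping range

# Example with placeholder limits of [-1, 1] for all 8 joints.
obs = build_observation(
    q=np.zeros(8), q_lower=-np.ones(8), q_upper=np.ones(8),
    qd=np.full(8, 20.0),
    handle_pos=np.array([0.6, 0.0, 0.4]), ee_pos=np.array([0.1, 0.0, 0.6]),
    handle_quat=np.array([1.0, 0.0, 0.0, 0.0]), ee_quat=np.array([1.0, 0.0, 0.0, 0.0]),
    drawer_pos=0.0, drawer_vel=0.0,
)
```

In this example the joint velocities of 20 rad/s scale to 10 and are then clipped to 5, which shows why the clipping step matters for fast motions.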
Observation Details#
| Index | Observation Content | Dimensions | Range | Unit |
|---|---|---|---|---|
| 0-7 | Normalized Joint Angles (8 joints) | 8 | [-1, 1] | Dimensionless |
| 8-15 | Normalized Joint Velocities (8 joints) | 8 | ≈[-π/2, π/2] | rad/s |
| 16-18 | Relative Position to Handle | 3 | [-5, 5] | m |
| 19-22 | Relative Orientation to Handle (quaternion) | 4 | [-5, 5] | Dimensionless |
| 23 | Drawer Joint Position | 1 | [-5, 5] | m |
| 24 | Drawer Joint Velocity | 1 | [-5, 5] | m/s |
All observation values are clipped to [-5, 5] range for numerical stability.
Reward Function#
The reward function uses a composite design with multiple reward and penalty terms.
Main Reward Terms#
Distance Reward (Weight: 10)
Formula:
10 × (1 - tanh(d_gripper_handle / 0.1))
Encourages robot end-effector to approach drawer handle
d_gripper_handle: Euclidean distance from end-effector to handle
Orientation Matching Reward
Formula: Quaternion similarity function
Encourages robot end-effector orientation to align with handle orientation
Gripper Close Reward (Conditional Reward)
When distance < 0.025m: Closing gripper receives +100 reward
When distance >= 0.025m: Closing gripper receives -20 penalty
Opening gripper: No reward (0)
Encourages robot to close gripper to grasp when approaching
Open Drawer Reward (Exponential Reward)
Formula:
20 × (exp(open_dist) - 1)
open_dist: Drawer open distance (clipped to [0, 1] range)
Reward grows exponentially as drawer opens more
Prevent Illegal Opening
When the drawer is already open (open_dist > 0) but the end-effector is not at the handle (distance > 0.03 m), the open drawer reward is cancelled
Prevents robot from using other methods to force open drawer
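The distance, gripper close, and open drawer terms above can be written out as a short sketch. The function names are hypothetical, and it assumes the same end-effector-to-handle distance `d` (in meters) is used for all three terms. The orientation matching term is left out because its similarity function is not specified here.

```python
import math

def distance_reward(d: float) -> float:
    """10 x (1 - tanh(d / 0.1)): largest when the end-effector is at the handle."""
    return 10.0 * (1.0 - math.tanh(d / 0.1))

def gripper_close_reward(d: float, gripper_closed: bool) -> float:
    """+100 for closing within 0.025 m of the handle, -20 for closing farther away."""
    if not gripper_closed:
        return 0.0  # opening the gripper earns nothing
    return 100.0 if d < 0.025 else -20.0

def open_drawer_reward(open_dist: float, d: float) -> float:
    """20 x (exp(open_dist) - 1), cancelled when the end-effector is away from the handle."""
    open_dist = min(max(open_dist, 0.0), 1.0)  # clip to [0, 1]
    if open_dist > 0.0 and d > 0.03:
        return 0.0  # drawer opened without the gripper at the handle
    return 20.0 * (math.exp(open_dist) - 1.0)

# Example: gripper closed 1 cm from the handle, drawer pulled out 0.2 m.
total = distance_reward(0.01) + gripper_close_reward(0.01, True) + open_drawer_reward(0.2, 0.01)
```

The 0.1 divisor in the distance term sets the length scale: at 0.1 m from the handle the term has already dropped to roughly a quarter of its maximum of 10.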
Penalty Terms#
Action Rate Penalty
Formula:
||current_action - last_action||²
Joint Velocity Penalty
Formula:
||joint_vel||²
Finger Position Penetration Penalty
Applied when finger contact pads are below handle surface
Prevents finger model from penetrating drawer
Penalty Coefficient Scheduling#
Penalty coefficients adjust with training progress:
| Penalty Term | Early Weight (steps < 8000) | Late Weight (steps >= 8000) |
|---|---|---|
| Action Rate | 1e-3 | 2e-3 |
| Joint Velocity Squared | 0 | 2e-7 |
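The schedule in the table can be sketched as a simple step-based switch. The names `penalty_weights` and `scheduled_penalty` are hypothetical, and the sketch assumes the weighted terms are summed into a single penalty that the caller subtracts from the reward. The finger penetration penalty is omitted because its weight is not given.

```python
import numpy as np

def penalty_weights(step: int) -> tuple[float, float]:
    """Return (action-rate weight, joint-velocity weight) for the given training step."""
    if step < 8000:
        return 1e-3, 0.0
    return 2e-3, 2e-7

def scheduled_penalty(step, action, last_action, joint_vel) -> float:
    """Weighted sum of ||action - last_action||^2 and ||joint_vel||^2."""
    w_rate, w_vel = penalty_weights(step)
    action_rate = float(np.sum((action - last_action) ** 2))
    vel_sq = float(np.sum(joint_vel ** 2))
    return w_rate * action_rate + w_vel * vel_sq

# Example: the same transition is penalized more heavily late in training.
action, last_action, joint_vel = np.ones(8), np.zeros(8), np.ones(8)
early = scheduled_penalty(0, action, last_action, joint_vel)
late = scheduled_penalty(8000, action, last_action, joint_vel)
```

Keeping the joint velocity weight at zero early on leaves the policy free to explore fast motions before smoothness is enforced.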
Termination Penalty#
When a termination condition is triggered, an additional penalty of -10.0 is applied.
Initial State#
Robot Initialization#
Position Initialization:
The robot’s initial position in world coordinates is fixed:
Base position: Fixed on ground
Joint angles: Set to default pose
Default Joint Pose:
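[0°, -30°, 0°, -156°, 0°, 186°, -45°, 0.04, 0.04] (the seven arm joint angles are shown in degrees and stored in radians; the last two entries are the finger joint positions)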
Joint Angle Noise:
Each joint angle has uniform random noise added in range [-0.125, 0.125] radians.
Velocity Initialization:
All linear and angular velocities are initialized to zero.
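The robot initialization described above can be sketched as follows. This is an illustration only: `sample_initial_pose` and the constant names are hypothetical, and it assumes the noise is applied to the seven arm joints while the two finger joints stay at 0.04.

```python
import numpy as np

DEFAULT_ARM_DEG = np.array([0.0, -30.0, 0.0, -156.0, 0.0, 186.0, -45.0])  # arm joints, degrees
DEFAULT_FINGERS = np.array([0.04, 0.04])                                   # finger joint positions
NOISE_RANGE = 0.125                                                        # radians

def sample_initial_pose(rng: np.random.Generator):
    """Default pose plus uniform noise on the arm joints; all velocities start at zero."""
    noise = rng.uniform(-NOISE_RANGE, NOISE_RANGE, size=7)
    arm = np.deg2rad(DEFAULT_ARM_DEG) + noise
    qpos = np.concatenate([arm, DEFAULT_FINGERS])
    qvel = np.zeros_like(qpos)
    return qpos, qvel

# Example: draw one randomized starting configuration.
qpos, qvel = sample_initial_pose(np.random.default_rng(0))
```

The noise range of 0.125 rad is about 7 degrees, so each episode starts close to the default pose but not at exactly the same configuration.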
Cabinet Initialization#
Cabinet is fixed on ground with drawer in closed state (joint position at 0).
Usage#
Training#
uv run scripts/train.py --env franka-open-cabinet
Policy Evaluation#
uv run scripts/play.py --env franka-open-cabinet
TensorBoard#
uv run tensorboard --logdir runs/franka-open-cabinet