Franka Open Cabinet#

Overview#

This document describes in detail the cabinet opening task environment based on the Franka Emika Panda robotic arm.

Environment Description#

The Franka open cabinet task environment models the real Franka Emika Panda 7-DOF robotic arm. It is designed to train a robot to approach the cabinet's drawer handle, grasp it, and pull the drawer open.

Robot Structure#

Franka Emika Panda is a 7-DOF robotic arm composed of the following main parts:

Base: Robot base fixed to the ground
7 Joints:
- joint1 ~ joint4: Shoulder and arm rotation joints
- joint5 ~ joint7: Wrist rotation joints
Gripper: Two-finger gripper, containing two finger joints
- finger_joint1: Left finger joint, with contact pad (left_finger_pad)
- finger_joint2: Right finger joint, with contact pad (right_finger_pad)
End Effector (TCP): Center point of gripper, used for grasping operations

Environment Objects#

Cabinet: Contains one openable drawer
Drawer Handle (drawer_top_handle): Target part the robot needs to grasp
Drawer Joint (drawer_top_joint): Sliding joint of drawer, 1 DOF

Task Objective#

The robot needs to complete the following operation objectives:

Approach Handle: Move from initial position to drawer handle position
Pose Alignment: Adjust end-effector pose to align with handle
Grasp Handle: Close gripper to grasp drawer handle
Open Drawer: Pull backward to open drawer

Action Space#

The action space is Box(-inf, inf, (8,), float32). The first 7 entries are position control commands for the arm joints (offsets relative to the current joint positions), and the last entry is the gripper action.

Control Mode#

The environment uses position control mode. Actions are converted to joint target positions as follows:

Target Joint Angle = Current Joint Angle + Action Value
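The conversion above can be sketched as follows. This is a minimal illustration, not the environment's actual code: the function name `arm_targets` and the example values are hypothetical, and the sketch assumes the first 7 action entries map to joint1 through joint7 in order, as in the action table.

```python
import numpy as np


def arm_targets(current_q, action):
    """Return target angles for the 7 arm joints (hypothetical helper).

    Target = current joint angle + action offset, as stated in the text.
    """
    current_q = np.asarray(current_q, dtype=np.float32)
    action = np.asarray(action, dtype=np.float32)
    # Entries 0-6 are joint offsets; entry 7 is the gripper action and is
    # handled separately.
    return current_q[:7] + action[:7]


# Example values chosen only for illustration.
q = np.zeros(7, dtype=np.float32)
a = np.array([0.1, -0.2, 0.0, 0.05, 0.0, 0.0, 0.3, 2.0], dtype=np.float32)
targets = arm_targets(q, a)
```

Because the action space is unbounded, a practical controller would probably also limit the targets to the joint range; the text does not say whether the environment does this.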

Action Dimension Details#

Index	Action Description	Control Range	Joint Name	Joint Type
0	Joint 1 Offset	-inf ~ inf	joint1	hinge
1	Joint 2 Offset	-inf ~ inf	joint2	hinge
2	Joint 3 Offset	-inf ~ inf	joint3	hinge
3	Joint 4 Offset	-inf ~ inf	joint4	hinge
4	Joint 5 Offset	-inf ~ inf	joint5	hinge
5	Joint 6 Offset	-inf ~ inf	joint6	hinge
6	Joint 7 Offset	-inf ~ inf	joint7	hinge
7	Gripper Action (Prob)	-inf ~ inf	finger_joint*	slide

Gripper Control#

The gripper action uses probabilistic control:

Sigmoid Mapping: Map action value to probability in [0, 1] interval
```
p = 1 / (1 + exp(-action))
```
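Bernoulli Sampling: Draw a random sample and compare it with the probability p, so a larger action value makes closing more likely
- Sample result < p: Gripper closes (finger target 0.0)
- Sample result >= p: Gripper opens (finger target 0.04)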
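The two steps above can be sketched in a few lines. This is an illustrative sketch only: the function name `gripper_target`, the constants, and the use of a uniform sample in [0, 1) are assumptions, since the text does not show how the environment draws its sample.

```python
import numpy as np

GRIPPER_CLOSED = 0.0  # finger target when closed (value from the text)
GRIPPER_OPEN = 0.04   # finger target when open (value from the text)


def gripper_target(action_value, rng):
    """Map a raw gripper action to a finger target (hypothetical helper)."""
    # Sigmoid mapping: action value -> probability of closing.
    p = 1.0 / (1.0 + np.exp(-action_value))
    # Assumed sampling scheme: uniform sample u in [0, 1), close if u < p.
    u = rng.random()
    return GRIPPER_CLOSED if u < p else GRIPPER_OPEN


rng = np.random.default_rng(0)
samples = [gripper_target(2.0, rng) for _ in range(1000)]
# Fraction of closes should be near sigmoid(2.0), roughly 0.88.
close_fraction = samples.count(GRIPPER_CLOSED) / len(samples)
```

Since the command is sampled, the same action value can produce different gripper commands on different steps; only the probability of closing is controlled by the policy.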

Observation Space#

The observation space is Box(-5, 5, (25,), float32), containing the robot’s proprioceptive information, task-related information, and drawer state.

Observation Components#

The observation vector consists of the following parts (in order):

Joint Angles (8 dimensions)
- Positions of 8 joints (the 7 robot arm joints plus the gripper), normalized to [-1, 1]
- Normalization formula: 2 × (Joint Angle - Lower Bound) / (Upper Bound - Lower Bound) - 1
Joint Velocities (8 dimensions)
- Angular velocities of 8 joints (divided by 2 for scaling)
Target Relative Pose (7 dimensions)
- Position Offset (3 dim): Handle position - End-effector position [Δx, Δy, Δz]
- Orientation Offset (4 dim): Handle orientation - End-effector orientation (quaternion)
Drawer Joint Position (1 dimension)
- Current open distance of drawer
Drawer Joint Velocity (1 dimension)
- Current opening velocity of drawer
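As a small worked example of the normalization formula in the list above, the sketch below maps joint positions from their limits onto [-1, 1]. The function name and the joint limits used here are hypothetical values for illustration, not the Panda's real limits.

```python
import numpy as np


def normalize_joint_pos(q, lower, upper):
    """Apply 2 * (q - lower) / (upper - lower) - 1 element-wise."""
    q, lower, upper = (np.asarray(x, dtype=np.float64) for x in (q, lower, upper))
    return 2.0 * (q - lower) / (upper - lower) - 1.0


# Hypothetical limits: one arm joint in radians and one finger joint.
lower = np.array([-2.0, 0.0])
upper = np.array([2.0, 0.04])

# The midpoint of the range maps to 0 and the upper bound maps to 1.
normalized = normalize_joint_pos([0.0, 0.04], lower, upper)
```

A joint sitting at its lower bound maps to -1, so the whole joint range is covered by the [-1, 1] interval regardless of each joint's physical limits.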

Observation Details#

Index	Observation Content	Dimensions	Range	Unit
0-7	Normalized Joint Angles (8 joints)	8	[-1, 1]	Dimensionless
8-15	Scaled Joint Velocities (8 joints, divided by 2)	8	≈[-π/2, π/2]	rad/s
16-18	Relative Position to Handle	3	[-5, 5]	m
19-22	Relative Orientation to Handle (quaternion)	4	[-5, 5]	Dimensionless
23	Drawer Joint Position	1	[-5, 5]	m
24	Drawer Joint Velocity	1	[-5, 5]	m/s

All observation values are clipped to [-5, 5] range for numerical stability.
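One way to picture the layout in the table is the sketch below, which concatenates the components in index order and then clips. It is an illustration under assumptions: the function name and argument names are hypothetical, and the inputs are taken to be already computed (for example, joint positions already normalized).

```python
import numpy as np


def build_observation(joint_pos_norm, joint_vel, rel_pos, rel_quat,
                      drawer_pos, drawer_vel):
    """Assemble the 25-dimensional observation (hypothetical helper)."""
    obs = np.concatenate([
        np.asarray(joint_pos_norm, dtype=np.float32),          # indices 0-7
        np.asarray(joint_vel, dtype=np.float32) / 2.0,         # indices 8-15
        np.asarray(rel_pos, dtype=np.float32),                 # indices 16-18
        np.asarray(rel_quat, dtype=np.float32),                # indices 19-22
        np.array([drawer_pos, drawer_vel], dtype=np.float32),  # indices 23-24
    ])
    # Clip every entry to [-5, 5], as described in the text.
    return np.clip(obs, -5.0, 5.0).astype(np.float32)


# Illustrative inputs: a large joint velocity shows the effect of clipping.
obs = build_observation(
    joint_pos_norm=np.zeros(8),
    joint_vel=np.full(8, 20.0),
    rel_pos=[0.3, 0.0, 0.1],
    rel_quat=[1.0, 0.0, 0.0, 0.0],
    drawer_pos=0.0,
    drawer_vel=0.0,
)
```

In this example the velocity entries are 20 / 2 = 10 before clipping and 5 afterwards, which matches the bounds of the Box(-5, 5, (25,), float32) space.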

Reward Function#

The reward function uses a composite design with multiple reward and penalty terms.

Main Reward Terms#

Distance Reward (Weight: 10)
- Formula: 10 × (1 - tanh(d_gripper_handle / 0.1))
- Encourages robot end-effector to approach drawer handle
- d_gripper_handle: Euclidean distance from end-effector to handle
Orientation Matching Reward
- Formula: Quaternion similarity function
- Encourages robot end-effector orientation to align with handle orientation
Gripper Close Reward (Conditional Reward)
- When distance < 0.025m: Closing gripper receives +100 reward
- When distance >= 0.025m: Closing gripper receives -20 penalty
- Opening gripper: No reward (0)
- Encourages robot to close gripper to grasp when approaching
Open Drawer Reward (Exponential Reward)
- Formula: 20 × (exp(open_dist) - 1)
- open_dist: Drawer open distance (clipped to [0, 1] range)
- Reward grows exponentially as drawer opens more
Prevent Illegal Opening
- When the drawer is already open (open_dist > 0) but the end-effector is not at the handle (distance > 0.03 m), the open drawer reward is cancelled
- Prevents the robot from forcing the drawer open by other means, such as pushing it with the arm
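The terms with explicit formulas can be sketched as plain functions. This is a hedged sketch: the function names are hypothetical, the orientation term is omitted because the text does not give its formula, and the way the terms are combined into a total reward is not shown because the source does not specify it.

```python
import math


def distance_reward(d_gripper_handle):
    """10 * (1 - tanh(d / 0.1)): largest when the gripper is at the handle."""
    return 10.0 * (1.0 - math.tanh(d_gripper_handle / 0.1))


def gripper_close_reward(d_gripper_handle, gripper_closed):
    """+100 for closing within 0.025 m, -20 for closing elsewhere, 0 if open."""
    if not gripper_closed:
        return 0.0
    return 100.0 if d_gripper_handle < 0.025 else -20.0


def open_drawer_reward(open_dist, d_gripper_handle):
    """20 * (exp(open_dist) - 1), cancelled when the gripper is off the handle."""
    open_dist = min(max(open_dist, 0.0), 1.0)  # clip to [0, 1]
    if open_dist > 0.0 and d_gripper_handle > 0.03:
        return 0.0  # drawer moved without the gripper at the handle
    return 20.0 * (math.exp(open_dist) - 1.0)


# A fully opened drawer with the gripper on the handle earns the maximum
# open reward, 20 * (e - 1), which is about 34.4.
best_open = open_drawer_reward(1.0, 0.01)
```

The 0.1 scale inside tanh means the distance reward is already close to zero about 0.3 m from the handle, so most of its gradient is concentrated near the target.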

Penalty Terms#

Action Rate Penalty
- Formula: ||current_action - last_action||²
Joint Velocity Penalty
- Formula: ||joint_vel||²
Finger Position Penetration Penalty
- Applied when finger contact pads are below handle surface
- Prevents finger model from penetrating drawer

Penalty Coefficient Scheduling#

Penalty coefficients adjust with training progress:

Penalty Term	Early Weight (steps < 8000)	Late Weight (steps >= 8000)
Action Rate	1e-3	2e-3
Joint Velocity Squared	0	2e-7
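The schedule in the table can be expressed as a simple step function. The sketch below is an assumption-laden illustration: the function names are hypothetical, the text does not say which step counter is used, and the sketch assumes the weighted penalties are subtracted from the reward.

```python
import numpy as np

SCHEDULE_STEP = 8000  # switch point from the table


def penalty_weights(step):
    """Return (action_rate_weight, joint_velocity_weight) for a given step."""
    if step < SCHEDULE_STEP:
        return 1e-3, 0.0
    return 2e-3, 2e-7


def penalty(step, action, last_action, joint_vel):
    """Weighted sum of both penalty terms, returned as a negative reward."""
    w_rate, w_vel = penalty_weights(step)
    diff = np.asarray(action, dtype=np.float64) - np.asarray(last_action, dtype=np.float64)
    action_rate = float(np.sum(diff ** 2))  # ||a_t - a_{t-1}||^2
    vel_sq = float(np.sum(np.asarray(joint_vel, dtype=np.float64) ** 2))  # ||joint_vel||^2
    return -(w_rate * action_rate + w_vel * vel_sq)


early = penalty(0, np.ones(8), np.zeros(8), np.ones(8))
late = penalty(8000, np.ones(8), np.zeros(8), np.ones(8))
```

With these inputs both squared norms equal 8, so the early penalty is -0.008 and the late penalty is -0.0160016, showing that smoothness is penalized more strongly later in training.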

Termination Penalty#

When a termination condition is triggered, an additional penalty of -10.0 is applied.

Initial State#

Robot Initialization#

Position Initialization:

The robot’s initial position in world coordinates is fixed:

Base position: Fixed on ground
Joint angles: Set to default pose

Default Joint Pose:

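[0°, -30°, 0°, -156°, 0°, 186°, -45°, 0.04, 0.04] (the 7 arm joint angles are listed in degrees and applied in radians; the last two values are the finger joint positions, 0.04 each, i.e. gripper open)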

Joint Angle Noise:

Each joint angle has uniform random noise added in range [-0.125, 0.125] radians.
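The reset procedure for the joints can be sketched as below. It assumes the 7 arm values in the default pose are degrees that are converted to radians, and that noise is applied only to the 7 arm joints while the finger positions stay at 0.04; the text does not state whether the fingers also receive noise. The function and constant names are hypothetical.

```python
import numpy as np

DEFAULT_ARM_DEG = [0.0, -30.0, 0.0, -156.0, 0.0, 186.0, -45.0]  # from the text
DEFAULT_FINGERS = [0.04, 0.04]                                   # from the text
NOISE_RANGE = 0.125  # radians, from the text


def sample_initial_pose(rng):
    """Return 9 joint positions: 7 noisy arm angles (rad) and 2 finger positions."""
    arm = np.deg2rad(DEFAULT_ARM_DEG)
    # Uniform noise in [-0.125, 0.125] rad on each arm joint (assumed scope).
    arm = arm + rng.uniform(-NOISE_RANGE, NOISE_RANGE, size=arm.shape)
    return np.concatenate([arm, DEFAULT_FINGERS])


rng = np.random.default_rng(0)
pose = sample_initial_pose(rng)
```

Passing the random generator in explicitly keeps resets reproducible, which is convenient when comparing training runs.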

Velocity Initialization:

All linear and angular velocities are initialized to zero.

Cabinet Initialization#

The cabinet is fixed to the ground with the drawer closed (drawer joint position 0).

Usage#

Training#

uv run scripts/train.py --env franka-open-cabinet

Policy Evaluation#

uv run scripts/play.py --env franka-open-cabinet

TensorBoard#

uv run tensorboard --logdir runs/franka-open-cabinet