Franka Lift Cube#

Overview#

This document describes in detail the cube grasping task environment based on the Franka Emika Panda robotic arm.

Environment Description#

The Franka lift cube task environment is built based on the real Franka Emika Panda 7-DOF robotic arm, designed to train robots to grasp a cube on a table and lift it to a specified target position.

Robot Structure#

Franka Emika Panda is a 7-DOF robotic arm composed of the following main parts:

Base: Robot base fixed to the table
7 Joints:
- joint1 ~ joint4: Shoulder and arm rotation joints
- joint5 ~ joint7: Wrist rotation joints
Gripper: Two-finger gripper, containing two finger joints
- finger_joint1: Left finger joint
- finger_joint2: Right finger joint
End Effector (TCP): Center point of gripper, used for grasping operations

Task Objective#

The robot needs to complete the following operation objectives:

Approach Target: Move from initial position to cube position
Grasp Cube: Close gripper to grasp cube
Lift Cube: Lift cube to target height
Precise Positioning: Move cube to specified target position (XYZ 3D coordinates)

The environment provides visualization aids:

Cube: Red cube that can be grasped, initially at random position on table
Target Position: 3D position where the cube should finally reach

Action Space#

The action space is Box(-inf, inf, (8,), float32), representing position control commands applied to 8 joints (offsets relative to current joint positions).

Control Mode#

The environment uses position control mode. Actions are converted to joint target positions as follows:

Target Joint Angle = Current Joint Angle + Action Value

Action Dimension Details#

Index	Action Description	Control Range	Joint Name	Joint Type
0	Joint 1 Offset	-inf ~ inf	joint1	revolve
1	Joint 2 Offset	-inf ~ inf	joint2	hinge
2	Joint 3 Offset	-inf ~ inf	joint3	hinge
3	Joint 4 Offset	-inf ~ inf	joint4	hinge
4	Joint 5 Offset	-inf ~ inf	joint5	hinge
5	Joint 6 Offset	-inf ~ inf	joint6	hinge
6	Joint 7 Offset	-inf ~ inf	joint7	hinge
7	Gripper Action (Prob)	-inf ~ inf	finger_joint*	hinge

Gripper Control#

The gripper action uses probabilistic control:

Sigmoid Mapping: Map action value to probability in [0, 1] interval

p = 1 / (1 + exp(-action))

Bernoulli Sampling: Random sampling based on probability p

Sample result < p: Gripper closes (0.0)
Sample result >= p: Gripper opens (0.04)

Joint Position Limits#

All joint positions are clamped to the following ranges after execution:

Joint	Min	Max
1	-2.8973	2.8973
2	-1.7628	1.7628
3	-2.8973	2.8973
4	-3.0718	-0.0698
5	-2.8973	2.8973
6	-0.0175	3.7525
7	-π/2	π/2
Gripper	0	0.04

Observation Space#

The observation space is Box(-inf, inf, (36,), float32), containing the robot’s proprioceptive information, object state, and action history.

Observation Components#

The observation vector consists of the following parts (in order):

Joint Angles (9 dimensions)

7 robot arm joint angle offsets relative to default pose
2 gripper joint angles

Joint Velocities (9 dimensions)

Angular velocities of 9 joints

Cube Current Pose (9 dimensions)

Position (3 dim): [x, y, z]
Quaternion (4 dim): [qx, qy, qz, qw]
Rotation (Euler, 2 dim): [roll, pitch]

Target Position Command (7 dimensions)

Target XYZ coordinates (3 dim)
Target quaternion (4 dim)

Previous Action (8 dimensions)

Observation Details#

Index	Observation Content	Dimensions	Unit
0-8	Joint Angle Offsets (9 joints)	9	rad
9-17	Joint Angular Velocities (9 joints)	9	rad/s
18-26	Cube Current Pose (position + orientation)	9	rad
27-33	Target Position Command (position + quaternion)	7	Dimensionless
34-41	Previous Action (8 dimensions)	8	Dimensionless

Reward Function#

The reward function uses a composite design with multiple reward and penalty terms.

Main Reward Terms#

Approach Reward (Weight: 1.5)

Formula: 1.5 × (1 - tanh(d_hand_cube / 0.1))
Encourages robot end-effector to approach cube
d_hand_cube: Euclidean distance from end-effector to cube

Lifting Reward (Weight: 30)

Condition: Cube height > 0.04m AND end-effector to cube distance < 0.05m
Encourages robot to grasp and lift cube

Target Tracking Reward (Variable Weight)

Coarse Tracking (Weight: 10): Uses Sigmoid function, center distance 0.3m
Fine Tracking (Weight: 20): Uses tanh function, scale factor 0.4m
Approach Reward (Weight: 10): Used when distance < 0.2m, scale factor 0.05m
Approach Bonus (Weight: 200): Extra reward, encourages approaching target
All tracking rewards only active when cube height > 0.04m and grasp successful

Penalty Terms#

Penalty coefficients adjust with training progress:

Penalty Term	Early Weight (steps < 10000)	Late Weight (steps >= 10000)
Action Rate Penalty	1e-4	1e-1
Joint Velocity Squared Sum Penalty	1e-4	1e-1

Calculation Formulas#

Action Rate = ||current_action - last_action||²
Joint Velocity Squared Sum = ||joint_vel||²

Initial State#

Robot Initialization#

Position Initialization:

The robot’s initial position in world coordinates is fixed:

Base position: Fixed on table
Joint angles: Set to default pose with random noise added

Joint Angle Noise:

Each joint angle has uniform random noise added in range [-0.125, 0.125] radians.

Velocity Initialization:

All linear and angular velocities are initialized to zero.

Cube Initialization#

Cube position on table is randomly sampled:

X coordinate: [-0.1, 0.1]
Y coordinate: [-0.25, 0.25]
Z coordinate: Fixed at 0.05 (above table)

Target Position Generation#

Target position is randomly sampled in the following range:

X coordinate: [0.4, 0.6]
Y coordinate: [-0.25, 0.25]
Z coordinate: [0.25, 0.5]

Usage#

Training#

uv run scripts/train.py --env franka-lift-cube

Policy Evaluation#

uv run scripts/play.py --env franka-lift-cube

TensorBoard#

uv run tensorboard --logdir runs/franka-lift-cube