Examples of grasping attempts in the RViz simulation.

Project information

Abstract

A Foundational ROS2 Grasping Pipeline for Modular, Vision-Based Manipulation

GAM is a modular grasping pipeline designed to combine robustness, flexibility, and extensibility within the ROS2 ecosystem. Developed during my Master's thesis at the University of Padova (UniPD) in collaboration with PAL Robotics, the project integrates independent modules for Perception, 3D Reconstruction, and Motion Planning; because these modules are orchestrated through the BehaviorTree library, each component can be replaced or upgraded without disrupting the overall system. The aim was not to build the single most performant grasping solution, but to deliver a foundational, interpretable, and deployable architecture for vision-based manipulation, one that can adapt to evolving technologies and operate in realistic, unstructured environments.

High-level structure of the Behavior Tree orchestrating the pipeline.
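To illustrate how the tree keeps the modules decoupled, here is a minimal sketch using py_trees, a Python behavior-tree library. The actual project is built on BehaviorTree.CPP, so the structure and node names below are illustrative placeholders, not the real tree:

```python
import py_trees


class Stub(py_trees.behaviour.Behaviour):
    """Placeholder leaf standing in for a full pipeline module."""

    def update(self) -> py_trees.common.Status:
        print(f"[{self.name}] running")
        return py_trees.common.Status.SUCCESS


def build_tree() -> py_trees.behaviour.Behaviour:
    # Each child is an independent module: swapping one out (e.g. a new
    # segmentation backend) leaves the rest of the tree untouched.
    root = py_trees.composites.Sequence(name="GraspPipeline", memory=True)
    root.add_children([
        Stub(name="Perception"),          # GroundingDINO + SAM
        Stub(name="Reconstruction"),      # point-cloud processing
        Stub(name="GraspPoseDetection"),  # GPD
        Stub(name="MotionPlanning"),      # MoveIt Task Constructor
    ])
    return root


if __name__ == "__main__":
    build_tree().tick_once()
```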

Perception & Object Segmentation

The Perception module leverages zero-shot segmentation models — integrating GroundingDINO with the Segment Anything Model (SAM) — to detect objects from natural language prompts such as “blue mug” or “water bottle on the right”. This enables operation on both simulated and real-world data, handling variations in shape, size, and texture without the need for dataset-specific retraining.
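A sketch of this step, assuming the public GroundingDINO and segment-anything APIs; the checkpoints, paths, and thresholds below are placeholders, not the project's actual configuration:

```python
import numpy as np
import torch
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import sam_model_registry, SamPredictor

dino = load_model("groundingdino_config.py", "groundingdino_weights.pth")
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
predictor = SamPredictor(sam)

image_source, image = load_image("scene.jpg")  # (np.ndarray, torch.Tensor)

# 1) GroundingDINO: language prompt -> candidate boxes (cxcywh, normalized)
boxes, logits, phrases = predict(
    model=dino, image=image, caption="blue mug",
    box_threshold=0.35, text_threshold=0.25, device="cpu",
)

# 2) Convert the highest-scoring box to absolute xyxy pixels for SAM
#    (assumes at least one detection was returned)
h, w = image_source.shape[:2]
box = boxes[int(logits.argmax())] * torch.tensor([w, h, w, h])
cx, cy, bw, bh = box.tolist()
box_xyxy = np.array([cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2])

# 3) SAM: box prompt -> binary object mask
predictor.set_image(image_source)
masks, scores, _ = predictor.predict(box=box_xyxy, multimask_output=False)
mask = masks[0]  # (H, W) boolean mask of the prompted object
```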

Perception Behavior Tree node logic.

3D Reconstruction

After segmentation, the point cloud is processed to generate a clean, geometrically accurate 3D reconstruction of the target object. Filtering, noise reduction, and surface refinement ensure the output is optimized for the grasp detection stage, even under challenging conditions like partial occlusion or sensor inaccuracies.
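A minimal cleanup pass along these lines can be sketched with Open3D; the library choice and parameter values are illustrative assumptions, not the thesis' exact settings:

```python
import open3d as o3d

pcd = o3d.io.read_point_cloud("segmented_object.ply")

# Downsample to a uniform density (the "granularity" in the figures below)
pcd = pcd.voxel_down_sample(voxel_size=0.05)

# Drop statistical outliers produced by sensor noise and depth artifacts
pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

# Estimate surface normals, needed later by the grasp detection stage
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.05, max_nn=30)
)

o3d.io.write_point_cloud("reconstructed_object.ply", pcd)
```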

Example of a 3D reconstruction of a mug. Left: granularity = 0.05; right: granularity = 0.1.

Grasping Pose Detection

The Grasp Pose Detection (GPD) module was customized to address practical constraints of tabletop manipulation. Initially, grasps were generated from all directions — including from under the table — leading to infeasible plans. By augmenting the input cloud with a portion of the table surface and tuning the approach vector, grasps are now biased toward top/front approaches, reducing collision risks and improving execution success.
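The two customizations can be sketched in plain NumPy: (1) augment the object cloud with a synthetic table patch so grasps are no longer sampled through the table, and (2) reject candidates whose approach vector comes from below. The helper names are hypothetical; the real pipeline applies the equivalent logic through GPD's C++ interface:

```python
import numpy as np


def add_table_patch(cloud: np.ndarray, table_z: float,
                    half_size: float = 0.15, step: float = 0.01) -> np.ndarray:
    """Append a flat grid of table points around the object's footprint."""
    cx, cy = cloud[:, 0].mean(), cloud[:, 1].mean()
    xs = np.arange(cx - half_size, cx + half_size, step)
    ys = np.arange(cy - half_size, cy + half_size, step)
    gx, gy = np.meshgrid(xs, ys)
    patch = np.column_stack([gx.ravel(), gy.ravel(),
                             np.full(gx.size, table_z)])
    return np.vstack([cloud, patch])


def approach_is_feasible(approach: np.ndarray, tol: float = 0.1) -> bool:
    """Reject grasps whose approach vector moves upward (+z), i.e. that
    would come from under the table; keep top and front approaches."""
    return approach[2] <= tol
```

Filtering on the approach vector is a cheap geometric test applied before any motion planning, so infeasible candidates are discarded early.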

Example of a Grasp Pose Detection output. Left: n = 5; middle: n = 15; right: n = 30, where n is the number of grasp poses to detect.

Motion Planning & Execution

Integration with MoveIt Task Constructor (MTC) enables the pipeline to build transparent, modular grasping tasks. MTC decomposes the process into logical stages (pre-grasp, approach, grasp, retreat), each validated independently, allowing for fallback strategies in case of failure. This structure greatly simplifies debugging and increases execution reliability.
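A plain-Python sketch of this staged structure: it mirrors the pre-grasp / approach / grasp / retreat decomposition and the per-candidate fallback, but does not use the MoveIt Task Constructor API itself (all stage bodies are placeholder stubs):

```python
from typing import Callable, Dict, List, Optional

Stage = Callable[[Dict], bool]  # True iff the stage plans successfully


def plan_grasp(candidates: List[Dict], stages: List[Stage]) -> Optional[Dict]:
    """Return the first grasp candidate for which every stage validates."""
    for grasp in candidates:
        # Each stage is checked independently; the first failure moves the
        # planner on to the next grasp candidate (the fallback strategy).
        if all(stage(grasp) for stage in stages):
            return grasp
    return None  # no feasible candidate; report failure upstream


# Placeholder stages: in the real task each one invokes a motion planner.
def pre_grasp(g: Dict) -> bool: return True   # reach the pre-grasp pose
def approach(g: Dict) -> bool: return True    # Cartesian approach
def grasp(g: Dict) -> bool: return True       # close gripper, attach object
def retreat(g: Dict) -> bool: return True     # lift and retreat

best = plan_grasp([{"id": 0}], [pre_grasp, approach, grasp, retreat])
```

Because each stage either validates or fails on its own, a failed plan pinpoints exactly which step broke, which is what makes the MTC decomposition easy to debug.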

TIAGo attempting to grasp a cylinder in simulation (shown in RViz).

Testing & Evaluation

The pipeline was validated on PAL Robotics' TIAGo robots, first in simulation (Gazebo + RViz) and later in real-world experiments. Test objects ranged from simple cylinders and bottles to irregular shapes such as pears and joysticks. Despite the modular design being in its early stages, GAM demonstrated strong generalization to unseen objects and a reliable workflow from perception to execution.

Quantitative results. Left: simulation; right: real-world experiments.

Demo

GIF showing the GAM pipeline in action in simulation. The robot detects a drill, a pear, a mug, and a shampoo bottle.

GAM is a large project and this page covers only the highlights; see my thesis for the full project documentation.