Project information
- Category: Master Thesis Project
- Project date: Apr 2025
- Client: PAL Robotics
- GitHub: Private repository (cannot be shared publicly)
- Full Thesis: GAM
- Main tools: ROS2, BehaviorTree
Abstract
A Foundational ROS2 Grasping Pipeline for Modular, Vision-Based Manipulation
GAM is a grasping pipeline designed to combine modularity, robustness, flexibility, and extensibility within the ROS2 ecosystem. Developed during my Master's thesis at the University of Padua (UniPD) in collaboration with PAL Robotics, the project integrates independent modules for perception, 3D reconstruction, and motion planning, so each component can be replaced or upgraded without disrupting the overall system; this modularity is enabled by orchestrating the pipeline with the BehaviorTree library. The aim was not to build the single most performant grasping solution, but to deliver a foundational, interpretable, and deployable architecture for vision-based manipulation: one that can adapt to evolving technologies and operate in realistic, unstructured environments.
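To give a concrete picture of this orchestration, here is a minimal sketch of the pipeline as a behavior tree. The thesis uses the BehaviorTree (C++) library; the sketch below uses the Python py_trees package (2.x) as an illustrative stand-in, and all the stage names are placeholders, not the project's actual nodes.

```python
# Illustrative sketch only: the thesis uses the (C++) BehaviorTree library;
# py_trees stands in here, and every stage name is a placeholder.
import py_trees

class PipelineStage(py_trees.behaviour.Behaviour):
    """One self-contained module (perception, reconstruction, planning...)."""
    def __init__(self, name, work):
        super().__init__(name)
        self.work = work  # callable returning True on success

    def update(self):
        if self.work():
            return py_trees.common.Status.SUCCESS
        return py_trees.common.Status.FAILURE

# Each leaf is independent: swapping the segmentation model or the grasp
# detector means replacing a single node, not rewriting the pipeline.
root = py_trees.composites.Sequence(name="GraspPipeline", memory=True)
root.add_children([
    PipelineStage("segment_target", lambda: True),
    PipelineStage("reconstruct_object", lambda: True),
    PipelineStage("detect_grasps", lambda: True),
    PipelineStage("plan_and_execute", lambda: True),
])
py_trees.trees.BehaviourTree(root).tick()
```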

Perception & Object Segmentation
The Perception module leverages zero-shot segmentation models — integrating GroundingDINO with the Segment Anything Model (SAM) — to detect objects from natural language prompts such as “blue mug” or “water bottle on the right”. This enables operation on both simulated and real-world data, handling variations in shape, size, and texture without the need for dataset-specific retraining.
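As a rough sketch of how the two models chain together, the snippet below feeds a GroundingDINO box into SAM; the checkpoint paths, thresholds, and prompt are placeholders, not the project's actual configuration.

```python
# Hedged sketch: GroundingDINO proposes a box from a text prompt,
# SAM turns that box into a pixel-accurate mask. Paths are placeholders.
import torch
from torchvision.ops import box_convert
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import sam_model_registry, SamPredictor

dino = load_model("GroundingDINO_config.py", "groundingdino_weights.pth")
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
predictor = SamPredictor(sam)

image_source, image = load_image("scene.png")  # RGB array + model tensor
boxes, logits, phrases = predict(
    model=dino, image=image, caption="blue mug",
    box_threshold=0.35, text_threshold=0.25,
)

# GroundingDINO returns normalized cxcywh boxes; SAM expects absolute xyxy.
h, w, _ = image_source.shape
xyxy = box_convert(boxes * torch.tensor([w, h, w, h]), "cxcywh", "xyxy").numpy()

predictor.set_image(image_source)
masks, scores, _ = predictor.predict(box=xyxy[0], multimask_output=False)
```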

3D Reconstruction
After segmentation, the corresponding point cloud is processed to produce a clean, geometrically accurate 3D reconstruction of the target object. Filtering, noise reduction, and surface refinement ensure the output is optimized for the grasp detection stage, even under challenging conditions such as partial occlusion or sensor inaccuracies.
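A minimal sketch of this cleanup stage with Open3D; the input file and every parameter value below are placeholders, and the thesis's actual filter settings may differ.

```python
import open3d as o3d

# Load the segmented object cloud (placeholder path).
pcd = o3d.io.read_point_cloud("object_segment.ply")

# Downsample to even out point density, then drop statistical outliers
# (sensor speckle, mask bleed from the segmentation step).
pcd = pcd.voxel_down_sample(voxel_size=0.003)
pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

# Estimate consistent surface normals, which grasp detection relies on.
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.01, max_nn=30)
)
```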

Grasp Pose Detection
The Grasp Pose Detection (GPD) module was customized to address practical constraints of tabletop manipulation. Initially, grasps were generated from all directions — including from under the table — leading to infeasible plans. By augmenting the input cloud with a portion of the table surface and tuning the approach vector, grasps are now biased toward top/front approaches, reducing collision risks and improving execution success.
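In plain Python with NumPy, the two tricks look roughly like the sketch below; the grasp representation, helper names, and thresholds are all hypothetical, not GPD's actual interfaces.

```python
import numpy as np

def augment_with_table(cloud, table_z=0.0, half_size=0.15, n_points=2000):
    """Append a synthetic patch of table points around the object so the
    grasp detector treats the surface as an obstacle (hypothetical helper)."""
    xy = np.random.uniform(-half_size, half_size, size=(n_points, 2))
    patch = np.column_stack([xy, np.full(n_points, table_z)])
    return np.vstack([cloud, patch])

def keep_top_front_grasps(grasps, max_tilt_deg=100.0):
    """Keep grasps approaching from the top (0 deg from straight down)
    through roughly the front (~90 deg); reject from-below approaches.
    `grasps` is assumed to be a list of (position, approach) pairs, with
    `approach` a unit vector pointing from the gripper toward the object."""
    down = np.array([0.0, 0.0, -1.0])
    kept = []
    for position, approach in grasps:
        tilt = np.degrees(np.arccos(np.clip(approach @ down, -1.0, 1.0)))
        if tilt <= max_tilt_deg:
            kept.append((position, approach))
    return kept
```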
Motion Planning & Execution
Integration with MoveIt Task Constructor (MTC) enables the pipeline to build transparent, modular grasping tasks. MTC decomposes the process into logical stages (pre-grasp, approach, grasp, retreat), each validated independently, allowing for fallback strategies in case of failure. This structure greatly simplifies debugging and increases execution reliability.
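MTC itself is configured through its C++ (or Python) API; the plain-Python sketch below only illustrates the staging-with-fallback idea, and every name in it is hypothetical rather than part of the MTC API.

```python
# Hypothetical illustration of MTC-style stage decomposition.
STAGES = ("pre_grasp", "approach", "grasp", "retreat")

def plan_grasp_task(grasp_candidates, plan_stage):
    """plan_stage(stage_name, candidate) -> plan or None (placeholder).
    A failure at any stage falls back to the next grasp candidate
    instead of aborting the whole task."""
    for candidate in grasp_candidates:
        plans = []
        for stage in STAGES:
            plan = plan_stage(stage, candidate)
            if plan is None:
                break  # stage invalid for this candidate; try the next one
            plans.append(plan)
        else:
            return plans  # every stage validated independently
    return None  # no candidate yielded a fully valid task
```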

Testing & Evaluation
The pipeline was validated on PAL Robotics' TIAGo robots, first in simulation (Gazebo + RViz) and later in real-world experiments. Test objects ranged from simple cylinders and bottles to irregular shapes such as pears and joysticks. Although the modular design is still at an early stage, GAM demonstrated strong generalization to unseen objects and a reliable workflow from perception to execution.

Demo

Since GAM is a large project, I cannot cover every detail here; check my thesis for the full project documentation.