
Multimodal Robotics Foundation Models for Safety-Critical Physical AI Applications

PhD - Brussels

Turning multimodal data into safer robotic intelligence.

Accelerating training and deployment of robotic AI

As the generative and reasoning capabilities of artificial intelligence continue to evolve, the next anticipated breakthrough lies in enabling intelligent agents to interact physically with the real world. “Physical AI and robotics will bring about the next industrial revolution,” stated Jensen Huang, founder and CEO of NVIDIA. (https://nvidianews.nvidia.com/news/nvidia-powers-humanoid-robot-industry-with-cloud-to-robot-computing-platforms-for-physical-ai)


In line with this vision, one of the research activities at imec investigates how to overcome one of the main limitations of robotic AI: the scarcity of training data. Insufficient training data often leads to unreliable behavior or hallucinations in AI models, which in robotic systems, and especially in safety-critical scenarios, can cause accidents, physical harm, or unexpected system failures. Most current models are trained on video-only data; only a minority also include tactile data.

To address this challenge, we aim to enrich training datasets through multimodal data integration and to explore its effect on the performance and robustness of AI-based robotic agents. A promising approach is the use of multimodal occupancy map data, which could improve training efficiency and reduce resource demands, enabling faster deployment of robotic agents in real-world industrial environments. An essential step in this direction is the generation of synthetic multimodal datasets, which allow models to be trained and tested even before physical systems are deployed, thus shortening the timeline from design to implementation in the target workspace.
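To make the idea concrete, here is a minimal sketch of what a multimodal occupancy map could look like as a data structure. Everything in it (the modality list, grid size, the log-odds update, and the naive averaging fusion) is an illustrative assumption on our part, not the project's design:

```python
import numpy as np

# Toy multimodal occupancy grid: one occupancy channel per sensing
# modality (this modality list is an assumption for illustration).
MODALITIES = ["rgbd", "radar", "tactile"]

class MultimodalOccupancyGrid:
    def __init__(self, shape=(64, 64, 32), resolution=0.05):
        self.resolution = resolution  # metres per voxel
        # One occupancy-probability volume per modality, initialised
        # to 0.5 (maximum uncertainty).
        self.grids = {m: np.full(shape, 0.5, dtype=np.float32) for m in MODALITIES}

    def update(self, modality, voxel_idx, p_occupied):
        """Fuse one observation into a modality's grid with a simple
        log-odds update (a common occupancy-mapping scheme)."""
        g = self.grids[modality]
        prior = np.log(g[voxel_idx] / (1.0 - g[voxel_idx]))
        obs = np.log(p_occupied / (1.0 - p_occupied))
        g[voxel_idx] = 1.0 / (1.0 + np.exp(-(prior + obs)))

    def fused(self):
        """Naive late fusion: average the per-modality probabilities.
        A learned fusion model would replace this in practice."""
        return np.mean(np.stack(list(self.grids.values())), axis=0)

grid = MultimodalOccupancyGrid()
grid.update("radar", (10, 12, 4), p_occupied=0.9)
print(grid.fused()[10, 12, 4])  # > 0.5: radar evidence raises occupancy
```

The sketch only fixes a plausible data layout; the learned fusion and representation choices are exactly what the PhD would investigate.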

To achieve this goal, multimodal occupancy map training datasets will be created using both physical sensor setups and their digital twin complements. Synthetic data will be generated through robotic digital twins, enabling extensive scenario exploration. We target resource-efficient training workflows as well as continuous learning strategies that adapt models during deployment. Special attention will be given to safety-critical event handling, where existing vision-based methods, such as those relying on RGB-D data, often fall short due to occlusions, lighting conditions, or dead zones. We aim to evaluate how multimodal representations improve performance in such scenarios in terms of safety, reduced training data requirements, and overall task performance.
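As a rough sketch of the synthetic-data side, the toy generator below stands in for a digital twin: it creates ground-truth occupancy volumes and then derives noisy, partially occluded per-modality views from them. All names, modalities, and noise models here are our own assumptions for illustration, not the project's simulation pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_scene(n_obstacles=5, shape=(64, 64, 32)):
    """Generate one synthetic ground-truth occupancy volume by placing
    random box obstacles, standing in for a digital-twin render."""
    occ = np.zeros(shape, dtype=np.float32)
    for _ in range(n_obstacles):
        lo = rng.integers(0, np.array(shape) - 8)  # box corner
        dx, dy, dz = rng.integers(2, 8, size=3)    # box extent
        x, y, z = lo
        occ[x:x + dx, y:y + dy, z:z + dz] = 1.0
    return occ

def simulate_sensors(occ, dropout=0.2, noise=0.05):
    """Derive noisy per-modality views from the ground truth.  The
    dropout mask mimics occlusions and sensor dead zones."""
    views = {}
    for modality in ("rgbd", "radar"):
        visible = rng.random(occ.shape) > dropout
        noisy = np.clip(occ + rng.normal(0.0, noise, occ.shape), 0.0, 1.0)
        views[modality] = np.where(visible, noisy, 0.5)  # 0.5 = unknown
    return views

occ = synthesize_scene()
views = simulate_sensors(occ)
print({m: float(v.mean()) for m, v in views.items()})
```

Because the ground truth is known exactly, such a generator can produce labelled safety-critical cases (occluded obstacles, dead zones) at scale, which is precisely where real sensor data is scarce.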


For this PhD, we expect the student to focus on the following aspects:

(i) The student understands and explores the principles and mechanisms of AI model training, with the goal of reducing the time and resource demand of robotics foundation model development using multimodal datasets.

(ii) The student has or is willing to acquire a solid conceptual understanding of transformer-based architectures, which form the basis of modern generative AI systems (see the encoder sketch after this list).

(iii) The student has or is willing to gain experience in developing virtual environments and robotic simulations, enabling the creation of synthetic occupancy map datasets that mirror real-world multimodal data.

(iv) The student contributes to the design and implementation of training workflows that leverage both synthetic and real data to enable scalable and generalizable robotics AI solutions (a minimal workflow sketch follows the list).

(v) The student applies the developed models to a set of test cases, assessing performance on generalization and safety-critical events and identifying limitations compared to conventional, vision-only approaches. The methodology should be demonstrated in an industry-relevant use case.
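For point (ii), the following PyTorch sketch shows one common way a fused occupancy volume can feed a transformer: split the volume into patches, embed each patch as a token, and run a standard encoder. All layer sizes, the patch size, and the per-patch risk head are illustrative assumptions, not the project's architecture:

```python
import torch
import torch.nn as nn

class OccupancyTransformer(nn.Module):
    """Minimal sketch: patchify an occupancy volume into tokens and
    run them through a standard transformer encoder."""
    def __init__(self, patch=8, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.patch = patch
        self.embed = nn.Linear(patch ** 3, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)  # e.g. per-patch collision risk

    def forward(self, vol):  # vol: (B, X, Y, Z)
        b, p = vol.shape[0], self.patch
        # Split the volume into non-overlapping p^3 patches -> tokens.
        tokens = (vol.unfold(1, p, p).unfold(2, p, p).unfold(3, p, p)
                     .reshape(b, -1, p ** 3))
        h = self.encoder(self.embed(tokens))
        return self.head(h).squeeze(-1)  # (B, n_patches)

model = OccupancyTransformer()
out = model(torch.rand(2, 64, 64, 32))
print(out.shape)  # torch.Size([2, 256])
```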
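And for point (iv), a minimal sketch of a mixed real-plus-synthetic training loop. The model is a trivial placeholder, and the random tensors stand in for the real and synthetic datasets (including the typical imbalance between a small real set and a larger synthetic one); none of this is the project's actual workflow:

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Placeholder model; in the project this would be the multimodal
# foundation model under study.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64 * 64 * 32, 1))

# Hypothetical data: a small real-sensor set and a larger synthetic
# set produced by the digital twin (all shapes/sizes are assumptions).
real = TensorDataset(torch.rand(64, 64, 64, 32), torch.rand(64, 1))
synthetic = TensorDataset(torch.rand(512, 64, 64, 32), torch.rand(512, 1))

loader = DataLoader(ConcatDataset([synthetic, real]), batch_size=8, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = torch.nn.MSELoss()

for vol, target in loader:  # one epoch over the mixed data pool
    optimizer.zero_grad()
    loss = loss_fn(model(vol), target)
    loss.backward()
    optimizer.step()
```

In practice the real/synthetic mixing ratio, sampling strategy, and continual-learning schedule are open design questions the student would explore.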

 

In summary, we are looking for an ambitious student with a creative mindset who would like to work on robotic AI foundation models trained on multimodal real and synthetic data, contributing to faster deployment, greater safety, and improved generalization of intelligent robotic systems.

 



Required background: We seek an ambitious, creative student to advance robotic AI foundation models using multimodal real and synthetic data—accelerating deployment, enhancing safety, and improving generalization for intelligent, industry-ready robotic systems.

Type of work: AI

Supervisor: Bram Vanderborght

Co-supervisor: Constantin Scholz

Daily advisor: Hamed Firouzipouyaei

The reference code for this position is 2026-009. Mention this reference code on your application form.

Who we are

imec's cleanroom