Generative AI is a branch of artificial intelligence that aims to create novel and realistic data, such as images, text, audio, or video, from existing data or latent variables. Generative AI has many potential applications, such as data augmentation, content creation, anomaly detection, and privacy preservation. However, generative AI also poses significant challenges for data authenticity and integrity, especially in the context of sensor data collected at the edge of the network. Edge computing is a paradigm that enables data processing and analysis near the source of the data, rather than in the cloud or a centralized server. Edge computing can offer benefits such as low latency, high bandwidth, reduced cost, and enhanced privacy. However, edge devices, such as smartphones, cameras, or IoT sensors, are often constrained by limited resources, such as memory, battery, and computing power. Moreover, edge devices are more vulnerable to physical attacks, tampering, or spoofing, which can compromise the quality and reliability of the sensor data. Therefore, it is crucial to develop methods and techniques to ensure data authenticity and integrity at the edge, in the presence of generative AI.
Data authenticity ensures that data is genuine, original, and not fabricated or manipulated by unauthorized parties. It matters for many reasons: establishing trust, accountability, and transparency among data producers, consumers, and intermediaries; protecting intellectual property rights and privacy; and preventing fraud, misinformation, and cyberattacks. Authenticity becomes even more important in the context of generative AI, which can create realistic, convincing data that is hard to distinguish from human-generated data. Such synthetic data can be used for malicious purposes: impersonating or deceiving people, spreading false or misleading information, or compromising the security or performance of systems and networks. It is therefore essential to develop methods and techniques to verify and validate the source, origin, and quality of data, and to detect and reject any generative AI data that is not authorized or intended. This is the main motivation and challenge of this PhD project, which focuses on sensor data collected at the edge of the network, where the risk of generative AI data is higher and the resources for ensuring data authenticity are lower, yet where the compute resides closest to the sensor and thus marks the beginning of a data point's life cycle. After the inception of the data point, the question becomes how to track the computations that have been executed on it, so that the lineage and provenance of the data are observable and transparent at the time of its use. Techniques that improve the transparency of this flow while retaining a verifiable trace of the data's authenticity are at the center of this PhD topic.
Research Objectives and Questions
The main objective of this PhD project is to investigate how to distinguish synthetic, generative AI sensor data from real sensor data, and how to build a data governance framework that keeps this authenticity verifiable even when derivations of the data are created. For example, data from a location sensor that measures longitude and latitude may be derived into a statement such as “Person X is in Ghent at this moment”.
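One way to make the derivation described above traceable is to have each derived statement carry a content hash of the source data point it was computed from. The following is a minimal sketch, not a proposed design; the record schema, field names, and the reverse-geocoding step are all hypothetical:

```python
import hashlib
import json

def content_hash(record: dict) -> str:
    """Hash a canonical JSON serialisation of a data point."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

# Raw sensor observation (hypothetical schema).
raw = {"sensor": "gps-42", "lat": 51.0543, "lon": 3.7174, "ts": 1718000000}

# The derived statement keeps a provenance link to its source via the hash.
derived = {
    "statement": "Person X is in Ghent at this moment",
    "derived_from": content_hash(raw),
    "derivation": "reverse-geocode",
}

# A consumer holding the raw record can verify the provenance link.
assert derived["derived_from"] == content_hash(raw)
```

A content hash on its own only binds the derivation to a specific input; it does not prove the input was authentic, which is where the hardware-rooted verification from the first research question would come in.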
The project will be supervised by Prof. Pieter Colpaert and Dr. Tanguy Coenen, and will be conducted in collaboration with the imec research groups in the “AI and Algorithms” department. The project will address the following research questions:
How can we adapt hardware design so that the software level has a means to verify the authenticity of sensor output?
Can we build data provenance chains in which the authenticity of derived data can still be verified?
How could this be implemented on edge devices, i.e., close to the sensor? Edge devices have limited resources and capabilities, which pose challenges for implementing data authenticity and integrity mechanisms. The project will study how to optimize the performance, scalability, and security of the proposed methods and techniques, and how to leverage existing edge computing frameworks and platforms.
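One direction for the second and third questions is a tamper-evident provenance chain: each computation on the data point appends a record whose authentication tag also covers the tag of the previous record, so reordering or altering any step is detectable. The sketch below uses an HMAC with a symmetric key as a simplification; a real design would more likely use asymmetric signatures rooted in hardware attestation, and all names here are illustrative:

```python
import hashlib
import hmac

def append_step(chain: list, key: bytes, step: str, payload: bytes) -> None:
    """Append a processing step whose tag binds the payload, the step
    name, and the previous step's tag (b"" for the genesis entry)."""
    prev_tag = chain[-1]["tag"] if chain else b""
    msg = prev_tag + step.encode() + payload
    chain.append({"step": step, "payload": payload,
                  "tag": hmac.new(key, msg, hashlib.sha256).digest()})

def verify_chain(chain: list, key: bytes) -> bool:
    """Recompute every tag; tampering with any payload or the order fails."""
    prev_tag = b""
    for entry in chain:
        msg = prev_tag + entry["step"].encode() + entry["payload"]
        if not hmac.compare_digest(
                hmac.new(key, msg, hashlib.sha256).digest(), entry["tag"]):
            return False
        prev_tag = entry["tag"]
    return True

key = b"device-held secret"        # stand-in for a hardware-protected key
chain: list = []
append_step(chain, key, "capture", b'{"lat":51.05,"lon":3.72}')
append_step(chain, key, "geocode", b'{"city":"Ghent"}')
assert verify_chain(chain, key)

chain[0]["payload"] = b'{"lat":0.0,"lon":0.0}'   # tampering breaks the chain
assert not verify_chain(chain, key)
```

On a constrained edge device, each `append_step` costs one hash computation and a small constant amount of storage per step, which is one reason hash-chained constructions are a plausible fit for this setting.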
Expected Outcomes and Contributions
The expected outcomes of this PhD project are:
A comprehensive literature review and state-of-the-art analysis of the existing methods and techniques for data authenticity and integrity at the edge, in the context of generative AI.
A novel and robust framework for data labeling, fingerprinting, and watermarking.
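To make the watermarking outcome concrete: one classical building block is a fragile least-significant-bit watermark applied to raw integer sensor samples at capture time, so that any later modification (including replacement by synthetic data) invalidates the mark. The following is a minimal, illustrative sketch only; key management and robustness against re-quantisation or lossy processing are deliberately out of scope:

```python
import hashlib

def keyed_bits(key: bytes, n: int) -> list:
    """Derive n pseudo-random watermark bits from a secret key."""
    bits, counter = [], 0
    while len(bits) < n:
        block = hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        bits.extend((byte >> i) & 1 for byte in block for i in range(8))
        counter += 1
    return bits[:n]

def embed(samples: list, key: bytes) -> list:
    """Overwrite the least-significant bit of each integer sample
    with a key-derived bit (a fragile watermark)."""
    wm = keyed_bits(key, len(samples))
    return [(s & ~1) | b for s, b in zip(samples, wm)]

def verify(samples: list, key: bytes) -> bool:
    """The watermark survives only if no sample was altered."""
    wm = keyed_bits(key, len(samples))
    return all((s & 1) == b for s, b in zip(samples, wm))

key = b"sensor-unit-key"
marked = embed([1024, 1031, 1017, 1040], key)
assert verify(marked, key)
marked[2] += 5                     # any edit breaks the fragile mark
assert not verify(marked, key)
```

This illustrates the fragile end of the design space; the framework envisaged in this project would also need robust marks that survive legitimate derivations, which is a substantially harder problem.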
A prototype implementation and evaluation of the proposed framework on real-world edge devices and sensors, and on synthetic and real data sets.
Dissemination of the research results through publications in peer-reviewed journals and conferences, and through presentations and demonstrations at academic and industrial events.
The expected contributions of this PhD project are: