PhD - Antwerpen
Generative AI is a branch of artificial intelligence that aims to create novel and realistic data, such as images, text, audio, or video, from existing data or latent variables. Generative AI has many potential applications, such as data augmentation, content creation, anomaly detection, and privacy preservation. However, generative AI also poses significant challenges for data authenticity and integrity, especially in the context of sensor data collected at the edge of the network.

Edge computing is a paradigm that enables data processing and analysis near the source of the data, rather than in the cloud or on a centralized server. Edge computing can offer benefits such as low latency, high bandwidth, reduced cost, and enhanced privacy. However, edge devices, such as smartphones, cameras, or IoT sensors, are often constrained by limited resources, such as memory, battery, and computing power. Moreover, edge devices are more vulnerable to physical attacks, tampering, or spoofing, which can compromise the quality and reliability of the sensor data. It is therefore crucial to develop methods and techniques to ensure data authenticity and integrity at the edge, in the presence of generative AI.
Data authenticity is the property that ensures data is genuine, original, and not fabricated or manipulated by unauthorized parties. Data authenticity is important for many reasons: establishing trust, accountability, and transparency among data producers, consumers, and intermediaries; protecting intellectual property rights and privacy; and preventing fraud, misinformation, and cyberattacks.

Data authenticity becomes even more important in the context of generative AI, which can create realistic and convincing data that is hard to distinguish from human-generated data. Generative AI data can be used for malicious purposes, such as impersonating or deceiving people, spreading false or misleading information, or compromising the security or performance of systems and networks. It is therefore essential to develop methods and techniques to verify and validate the source, origin, and quality of data, and to detect and reject any generative AI data that is not authorized or intended.

This is the main motivation and challenge of this PhD project, which focuses on sensor data collected at the edge of the network, where the risk of generative AI data is higher and the resources for ensuring data authenticity are lower, yet the compute resides closest to the sensor and therefore marks the beginning of a data point's life cycle. After the inception of a data point, the question becomes how to track the different computations that have been executed on it, so that the lineage and provenance of the data are observable and transparent at the time of their use. Techniques that can improve the transparency of this flow, while also retaining a trace of the authenticity of the data, are at the center of this PhD topic.
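As a concrete illustration of such a provenance trace, the sketch below shows a hash-chained provenance log in Python. This is not a method prescribed by the project but a minimal, hypothetical example of the general idea: each computation performed on a data point is recorded as an entry that includes the hash of the previous entry, so tampering with any earlier record invalidates the chain from that point onward.

```python
import hashlib
import json

def entry_hash(entry: dict) -> str:
    """Deterministic SHA-256 hash of a provenance entry (canonical JSON)."""
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

def append_step(chain: list, step: str, payload: str) -> None:
    """Record one processing step, linking it to the previous entry's hash."""
    prev = chain[-1]["hash"] if chain else "genesis"
    entry = {
        "step": step,  # e.g. "capture", "calibrate" (hypothetical step names)
        "payload_digest": hashlib.sha256(payload.encode()).hexdigest(),
        "prev": prev,
    }
    entry["hash"] = entry_hash(entry)
    chain.append(entry)

def verify(chain: list) -> bool:
    """Re-walk the chain: every entry must match its stored hash and link."""
    prev = "genesis"
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != prev or entry_hash(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

chain: list = []
append_step(chain, "capture", "raw sensor reading 23.4 C")
append_step(chain, "calibrate", "calibrated reading 23.1 C")
assert verify(chain)

# Tampering with the origin record breaks verification downstream.
chain[0]["payload_digest"] = "0" * 64
assert not verify(chain)
```

In a real edge deployment the first entry would additionally be signed by a key bound to the sensor hardware, so that the chain attests not only integrity (nothing was altered) but also origin (a physical sensor, not a generative model, produced the reading).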
The main objective of this PhD project is to investigate how to distinguish "synthetic", generative AI sensor data from "real" sensor data, and how to build data provenance and lineage chains that retain and/or augment the authenticity and integrity of the data. The project will be supervised by Prof. Pieter Colpaert and Dr. Tanguy Coenen, and will be conducted in collaboration with the imec research groups in the "AI and Data" department. The project will address the following research questions:
The expected outcomes of this PhD project are:
The expected contributions of this PhD project are:
Required background: computer science, engineering sciences or equivalent
Type of work: 50% literature, 50% development
Supervisor: Pieter Colpaert
Co-supervisor: Tanguy Coenen
Daily advisor: Tanguy Coenen
The reference code for this position is 2024-088. Mention this reference code on your application form.