Data Authenticity at the Edge in the Age of Generative AI

PhD - Antwerpen

Introduction

Generative AI is a branch of artificial intelligence that aims to create novel and realistic data, such as images, text, audio, or video, from existing data or latent variables. Generative AI has many potential applications, such as data augmentation, content creation, anomaly detection, and privacy preservation. However, it also poses significant challenges for data authenticity and integrity, especially in the context of sensor data collected at the edge of the network.

Edge computing is a paradigm that enables data processing and analysis near the source of the data, rather than in the cloud or on a centralized server. It can offer benefits such as low latency, high bandwidth, reduced cost, and enhanced privacy. However, edge devices, such as smartphones, cameras, or IoT sensors, are often constrained by limited resources, such as memory, battery, and computing power. Moreover, edge devices are more vulnerable to physical attacks, tampering, or spoofing, which can compromise the quality and reliability of the sensor data. It is therefore crucial to develop methods and techniques that ensure data authenticity and integrity at the edge in the presence of generative AI.

Data authenticity is the property that data is genuine, original, and not fabricated or manipulated by unauthorized parties. It matters for many reasons: establishing trust, accountability, and transparency among data producers, consumers, and intermediaries; protecting intellectual property rights and privacy; and preventing fraud, misinformation, and cyberattacks. Data authenticity becomes even more important in the context of generative AI, which can create realistic and convincing data that is hard to distinguish from authentic, sensor- or human-generated data. Generated data can be used for malicious purposes, such as impersonating or deceiving people, spreading false or misleading information, or compromising the security or performance of systems and networks. It is therefore essential to develop methods and techniques to verify and validate the source, origin, and quality of data, and to detect and reject generated data that is not authorized or intended.

This is the main motivation and challenge of this PhD project, which focuses on sensor data collected at the edge of the network, where the risk of encountering generated data is higher and the resources for guaranteeing data authenticity are lower, yet the compute resides closest to the sensor and therefore marks the beginning of the life cycle of a data point. After the inception of a data point, the question becomes how to track the different computations that have been executed on it, so that its lineage and provenance are observable and transparent at the time of its use. Techniques that improve the transparency of this flow while also retaining a trace of the authenticity of the data are at the center of this PhD topic.
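To make this notion of a traceable data life cycle concrete, the sketch below shows one simple way such a lineage trail could be recorded: each processing step applied to a data point is appended to a hash-chained log, so that rewriting an earlier step invalidates all later links. This is purely an illustration under assumed field names and processing steps, not a design choice of the project.

```python
# Illustrative sketch only: a minimal hash-chained lineage log for one sensor
# reading. Each record stores the hash of the previous record, so altering an
# earlier step breaks every later link. All names and fields are assumptions.
import hashlib
import json
import time


def record_hash(record: dict) -> str:
    """Hash a lineage record deterministically (sorted JSON encoding)."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


def append_step(chain: list, activity: str, payload: dict) -> list:
    """Append one processing step, linked to the previous record by its hash."""
    record = {
        "activity": activity,          # e.g. "capture", "calibration", "aggregation"
        "payload": payload,            # the data point after this step
        "timestamp": time.time(),
        "prev_hash": chain[-1]["hash"] if chain else "genesis",
    }
    record["hash"] = record_hash(record)
    return chain + [record]


def verify(chain: list) -> bool:
    """Recompute every hash and check the links; False means the log was altered."""
    prev = "genesis"
    for rec in chain:
        body = {k: v for k, v in rec.items() if k != "hash"}
        if rec["prev_hash"] != prev or record_hash(body) != rec["hash"]:
            return False
        prev = rec["hash"]
    return True


if __name__ == "__main__":
    chain: list = []
    chain = append_step(chain, "capture", {"sensor": "temp-42", "value": 21.7})
    chain = append_step(chain, "calibration", {"sensor": "temp-42", "value": 21.5})
    print("lineage intact:", verify(chain))
```

A real system would also need to bind such records to the device that produced them, which is where the PUF-based labels discussed in the research questions below come in.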

Research Objectives and Questions

The main objective of this PhD project is to investigate how to distinguish "synthetic" (generative-AI-produced) sensor data from "real" sensor data, and how to build data provenance and lineage chains that retain and/or augment the authenticity and integrity of the data. The project will be supervised by Prof. Pieter Colpaert and Dr. Tanguy Coenen, and will be conducted in collaboration with the imec research groups in the "AI and Data" department. The project will address the following research questions:

  • Can we use physical unclonable functions (PUFs) to label/fingerprint/watermark data points? PUFs are hardware features that produce unique and unpredictable responses to a given challenge, and can be used to generate cryptographic keys or identifiers. PUFs can be embedded in edge devices or sensors, and can be used to sign, encrypt, or tag the data points generated by the device or sensor. The project will explore how to design, implement, and evaluate PUF-based schemes for data labeling, fingerprinting, and watermarking (a minimal illustrative sketch follows this list).
  • Can we build data provenance and lineage chains that retain and/or augment these labels? Data provenance and lineage are metadata that describe the origin, history, and transformations of the data. Data provenance and lineage can help to verify the authenticity and integrity of the data, as well as to trace the data flows and dependencies. The project will investigate how to construct and maintain data provenance and lineage chains that incorporate the PUF-based labels, and how to use them to detect and prevent data manipulation, fabrication, or falsification.
  • How could this be implemented on edge devices, i.e. close to the sensor? Edge devices have limited resources and capabilities, which pose challenges for implementing data authenticity and integrity mechanisms. The project will study how to optimize the performance, scalability, and security of the proposed methods and techniques, and how to leverage existing edge computing frameworks and platforms.
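To make the first research question more tangible, the following minimal sketch shows how a PUF-derived key could be used to fingerprint individual sensor readings. The PUF is simulated in software purely for illustration (on a real edge device the challenge/response behaviour would come from dedicated hardware and would not be a stored secret), and all function names, fields, and the key-derivation scheme are assumptions, not the project's design.

```python
# Illustrative sketch: PUF-derived tagging of a sensor reading. The PUF is
# simulated in software purely for demonstration; on a real edge device the
# challenge/response would come from dedicated hardware and never be stored.
import hashlib
import hmac
import json


def simulated_puf_response(challenge: bytes) -> bytes:
    """Stand-in for a hardware PUF: device-unique, challenge-dependent output."""
    device_secret = b"example-device-entropy"  # physical in reality, not a stored key
    return hashlib.sha256(device_secret + challenge).digest()


def derive_tag_key(challenge: bytes) -> bytes:
    """Derive a symmetric tagging key from the PUF response (assumed scheme)."""
    return hashlib.sha256(b"tag-key|" + simulated_puf_response(challenge)).digest()


def tag_reading(reading: dict, challenge: bytes) -> dict:
    """Fingerprint a reading with an HMAC computed under the PUF-derived key."""
    key = derive_tag_key(challenge)
    payload = json.dumps(reading, sort_keys=True).encode()
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {**reading, "challenge": challenge.hex(), "tag": tag}


def verify_reading(tagged: dict) -> bool:
    """Recompute the tag for the recorded challenge; a mismatch means tampering."""
    challenge = bytes.fromhex(tagged["challenge"])
    reading = {k: v for k, v in tagged.items() if k not in ("challenge", "tag")}
    key = derive_tag_key(challenge)
    payload = json.dumps(reading, sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tagged["tag"])


if __name__ == "__main__":
    tagged = tag_reading({"sensor": "cam-07", "frame_hash": "ab12"}, b"challenge-001")
    print("authentic:", verify_reading(tagged))        # True
    tagged["frame_hash"] = "spoofed"
    print("after tampering:", verify_reading(tagged))  # False
```

In a real deployment the verifier would rely on enrolled challenge/response pairs or a public-key variant rather than re-querying the PUF, but the sketch illustrates how a hardware-rooted identity could travel with each data point and be carried forward in the provenance chain of the second research question.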

Expected Outcomes and Contributions

The expected outcomes of this PhD project are:

  • A comprehensive literature review and state-of-the-art analysis of the existing methods and techniques for data authenticity and integrity at the edge, in the context of generative AI.
  • A novel and robust framework for data labeling, fingerprinting, and watermarking using PUFs, and for data provenance and lineage using PUF-based labels.
  • A prototype implementation and evaluation of the proposed framework on real-world edge devices and sensors, and on synthetic and real data sets.
  • A dissemination of the research results through publications in peer-reviewed journals and conferences, and through presentations and demonstrations in academic and industrial events.

The expected contributions of this PhD project are:

  • A new and original perspective on the problem of data authenticity and integrity at the edge, in the age of generative AI.
  • A novel and robust framework for data labeling / fingerprinting / watermarking using PUFs, and for data provenance and lineage using PUF-based labels.
  • A practical and scalable solution for implementing and evaluating the proposed framework on edge devices and sensors, and on synthetic and real data sets.
  • A significant advancement of the scientific knowledge and the state-of-the-art in the fields of edge computing, generative AI, and data integration.

Related papers

  • Sun, N., Chen, Z., Wang, Y., Wang, S., Xie, Y., & Liu, Q. (2023). Random fractal-enabled physical unclonable functions with dynamic AI authentication. Nature Communications, 14(1), 2185.
  • Mursi, K. T., Thapaliya, B., Zhuang, Y., Aseeri, A. O., & Alkatheiri, M. S. (2020). A fast deep learning method for security vulnerability study of XOR PUFs. Electronics, 9(10), 1715.
  • Van Assche, D., Min Oo, S., Rojas Melendez, J. A., & Colpaert, P. (2022). Continuous generation of versioned collections’ members with RML and LDES. In Proceedings of the 3rd International Workshop on Knowledge Graph Construction (KGCW 2022), co-located with the 19th Extended Semantic Web Conference (ESWC 2022), CEUR Workshop Proceedings (Vol. 3141).
  • Gebali, F., & Mamun, M. (2022). Review of physically unclonable functions (PUFs): structures, models, and algorithms. Frontiers in Sensors, 2, 751748.


Required background: computer science, engineering sciences or equivalent

Type of work: 50% literature, 50% development

Supervisor: Pieter Colpaert

Co-supervisor: Tanguy Coenen

Daily advisor: Tanguy Coenen

The reference code for this position is 2024-088. Mention this reference code on your application form.
