
Exploration of NPU architectures on FPGA platforms

Master projects/internships - Leuven

Design, prototype, and optimize Neural Processing Units for tomorrow’s AI workloads

Edge and mobile devices face unique challenges in meeting the demands of modern AI workloads due to their limited resources and tight power budgets. To address these challenges, specialized accelerators known as Neural Processing Units (NPUs) are integrated into mobile SoCs to deliver high performance at low power consumption. However, designing, optimizing, and integrating these accelerators is a complex task: they must deliver optimal performance for today's algorithms while remaining flexible and future-proof for emerging AI models.

Traditionally, architectural exploration relies on simulators, which tend to simplify the software stack or abstract away hardware complexity. While useful for targeted research questions, these simulators often fail to capture the full intricacies of mobile SoCs and do not provide a holistic view of system-level design trade-offs.

This internship focuses on exploring NPU architectures in a more realistic setting through hardware system design and FPGA-based prototyping. By integrating the key components of a mobile SoC (CPU, NPU, and memory controllers) into a unified platform, we aim to analyze system-level bottlenecks and evaluate trade-offs among different architectural choices and optimizations. While hardware-based system design is more complex than simulation, it enables more accurate modelling and faster exploration of architectural parameters. By instrumenting the hardware with performance counters, we can capture the system state in greater detail, providing deeper insight into bottlenecks and their root causes.
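To give a flavor of the kind of instrumentation involved, below is a minimal Chisel sketch of a performance-counter block. It is illustrative only, not imec's design: the event signals (e.g. NPU stalls or DRAM bursts), module name, and widths are all assumptions, and in a real platform the counters would typically be exposed to software through a memory-mapped interface such as AXI4-Lite.

    import chisel3._
    import chisel3.util._

    // Minimal sketch of a performance-counter block, assuming single-cycle
    // event pulses wired in from elsewhere in the SoC (hypothetical names).
    class PerfCounters(numEvents: Int = 4, width: Int = 48) extends Module {
      val io = IO(new Bundle {
        val events = Input(Vec(numEvents, Bool()))      // one pulse per tracked event
        val sel    = Input(UInt(log2Ceil(numEvents).W)) // which counter to read
        val clear  = Input(Bool())                      // synchronous clear of all counters
        val count  = Output(UInt(width.W))              // value of the selected counter
      })

      // One free-running counter per event source.
      val counters = RegInit(VecInit(Seq.fill(numEvents)(0.U(width.W))))
      for (i <- 0 until numEvents) {
        when(io.clear) {
          counters(i) := 0.U
        }.elsewhen(io.events(i)) {
          counters(i) := counters(i) + 1.U
        }
      }

      io.count := counters(io.sel)
    }

Reading such counters periodically from the CPU side is what turns the FPGA prototype into a measurement platform rather than just a functional demo.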

As part of this internship, you will:

  • Investigate NPU architectures tailored for mobile and edge computing platforms.
  • Port existing NPU implementations to an FPGA-based prototype system.
  • Evaluate system-level trade-offs in performance, memory requirements, and power consumption across different implementations and micro-architectural configurations.
  • Extend the system with specialized operators and/or optimize existing operators to improve efficiency.
  • Gain an in-depth understanding of the computational and memory characteristics of modern AI workloads.
  • Propose software-hardware co-design solutions for workload scheduling policies across heterogeneous cores and accelerators.
  • Optimize data placement and movement across various memory technologies at the system level, considering complex AI workloads and agents executing concurrently.

Ideal candidate profile:

  • MSc student in Computer Science, Electrical Engineering, or a related program.
  • Familiarity with GPU/NPU architectures, including vector and tensor processing arrays.
  • Understanding of the memory hierarchy, memory controller interfaces and the trade-offs in power, capacity, bandwidth and latency across memory technologies.
  • Previous experience with hardware description languages (Chisel and/or Verilog) and FPGA design flow.
  • Preferably, experience with FPGA-accelerated simulation frameworks (e.g., FireSim).
  • Proficiency with Linux environments and software development practices.
  • Available for a 1-year internship and eligible to do an internship in Belgium.
  • Strong written and verbal communication skills in English.

Master's degree: Master of Science, Master of Engineering Science, Master of Engineering Technology

Required educational background: Computer Science, Electrotechnics/Electrical Engineering

Duration: 12 months

For more information or application, please contact the supervising scientists Tommaso Marinelli (tommaso.marinelli@imec.be) and Konstantinos Tovletoglou (konstantinos.tovletoglou@imec.be).

