
Exploration of NPU architectures on FPGA platforms

Master projects/internships - Leuven

Design, prototype, and optimize Neural Processing Units for tomorrow’s AI workloads

Edge and mobile devices face unique challenges in meeting the demands of modern AI workloads due to their limited resources and tight power budgets. To address these challenges, specialized accelerators known as Neural Processing Units (NPUs) are integrated into mobile SoCs to deliver high performance at low power consumption. However, designing, optimizing, and integrating these accelerators is a complex task: they must deliver optimal performance for today's algorithms while remaining flexible and future-proof for emerging AI models.

Traditionally, architectural exploration relies on simulators, which tend to simplify the software stack or abstract away hardware complexity. While useful for targeted research questions, these simulators often fail to capture the full intricacies of mobile SoCs and do not provide a holistic view of system-level design trade-offs.

This internship focuses on exploring NPU architectures in a more realistic setting through hardware system design and FPGA-based prototyping. By integrating the key components of a mobile SoC (CPU, NPU, and memory controllers) into a unified platform, we aim to analyze system-level bottlenecks and evaluate trade-offs among different architectural choices and optimizations. While hardware-based system design is more complex than simulation, it enables more accurate modelling and faster exploration of architectural parameters. By instrumenting the hardware with performance counters, we can capture the system state in greater detail, providing deeper insight into bottlenecks and their root causes.
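To give a flavor of the kind of instrumentation involved, below is a minimal Chisel sketch of a performance-counter block. It is illustrative only, not imec's design: the event signals (e.g. NPU stalls or DRAM bursts), module name, and widths are all assumptions, and in a real platform the counters would typically be exposed to software through a memory-mapped interface such as AXI4-Lite.

    import chisel3._
    import chisel3.util._

    // Minimal sketch of a performance-counter block, assuming single-cycle
    // event pulses wired in from elsewhere in the SoC (hypothetical names).
    class PerfCounters(numEvents: Int = 4, width: Int = 48) extends Module {
      val io = IO(new Bundle {
        val events = Input(Vec(numEvents, Bool()))      // one pulse per tracked event
        val sel    = Input(UInt(log2Ceil(numEvents).W)) // which counter to read
        val clear  = Input(Bool())                      // synchronous clear of all counters
        val count  = Output(UInt(width.W))              // value of the selected counter
      })

      // One free-running counter per event source.
      val counters = RegInit(VecInit(Seq.fill(numEvents)(0.U(width.W))))
      for (i <- 0 until numEvents) {
        when(io.clear) {
          counters(i) := 0.U
        }.elsewhen(io.events(i)) {
          counters(i) := counters(i) + 1.U
        }
      }

      io.count := counters(io.sel)
    }

Reading such counters periodically from the CPU side is what turns the FPGA prototype into a measurement platform rather than just a functional demo.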

As part of this internship, you will:

  • Investigate NPU architectures tailored for mobile and edge computing platforms.
  • Port existing NPU implementations to an FPGA-based prototype system.
  • Evaluate system-level trade-offs in performance, memory requirements, and power consumption across different implementations and micro-architectural configurations.
  • Extend the system with specialized operators and/or optimize existing operators to improve efficiency.
  • Gain an in-depth understanding of the computational and memory characteristics of modern AI workloads.
  • Propose software-hardware co-design solutions for workload scheduling policies across heterogeneous cores and accelerators.
  • Optimize data placement and movement across various memory technologies at the system level, considering complex AI workloads and agents executing concurrently.

Ideal candidate profile:

  • MSc student in Computer Science, Electrical Engineering, or a related program.
  • Familiarity with GPU/NPU architectures, including vector and tensor processing arrays.
  • Understanding of the memory hierarchy, memory controller interfaces and the trade-offs in power, capacity, bandwidth and latency across memory technologies.
  • Previous experience with hardware description languages (Chisel and/or Verilog) and FPGA design flow.
  • Preferably, experience with FPGA-accelerated simulation frameworks (e.g., FireSim).
  • Proficiency with Linux environments and software development practices.
  • Available for a 1-year internship and eligible to do an internship in Belgium.
  • Strong written and verbal communication skills in English.

Master's degree: Master of Science, Master of Engineering Science, Master of Engineering Technology

Required educational background: Computer Science, Electrotechnics/Electrical Engineering

Duration: 12 months

For more information or application, please contact the supervising scientists Tommaso Marinelli (tommaso.marinelli@imec.be) and Konstantinos Tovletoglou (konstantinos.tovletoglou@imec.be).

