
Assessing memory performance of AI applications under resource contention

Master projects/internships - Leuven

Discover how memory bottlenecks shape the future of AI on mobile devices, and what it takes to overcome them.

The growing integration of AI into everyday devices, from smartphones to edge systems, places increasing pressure on memory subsystems to handle demanding workloads efficiently. DRAM performance is critical for these applications, as memory bandwidth often becomes the limiting factor. 

While workloads with regular and predictable access patterns, such as traditional LLM inference, can achieve near-optimal memory utilization under ideal conditions, real-world execution introduces two major challenges that significantly degrade memory efficiency. First, resource contention arises when memory bandwidth is shared across multiple actors, including hardware components (CPUs and AI accelerators) as well as concurrent software processes and background tasks, leading to unpredictable latency and reduced bandwidth. Second, modern AI workloads such as diffusion models, multi-agent systems, and retrieval-augmented generation (RAG) exhibit irregular memory access patterns that reduce row-buffer locality and prefetch effectiveness. These behaviors not only lower effective bandwidth but also amplify contention when combined with other traffic. 
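To make the row-buffer-locality point concrete, the toy single-bank model below contrasts a streaming access pattern with an irregular one. It is a minimal sketch: the row size, timing constants, and access counts are illustrative assumptions, not parameters of any real DRAM device.

```python
# Toy model of DRAM row-buffer behavior (illustrative only; all constants
# are assumptions, not measurements of real hardware). It shows why a
# streaming pattern, like sequential weight reads in LLM inference, sustains
# far more of the peak bandwidth than an irregular pattern, like the
# scattered lookups typical of RAG or multi-agent workloads.
import random

ROW_SIZE = 2048   # bytes per DRAM row (assumed)
T_HIT = 1.0       # relative cost of a row-buffer hit (column access only)
T_MISS = 4.0      # relative cost of a miss (precharge + activate + access)

def service_time(addresses: list[int]) -> float:
    """Total service time for a sequence of byte addresses on a single bank."""
    open_row = None
    total = 0.0
    for addr in addresses:
        row = addr // ROW_SIZE
        total += T_HIT if row == open_row else T_MISS
        open_row = row
    return total

N = 100_000
STRIDE = 64  # one cache line per access (assumed)

sequential = [i * STRIDE for i in range(N)]
irregular = [random.randrange(N * STRIDE) // STRIDE * STRIDE for _ in range(N)]

t_seq = service_time(sequential)
t_irr = service_time(irregular)
print(f"sequential: {N / t_seq:.2f} accesses per unit time")
print(f"irregular : {N / t_irr:.2f} accesses per unit time")
print(f"irregular access is {t_irr / t_seq:.1f}x slower in this model")
```

Even this crude model shows the streaming pattern amortizing one row activation over many column accesses, while the irregular pattern pays the activate/precharge penalty on almost every access.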

Understanding the combined impact of these challenges is essential for designing next-generation platforms that deliver consistent AI performance under realistic conditions. Insights from this study will guide improvements in memory hierarchies, scheduling policies, and system-level optimizations, enabling more responsive and power-efficient AI applications on resource-constrained devices.

As part of this internship, you will: 

  • Identify realistic workloads for mobile platforms, focusing on:  
    • AI applications, both traditional and emerging.
    • Commonly used mobile applications and OS services, including user apps with low-latency requirements.
  • Set up a simulation environment to assess DRAM performance under multi-source utilization (mobile CPU and dedicated AI accelerators); a toy sketch of this scenario follows this list.
  • Define appropriate workload representations for the simulation environment.
  • Evaluate accuracy-speed trade-offs to make the exploration feasible.
  • Analyze the impact of target AI applications and concurrent tasks on DRAM performance.
  • Propose system-level optimizations to mitigate performance loss or improve energy efficiency. 
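As a deliberately simplified illustration of the multi-source utilization scenario above, the following sketch models two actors sharing one fixed-bandwidth channel under round-robin arbitration. All rates are hypothetical; an actual study would drive a cycle-accurate DRAM simulator (e.g., Ramulator or DRAMsim3) with traces from the real workloads.

```python
# Minimal sketch of memory-channel contention between two request streams
# (hypothetical numbers throughout; not a substitute for a DRAM simulator).
from collections import deque

CHANNEL_BW = 8  # requests the shared channel can serve per cycle (assumed)

def run(streams: dict[str, int], cycles: int) -> dict[str, int]:
    """Round-robin arbitration of per-cycle request streams over one channel.

    streams maps an actor name to the requests it issues per cycle.
    Returns the backlog (queued, unserved requests) per actor.
    """
    queues = {name: deque() for name in streams}
    for cycle in range(cycles):
        for name, rate in streams.items():
            queues[name].extend([cycle] * rate)  # enqueue this cycle's requests
        budget = CHANNEL_BW
        while budget > 0 and any(queues.values()):
            for q in queues.values():            # round-robin grant
                if q and budget > 0:
                    q.popleft()
                    budget -= 1
    return {name: len(q) for name, q in queues.items()}

# The accelerator alone fits within the channel budget: no backlog.
print(run({"npu": 6}, cycles=1000))
# A concurrent CPU stream pushes total demand (10/cycle) past the channel
# (8/cycle); the excess piles up as backlog, i.e., queuing latency that
# grows over time for the heavier stream.
print(run({"npu": 6, "cpu": 4}, cycles=1000))
```

Swapping the round-robin grant for a priority or bandwidth-partitioning policy is exactly the kind of system-level knob the last task above invites you to explore.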

Ideal candidate profile: 

  • MSc student in Computer Science, Electrical Engineering, or a related program.
  • Strong understanding of computer hardware architecture, memory systems, and AI accelerators.
  • Familiarity with neural network architectures, especially LLMs, and awareness of emerging paradigms such as multi-agent systems, retrieval-augmented generation (RAG), and diffusion-based models.
  • Proficiency in C/C++ and Python programming languages.
  • Experience with performance analysis tools or simulation frameworks is a plus.
  • Self-starter with the ability to work independently and think critically.
  • Available for a 1-year internship and eligible to work in Belgium.
  • Strong written and verbal English communication skills.

Master's degree: Master of Science, Master of Engineering Science, Master of Engineering Technology

Required educational background: Computer Science, Electrotechnics/Electrical Engineering

Duration: 12 months

For more information or application, please contact the supervising scientists Tommaso Marinelli (tommaso.marinelli@imec.be) and Konstantinos Tovletoglou (konstantinos.tovletoglou@imec.be).

