Leuven | More than two weeks ago
AI language translation models have improved over the previous state of the art by learning mappings between sequences with neural networks and attention mechanisms. Because these models have deeper, more complex layers and very large parameter counts, training them is expensive. The training workload therefore has to be shared across multiple GPUs to reduce cost and time. It then becomes essential to understand both the computational load and the data-movement load on current GPUs. This work aims to characterize the computational and communication load on the GPU while training such models. GPU profiling is widely used to optimize such workloads for better resource utilization.
As part of this work, the candidate will study large language models (LLMs) and run them to gain deeper insight into GPU resource utilization. The data from profiling tools can then be used with analytical or cycle-accurate simulators to estimate the computation cost of such models at large scale. The parameters extracted from profiling will later provide insight into the impact of GPU architecture on specific workloads, which can in turn be used to improve the analytical tools.
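To give a flavor of the analytical side of the project, here is a minimal sketch of a first-order cost model for data-parallel training. It is an illustration only, not the tooling used in this work. It assumes the common approximation of about 6 FLOPs per parameter per token for dense transformer training and a ring all-reduce for gradient synchronization. All function names and all hardware figures in the example (peak throughput, utilization, link bandwidth) are hypothetical placeholders that would normally be replaced by values measured with a profiler.

```python
def training_flops(num_params, num_tokens):
    """Approximate training FLOPs for a dense transformer.

    Uses the common rule of thumb of ~6 FLOPs per parameter per token
    (forward plus backward pass). This is an approximation, not a measurement.
    """
    return 6.0 * num_params * num_tokens


def ring_allreduce_bytes_per_gpu(num_params, bytes_per_param, num_gpus):
    """Bytes each GPU sends during one ring all-reduce of the gradients.

    A ring all-reduce moves 2 * (p - 1) / p times the buffer size per GPU.
    """
    if num_gpus < 2:
        return 0.0  # a single GPU has nothing to synchronize
    return 2.0 * (num_gpus - 1) / num_gpus * num_params * bytes_per_param


def step_time_estimate(num_params, tokens_per_step, num_gpus,
                       peak_flops, utilization, link_bandwidth,
                       bytes_per_param=2):
    """Return (compute_seconds, communication_seconds) for one training step.

    tokens_per_step is the global batch in tokens, split evenly over GPUs.
    peak_flops is per GPU in FLOP/s, link_bandwidth per GPU in bytes/s.
    No overlap of compute and communication is modelled (an assumption).
    """
    compute_s = training_flops(num_params, tokens_per_step) / (
        num_gpus * peak_flops * utilization)
    comm_s = ring_allreduce_bytes_per_gpu(
        num_params, bytes_per_param, num_gpus) / link_bandwidth
    return compute_s, comm_s


if __name__ == "__main__":
    # Hypothetical configuration: 1B parameters, 1M tokens per step, 8 GPUs,
    # 100 TFLOP/s peak at 50% utilization, 100 GB/s per-GPU link bandwidth.
    compute_s, comm_s = step_time_estimate(
        num_params=1e9, tokens_per_step=1e6, num_gpus=8,
        peak_flops=100e12, utilization=0.5, link_bandwidth=100e9)
    print(f"compute: {compute_s:.3f} s, communication: {comm_s:.3f} s")
```

Profiling data would typically be used to calibrate the utilization and effective bandwidth terms, since the achieved values on real hardware differ from the peak figures.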
Computer science, parallel computing on clusters, GPU system-level architecture, programming (C++/Python/CUDA)
Type of work: 30% literature, 70% modelling
Daily advisor: Aakash Patel, Dwaipayan Biswas
Type of project: Combination of internship and thesis
Duration: 6-9 months
Required degree: Master of Engineering Technology, Master of Science, Master of Engineering Science
Required background: Computer Science, Electrotechnics/Electrical Engineering
Imec allowance will be provided for students studying at a non-Belgian university.