
Smart Data Movement in Hybrid Storage Systems

Master projects/internships - Leuven

Explore hybrid storage systems from a hybrid location (imec Leuven - LIRMM Montpellier) 

Computers have become the foundation of the modern world. They are everywhere, from the Cloud to the 20 billion connected devices worldwide, and they will become even more ubiquitous in the future. However, to fulfill their role, computers still need to evolve further and become far more efficient. New applications, such as machine learning, demand more computing power, but performance can only be increased if cost and power consumption are kept at affordable levels.

The growing speed gap between the CPU and off-chip memory has motivated the use of deep memory hierarchies to reduce the average latency of memory accesses. As a result, a typical memory hierarchy combines heterogeneous memory technologies, such as SRAM for on-chip caches, DRAM for main memory, and flash for storage, each offering a different tradeoff between storage density and access time. Typically, data moves through this hierarchy reactively, in response to CPU memory requests. For example, when the CPU executes a load instruction that accesses data not yet present in main memory, it triggers a transfer of that data from the extremely slow flash storage (about 10,000 times slower than the CPU) to main memory. Consequently, the CPU may have to wait for the transfer to complete before continuing execution, which can significantly hurt both performance and energy efficiency.

Alternatively, a smart hybrid storage system could anticipate CPU needs and proactively move data (i.e., prefetch it) into a smaller but much faster storage partition built from an emerging non-volatile memory technology, such as Phase-Change Memory (PCM). If future requests can be predicted accurately, such a storage system would behave like an ideal memory that combines the capacity of flash with the access time of PCM.
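The difference between reactive and proactive data movement can be illustrated with a toy model. The sketch below is a hypothetical illustration, not MQSim code: it places a small LRU-managed PCM partition in front of flash and compares demand-only fetching with a simple next-N sequential prefetcher. The latencies (1 µs for PCM, 50 µs for flash), the class and function names, and the assumption that prefetches complete in the background at no cost to the requester are all placeholders chosen for illustration. The average access time works out to h × T_PCM + (1 - h) × T_flash, where h is the fraction of requests served from PCM; as h approaches 1, it approaches the PCM latency.

```python
from collections import OrderedDict

# Placeholder latencies in microseconds (assumptions for illustration only).
PCM_LATENCY_US = 1.0
FLASH_LATENCY_US = 50.0


class HybridStorage:
    """Toy two-tier store: a small LRU-managed PCM partition in front of flash."""

    def __init__(self, pcm_pages, prefetch_degree=0):
        self.pcm = OrderedDict()                # page -> None, least recently used first
        self.pcm_pages = pcm_pages
        self.prefetch_degree = prefetch_degree  # 0 means purely reactive
        self.hits = 0
        self.misses = 0
        self.total_latency_us = 0.0

    def _install(self, page):
        """Place a page in PCM (or refresh it), evicting the LRU page when full."""
        if page in self.pcm:
            self.pcm.move_to_end(page)
            return
        if len(self.pcm) >= self.pcm_pages:
            self.pcm.popitem(last=False)
        self.pcm[page] = None

    def read(self, page):
        if page in self.pcm:
            # Served from the fast partition.
            self.hits += 1
            self.pcm.move_to_end(page)
            self.total_latency_us += PCM_LATENCY_US
        else:
            # Reactive path: the request stalls on the slow flash access.
            self.misses += 1
            self.total_latency_us += FLASH_LATENCY_US
            self._install(page)
        # Proactive path: fetch the next pages, assumed to overlap with other work.
        for offset in range(1, self.prefetch_degree + 1):
            self._install(page + offset)

    def average_latency_us(self):
        requests = self.hits + self.misses
        return self.total_latency_us / requests if requests else 0.0


def run(trace, **kwargs):
    """Replay a list of page numbers against a fresh HybridStorage."""
    storage = HybridStorage(**kwargs)
    for page in trace:
        storage.read(page)
    return storage


scan = list(range(1000))  # sequential scan with no reuse
reactive = run(scan, pcm_pages=64)
proactive = run(scan, pcm_pages=64, prefetch_degree=4)
print(reactive.misses, reactive.average_latency_us())    # 1000 50.0
print(proactive.misses, proactive.average_latency_us())  # 1 1.049
```

On a purely sequential scan with no reuse, every demand-only request goes to flash, while the prefetcher hides all but the first miss. Sequential next-N prefetching is only the simplest possible predictor; irregular, long-term patterns are where more sophisticated predictors, such as the learned ones this project targets, would have to do better.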

In this master thesis internship, the student will study prefetching opportunities that exploit long-term access patterns in smart hybrid storage systems. The work will be confined to the storage layer, without affecting data management at the main memory layer. The student will extend MQSim, an existing open-source Solid-State Drive (SSD) simulator written in C++ and developed by the SAFARI group at ETH Zurich, so that it can model hybrid storage systems composed of flash and a fast non-volatile memory technology. The student will also devise methods to analyze data access patterns and evaluate different prediction strategies, including machine learning techniques such as Recurrent Neural Networks (RNN) and Reinforcement Learning (RL). The work will take place partly at imec, Leuven (BE), in the compute system architecture department, and partly at LIRMM, Montpellier (FR), in close cooperation with the team of Dr. David Novo.
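Access-pattern analysis and prediction can be prototyped offline on page-level traces before any predictor is integrated into a simulator. The sketch below is a hypothetical starting point, not the project's method: it assumes a trace is simply a list of page numbers, computes a histogram of page deltas, and replays the trace through a simple delta-correlation table that predicts the next delta from the previous one. Table-based predictors of this kind are a common non-learned baseline; an RNN or RL agent would take the place of the hypothetical `DeltaPredictor` class. All names and the synthetic traces are illustrative assumptions.

```python
from collections import Counter, defaultdict


def make_trace(delta_pattern, n_pages, start=0):
    """Build a synthetic page trace whose successive deltas cycle through delta_pattern."""
    trace = [start]
    while len(trace) < n_pages:
        step = delta_pattern[(len(trace) - 1) % len(delta_pattern)]
        trace.append(trace[-1] + step)
    return trace


def delta_histogram(trace):
    """Count how often each page delta (next page minus current page) occurs."""
    return Counter(b - a for a, b in zip(trace, trace[1:]))


class DeltaPredictor:
    """Delta-correlation table: previous delta -> counts of the delta that followed it."""

    def __init__(self):
        self.table = defaultdict(Counter)

    def predict(self, prev_delta):
        followers = self.table.get(prev_delta)
        if not followers:
            return None  # no history for this delta yet
        return followers.most_common(1)[0][0]

    def update(self, prev_delta, next_delta):
        self.table[prev_delta][next_delta] += 1


def online_accuracy(trace):
    """Replay the trace, predicting each delta before learning it; return the hit fraction."""
    deltas = [b - a for a, b in zip(trace, trace[1:])]
    if len(deltas) < 2:
        return 0.0
    predictor = DeltaPredictor()
    correct = 0
    for i in range(1, len(deltas)):
        if predictor.predict(deltas[i - 1]) == deltas[i]:
            correct += 1
        predictor.update(deltas[i - 1], deltas[i])
    return correct / (len(deltas) - 1)


alternating = make_trace([1, 3], 101)  # deltas +1, +3, +1, +3, ...
strided = make_trace([1, 1, 8], 101)   # deltas +1, +1, +8, ...
print(delta_histogram(strided))        # Counter({1: 67, 8: 33})
print(online_accuracy(alternating))    # about 0.98 (97 of 99 correct)
print(online_accuracy(strided))        # about 0.65 (64 of 99 correct)
```

With one delta of history, the alternating +1/+3 trace becomes fully predictable after a two-step warm-up, while the +1, +1, +8 trace stays near two thirds because the table cannot tell the first +1 from the second. The histogram alone cannot reveal this, since it discards ordering. That limitation is one motivation for predictors with longer memory, such as RNN-based approaches.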

Prerequisites: Excellent C/C++ and Python programming skills and a strong background in computer architecture.

Type of Project: Combination of internship and thesis; Internship; Thesis 

Master's degree: Master of Engineering Technology; Master of Science; Master of Engineering Science 

Duration: 9 months 

Master program: Computer Science 

KU Leuven supervisor: Francky Catthoor (EE, Nano) 

Supervising scientist: for further information or to apply, please contact Timon Evenblij (timon.evenblij@imec.be).

An imec allowance will be provided to students studying at a non-Belgian university.