
Yuhui Bao PhD Dissertation Defense
September 3, 2024 @ 10:00 am - 11:00 am
Name:
Yuhui Bao
Title:
A Design Methodology for Producing Highly-Adaptable and High-Performance Simulation Frameworks
Date:
9/3/2024
Time:
10:00:00 AM
Committee Members:
Prof. David Kaeli (Advisor)
Prof. Ningfang Mi
Prof. Yifan Sun (William and Mary)
Abstract:
Computer architecture simulators play an essential role in the development and optimization of computer hardware. A variety of simulators have been developed to explore the design space of CPUs, GPUs, and custom accelerators. As GPUs continue to grow in popularity for accelerating demanding applications, such as high-performance computing and machine learning, GPU architects have been pushing the envelope of GPU performance with every new generation. GPU vendors (e.g., NVIDIA and AMD) introduce new generations of GPU architectures and products, with updated instruction set architectures (ISAs) and new microarchitectural features, every 2-3 years. Modeling state-of-the-art architectures is a crucial capability of GPU simulators, which are used to characterize and accelerate challenging workloads, facilitating performance evaluation and design exploration. However, the effort required to design and construct an accurate and performant simulator is substantial. Due to the rapid rate of innovation in GPU technology, any simulator that is over-customized to capture the design of a specific architecture will quickly become outdated. Thus, we need a design methodology for simulators that guards against this trend and can accommodate future architectures.
In this dissertation, we propose a design methodology for producing highly-adaptable and high-performance simulation frameworks. We aim to design simulators that combine high adaptability (the ability to accommodate future alterations or extensions), high performance, and high fidelity. We leverage the Akita simulator framework to enable the modular and extensible design of various GPU components. To achieve high fidelity, we design a set of microbenchmarks to evaluate individual GPU subsystems. We demonstrate how following our design methodology yields a highly-adaptable and accurate simulator, NaviSim, which provides the flexibility to support simulation of three different ISAs. To demonstrate the full utility of NaviSim, we conduct a performance study of the impact of individual architectural features, revealing the simulator's high flexibility and configurability. In addition, we showcase how NaviSim's high adaptability contributes to design space exploration, offering solutions to enhance the performance of demanding real-world applications.
Fast simulation speed is one of the key requirements of any simulator. NaviSim is designed to support multi-threaded execution, leveraging the parallel capabilities of today's multi-core CPUs to enable parallel simulation. In this thesis, we identify key performance bottlenecks in both serial and parallel simulation execution modes and optimize simulation speed. We also present lessons learned about efficient simulator design and provide guidance for future simulator developers.