
Kai Huang’s PhD Dissertation Defense

November 2, 2022 @ 2:00 pm - 3:00 pm

“Partitioning Data Across Multiple, Network Connected FPGAs with High Bandwidth Memory to Accelerate Non-streaming Applications”

Abstract:
Field Programmable Gate Arrays (FPGAs) are increasingly used in cloud computing to reduce the run time of various applications. Their flexibility, efficiency, and low power make FPGAs important components in modern data centers. Applications such as Secure Function Evaluation (SFE), graph processing, and machine learning are increasingly mapped to FPGA-based adaptable cloud computing platforms. However, due to resource limitations, it is difficult to map an application onto a single FPGA. Applications with a streaming data processing pattern can be mapped to a multi-FPGA platform in which the FPGAs are connected in a 1-D or ring topology, so that communication overhead can be pipelined with computation. The communication, which merely passes data from board to board, does not significantly affect system performance if the bandwidth is sufficient. In the more general case of non-streaming applications, each FPGA is responsible for only a portion of the computation, and the FPGAs must exchange data throughout the run time of the application. The communication cost can become the bottleneck of such a system. The challenge is how to map and parallelize these applications to a multi-FPGA cloud computing platform in such a way that communication is minimized and speedup is maximized.
In this research, we build a framework to map garbled circuit applications, an implementation of SFE, to a cloud computing platform with FPGA cards attached to computing nodes. The FPGAs on the nodes can communicate directly over the network. The framework consists of two parts: hardware design and software preprocessing. The hardware design integrates the Xilinx UDP network stack, enabling the FPGAs to exchange data over the network and thus bypass the processor and its software stack. The framework also takes advantage of High Bandwidth Memory (HBM) for high off-chip memory throughput. The levels of the memory hierarchy available on the FPGA are used to cache both local data and incoming and outgoing network data. Preprocessing generates the reordered batches of each layer needed for processing, an efficient memory allocation, and the final memory layout. We also apply an effective partitioning algorithm that schedules execution across the FPGAs so as to minimize inter-FPGA communication. Using problems of different sizes generated from the EMP-toolkit, we demonstrate that this hardware-software co-design framework achieves a nearly optimal two-times speedup on a two-FPGA setup compared to a one-FPGA implementation. We also explore extremely large examples that cannot be mapped to one FPGA, showing that examples with billions of operations can be mapped to this distributed heterogeneous system.
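The partitioning idea in the abstract, assigning work to FPGAs so that few data dependencies cross the FPGA boundary, can be illustrated with a toy sketch. The code below is a hypothetical greedy local-search partitioner for a gate-level dependency graph, not the algorithm from the dissertation; `greedy_partition`, its balance tolerance `tol`, and the chain-graph example are invented for demonstration only.

```python
def cut_size(edges, side):
    """Number of edges whose endpoints land on different FPGAs."""
    return sum(1 for u, v in edges if side[u] != side[v])

def greedy_partition(num_gates, edges, tol=1, passes=10):
    """Assign each gate to FPGA 0 or 1, greedily shrinking the edge cut.

    A cut edge models a value that must be sent over the network between
    FPGAs, so fewer cut edges means less communication overhead.
    """
    side = [g % 2 for g in range(num_gates)]      # naive alternating start
    count = [side.count(0), side.count(1)]        # gates per FPGA
    for _ in range(passes):
        improved = False
        for g in range(num_gates):
            s = side[g]
            # Skip moves that would leave one FPGA with too few gates:
            # a low cut is useless if all the work piles onto one board.
            if count[s] - 1 < num_gates // 2 - tol:
                continue
            before = cut_size(edges, side)
            side[g] ^= 1                          # tentatively move gate g
            if cut_size(edges, side) < before:
                count[s] -= 1
                count[1 - s] += 1
                improved = True
            else:
                side[g] ^= 1                      # revert: no improvement
        if not improved:
            break
    return side

# A 6-gate chain: the alternating start cuts all 5 edges, and the greedy
# passes reduce the cut while keeping the two FPGAs roughly balanced.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
assignment = greedy_partition(6, edges)
```

Real partitioners (and the scheduling in the dissertation) must additionally respect layer ordering and memory limits; this sketch only captures the cut-minimization objective.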

Committee:

Prof. Miriam Leeser (advisor)

Prof. Stratis Ioannidis (co-advisor)

Prof. Mieczyslaw Kokar

Other

Department: Electrical and Computer Engineering
Topics: MS/PhD Thesis Defense
Audience: Faculty, Staff