


Mengshu Sun’s PhD Dissertation Defense

August 17, 2022 @ 12:00 pm - 1:00 pm

“Deep Learning Acceleration on Edge Devices with Algorithm/Hardware Co-Design”

Abstract:

As deep learning has succeeded in a broad range of applications in recent years, there is an increasing trend towards deploying deep neural networks (DNNs) on edge devices such as FPGAs and mobile phones. However, a significant gap remains between the extraordinary accuracy of state-of-the-art DNNs and efficient implementations on edge devices, whose limited resources must accommodate the high computation and memory intensity of DNNs. With the goal of simultaneously accelerating inference and maintaining accuracy, this dissertation investigates efficient implementations of deep learning on low-power, resource-constrained devices, presenting algorithm/hardware co-design frameworks that combine hardware-friendly DNN compression algorithms with hardware design optimizations.
First, DNN compression algorithms are explored, leveraging quantization and weight pruning techniques. For quantization, intra-layer mixed precision/scheme weight quantization is proposed to boost utilization of heterogeneous FPGA resources and thereby improve FPGA throughput: multiple precisions and/or schemes are assigned at the filter level within each layer, with the same ratio of filters maintained across all layers for each type of quantization assignment. For weight pruning, novel structured and fine-grained sparsity schemes are proposed and obtained with a reweighted regularization pruning algorithm, then incorporated into FPGA acceleration frameworks so that the acceleration rate of sparse models approaches the pruning rate in the number of operations.
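The filter-level mixed-precision assignment described above could be sketched roughly as follows. This is an illustrative toy, not the dissertation's implementation: the function names, the 4-/8-bit choices, and the 50/50 filter ratio are all assumptions for demonstration.

```python
def quantize(weight, bits):
    """Uniform symmetric quantization of one filter's weights (toy model)."""
    levels = 2 ** (bits - 1) - 1
    scale = (max(abs(w) for w in weight) / levels) or 1.0  # avoid div-by-zero
    return [round(w / scale) * scale for w in weight]

def mixed_precision_layer(filters, low_bits=4, high_bits=8, low_ratio=0.5):
    """Assign precisions at the filter level within one layer.

    A fixed fraction of filters (low_ratio) gets the low bit-width and the
    rest the high bit-width; applying the same ratio in every layer mirrors
    the fixed cross-layer ratio described in the abstract.
    """
    n_low = int(len(filters) * low_ratio)
    return ([quantize(f, low_bits) for f in filters[:n_low]] +
            [quantize(f, high_bits) for f in filters[n_low:]])
```

Keeping the ratio identical across layers is what lets heterogeneous FPGA resources (e.g. LUTs vs. DSPs) be partitioned uniformly between the two kinds of quantized filters.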
Second, the hardware implementations are studied, with an automatic DNN acceleration framework that generates DNN accelerators to satisfy a target frame rate (FPS). Unlike previous approaches that start from model compression and then optimize the FPS of the hardware implementation, this framework estimates the FPS with FPGA resource utilization analysis and performance analysis modules: the bit-width is reduced until the target FPS is met, and the mixing ratio of quantization precisions/schemes is determined automatically to guide both the quantization process and the accelerator implementation on hardware. A resource utilization model is developed to overcome the difficulty of estimating LUT consumption, and a novel DNN computing engine is designed with various optimization techniques that support DNN compression, improving computation parallelism and resource utilization efficiency.
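The "reduce the bit-width until the target FPS is met" search could be sketched as the loop below. The performance model here is a deliberately toy assumption (throughput inversely proportional to bit-width); the actual framework uses analytical resource utilization and performance models.

```python
def estimated_fps(bits, base_fps=30.0, max_bits=16):
    """Toy performance model: lower bit-width -> proportionally higher FPS.

    Stands in for the framework's resource utilization and performance
    analysis modules (an assumption, not the dissertation's model).
    """
    return base_fps * max_bits / bits

def choose_bitwidth(target_fps, candidate_bits=(16, 12, 8, 6, 4)):
    """Walk down the candidate bit-widths until the target FPS is met."""
    for bits in candidate_bits:
        if estimated_fps(bits) >= target_fps:
            return bits
    return candidate_bits[-1]  # target unreachable: fall back to lowest precision
```

The chosen bit-width (and, in the full framework, the precision/scheme mixing ratio) then feeds both the quantization of the model and the generated accelerator design.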

Committee:

Prof. Xue Lin (Advisor)
Prof. Miriam Leeser
Prof. Xiaolin Xu

Details

Website:
https://northeastern.zoom.us/j/96514568080

Other

Department
Electrical and Computer Engineering
Topics
MS/PhD Thesis Defense
Audience
Faculty, Staff