ECE PhD Proposal Review: Xiaolong Ma

July 7, 2021 @ 2:00 pm - 3:00 pm

PhD Proposal Review: Towards Efficient Deep Neural Network Execution with Model Compression and Platform-specific Optimization

Xiaolong Ma

Location: Zoom

Abstract: Deep learning, or deep neural networks (DNNs), has become the fundamental element and core enabler of ubiquitous artificial intelligence. Recently, with the emergence of a spectrum of high-end mobile devices, many deep learning applications that formerly required desktop-level computation capability are being transferred to these devices. However, executing DNN inference on them remains challenging because of its high computation and storage demands, especially when real-time performance with high accuracy is needed. Weight pruning of DNNs has been proposed to reduce these demands, but existing schemes represent two extremes in the design space: non-structured pruning is fine-grained and accurate but not hardware-friendly, while structured pruning is coarse-grained and hardware-efficient but incurs higher accuracy loss. To solve this problem, we propose a compression-compilation co-optimization framework, which includes 1) a new dimension of pruning, fine-grained pruning patterns inside the coarse-grained structures, which improves accuracy while preserving the structural regularity that can be leveraged for hardware acceleration, 2) a pattern-aware pruning framework that performs pattern library extraction, pattern selection, pattern and connectivity pruning, and weight training simultaneously, and 3) a set of thorough architecture-aware compiler and code-generation optimizations, i.e., filter kernel reordering, compressed weight storage, register load redundancy elimination, and parameter auto-tuning, for real-time execution of mainstream DNN applications on mobile platforms. Evaluation results demonstrate that our framework outperforms three state-of-the-art end-to-end DNN frameworks, TensorFlow Lite, TVM, and Alibaba Mobile Neural Network, with speedups of up to 44.5x, 11.4x, and 7.1x, respectively, with no accuracy compromise. Real-time inference of representative large-scale DNNs (e.g., VGG-16, ResNet-50) can be achieved on mobile devices.
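To make the idea of pattern and connectivity pruning more concrete, the following is a minimal NumPy sketch and not the framework described in the abstract. It assumes 3x3 convolution kernels, a small hand-written pattern library in which every pattern keeps four weights, and a simple magnitude-based selection rule. The names `PATTERNS`, `pattern_prune`, and `connectivity_prune` are hypothetical, and the sketch omits pattern library extraction, weight training, and all compiler-level optimizations.

```python
import numpy as np

# Hypothetical pattern library (an assumption for illustration):
# each 3x3 binary mask keeps exactly 4 of the 9 kernel weights.
PATTERNS = np.array([
    [[0, 1, 0], [1, 1, 0], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 1], [0, 1, 0]],
    [[0, 1, 0], [1, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 1], [0, 1, 0]],
], dtype=np.float32)


def pattern_prune(weights, patterns=PATTERNS):
    """Apply one library pattern to every 3x3 kernel.

    weights: array of shape (out_channels, in_channels, 3, 3).
    Returns the masked weights and the chosen pattern index per kernel.
    """
    # energy[o, i, p] = sum of squared weights that pattern p would keep.
    energy = np.einsum("oihw,phw->oip", weights ** 2, patterns)
    ids = energy.argmax(axis=-1)   # pattern preserving the most energy
    masks = patterns[ids]          # shape (out_channels, in_channels, 3, 3)
    return weights * masks, ids


def connectivity_prune(weights, keep_ratio):
    """Zero out entire kernels, keeping those with the largest L2 norm."""
    norms = np.sqrt((weights ** 2).sum(axis=(2, 3)))   # one norm per kernel
    k = max(1, int(round(keep_ratio * norms.size)))
    threshold = np.sort(norms.ravel())[-k]             # k-th largest norm
    keep = norms >= threshold
    return weights * keep[:, :, None, None], keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((8, 4, 3, 3))
    pruned, ids = pattern_prune(w)
    pruned, keep = connectivity_prune(pruned, keep_ratio=0.5)
    print("kernels kept:", int(keep.sum()), "of", keep.size)
    print("overall sparsity:", 1.0 - np.count_nonzero(pruned) / pruned.size)
```

Because every surviving kernel follows one of a few fixed shapes, a compiler can in principle group kernels with the same pattern and generate specialized code for each group, which is the kind of structural regularity the abstract refers to.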

Details

Website:
https://northeastern.zoom.us/j/97213420494#success

Other

Department
Electrical and Computer Engineering
Topics
MS/PhD Thesis Defense