Wang Is Part of a Collaborative NSF Grant for Deep Neural Networks


ECE Assistant Professor Yanzhi Wang is the Northeastern lead on a $1.2M NSF grant, in collaboration with Xuehai Qian and Viktor Prasanna of the University of Southern California, to create “FASTLEAP: FPGA based compact Deep Learning Platform.”

Computers Can’t Keep Up with Today’s AI. Yanzhi Wang is Working to Change That.

Until deep neural networks (DNNs) can process information in 20 milliseconds or less, they’re simply not that useful for practical applications of machine learning and artificial intelligence on widely used edge devices, says Yanzhi Wang, assistant professor of electrical and computer engineering. He is the Northeastern University lead on “FASTLEAP,” a project funded by a $1.2M National Science Foundation grant that aims, as the name implies, to get computer processing down to the breakneck speeds needed to make DNNs viable.

A DNN is best described as the “brain” powering AI. Developing beyond its predecessor, the artificial neural network (ANN), this technology can collect and interpret more complex data at speeds comparable to those of the human brain. To demonstrate this, Wang uses the example of Joseph Redmon’s You Only Look Once (YOLO), a multi-object detection system. Just as the human brain is able to immediately identify objects based on their appearance, YOLO surveys a scene and labels each piece of it in real time. Little boxes appear almost immediately around each object as it enters the frame, marked “car,” “motorbike,” “person,” and so on.

The potential applications of DNNs are numerous. For example, Wang says, they may be used to offer smartphone users real-time translation of speech or text in a language unknown to them. DNNs may also be the superior way to ensure that self-driving cars keep passengers and pedestrians safe. While current autonomous driving systems piloted by Tesla and others combine cameras and radar sensors to operate, the object recognition by those cameras is limited compared to that of a system powered by a DNN.

As impressive as the technology is, the processing power currently needed to run these systems is immense, making consumer use of the technology unfeasible. “Most people don’t have the number or power of machines needed to compute that fast,” Wang says. “We have to think of other ways to make machines run this AI to bring it to as many people as possible.” Even something as deceptively simple as YOLO couldn’t possibly run at an acceptable level of speed or accuracy on today’s smartphones—and that’s where Wang’s research comes in.

In collaboration with researchers from the University of Southern California, he’s working to help computers catch up to the ever-growing complexity of DNNs. The answer isn’t to make computers more powerful, but to make their processing more efficient. By optimizing the performance of circuits in the hardware, and both compressing and accelerating the model output of the DNN—essentially, paring down the network’s “thinking” to only the absolute necessities—the FASTLEAP platform will enable the use of DNNs on ordinary consumer systems with limited memory.

Wang says getting computers up to speed is just the beginning. Once consumer electronics are able to utilize DNNs with consistent speed and accuracy, software engineers can begin to explore ways to integrate them into our lives—and who knows what they might come up with? “When the iPhone was released, even Apple had no idea how it would eventually be used,” he says. “It’s the same with this technology. We have no idea what it might make possible in the future.”

Abstract Source: NSF

With the rise of artificial intelligence in recent years, Deep Neural Networks (DNNs) have been widely used because of their high accuracy, excellent scalability, and self-adaptiveness. Many applications employ DNNs as the core technology, such as face detection, speech recognition, and scene parsing. To meet the high accuracy requirements of various applications, DNN models are becoming deeper and larger, and are evolving at a fast pace. They are computation and memory intensive and pose significant challenges to the conventional Von Neumann architecture used in computing. The key problem addressed by the project is how to accelerate deep learning: not only inference, but also training and model compression, which have not received enough attention in prior research. This endeavor has the potential to enable the design of fast and energy-efficient deep learning systems, applications of which are found in our daily lives, ranging from autonomous driving, through mobile devices, to IoT systems, thus benefiting society at large.

The outcome of this project is FASTLEAP, a Field Programmable Gate Array (FPGA)-based platform for accelerating deep learning. The platform takes a dataset as input and outputs a model that is trained, pruned, and mapped onto an FPGA, optimized for fast inference. The project will utilize emerging FPGA technologies that have access to High Bandwidth Memory (HBM) and include floating-point DSP units. From a vertical perspective, FASTLEAP integrates innovations from multiple levels of the whole system stack: algorithm, architecture, and efficient FPGA hardware implementation. From a horizontal perspective, it embraces systematic DNN model compression and associated FPGA-based training, as well as FPGA-based inference acceleration of compressed DNN models. The platform will be delivered as a complete solution, with both the software tool chain and hardware implementation, to ensure ease of use. At the algorithm level of FASTLEAP, the proposed Alternating Direction Method of Multipliers for Neural Networks (ADMM-NN) framework will perform unified weight pruning and quantization, given training data, target accuracy, and target FPGA platform characteristics (performance models, inter-accelerator communication). The training procedure in ADMM-NN is performed on a platform with multiple FPGA accelerators, dictated by the architecture-level optimizations on communication and parallelism. Finally, the optimized FPGA inference design is generated based on the trained DNN model with compression, accounting for FPGA performance modeling. The project will address the following SPX research areas: 1) Algorithms: Bridging the gap between deep learning developments in theory and their system implementations, cognizant of the performance model of the platform. 2) Applications: Scaling of deep learning for domains such as image processing. 3) Architecture and Systems: Automatic generation of deep learning designs on FPGA, optimizing area, energy efficiency, latency, and throughput.
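
For readers curious what "weight pruning and quantization" with ADMM might look like in practice, the following is a minimal Python sketch, not the project's actual ADMM-NN code. It makes several simplifying assumptions: the "loss" is just the squared distance to a small list of dense weights (a real system would use the network's training loss), the sparsity constraint keeps a fixed number of weights, and quantization is a plain uniform rounding step applied afterward. All function names and parameters (`project_sparse`, `admm_prune`, `quantize`, `rho`, `step`, `bits`) are hypothetical and chosen for illustration only.

```python
def project_sparse(weights, k):
    """Euclidean projection onto 'at most k nonzeros': keep the k largest magnitudes."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


def admm_prune(dense, k, rho=1.0, iters=50):
    """Toy ADMM loop for: minimize 0.5 * ||w - dense||^2 subject to w having <= k nonzeros.

    w is the unconstrained copy, z is the copy that satisfies the constraint,
    and u is the scaled dual variable that pulls the two copies together.
    """
    z = project_sparse(dense, k)
    u = [0.0] * len(dense)
    for _ in range(iters):
        # w-step: closed-form minimizer of 0.5*(w - d)^2 + (rho/2)*(w - z + u)^2
        w = [(d + rho * (zi - ui)) / (1.0 + rho) for d, zi, ui in zip(dense, z, u)]
        # z-step: project w + u back onto the sparsity constraint
        z = project_sparse([wi + ui for wi, ui in zip(w, u)], k)
        # dual update: accumulate the remaining disagreement between w and z
        u = [ui + wi - zi for ui, wi, zi in zip(u, w, z)]
    return z


def quantize(weights, step, bits):
    """Uniform quantization: round to a multiple of `step`, clamped to a signed `bits`-bit range."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [max(lo, min(hi, round(w / step))) * step for w in weights]


# Hypothetical example: six dense weights, keep the three largest, then quantize.
dense_weights = [0.9, -0.05, 0.4, 0.02, -0.7, 0.1]
pruned = admm_prune(dense_weights, k=3)
compressed = quantize(pruned, step=0.25, bits=4)
print(pruned)
print(compressed)
```

In this toy setting the small weights are driven to zero and the surviving ones are snapped to a coarse grid, which is the basic reason a compressed model needs less memory and arithmetic on constrained hardware such as an FPGA.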

Related Faculty: Yanzhi Wang

Related Departments: Electrical & Computer Engineering