ECE PhD Proposal Review: Kunpeng Li

October 27, 2020 @ 3:00 pm - 4:00 pm

PhD Proposal Review: Attention Mechanism in Deep Learning for Visual Recognition

Kunpeng Li

Location: Zoom Link

Abstract: Deep learning models have achieved great success in visual recognition tasks such as image classification, semantic segmentation, and visual-semantic matching. Rather than treating these models as black boxes, researchers have recently put tremendous effort into explaining how they work and into bridging the gap between deep neural networks and human cognition. Visual attention is an effective way to explain a network’s decision: it highlights the image regions responsible for that decision. It is inspired by the attention mechanism of the human visual system, which selectively focuses on the salient features in a visual scene.

This thesis studies visual attention in deep learning for visual recognition. For the first time, we make gradient-based attention maps a natural and explicit component of the training pipeline, so that they are end-to-end trainable. We can then provide guidance on the attention maps and steer the network to focus on the correct regions when learning concepts. Under mild assumptions, our method can be understood as a plug-in to existing convolutional neural networks that improves their generalization performance. In addition, the improved attention maps provide better localization cues for the weakly supervised semantic segmentation task.
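As a rough illustration of what a gradient-based attention map is, the sketch below follows one common formulation (Grad-CAM style): each feature channel is weighted by the spatial average of the class-score gradient, and the weighted sum is passed through a ReLU. This is an assumption made for illustration and not necessarily the formulation used in the thesis; the function name `gradient_attention_map` and the random inputs are hypothetical.

```python
import numpy as np

def gradient_attention_map(features, grads):
    """Grad-CAM-style attention map (illustrative assumption, not the thesis method).

    features: activations of one conv layer, shape (C, H, W)
    grads:    gradient of a class score w.r.t. those activations, shape (C, H, W)
    """
    # One importance weight per channel: spatial mean of its gradient.
    weights = grads.mean(axis=(1, 2))                  # shape (C,)
    # Weighted sum of channels, then ReLU to keep positive evidence only.
    cam = np.tensordot(weights, features, axes=1)      # shape (H, W)
    cam = np.maximum(cam, 0.0)
    # Scale to [0, 1] when the map is not identically zero.
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Hypothetical inputs standing in for real activations and gradients.
rng = np.random.default_rng(0)
features = rng.random((4, 5, 5))
grads = rng.standard_normal((4, 5, 5))
attention = gradient_attention_map(features, grads)
```

Every step here is differentiable almost everywhere, which is the property that would let such a map appear inside a training loss.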

Moving a step toward higher-level visual understanding with natural language, we study the effectiveness of building visual reasoning models on top of bottom-up attention regions, so that the learned visual representations better capture the semantic concepts in their corresponding text captions. Specifically, we first build connections between attention regions and perform reasoning with Graph Convolutional Networks to generate region features with semantic relationships. We then propose using a gate and memory mechanism to perform global semantic reasoning on these relationship-enhanced region features, select the discriminative information, and gradually generate a representation of the whole scene. We evaluate the approach on the MS-COCO and Flickr30K datasets for the image-text matching task.
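The region-level graph reasoning step can be pictured with a minimal sketch of a single graph-convolution layer over region features. The dot-product affinity, the softmax row normalization, and the residual connection are assumptions chosen for illustration, as are the names `region_gcn_layer` and `w_gcn`; the abstract does not specify these details, and the gate and memory stage is not shown.

```python
import numpy as np

def region_gcn_layer(regions, w_gcn):
    """One graph-convolution step over image regions (illustrative sketch).

    regions: region features, shape (N, D)
    w_gcn:   layer weight matrix, shape (D, D); hypothetical parameter
    """
    # Pairwise affinity between regions (assumed: plain dot product).
    affinity = regions @ regions.T                       # shape (N, N)
    # Row-wise softmax, shifted by the row max for numerical stability.
    affinity = affinity - affinity.max(axis=1, keepdims=True)
    adjacency = np.exp(affinity)
    adjacency /= adjacency.sum(axis=1, keepdims=True)
    # Aggregate neighbor features, transform, apply ReLU, add a residual.
    return np.maximum(adjacency @ regions @ w_gcn, 0.0) + regions

# Hypothetical features for 6 regions with 8 dimensions each.
rng = np.random.default_rng(0)
regions = rng.standard_normal((6, 8))
w_gcn = rng.standard_normal((8, 8)) * 0.1
enhanced = region_gcn_layer(regions, w_gcn)
```

Each output row mixes information from all regions in proportion to their affinity, which is one way to obtain relationship-enhanced region features.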

Details

Date: October 27, 2020
Time: 3:00 pm - 4:00 pm
Website: https://northeastern.zoom.us/j/91474933895

Other

Department: Electrical and Computer Engineering
Topics: MS/PhD Thesis Defense