ECE PhD Proposal Review: Wenqian Liu
October 21, 2020 @ 9:00 am - 10:00 am
PhD Proposal Review: Explainable Efficient Models for Computer Vision Applications
Location: Zoom Link
Abstract: State-of-the-art deep learning models, such as Convolutional Neural Networks (CNNs) and generative models, achieve impressive results, but their performance comes with great complexity and opacity: huge parameter spaces and little explainability. The criticality of model explainability and output interpretability is clearest in real-time decision-making processes and human-centred applications, such as healthcare, security, and insurance.
This thesis tackles explainability and interpretability both as intrinsic qualities of the model architecture and as post-hoc improvements to existing models.
In the area of frame prediction in video sequences, we introduce DYAN, a novel network with very few parameters that is easy to train, produces accurate, high-quality frame predictions, and is more compact than previous approaches. Another key aspect of DYAN is interpretability: its encoder-decoder architecture is designed following concepts from systems identification theory and exploits the dynamics-based invariants of the data. We also introduce KW-DYAN, an extension of DYAN that tackles the problem of time lagging in video prediction. It implements a novel way of quantifying prediction timeliness and proposes a new recurrent network for adaptive temporal sequence prediction, employing a warping module to reduce dynamic changes and a Kalman filtering module to detect dynamic changes in video frames. Experimental results show reduced lagging on the Caltech and UCF datasets, along with strong performance on other commonly used metrics.
In the area of image classification, categorization, and scene understanding, we observe that techniques such as gradient-based visual attention have driven much recent effort in using visual attention maps as a means of visually explaining CNNs, with impressive results, but that these techniques do not extend as effectively to explaining generative models, e.g., Variational Autoencoders (VAEs). In this thesis we bridge this crucial gap and propose the first technique to visually explain VAEs by means of gradient-based attention, with methods to generate visual attention maps from the learned latent space. We also demonstrate that such attention explanations serve more than just explaining VAEs: we show how these attention maps can be used to localize anomalies in images, achieving state-of-the-art performance on the MVTec-AD dataset, and how they can be infused into model training to help bootstrap VAEs into learning a disentangled latent space, as demonstrated on the dSprites dataset.
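For readers unfamiliar with gradient-based attention, the core weighting step can be sketched in a few lines of numpy. This is a minimal, Grad-CAM-style illustration under stated assumptions: we are given a stack of feature maps and the gradients of a scalar (e.g., a latent variable) with respect to them, and the toy example computes those gradients analytically for a linear mapping. The function name and setup are illustrative, not the thesis's actual method for VAEs.

```python
import numpy as np

def gradient_attention_map(features, grads):
    """Grad-CAM-style attention (illustrative sketch).

    features: (C, H, W) feature maps.
    grads:    (C, H, W) gradients of a scalar target w.r.t. the features.
    Channel weights are the spatially averaged gradients; the map is the
    ReLU of the weighted channel sum, normalized to [0, 1].
    """
    alphas = grads.mean(axis=(1, 2))                 # per-channel weights (C,)
    cam = np.tensordot(alphas, features, axes=1)     # weighted sum -> (H, W)
    cam = np.maximum(cam, 0.0)                       # keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                        # normalize to [0, 1]
    return cam

# Toy example: scalar target z = sum_c w_c * features_c (summed spatially),
# so the gradient of z w.r.t. features is w_c broadcast over each channel.
w = np.array([1.0, -0.5, 0.25])
features = np.random.default_rng(0).random((3, 8, 8))
grads = np.broadcast_to(w[:, None, None], features.shape)
attention = gradient_attention_map(features, grads)
```

In practice the gradients would come from automatic differentiation through the encoder, and for a VAE the scalar target would be derived from the learned latent space rather than a class score.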