Neural Network Modules for Computer Vision Systems
ECE Professors Octavia Camps and Mario Sznaier were awarded a $500K NSF grant for "Dynamic and Statistical Based Invariants on Manifolds for Video Analysis."
Abstract Source: NSF
Computer vision systems can benefit society in many ways. For example, spatially distributed vision sensors endowed with activity analysis capabilities can prevent crime, help optimize resource use in smart buildings, and give early warning of serious medical conditions. The most powerful computer vision systems employ an approach called "deep learning", in which simulated networks of neurons transform the input video pixels into high-level concepts. In the crime-prevention example, the high-level concept might be "someone breaking into a building". A major impediment to building computer vision systems is that great expertise and trial-and-error are required for a programmer to design a neural network that can teach itself to recognize the goal concepts. This project will reduce this barrier by creating a set of well-designed neural network modules, or "layers", that a programmer can snap together to build a working computer vision system. Education is proactively integrated into this project, starting with STEM summer camp projects for urban middle school students and continuing at the college level with a multi-disciplinary program that uses the grand challenge of aware environments to link a range of distinct subjects, from computer vision and machine learning to systems theory and optimization. At the graduate level, these activities are complemented by recruitment efforts that leverage the resources at Northeastern's University Program in Multicultural Engineering to broaden the participation of underrepresented groups in research.
Computer vision has made tremendous progress in the era of deep learning. However, training deep architectures requires learning the optimal values of a very large number of parameters through the numerical minimization of a non-convex loss function. While using stochastic gradient descent to solve this problem often "works" in practice, the analysis of what the network learned, or why it failed to learn, remains an a-posteriori task requiring visualization tools to inspect which neurons are firing and possibly to look at intermediate results. This research seeks to address this issue by incorporating into current deep architectures a set of structured layers, designed using dynamical systems theory and statistics fundamentals, which capture spatio-temporal information across multiple scales. At its core is a unified vision, invariants on latent space manifolds as information encapsulators, that emphasizes robustness and computational complexity issues. Advantages of the proposed layers include the ability to easily understand what they learn, since they are based on first principles; shallower networks with fewer parameters to learn, due to the high expressive power of the new layers; and a reduced need for annotated data, by providing efficient ways to transfer knowledge between domains and to synthesize realistic data.
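To make the training difficulty described above concrete, here is a minimal sketch of stochastic gradient descent on a non-convex loss. It is not taken from the project: the one-parameter loss, the Gaussian noise standing in for minibatch sampling, and the names `loss`, `grad`, and `sgd` are all illustrative assumptions. The point is only that the same procedure can settle in different minima depending on where it starts.

```python
import random


def loss(w):
    # Hypothetical one-parameter non-convex loss with two local minima.
    return w**4 - 3 * w**2 + w


def grad(w):
    # Exact derivative of the loss above.
    return 4 * w**3 - 6 * w + 1


def sgd(w0, lr=0.01, steps=2000, noise_std=0.5, seed=0):
    # Gaussian noise added to the gradient is an assumption that stands in
    # for the randomness of minibatch sampling in real training.
    rng = random.Random(seed)
    w = w0
    for _ in range(steps):
        noisy_grad = grad(w) + rng.gauss(0.0, noise_std)
        w -= lr * noisy_grad
    return w


# Same algorithm, two starting points, two different answers.
w_left = sgd(-2.0)
w_right = sgd(2.0)
print("start -2.0: w =", round(w_left, 2), "loss =", round(loss(w_left), 2))
print("start +2.0: w =", round(w_right, 2), "loss =", round(loss(w_right), 2))
```

In this toy setting both runs converge, but only the run started on the left reaches the lower of the two minima. With a single parameter the cause is easy to see by plotting the loss; with millions of parameters it is not, which is why the abstract describes the analysis as an a-posteriori task.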
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.