Using Visual Correspondences To See the World in Motion
ECE Associate Professor Xue “Shelley” Lin, in collaboration with Khoury Assistant Professor Huaizu Jiang, was awarded a $600,000 NSF grant for “Toward Efficient and Robust Dynamic Scene Understanding Based on Visual Correspondences.”
Abstract Source: NSF
Finding correspondences is a fundamental problem in computer vision; visual correspondences provide useful cues that help a machine understand its dynamic surroundings much as humans do. For instance, as an agent moves around, it may learn that distant objects such as mountains typically do not appear to move much, whereas nearby buildings and bushes appear to move rapidly as the agent changes position relative to them. Although significant advances have been made in solving various forms of visual correspondence problems, different correspondence models maintain different designs despite their inherent similarity, making effective design principles and learned representations difficult to transfer from one problem to another. In response to this challenge, this project aims to solve disparate visual correspondence problems with a unified model. In doing so, the project will also address two practical aspects of deploying the developed models: scenarios with diverse visual appearance and scenarios with significant resource constraints. These advances are expected to unlock novel applications and improve dynamic scene understanding in areas such as augmented reality, sports broadcasting, sports analytics, and robotics. The project outcomes may also unveil new markets and economic opportunities through solutions that augment the cognitive and physical abilities of users in their daily lives. The team of researchers will actively integrate the proposed research into curriculum development and attract undergraduate researchers to the project. This project is particularly well suited for outreach activities that broaden the participation of underrepresented and K-12 students by connecting abstract technical concepts with tangible research demonstrations.
The project has three tightly connected thrusts, presenting fundamental advances in correspondence determination, in applications of these correspondences, and in making these algorithms efficient and robust in deployment. First, a unified model that solves all the visual correspondence problems, ranging from 2D to 3D, will be developed, taking advantage of recent progress in Transformer models and in self-supervised learning from large-scale unlabeled data. The Transformer model naturally captures correspondences among candidates with less inductive bias, making it a better choice for learning from large-scale data and for improving accuracy in data-poor domains when transferring from data-rich ones. Second, these correspondences will unlock novel applications that advance dynamic scene understanding, particularly slow-motion video synthesis and robotic obstacle avoidance. Finally, the investigators will study mechanisms to improve efficiency and robustness when deploying the models on edge computing devices. The developed algorithms will be rigorously evaluated on standard benchmarks and in real-world deployment on edge devices.
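To make the idea of a Transformer capturing correspondences more concrete, the following is a minimal sketch of attention-based soft matching between two sets of feature vectors. It is an illustration only, not the project's model: the function names (`soft_correspondences`, `hard_matches`), the toy features, and the pixel positions are all hypothetical, and a real system would use learned features and many attention layers.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_correspondences(feats_a, feats_b, positions_b):
    """For each feature in image A, compute attention weights over the
    features of image B and the attention-weighted (expected) position in B.

    Hypothetical helper: scaled dot-product attention with A as queries
    and B as keys, with B's pixel positions acting as the values.
    """
    scale = 1.0 / math.sqrt(len(feats_a[0]))
    results = []
    for fa in feats_a:
        weights = softmax([dot(fa, fb) * scale for fb in feats_b])
        x = sum(w * p[0] for w, p in zip(weights, positions_b))
        y = sum(w * p[1] for w, p in zip(weights, positions_b))
        results.append((weights, (x, y)))
    return results


def hard_matches(feats_a, feats_b, positions_b):
    """Index in B with the highest attention weight for each feature in A."""
    soft = soft_correspondences(feats_a, feats_b, positions_b)
    return [max(range(len(w)), key=w.__getitem__) for w, _ in soft]


# Toy data (assumed for illustration): two points whose order is swapped
# between image A and image B.
feats_a = [[4.0, 0.0], [0.0, 4.0]]
feats_b = [[0.0, 4.0], [4.0, 0.0]]
positions_b = [(10.0, 20.0), (30.0, 40.0)]

for index_a, index_b in enumerate(hard_matches(feats_a, feats_b, positions_b)):
    print(f"A[{index_a}] best matches B[{index_b}]")
```

Because the attention weights form a probability distribution over candidates in the second image, the same mechanism can in principle be read as optical flow, stereo matching, or point tracking, depending on what the two inputs are; that shared structure is one way to think about the unified model the abstract describes.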