ECE PhD Dissertation Defense: Joseph Robinson
November 24, 2020 @ 2:00 pm - 3:00 pm
PhD Dissertation Defense: Automatic Face Understanding: Recognizing Families in Photos
Location: Zoom Link
Abstract: Visual kinship recognition has an abundance of practical uses. To support it, we built Families in the Wild (FIW), the largest database for kinship recognition. FIW was built entirely in-house at no cost using a semi-automatic labeling scheme. Specifically, we first aligned faces detected in family photos with names in the corresponding text metadata to mine high-confidence label proposals. The remaining data were labeled using a novel clustering algorithm that used the label proposals as side information to guide more accurate clusters. This yielded great savings in time and human input. Statistically, FIW shows enormous gains over its predecessors. We provide several benchmarks: kinship verification, family classification, tri-subject verification, and large-scale search and retrieval. We also trained CNNs on FIW and deployed the model on the renowned KinWild I and II to achieve state-of-the-art (SOTA) results. Most recently, we further augmented FIW with multimedia (MM) for 200 of its 1,000 families, a labeled collection we dubbed FIW-MM. Now, video dynamics, audio, and text captions can be used in the decision making of kinship recognition systems.
FIW continues to pave the way for this research track. With it, we (1) advanced the SOTA (e.g., with a marginalized denoising auto-encoder based on metric learning that preserves intrinsic structures of kin-data and encapsulates discriminating information in learned features); (2) introduced generative models to predict a child’s appearance from a parent pair (i.e., proposed an adversarial autoencoder conditioned on age and gender to map between facial appearance and these higher-level features for control of age and gender); and (3) designed evaluations with benchmarks to support challenges, workshops, and tutorials at top-tier conferences (e.g., CVPR, MM, FG, ICME), and a premier Kaggle competition. We expect FIW will significantly impact both research and real-world practice.
Additionally, we tackled the classic problem of facial landmark localization in images. This task has been in focus for decades, and many solutions have been proposed. However, with deep networks now prevailing throughout machine learning, there is renewed interest in pushing facial landmark detection technologies to handle more challenging data. A majority of these networks have objectives based on L1 or L2 norms, which have several disadvantages. First, the locations of landmarks are determined from generated heatmaps (i.e., confidence maps), and the predicted landmark locations (i.e., the means) get penalized without accounting for the spread: a high scatter corresponds to low confidence and vice versa. To address this, we introduced a LaplaceKL objective that penalizes low confidence. Another issue is a dependency on labeled data, which is expensive to collect and susceptible to error. We addressed both issues by proposing an adversarial training framework that leverages unlabeled data to improve model performance. Our method achieves SOTA results on renowned benchmarks. Furthermore, our model is robust at a reduced size: with 1/8 the number of channels (i.e., 0.0398 MB), it is comparable to the state of the art while running in real time on a CPU. Thus, our method is of high practical value to real-life applications.
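To make the point about spread concrete, here is a minimal sketch and not the dissertation's implementation. It assumes a heatmap can be summarized per axis by its mean and expected absolute deviation (the Laplace scale), and it compares an L1 penalty on the mean with the closed-form KL divergence between two Laplace distributions. The helper names (`heatmap_moments`, `laplace_kl`, `gaussian_heatmap`), the synthetic heatmaps, the unit target scale, and the direction of the KL term are all assumptions made for illustration.

```python
import numpy as np

def heatmap_moments(heatmap):
    """Return the (x, y) mean and per-axis Laplace scale of a 2-D heatmap."""
    p = heatmap / heatmap.sum()              # normalize to a probability map
    ys, xs = np.indices(p.shape)
    mean = np.array([(p * xs).sum(), (p * ys).sum()])
    # For a Laplace distribution, the scale b is the expected absolute deviation.
    scale = np.array([(p * np.abs(xs - mean[0])).sum(),
                      (p * np.abs(ys - mean[1])).sum()])
    return mean, scale

def laplace_kl(mu_p, b_p, mu_q, b_q):
    """Closed-form KL(Laplace(mu_p, b_p) || Laplace(mu_q, b_q)), per axis."""
    d = np.abs(mu_p - mu_q)
    return np.log(b_q / b_p) + (d + b_p * np.exp(-d / b_p)) / b_q - 1.0

def gaussian_heatmap(size, center, sigma):
    """Synthetic heatmap used only for this demonstration."""
    ys, xs = np.indices((size, size))
    return np.exp(-((xs - center[0]) ** 2 + (ys - center[1]) ** 2) / (2 * sigma ** 2))

target = np.array([18.0, 16.0])              # hypothetical ground-truth landmark (x, y)
for sigma in (1.5, 6.0):                     # sharp vs. diffuse prediction, same peak
    mean, scale = heatmap_moments(gaussian_heatmap(33, (16, 16), sigma))
    l1 = np.abs(mean - target).sum()
    # A target scale of 1.0 is an assumption of this sketch.
    kl = laplace_kl(mean, scale, target, 1.0).sum()
    print(f"sigma={sigma}: L1={l1:.3f}  KL={kl:.3f}")
```

Both synthetic predictions share the same mean, so an L1 penalty on the mean cannot tell them apart, whereas the KL term is larger for the diffuse (low-confidence) heatmap.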
Finally, we built the Balanced Faces in the Wild (BFW) dataset to serve as a proxy to measure bias across ethnicity and gender subgroups, allowing us to characterize face recognition (FR) performance per subgroup. We show that performance is suboptimal when a single score threshold is used to determine whether sample pairs are genuine or imposter. Furthermore, actual performance varies greatly across subgroups from what is reported. Thus, claims of specific error rates only hold for populations matching that of the validation data. We mitigate the imbalanced performance using a novel domain adaptation learning scheme on the facial encodings extracted using SOTA deep nets. Not only does this technique balance performance across subgroups, but it also boosts overall performance. A benefit of the proposed method is that it preserves identity information in facial features while removing demographic knowledge in the lower-dimensional features. The removal of demographic knowledge prevents potential biases from being injected into future decision making. This removal also addresses privacy concerns. We explore qualitatively why this works using hard samples. We also show quantitatively that subgroup classifiers can no longer learn from the encodings mapped by the proposed method.
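The single-threshold observation can be illustrated with a small sketch on synthetic data. It does not use BFW or any real FR model: the two subgroups, their imposter score distributions, and the helper names (`threshold_at_far`, `far`) are hypothetical. The sketch picks one global threshold for a target false-accept rate (FAR) and compares the resulting per-subgroup FAR with what each subgroup gets under its own threshold.

```python
import numpy as np

def threshold_at_far(imposter_scores, target_far):
    """Score threshold above which roughly target_far of imposter pairs fall."""
    return np.quantile(imposter_scores, 1.0 - target_far)

def far(imposter_scores, threshold):
    """False-accept rate: fraction of imposter pairs scoring above the threshold."""
    return float(np.mean(imposter_scores > threshold))

rng = np.random.default_rng(0)
# Hypothetical imposter-pair similarity scores for two subgroups (synthetic).
imposters = {
    "subgroup_a": rng.normal(0.10, 0.10, 50_000),
    "subgroup_b": rng.normal(0.20, 0.10, 50_000),
}
target_far = 0.01

# One threshold chosen on the pooled scores, as a single global operating point.
global_thr = threshold_at_far(np.concatenate(list(imposters.values())), target_far)

for name, scores in imposters.items():
    own_thr = threshold_at_far(scores, target_far)
    print(f"{name}: FAR at global threshold = {far(scores, global_thr):.4f}, "
          f"FAR at own threshold = {far(scores, own_thr):.4f}")
```

Under these assumptions the pooled threshold meets the target FAR only on average: one subgroup falls below the target and the other exceeds it, while per-subgroup thresholds bring each back to the target.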