BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Northeastern University College of Engineering - ECPv6.15.20//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-ORIGINAL-URL:https://coe.northeastern.edu
X-WR-CALDESC:Events for Northeastern University College of Engineering
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20230312T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20231105T060000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20240310T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20241103T060000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20250309T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20251102T060000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20240819T100000
DTEND;TZID=America/New_York:20240819T110000
DTSTAMP:20260518T130823
CREATED:20240820T141408Z
LAST-MODIFIED:20240820T141408Z
UID:45127-1724061600-1724065200@coe.northeastern.edu
SUMMARY:Yuexi Zhang PhD Dissertation Defense
DESCRIPTION:Name:\nYuexi Zhang \nTitle:\nHuman Action and Event Detection by Leveraging Multi-modality Techniques \nDate:\n8/19/2024 \nTime:\n10:00:00 AM \nCommittee Members:\nProf. Octavia Camps (Advisor) \nProf. Mario Sznaier \nProf. Sarah Ostadabbas \nAbstract:\nHuman action and event analysis with multiple modalities has emerged as a critical area of research in computer vision and machine learning\, driven by the need to understand complex human behaviors in diverse environments. \nA significant advantage of multi-modal analysis is its application to cross-view action recognition\, where activities are observed from different viewpoints. To tackle this problem\, we propose a flexible framework that integrates diverse modalities (RGB pixels\, 2D/3D key points\, etc.) to overcome the limitations of single-modal approaches. It consists of two branches: a Dynamic Invariant Representation (DIR) branch that identifies view-invariant properties from key-point trajectories\, and a Context Invariant Representation (CIR) branch that captures pixel-level view-invariant features. In addition\, our approach leverages contrastive learning to improve recognition accuracy\, enabling the model to learn more discriminative and view-invariant features by contrasting positive pairs against negative pairs. The fusion of multi-modal data\, coupled with contrastive learning\, leads to improved accuracy in recognizing actions across various views and environments. Extensive experiments demonstrate the effectiveness of our approach on diverse modalities. Furthermore\, another promising application of multi-modal techniques is zero-shot action detection\, which aims to recognize actions that the model has not been explicitly trained on. \nRecently\, with the rapid development of language models\, leveraging LLMs in this context has shown significant potential\, as these models can bridge the gap between seen and unseen actions by understanding and generalizing from textual descriptions. To further explore this problem\, we propose a transformer encoder-decoder architecture with global and local text prompts\, which allows the model to infer the characteristics of unseen actions from different textual attributes. We evaluate our approach on several benchmarks to demonstrate its advantages.
URL:https://coe.northeastern.edu/event/yuexi-zhang-phd-dissertation-defense/
END:VEVENT
END:VCALENDAR