Activity Speaker Detection ASD

Name: Activity Speaker Detection ASD
Uploaded: May 14, 2026
Duration: 1738 s

Vikash Kumar

8 subscribers

16 views

Active Speaker Detection Beyond AVA: Investigating Spurious Correlations and Domain Generalization

In this presentation, I discuss Group 10's research on Active Speaker Detection (ASD), focusing on why models trained on the Google AVA Active Speaker Dataset often fail to generalize to real-world datasets such as the Columbia Active Speaker Dataset.

Research Motivation
Most state-of-the-art ASD models achieve over 90% mAP on AVA, but their performance drops significantly when they are evaluated on Columbia. This work investigates the root cause of this domain gap and identifies inter-face co-occurrence bias as a major shortcut learned by models trained on AVA.

🔬 Techniques Explored
The presentation covers:
- Baseline LR-ASD and TalkNet architectures
- TalkNCE (supervised contrastive learning)
- Invariant Risk Minimization (IRM)
- CaMIB
- Correlation Independence Regularization (CIR)
- Attention-based contextual modeling

Key Findings
- AVA contains a strong multi-face co-occurrence pattern.
- Models may rely on contextual shortcuts instead of true lip-audio synchrony.
- Explicit regularization alone is often insufficient.
- Architectural exposure to neighboring faces is the primary source of spurious learning.

#IITMandi #DeepLearning #MachineLearning #CS671
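For readers unfamiliar with Invariant Risk Minimization, one of the techniques listed above, here is a minimal sketch of the commonly used IRMv1 penalty for a binary speaking/not-speaking classifier. It is not the presentation's implementation. The environment names (`single_face`, `multi_face`), the toy logits and labels, and the `penalty_weight` value are all assumptions chosen for illustration. The penalty is the squared gradient of each environment's risk with respect to a dummy scale `w` on the logits, evaluated at `w = 1`. For binary cross-entropy that gradient has the closed form `mean((sigmoid(z) - y) * z)`.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_with_logits(logits, labels):
    # Mean binary cross-entropy: softplus(z) - y * z, in a stable form.
    total = 0.0
    for z, y in zip(logits, labels):
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - y * z
    return total / len(logits)


def irm_penalty(logits, labels):
    # IRMv1 penalty: squared gradient of the environment risk with respect
    # to a dummy scale w applied to the logits, evaluated at w = 1.
    grad = sum((sigmoid(z) - y) * z for z, y in zip(logits, labels)) / len(logits)
    return grad * grad


def irm_objective(envs, penalty_weight):
    # envs maps an environment name to a (logits, labels) pair.
    risk = sum(bce_with_logits(z, y) for z, y in envs.values())
    penalty = sum(irm_penalty(z, y) for z, y in envs.values())
    return risk + penalty_weight * penalty


if __name__ == "__main__":
    # Hypothetical environments: scenes with one face vs. several faces.
    envs = {
        "single_face": ([2.1, -1.3, 0.4], [1, 0, 1]),
        "multi_face": ([3.0, 2.5, -0.2], [1, 0, 0]),
    }
    print(irm_objective(envs, penalty_weight=10.0))
```

In practice the logits would come from the ASD model and the gradient would be computed by an autodiff framework. The closed form is used here only to keep the sketch dependency-free.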
