Active Speaker Detection (ASD)
Active Speaker Detection Beyond AVA: Investigating Spurious Correlations and Domain Generalization

In this presentation, I discuss Group 10's research on Active Speaker Detection (ASD), focusing on why models trained on the Google AVA Active Speaker Dataset often fail to generalize to real-world datasets such as the Columbia Active Speaker Dataset.

Research Motivation

Most state-of-the-art ASD models achieve over 90% mAP on AVA, but their performance drops significantly when evaluated on Columbia. This work investigates the root cause of this domain gap and identifies inter-face co-occurrence bias as a major shortcut learned by models trained on AVA.

🔬 Techniques Explored

The presentation covers:
- Baseline LR-ASD and TalkNet architectures
- TalkNCE (supervised contrastive learning)
- Invariant Risk Minimization (IRM)
- CaMIB
- Correlation Independence Regularization (CIR)
- Attention-based contextual modeling

Key Findings

- AVA contains a strong multi-face co-occurrence pattern.
- Models may rely on contextual shortcuts instead of true lip-audio synchrony.
- Explicit regularization alone is often insufficient.
- Architectural exposure to neighboring faces is the primary source of spurious learning.

#IITMandi #DeepLearning #MachineLearning #CS671