Presented by Wei-Ning Hsu (Meta AI) on February 11, 2022.
Abstract:
Self-supervised learning (SSL) has recently garnered wide interest in the speech community. By leveraging large amounts of easily obtained unlabeled raw data, SSL has the potential to greatly reduce the annotation required to build many speech applications, such as automatic speech recognition (ASR), and even to enable new applications previously thought impossible, such as text-free speech-to-speech translation.
In this talk, I will introduce Hidden unit BERT (HuBERT), a simple and effective self-supervised learning framework for unimodal and multimodal speech that combines iterative acoustic unit discovery with masked prediction. I will then present a series of applications built upon HuBERT, including inference tasks (ASR and the SUPERB benchmark), reconstruction tasks (speech codec), and generative tasks (text-free spoken language modeling and speech-to-speech translation).
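The combination of acoustic unit discovery and masked prediction can be illustrated with a toy sketch in plain Python. This is a minimal illustration under stated assumptions, not the HuBERT implementation. Here, k-means cluster IDs over frame features act as pseudo-label "units", random spans of frames are masked, and cross-entropy is computed only on the masked frames. All function names, the deterministic farthest-point initialization, and the masking parameters are hypothetical choices made for illustration. In HuBERT itself, the first iteration clusters MFCC features, and later iterations re-cluster features taken from an intermediate layer of the model being trained. The Transformer that predicts units from masked input is not shown; uniform logits stand in for its output.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(frames: Sequence[Vector], k: int) -> List[Vector]:
    # Hypothetical deterministic init: start at frames[0], then repeatedly
    # add the frame farthest from all centroids chosen so far.
    centroids = [frames[0]]
    while len(centroids) < k:
        nxt = max(frames, key=lambda f: min(sq_dist(f, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(frames: Sequence[Vector], k: int, iters: int = 10) -> List[Vector]:
    """Plain Lloyd's k-means; the centroids form a toy 'unit inventory'."""
    centroids = farthest_point_init(frames, k)
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for f in frames:
            idx = min(range(k), key=lambda j: sq_dist(f, centroids[j]))
            clusters[idx].append(f)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                dim = len(members[0])
                centroids[j] = tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                )
    return centroids


def assign_units(frames: Sequence[Vector], centroids: Sequence[Vector]) -> List[int]:
    """Map each frame to the ID of its nearest centroid (its pseudo-label)."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(f, centroids[j]))
            for f in frames]


def span_mask(n_frames: int, start_prob: float, span: int, seed: int = 0) -> List[bool]:
    """Each frame starts a masked span of `span` frames with prob `start_prob`."""
    rng = random.Random(seed)
    mask = [False] * n_frames
    for t in range(n_frames):
        if rng.random() < start_prob:
            for u in range(t, min(t + span, n_frames)):
                mask[u] = True
    return mask


def masked_prediction_loss(logits: Sequence[Sequence[float]],
                           targets: Sequence[int],
                           mask: Sequence[bool]) -> float:
    """Mean cross-entropy over masked frames only (0.0 if nothing is masked)."""
    total, count = 0.0, 0
    for row, y, m in zip(logits, targets, mask):
        if not m:
            continue
        mx = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = mx + math.log(sum(math.exp(v - mx) for v in row))
        total += log_z - row[y]
        count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Toy "acoustic features": two well-separated groups of 2-D frames.
    frames = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 4.9), (0.05, 0.05), (4.95, 5.0)]
    units = assign_units(frames, kmeans(frames, k=2))
    mask = span_mask(len(frames), start_prob=0.3, span=2, seed=1)
    logits = [[0.0, 0.0] for _ in frames]  # stand-in for an untrained model
    print(units, mask, masked_prediction_loss(logits, units, mask))
    # In a later iteration, `frames` would be replaced by features taken from
    # the trained model, then re-clustered to produce new targets.
```

Computing the loss only on masked frames forces the model to infer each hidden unit from the surrounding context rather than copying it from the visible input.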
LTI Colloquium: Self-Supervised Learning for Speech