This is Lecture 24 of our series on prosody. For synthesized speech, proper prosody is not just a nice-to-have but essential for good intelligibility. We survey 40 years of work, including recent deep learning approaches whose output can be better than human speech for some purposes. Nevertheless, more work is needed to attain controllability, pragmatic expressiveness, and effectiveness for dialog applications.
00:00 Prosody for Intelligibility
01:48 Rule-Based Synthesizers
02:44 Statistical Synthesis with Hidden Markov Models
03:30 End-to-End Speech Synthesizers
04:42 Machine Learning Process Overview
05:33 Loss Functions and Synthesis Quality
06:10 Variant Architectures
06:53 Limitations of the State of the Art
07:34 Beyond Text-to-Speech
08:01 Toward Synthesis for Dialog Applications
09:04 Toward Better Models through Disentanglement