Microsoft VibeVoice Tutorial: 90-Minute Multi-Speaker TTS
In this video, I take a first hands-on look at Microsoft VibeVoice, an open-source text-to-speech (TTS, speech synthesis, AI voice) framework designed for expressive, long-form, multi-speaker audio generation. VibeVoice supports coherent speech generation of up to 90 minutes in a single run, multiple speakers, and real-time streaming input. In December 2025, Microsoft released several important updates, including the open-sourcing of VibeVoice Realtime 0.5B and the addition of new experimental multilingual and English style voices.

00:00 Intro
00:50 GitHub Repo
05:00 Audio Samples on GitHub Page
06:24 Hugging Face README
07:35 Using Google Colab for TTS
10:20 Generate test audio samples
12:40 Local Installation
16:20 How to use multiple speakers in a text script
18:00 Summary and Outro

* https://github.com/microsoft/VibeVoice/
* https://microsoft.github.io/VibeVoice/
* https://huggingface.co/microsoft/VibeVoice-Realtime-0.5B
* https://colab.research.google.com/github/microsoft/VibeVoice/blob/main/demo/vibevoice_realtime_colab.ipynb
* https://github.com/microsoft/VibeVoice/issues/53
* https://github.com/microsoft/VibeVoice/blob/main/docs/vibevoice-realtime-0.5b.md#usages

---

- https://www.Thorsten-Voice.de
- https://github.com/thorstenMueller/Thorsten-Voice/