Audio Machine Learning: Waveforms, Spectrograms and Feature Extraction
Audio machine learning; waveforms vs spectrograms; mel spectrograms; STFT; MFCC; audio signal processing; speech recognition and feature extraction for AI models

In this Tech Experts Webinar, Mateusz Szymański, Senior Machine Learning Engineer, explains how different audio representations shape the behavior and performance of machine learning models.

🔉 The talk walks through the two most common approaches, raw waveforms and spectrograms, and discusses how representation choice affects model quality, training cost, reconstruction ability, and downstream task performance.

🔊 The session also discusses where large multimodal models fit into audio processing workflows, and where specialized audio models still make more sense.

If you have questions for Mateusz, feel free to comment below. 💭

🔗 Check out our website: https://deepsense.ai/?utm_source=YouTube&utm_medium=Video_14_05_2026&utm_campaign=Opis
🔗 LinkedIn: https://www.linkedin.com/showcase/applied-ai-insider

00:00 Audio representations in ML
02:01 Why representation choice matters for audio AI
05:31 Understanding sound, waveforms and Fourier transform
07:53 Working with raw waveforms for audio models
12:04 Spectrograms, STFT and frequency-time trade-offs
17:13 Practical guide: choosing waveforms vs spectrograms

#AudioAI #MachineLearning #AudioProcessing #DeepLearning
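For readers who want to try the frequency-time trade-off mentioned in the chapter list, here is a minimal sketch that is not taken from the talk. It assumes NumPy only, a synthetic 440 Hz tone at a 16 kHz sample rate, and a hypothetical helper named stft_magnitude; the window sizes are arbitrary illustrations, not recommendations from the speaker.

```python
import numpy as np

def stft_magnitude(signal, frame_length, hop_length):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform.

    Hypothetical helper for illustration; frames that do not fit fully
    inside the signal are dropped (no padding).
    """
    window = np.hanning(frame_length)
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    frames = np.stack([
        signal[i * hop_length : i * hop_length + frame_length] * window
        for i in range(n_frames)
    ])
    # rfft keeps the non-negative frequencies: frame_length // 2 + 1 bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (freq_bins, time_frames)

# Assumed test signal: one second of a 440 Hz sine sampled at 16 kHz.
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
waveform = np.sin(2 * np.pi * 440.0 * t)

# Short window: many time frames, coarse frequency bins (62.5 Hz apart).
fine_time = stft_magnitude(waveform, frame_length=256, hop_length=128)

# Long window: few time frames, fine frequency bins (about 7.8 Hz apart).
fine_freq = stft_magnitude(waveform, frame_length=2048, hop_length=1024)

for name, spec, frame_length in [("short window", fine_time, 256),
                                 ("long window", fine_freq, 2048)]:
    bin_width_hz = sample_rate / frame_length
    peak_bin = int(np.argmax(spec.mean(axis=1)))
    print(f"{name}: shape={spec.shape}, bin width={bin_width_hz:.2f} Hz, "
          f"peak near {peak_bin * bin_width_hz:.1f} Hz")
```

The same waveform yields two differently shaped spectrograms: the short window localizes events in time but smears frequency, while the long window does the opposite. Production code would more likely rely on an audio library's STFT and mel filterbank routines rather than a hand-rolled loop.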