[Open DMQA Seminar] Multimodal Learning

Name: [Open DMQA Seminar] Multimodal Learning
Uploaded: Nov 7, 2021
Duration: 1814 s

김성범 [Professor / School of Industrial and Management Engineering] · 28.8K subscribers

7.3K views

Recent advances in deep learning algorithms and computing power have produced strong classification and recognition performance across data types such as vision, text, and audio. However, for problems such as human activity recognition and emotion recognition, a more precise analysis requires using several data types together (multimodal data), such as video, audio, and text. This seminar surveys the research trends in multimodal learning and introduces how recent methods are trained and how they merge the features of each data type.

References:
[1] Hou, J. C., Wang, S. S., Lai, Y. H., Tsao, Y., Chang, H. W., & Wang, H. M. (2018). Audio-visual speech enhancement using multimodal deep convolutional neural networks. IEEE Transactions on Emerging Topics in Computational Intelligence, 2(2), 117-128.
[2] Rastgoo, M. N., Nakisa, B., Maire, F., Rakotonirainy, A., & Chandran, V. (2019). Automatic driver stress level classification using multimodal deep learning. Expert Systems with Applications, 138, 112793.
[3] Ma, Y., Hao, Y., Chen, M., Chen, J., Lu, P., & Košir, A. (2019). Audio-visual emotion fusion (AVEF): A deep efficient weighted approach. Information Fusion, 46, 184-192.
[4] Zadeh, A., Chen, M., Poria, S., Cambria, E., & Morency, L. P. (2017). Tensor fusion network for multimodal sentiment analysis. arXiv preprint arXiv:1707.07250.
[5] Akbari, H., Yuan, L., Qian, R., Chuang, W. H., Chang, S. F., Cui, Y., & Gong, B. (2021). VATT: Transformers for multimodal self-supervised learning from raw video, audio and text. arXiv preprint arXiv:2104.11178.
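
The description mentions merging the features of each data type. As a rough illustration only, and not code from the seminar, the sketch below contrasts two common ways to combine per-modality feature vectors: plain concatenation, and an outer-product fusion loosely in the spirit of the tensor fusion idea named in the reference list. The function names and the toy `video`, `audio`, and `text` vectors are hypothetical, and real systems would learn these embeddings with neural networks rather than hard-code them.

```python
from itertools import product


def concat_fusion(*features):
    """Concatenation fusion: join the per-modality vectors end to end."""
    fused = []
    for vec in features:
        fused.extend(vec)
    return fused


def outer_product_fusion(*features):
    """Outer-product fusion: append a constant 1 to each vector, then take
    the product of one entry per modality for every combination, flattened.

    The appended 1 means the result keeps the single-modality terms as well
    as the pairwise and three-way interaction terms.
    """
    padded = [list(vec) + [1.0] for vec in features]
    fused = []
    for combo in product(*padded):
        value = 1.0
        for x in combo:
            value *= x
        fused.append(value)
    return fused


# Hypothetical toy embeddings, one per modality (assumed values).
video = [0.5, 2.0]
audio = [1.0, 3.0]
text = [4.0]

concat = concat_fusion(video, audio, text)
tensor = outer_product_fusion(video, audio, text)

print(len(concat))  # size grows as the sum of the input sizes
print(len(tensor))  # size grows as the product of (input size + 1)
```

Concatenation keeps the fused vector small but leaves any cross-modality interaction for later layers to learn, while the outer product models those interactions explicitly at the cost of a vector whose size grows multiplicatively with the number and size of the modalities.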
