How Vision LLMs Reasoning: Inside LLaVa CoT

Name: How Vision LLMs Reasoning: Inside LLaVa CoT
Uploaded: Dec 10, 2024
Duration: 2234 s

Oxen (6.91K subscribers)

515 views

Here we go into the data and training of LLaVa-CoT, including multiple datasets, synthetic data generation, and inference-time scaling.

--
Image-CoT-1m Repo https://www.oxen.ai/datasets/Image-CoT-1m
Visual LLMs Repo https://www.oxen.ai/collections/datasets/visual-llms
Paper 📜 https://arxiv.org/abs/2411.10440
Links, Data, + Notes 📝 https://www.oxen.ai/blog/llava-cot-let-vision-language-models-reason-step-by-step-2
Join Arxiv Dives 🤿 https://oxen.ai/community
Discord 🗿 https://discord.com/invite/s3tBEn7Ptg

--
Oxen AI 🐂 https://oxen.ai/
Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI.

--
Chapters
0:00 Intro
1:00 Overview of VLLMs
2:20 Why VLLMs Need Reasoning
3:19 LLaVa Chain of Thought
5:16 Synthetic Data Generation
9:19 Generating Datasets
10:35 Where to find the Datasets
11:39 How we Generated the Synthetic Data
17:57 Questions
21:54 What is Inference-Time Scaling?
32:50 Model Training
