How Vision LLMs Reason: Inside LLaVa-CoT
Here we go into the data and training of LLaVa-CoT, including its multiple datasets, synthetic data generation, and inference-time scaling.
--
Image-CoT-1m Repo https://www.oxen.ai/datasets/Image-CoT-1m
Visual LLMs Repo https://www.oxen.ai/collections/datasets/visual-llms
Paper 📜 https://arxiv.org/abs/2411.10440
Links, Data, + Notes 📝 https://www.oxen.ai/blog/llava-cot-let-vision-language-models-reason-step-by-step-2
Join Arxiv Dives 🤿 https://oxen.ai/community
Discord 🗿 https://discord.com/invite/s3tBEn7Ptg
--
Oxen AI 🐂 https://oxen.ai/
Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI.
--
Chapters
0:00 Intro
1:00 Overview of VLLMs
2:20 Why VLLMs Need Reasoning
3:19 LLaVa Chain of Thought
5:16 Synthetic Data Generation
9:19 Generating Datasets
10:35 Where to Find the Datasets
11:39 How We Generated the Synthetic Data
17:57 Questions
21:54 What is Inference-Time Scaling?
32:50 Model Training