Inside OpenCoder: The Data and Cookbook

581 views
Dec 24, 2024
54:45

Paper 📜 https://arxiv.org/abs/2411.04905
OpenCoder Data 🧠 https://www.oxen.ai/OpenCoder-LLM/opc-sft-stage1
Links + Notes 📝 https://www.oxen.ai/blog/opencoder-the-open-cookbook-for-top-tier-code-llms
Join Arxiv Dives 🤿 https://oxen.ai/community
Discord 🗿 https://discord.com/invite/s3tBEn7Ptg

--

Use Oxen AI 🐂 https://oxen.ai/
Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI.

--

Chapters
0:00 Intro
2:19 OpenCoder
8:43 OpenCoder Goals
10:27 Pre-Training Data
11:07 RefineCode
12:41 Raw Code for Pre-Training
13:20 Data Preprocessing
14:05 Data Deduplication
17:05 How Data Deduplication Improved OpenCoder
18:12 Data Transformation
19:05 Data Filtering
31:58 Sampling
36:36 Code-Related Data
39:53 Post Training
48:49 The Two Stages of Instruct Tuning
51:20 Evaluation
53:47 Conclusion & Future Work

