Training Recursive Models - A Frontier in Adaptive Compute
Github Repo (see `recursive` branch): https://github.com/TrelisResearch/nanochat
One-click Runpod Template: https://console.runpod.io/deploy?template=ikas3s2cii&ref=jmfkcdio
Original Karpathy Repo: https://github.com/karpathy/nanochat
Wandb logs: https://wandb.ai/trelis/nanochat
Model repo: https://huggingface.co/Trelis/nanochat-recursive

References:
- Recursion & Adaptive Compute: https://arxiv.org/abs/2502.05171
- TRM Paper: https://arxiv.org/abs/2510.04871

Tip: If you subscribe here on YouTube, click the bell to be notified of new vids

Learn more about Trelis ADVANCED-fine-tuning repo: https://trelis.com/advanced-fine-tuning

💡 Done-for-you Custom Fine-tuning Services
Learn More: https://trelis.com/fine-tuning-services/

💸 Starting a New Project/Venture?
Apply for a Trelis Grant: https://trelis.com/trelis-ai-grants/

📧 Get Trelis AI Tutorials by Email
Subscribe on Substack: https://trelis.substack.com

TIMESTAMPS:
0:00 Recursive Nanochat - a comparison with Karpathy’s 500M parameter model
3:07 Benchmark Results on Recursive Nanochat
5:54 What are the benefits of recursive models?
6:52 Recursive Models allow inference on smaller devices and fewer GPUs
8:07 Recursive Models open a pathway to adaptive compute
9:39 Recursive vs Non-recursive Architecture
13:48 How to handle the recursive stream via an adapter
16:49 Training for adaptive compute / recursions - Poisson log-normal recursion sampling
18:14 Handling torch.compile with recursive models
20:07 Implementing adaptive compute (stopping recursions early)
22:14 kv cache strategies for recursive models
24:16 Inference engine (vLLM) implications for recursive models
26:20 Training dynamics of recursive models (Wandb overview), incl. flops utilisation
31:21 Code Review of Trelis/nanochat
32:05 Truncated backpropagation through time
33:56 Recursive loop adapter initialisation
35:25 Dynamic torch compile
37:06 Wrap up