One Click Templates Repo (free): https://github.com/TrelisResearch/one-click-llms
Advanced Inference Repo (Paid Lifetime Membership): https://trelis.com/enterprise-server-api-and-inference-guide/
Affiliate Links (support the channel):
- Vast AI - https://cloud.vast.ai/?ref_id=98762
- Runpod - https://tinyurl.com/4b6ecbbn
LLM Updates Newsletter: Trelis.Substack.com
Chapters
0:00 Faster inference with Speculative Decoding
0:22 Video Overview
1:09 How speculative decoding works
5:45 Naive speculative decoding
7:16 Prompt-based n-gram speculation
9:31 Lookahead decoding
11:48 Assisted decoding
13:41 Summary of Decoding Techniques
15:20 Performance Testing
32:28 Summary of Results
34:24 Tips for faster inference