New research has dropped showing how the Llama model can be drastically shrunk without reducing output quality. The new method can take advantage of specialized hardware and run so much faster than before that Nvidia should be scared.
This video is based on this paper: https://arxiv.org/pdf/2402.17764.pdf
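The paper's core trick is quantizing every weight to one of three values, -1, 0, or +1 (about 1.58 bits per weight), so matrix multiplies reduce to additions and subtractions. Here is a minimal sketch of that idea, assuming the paper's "absmean" scheme (scale by the mean absolute weight, then round and clip); the function name and NumPy implementation are illustrative, not the authors' code.

```python
import numpy as np

def absmean_ternary_quantize(W, eps=1e-5):
    """Quantize a weight matrix to ternary values {-1, 0, +1}."""
    # Scale by the mean absolute value of the matrix (the "absmean" scheme),
    # then round each scaled entry to the nearest integer in [-1, 1].
    gamma = np.mean(np.abs(W)) + eps
    W_q = np.clip(np.round(W / gamma), -1, 1).astype(np.int8)
    return W_q, gamma

W = np.random.randn(4, 4).astype(np.float32)
W_q, gamma = absmean_ternary_quantize(W)
# Every entry of W_q is -1, 0, or +1, so a matmul against W_q needs no
# multiplications at all -- only adds/subtracts, plus one rescale by gamma.
```

Because the weights carry no multiplication work, this layout favors cheap add-heavy hardware over the multiply-accumulate units GPUs are built around, which is the basis of the "Nvidia should be scared" claim.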