Second-order pruning methods enable higher sparsity while maintaining accuracy by removing the weights that affect the loss function the least. The end result is a sparse model with a much smaller file size, lower latency, and higher throughput. For example, using second-order pruning algorithms, a ResNet-50 image classification model can be pruned to 95% sparsity while retaining 99% of the baseline accuracy, shrinking the model file from 90.8 MB to 9.3 MB.
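To build intuition for "removing the weights that affect the loss the least," here is a minimal sketch of the classic second-order saliency score from Optimal Brain Surgeon, rho_i = w_i^2 / (2 [H^-1]_ii). It assumes a diagonal Hessian approximation for simplicity; the function names and toy numbers are illustrative, not the algorithm covered in the talk.

```python
import numpy as np

def obs_saliency(weights, hessian_diag):
    """Second-order saliency rho_i = w_i^2 / (2 * [H^-1]_ii).

    With a diagonal Hessian, [H^-1]_ii = 1 / H_ii, so this reduces to
    w_i^2 * H_ii / 2. Lower saliency = removing the weight perturbs
    the loss less.
    """
    return 0.5 * weights**2 * hessian_diag

def prune_by_saliency(weights, hessian_diag, sparsity):
    """Zero out the fraction `sparsity` of weights with the lowest saliency."""
    scores = obs_saliency(weights, hessian_diag)
    k = int(sparsity * weights.size)
    idx = np.argsort(scores)[:k]  # indices of the k least-important weights
    pruned = weights.copy()
    pruned[idx] = 0.0
    return pruned

# Toy example: 4 weights with a (made-up) diagonal Hessian estimate.
w = np.array([0.5, -0.1, 2.0, 0.05])
h = np.array([1.0, 4.0, 0.5, 0.2])
print(prune_by_saliency(w, h, sparsity=0.5))  # the two lowest-saliency weights are zeroed
```

Unlike magnitude pruning, which ranks weights by |w_i| alone, the second-order score also accounts for the local curvature of the loss, so a small weight sitting in a high-curvature direction can be kept while a larger weight in a flat direction is removed.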
In this video, we walk through the research, production results, and intuition behind second-order pruning algorithms, and show how to apply them to your current ML projects for state-of-the-art model compression.
Speaker: Eldar Kurtić, Research Consultant, Neural Magic
If you have any questions, join us in the Neural Magic Slack community: https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ