Discussion about this post

Grigory Sapunov:

That's a good point, thanks!

16 bits may be way too much; many new models are quantized down to almost 1 bit, so roughly an order of magnitude of savings may lie here.

It's a pity the lottery ticket hypothesis has not yet led to significant advances. There might be a good combination of distillation + sparsity + quantization.
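The order-of-magnitude claim can be checked with back-of-the-envelope arithmetic. A minimal sketch, assuming a hypothetical 7B-parameter model (the parameter count and the `weight_memory_gb` helper are my own illustration, not from the post); ~1.58 bits per weight corresponds to ternary quantization in the style of BitNet b1.58:

```python
def weight_memory_gb(n_params: int, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (ignores scales/metadata overhead)."""
    return n_params * bits_per_weight / 8 / 1e9

n = 7_000_000_000  # hypothetical 7B-parameter model
fp16 = weight_memory_gb(n, 16)       # 16-bit weights: 14.0 GB
ternary = weight_memory_gb(n, 1.58)  # ~1.58-bit ternary weights: ~1.4 GB
print(f"fp16: {fp16:.1f} GB, ternary: {ternary:.2f} GB, "
      f"ratio: {fp16 / ternary:.1f}x")
```

Going from 16 bits down to ~1.58 bits per weight cuts storage by about 10x, which is the order of magnitude the comment points at.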
