2 Comments
Grigory Sapunov

That's a good point, thanks!

16 bits may be way too much: many new models are quantized down to almost 1 bit, so there may be an order of magnitude to gain here.

It's a pity the lottery ticket hypothesis has not yet led to significant advances. There might be a good combination of distillation + sparsity + quantization.

Misha Belkin

Right, we may be pretty close already.

I was actually considering whether to suggest that 100 million is too much and 10 million may be enough, but decided it was unnecessarily speculative.
