16 bits may be way too much: many new models are discretized down to almost 1 bit per weight, so there may be an order of magnitude to gain here.
It's a pity the lottery ticket hypothesis has not yet led to significant advances. There might be a good combination of distillation + sparsity + quantization.
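A quick back-of-the-envelope check of the "order of magnitude" claim. This assumes "almost 1 bit" means ternary weights in {-1, 0, +1}, which need log2(3) ≈ 1.58 bits each. The 50% sparsity level and the 1B parameter count are arbitrary illustrative choices, and the size estimate ignores the index overhead that real sparse formats add.

```python
import math

def bits_per_weight(num_levels: int) -> float:
    """Information-theoretic bits needed to store one weight taking num_levels values."""
    return math.log2(num_levels)

def model_size_bytes(num_params: int, bits: float, density: float = 1.0) -> float:
    """Approximate weight storage; density is the fraction of weights kept after pruning.
    Ignores sparse-index overhead (an assumption made for simplicity)."""
    return num_params * density * bits / 8

fp16_bits = 16.0
ternary_bits = bits_per_weight(3)  # assumed: weights restricted to {-1, 0, +1}
print(f"ternary: {ternary_bits:.2f} bits/weight")
print(f"reduction vs fp16: {fp16_bits / ternary_bits:.1f}x")

# Hypothetical 1B-parameter model, chosen only for illustration.
n = 1_000_000_000
dense_fp16 = model_size_bytes(n, fp16_bits)
sparse_ternary = model_size_bytes(n, ternary_bits, density=0.5)  # assumed 50% pruned
print(f"fp16 dense: {dense_fp16 / 1e6:.0f} MB")
print(f"ternary, 50% sparse: {sparse_ternary / 1e6:.1f} MB")
```

Quantization alone gives roughly 10x (16 / 1.58), and stacking sparsity on top multiplies the savings, which is why combining the techniques looks attractive.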
That's a good point, thanks!
Right, we may be pretty close already.
I was actually considering whether to suggest that 100 million is too much and 10 million may be enough, but decided it was unnecessarily speculative.