Abstract: Post-training quantization (PTQ) is an effective solution for deploying deep neural networks on edge devices with limited resources. PTQ is especially attractive because it does not require ...
Abstract: Post-training quantization (PTQ) has emerged as a practical approach to compress large neural networks, making them highly efficient for deployment. However, effectively reducing these ...
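To make the idea concrete, here is a minimal sketch of weight-only post-training quantization, assuming symmetric per-tensor int8 with a scale derived from the weight's maximum magnitude; the function names are illustrative, not any particular library's API:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 PTQ: w ~= scale * q, no retraining needed."""
    scale = np.abs(w).max() / 127.0          # map the largest weight to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 1.27], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = float(np.abs(w - w_hat).max())     # bounded by ~scale / 2
```

Each weight is stored in 1 byte instead of 4, and the rounding error per weight is at most half a quantization step, which is why PTQ can shrink a trained network substantially with little accuracy loss.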
We tried out Google’s new family of multimodal models, whose compact variants are small enough to run on local devices. They work well.
Beats Q8_0 perplexity at half the size, and even beats F16. APEX outperforms Unsloth Dynamic 2.0 (UD) quantizations on perplexity, HellaSwag, and inference speed while being 2x smaller: APEX ...
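The comparisons above lean on perplexity, so it is worth recalling what the metric measures: the exponential of the average negative log-likelihood per token, where lower is better. A minimal sketch (illustrative only, not the evaluation harness used in these benchmarks):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model that assigns each token probability 1/4 has perplexity 4:
# it is, on average, as uncertain as a uniform choice among 4 tokens.
uniform_4 = [math.log(0.25)] * 8
ppl = perplexity(uniform_4)
```

This is why a quantization that "beats F16 perplexity" is a strong claim: it means the compressed model assigns the evaluation text higher average probability than the uncompressed one.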