Quantization - Search News

Tether is shipping TurboQuant KV-cache quantization with Vulkan support into its QVAC SDK

Tether successfully integrated Google’s TurboQuant into the inference engine of its local AI framework, QVAC. It is the ...

29d

Cohere cracks lossless quantization and native citations with first full Apache 2.0 licensed open model Command A+

Using special tags embedded in the output, the model directly links every factual claim it makes to the specific source document or database row it pulled the information from.

Morning Overview on MSN

Google unveiled TurboQuant, a method that cuts the memory bottleneck slowing large AI models

Companies running large language models face a persistent bottleneck: the memory consumed by key-value caches during ...

13d
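The key-value cache bottleneck mentioned above can be made concrete with some back-of-the-envelope arithmetic. Below is a minimal sketch. It assumes a hypothetical model shape (32 layers, 8 KV heads, head dimension 128, a 32,768-token context, batch size 1) and illustrative bit widths. It is not TurboQuant's method and not a figure reported for any specific model. It shows how the cache grows linearly with context length, and how storing each cached value in fewer bits shrinks it proportionally.

```python
# Back-of-the-envelope KV-cache sizing. The model shape below is a
# hypothetical example, not any specific model's configuration, and the
# bit widths are illustrative rather than TurboQuant's published settings.

GIB = 1024 ** 3


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bits_per_value: float) -> float:
    """Bytes needed to store keys and values for every layer and token.

    The leading factor of 2 counts one key and one value vector per head.
    Per-block scales or other quantization metadata are ignored.
    """
    num_values = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return num_values * bits_per_value / 8


if __name__ == "__main__":
    # Hypothetical shape chosen only to make the arithmetic easy to follow.
    shape = dict(num_layers=32, num_kv_heads=8, head_dim=128,
                 seq_len=32_768, batch_size=1)
    for bits in (16, 8, 4):
        size = kv_cache_bytes(**shape, bits_per_value=bits)
        print(f"{bits:>2}-bit cache: {size / GIB:.2f} GiB")
```

For this assumed shape the script prints 4.00 GiB at 16 bits, 2.00 GiB at 8 bits and 1.00 GiB at 4 bits. Real quantized caches add a small overhead for scales and other metadata, which this sketch leaves out.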

The latest Gemma 4 models use a training trick to slash their on-device memory footprint

You can now download Gemma 4 models that use quantization-aware training to reduce the mobile memory they require to 1 GB.

Nota AI Has Two MoE Quantization Papers Accepted at ICML 2026 Workshop, Demonstrating Global Competitiveness in Large-Scale AI Optimization

Nota AI, a company specializing in AI model compression and optimization, announced that two of its papers on MoE-specific ...

InfoWorld

What is model quantization? Smaller, faster LLMs

Elastic Introduces Better Binary Quantization Technique in Elasticsearch

How Mixed-Precision Quantization Could Break AI’s Power Addiction