Quantization Python - Search News

Endee Launches Managed Cloud for its Open-Source Vector Database with Generous Free Tier

The open-source vector database Endee.io, which is well known for its ultra-high performance with 10x lower infra, is ...

Your developers are already running AI locally: Why on-device inference is the CISO’s new blind spot

Shadow AI 2.0 isn’t a hypothetical future; it’s a predictable consequence of fast hardware, easy distribution, and developer ...

eWeek

Gemma 4 Arrives: Google Drops Restrictions, Embraces True Open Models

Google unveils Gemma 4 under an Apache 2.0 license, boosting enterprise adoption of efficient, multimodal AI models across ...

GitHub

Python implementation of the TurboQuant and QJL vector quantization algorithms.

turboquant-py implements the TurboQuant and QJL vector quantization algorithms from Google Research (ICLR 2026 / AISTATS 2026). It compresses high-dimensional floating-point vectors to 1-4 bits per ...
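The snippet does not describe how TurboQuant itself works, so the following does not reproduce it. It is a baseline for comparison: a plain per-vector min/max scalar quantizer that stores each coordinate as a `bits`-bit integer code plus one offset and one scale per vector. The function names `quantize` and `dequantize` are hypothetical and are not turboquant-py's API.

```python
def quantize(vec, bits):
    """Baseline uniform scalar quantizer (NOT TurboQuant): per-vector min/max,
    mapping each float to an integer code in [0, 2**bits - 1]."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be in 1..8")
    lo, hi = min(vec), max(vec)
    levels = (1 << bits) - 1
    # Step size between adjacent codes; a constant vector gets a dummy scale.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((x - lo) / scale))) for x in vec]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Reconstruct approximate floats; per-coordinate error is at most scale / 2."""
    return [lo + c * scale for c in codes]


vec = [0.0, 0.25, 0.5, 1.0]
codes, lo, scale = quantize(vec, bits=2)
approx = dequantize(codes, lo, scale)
```

Against 32-bit floats, `bits`-bit codes give a 32 / `bits` reduction per coordinate. The reduction is slightly less in practice because of the per-vector `lo` and `scale` overhead.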

Ars Technica

Google’s TurboQuant AI-compression algorithm can reduce LLM memory usage by 6x

Even if you don’t know much about the inner workings of generative AI models, you probably know they need a lot of memory. Hence, it is currently almost impossible to buy a measly stick of RAM without ...

IEEE

Lightweight Adaptive Quantization Algorithms for Federated Learning With Heterogeneous Clients

Abstract: Quantization is a common method to improve communication efficiency in federated learning (FL) by compressing the gradients that clients upload. Currently, most application scenarios involve ...
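The abstract is cut off before it describes the adaptive algorithms, so they are not shown here. The sketch below illustrates what "compressing the gradients that clients upload" can look like in general: a stochastic-rounding quantizer of the kind used in schemes such as QSGD. A client would send one float (the norm) plus small signed integers. Rounding up with probability equal to the fractional part keeps the dequantized gradient unbiased. The names `stochastic_quantize` and `dequantize_gradient` are illustrative assumptions.

```python
import math
import random


def stochastic_quantize(grad, levels, rng=random):
    """Unbiased stochastic quantization of a gradient onto `levels` uniform
    steps of |g| / ||g||. Returns (norm, signed integer levels)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return 0.0, [0] * len(grad)
    q = []
    for g in grad:
        r = abs(g) / norm * levels  # position in [0, levels]
        low = math.floor(r)
        # Round up with probability equal to the fractional part, so E[level] == r.
        level = low + (1 if rng.random() < r - low else 0)
        level = min(level, levels)  # guard against floating-point overshoot
        q.append(level if g >= 0 else -level)
    return norm, q


def dequantize_gradient(norm, q, levels):
    """Server-side reconstruction: each entry is an unbiased estimate of g_i."""
    return [norm * level / levels for level in q]


rng = random.Random(0)
norm, q = stochastic_quantize([0.3, -0.4, 0.0, 0.5], levels=4, rng=rng)
estimate = dequantize_gradient(norm, q, levels=4)
```

Fewer levels mean fewer bits per coordinate but higher variance in each upload. An adaptive scheme would presumably tune that trade-off per client.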

Semiconductor Engineering

Balancing Training, Quantization, And Hardware Integration In NPUs

Experts At The Table: AI/ML is driving a steep ramp in neural processing unit (NPU) design activity for everything from data centers to edge devices such as PCs and smartphones. Semiconductor ...

GitHub

Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization

Oaken is an acceleration solution that achieves high accuracy and high performance simultaneously through co-designing algorithm and hardware, leveraging online ...
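Oaken's online-offline hybrid scheme is not described in the snippet. The arithmetic below only shows why the KV cache is a target for quantization in the first place. The cache stores one key and one value vector per token, per KV head, per layer, so its size grows linearly with sequence length and batch. The model configuration used here is hypothetical.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bits):
    """Approximate KV cache size in bytes.
    Factor 2: one key and one value vector per token, per KV head, per layer."""
    elements = 2 * n_layers * n_kv_heads * head_dim * seq_len * batch
    return elements * bits / 8


# Hypothetical configuration: 32 layers, 8 KV heads of dim 128, 32K-token context.
fp16 = kv_cache_bytes(32, 8, 128, 32768, batch=1, bits=16)  # 4 GiB
int4 = kv_cache_bytes(32, 8, 128, 32768, batch=1, bits=4)   # 1 GiB
```

Going from 16-bit to 4-bit storage cuts this term by 4x, ignoring the small overhead of any per-group scales a real quantizer would also store.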

Forbes

How Mixed-Precision Quantization Could Break AI’s Power Addiction

It turns out the rapid growth of AI has a massive downside: namely, spiraling power consumption, strained infrastructure and runaway environmental damage. It’s clear the status quo won’t cut it ...

Hackaday

Making The Smallest And Dumbest LLM With Extreme Quantization

The reason why large language models are called ‘large’ is not because of how smart they are, but as a factor of their sheer size in bytes. At billions of parameters at four bytes each, they pose a ...
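The snippet's size argument is simple arithmetic: weight storage is the parameter count times the bytes per parameter. The sketch below works it through at several bit widths for a hypothetical 7-billion-parameter model. It counts weights only and ignores activations, KV cache, and quantization metadata.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in GB (10**9 bytes): params * bits / 8."""
    return num_params * bits_per_weight / 8 / 1e9


params = 7e9  # hypothetical 7B-parameter model
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(params, bits):.1f} GB")
```

At four bytes (32 bits) per parameter this model needs about 28 GB for weights alone. At 4 bits it needs about 3.5 GB, which is why extreme quantization makes small devices viable.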

Ars Technica

2025 Nobel Prize in Physics awarded for macroscale quantum tunneling

The 2025 Nobel Prize in Physics has been awarded to John Clarke, Michel H. Devoret, and John M. Martinis “for the discovery of macroscopic quantum tunneling and energy quantization in an electrical ...

VentureBeat

Huawei's new open source technique shrinks LLMs to make them run on less powerful, less expensive hardware

Huawei’s Computing Systems Lab in Zurich has introduced a new open-source quantization method for large language models (LLMs) aimed at reducing memory demands without sacrificing output quality.
