Tether’s AI Research Group on Monday released a production implementation of TurboQuant, a Google Research memory compression technique that lets large AI models run on consumer hardware by shrinking the working memory they use during long sessions.
What’s the Scoop?
- What TurboQuant Does: TurboQuant compresses the KV cache, the working memory an AI model uses to remember context during a session, by up to 5x while keeping output quality largely intact. The KV cache is one of the main reasons long AI sessions often need to run in the cloud.
- Practical Impact: With TurboQuant, AI assistants on laptops, phones, and edge devices can work with long documents, large codebases, and extended conversations without sending everything to cloud providers. Tether says this could support use cases like reviewing legal documents, running coding assistants across full repositories, keeping memory in tutoring or research sessions, and processing sensitive files locally.
- The QVAC Stack: TurboQuant is part of QVAC SDK 0.12.0, Tether’s open-source toolkit for running AI locally. The SDK gives developers tools to shrink models and connect them to common inference frameworks, along with documentation and guidance on picking settings for different workloads outside data centers.
- Tether’s AI Strategy: Tether has become one of the crypto companies most focused on local and open-source AI tooling, with QVAC now covering local inference, mobile fine-tuning, health apps, and medical AI models. Recent releases include QVAC Workbench for private on-device AI, QVAC Health for local wellness tracking, and QVAC MedPsy, a medical AI model family built to run on phones, wearables, and other edge devices without depending on the cloud.
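
For a rough sense of why the KV cache matters on consumer hardware, the back-of-the-envelope sketch below estimates its size for one long session and applies the "up to 5x" figure quoted above. The model shape (32 layers, 8 KV heads, 128-dimensional heads, 16-bit values, a 131,072-token context) is a hypothetical example and does not come from Tether or Google. The function name is illustrative, and the code only does the arithmetic; it is not TurboQuant itself.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Uncompressed KV cache size in bytes for a single sequence."""
    # Factor of 2: each layer stores one key tensor and one value tensor.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


GIB = 1024 ** 3

# Hypothetical model shape, chosen only for illustration.
baseline = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=131072)

# "Up to 5x" is the article's best-case figure; real savings may be lower.
compressed = baseline / 5

print(f"baseline:   {baseline / GIB:.1f} GiB")    # baseline:   16.0 GiB
print(f"compressed: {compressed / GIB:.1f} GiB")  # compressed: 3.2 GiB
```

Under these assumptions, the cache alone drops from about 16 GiB to about 3.2 GiB, which is the difference between exceeding and fitting within the memory of many laptops.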