Compressed Inference

Compressed Inference develops quantization, pruning, and distillation techniques that shrink large AI models by orders of magnitude in size while preserving accuracy. The goal is practical deployment: making state-of-the-art language and vision models run on hardware they were never designed for.
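To illustrate the simplest of these techniques, the sketch below shows symmetric per-tensor int8 post-training quantization in NumPy. It is a minimal illustration, not the group's actual method: the function names and the 256x256 random weight tensor are invented for the example.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from the int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)  # stand-in for a weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# int8 storage is 4x smaller than float32; rounding error is at most scale/2
print(q.dtype, f"max abs error = {np.abs(w - w_hat).max():.4f}")
```

Real deployments refine this with per-channel scales, lower bit widths, and calibration data, but the core idea of trading a little precision for a much smaller footprint is the same.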

Team