Compressed Inference

Model compression and optimization for edge AI deployment

Compressed Inference develops techniques for running large AI models on resource-constrained devices. The project explores compression methods such as quantization, pruning, and knowledge distillation to reduce model size and computational requirements while maintaining accuracy, enabling deployment of sophisticated AI models on edge devices, mobile platforms, and embedded systems.
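
As a brief illustration of one technique named above, the sketch below applies post-training dynamic quantization using PyTorch's torch.ao.quantization.quantize_dynamic API. The TinyNet model and its layer sizes are hypothetical stand-ins for illustration, not code from this project.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical float32 model standing in for a larger network."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(128, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = TinyNet().eval()

# Quantize the Linear layers: weights are stored in int8 and
# activations are quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
print(quantized(x).shape)  # torch.Size([1, 10]) -- same interface, smaller model
```

Dynamic quantization like this typically shrinks the weight storage of linear layers by roughly 4x (float32 to int8) and speeds up CPU inference, usually with little accuracy loss, which is why it is a common first step before heavier techniques such as pruning or distillation.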

Team