Efficient Neural Network Inference via Structured Pruning on Edge Devices
Abstract
We present a structured pruning pipeline that reduces model size by 70% while retaining 95% of baseline accuracy on ImageNet. Our approach combines magnitude-based filter pruning with knowledge distillation, enabling real-time inference on Raspberry Pi 4 and Jetson Nano platforms. We benchmark latency, memory footprint, and accuracy across five model architectures (ResNet-50, MobileNetV3, EfficientNet-B0, ViT-Small, and DeiT-Tiny) and demonstrate that structured pruning consistently outperforms unstructured methods for edge deployment.
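As a rough illustration of the magnitude-based filter pruning the abstract describes, the sketch below ranks convolutional filters by their L1 norm and drops the lowest-scoring fraction. This is a minimal, framework-free sketch under our own assumptions; the function names and the flat-list weight representation are hypothetical and do not come from the paper's pipeline.

```python
# Hedged sketch of magnitude-based filter pruning: score each filter by
# its L1 norm (sum of absolute weights) and remove the weakest fraction.
# Representation is a hypothetical flat list of weights per filter.

def l1_norm(filter_weights):
    """L1 norm of one filter: sum of absolute weight values."""
    return sum(abs(w) for w in filter_weights)

def prune_filters(filters, prune_ratio):
    """Return sorted indices of filters to KEEP after removing the
    lowest-L1-norm fraction given by prune_ratio."""
    # Rank filter indices from smallest to largest L1 norm.
    ranked = sorted(range(len(filters)), key=lambda i: l1_norm(filters[i]))
    n_prune = int(len(filters) * prune_ratio)
    # Keep everything except the n_prune weakest filters.
    return sorted(ranked[n_prune:])

filters = [
    [0.9, -0.8, 0.7],     # strong filter (L1 = 2.4)
    [0.01, 0.02, -0.01],  # near-zero filter (L1 = 0.04), pruned first
    [0.5, 0.4, -0.6],     # strong filter (L1 = 1.5)
    [0.05, -0.03, 0.02],  # near-zero filter (L1 = 0.10), pruned second
]
kept = prune_filters(filters, prune_ratio=0.5)
print(kept)  # -> [0, 2]
```

Because whole filters are removed (rather than individual weights zeroed, as in unstructured pruning), the resulting layer is genuinely smaller and needs no sparse-kernel support, which is what makes this style of pruning attractive on edge hardware.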