Machine Perception and Cognitive Robotics Laboratory

AI Alignment

AI Alignment investigates failure modes in large language models through ablation-based analysis and security vulnerability discovery. The project combines adversarial red-teaming with alignment research, systematically probing models to understand how they break and how to make them safer.

Model Card

Ablation-based testing framework. Systematically removes or modifies model components (attention heads, layers, embeddings) and measures behavioral changes. Combines automated probing with manual vulnerability analysis.
Adversarial prompt datasets, safety benchmarks, and custom ablation test suites.
Identifies specific model components responsible for safety behaviors and maps failure modes under targeted ablation.
Ablation analysis is model-specific — findings may not transfer across architectures. Testing coverage is inherently incomplete.

Team

Susan Schneider, Ph.D.

Co-Principal Investigator (Co-PI)
Distinguished Professor

[email protected]

Machine Perception and Cognitive Robotics Laboratory