AI Alignment

AI Alignment investigates failure modes in large language models through ablation-based analysis and security vulnerability discovery. The project combines adversarial red-teaming with alignment research, systematically probing models to understand how they break and how to make them safer.

Model Card



Team


Susan Schneider, Ph.D.

Co-Principal Investigator
Distinguished Professor