Research
A selection of research conducted by Pivotal fellows.
Factor(T,U): Factored Cognition Strengthens Monitoring of Untrusted AI
Aaron Sandoval
Mentor: Cody Rushing (Redwood Research)
Decomposition of Small Transformer Models
Casper L. Christensen
Mentor: Logan Riggs Smith
NeurIPS 2025: Mechanistic Interpretability Workshop
Influence Dynamics and Stagewise Data Attribution
Jin Hwa Lee & Matt Smith
Mentor: Jesse Hoogland (Timaeus)
Activation Probes Are Reliable With a Handful of Positive Examples
Riya Tyagi
Mentor: Stefan Heimersheim (Apollo Research)
NeurIPS 2025: Mechanistic Interpretability Workshop
Bayesian Influence Functions for Hessian-Free Data Attribution
Philipp Alexander Kreer
Mentor: Jesse Hoogland (Timaeus)
ICML 2025: Workshop on High-dimensional Learning Dynamics
Trivial Trojans: How Minimal MCP Servers Enable Cross-Tool Exfiltration of Sensitive Data
Nicola Croce
Mentor: Tobin South (MIT)
Benchmarking Deception Probes via Black-to-White Performance Boosts
Avi Parrack
Mentor: Stefan Heimersheim (Apollo Research)
ICML 2025: Actionable Interpretability Workshop
Bayesian Influence Functions for Scalable Data Attribution
Philipp Alexander Kreer
Mentor: Jesse Hoogland (Timaeus)
ICML 2025: Workshop on High-dimensional Learning Dynamics
Safety Features for a Centralised AGI Project
Sarah Hastings-Woodhouse
Mentor: Elliot Jones (Ada Lovelace Institute)
Product Liability as a Model for UK AI Security
Gaurav Yadav
Mentor: Peter Wills (University of New Brunswick)
Forging the Biological Weapons Convention: A Brief History of the Creation of the BWC
Neha Suresh
Mentors: Alex DeMarsh (RAND) & Gabby Essix (NTI)
The Pandora Report
Will the US Government Control the First AGI?—Finding Base Rates
Luise Woehlke
Mentor: Christian Ruhl (Founders Pledge)
Sharing the AI Windfall: A Strategic Approach to International Benefit-Sharing
Michel Justen
Mentors: Matthew van der Merwe (GovAI) & Max Dalton (Forethought)
The Role of AI Safety Institutes in Contributing to International Standards for Frontier AI Safety
Kristina Fort
Mentor: Oliver Guest (IAPS)
