Jérémy Scheurer

Projects

Research Direction: Evaluations & Science of Deception, Scheming and Situational Awareness

  • Build a better understanding of what makes a model situationally aware by changing parts of a prompt/eval.

  • Automatically generate inputs using RL that adapts a prompt/eval to make a model believe it is real.

  • Build public evals that test models for deceptive capabilities and propensities (focus on aspects that don't exist publicly yet).

What I'm looking for in a Mentee

  • You interact a lot with LLMs (red-teaming them, using Claude Code and other such tools, etc.)

  • Ideally, you have previously built evaluations (even toy ones) and are familiar with the process.

  • You are a good programmer who can implement ideas fast, get results, and update on them (building tight feedback loops). Ideally, you have experience with the Inspect library.

  • You like empirical work, building evaluations, looking at LLM traces, and are agentic.

Bio

Jérémy Scheurer is a research scientist in the evaluations team at Apollo Research. His work focuses on evaluating language models for deceptive capabilities and propensities. Before that, he contracted with OpenAI and worked in its dangerous capabilities evaluations team. Previously, Jérémy was a Research Scientist at FAR AI (and NYU), collaborating with Ethan Perez, where he published work on learning from language feedback. He has a Master's in CS from ETH Zurich.
