Jérémy Scheurer

Projects

Research Direction: Evaluations & Science of Deception, Scheming and Situational Awareness

  • Build a better understanding of what makes a model situationally aware by changing parts of a prompt/eval.

  • Automatically generate inputs using RL that adapts a prompt/eval to make a model believe it is real.

  • Build public evals that test models for deceptive capabilities and propensities (focus on aspects that don't exist publicly yet).

What I'm looking for in a Mentee

  • You interact a lot with LLMs (red-teaming them, using Claude Code and other such tools, etc.)

  • Ideally, you have previously built evaluations (even toy ones) and are familiar with the process.

  • You are a good programmer who can implement ideas fast, get results, and update on them (building tight feedback loops). Ideally, you have experience with the Inspect library.

  • You like empirical work, building evaluations, looking at LLM traces, and are agentic.

Bio

Jérémy Scheurer is a research scientist in the evaluations team at Apollo Research. His work focuses on evaluating language models for deceptive capabilities and propensities. Before that, he contracted with OpenAI and worked in its dangerous capabilities evaluations team. Previously, Jérémy was a Research Scientist at FAR AI (and NYU), collaborating with Ethan Perez, where he published work on learning from language feedback. He has a Master's in CS from ETH Zurich.
