Cozmin Ududec
Projects
Research Direction: Persona dynamics over long interactions
Context: Language models do not start each conversational turn from scratch. Their behavioral tendencies carry over from prior turns, and their own generated text becomes part of the context that shapes subsequent behavior. Over long interactions, this can create a feedback loop: the model’s current persona generates outputs that serve as evidence for that same persona, potentially entrenching behavioral modes.
Project: This project investigates persona dynamics over extended model interactions, focusing on the role of self-generated content as a driver of behavior. We will track how a model’s effective persona evolves turn by turn across long conversations, measuring state-dependence and the drift trajectories of personas. A central question is how self-generated content compares to similar externally provided context, whether the resulting drift stabilizes, compounds, or reverses, and whether certain regions of persona space act as attractors. A second direction is how the feedback between persona selection and self-generated context interacts with conversation length and topic: does open-ended dialogue produce qualitatively different persona trajectories than constrained, task-oriented exchange, and does the strength of the feedback loop grow with conversation length? What determines the basin structure of the persona landscape, and do some behavioral modes trap more strongly than others?
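One way the turn-by-turn tracking could be operationalized is to project each assistant turn onto a persona axis defined by contrastive exemplars, in the spirit of the persona-direction work cited in the references. Below is a minimal sketch of this idea; the hashing `embed` function is a toy stand-in for a real sentence encoder, and all function names are hypothetical, not part of any cited paper's code:

```python
import hashlib
import math

def embed(text, dim=256):
    """Toy bag-of-words hashing embedding (stand-in for a real sentence encoder)."""
    v = [0.0] * dim
    for tok in text.lower().split():
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        v[h % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def persona_axis(pos_examples, neg_examples, dim=256):
    """Persona axis = difference of mean embeddings of two contrastive exemplar sets."""
    def mean(texts):
        vecs = [embed(t, dim) for t in texts]
        return [sum(col) / len(vecs) for col in zip(*vecs)]
    p, n = mean(pos_examples), mean(neg_examples)
    return [a - b for a, b in zip(p, n)]

def drift_trajectory(assistant_turns, axis):
    """Project each assistant turn onto the axis; one scalar score per turn."""
    return [sum(a * b for a, b in zip(embed(t), axis)) for t in assistant_turns]
```

A drift trajectory is then just the sequence of per-turn scores, which can be compared across conditions (self-generated vs. externally inserted context, open-ended vs. task-oriented dialogue) or inspected for stabilization, compounding, or reversal over conversation length.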
References:
Simhi et al., Old Habits Die Hard: How Conversational History Geometrically Traps LLMs, arXiv:2603.03308
Lu et al., The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models, arXiv:2601.10387
Cheng et al., Leveraging the Context through Multi-Round Interactions for Jailbreaking Attacks, arXiv:2402.09177
Bigelow et al., Belief Dynamics, arXiv:2511.00617
What I'm looking for in a Mentee
I typically prefer mentees who can be fairly independent, with strong background and skills in at least one area of the project, e.g., experiment design, coding, good research practices, exploration, or execution. I also typically prefer smaller teams of 2-3 mentees working closely together on one project, who are able to iterate quickly.
What I'm Like as a Mentor
I typically run mentoring sessions in a semi-structured format, with mentees ideally preparing ahead of time a short summary of how the week went, plus key questions and decisions for discussion. I'm flexible in how hands-on I am, ranging from giving detailed feedback on experiments, results, plots, etc. in the weekly updates, to just giving strategic direction and advice. I have a tendency to share loads of things (papers, X posts, etc.) I find interesting and potentially useful for a project. I can generally communicate throughout the week on Slack and can do short bursts of thinking/writing if helpful.
Bio
Cozmin currently leads the Science of Evaluation team at the AI Security Institute in London. He joined AISI early in its life, and has worked in several roles, including co-leading the team responsible for the pre-deployment testing programme. Currently, Cozmin is interested in topics around task-horizon and inference scaling, methods for extracting insights from long agent transcripts on very hard tasks, and understanding model personas and weird generalisation.
