Julian Stastny

Alignment, Control

10 Nov

Written By Tobias Häberli

Projects

Research Direction: Studying Scheming and Alignment

Studying abstract analogies of scheming, such as these two proposed projects.

Training AIs to be more goal-directed and coherent (and studying whether this can lead to scheming).
Training AIs on various model specs and studying effects on the model's personality.

What I'm looking for in a Mentee

Strong empirical research skills.

Ability and drive to take conceptual ownership of their project.
Enjoys the early exploratory part of research (where there aren't any numbers to drive up yet).

Bio

At Redwood, I manage a bunch of empirical and conceptual research projects. Previously I worked at the Center on Long-Term Risk. My academic background is in computer science and machine learning.
