Jesse Hoogland

3 Nov

Project

Research Direction: SLT for AI Safety

I'm interested in supervising follow-ups to research I previously conducted with Pivotal scholars on applications of singular learning theory (SLT) to training data attribution.

Here is a non-exhaustive list of example projects we could work on:

Developing more efficient estimators based on block-wise or low-rank decompositions of the covariance kernels underlying these attribution methods.
Applying these techniques to problems (or toy models of problems) that are more directly relevant to safety, such as emergent misalignment and eliciting secret knowledge.
Further benchmarking these techniques against alternatives such as CAFT and preventative steering.
Connecting these techniques to other SLT-derived interpretability tools, such as susceptibilities.

What I'm looking for in a Mentee

Ideally, you're pursuing a PhD in physics, math, ML, or a related field. I'm looking for people who are comfortable with pen-and-paper math but who can also get their hands dirty with experiments.

Bio

Jesse cofounded and leads Timaeus, a nonprofit research organization working on the Singular Learning Theory (SLT) for AI Safety research agenda. In addition to outreach and operations, he leads several research projects at Timaeus and has most recently been focusing on training data attribution using Bayesian Influence Functions.

Singular Learning Theory, AI Alignment, Interpretability
