Jesse Hoogland
Project
Research Direction: SLT for AI Safety
I'm interested in working on follow-ups to research I previously conducted with Pivotal scholars on applications of singular learning theory (SLT) to training data attribution:
Bayesian Influence Functions for Hessian-Free Data Attribution
The Loss Kernel: A Geometric Probe for Deep Learning Interpretability
Here's a non-exhaustive list of projects we could work on:
Developing more efficient estimators based on block-wise/low-rank decompositions of these covariance kernels.
Applying these techniques to safety problems (or toy models of them) that are more directly relevant, like emergent misalignment and eliciting secret knowledge.
Benchmarking these techniques further against alternatives like CAFT and preventative steering.
Connecting these techniques to other SLT-derived interpretability tools ("susceptibilities").
What I'm looking for in a Mentee
Ideally, you're doing a PhD in Physics, Math, ML, or something similar. I'm looking for people who are comfortable dealing with pen-and-paper math but who can also get their hands dirty with experiments.
Bio
Jesse cofounded and leads Timaeus, a nonprofit research organization working on the Singular Learning Theory (SLT) for AI Safety research agenda. In addition to outreach and operations, he leads several research projects at Timaeus and has most recently focused on training data attribution using Bayesian Influence Functions.
