Damiano Fornasiere & Gaël Gendron
Projects
Research Direction: Introspection & contextualization: causes, implications, & learning
Causes of introspection. We have preliminary empirical evidence that the mechanisms allowing language models to answer different kinds of introspective questions share a common root. We would like to validate our hypothesis both theoretically and empirically.
Introspection and learning. We have shown that language models can learn in context to answer introspective questions, even beyond steering vectors and when reasoning is involved. Might this extend to prospective learning (the ability to act on predictions of the future---and in particular to predict how other agents will react to one's actions)? What does this imply for, e.g., alignment and cooperation?
Source-aware training is known to improve LLMs' transparency and verifiability, and to help with steering. Similar techniques (e.g., inoculation prompting) are now used to mitigate specification gaming as well as emergent misalignment. We are studying how [re]contextualization helps with generalization and alignment more generally.
What I'm looking for in a Mentee
Style:
I appreciate clear communication and clean math / code / writing, though the road to get there can be messy.
I am happy to supervise people at different career stages / levels of independence / backgrounds, provided the propensity to be scientific, precise, and excited is there.
Essential knowledge:
Foundations of machine learning and deep learning;
Transformer architecture and large language models;
Empirical AI safety literature (e.g., evaluations, guardrails, interpretability, …).
Essential experience:
Python;
Designing and implementing machine learning workflows using PyTorch;
Implementing evaluation protocols for language models;
Supervised or RL fine-tuning of language models, at least with toy experiments and some publicly available datasets.
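For concreteness, here is a minimal sketch of the kind of toy evaluation protocol meant above: exact-match accuracy over a tiny QA set, with a stub callable standing in for an actual language model. All names and data here are illustrative assumptions, not part of any project codebase.

```python
# Minimal sketch of an evaluation protocol for a language model.
# The "model" is a stub callable; in practice you would swap in an
# actual LM client (e.g., a Hugging Face pipeline or an API call).

def stub_model(prompt: str) -> str:
    # Hypothetical stand-in: answers some questions, gets one wrong.
    answers = {
        "Capital of France?": "Paris",
        "2 + 2 = ?": "4",
        "Largest planet?": "Saturn",  # deliberately wrong
    }
    return answers.get(prompt, "")

def exact_match_accuracy(model, dataset) -> float:
    """Fraction of items where the model's answer matches the gold label."""
    correct = sum(
        model(q).strip().lower() == a.strip().lower() for q, a in dataset
    )
    return correct / len(dataset)

dataset = [
    ("Capital of France?", "Paris"),
    ("2 + 2 = ?", "4"),
    ("Largest planet?", "Jupiter"),
]

print(exact_match_accuracy(stub_model, dataset))  # 2 of 3 correct
```

A real protocol would add normalization choices, multiple metrics, and error analysis, but the structure (dataset, model callable, scoring function) stays the same.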
Bonus:
Prompt and context engineering, RAG;
Experience with libraries such as vLLM, TRL, Hugging Face transformers, accelerate and/or PEFT;
Familiarity with statistical hypothesis testing.
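As an illustration of the statistical-testing bonus point, a paired sign-flip permutation test comparing two models' per-item correctness on the same eval set fits in a few lines of standard-library Python. The correctness vectors below are made-up toy data.

```python
import random

# Toy per-item correctness (1 = correct) for two hypothetical models
# evaluated on the same 20 items.
a = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1]
b = [1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1]

def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """One-sided p-value: chance of a score difference at least as large
    as observed if the A/B labels were exchangeable within each item."""
    rng = random.Random(seed)
    observed = sum(a) - sum(b)
    hits = 0
    for _ in range(n_perm):
        diff = 0
        for x, y in zip(a, b):
            # Randomly swap which model each item's scores belong to.
            if rng.random() < 0.5:
                diff += x - y
            else:
                diff += y - x
        if diff >= observed:
            hits += 1
    return hits / n_perm

p = permutation_p_value(a, b)
print(p)
```

Pairing by item matters: it conditions on item difficulty, which an unpaired comparison of the two accuracies would ignore.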
What I'm Like as a Mentor
Supervision is well-scoped at the beginning; towards the second half, mentees have more freedom to self-direct. I am excited to sanity-check results, including the math/code/plots.
At least one 1-hour meeting per week. Always available via Slack and Google Chat.
Bio
Damiano is a research scientist at LawZero, where he works on (i) introspection, (ii) evaluation- and training-awareness, (iii) the math behind the Scientist AI. MATS and SPAR mentor, MATS alumnus, PhD in mathematics.
Gaël is a research scientist at LawZero. His work focuses on improving trustworthiness and factuality in frontier models and the Scientist AI. PhD in deep learning and causality theory.
