RL at Mistral · PhD at Mila, Quebec AI Institute
I am an RL intern at Mistral AI and a final-year PhD student at Mila, co-supervised by Glen Berseth and Nikolay Malkin. I work closely with Yoshua Bengio and am an academic collaborator at LawZero, helping develop safe and controllable AI systems. I also collaborate with LLNL on scaling off-policy RL for large reasoning models. I recently finished an internship at Valence Labs training flow bridges for molecular systems.
I believe we are close to recursive self-improvement, and my research focus has shifted accordingly. My work centers on the science of RL for LLMs and on inference scaling. Right now, I’m most excited about self-improvement and compaction: getting models to improve their abstract reasoning over long task horizons and to continually bootstrap from their own abilities. I’m also simply fascinated by the Machine Minds we’re spawning, and want to better understand them. This is important both for improving capabilities and for ensuring we aren’t creating unforeseen harm.