I am co-founder and CSO of a new generative AI startup based in Montréal, Canada, and Berlin, Germany. I am also a core industry member and Canada CIFAR AI Chair at Mila.
I obtained my PhD from the University of Alberta in Canada, where I proposed using Atari 2600 video games to benchmark progress in reinforcement learning research. My advisors were Michael Bowling and Joel Veness.
I previously led the reinforcement learning efforts of the Google Brain team in Montréal.
From 2013 to 2017 I was a research scientist at DeepMind in the UK.
Our book surveys the core elements of distributional reinforcement learning, which seeks to understand how the various sources of randomness in an environment combine to produce complex distributions of outcomes, and how these distributions can be estimated from experience. Among other applications, the theory has been used as a model of dopaminergic neurons in the brain, to reduce the risk of failure in robotic grasping, and to achieve state-of-the-art performance in simulated car racing and video-game playing. With Will Dabney and Mark Rowland. Draft available at http://distributional-rl.org.
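To give a concrete flavour of what "estimating distributions of outcomes from experience" involves, here is a minimal sketch of the categorical projection step used in C51-style distributional learning: the target distribution of r + γZ is projected back onto a fixed grid of return values (atoms). This is an illustrative sketch, not code from the book; the function and variable names are my own.

```python
import math

def project(probs, reward, atoms, gamma):
    """Project the distribution of reward + gamma * Z onto the fixed
    support `atoms` (an evenly spaced, sorted list of return values).

    `probs[i]` is the probability mass on `atoms[i]` under Z; the result
    is a valid probability vector over the same atoms.
    """
    n = len(atoms)
    z_min, z_max = atoms[0], atoms[-1]
    dz = atoms[1] - atoms[0]
    out = [0.0] * n
    for p, z in zip(probs, atoms):
        # Shifted-and-scaled atom, clipped to the support.
        tz = min(max(reward + gamma * z, z_min), z_max)
        b = (tz - z_min) / dz          # fractional index into the grid
        lo, hi = math.floor(b), math.ceil(b)
        if lo == hi:
            out[lo] += p               # lands exactly on an atom
        else:
            out[lo] += p * (hi - b)    # split mass between neighbours
            out[hi] += p * (b - lo)
    return out
```

For example, with atoms [0.0, 1.0, 2.0], all mass initially on 0.0, a reward of 0.5, and γ = 1, the target value 0.5 falls halfway between the first two atoms, so the projected distribution splits the mass evenly between them.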
As part of a collaboration with Loon, we used deep reinforcement learning to improve the navigation capabilities of stratospheric balloons. Based on a 13-balloon, 39-day controlled experiment over the Pacific Ocean, we found evidence of significantly improved power efficiency and increased time within range of a designated station, and determined that the controller had discovered new navigation techniques. Training the controller was made possible by a statistically plausible simulator that could model the wind field's "known unknowns" and the effect of the diurnal cycle on power availability. Read our 2020 paper here.
The Arcade Learning Environment (ALE) is a reinforcement-learning interface that enables artificial agents to play Atari 2600 games. We released the first complete version of the benchmark in 2012 (see paper in the Journal of Artificial Intelligence Research). The ALE was popularized by the release of the highly successful DQN algorithm (see our 2015 paper in Nature) and continues to support deep reinforcement learning research today.
My research focuses on two complementary problems in reinforcement learning. The first is the problem of representation: How should a learning system structure and update its knowledge about the environment it operates in? The second concerns exploration: How should the same learning system organize its decisions to be maximally effective at discovering its environment, and in particular, to rapidly acquire the information needed to build better representations?