Marc G. Bellemare

Adjunct Professor, McGill University
Adjunct Professor, Université de Montréal
Canada CIFAR AI Chair, Mila
Associate Fellow, CIFAR LMB Program


I am co-founder and CSO of a new generative AI startup based in Montréal, Canada and Berlin, Germany. I am also a core industry member and Canada CIFAR AI Chair at Mila.
I obtained my PhD from the University of Alberta in Canada, proposing the use of Atari 2600 video games to benchmark progress in reinforcement learning research. My advisors were Michael Bowling and Joel Veness.
I previously led the reinforcement learning efforts of the Google Brain team in Montréal. From 2013 to 2017, I was a research scientist at DeepMind in the UK.

Project highlights

Distributional Reinforcement Learning

MIT Press, Spring 2023

Our book surveys the core elements of distributional reinforcement learning, which seeks to understand how the various sources of randomness in an environment combine to produce complex distributions of outcomes, and how these distributions can be estimated from experience. Among other applications, the theory has been used as a model of dopaminergic neurons in the brain, to reduce the risk of failure in robotic grasping, and to achieve state-of-the-art performance in simulated car racing and video-game playing. With Will Dabney and Mark Rowland. A draft is available online.
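The core computational idea can be sketched in a few lines: instead of backing up a single expected value, the distributional Bellman update shifts and shrinks an entire return distribution and projects it back onto a fixed support. The categorical representation below (in the style of the C51 algorithm from the 2017 ICML paper) is illustrative only; the atom grid and parameters are arbitrary choices, not the book's exact presentation.

```python
import numpy as np

# The return distribution is represented by probabilities over a fixed
# grid of values ("atoms"). One distributional Bellman backup maps each
# atom z to r + gamma * z, then projects the shifted support back onto
# the grid by splitting mass between neighbouring atoms.

N_ATOMS, V_MIN, V_MAX = 11, 0.0, 10.0
atoms = np.linspace(V_MIN, V_MAX, N_ATOMS)   # support of the distribution
delta = (V_MAX - V_MIN) / (N_ATOMS - 1)      # spacing between atoms

def categorical_update(probs, reward, gamma):
    """Apply one backup of r + gamma * Z, projected onto the fixed atoms."""
    target = np.zeros(N_ATOMS)
    for p, z in zip(probs, atoms):
        tz = np.clip(reward + gamma * z, V_MIN, V_MAX)  # shifted atom
        b = (tz - V_MIN) / delta                        # fractional index
        lo, hi = int(np.floor(b)), int(np.ceil(b))
        if lo == hi:                       # lands exactly on an atom
            target[lo] += p
        else:                              # split mass between neighbours
            target[lo] += p * (hi - b)
            target[hi] += p * (b - lo)
    return target

# Example: back up a uniform return distribution through a reward of 1.
probs = np.full(N_ATOMS, 1.0 / N_ATOMS)
updated = categorical_update(probs, reward=1.0, gamma=0.9)
print(updated.sum())   # probability mass is preserved by the projection
```

Because the projection interpolates linearly between neighbouring atoms, it preserves both total probability mass and (absent clipping) the mean of the backed-up distribution.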

Further reading

Talk at the University of British Columbia (2020)
A distributional perspective on reinforcement learning (ICML, 2017)

Autonomous Navigation of Stratospheric Balloons

Nature, 2020

As part of a collaboration with Loon, we used deep reinforcement learning to improve the navigation capabilities of stratospheric balloons. Based on a 13-balloon, 39-day controlled experiment over the Pacific Ocean, we found evidence of significantly improved power efficiency and increased time within range of a designated station, and determined that the controller had discovered new navigation techniques. Training the controller was made possible by a statistically plausible simulator that could model the wind field's "known unknowns" and the effect of the diurnal cycle on power availability. Read our 2020 paper here.

In 2022 we open-sourced a high-fidelity replica of the original simulator, offering a unique challenge for reinforcement learning algorithms. Read the blog post.

In the press

New Scientist, Google's AI can keep Loon balloons flying for over 300 days in a row
La Presse, Loon calls on the artificial intelligence expertise of Google Montréal
Heise Online, Google: AI keeps stratospheric balloons within a fixed area

The Arcade Learning Environment

Journal of Artificial Intelligence Research, 2013

The Arcade Learning Environment (ALE) is a reinforcement-learning interface that enables artificial agents to play Atari 2600 games. We released the first complete version of the benchmark in 2012 (see our 2013 paper in the Journal of Artificial Intelligence Research). The ALE was popularized by the release of the highly successful DQN algorithm (see our 2015 paper in Nature) and continues to support deep reinforcement learning research today.
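The interface the ALE exposes to agents is a simple act/observe loop. The sketch below is schematic: a stand-in environment replaces a real Atari game, since loading one requires the ale-py package and a ROM file, but the method names (getLegalActionSet, act, game_over, reset_game) mirror the ALE's interface.

```python
import random

class ToyALE:
    """Hypothetical stand-in with an ALE-like interface.

    An "episode" here simply lasts 100 frames with random rewards; a
    real ALE instance would instead emulate an Atari 2600 game.
    """
    def __init__(self):
        self.frame = 0

    def getLegalActionSet(self):
        # The Atari 2600 joystick yields 18 discrete actions.
        return list(range(18))

    def act(self, action):
        # Advance one frame and return the reward for the chosen action.
        self.frame += 1
        return random.choice([0, 1])

    def game_over(self):
        return self.frame >= 100

    def reset_game(self):
        self.frame = 0

# One episode of random play, in the style of an ALE agent loop.
ale = ToyALE()
actions = ale.getLegalActionSet()
total_reward = 0
while not ale.game_over():
    total_reward += ale.act(random.choice(actions))
ale.reset_game()
print(total_reward)
```

This minimal loop (choose an action, receive a reward, check for episode termination) is the contract that reinforcement learning agents program against when using the benchmark.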

Further reading

Revisiting the Arcade Learning Environment: evaluation protocols and open problems for general agents (JAIR, 2018)

Selected publications

My research focuses on two complementary problems in reinforcement learning. The first is representation: how should a learning system structure and update its knowledge about the environment it operates in? The second is exploration: how should the same learning system organize its decisions to discover its environment most effectively, and in particular to rapidly acquire the information needed to build better representations?

Deep reinforcement learning at the edge of the statistical precipice. Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare. NeurIPS 2021, Best paper award [GitHub].
On bonus-based exploration methods in the arcade learning environment. Adrien Ali Taiga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare. ICLR 2020. Also best paper award at ICML Workshop on Exploration in RL, 2019.
DeepMDP: Learning continuous latent space models for representation learning. Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, Marc G. Bellemare. ICML 2019.
A geometric perspective on optimal representations for reinforcement learning. Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle. NeurIPS 2019.
Dopamine: A research framework for deep reinforcement learning. Pablo Samuel Castro, Subhodeep Moitra, Carles Gelada, Saurabh Kumar, Marc G. Bellemare. 2018. [GitHub]
Unifying count-based exploration and intrinsic motivation. Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, and Rémi Munos. NeurIPS 2016.



Students

Adrien Ali Taïga (PhD), with Aaron Courville.
Jacob Buckman (PhD), with Doina Precup.
Rishabh Agarwal (PhD), with Aaron Courville.
Harley Wiltzer (PhD), with David Meger.
Pierluca D'Oro (PhD), with Pierre-Luc Bacon.
Nathan U. Rahn (PhD), with Doina Precup.
Jesse Farebrother (PhD), with David Meger.
Charline Le Lan (PhD, Oxford University), with Yee Whye Teh and Shimon Whiteson.
Max Schwarzer (PhD), with Aaron Courville.
Johan Obando Céron (MSc), with Pablo Samuel Castro.
Linda Petrini (PhD), with Aaron Courville.


Past students

Marlos C. Machado (PhD 2019, University of Alberta).
Philip Amortila (MSc 2019).
Vishal Jain (MSc 2019).
Harley Wiltzer (MSc 2021).