Marc G. Bellemare

Research Scientist, Google Research (Brain team)

Adjunct Professor, McGill University

Canada CIFAR AI Chair, Mila

I lead the reinforcement learning efforts of the Google Research team in Montréal, Canada. I also supervise a number of graduate students at Mila, where I am a core industry member.

My research focuses on two complementary problems in reinforcement learning. First comes the problem of representation: How should a learning system structure and update its knowledge about the environment it operates in? The second problem is concerned with exploration: How should the same learning system organize its decisions to be maximally effective at discovering its environment, and in particular acquiring information to build better representations?

From 2013 to 2017 I was at DeepMind in the United Kingdom. I completed my Ph.D. at the University of Alberta, where I developed the Atari 2600 as a benchmark for reinforcement learning research, work that became the Arcade Learning Environment.

Current students
Adrien Ali Taïga (PhD), with Aaron Courville.
Jacob Buckman (PhD), with Doina Precup.
Rishabh Agarwal (PhD), with Aaron Courville.
Harley Wiltzer (MSc), with David Meger.
Pierluca D'Oro (PhD), with Pierre-Luc Bacon.
Nathan U. Rahn (MSc/PhD), with Doina Precup.

Graduated students
Marlos C. Machado (PhD 2019, U. of Alberta), with Michael Bowling. Now at Google Brain, Montréal.
Philip Amortila (MSc 2019), with Prakash Panagaden and Doina Precup. Now PhD student at UIUC.
Vishal Jain (MSc 2019), with Doina Precup.


News

November 1st, 2020. Jacques Drouin and I have been awarded one of five grants to apply tools from machine learning to epigenomics (featured in Le Devoir; in French).

October 15th, 2020. My work (joint with Georg Ostrovski) on exploration in challenging Atari 2600 games is featured in Brian Christian's most recent book, The Alignment Problem.

February 4th, 2019. We've open-sourced the Hanabi Learning Environment, a platform for multiagent AI research on the board game Hanabi. This has been a fantastic collaboration with researchers from DeepMind. More details in this blog post.

February 4th, 2019. Last August we released the first version of the Dopamine framework for reinforcement learning research. This week version 2.0 comes out. Dopamine 2.0 provides support for most discrete action domains supported by OpenAI Gym. Huge thanks to Pablo Samuel Castro for getting this second release out the door, blog post here.

December 1st, 2017. I am thrilled to finally announce the release of version 0.6.0 of the Arcade Learning Environment. This new version brings many exciting new features and bug fixes, most importantly the ability to play games in different modes and difficulties, in some cases giving rise to dozens of different configurations! This will be of particular interest to researchers interested in transfer, lifelong, and continual learning. More details here.

October 13th, 2017. I recently gave two talks on distributional reinforcement learning: a short one at the Montreal AI Symposium and a longer one at my alma mater, the University of Alberta.

September 5th, 2017. I joined the Google Brain team in Montréal, Canada.

Distributional Reinforcement Learning

In classical reinforcement learning, the agent learns to predict the expected sum of future rewards received as a consequence of its decisions. In the last few years, with colleagues Will Dabney and Mark Rowland, I've explored distributional reinforcement learning, where the agent instead learns to predict the entire distribution of outcomes. Doing so improves the empirical performance of learning agents in complex tasks; we hypothesize that this is because it improves and stabilizes the agent's representation of its environment. Beyond performance metrics, distributional reinforcement learning provides a set of tools for reasoning about stochastic outcomes that is useful when decisions should account for risk, or when the system is partially observable.
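The core idea can be sketched with a categorical (C51-style) update: the agent keeps a probability distribution over returns on a fixed grid of atoms, applies the distributional Bellman operator, and projects the result back onto the grid. The sketch below is illustrative, not the paper's implementation; the fixed support, scalar reward, and all names are my assumptions here.

```python
import numpy as np

def categorical_projection(next_probs, reward, gamma, support):
    """Sketch of a categorical (C51-style) distributional Bellman update.

    `next_probs` is the return distribution at the next state, expressed as
    probabilities over the fixed atoms in `support`. All names are
    illustrative assumptions, not the paper's code.
    """
    v_min, v_max = support[0], support[-1]
    delta_z = support[1] - support[0]
    # Distributional Bellman operator: shift atoms by the reward, shrink by gamma.
    tz = np.clip(reward + gamma * support, v_min, v_max)
    # Split each shifted atom's probability between its two nearest grid atoms.
    b = (tz - v_min) / delta_z
    lower = np.floor(b).astype(int)
    upper = np.ceil(b).astype(int)
    projected = np.zeros_like(next_probs)
    for j in range(len(support)):
        if lower[j] == upper[j]:  # shifted atom lands exactly on a grid point
            projected[lower[j]] += next_probs[j]
        else:
            projected[lower[j]] += next_probs[j] * (upper[j] - b[j])
            projected[upper[j]] += next_probs[j] * (b[j] - lower[j])
    return projected
```

Each atom of the next-state distribution is shifted and shrunk by the Bellman operator, and its probability mass is then redistributed onto the two nearest atoms of the fixed grid, so the output is again a valid distribution on the same support.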

Talk at UBC CAIDA (Nov. 18th, 2020)

Blog post (DeepMind; 2017 paper)

The Arcade Learning Environment

The Arcade Learning Environment (ALE) is the highly successful interface to Atari 2600 games, designed at the University of Alberta from 2008 onwards and first released in 2010. As part of my PhD I produced the first complete version of the benchmark, reported on in JAIR. The ALE was popularized by our 2015 Nature paper at DeepMind, and has since supported countless research projects. I remained the ALE's lead maintainer until 2016 or so, and was co-lead (with Marlos C. Machado) from 2016 to 2019. Jesse Farebrother is now the lead maintainer and has been actively improving the benchmark.

Exploration in Reinforcement Learning

Most of the successes of deep reinforcement learning have been achieved with the most naive forms of exploration, where an agent discovers its environment by making choices at random. This approach is extremely wasteful; for example, the DQN agent took over 38 days of continuous play to obtain human-level performance. I am generally interested in understanding how artificial curiosity can address this issue. Some of our earlier work used density modelling techniques to estimate the novelty of a particular situation (original paper; follow-up work with Georg Ostrovski). Our flagship benchmark, the game Montezuma's Revenge, posed a multi-year challenge to the reinforcement learning community, and was eventually tackled with hybrid techniques such as Go-Explore.
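The pseudo-count idea behind these density-based bonuses can be sketched as follows. In the papers the density model is learned over Atari frames; here, purely for illustration, an exact empirical count model over hashable states stands in for it, and the bonus scale `beta` and the small constant in the square root are my assumptions.

```python
import math
from collections import defaultdict

class PseudoCountBonus:
    """Sketch of a pseudo-count exploration bonus (simplified).

    A density model assigns each state a probability before (`rho`) and
    after (`rho_next`) observing it once more; the pseudo-count is
    recovered as rho * (1 - rho_next) / (rho_next - rho). Here the
    "density model" is just empirical visit frequencies, an assumption
    made so the example is self-contained.
    """

    def __init__(self, beta=0.05):
        self.beta = beta
        self.counts = defaultdict(int)
        self.total = 0

    def bonus(self, state):
        # Density of `state` before and after observing it once more.
        rho = self.counts[state] / self.total if self.total else 0.0
        self.counts[state] += 1
        self.total += 1
        rho_next = self.counts[state] / self.total
        # Pseudo-count derived from the density model's change in prediction.
        pseudo_count = rho * (1.0 - rho_next) / max(rho_next - rho, 1e-8)
        # Novelty bonus that decays as the state becomes familiar.
        return self.beta / math.sqrt(pseudo_count + 0.01)
```

With this empirical stand-in, the recovered pseudo-count equals the true visit count, which is exactly the sanity check the pseudo-count construction is designed to pass; the interesting case in practice is a learned model that generalizes across similar frames.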