Publications

2021

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning. Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G. Bellemare, to appear in Proceedings of the International Conference on Learning Representations (ICLR), 2021.

The Importance of Pessimism in Fixed-Dataset Policy Optimization. Jacob Buckman, Carles Gelada, Marc G. Bellemare, to appear in Proceedings of the International Conference on Learning Representations (ICLR), 2021. [longer version]

Metrics and Continuity in Reinforcement Learning. Charline Le Lan, Marc G. Bellemare, Pablo Samuel Castro, to appear in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2021.

The Value-Improvement Path: Towards Better Representations for Reinforcement Learning. Will Dabney, André Barreto, Mark Rowland, Robert Dadashi, John Quan, Marc G. Bellemare, David Silver, to appear in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2021. [arXiv]

2020

Autonomous navigation of stratospheric balloons using reinforcement learning. Marc G. Bellemare, Salvatore Candido, Pablo Samuel Castro, Jun Gong, Marlos C. Machado, Subhodeep Moitra, Sameera S. Ponda, Ziyu Wang. Nature, 588:77–82, 2020.

Representations for Stable Off-Policy Reinforcement Learning. Dibya Ghosh, Marc G. Bellemare, Proceedings of the International Conference on Machine Learning (ICML), 2020. [arXiv]

Shaping the Narrative Arc: An Information-Theoretic Approach to Collaborative Dialogue. Kory W. Mathewson, Pablo Samuel Castro, Colin Cherry, George Foster, Marc G. Bellemare, Proceedings of the International Conference on Computational Creativity (ICCC), 2020. [arXiv]

Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment. Adrien Ali Taiga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare, Proceedings of the International Conference on Learning Representations (ICLR), 2020. Also best paper award at ICML 2019 Exploration in Reinforcement Learning Workshop. [arXiv]

Algorithmic Improvements for Deep Reinforcement Learning applied to Interactive Fiction. Vishal Jain, William Fedus, Hugo Larochelle, Doina Precup, Marc G. Bellemare, Proceedings of the Thirty-Fourth AAAI Conference (AAAI), 2020. [arXiv]

Count-Based Exploration with the Successor Representation. Marlos C. Machado, Marc G. Bellemare, Michael Bowling, Proceedings of the Thirty-Fourth AAAI Conference (AAAI), 2020. [arXiv]

A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms. Philip Amortila, Doina Precup, Prakash Panangaden, Marc G. Bellemare, Proceedings of the International Conference on Artificial Intelligence and Statistics, 2020.

The Hanabi Challenge: A New Frontier for AI Research. Nolan Bard, Jakob N. Foerster, Sarath Chandar, Neil Burch, Marc Lanctot, H. Francis Song, Emilio Parisotto, Vincent Dumoulin, Subhodeep Moitra, Edward Hughes, Iain Dunning, Shibl Mourad, Hugo Larochelle, Marc G. Bellemare, Michael Bowling, Artificial Intelligence, 2020. [paper, code, arXiv]

2019

Distributional reinforcement learning with linear function approximation. Marc G. Bellemare, Nicolas Le Roux, Pablo Samuel Castro, Subhodeep Moitra, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2019. [paper]

Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift. Carles Gelada and Marc G. Bellemare. Proceedings of the Thirty-Third AAAI Conference (AAAI), 2019. [arXiv]

A Comparative Analysis of Expected and Distributional Reinforcement Learning. Clare Lyle, Pablo Samuel Castro, Marc G. Bellemare. Proceedings of the Thirty-Third AAAI Conference (AAAI), 2019. [arXiv]

The Value Function Polytope in Reinforcement Learning. Robert Dadashi, Adrien Ali Taïga, Nicolas Le Roux, Dale Schuurmans, Marc G. Bellemare, Proceedings of the International Conference on Machine Learning (ICML), 2019. [paper, arXiv]

Statistics and Samples in Distributional Reinforcement Learning. Mark Rowland, Robert Dadashi, Saurabh Kumar, Rémi Munos, Marc G. Bellemare, Will Dabney, Proceedings of the International Conference on Machine Learning (ICML), 2019. [paper]

DeepMDP: Learning Continuous Latent Space Models for Representation Learning. Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, Marc G. Bellemare, Proceedings of the International Conference on Machine Learning (ICML), 2019. [paper; arXiv has more results]

A Geometric Perspective on Optimal Representations in Reinforcement Learning. Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle, Advances in Neural Information Processing Systems (NeurIPS), 2019. [paper, arXiv]

Generalized Policy Updates for Policy Optimization. Saurabh Kumar, Robert Dadashi, Zafarali Ahmed, Dale Schuurmans, Marc G. Bellemare, presented at NeurIPS 2019 Optimization Foundations for Reinforcement Learning Workshop, 2019. [paper]

Temporally Extended Metrics for Markov Decision Processes. Philip Amortila, Marc G. Bellemare, Prakash Panangaden, Doina Precup, SafeAI Workshop at AAAI, 2019. [pdf]

Hyperbolic Discounting and Learning over Multiple Horizons. William Fedus, Carles Gelada, Yoshua Bengio, Marc G. Bellemare, Hugo Larochelle, best paper award at Reinforcement Learning and Decision-Making Symposium (RLDM), 2019. [arXiv]

An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents. Felipe Petroski Such, Vashisht Madhavan, Rosanne Liu, Rui Wang, Pablo Samuel Castro, Yulun Li, Ludwig Schubert, Marc Bellemare, Jeff Clune, Joel Lehman, Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2019. [paper, arXiv]

2018

The Reactor: A Fast and Sample-Efficient Actor-Critic Agent for Reinforcement Learning. Audrunas Gruslys, Will Dabney, Mohammad Gheshlaghi Azar, Bilal Piot, Marc G. Bellemare, Rémi Munos, Proceedings of the International Conference on Learning Representations, 2018. Previously "The Reactor: A Sample-Efficient Actor-Critic Architecture". [pdf, arXiv]

An Analysis of Categorical Distributional Reinforcement Learning. Mark Rowland, Marc G. Bellemare, Will Dabney, Rémi Munos, Yee Whye Teh, Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018. [pdf, arXiv]

Distributional Reinforcement Learning with Quantile Regression. Will Dabney, Mark Rowland, Marc G. Bellemare, Rémi Munos, Proceedings of the AAAI Conference on Artificial Intelligence, 2018. [arXiv]

Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents. Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew Hausknecht, Michael Bowling, Journal of Artificial Intelligence Research, 2018. [arXiv]

Approximate Exploration through State Abstraction. Adrien Ali Taïga, Aaron Courville, Marc G. Bellemare, 2018. [arXiv]

The Barbados 2018 List of Open Issues in Continual Learning. Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup, 2018. [arXiv]

An Introduction to Deep Reinforcement Learning. Vincent François-Lavet, Peter Henderson, Riashat Islam, Marc G. Bellemare, Joelle Pineau, Foundations and Trends in Machine Learning, 2018.

Dopamine: A Research Framework for Deep Reinforcement Learning. Pablo Samuel Castro, Subhodeep Moitra, Carles Gelada, Saurabh Kumar, Marc G. Bellemare, 2018. [arXiv, code]

2017

The Cramér Distance as a Solution to Biased Wasserstein Gradients. Marc G. Bellemare, Ivo Danihelka, Will Dabney, Shakir Mohamed, Balaji Lakshminarayanan, Stephan Hoyer, Rémi Munos, 2017. [arXiv]

A Distributional Perspective on Reinforcement Learning. Marc G. Bellemare*, Will Dabney*, Rémi Munos, Proceedings of the International Conference on Machine Learning, 2017. [pdf with erratum, as published, arXiv]

Count-Based Exploration with Neural Density Models. Georg Ostrovski, Marc G. Bellemare, Aaron van den Oord, Rémi Munos, Proceedings of the International Conference on Machine Learning, 2017. [pdf, arXiv, info]

Automatic Curriculum Learning for Neural Networks. Alex Graves, Marc G. Bellemare, Jacob Menick, Rémi Munos, Koray Kavukcuoglu, Proceedings of the International Conference on Machine Learning, 2017. [pdf, arXiv]

A Laplacian Framework for Option Discovery in Reinforcement Learning. Marlos C. Machado, Marc G. Bellemare, Michael Bowling, Proceedings of the International Conference on Machine Learning, 2017. [pdf, arXiv, video]

2016

Unifying Count-Based Exploration and Intrinsic Motivation. Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, and Rémi Munos, Advances in Neural Information Processing Systems 29, 2016. [pdf, long version, bib, CTS density model tutorial]

Safe and Efficient Off-Policy Reinforcement Learning. Rémi Munos, Tom Stepleton, Anna Harutyunyan, and Marc G. Bellemare, Advances in Neural Information Processing Systems 29, 2016. [pdf, long version, bib]

Q(λ) with Off-Policy Corrections. Anna Harutyunyan, Marc G. Bellemare, Tom Stepleton, and Rémi Munos, Proceedings of Algorithmic Learning Theory, 2016. [pdf, bib]

Increasing the Action Gap: New Operators for Reinforcement Learning. Marc G. Bellemare, Georg Ostrovski, Arthur Guez, Philip S. Thomas, and Rémi Munos, Proceedings of the Thirtieth AAAI Conference, 2016. [pdf, supplemental, videos, code, addendum]

2015

Human-level Control through Deep Reinforcement Learning. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis, Nature, 2015. [web]

Extended Abstract: The Arcade Learning Environment: An Evaluation Platform for General Agents. Marc G. Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling, presented at IJCAI-15, 2015. This is an extended abstract for our 2013 JAIR paper on the ALE. It also contains a new section on the algorithm "The Brute". [pdf]

Online Learning of k-CNF Boolean Functions. Joel Veness, Marcus Hutter, Laurent Orseau, and Marc G. Bellemare, Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015. [pdf, arXiv]

Count-Based Frequency Estimation using Bounded Memory. Marc G. Bellemare, Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015. [pdf, supplemental]

Compress and Control. Joel Veness, Marc G. Bellemare, Marcus Hutter, Alvin Chua, Guillaume Desjardins, Proceedings of the Twenty-Ninth AAAI Conference, 2015. [pdf]

2014

Skip Context Tree Switching. Marc G. Bellemare, Joel Veness and Erik Talvitie, Proceedings of the International Conference on Machine Learning, 2014. [pdf, code]

2013

Fast, Scalable Algorithms for Reinforcement Learning in High Dimensional Domains. Marc G. Bellemare, Ph.D. Thesis, University of Alberta, 2013. [pdf]

The Arcade Learning Environment: An Evaluation Platform for General Agents. Marc G. Bellemare, Yavar Naddaf, Joel Veness and Michael Bowling, Journal of Artificial Intelligence Research 47, pp. 253-279, 2013. [pdf, page, arXiv]

Bayesian Learning of Recursively Factored Environments. Marc G. Bellemare, Joel Veness and Michael Bowling, Proceedings of the International Conference on Machine Learning, 2013. [pdf]

2012

Sketch-Based Linear Value Function Approximation. Marc G. Bellemare, Joel Veness and Michael Bowling, In Advances in Neural Information Processing Systems 25, 2012. [pdf, page, code]

Investigating Contingency Awareness Using Atari 2600 Games. Marc G. Bellemare, Joel Veness and Michael Bowling, In Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (AAAI), 2012. [pdf, page]

2011 and Earlier

A Primer on Reinforcement Learning in the Brain: Psychological, Computational and Neural Perspectives. Elliot A. Ludvig, Marc G. Bellemare and Keir G. Pearson, In E. Alonso, E. Mondragon (Eds.), Computational Neuroscience for Advancing Artificial Intelligence: Models, Methods and Applications, pp. 111-144, Hershey, PA: IGI Global, 2011.

Constructing Evidence-based Treatment Strategies using Methods from Computer Science. Joelle Pineau, Marc G. Bellemare, A. John Rush, Adrian Ghizaru and Susan A. Murphy, Drug and Alcohol Dependence, 88, Supplement 2, pp. 52-60, 2007.

Learning Prediction and Abstraction in Partially Observable Models. Marc G. Bellemare, Master's Thesis, McGill University, 2007. [ps]

Context-driven predictions. Marc G. Bellemare and Doina Precup, Proceedings of the Twentieth International Joint Conference on Artificial Intelligence, 2007. [pdf]

Cascade correlation algorithms for on-line reinforcement learning. Marc G. Bellemare, Proceedings of the North East Student Colloquium on Artificial Intelligence (NESCAI), 2006. [ps]