Publications

2024

  1. Reward Centering PDF
    Abhishek Naik, Yi Wan, Manan Tomar, Richard S. Sutton
    RLC, 2024
  2. An Idiosyncrasy of Time-discretization in Reinforcement Learning PDF
    Kris De Asis, Richard S. Sutton
    RLC, 2024
  3. SwiftTD: A Fast and Robust Algorithm for Temporal Difference Learning PDF
    Khurram Javed, Arsalan Sharifnassab, Richard S. Sutton
    RLC, 2024
  4. Demystifying the Recency Heuristic in Temporal-Difference Learning
    Brett Daley, Marlos C. Machado, Martha White
    RLC, 2024
  5. Investigating the Interplay of Prioritized Replay and Generalization
    Parham Mohammad Panahi, Andrew Patterson, Martha White, Adam White
    RLC, 2024
  6. The Cross-environment Hyperparameter Setting Benchmark for Reinforcement Learning
    Andrew Patterson, Samuel Neumann, Raksha Kumaraswamy, Martha White, Adam White
    RLC, 2024
  7. Harnessing Discrete Representations for Continual Reinforcement Learning
    Edan Jacob Meyer, Adam White, Marlos C. Machado
    RLC, 2024
  8. The Cliff of Overcommitment with Policy Gradient Step Sizes
    Scott M. Jordan, Samuel Neumann, James E. Kostas, Adam White, Philip S. Thomas
    RLC, 2024
  9. Mitigating the Curse of Horizon in Monte-Carlo Returns
    Alex Ayoub, David Szepesvari, Francesco Zanini, Bryan Chan, Dhawal Gupta, Bruno Castro da Silva, Dale Schuurmans
    RLC, 2024
  10. Learning to Optimize for Reinforcement Learning
    Qingfeng Lan, A. Rupam Mahmood, Shuicheng Yan, Zhongwen Xu
    RLC, 2024
  11. More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling
    Haque Ishfaq, Yixin Tan, Yu Yang, Qingfeng Lan, Jianfeng Lu, A. Rupam Mahmood, Doina Precup, Pan Xu
    RLC, 2024
  12. Revisiting Sparse Rewards for Goal-Reaching Reinforcement Learning
    Gautham Vasan, Yan Wang, Fahim Shahriar, James Bergstra, Martin Jägersand, A. Rupam Mahmood
    RLC, 2024
  13. Weight Clipping for Deep Continual and Reinforcement Learning
    Mohamed Elsayed, Qingfeng Lan, Clare Lyle, A. Rupam Mahmood
    RLC, 2024
  14. Mitigating Value Hallucination in Dyna-Style Planning via Multistep Predecessor Models PDF
    Farzane Aminmansour, Taher Jafferjee, Ehsan Imani, Erin J Talvitie, Michael Bowling, Martha White
    JAIR, 2024

2021

  1. A Distribution-dependent Analysis of Meta Learning
    Mikhail Konobeev (University of Alberta) · Ilja Kuzborskij (University of Milan) · Csaba Szepesvari (DeepMind/University of Alberta)
    ICML, 2021
  2. Meta-Thompson Sampling
    Branislav Kveton (Google Research) · Mikhail Konobeev (University of Alberta) · Manzil Zaheer (Google Research) · Chih-wei Hsu (Google Research) · Martin Mladenov (Google) · Craig Boutilier (Google) · Csaba Szepesvari (DeepMind/University of Alberta)
    ICML, 2021
  3. Average-Reward Off-Policy Policy Evaluation with Function Approximation
    Shangtong Zhang (University of Oxford) · Yi Wan (University of Alberta) · Richard Sutton (DeepMind / University of Alberta) · Shimon Whiteson (University of Oxford)
    ICML, 2021
  4. Learning and Planning in Average-Reward Markov Decision Processes
    Yi Wan (University of Alberta) · Abhishek Naik (University of Alberta) · Richard Sutton (DeepMind / University of Alberta)
    ICML, 2021
  5. Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games
    Dustin Morrill (University of Alberta) · Ryan D'Orazio (Université de Montréal) · Marc Lanctot (DeepMind) · James Wright (University of Alberta) · Michael Bowling (University of Alberta) · Amy Greenwald (Brown University)
    ICML, 2021
  6. Characterizing the Gap Between Actor-Critic and Policy Gradient
    Junfeng Wen (University of Alberta) · Saurabh Kumar (Stanford) · Subha Ramakrishna Gummadi (Google Brain) · Dale Schuurmans (University of Alberta)
    ICML, 2021
  7. On the Optimality of Batch Policy Optimization Algorithms
    Chenjun Xiao (Google / University of Alberta) · Yifan Wu (Carnegie Mellon University) · Jincheng Mei (University of Alberta / Google Brain) · Bo Dai (Google Brain) · Tor Lattimore (DeepMind) · Lihong Li (Google Research) · Csaba Szepesvari (DeepMind/University of Alberta) · Dale Schuurmans (Google / University of Alberta)
    ICML, 2021
  8. Leveraging Non-uniformity in First-order Non-convex Optimization
    Jincheng Mei (University of Alberta / Google Brain) · Yue Gao (University of Alberta) · Bo Dai (Google Brain) · Csaba Szepesvari (DeepMind/University of Alberta) · Dale Schuurmans (University of Alberta)
    ICML, 2021
  9. Randomized Exploration in Reinforcement Learning with General Value Function Approximation PDF
    Haque Ishfaq (MILA / McGill University) · Qiwen Cui (Peking University) · Alex Ayoub (University of Alberta) · Viet Nguyen (MILA / McGill University) · Zhuoran Yang (Princeton University) · Zhaoran Wang (Northwestern University) · Doina Precup (McGill University / DeepMind) · Lin Yang (UCLA)
    ICML, 2021
  10. Beyond Variance Reduction: Understanding the True Impact of Baselines on Policy Optimization PDF
    Wesley Chung (MILA / McGill University) · Valentin Thomas (MILA / UdeM) · Marlos C. Machado (DeepMind/University of Alberta) · Nicolas Le Roux (MILA / McGill University / UdeM)
    ICML, 2021
  11. Differentially Private Approximations of a Convex Hull in Low Dimensions PDF
    Yue Gao (University of Alberta), Or Sheffet (Bar-Ilan University)
    ITC, 2021
  12. Fuzzy Tiling Activations: A Simple Approach to Learning Sparse Representations Online
    Yangchen Pan, Kirby Banman, Martha White
    ICLR, 2021

2020

  1. An implicit function learning approach for parametric modal regression
    Yangchen Pan, Ehsan Imani, Martha White, Amir-massoud Farahmand
    NeurIPS, 2020
  2. Escaping the Gravitational Pull of Softmax
    Jincheng Mei, Chenjun Xiao, Bo Dai, Lihong Li, Csaba Szepesvari, Dale Schuurmans
    NeurIPS, 2020
  3. Marginal Utility for Planning in Continuous or Large Discrete Action Spaces
    Zaheen F Ahmad, Levi Lelis, Michael Bowling
    NeurIPS, 2020
  4. Efficient Planning in Large MDPs with Weak Linear Function Approximation
    Roshan Shariff, Csaba Szepesvari
    NeurIPS, 2020
  5. Low-Variance and Zero-Variance Baselines for Extensive-Form Games
    Trevor Davis, Martin Schmid, Michael Bowling
    ICML, 2020
  6. Gradient Temporal-Difference Learning with Regularized Corrections
    Sina Ghiassian, Andrew Patterson, Shivam Garg, Dhawal Gupta, Adam White, Martha White
    ICML, 2020
  7. Batch Stationary Distribution Estimation
    Junfeng Wen, Bo Dai, Lihong Li, Dale Schuurmans
    ICML, 2020
  8. Selective Dyna-style Planning Under Limited Model Capacity
    Muhammad Zaheer, Samuel Sokota, Erin Talvitie, Martha White
    ICML, 2020
  9. On the Global Convergence Rates of Softmax Policy Gradient Methods
    Jincheng Mei, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans
    ICML, 2020
  10. Domain Aggregation Networks for Multi-Source Domain Adaptation
    Junfeng Wen, Russell Greiner, Dale Schuurmans
    ICML, 2020
  11. Model-Based Reinforcement Learning with Value-Targeted Regression PDF
    Alex Ayoub, Zeyu Jia, Csaba Szepesvari, Mengdi Wang, Lin F. Yang
    ICML, 2020
  12. Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey PDF
    Sanmit Narvekar, Bei Peng, Matteo Leonetti, Jivko Sinapov, Matthew E. Taylor, Peter Stone
    JMLR, 2020
  13. Solving Zero-Sum Imperfect Information Games Using Alternative Link Functions: An Analysis of f-Regression Counterfactual Regret Minimization PDF
    Ryan D’Orazio, Dustin Morrill, James Wright, Michael Bowling
    AAMAS, 2020
  14. Maximizing Information Gain via Prediction Rewards
    Yash Satsangi, Sungsu Lim, Shimon Whiteson, Frans Oliehoek, Martha White
    AAMAS, 2020
  15. Improving Performance in Reinforcement Learning by Breaking Generalization in Neural Networks
    Sina Ghiassian, Banafsheh Rafiee, Yat Long Lo, Adam White
    AAMAS, 2020
  16. Multi Type Mean Field Reinforcement Learning PDF
    Sriram Ganapathi Subramanian, Pascal Poupart, Matthew E. Taylor, Nidhi Hegde
    AAMAS, 2020
  17. Neural Replicator Dynamics: Multiagent Learning via Hedging Policy Gradients PDF
    Daniel Hennes, Dustin Morrill, Shayegan Omidshafiei, Remi Munos, Julien Perolat, Marc Lanctot, Audrunas Gruslys, Jean-Baptiste Lespiau, Paavo Parmas, Edgar Duenez-Guzman, Karl Tuyls
    AAMAS, 2020
  18. Useful Policy Invariance Shaping from Arbitrary Advice PDF
    Paniz Behboudian, Yash Satsangi, Matthew E. Taylor, Anna Harutyunyan, Michael Bowling
    AAMAS, 2020
  19. Training Recurrent Neural Networks Online by Learning Explicit State Variables PDF
    Somjit Nath, Vincent Liu, Alan Chan, Adam White, Martha White
    ICLR, 2020
  20. Frequency-based Search-control in Dyna PDF
    Yangchen Pan, Jincheng Mei, Amir-massoud Farahmand, Martha White
    ICLR, 2020
  21. Maxmin Q-learning: Controlling the Estimation Bias of Q-learning PDF
    Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White
    ICLR, 2020
  22. GenDICE: Generalized Offline Estimation of Stationary Values PDF
    Ruiyi Zhang*, Bo Dai*, Lihong Li, Dale Schuurmans
    ICLR, 2020
  23. Count-Based Exploration with the Successor Representation
    Marlos C. Machado (Google Brain)*, Marc G. Bellemare (Google Brain), Michael Bowling
    AAAI, 2020
  24. Guiding CDCL SAT Search via Random Exploration amid Conflict Depression
    Md Solimul Chowdhury*, Martin Müller, Jia-Huai You
    AAAI, 2020
  25. Gamma-Nets: Generalizing Value Estimation Over Timescale
    Craig Sherstan*, Shibhansh Dohare, James MacGlashan, Johannes Guenther, Patrick Pilarski
    AAAI, 2020
  26. Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning
    Kris De Asis, Alan Chan, Silviu Pitis, Richard Sutton, Daniel Graves
    AAAI, 2020

2019

  1. Ease-of-Teaching and Language Structure from Emergent Communication. PDF
    Fushan Li and Michael Bowling.
    NeurIPS, 2019
  2. Importance Resampling for Off-policy Prediction. PDF
    Matthew Schlegel, Wesley Chung, Daniel Graves, Jian Qian, Martha White.
    NeurIPS, 2019
  3. Meta-Learning Representations for Continual Learning. PDF
    Khurram Javed and Martha White.
    NeurIPS, 2019
  4. Learning Macroscopic Brain Connectomes via Group-Sparse Factorization.
    Farzane Aminmansour, Andrew Patterson, Lei Le, Yisu Peng, Daniel Mitchell, Franco Pestilli, Cesar Caiafa, Russell Greiner, Martha White.
    NeurIPS, 2019
  5. Exponential Family Estimation via Adversarial Dynamics Embedding. PDF
    Bo Dai, Zhen Liu, Hanjun Dai, Niao He, Arthur Gretton, Le Song, Dale Schuurmans.
    NeurIPS, 2019
  6. Maximum Entropy Monte-Carlo Planning.
    Chenjun Xiao, Ruitong Huang, Jincheng Mei, Dale Schuurmans, Martin Müller.
    NeurIPS, 2019
  7. Surrogate Objectives for Batch Policy Optimization in One-step Decision Making.
    Minmin Chen, Ramki Gummadi, Chris Harris, Dale Schuurmans.
    NeurIPS, 2019
  8. Invertible Convolutional Flow.
    Mahdi Karami, Dale Schuurmans, Jascha Sohl-Dickstein, Laurent Dinh, Daniel Duckworth
    NeurIPS, 2019
  9. A Geometric Perspective on Optimal Representations for Reinforcement Learning. PDF
    Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle
    NeurIPS, 2019
  10. Think out of the "Box": Generically-Constrained Asynchronous Composite Optimization and Hedging.
    Pooria Joulani, András György, Csaba Szepesvari.
    NeurIPS, 2019
  11. Detecting Overfitting via Adversarial Examples. PDF
    Roman Werpachowski, András György, Csaba Szepesvari.
    NeurIPS, 2019
  12. Planning with Expectation Models. PDF
    Yi Wan, Muhammad Zaheer, Adam White, Martha White, Richard Sutton.
    IJCAI, 2019
  13. Perturbed-History Exploration in Stochastic Multi-Armed Bandits. PDF
    Branislav Kveton, Csaba Szepesvári, Mohammad Ghavamzadeh, Craig Boutilier.
    IJCAI, 2019
  14. Advantage Amplification in Slowly Evolving Latent-State Environments. PDF
    Martin Mladenov, Ofer Meshi, Jayden Ooi, Dale Schuurmans, Craig Boutilier.
    IJCAI, 2019
  15. On Principled Entropy Exploration in Policy Optimization. PDF
    Jincheng Mei, Chenjun Xiao, Ruitong Huang, Dale Schuurmans, Martin Müller.
    IJCAI, 2019
  16. Hill Climbing on Value Estimates for Search-control in Dyna.
    Yangchen Pan, Hengshuai Yao, Amir-massoud Farahmand, Martha White.
    IJCAI, 2019
  17. BubbleRank: Safe Online Learning to Re-Rank via Implicit Click Feedback.
    Chang Li, Branislav Kveton, Tor Lattimore, Ilya Markov, Maarten de Rijke, Csaba Szepesvari, Masrour Zoghi
    UAI, 2019
  18. Perturbed-History Exploration in Stochastic Linear Bandits.
    Branislav Kveton, Csaba Szepesvari, Mohammad Ghavamzadeh, Craig Boutilier
    UAI, 2019
  19. CapsAndRuns: An Improved Method for Approximately Optimal Algorithm Configuration.
    Gellért Weisz, András György, Csaba Szepesvari.
    ICML, 2019
  20. Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits.
    Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Tor Lattimore, Mohammad Ghavamzadeh
    ICML, 2019
  21. Online Learning to Rank with Features.
    Shuai Li, Tor Lattimore, Csaba Szepesvari
    ICML, 2019
  22. Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning.
    Jakob Foerster, Francis Song, Edward Hughes, Neil Burch, Iain Dunning, Shimon Whiteson, Matthew Botvinick, Michael Bowling
    ICML, 2019
  23. The Value Function Polytope in Reinforcement Learning.
    Robert Dadashi, Marc Bellemare, Adrien Ali Taiga, Nicolas Le Roux, Dale Schuurmans
    ICML, 2019
  24. Understanding the Impact of Entropy on Policy Optimization.
    Zafarali Ahmed, Nicolas Le Roux, Mohammad Norouzi, Dale Schuurmans
    ICML, 2019
  25. Learning to Generalize from Sparse and Underspecified Rewards.
    Rishabh Agarwal, Chen Liang, Dale Schuurmans, Mohammad Norouzi
    ICML, 2019
  26. Two-Timescale Networks for Nonlinear Value Function Approximation.
    Wesley Chung, Somjit Nath, Ajin Joseph, Martha White.
    ICLR, 2019
  27. Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures.
    Jonathan Uesato, Ananya Kumar, Csaba Szepesvari, Tom Erez, Avraham Ruderman, Keith Anderson, Krishnamurthy Dvijotham, Nicolas Heess, Pushmeet Kohli.
    ICLR, 2019
  28. Prediction in Intelligence: An Empirical Comparison of Off-policy Algorithms on Robots.
    Banafsheh Rafiee, Sina Ghiassian, Adam White, Richard Sutton.
    AAMAS, 2019
  29. The Utility of Sparse Representations for Control in Reinforcement Learning.
    Vincent Liu, Raksha Kumaraswamy, Lei Le, Martha White.
    AAAI, 2019
  30. An Exponential Tail Bound for the Deleted Estimate.
    Karim Abou-Moustafa, Csaba Szepesvari.
    AAAI, 2019
  31. Variance Reduction in Monte Carlo Regret Minimization for Extensive Games using Baselines.
    Martin Schmid, Matej Moravcik, Neil Burch, Marc Lanctot, Rudolf Kadlec, Michael Bowling.
    AAAI, 2019
  32. Solving Large Extensive-Form Games with Strategy Constraints.
    Trevor Davis, Kevin Waugh, Michael Bowling.
    AAAI, 2019
  33. Meta-descent for Online, Continual Prediction.
    Andrew Jacobsen, Matthew Schlegel, Cameron Linke, Thomas Degris, Adam White, Martha White.
    AAAI, 2019

2018

  1. An Off-policy Policy Gradient Theorem Using Emphatic Weightings PDF
    Ehsan Imani, Eric Graves, Martha White
    NeurIPS, 2018
  2. Context-dependent upper-confidence bounds for directed exploration PDF
    Raksha Kumaraswamy, Matthew Schlegel, Adam White, Martha White
    NeurIPS, 2018
  3. Supervised autoencoders: Improving generalization performance with unsupervised regularizers PDF
    Lei Le, Andrew Patterson, Martha White
    NeurIPS, 2018
  4. Actor-Critic Policy Optimization in Partially Observable Multiagent Environments PDF
    Sriram Srinivasan, Marc Lanctot, Vinicius Zambaldi, Julien Perolat, Karl Tuyls, Remi Munos, Michael Bowling
    NeurIPS, 2018
  5. Non-delusional Q-learning and value-iteration PDF
    Tyler Lu, Dale Schuurmans, Craig Boutilier
    (BEST PAPER AWARD) NeurIPS, 2018
  6. PAC-Bayes bounds for stable algorithms with instance-dependent priors PDF
    Omar Rivasplata, Csaba Szepesvari, John Shawe-Taylor, Emilio Parrado-Hernandez, Shiliang Sun
    NeurIPS, 2018
  7. TopRank: A practical algorithm for online stochastic ranking PDF
    Tor Lattimore, Branislav Kveton, Shuai Li, Csaba Szepesvari
    NeurIPS, 2018
  8. Per-Decision Multi-step Temporal Difference Learning with Control Variates PDF
    Kris De Asis, Richard Sutton
    UAI, 2018
  9. Multi-step Reinforcement Learning: A Unifying Algorithm PDF
    Kris De Asis*, Juan-Fernando Hernandez-Garcia*, Gordon Zacharias Holland*, Richard Sutton
    AAAI, 2018