1. Training Recurrent Neural Networks Online by Learning Explicit State Variables
    Somjit Nath, Vincent Liu, Alan Chan, Adam White, Martha White
    ICLR, 2020
  2. Frequency-based Search-control in Dyna
    Yangchen Pan, Jincheng Mei, Amir-massoud Farahmand, Martha White
    ICLR, 2020
  3. Maxmin Q-learning: Controlling the Estimation Bias of Q-learning
    Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White
    ICLR, 2020
  4. GenDICE: Generalized Offline Estimation of Stationary Values
    Ruiyi Zhang*, Bo Dai*, Lihong Li, Dale Schuurmans
    ICLR, 2020
  5. Count-Based Exploration with the Successor Representation
    Marlos C. Machado (Google Brain)*, Marc G. Bellemare (Google Brain), Michael Bowling
    AAAI, 2020
  6. Guiding CDCL SAT Search via Random Exploration amid Conflict Depression
    Md Solimul Chowdhury*, Martin Müller, Jia-Huai You
    AAAI, 2020
  7. Gamma-Nets: Generalizing Value Estimation Over Timescale
    Craig Sherstan*, Shibhansh Dohare, James MacGlashan, Johannes Guenther, Patrick Pilarski
    AAAI, 2020
  8. Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning
    Kristopher De Asis*, Alan Chan, Silviu Pitis, Richard Sutton, Daniel Graves
    AAAI, 2020


  1. Ease-of-Teaching and Language Structure from Emergent Communication.
    Fushan Li and Michael Bowling.
    NeurIPS, 2019
  2. Importance Resampling for Off-policy Prediction.
    Matthew Schlegel, Wesley Chung, Daniel Graves, Jian Qian, Martha White.
    NeurIPS, 2019
  3. Meta-Learning Representations for Continual Learning.
    Khurram Javed and Martha White.
    NeurIPS, 2019
  4. Learning Macroscopic Brain Connectomes via Group-Sparse Factorization.
    Farzane Aminmansour, Andrew Patterson, Lei Le, Yisu Peng, Daniel Mitchell, Franco Pestilli, Cesar Caiafa, Russell Greiner, Martha White.
    NeurIPS, 2019
  5. Exponential Family Estimation via Adversarial Dynamics Embedding.
    Bo Dai, Zhen Liu, Hanjun Dai, Niao He, Arthur Gretton, Le Song, Dale Schuurmans.
    NeurIPS, 2019
  6. Maximum Entropy Monte-Carlo Planning.
    Chenjun Xiao, Ruitong Huang, Jincheng Mei, Dale Schuurmans, Martin Müller.
    NeurIPS, 2019
  7. Surrogate Objectives for Batch Policy Optimization in One-step Decision Making.
    Minmin Chen, Ramki Gummadi, Chris Harris, Dale Schuurmans.
    NeurIPS, 2019
  8. Invertible Convolutional Flow.
    Mahdi Karami, Dale Schuurmans, Jascha Sohl-Dickstein, Laurent Dinh, Daniel Duckworth
    NeurIPS, 2019
  9. A Geometric Perspective on Optimal Representations for Reinforcement Learning.
    Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle
    NeurIPS, 2019
  10. Think out of the "Box": Generically-Constrained Asynchronous Composite Optimization and Hedging.
    Pooria Joulani, András György, Csaba Szepesvari.
    NeurIPS, 2019
  11. Detecting Overfitting via Adversarial Examples.
    Roman Werpachowski, András György, Csaba Szepesvari.
    NeurIPS, 2019
  12. Planning with Expectation Models.
    Yi Wan, Muhammad Zaheer, Adam White, Martha White, Richard Sutton.
    IJCAI, 2019
  13. Perturbed-History Exploration in Stochastic Multi-Armed Bandits.
    Branislav Kveton, Csaba Szepesvári, Mohammad Ghavamzadeh, Craig Boutilier.
    IJCAI, 2019
  14. Advantage Amplification in Slowly Evolving Latent-State Environments.
    Martin Mladenov, Ofer Meshi, Jayden Ooi, Dale Schuurmans, Craig Boutilier.
    IJCAI, 2019
  15. On Principled Entropy Exploration in Policy Optimization.
    Jincheng Mei, Chenjun Xiao, Ruitong Huang, Dale Schuurmans, Martin Müller.
    IJCAI, 2019
  16. Hill Climbing on Value Estimates for Search-control in Dyna.
    Yangchen Pan, Hengshuai Yao, Amir-massoud Farahmand, Martha White.
    IJCAI, 2019
  17. BubbleRank: Safe Online Learning to Re-Rank via Implicit Click Feedback.
    Chang Li, Branislav Kveton, Tor Lattimore, Ilya Markov, Maarten de Rijke, Csaba Szepesvari, Masrour Zoghi
    UAI, 2019
  18. Perturbed-History Exploration in Stochastic Linear Bandits.
    Branislav Kveton, Csaba Szepesvari, Mohammad Ghavamzadeh, Craig Boutilier
    UAI, 2019
  19. CapsAndRuns: An Improved Method for Approximately Optimal Algorithm Configuration.
    Gellért Weisz, András György, Csaba Szepesvari.
    ICML, 2019
  20. Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits.
    Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Tor Lattimore, Mohammad Ghavamzadeh
    ICML, 2019
  21. Online Learning to Rank with Features.
    Shuai Li, Tor Lattimore, Csaba Szepesvari
    ICML, 2019
  22. Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning.
    Jakob Foerster, Francis Song, Edward Hughes, Neil Burch, Iain Dunning, Shimon Whiteson, Matthew Botvinick, Michael Bowling
    ICML, 2019
  23. The Value Function Polytope in Reinforcement Learning.
    Robert Dadashi, Marc Bellemare, Adrien Ali Taiga, Nicolas Le Roux, Dale Schuurmans
    ICML, 2019
  24. Understanding the Impact of Entropy on Policy Optimization.
    Zafarali Ahmed, Nicolas Le Roux, Mohammad Norouzi, Dale Schuurmans
    ICML, 2019
  25. Learning to Generalize from Sparse and Underspecified Rewards.
    Rishabh Agarwal, Chen Liang, Dale Schuurmans, Mohammad Norouzi
    ICML, 2019
  26. Two-Timescale Networks for Nonlinear Value Function Approximation.
    Wesley Chung, Somjit Nath, Ajin Joseph, Martha White.
    ICLR, 2019
  27. Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures.
    Jonathan Uesato, Ananya Kumar, Csaba Szepesvari, Tom Erez, Avraham Ruderman, Keith Anderson, Krishnamurthy Dvijotham, Nicolas Heess, Pushmeet Kohli.
    ICLR, 2019
  28. Prediction in Intelligence: An Empirical Comparison of Off-policy Algorithms on Robots.
    Banafsheh Rafiee, Sina Ghiassian, Adam White, Richard Sutton.
    AAMAS, 2019
  29. The Utility of Sparse Representations for Control in Reinforcement Learning.
    Vincent Liu, Raksha Kumaraswamy, Lei Le, Martha White.
    AAAI, 2019
  30. An Exponential Tail Bound for the Deleted Estimate.
    Karim Abou-Moustafa, Csaba Szepesvari.
    AAAI, 2019
  31. Variance Reduction in Monte Carlo Regret Minimization for Extensive Games using Baselines.
    Martin Schmid, Matej Moravcik, Neil Burch, Marc Lanctot, Rudolf Kadlec, Michael Bowling.
    AAAI, 2019
  32. Solving Large Extensive-Form Games with Strategy Constraints.
    Trevor Davis, Kevin Waugh, Michael Bowling.
    AAAI, 2019
  33. Meta-descent for Online, Continual Prediction.
    Andrew Jacobsen, Matthew Schlegel, Cameron Linke, Thomas Degris, Adam White, Martha White.
    AAAI, 2019


  1. An Off-policy Policy Gradient Theorem Using Emphatic Weightings
    Ehsan Imani, Eric Graves and Martha White
    NeurIPS, 2018
  2. Context-dependent upper-confidence bounds for directed exploration
    Raksha Kumaraswamy, Matthew Schlegel, Adam White, Martha White
    NeurIPS, 2018
  3. Supervised autoencoders: Improving generalization performance with unsupervised regularizers
    Lei Le, Andrew Patterson, Martha White
    NeurIPS, 2018
  4. Actor-Critic Policy Optimization in Partially Observable Multiagent Environments
    Sriram Srinivasan, Marc Lanctot, Vinicius Zambaldi, Julien Perolat, Karl Tuyls, Remi Munos, Michael Bowling
    NeurIPS, 2018
  5. Non-delusional Q-learning and value-iteration
    Tyler Lu, Dale Schuurmans, Craig Boutilier
    (BEST PAPER AWARD) NeurIPS, 2018
  6. PAC-Bayes bounds for stable algorithms with instance-dependent priors
    Omar Rivasplata, Csaba Szepesvari, John Shawe-Taylor, Emilio Parrado-Hernandez, Shiliang Sun
    NeurIPS, 2018
  7. TopRank: A practical algorithm for online stochastic ranking
    Tor Lattimore, Branislav Kveton, Shuai Li, Csaba Szepesvari
    NeurIPS, 2018
  8. Per-Decision Multi-step Temporal Difference Learning with Control Variates
    Kristopher De Asis, Richard Sutton
    UAI, 2018
  9. Multi-step Reinforcement Learning: A Unifying Algorithm
    Kristopher De Asis*, Juan-Fernando Hernandez-Garcia*, Gordon Zacharias Holland*, Richard Sutton
    AAAI, 2018