Publications
2024
- Reward Centering. RLC, 2024
- An Idiosyncrasy of Time-discretization in Reinforcement Learning. RLC, 2024
- SwiftTD: A Fast and Robust Algorithm for Temporal Difference Learning. RLC, 2024
- Demystifying the Recency Heuristic in Temporal-Difference Learning. RLC, 2024
- Investigating the Interplay of Prioritized Replay and Generalization. RLC, 2024
- The Cross-environment Hyperparameter Setting Benchmark for Reinforcement Learning. RLC, 2024
- Harnessing Discrete Representations for Continual Reinforcement Learning. RLC, 2024
- The Cliff of Overcommitment with Policy Gradient Step Sizes. RLC, 2024
- Mitigating the Curse of Horizon in Monte-Carlo Returns. RLC, 2024
- Learning to Optimize for Reinforcement Learning. RLC, 2024
- More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling. RLC, 2024
- Revisiting Sparse Rewards for Goal-Reaching Reinforcement Learning. RLC, 2024
- Weight Clipping for Deep Continual and Reinforcement Learning. RLC, 2024
- Mitigating Value Hallucination in Dyna-Style Planning via Multistep Predecessor Models. JAIR, 2024
2021
- A Distribution-dependent Analysis of Meta Learning. ICML, 2021
- Meta-Thompson Sampling. ICML, 2021
- Average-Reward Off-Policy Policy Evaluation with Function Approximation. ICML, 2021
- Learning and Planning in Average-Reward Markov Decision Processes. ICML, 2021
- Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games. ICML, 2021
- Characterizing the Gap Between Actor-Critic and Policy Gradient. ICML, 2021
- On the Optimality of Batch Policy Optimization Algorithms. ICML, 2021
- Leveraging Non-uniformity in First-order Non-convex Optimization. ICML, 2021
- Randomized Exploration in Reinforcement Learning with General Value Function Approximation. ICML, 2021
- Beyond Variance Reduction: Understanding the True Impact of Baselines on Policy Optimization. ICML, 2021
- Differentially Private Approximations of a Convex Hull in Low Dimensions. ITC, 2021
- Fuzzy Tiling Activations: A Simple Approach to Learning Sparse Representations Online. ICLR, 2021
2020
- An implicit function learning approach for parametric modal regression. NeurIPS, 2020
- Escaping the Gravitational Pull of Softmax. NeurIPS, 2020
- Marginal Utility for Planning in Continuous or Large Discrete Action Spaces. NeurIPS, 2020
- Efficient Planning in Large MDPs with Weak Linear Function Approximation. NeurIPS, 2020
- Low-Variance and Zero-Variance Baselines for Extensive-Form Games. ICML, 2020
- Gradient Temporal-Difference Learning with Regularized Corrections. ICML, 2020
- Batch Stationary Distribution Estimation. ICML, 2020
- Selective Dyna-style Planning Under Limited Model Capacity. ICML, 2020
- On the Global Convergence Rates of Softmax Policy Gradient Methods. ICML, 2020
- Domain Aggregation Networks for Multi-Source Domain Adaptation. ICML, 2020
- Model-Based Reinforcement Learning with Value-Targeted Regression. ICML, 2020
- Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey. JMLR, 2020
- Solving Zero-Sum Imperfect Information Games Using Alternative Link Functions: An Analysis of f-Regression Counterfactual Regret Minimization. AAMAS, 2020
- Maximizing Information Gain via Prediction Rewards. AAMAS, 2020
- Improving Performance in Reinforcement Learning by Breaking Generalization in Neural Networks. AAMAS, 2020
- Multi Type Mean Field Reinforcement Learning. AAMAS, 2020
- Neural Replicator Dynamics: Multiagent Learning via Hedging Policy Gradients. AAMAS, 2020
- Useful Policy Invariance Shaping from Arbitrary Advice. AAMAS, 2020
- Training Recurrent Neural Networks Online by Learning Explicit State Variables. ICLR, 2020
- Frequency-based Search-control in Dyna. ICLR, 2020
- Maxmin Q-learning: Controlling the Estimation Bias of Q-learning. ICLR, 2020
- GenDICE: Generalized Offline Estimation of Stationary Values. ICLR, 2020
- Count-Based Exploration with the Successor Representation. AAAI, 2020
- Guiding CDCL SAT Search via Random Exploration amid Conflict Depression. AAAI, 2020
- Gamma-Nets: Generalizing Value Estimation Over Timescale. AAAI, 2020
- Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning. AAAI, 2020
2019
- Ease-of-Teaching and Language Structure from Emergent Communication. NeurIPS, 2019
- Importance Resampling for Off-policy Prediction. NeurIPS, 2019
- Meta-Learning Representations for Continual Learning. NeurIPS, 2019
- Learning Macroscopic Brain Connectomes via Group-Sparse Factorization. NeurIPS, 2019
- Exponential Family Estimation via Adversarial Dynamics Embedding. NeurIPS, 2019
- Maximum Entropy Monte-Carlo Planning. NeurIPS, 2019
- Surrogate Objectives for Batch Policy Optimization in One-step Decision Making. NeurIPS, 2019
- Invertible Convolutional Flow. NeurIPS, 2019
- A Geometric Perspective on Optimal Representations for Reinforcement Learning. NeurIPS, 2019
- Think out of the "Box": Generically-Constrained Asynchronous Composite Optimization and Hedging. NeurIPS, 2019
- Detecting Overfitting via Adversarial Examples. NeurIPS, 2019
- Planning with Expectation Models. IJCAI, 2019
- Perturbed-History Exploration in Stochastic Multi-Armed Bandits. IJCAI, 2019
- Advantage Amplification in Slowly Evolving Latent-State Environments. IJCAI, 2019
- On Principled Entropy Exploration in Policy Optimization. IJCAI, 2019
- Hill Climbing on Value Estimates for Search-control in Dyna. IJCAI, 2019
- BubbleRank: Safe Online Learning to Re-Rank via Implicit Click Feedback. UAI, 2019
- Perturbed-History Exploration in Stochastic Linear Bandits. UAI, 2019
- CapsAndRuns: An Improved Method for Approximately Optimal Algorithm Configuration. ICML, 2019
- Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits. ICML, 2019
- Online Learning to Rank with Features. ICML, 2019
- Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning. ICML, 2019
- The Value Function Polytope in Reinforcement Learning. ICML, 2019
- Understanding the Impact of Entropy on Policy Optimization. ICML, 2019
- Learning to Generalize from Sparse and Underspecified Rewards. ICML, 2019
- Two-Timescale Networks for Nonlinear Value Function Approximation. ICLR, 2019
- Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures. ICLR, 2019
- Prediction in Intelligence: An Empirical Comparison of Off-policy Algorithms on Robots. AAMAS, 2019
- The Utility of Sparse Representations for Control in Reinforcement Learning. AAAI, 2019
- An Exponential Tail Bound for the Deleted Estimate. AAAI, 2019
- Variance Reduction in Monte Carlo Regret Minimization for Extensive Games using Baselines. AAAI, 2019
- Solving Large Extensive-Form Games with Strategy Constraints. AAAI, 2019
- Meta-descent for Online, Continual Prediction. AAAI, 2019
2018
- An Off-policy Policy Gradient Theorem Using Emphatic Weightings. NeurIPS, 2018
- Context-dependent upper-confidence bounds for directed exploration. NeurIPS, 2018
- Supervised autoencoders: Improving generalization performance with unsupervised regularizers. NeurIPS, 2018
- Actor-Critic Policy Optimization in Partially Observable Multiagent Environments. NeurIPS, 2018
- Non-delusional Q-learning and value-iteration. (BEST PAPER AWARD) NeurIPS, 2018
- PAC-Bayes bounds for stable algorithms with instance-dependent priors. NeurIPS, 2018
- TopRank: A practical algorithm for online stochastic ranking. NeurIPS, 2018
- Per-Decision Multi-step Temporal Difference Learning with Control Variates. UAI, 2018
- Multi-step Reinforcement Learning: A Unifying Algorithm. AAAI, 2018