# awesome-offline-rl

Maintainers:
- Haruka Kiyohara (Cornell University)
- Yuta Saito (Hanjuku-kaso Co., Ltd. / Cornell University)
## Table of Contents

- Papers
  - Review/Survey/Position Papers
    - Offline RL
    - Off-Policy Evaluation and Learning
    - Related Reviews
  - Offline RL: Theory/Methods
  - Offline RL: Benchmarks/Experiments
  - Offline RL: Applications
  - Off-Policy Evaluation and Learning: Theory/Methods
    - Off-Policy Evaluation: Contextual Bandits
    - Off-Policy Evaluation: Reinforcement Learning
    - Off-Policy Learning
  - Off-Policy Evaluation and Learning: Benchmarks/Experiments
  - Off-Policy Evaluation and Learning: Applications
- Open Source Software/Implementations
- Blog/Podcast
  - Blog
  - Podcast
- Related Workshops
- Tutorials/Talks/Lectures
## Papers

### Review/Survey/Position Papers
- Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
- A Survey on Offline Model-Based Reinforcement Learning
- Foundation Models for Decision Making: Problems, Methods, and Opportunities
- A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems
- Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
- A Review of Off-Policy Evaluation in Reinforcement Learning
- On the Opportunities and Challenges of Offline Reinforcement Learning for Recommender Systems
- Understanding Reinforcement Learning Algorithms: The Progress from Basic Q-learning to Proximal Policy Optimization
- Offline Evaluation for Reinforcement Learning-based Recommendation: A Critical Issue and Some Alternatives
- A Survey on Transformers in Reinforcement Learning
- Deep Reinforcement Learning: Opportunities and Challenges
- A Survey on Model-based Reinforcement Learning
- Survey on Fair Reinforcement Learning: Theory and Practice
- Accelerating Offline Reinforcement Learning Application in Real-Time Bidding and Recommendation: Potential Use of Simulation
- A Survey of Generalisation in Deep Reinforcement Learning
### Offline RL: Theory/Methods
- Value-Aided Conditional Supervised Learning for Offline RL
- Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning
- DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching
- Deep autoregressive density nets vs neural ensembles for model-based offline reinforcement learning
- Context-Former: Stitching via Latent Conditioned Sequence Modeling
- Adversarially Trained Actor Critic for offline CMDPs
- Optimistic Model Rollouts for Pessimistic Offline Policy Optimization
- Solving Continual Offline Reinforcement Learning with Decision Transformer
- MoMA: Model-based Mirror Ascent for Offline Reinforcement Learning
- Reframing Offline Reinforcement Learning as a Regression Problem
- Efficient Two-Phase Offline Deep Reinforcement Learning from Preference Feedback
- Policy-regularized Offline Multi-objective Reinforcement Learning
- Differentiable Tree Search in Latent State Space
- Learning from Sparse Offline Datasets via Conservative Density Estimation
- Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model
- PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning
- Critic-Guided Decision Transformer for Offline Reinforcement Learning
- CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning
- Neural Network Approximation for Pessimistic Offline Reinforcement Learning
- A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
- The Generalization Gap in Offline Reinforcement Learning
- Decoupling Meta-Reinforcement Learning with Gaussian Task Contexts and Skills
- MICRO: Model-Based Offline Reinforcement Learning with a Conservative Bellman Operator
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization
- Using Curiosity for an Even Representation of Tasks in Continual Offline Reinforcement Learning
- Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline Reinforcement Learning
- Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees
- Switch Trajectory Transformer with Distributional Value Approximation for Multi-Task Reinforcement Learning
- Hierarchical Decision Transformer
- Prompt-Tuning Decision Transformer with Preference Ranking
- Context Shift Reduction for Offline Meta-Reinforcement Learning
- Uni-O4: Unifying Online and Offline Deep Reinforcement Learning with Multi-Step On-Policy Optimization
- Score Models for Offline Goal-Conditioned Reinforcement Learning
- Offline RL with Observation Histories: Analyzing and Improving Sample Complexity
- Expressive Modeling Is Insufficient for Offline RL: A Tractable Inference Perspective
- Rethinking Decision Transformer via Hierarchical Reinforcement Learning
- Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
- GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models
- SERA: Sample Efficient Reward Augmentation in offline-to-online Reinforcement Learning
- Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage
- Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning
- CROP: Conservative Reward for Model-based Offline Policy Optimization
- Towards Robust Offline Reinforcement Learning under Diverse Data Corruption
- Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias
- Boosting Continuous Control with Consistency Policy
- Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning
- Reward-Consistent Dynamics Models are Strongly Generalizable for Offline Reinforcement Learning
- DiffCPS: Diffusion Model based Constrained Policy Search for Offline Reinforcement Learning
- Self-Confirming Transformer for Locally Consistent Online Adaptation in Multi-Agent Reinforcement Learning
- Learning to Reach Goals via Diffusion
- Decision ConvFormer: Local Filtering in MetaFormer is Sufficient for Decision Making
- Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning
- Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning
- Reasoning with Latent Diffusion in Offline Reinforcement Learning
- Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with Expert Guidance
- Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness
- Robust Offline Reinforcement Learning -- Certify the Confidence Interval
- Stackelberg Batch Policy Learning
- H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps
- Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
- DOMAIN: MilDly COnservative Model-BAsed OfflINe Reinforcement Learning
- Guided Online Distillation: Promoting Safe Reinforcement Learning by Offline Demonstration
- Equivariant Data Augmentation for Generalization in Offline Reinforcement Learning
- Reasoning with Latent Diffusion in Offline Reinforcement Learning
- Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with Expert Guidance
- Multi-Objective Decision Transformers for Offline Reinforcement Learning
- AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning
- Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations
- PASTA: Pretrained Action-State Transformer Agents
- Towards A Unified Agent with Foundation Models
- Goal-Conditioned Predictive Coding as an Implicit Planner for Offline Reinforcement Learning
- Offline Reinforcement Learning with Imbalanced Datasets
- LLQL: Logistic Likelihood Q-Learning for Reinforcement Learning
- Elastic Decision Transformer
- Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning
- Is RLHF More Difficult than Standard RL?
- Supervised Pretraining Can Learn In-Context Reinforcement Learning
- Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching
- Safe Reinforcement Learning with Dead-Ends Avoidance and Recovery
- CLUE: Calibrated Latent Guidance for Offline Reinforcement Learning
- Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting
- Beyond OOD State Actions: Supported Cross-Domain Offline Reinforcement Learning
- A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning
- HIPODE: Enhancing Offline Reinforcement Learning with High-Quality Synthetic Data from a Policy-Decoupled Approach
- Ensemble-based Offline-to-Online Reinforcement Learning: From Pessimistic Learning to Optimistic Exploration
- In-Sample Policy Iteration for Offline Reinforcement Learning
- Instructed Diffuser with Temporal Condition Guidance for Offline Reinforcement Learning
- Offline Prioritized Experience Replay
- Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding
- Offline Meta Reinforcement Learning with In-Distribution Online Adaptation
- Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning
- Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism
- MADiff: Offline Multi-agent Learning with Diffusion Models
- Provable Offline Reinforcement Learning with Human Feedback
- Think Before You Act: Decision Transformers with Internal Working Memory
- Distributionally Robust Optimization Efficiently Solves Offline Reinforcement Learning
- Offline Primal-Dual Reinforcement Learning for Linear MDPs
- Federated Offline Policy Learning with Heterogeneous Observational Data
- Offline Reinforcement Learning with Additional Covering Distributions
- Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning
- Stackelberg Decision Transformer for Asynchronous Action Coordination in Multi-Agent Systems
- Federated Ensemble-Directed Offline Reinforcement Learning
- IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies
- Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments
- Reinforcement Learning from Passive Data via Latent Intentions
- Uncertainty-driven Trajectory Truncation for Model-based Offline Reinforcement Learning
- RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
- Batch Quantum Reinforcement Learning
- Accelerating exploration and representation learning with offline pre-training
- On Context Distribution Shift in Task Representation Learning for Offline Meta RL
- Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning
- Learning Excavation of Rigid Objects with Offline Reinforcement Learning
- Goal-conditioned Offline Reinforcement Learning through State Space Partitioning
- Merging Decision Transformers: Weight Averaging for Forming Multi-Task Policies
- Deploying Offline Reinforcement Learning with Human Feedback
- Synthetic Experience Replay
- ENTROPY: Environment Transformer and Offline Policy Optimization
- Graph Decision Transformer
- Selective Uncertainty Propagation in Offline RL
- Off-the-Grid MARL: a Framework for Dataset Generation with Baselines for Cooperative Offline Multi-Agent Reinforcement Learning
- Skill Decision Transformer
- Guiding Online Reinforcement Learning with Action-Free Offline Pretraining
- SaFormer: A Conditional Sequence Modeling Approach to Offline Safe Reinforcement Learning
- APAC: Authorized Probability-controlled Actor-Critic For Offline Reinforcement Learning
- Designing an offline reinforcement learning objective from scratch
- Behaviour Discriminator: A Simple Data Filtering Method to Improve Offline Policy Learning
- Learning to View: Decision Transformers for Active Object Detection
- Risk Sensitive Dead-end Identification in Safety-Critical Offline Reinforcement Learning
- Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization
- Contextual Conservative Q-Learning for Offline Reinforcement Learning
- Offline Policy Optimization in RL with Variance Regularization
- Transformer in Transformer as Backbone for Deep Reinforcement Learning
- SPQR: Controlling Q-ensemble Independence with Spiked Random Model for Reinforcement Learning
- Revisiting the Minimalist Approach to Offline Reinforcement Learning
- Constrained Policy Optimization with Explicit Behavior Density for Offline Reinforcement Learning
- Supported Value Regularization for Offline Reinforcement Learning
- Conservative State Value Estimation for Offline Reinforcement Learning
- Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning
- Adversarial Model for Offline Reinforcement Learning
- Percentile Criterion Optimization in Offline Reinforcement Learning
- Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning
- HIQL: Offline Goal-Conditioned RL with Latent States as Actions
- Recovering from Out-of-sample States via Inverse Dynamics in Offline Reinforcement Learning
- Offline RL with Discrete Proxy Representations for Generalizability in POMDPs
- Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization
- Bi-Level Offline Policy Optimization with Limited Exploration
- Provably (More) Sample-Efficient Offline RL with Options
- Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage
- AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline Multi-Agent RL via Alternating Stationary Distribution Correction Estimation
- Budgeting Counterfactual for Offline RL
- Efficient Diffusion Policies for Offline Reinforcement Learning
- Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
- Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data
- Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage
- Provably Efficient Offline Reinforcement Learning in Regular Decision Processes
- Provably Efficient Offline Goal-Conditioned Reinforcement Learning with General Function Approximation and Single-Policy Concentrability
- On Sample-Efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling and Beyond
- Conservative Offline Policy Adaptation in Multi-Agent Games
- Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL
- Survival Instinct in Offline Reinforcement Learning
- Learning from Visual Observation via Offline Pretrained State-to-Go Transformer
- Design from Policies: Conservative Test-Time Adaptation for Offline Policy Optimization
- Learning to Influence Human Behavior with Offline Reinforcement Learning
- Residual Q-Learning: Offline and Online Policy Customization without Value
- Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning
- Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets
- Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL
- Corruption-Robust Offline Reinforcement Learning with General Function Approximation
- Learning to Modulate pre-trained Models in RL
- Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning
- One Risk to Rule Them All: A Risk-Sensitive Perspective on Model-Based Offline Reinforcement Learning
- Goal-Conditioned Predictive Coding for Offline Reinforcement Learning
- Mutual Information Regularized Offline Reinforcement Learning
- Offline RL With Heteroskedastic Datasets and Support Constraints
- Offline Reinforcement Learning with Differential Privacy
- Accountability in Offline Reinforcement Learning: Explaining Decisions with a Corpus of Examples
- Reining Generalization in Offline Reinforcement Learning via Representation Distinction
- VOCE: Variational Optimization with Conservative Estimation for Offline Safe Reinforcement Learning
- SafeDICE: Offline Safe Imitation Learning with Non-Preferred Demonstrations
- Hierarchical Diffusion for Offline Decision Making
- MAHALO: Unifying Offline Reinforcement Learning and Imitation Learning from Observations
- Safe Offline Reinforcement Learning with Real-Time Budget Constraints
- Near-optimal Conservative Exploration in Reinforcement Learning under Episode-wise Constraints
- A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning
- Anti-Exploration by Random Network Distillation
- Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning
- PASTA: Pessimistic Assortment Optimization
- Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning
- Supported Trust Region Optimization for Offline Reinforcement Learning
- Principled Offline RL in the Presence of Rich Exogenous Information
- Efficient Online Reinforcement Learning with Offline Data
- Boosting Offline Reinforcement Learning with Action Preference Query
- Model-based Offline Reinforcement Learning with Count-based Conservatism
- Constrained Decision Transformer for Offline Safe Reinforcement Learning
- Model-Bellman Inconsistency for Model-based Offline Reinforcement Learning
- Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources
- What is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL?
- Policy Regularization with Dataset Constraint for Offline Reinforcement Learning
- MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL
- Distance Weighted Supervised Learning for Offline Interaction Data
- Masked Trajectory Models for Prediction, Representation, and Control
- Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning
- Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models
- Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap
- Future-conditioned Unsupervised Pretraining for Decision Transformer
- PAC-Bayesian Offline Contextual Bandits With Guarantees
- Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL
- Jump-Start Reinforcement Learning
- Learning Temporally Abstract World Models without Online Experimentation
- A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback
- Revisiting the Linear-Programming Framework for Offline RL with General Function Approximation
- Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories
- Actor-Critic Alignment for Offline-to-Online Reinforcement Learning
- Leveraging Offline Data in Online Reinforcement Learning
- Offline Reinforcement Learning with Closed-Form Policy Improvement Operators
- Offline Learning in Markov Games with General Function Approximation
- Offline Meta Reinforcement Learning with In-Distribution Online Adaptation
- Scaling Pareto-Efficient Decision Making Via Offline Multi-Objective RL
- Confidence-Conditioned Value Functions for Offline Reinforcement Learning
- Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes
- Is Conditional Generative Modeling all you need for Decision-Making?
- Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization
- Extreme Q-Learning: MaxEnt RL without Entropy
- Dichotomy of Control: Separating What You Can Control from What You Cannot
- From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data
- VIPeR: Provably Efficient Algorithm for Offline RL with Neural Function Approximation
- Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian
- The In-Sample Softmax for Offline Reinforcement Learning
- VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training
- Does Zero-Shot Reinforcement Learning Exist?
- Behavior Prior Representation learning for Offline Reinforcement Learning
- Mind the Gap: Offline Policy Optimization for Imperfect Rewards
- Offline Congestion Games: How Feedback Type Affects Data Coverage Requirement
- User-Interactive Offline Reinforcement Learning
- Discovering Generalizable Multi-agent Coordination Skills from Multi-task Offline Data
- Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient
- Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting
- Efficient Offline Policy Optimization with a Learned Model
- Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning
- When Data Geometry Meets Deep Function: Generalizing Offline Reinforcement Learning
- In-sample Actor Critic for Offline Reinforcement Learning
- Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement Learning
- Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization
- Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling
- Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient
- Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game
- Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes
- Hyper-Decision Transformer for Efficient Online Policy Adaptation
- Efficient Planning in a Compact Latent Action Space
- Preference Transformer: Modeling Human Preferences using Transformers for RL
- Behavior Proximal Policy Optimization
- Provably Efficient Neural Offline Reinforcement Learning via Perturbed Rewards
- The Provable Benefits of Unsupervised Data Sharing for Offline Reinforcement Learning
- Decision Transformer under Random Frame Dropping
- Policy Expansion for Bridging Offline-to-Online Reinforcement Learning
- Finetuning Offline World Models in the Real World
- On the Sample Complexity of Vanilla Model-Based Offline Reinforcement Learning with Dependent Samples
- Adaptive Policy Learning for Offline-to-Online Reinforcement Learning
- Safe Policy Improvement for POMDPs via Finite-State Controllers
- Behavior Estimation from Multi-Source Data for Offline Reinforcement Learning
- On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation
- Contrastive Example-Based Control
- Curriculum Offline Reinforcement Learning
- Offline Reinforcement Learning with On-Policy Q-Function Regularization
- Model-based Offline Policy Optimization with Adversarial Network
- Efficient experience replay architecture for offline reinforcement learning
- Automatic Trade-off Adaptation in Offline RL
- Offline Robot Reinforcement Learning with Uncertainty-Guided Human Expert Sampling
- Latent Variable Representation for Reinforcement Learning
- Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning
- State-Aware Proximal Pessimistic Algorithms for Offline Reinforcement Learning
- Masked Autoencoding for Scalable and Generalizable Decision Making
- Improving TD3-BC: Relaxed Policy Constraint for Offline Learning and Stable Online Fine-Tuning
- Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size
- Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows
- Model-based Trajectory Stitching for Improved Offline Reinforcement Learning
- Offline Reinforcement Learning with Adaptive Behavior Regularization
- Contextual Transformer for Offline Meta Reinforcement Learning
- Wall Street Tree Search: Risk-Aware Planning for Offline Reinforcement Learning
- ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data
- Contrastive Value Learning: Implicit Models for Simple Offline RL
- Optimistic Curiosity Exploration and Conservative Exploitation with Linear Reward Shaping
- Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian
- Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information
- Provable Safe Reinforcement Learning with Binary Feedback
- Learning on the Job: Self-Rewarding Offline-to-Online Finetuning for Industrial Insertion of Novel Connectors from Vision
- Implicit Offline Reinforcement Learning via Supervised Learning
- Robust Offline Reinforcement Learning with Gradient Penalty and Constraint Relaxation
- Boosting Offline Reinforcement Learning via Data Rebalancing
- ConserWeightive Behavioral Cloning for Reliable Offline Reinforcement Learning
- State Advantage Weighting for Offline RL
- Blessing from Experts: Super Reinforcement Learning in Confounded Environments
- DCE: Offline Reinforcement Learning With Double Conservative Estimates
- On the Opportunities and Challenges of using Animals Videos in Reinforcement Learning
- Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes
- Exploiting Reward Shifting in Value-Based Deep RL
- Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation
- C^2: Co-design of Robots via Concurrent Networks Coupling Online and Offline Reinforcement Learning
- Strategic Decision-Making in the Presence of Information Asymmetry: Provably Efficient RL with Algorithmic Instruments
- Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity
- AdaCat: Adaptive Categorical Discretization for Autoregressive Models
- Branch Ranking for Efficient Mixed-Integer Programming via Offline Ranking-based Policy Learning
- Offline Reinforcement Learning at Multiple Frequencies
- General Policy Evaluation and Improvement by Learning to Identify Few But Crucial States
- Behavior Transformers: Cloning k modes with one stone
- Contrastive Learning as Goal-Conditioned Reinforcement Learning
- Federated Offline Reinforcement Learning
- Provable Benefit of Multitask Representation Learning in Reinforcement Learning
- Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward
- Model-Based Reinforcement Learning Is Minimax-Optimal for Offline Zero-Sum Markov Games
- Offline Reinforcement Learning with Causal Structured World Models
- Incorporating Explicit Uncertainty Estimates into Deep Offline Reinforcement Learning
- Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in Offline RL
- Byzantine-Robust Online and Offline Distributed Reinforcement Learning
- Model Generation with Provable Coverability for Offline Reinforcement Learning
- You Can't Count on Luck: Why Decision Transformers Fail in Stochastic Environments
- Multi-Game Decision Transformers
- Hierarchical Planning Through Goal-Conditioned Offline Reinforcement Learning
- Distance-Sensitive Offline Reinforcement Learning
- No More Pesky Hyperparameters: Offline Hyperparameter Tuning for RL
- How to Spend Your Robot Time: Bridging Kickstarting and Offline Reinforcement Learning for Vision-based Robotic Manipulation
- Offline Visual Representation Learning for Embodied Navigation
- Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers
- BATS: Best Action Trajectory Stitching
- Settling the Sample Complexity of Model-Based Offline Reinforcement Learning
- PAnDR: Fast Adaptation to New Environments from Offline Experiences via Decoupling Policy and Environment Representations
- Offline Reinforcement Learning Under Value and Density-Ratio Realizability: the Power of Gaps
- Meta Reinforcement Learning for Adaptive Control: An Offline Approach
- The Efficacy of Pessimism in Asynchronous Q-Learning
- Reinforcement Learning for Linear Quadratic Control is Vulnerable Under Cost Manipulation
- A Regularized Implicit Policy for Offline Reinforcement Learning
- Reinforcement Learning in Possibly Nonstationary Environments
- Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons
- VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning
- Retrieval-Augmented Reinforcement Learning
- Online Decision Transformer
- Transferred Q-learning
- Settling the Communication Complexity for Distributed Offline Reinforcement Learning
- Offline Reinforcement Learning with Realizability and Single-policy Concentrability
- Rethinking Goal-conditioned Supervised Learning and Its Connection to Offline RL
- Stochastic Gradient Descent with Dependent Data for Offline Reinforcement Learning
- Can Wikipedia Help Offline Reinforcement Learning?
- MOORe: Model-based Offline-to-Online Reinforcement Learning
- Operator Deep Q-Learning: Zero-Shot Reward Transferring in Reinforcement Learning
- Importance of Empirical Sample Complexity Analysis for Offline Reinforcement Learning
- Single-Shot Pruning for Offline Reinforcement Learning
- Monte Carlo Augmented Actor-Critic for Sparse Reward Deep Reinforcement Learning from Suboptimal Demonstrations
- Data-Driven Offline Decision-Making via Invariant Representation Learning
- Bellman Residual Orthogonalization for Offline Reinforcement Learning
- A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP
- RORL: Robust Offline Reinforcement Learning via Conservative Smoothing
- On Gap-dependent Bounds for Offline Reinforcement Learning
- Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus
- Supported Policy Optimization for Offline Reinforcement Learning
- When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning
- Why So Pessimistic? Estimating Uncertainties for Offline RL through Ensembles, and Why Their Independence Matters
- When does return-conditioned supervised learning work for offline reinforcement learning?
- Pessimism for Offline Linear Contextual Bandits using ℓp Confidence Sets
- RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning
- When is Offline Two-Player Zero-Sum Markov Game Solvable?
- Robust Reinforcement Learning using Offline Data
- Bidirectional Learning for Offline Infinite-width Model-based Optimization
- Mildly Conservative Q-Learning for Offline Reinforcement Learning
- Bootstrapped Transformer for Offline Reinforcement Learning
- LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation
- Latent-Variable Advantage-Weighted Policy Optimization for Offline RL
- Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination
- Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions
- Offline Goal-Conditioned Reinforcement Learning via f-Advantage Regression
- Dual Generator Offline Reinforcement Learning
- MoCoDA: Model-based Counterfactual Data Augmentation
- A Policy-Guided Imitation Approach for Offline Reinforcement Learning
- A Unified Framework for Alternating Offline Model Training and Policy Learning
- Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief
- S2P: State-conditioned Image Synthesis for Data Augmentation in Offline Reinforcement Learning
- ASPiRe: Adaptive Skill Priors for Reinforcement Learning
- Skills Regularized Task Decomposition for Multi-task Offline Reinforcement Learning
- Offline Multi-Agent Reinforcement Learning with Knowledge Distillation
- Shadow Knowledge Distillation: Bridging Offline and Online Knowledge Transfer
- Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning
- Offline RL Policies Should be Trained to be Adaptive
- Adversarially Trained Actor Critic for Offline Reinforcement Learning
- Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets
- How to Leverage Unlabeled Data in Offline Reinforcement Learning
- Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification
- Learning Pseudometric-based Action Representations for Offline Reinforcement Learning
- Offline Meta-Reinforcement Learning with Online Self-Supervision
- Versatile Offline Imitation from Observations and Examples via Regularized State-Occupancy Matching
- Constrained Offline Policy Optimization
- Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations
- Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision Processes
- Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity
- Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach
- Prompting Decision Transformer for Few-Shot Policy Generalization
- Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning
- On the Role of Discount Factor in Offline Reinforcement Learning
- Koopman Q-learning: Offline Reinforcement Learning via Symmetries of Dynamics
- Representation Learning for Online and Offline RL in Low-rank MDPs
- Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage
- Revisiting Design Choices in Model-Based Offline Reinforcement Learning
- DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization
- COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation
- POETREE: Interpretable Policy Learning with Adaptive Decision Trees
- Planning in Stochastic Environments with a Learned Model
- Offline Reinforcement Learning with Value-based Episodic Memory
- When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?
- Learning Value Functions from Undirected State-only Experience
- Rethinking Goal-Conditioned Supervised Learning and Its Connection to Offline RL
- Offline Reinforcement Learning with Implicit Q-Learning
- RvS: What is Essential for Offline RL via Supervised Learning?
- Pareto Policy Pool for Model-based Offline Reinforcement Learning
- CrowdPlay: Crowdsourcing Human Demonstrations for Offline Learning
- COPA: Certifying Robust Policies for Offline Reinforcement Learning against Poisoning Attacks
- DARA: Dynamics-Aware Reward Augmentation in Offline Reinforcement Learning
- Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism
- Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning
- Offline Neural Contextual Bandits: Pessimism, Optimization and Generalization
- Generalized Decision Transformer for Offline Hindsight Information Matching
- Model-Based Offline Meta-Reinforcement Learning with Regularization
- AW-Opt: Learning Robotic Skills with Imitation and Reinforcement at Scale
- Dealing with the Unknown: Pessimistic Offline Reinforcement Learning
- You Only Evaluate Once: a Simple Baseline Algorithm for Offline RL
- S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement Learning
- A Workflow for Offline Model-Free Robotic Reinforcement Learning
- Beyond Pick-and-Place: Tackling Robotic Stacking of Diverse Shapes
- Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions
- Offline Reinforcement Learning with Representations for Actions
- Towards Off-Policy Learning for Ranking Policies with Logged Feedback
- Safe Offline Reinforcement Learning Through Hierarchical Policies
- TD3 with Reverse KL Regularizer for Offline Reinforcement Learning from Mixed Datasets
- Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks
- Model Selection in Batch Policy Optimization
- Learning Contraction Policies from Offline Data
- CoMPS: Continual Meta Policy Search
- MESA: Offline Meta-RL for Safe Adaptation and Fault Tolerance
- Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Conquers All StarCraftII Tasks
- Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms
- Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation
- UMBRELLA: Uncertainty-Aware Model-Based Offline Reinforcement Learning Leveraging Planning
- Exploiting Action Impact Regularity and Partially Known Models for Offline Reinforcement Learning
- Batch Reinforcement Learning from Crowds
- SCORE: Spurious COrrelation REduction for Offline Reinforcement Learning
- Safely Bridging Offline and Online Reinforcement Learning
- Efficient Robotic Manipulation Through Offline-to-Online Reinforcement Learning and Goal-Aware State Information
- Value Penalized Q-Learning for Recommender Systems
- Offline Reinforcement Learning with Soft Behavior Regularization
- Planning from Pixels in Environments with Combinatorially Hard Search Spaces
- StARformer: Transformer with State-Action-Reward Representations
- Offline RL With Resource Constrained Online Deployment
- Lifelong Robotic Reinforcement Learning by Retaining Experiences
- Dual Behavior Regularized Reinforcement Learning
- DCUR: Data Curriculum for Teaching via Samples with Reinforcement Learning
- DROMO: Distributionally Robust Offline Model-based Policy Optimization
- Implicit Behavioral Cloning
- Reducing Conservativeness Oriented Offline Reinforcement Learning
- Policy Gradients Incorporating the Future
- Offline Decentralized Multi-Agent Reinforcement Learning
- OPAL: Offline Preference-Based Apprenticeship Learning
- Constraints Penalized Q-Learning for Safe Offline Reinforcement Learning
- Where is the Grass Greener? Revisiting Generalized Policy Iteration for Offline Reinforcement Learning
- The Least Restriction for Offline Reinforcement Learning
- Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble
- Causal Reinforcement Learning using Observational and Interventional Data
- On the Sample Complexity of Batch Reinforcement Learning with Policy-Induced Data
- Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL
- On Multi-objective Policy Optimization as a Tool for Reinforcement Learning
- Offline Reinforcement Learning as Anti-Exploration
- Corruption-Robust Offline Reinforcement Learning
- Offline Inverse Reinforcement Learning
- Heuristic-Guided Reinforcement Learning
- Reinforcement Learning as One Big Sequence Modeling Problem
- Decision Transformer: Reinforcement Learning via Sequence Modeling
- Model-Based Offline Planning with Trajectory Pruning
- InferNet for Delayed Reinforcement Tasks: Addressing the Temporal Credit Assignment Problem
- Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm
- MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale
- Distributional Offline Continuous-Time Reinforcement Learning with Neural Physics-Informed PDEs (SciPhy RL for DOCTR-L)
- Regularized Behavior Value Estimation
- Improved Context-Based Offline Meta-RL with Attention and Contrastive Learning
- Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning
- GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning
- MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning
- Continuous Doubly Constrained Batch Reinforcement Learning
- Q-Value Weighted Regression: Reinforcement Learning with Limited Data
- Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency
- Fast Rates for the Regret of Offline Reinforcement Learning
- Safe Policy Learning through Extrapolation: Application to Pre-trial Risk Assessment
- Weighted Model Estimation for Offline Model-based Reinforcement Learning
- A Minimalist Approach to Offline Reinforcement Learning
- Conservative Offline Distributional Reinforcement Learning
- Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL
- Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning
- Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
- Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs
- Offline Reinforcement Learning as One Big Sequence Modeling Problem
- Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism
- Offline Reinforcement Learning with Reverse Model-based Imagination
- Offline Meta Reinforcement Learning -- Identifiability Challenges and Effective Data Collection Strategies
- Nearly Horizon-Free Offline Reinforcement Learning
- Conservative Data Sharing for Multi-Task Offline Reinforcement Learning
- Online and Offline Reinforcement Learning by Planning with a Learned Model
- Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning
- Offline RL Without Off-Policy Evaluation
- Offline Model-based Adaptable Policy Learning
- COMBO: Conservative Offline Model-Based Policy Optimization
- PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators
- Near-Optimal Offline Reinforcement Learning via Double Variance Reduction
- Bellman-consistent Pessimism for Offline Reinforcement Learning
- The Difficulty of Passive Learning in Deep Reinforcement Learning
- Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble
- Towards Instance-Optimal Offline Reinforcement Learning with Pessimism
- EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
- Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills
- Is Pessimism Provably Efficient for Offline RL?
- Representation Matters: Offline Pretraining for Sequential Decision Making
- Offline Reinforcement Learning with Pseudometric Learning
- Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment
- Offline Contextual Bandits with Overparameterized Models
- Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning
- Offline Reinforcement Learning with Fisher Divergence Critic Regularization
- OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation
- Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning
- Vector Quantized Models for Planning
- Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL
- Instabilities of Offline RL with Pre-Trained Neural Representation
- Offline Meta-Reinforcement Learning with Advantage Weighting
- Model-Based Offline Planning
- Batch Reinforcement Learning Through Continuation Method
- Model-Based Visual Planning with Self-Supervised Functional Distances
- Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
- Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization
- DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs
- What are the Statistical Limits of Offline RL with Linear Function Approximation?
- Reset-Free Lifelong Learning with Skill-Space Planning
- Risk-Averse Offline Reinforcement Learning
- Finite-Sample Regret Bound for Distributionally Robust Offline Tabular Reinforcement Learning
- Exploration by Maximizing Rényi Entropy for Reward-Free RL Framework
- Efficient Self-Supervised Data Collection for Offline Robot Learning
- Boosting Offline Reinforcement Learning with Residual Generative Modeling
- BRAC+: Improved Behavior Regularized Actor Critic for Offline Reinforcement Learning
- Behavior Constraining in Weight Space for Offline Reinforcement Learning
- Finite-Sample Analysis For Decentralized Batch Multi-Agent Reinforcement Learning With Networked Agents
- Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning?
- Reinforcement Learning via Fenchel-Rockafellar Duality
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
- Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
- A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting
- Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient
- Batch Value-function Approximation with Only Realizability
- DRIFT: Deep Reinforcement Learning for Functional Software Testing
- Causality and Batch Reinforcement Learning: Complementary Approaches To Planning In Unknown Domains
- Goal-conditioned Batch Reinforcement Learning for Rotation Invariant Locomotion
- Semi-Supervised Reward Learning for Offline Reinforcement Learning
- Sample-Efficient Reinforcement Learning via Counterfactual-Based Data Augmentation
- Offline Reinforcement Learning from Images with Latent Space Models
- POPO: Pessimistic Offline Policy Optimization
- Reinforcement Learning with Videos: Combining Offline Observations with Interaction
- Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones
- Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
- OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning
- Batch Exploration with Examples for Scalable Robotic Reinforcement Learning
- Learning Dexterous Manipulation from Suboptimal Experts
- The Reinforcement Learning-Based Multi-Agent Cooperative Approach for the Adaptive Speed Regulation on a Metallurgical Pickling Line
- Overcoming Model Bias for Robust Offline Deep Reinforcement Learning
- Offline Meta Learning of Exploration
- EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
- Hyperparameter Selection for Offline Reinforcement Learning
- Interpretable Control by Reinforcement Learning
- Efficient Evaluation of Natural Stochastic Policies in Offline Reinforcement Learning
- Accelerating Online Reinforcement Learning with Offline Datasets
- DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction
- Critic Regularized Regression
- Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration
- Conservative Q-Learning for Offline Reinforcement Learning
- BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning
- MOPO: Model-based Offline Policy Optimization
- MOReL: Model-Based Offline Reinforcement Learning
- Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation
- Multi-task Batch Reinforcement Learning with Metric Learning
- Counterfactual Data Augmentation using Locally Factored Dynamics
- On Reward-Free Reinforcement Learning with Linear Function Approximation
- Constrained Policy Improvement for Safe and Efficient Reinforcement Learning
- BRPO: Batch Residual Policy Optimization
- Keep Doing What Worked: Behavior Modelling Priors for Offline Reinforcement Learning
- COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning
- Accelerating Reinforcement Learning with Learned Skill Priors
- PLAS: Latent Action Space for Offline Reinforcement Learning
- Scaling data-driven robotics with reward sketching and batch reinforcement learning
- Quantile QT-Opt for Risk-Aware Vision-Based Robotic Grasping
- Batch-Constrained Reinforcement Learning for Dynamic Distribution Network Reconfiguration
- Behavior Regularized Offline Reinforcement Learning
- Off-Policy Policy Gradient Algorithms by Constraining the State Distribution Shift
- Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
- AlgaeDICE: Policy Gradient from Arbitrary Experience
- Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
- Off-Policy Deep Reinforcement Learning without Exploration
- Safe Policy Improvement with Baseline Bootstrapping
- Information-Theoretic Considerations in Batch Reinforcement Learning
- Batch Recurrent Q-Learning for Backchannel Generation Towards Engaging Agents
- Safe Policy Improvement with Soft Baseline Bootstrapping
- Importance Weighted Transfer of Samples in Reinforcement Learning
- Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation
- Off-Policy Policy Gradient with State Distribution Correction
- Behavioral Cloning from Observation
- Diverse Exploration for Fast and Safe Policy Improvement
- Deep Exploration via Bootstrapped DQN
- Safe Policy Improvement by Minimizing Robust Baseline Regret
- Residential Demand Response Applications Using Batch Reinforcement Learning
- Structural Return Maximization for Reinforcement Learning
- Simultaneous Perturbation Algorithms for Batch Off-Policy Search
- Guided Policy Search
- Off-Policy Actor-Critic
- PAC-Bayesian Policy Evaluation for Reinforcement Learning
- Tree-Based Batch Mode Reinforcement Learning
- Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method
- Off-Policy Temporal-Difference Learning with Function Approximation
### Offline RL: Benchmarks/Experiments
- ORL-AUDITOR: Dataset Auditing in Offline Deep Reinforcement Learning
- Pearl: A Production-ready Reinforcement Learning Agent
- LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
- Robotic Manipulation Datasets for Offline Compositional Reinforcement Learning
- Datasets and Benchmarks for Offline Safe Reinforcement Learning
- Improving and Benchmarking Offline Reinforcement Learning Algorithms
- Benchmarks and Algorithms for Offline Preference-Based Reward Learning
- Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks
- CORL: Research-oriented Deep Offline Reinforcement Learning Library
- Benchmarking Offline Reinforcement Learning on Real-Robot Hardware
- Train Offline, Test Online: A Real Robot Learning Benchmark
- Benchmarking Offline Reinforcement Learning Algorithms for E-Commerce Order Fraud Evaluation
- Real World Offline Reinforcement Learning with Realistic Data Source
- Mind Your Data! Hiding Backdoors in Offline Reinforcement Learning Datasets
- B2RL: An open-source Dataset for Building Batch Reinforcement Learning
- An Empirical Study of Implicit Regularization in Deep Offline RL
- Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations
- Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning
- The Challenges of Exploration for Offline Reinforcement Learning
- Offline Equilibrium Finding
- Comparing Model-free and Model-based Algorithms for Offline Reinforcement Learning
- Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data
- Dungeons and Data: A Large-Scale NetHack Dataset
- NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning
- A Closer Look at Offline RL Agents
- Beyond Rewards: a Hierarchical Perspective on Offline Multiagent Behavioral Analysis
- On the Effect of Pre-training for Transformer in Different Modality on Offline Reinforcement Learning
- Showing Your Offline Reinforcement Learning Work: Online Evaluation Budget Matters
- d3rlpy: An Offline Deep Reinforcement Learning Library
- Understanding the Effects of Dataset Characteristics on Offline Reinforcement Learning
- Interpretable performance analysis towards offline reinforcement learning: A dataset perspective
- Comparison and Unification of Three Regularization Methods in Batch Reinforcement Learning
- RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning
- Measuring Data Quality for Dataset Selection in Offline Reinforcement Learning
- Offline Reinforcement Learning Hands-On
- D4RL: Datasets for Deep Data-Driven Reinforcement Learning
- RL Unplugged: Benchmarks for Offline Reinforcement Learning
- Benchmarking Batch Deep Reinforcement Learning Algorithms
awesome-offline-rl / Papers / Offline RL: Applications |
MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning | | | |
P2DT: Mitigating Forgetting in task-incremental Learning with progressive prompt Decision Transformer | | | |
Online Symbolic Music Alignment with Offline Reinforcement Learning | | | |
Advancing RAN Slicing with Offline Reinforcement Learning | | | |
Traffic Signal Control Using Lightweight Transformers: An Offline-to-Online RL Approach | | | |
Self-Driving Telescopes: Autonomous Scheduling of Astronomical Observation Campaigns with Offline Reinforcement Learning | | | |
A Fully Data-Driven Approach for Realistic Traffic Signal Control Using Offline Reinforcement Learning | | | |
Offline Reinforcement Learning for Wireless Network Optimization with Mixture Datasets | | | |
STEER: Unified Style Transfer with Expert Reinforcement | | | |
Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations | | | |
Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning | | | |
Offline Reinforcement Learning for Optimizing Production Bidding Policies | | | |
End-to-end Offline Reinforcement Learning for Glycemia Control | | | |
Leveraging Optimal Transport for Enhanced Offline Reinforcement Learning in Surgical Robotic Environments | | | |
Learning RL-Policies for Joint Beamforming Without Exploration: A Batch Constrained Off-Policy Approach | | | |
Uncertainty-Aware Decision Transformer for Stochastic Driving Environments | | | |
Boosting Offline Reinforcement Learning for Autonomous Driving with Hierarchical Latent Skills | | | |
Robotic Offline RL from Internet Videos via Value-Function Pre-Training | | | |
VAPOR: Holonomic Legged Robot Navigation in Outdoor Vegetation Using Offline Reinforcement Learning | | | |
RLSynC: Offline-Online Reinforcement Learning for Synthon Completion | | | |
Real Robot Challenge 2022: Learning Dexterous Manipulation from Offline Data in the Real World | | | |
Reinforced Self-Training (ReST) for Language Modeling | | | |
Aligning Language Models with Offline Reinforcement Learning from Human Feedback | | | |
Integrating Offline Reinforcement Learning with Transformers for Sequential Recommendation | | | |
Offline Skill Graph (OSG): A Framework for Learning and Planning using Offline Reinforcement Learning Skills | | | |
Improving Offline RL by Blending Heuristics | | | |
IQL-TD-MPC: Implicit Q-Learning for Hierarchical Model Predictive Control | | | |
Robust Reinforcement Learning Objectives for Sequential Recommender Systems | | | |
The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning | | | |
PROTO: Iterative Policy Regularized Offline-to-Online Reinforcement Learning | | | |
Matrix Estimation for Offline Reinforcement Learning with Low-Rank Structure | | | |
Offline Experience Replay for Continual Offline Reinforcement Learning | | | |
Causal Decision Transformer for Recommender Systems via Offline Reinforcement Learning | | | |
Data Might be Enough: Bridge Real-World Traffic Signal Control Using Offline Reinforcement Learning | | | |
User Retention-oriented Recommendation with Decision Transformer | | | |
Learning to Control Autonomous Fleets from Observation via Offline Reinforcement Learning | | | |
INVICTUS: Optimizing Boolean Logic Circuit Synthesis via Synergistic Learning and Search | | | |
Learning Vision-based Robotic Manipulation Tasks Sequentially in Offline Reinforcement Learning Settings | | | |
Winning Solution of Real Robot Challenge III | | | |
Learning-based MPC from Big Data Using Reinforcement Learning | | | |
Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management | | | |
Beyond Reward: Offline Preference-guided Policy Optimization | | | |
DevFormer: A Symmetric Transformer for Context-Aware Device Placement | | | |
On the Effectiveness of Offline RL for Dialogue Response Generation | | | |
Bidirectional Learning for Offline Model-based Biological Sequence Design | | | |
ChiPFormer: Transferable Chip Placement via Offline Decision Transformer | | | |
Semi-Offline Reinforcement Learning for Optimized Text Generation | | | |
Neural Constraint Satisfaction: Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement | | | |
Offline RL for Natural Language Generation with Implicit Language Q Learning | | | |
Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning | | | |
Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning | | | |
Dialog Action-Aware Transformer for Dialog Policy Learning | | | |
Can Offline Reinforcement Learning Help Natural Language Understanding? | | | |
NeurIPS 2022 Competition: Driving SMARTS | | | |
Controlling Commercial Cooling Systems Using Reinforcement Learning | | | |
Pre-Training for Robots: Offline RL Enables Learning New Tasks from a Handful of Trials | | | [ ] |
Towards Safe Mechanical Ventilation Treatment Using Deep Offline Reinforcement Learning | | | |
Learning-to-defer for sequential medical decision-making under uncertainty | | | |
Imitation Is Not Enough: Robustifying Imitation with Reinforcement Learning for Challenging Driving Scenarios | | | |
Dialogue Evaluation with Offline Reinforcement Learning | | | |
Multi-Task Fusion via Reinforcement Learning for Long-Term User Satisfaction in Recommender Systems | | | |
A Maintenance Planning Framework using Online and Offline Deep Reinforcement Learning | | | |
BCRLSP: An Offline Reinforcement Learning Framework for Sequential Targeted Promotion | | | |
Learning Optimal Treatment Strategies for Sepsis Using Offline Reinforcement Learning in Continuous Space | | | |
Rethinking Reinforcement Learning for Recommendation: A Prompt Perspective | | | |
ARLO: A Framework for Automated Reinforcement Learning | | | |
A Reinforcement Learning-based Volt-VAR Control Dataset and Testing Environment | | | |
CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning | | | |
Offline Reinforcement Learning for Safer Blood Glucose Control in People with Type 1 Diabetes | | | [ ] |
CIRS: Bursting Filter Bubbles by Counterfactual Interactive Recommender System | | | [ ] |
A Conservative Q-Learning approach for handling distribution shift in sepsis treatment strategies | | | |
Optimizing Trajectories for Highway Driving with Offline Reinforcement Learning | | | |
Offline Deep Reinforcement Learning for Dynamic Pricing of Consumer Credit | | | |
Offline Reinforcement Learning for Mobile Notifications | | | |
Offline Reinforcement Learning for Road Traffic Control | | | |
Sustainable Online Reinforcement Learning for Auto-bidding | | | |
Leveraging Factored Action Spaces for Efficient Offline Reinforcement Learning in Healthcare | | | |
Multi-objective Optimization of Notifications Using Offline Reinforcement Learning | | | |
Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning | | | |
GPT-Critic: Offline Reinforcement Learning for End-to-End Task-Oriented Dialogue Systems | | | |
Offline Reinforcement Learning for Visual Navigation | | | |
Semi-Markov Offline Reinforcement Learning for Healthcare | | | |
Automate Page Layout Optimization: An Offline Deep Q-Learning Approach | | | |
RL4RS: A Real-World Benchmark for Reinforcement Learning based Recommender System | | | [ ] [ ] |
Compressive Features in Offline Reinforcement Learning for Recommender Systems | | | |
Causal-aware Safe Policy Improvement for Task-oriented dialogue | | | |
Offline Contextual Bandits for Wireless Network Optimization | | | |
Identifying Decision Points for Safe and Interpretable Reinforcement Learning in Hypotension Treatment | | | |
Offline Reinforcement Learning for Autonomous Driving with Safety and Exploration Enhancement | | | |
Medical Dead-ends and Learning to Identify High-risk States and Treatments | | | |
An Offline Deep Reinforcement Learning for Maintenance Decision-Making | | | |
Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation | | | |
Offline-Online Reinforcement Learning for Energy Pricing in Office Demand Response: Lowering Energy and Data Costs | | | |
Offline reinforcement learning with uncertainty for treatment strategies in sepsis | | | |
Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL | | | |
Safe Model-based Off-policy Reinforcement Learning for Eco-Driving in Connected and Automated Hybrid Electric Vehicles | | | |
pH-RL: A personalization architecture to bring reinforcement learning to health practice | | | |
DeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement Learning | | | [ ] |
Personalization for Web-based Services using Offline Reinforcement Learning | | | |
BCORLE(λ): An Offline Reinforcement Learning and Evaluation Framework for Coupons Allocation in E-commerce Market | | | |
Safe Driving via Expert Guided Policy Optimization | | | [ ] [ ] |
A General Offline Reinforcement Learning Framework for Interactive Recommendation | | | |
Value Function is All You Need: A Unified Learning Framework for Ride Hailing Platforms | | | |
Discovering an Aid Policy to Minimize Student Evasion Using Offline Reinforcement Learning | | | |
Learning robust driving policies without online exploration | | | |
Engagement Rewarded Actor-Critic with Conservative Q-Learning for Speech-Driven Laughter Backchannel Generation | | | |
Network Intrusion Detection Based on Extended RBF Neural Network With Offline Reinforcement Learning | | | |
Towards Accelerating Offline RL based Recommender Systems | | | |
Offline Meta-level Model-based Reinforcement Learning Approach for Cold-Start Recommendation | | | |
Batch-Constrained Distributional Reinforcement Learning for Session-based Recommendation | | | |
An Empirical Study of Representation Learning for Reinforcement Learning in Healthcare | | | |
Learning from Human Feedback: Challenges for Real-World Reinforcement Learning in NLP | | | |
Remote Electrical Tilt Optimization via Safe Reinforcement Learning | | | |
An Optimistic Perspective on Offline Reinforcement Learning | | | [ ] [ ] |
Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning | | | |
Offline Contextual Multi-armed Bandits for Mobile Health Interventions: A Case Study on Emotion Regulation | | | |
Human-centric Dialog Training via Offline Reinforcement Learning | | | |
Definition and evaluation of model-free coordination of electrical vehicle charging with reinforcement learning | | | |
Optimal Tap Setting of Voltage Regulation Transformers Using Batch Reinforcement Learning | | | |
Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog | | | |
Optimized cost function for demand response coordination of multiple EV charging stations using reinforcement learning | | | |
A Clustering-Based Reinforcement Learning Approach for Tailored Personalization of E-Health Interventions | | | |
Generating Interpretable Fuzzy Controllers using Particle Swarm Optimization and Genetic Programming | | | |
End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient | | | |
Batch Reinforcement Learning on the Industrial Benchmark: First Experiences | | | |
Policy Networks with Two-Stage Training for Dialogue Systems | | | |
Adaptive Treatment of Epilepsy via Batch-mode Reinforcement Learning | | | |
awesome-offline-rl / Papers / Off-Policy Evaluation and Learning: Theory/Methods |
Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction | | | |
Distributionally Robust Policy Evaluation under General Covariate Shift in Contextual Bandits | | | |
Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling | | | |
Multiply Robust Off-policy Evaluation and Learning under Truncation by Death | | | |
Off-Policy Evaluation of Ranking Policies under Diverse User Behavior | | | |
Policy-Adaptive Estimator Selection for Off-Policy Evaluation | | | |
Variance-Optimal Augmentation Logging for Counterfactual Evaluation in Contextual Bandits | | | |
Offline Policy Evaluation in Large Action Spaces via Outcome-Oriented Action Grouping | | | |
Off-Policy Evaluation for Large Action Spaces via Policy Convolution | | | |
Distributional Off-Policy Evaluation for Slate Recommendations | | | |
Debiased Machine Learning and Network Cohesion for Doubly-Robust Differential Reward Models in Contextual Bandits | | | |
Doubly Robust Estimator for Off-Policy Evaluation with Large Action Spaces | | | |
Offline Policy Evaluation with Out-of-Sample Guarantees | | | |
Quantile Off-Policy Evaluation via Deep Conditional Generative Learning | | | |
Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model | | | [ ] |
Off-Policy Evaluation for Large Action Spaces via Embeddings | | | [ ] [ ] |
Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning | | | |
Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions | | | |
Conformal Off-Policy Prediction in Contextual Bandits | | | |
Off-Policy Evaluation with Policy-Dependent Optimization Response | | | |
Off-Policy Evaluation with Deficient Support Using Side Information | | | |
Towards Robust Off-Policy Evaluation via Human Inputs | | | |
Off-policy evaluation for learning-to-rank via interpolating the item-position model and the position-based model | | | |
Bayesian Counterfactual Mean Embeddings and Off-Policy Evaluation | | | |
Anytime-valid off-policy inference for contextual bandits | | | |
Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency | | | |
Off-Policy Evaluation in Embedded Spaces | | | |
Safe Exploration for Efficient Policy Evaluation and Comparison | | | |
Inverse Propensity Score based offline estimator for deterministic ranking lists using position bias | | | |
Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning | | | |
Control Variates for Slate Off-Policy Evaluation | | | |
Deep Jump Learning for Off-Policy Evaluation in Continuous Treatment Settings | | | |
Optimal Off-Policy Evaluation from Multiple Logging Policies | | | [ ] |
Off-policy Confidence Sequences | | | |
Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting | | | [ ] |
Off-Policy Evaluation Using Information Borrowing and Context-Based Switching | | | |
Identification of Subgroups With Similar Benefits in Off-Policy Policy Evaluation | | | |
Robust On-Policy Data Collection for Data-Efficient Policy Evaluation | | | |
Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits | | | |
Off-Policy Risk Assessment in Contextual Bandits | | | |
Off-Policy Evaluation of Slate Policies under Bayes Risk | | | |
A Practical Guide of Off-Policy Evaluation for Bandit Problems | | | |
Off-Policy Evaluation and Learning for External Validity under a Covariate Shift | | | |
Counterfactual Evaluation of Slate Recommendations with Sequential Reward Interactions | | | |
Doubly robust off-policy evaluation with shrinkage | | | |
Adaptive Estimator Selection for Off-Policy Evaluation | | | [ ] |
Distributionally Robust Policy Evaluation and Learning in Offline Contextual Bandits | | | |
Improving Offline Contextual Bandits with Distributional Robustness | | | |
Balanced Off-Policy Evaluation in General Action Spaces | | | |
Policy Evaluation with Latent Confounders via Optimal Balance | | | |
On the Design of Estimators for Bandit Off-Policy Evaluation | | | |
CAB: Continuous Adaptive Blending for Policy Evaluation and Learning | | | |
Focused Context Balancing for Robust Offline Policy Evaluation | | | |
When People Change their Mind: Off-Policy Evaluation in Non-Stationary Recommendation Environments | | | |
Policy Evaluation and Optimization with Continuous Treatments | | | |
Confounding-Robust Policy Improvement | | | |
Balanced Policy Evaluation and Learning | | | |
Offline Evaluation of Ranking Policies with Click Models | | | |
Effective Evaluation using Logged Bandit Feedback from Multiple Loggers | | | |
Off-policy Evaluation for Slate Recommendation | | | |
Optimal and Adaptive Off-policy Evaluation in Contextual Bandits | | | |
Data-Efficient Policy Evaluation Through Behavior Policy Search | | | |
Doubly Robust Policy Evaluation and Optimization | | | |
Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms | | | |
Distributional Off-policy Evaluation with Bellman Residual Minimization | | | |
Future-Dependent Value-Based Off-Policy Evaluation in POMDPs | | | |
Marginal Density Ratio for Off-Policy Evaluation in Contextual Bandits | | | |
State-Action Similarity-Based Representations for Off-Policy Evaluation | | | |
Off-Policy Evaluation for Human Feedback | | | |
Counterfactual-Augmented Importance Sampling for Semi-Offline Policy Evaluation | | | |
An Instrumental Variable Approach to Confounded Off-Policy Evaluation | | | |
Semiparametrically Efficient Off-Policy Evaluation in Linear Markov Decision Processes | | | |
Distributional Offline Policy Evaluation with Predictive Error Guarantees | | | |
The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation | | | |
Revisiting Bellman Errors for Offline Model Selection | | | [ ] |
Scaling Marginalized Importance Sampling to High-Dimensional State-Spaces via State Abstraction | | | |
Variational Latent Branching Model for Off-Policy Evaluation | | | |
Multiple-policy High-confidence Policy Evaluation | | | |
Off-Policy Evaluation with Online Adaptation for Robot Exploration in Challenging Environments | | | |
Conservative Exploration for Policy Optimization via Off-Policy Policy Evaluation | | | |
Robust Offline Policy Evaluation and Optimization with Heavy-Tailed Rewards | | | |
When is Offline Policy Selection Sample Efficient for Reinforcement Learning? | | | |
Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks | | | |
Evaluation of Active Feature Acquisition Methods for Static Feature Settings | | | |
Distributional Shift-Aware Off-Policy Interval Estimation: A Unified Error Quantification Framework | | | |
Marginalized Importance Sampling for Off-Environment Policy Evaluation | | | |
Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning | | | |
Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments | | | |
Off-policy Evaluation in Doubly Inhomogeneous Environments | | | |
Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data | | | |
π2vec: Policy Representations with Successor Features | | | |
Conformal Off-Policy Evaluation in Markov Decision Processes | | | |
Hallucinated Adversarial Control for Conservative Offline Policy Evaluation | | | |
Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders | | | |
Minimax Weight Learning for Absorbing MDPs | | | |
Improving Monte Carlo Evaluation with Offline Data | | | |
First-order Policy Optimization for Robust Policy Evaluation | | | |
A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes | | | |
On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation | | | |
Learning Bellman Complete Representations for Offline Policy Evaluation | | | |
Supervised Off-Policy Ranking | | | |
Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory | | | |
Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions | | | |
Oracle Inequalities for Model Selection in Offline Reinforcement Learning | | | |
Off-Policy Evaluation for Episodic Partially Observable Markov Decision Processes under Non-Parametric Models | | | |
Off-Policy Evaluation for Action-Dependent Non-stationary Environments | | | |
Stateful Offline Contextual Policy Evaluation and Learning | | | |
Off-Policy Risk Assessment for Markov Decision Processes | | | |
Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information | | | |
Offline Policy Evaluation and Optimization under Confounding | | | |
Bridging the Gap Between Offline and Online Reinforcement Learning Evaluation Methodologies | | | |
Safe Evaluation For Offline Learning: Are We Ready To Deploy? | | | |
Low Variance Off-policy Evaluation with State-based Importance Sampling | | | |
Statistical Estimation of Confounded Linear MDPs: An Instrumental Variable Approach | | | |
Offline Estimation of Controlled Markov Chains: Minimax Nonparametric Estimators and Sample Efficiency | | | |
Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks | | | |
A Sharp Characterization of Linear Estimators for Offline Policy Evaluation | | | |
A Multi-Agent Reinforcement Learning Framework for Off-Policy Evaluation in Two-sided Markets | | | [ ] |
A Theoretical Framework of Almost Hyperparameter-free Hyperparameter Selection Methods for Offline Policy Evaluation | | | |
SOPE: Spectrum of Off-Policy Estimators | | | |
Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation | | | |
Variance-Aware Off-Policy Evaluation with Linear Function Approximation | | | |
Universal Off-Policy Evaluation | | | |
Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning | | | |
Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings | | | |
State Relevance for Off-Policy Evaluation | | | |
Bootstrapping Fitted Q-Evaluation for Off-Policy Inference | | | |
Deeply-Debiased Off-Policy Interval Estimation | | | |
Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization | | | |
Minimax Model Learning | | | |
Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders | | | |
High-Confidence Off-Policy (or Counterfactual) Variance Estimation | | | |
Debiased Off-Policy Evaluation for Recommendation Systems | | | |
Pessimistic Model Selection for Offline Deep Reinforcement Learning | | | |
Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes | | | |
Off-Policy Evaluation in Partially Observed Markov Decision Processes | | | |
A Spectral Approach to Off-Policy Evaluation for POMDPs | | | |
Projected State-action Balancing Weights for Offline Reinforcement Learning | | | |
Active Offline Policy Selection | | | |
On Instrumental Variable Regression for Deep Offline Policy Evaluation | | | |
Average-Reward Off-Policy Policy Evaluation with Function Approximation | | | |
Sequential causal inference in a single world of connected units | | | |
Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding | | | |
CoinDICE: Off-Policy Confidence Interval Estimation | | | |
Off-Policy Interval Estimation with Lipschitz Value Iteration | | | |
Off-Policy Evaluation via the Regularized Lagrangian | | | |
Minimax Value Interval for Off-Policy Evaluation and Policy Optimization | | | |
GenDICE: Generalized Offline Estimation of Stationary Values | | | |
Infinite-horizon Off-Policy Policy Evaluation with Multiple Behavior Policies | | | |
Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation | | | |
Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning | | | |
GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values | | | |
Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation | | | |
Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions | | | |
Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation | | | |
Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling | | | |
Minimax Weight and Q-Function Learning for Off-Policy Evaluation | | | |
Accountable Off-Policy Evaluation With Kernel Bellman Statistics | | | |
Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning | | | |
Batch Stationary Distribution Estimation | | | |
Towards Off-policy Evaluation as a Prerequisite for Real-world Reinforcement Learning in Building Control | | | [ ] |
Defining Admissible Rewards for High Confidence Policy Evaluation in Batch Reinforcement Learning | | | |
Offline Policy Selection under Uncertainty | | | |
Near-Optimal Provable Uniform Convergence in Offline Policy Evaluation for Reinforcement Learning | | | |
Optimal Mixture Weights for Off-Policy Evaluation with Multiple Behavior Policies | | | |
Kernel Methods for Policy Evaluation: Treatment Effects, Mediation Analysis, and Off-Policy Planning | | | |
Statistical Bootstrapping for Uncertainty Estimation in Off-Policy Evaluation | | | |
Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning | | | |
Off-Policy Evaluation in Partially Observable Environments | | | |
Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning | | | |
Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling | | | |
Off-Policy Evaluation via Off-Policy Classification | | | |
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections | | | [ ] |
Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy | | | |
Batch Policy Learning under Constraints | | | [ ] [ ] |
More Efficient Off-Policy Evaluation through Regularized Targeted Learning | | | |
Combining parametric and nonparametric models for off-policy evaluation | | | |
Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models | | | |
Importance Sampling Policy Evaluation with an Estimated Behavior Policy | | | |
Representation Balancing MDPs for Off-policy Policy Evaluation | | | |
Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation | | | |
More Robust Doubly Robust Off-policy Evaluation | | | |
Importance Sampling for Fair Policy Selection | | | |
Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing | | | |
Consistent On-Line Off-Policy Evaluation | | | |
Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation | | | |
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning | | | |
Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning | | | |
High Confidence Policy Improvement | | | |
High Confidence Off-Policy Evaluation | | | |
Eligibility Traces for Off-Policy Policy Evaluation | | | |
Sequential Counterfactual Risk Minimization | | | |
Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning | | | |
Multi-Task Off-Policy Learning from Bandit Feedback | | | |
Exponential Smoothing for Off-Policy Learning | | | |
Counterfactual Learning with General Data-generating Policies | | | |
Distributionally Robust Policy Gradient for Offline Contextual Bandits | | | |
Oracle-Efficient Pessimism: Offline Policy Optimization in Contextual Bandits | | | |
Pessimistic Off-Policy Multi-Objective Optimization | | | |
Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective | | | |
Uncertainty-Aware Off-Policy Learning | | | |
Fair Off-Policy Learning from Observational Data | | | |
Interpretable Off-Policy Learning via Hyperbox Search | | | |
Offline Policy Optimization with Eligible Actions | | | |
Towards Robust Off-policy Learning for Runtime Uncertainty | | | |
Safe Optimal Design with Applications in Off-Policy Learning | | | |
Off-Policy Actor-critic for Recommender Systems | | | |
MGPolicy: Meta Graph Enhanced Off-policy Learning for Recommendations | | | |
Distributionally Robust Policy Learning with Wasserstein Distance | | | |
Local Policy Improvement for Recommender Systems | | | |
Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality | | | |
Fast Offline Policy Optimization for Large Scale Recommendation | | | |
Practical Counterfactual Policy Learning for Top-K Recommendations | | | |
Boosted Off-Policy Learning | | | |
Semi-Counterfactual Risk Minimization Via Neural Networks | | | |
IMO^3: Interactive Multi-Objective Off-Policy Optimization | | | |
Pessimistic Off-Policy Optimization for Learning to Rank | | | |
Non-Stationary Off-Policy Optimization | | | |
Learning from eXtreme Bandit Feedback | | | |
Generalizing Off-Policy Learning under Sample Selection Bias | | | |
Conservative Policy Construction Using Variational Autoencoders for Logged Data with Missing Values | | | |
Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies | | | |
From Importance Sampling to Doubly Robust Policy Gradient | | | |
Efficient Policy Learning from Surrogate-Loss Classification Reductions | | | [ ] |
Off-policy Bandits with Deficient Support | | | |
Off-policy Learning in Two-stage Recommender Systems | | | |
More Efficient Policy Learning via Optimal Retargeting | | | |
Learning When-to-Treat Policies | | | |
Doubly Robust Off-Policy Learning on Low-Dimensional Manifolds by Deep Neural Networks | | | |
Bandit Overfitting in Offline Policy Learning | | | |
Counterfactual Learning of Continuous Stochastic Policies | | | |
Top-K Off-Policy Correction for a REINFORCE Recommender System | | | |
Semi-Parametric Efficient Policy Learning with Continuous Actions | | | |
Efficient Counterfactual Learning from Bandit Feedback | | | |
Deep Learning with Logged Bandit Feedback | | | |
The Self-Normalized Estimator for Counterfactual Learning | | | |
Counterfactual Risk Minimization: Learning from Logged Bandit Feedback | | | |
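Many of the contextual-bandit papers in this subsection analyze variants of inverse propensity scoring (IPS), its self-normalized form (SNIPS), and the doubly robust (DR) estimator. Purely as a hedged, generic illustration of those three textbook estimators on synthetic logged data, with every array and constant below invented for the example rather than taken from any listed paper:

```python
# Hedged sketch: IPS, SNIPS, and DR off-policy estimators for contextual bandits
# on synthetic logged data. All quantities are stand-ins for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n, n_actions = 10_000, 4

# Logged data: actions from a uniform behavior policy, observed rewards,
# and the behavior policy's propensities for the logged actions.
pi_b = np.full((n, n_actions), 1.0 / n_actions)
actions = rng.integers(n_actions, size=n)
rewards = rng.binomial(1, 0.3 + 0.1 * actions)             # synthetic binary rewards
propensities = pi_b[np.arange(n), actions]

# Target policy probabilities and a stand-in fitted reward model q_hat.
pi_e = rng.dirichlet(np.ones(n_actions), size=n)
q_hat = np.tile(0.3 + 0.1 * np.arange(n_actions), (n, 1))

w = pi_e[np.arange(n), actions] / propensities              # importance weights

ips = np.mean(w * rewards)                                  # inverse propensity scoring
snips = np.sum(w * rewards) / np.sum(w)                     # self-normalized IPS
direct = np.sum(pi_e * q_hat, axis=1)                       # direct-method term
dr = np.mean(direct + w * (rewards - q_hat[np.arange(n), actions]))  # doubly robust

print(f"IPS={ips:.3f}  SNIPS={snips:.3f}  DR={dr:.3f}")
```

DR adds an importance-weighted correction to the direct-method term, which is why so much of the work above focuses on controlling the bias and variance of the weights w.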
awesome-offline-rl / Papers / Off-Policy Evaluation and Learning: Benchmarks/Experiments |
Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation | | | |
SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation | | | |
Offline Policy Comparison with Confidence: Benchmarks and Baselines | | | |
Extending Open Bandit Pipeline to Simulate Industry Challenges | | | |
Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation | | | [ ] [ ] |
Evaluating the Robustness of Off-Policy Evaluation | | | [ ] |
Benchmarks for Deep Off-Policy Evaluation | | | [ ] |
Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning | | | [ ] |
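The RL-focused benchmarks above typically include trajectory-wise and per-decision importance sampling among their baselines. The sketch below shows both on synthetic logged trajectories; it is illustrative only, uses hypothetical policies and rewards, and does not reproduce any benchmark's evaluation protocol:

```python
# Hedged sketch: trajectory-wise IS vs. per-decision IS for off-policy
# evaluation in RL, over synthetic logged trajectories. Illustration only.
import numpy as np

rng = np.random.default_rng(0)
n_traj, horizon, n_actions, gamma = 500, 10, 3, 0.99

# Logged trajectories: actions, rewards, and the behavior policy's
# probabilities of the logged actions at each step (states omitted).
actions = rng.integers(n_actions, size=(n_traj, horizon))
rewards = rng.normal(0.0, 1.0, size=(n_traj, horizon)) + 0.2 * actions
pi_b = np.full((n_traj, horizon), 1.0 / n_actions)            # uniform logging policy
pi_e = np.where(actions == n_actions - 1, 0.5, 0.25)          # target policy's probs of logged actions

rho = pi_e / pi_b                                             # per-step importance ratios
cum_rho = np.cumprod(rho, axis=1)                             # cumulative ratios rho_{0:t}
discounts = gamma ** np.arange(horizon)

# Trajectory-wise IS: weight the whole discounted return by the full-horizon ratio.
returns = (discounts * rewards).sum(axis=1)
is_estimate = np.mean(cum_rho[:, -1] * returns)

# Per-decision IS: weight each reward only by the ratios accumulated up to that step.
pdis_estimate = np.mean((cum_rho * discounts * rewards).sum(axis=1))

print(f"IS={is_estimate:.3f}  PDIS={pdis_estimate:.3f}")
```

Per-decision IS weights each reward only by the ratios accumulated up to that step, which generally reduces variance relative to weighting the whole return.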
awesome-offline-rl / Papers / Off-Policy Evaluation and Learning: Applications |
HOPE: Human-Centric Off-Policy Evaluation for E-Learning and Healthcare | | | |
When is Off-Policy Evaluation Useful? A Data-Centric Perspective | | | |
Counterfactual Evaluation of Peer-Review Assignment Policies | | | |
Balanced Off-Policy Evaluation for Personalized Pricing | | | |
Multi-Action Dialog Policy Learning from Logged User Feedback | | | |
CFR-p: Counterfactual Regret Minimization with Hierarchical Policy Abstraction, and its Application to Two-player Mahjong | | | |
Reward Shaping for User Satisfaction in a REINFORCE Recommender | | | |
Data-Driven Off-Policy Estimator Selection: An Application in User Marketing on An Online Content Delivery Service | | | |
Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach | | | |
Model Selection for Offline Reinforcement Learning: Practical Considerations for Healthcare Settings | | | |
Off-Policy Evaluation of Probabilistic Identity Data in Lookalike Modeling | | | |
Offline Evaluation to Make Decisions About Playlist Recommendation | | | |
Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration Matters | | | |
Evaluating Reinforcement Learning Algorithms in Observational Health Settings | | | |
Towards a Fair Marketplace: Counterfactual Evaluation of the trade-off between Relevance, Fairness & Satisfaction in Recommendation Systems | | | |
Offline A/B testing for Recommender Systems | | | |
Offline Comparative Evaluation with Incremental, Minimally-Invasive Online Feedback | | | |
Handling Confounding for Realistic Off-Policy Evaluation | | | |
Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising | | | |
awesome-offline-rl / Open Source Software/Implementations |
SCOPE-RL: A Python library for offline reinforcement learning, off-policy evaluation, and selection | 114 | 8 months ago | [ ] [ ] [ ] |
Open Bandit Pipeline: a research framework for bandit algorithms and off-policy evaluation | 645 | 6 months ago | [ ] [ ] [ ] |
pyIEOE: Towards An Interpretable Evaluation for Offline Evaluation | 31 | about 3 years ago | [ ] |
d3rlpy: An Offline Deep Reinforcement Learning Library | 1,327 | 13 days ago | [ ] [ ] [ ] |
MINERVA: An out-of-the-box GUI tool for data-driven deep reinforcement learning | 95 | over 3 years ago | [ ] [ ] |
Minari | 294 | 14 days ago | |
CORL: Clean Offline Reinforcement Learning | 482 | 10 months ago | [ ] |
COBS: Caltech OPE Benchmarking Suite | 61 | over 2 years ago | [ ] |
Benchmarks for Deep Off-Policy Evaluation | 85 | 4 months ago | [ ] |
DICE: The DIstribution Correction Estimation Library | 99 | 4 months ago | [ ] |
RL Unplugged: Benchmarks for Offline Reinforcement Learning | 13,250 | 26 days ago | [ ] [ ] |
D4RL: Datasets for Deep Data-Driven Reinforcement Learning | 1,346 | 18 days ago | [ ] [ ] |
V-D4RL: Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations | 95 | 6 months ago | [ ]
Benchmarking Offline Reinforcement Learning on Real-Robot Hardware | 17 | 10 months ago | [ ] |
RLDS: Reinforcement Learning Datasets | 293 | about 2 months ago | [ ] |
OEF: Offline Equilibrium Finding | 3 | over 2 years ago | [ ] |
ExORL: Exploratory Data for Offline Reinforcement Learning | 105 | almost 3 years ago | [ ] |
RL4RS: A Real-World Benchmark for Reinforcement Learning based Recommender System | 220 | 10 months ago | [ ]
NeoRL: Near Real-World Benchmarks for Offline Reinforcement Learning | | | [ ] [ ] |
The Industrial Benchmark Offline RL Datasets | 126 | over 1 year ago | [ ] |
ARLO: A Framework for Automated Reinforcement Learning | 10 | over 2 years ago | [ ] |
RecoGym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising | 467 | over 3 years ago | [ ] |
MARS-Gym: A Gym framework to model, train, and evaluate Recommender Systems for Marketplaces | 51 | 10 months ago | [ ] [ ] |
A Reinforcement Learning-based Volt-VAR Control Dataset | 20 | over 2 years ago | [ ] |
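Several of the dataset suites above expose logged data as flat arrays of observations, actions, rewards, and terminal flags (roughly the D4RL-style layout). The snippet below sketches that layout with synthetic stand-in arrays and one common way of reconstructing (s, a, r, s') transitions from it; it is an assumption about the general pattern, not code for any specific library:

```python
# Hedged sketch of a flat, D4RL-style offline dataset layout.
# All arrays are synthetic stand-ins, not loaded from any real benchmark.
import numpy as np

rng = np.random.default_rng(0)
n, obs_dim, act_dim = 1000, 8, 2

dataset = {
    "observations": rng.normal(size=(n, obs_dim)).astype(np.float32),
    "actions": rng.uniform(-1, 1, size=(n, act_dim)).astype(np.float32),
    "rewards": rng.normal(size=n).astype(np.float32),
    "terminals": rng.random(n) < 0.01,
}

# Build (s, a, r, s') transitions by pairing consecutive rows and masking
# episode boundaries, which is how flat datasets of this kind are usually consumed.
not_done = ~dataset["terminals"][:-1]
transitions = (
    dataset["observations"][:-1][not_done],
    dataset["actions"][:-1][not_done],
    dataset["rewards"][:-1][not_done],
    dataset["observations"][1:][not_done],
)
print("usable transitions:", transitions[0].shape[0])
```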
awesome-offline-rl / Blog/Podcast / Blog |
Counterfactual Evaluation for Recommendation Systems | | | |
Offline Reinforcement Learning: How Conservative Algorithms Can Enable New Applications | | | |
AWAC: Accelerating Online Reinforcement Learning with Offline Datasets | | | |
D4RL: Building Better Benchmarks for Offline Reinforcement Learning | | | |
Does On-Policy Data Collection Fix Errors in Off-Policy Reinforcement Learning? | | | |
Tackling Open Challenges in Offline Reinforcement Learning | | | |
An Optimistic Perspective on Offline Reinforcement Learning | | | |
Decisions from Data: How Offline Reinforcement Learning Will Change How We Use Machine Learning | | | |
Introducing completely free datasets for data-driven deep reinforcement learning | | | |
Offline (Batch) Reinforcement Learning: A Review of Literature and Applications | | | |
Data-Driven Deep Reinforcement Learning | | | |
awesome-offline-rl / Blog/Podcast / Podcast |
AI Trends 2023: Reinforcement Learning – RLHF, Robotic Pre-Training, and Offline RL with Sergey Levine | | | |
Bandits and Simulators for Recommenders with Olivier Jeunen | | | |
Sergey Levine on Robot Learning & Offline RL | | | |
Off-Line, Off-Policy RL for Real-World Decision Making at Facebook | | | |
Xianyuan Zhan | TalkRL: The Reinforcement Learning Podcast | | | |
MOReL: Model-Based Offline Reinforcement Learning with Aravind Rajeswaran | | | |
Trends in Reinforcement Learning with Chelsea Finn | | | |
Nan Jiang | TalkRL: The Reinforcement Learning Podcast | | | |
Scott Fujimoto | TalkRL: The Reinforcement Learning Podcast | | | |
awesome-offline-rl / Related Workshops |
CONSEQUENCES (RecSys 2023) | | | |
Offline Reinforcement Learning (NeurIPS 2022) | | | |
Reinforcement Learning for Real Life (NeurIPS 2022) | | | |
CONSEQUENCES + REVEAL (RecSys 2022) | | | |
Offline Reinforcement Learning (NeurIPS 2021) | | | |
Reinforcement Learning for Real Life (ICML 2021) | | | |
Reinforcement Learning Day 2021 | | | |
Offline Reinforcement Learning (NeurIPS 2020) | | | |
Reinforcement Learning from Batch Data and Simulation | | | |
Reinforcement Learning for Real Life (RL4RealLife 2020) | | | |
Safety and Robustness in Decision Making (NeurIPS 2019) | | | |
Reinforcement Learning for Real Life (ICML 2019) | | | |
Real-world Sequential Decision Making (ICML 2019) | | | |
awesome-offline-rl / Tutorials/Talks/Lectures |
Reinforcement Learning with Large Datasets: Robotics, Image Generation, and LLMs | | | |
Counterfactual Evaluation and Learning for Interactive Systems | | | |
Representation Learning for Online and Offline RL in Low-rank MDPs | | | |
Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation | | | |
Safe Policy Learning through Extrapolation: Application to Pre-trial Risk Assessment | | | |
Deep Reinforcement Learning with Real-World Data | | | |
Planning with Reinforcement Learning | | | |
Imitation learning vs. offline reinforcement learning | | | |
Tutorial on the Foundations of Offline Reinforcement Learning | | | |
Counterfactual Learning and Evaluation for Recommender Systems: Foundations, Implementations, and Recent Advances | | | [ ] |
Offline Reinforcement Learning | | | |
Offline Reinforcement Learning | | | |
Fast Rates for the Regret of Offline Reinforcement Learning | | | |
Bellman-consistent Pessimism for Offline Reinforcement Learning | | | |
Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage | | | |
Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism | | | |
Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm | | | |
Is Pessimism Provably Efficient for Offline RL? | | | |
Adaptive Estimator Selection for Off-Policy Evaluation | | | |
What are the Statistical Limits of Offline RL with Linear Function Approximation? | | | |
Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL | | | |
A Gentle Introduction to Offline Reinforcement Learning | | | |
Principles for Tackling Distribution Shift: Pessimism, Adaptation, and Anticipation | | | |
Offline Reinforcement Learning: Incorporating Knowledge from Data into RL | | | |
Offline RL | | | |
Learning a Multi-Agent Simulator from Offline Demonstrations | | | |
Towards Reliable Validation and Evaluation for Offline RL | | | |
Batch RL Models Built for Validation | | | |
Offline Reinforcement Learning: From Algorithms to Practical Challenges | | | |
Data Scalability for Robot Learning | | | |
Statistically Efficient Offline Reinforcement Learning | | | |
Near Optimal Provable Uniform Convergence in Off-Policy Evaluation for Reinforcement Learning | | | |
Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation | | | |
Beyond the Training Distribution: Embodiment, Adaptation, and Symmetry | | | |
Combining Statistical methods with Human Input for Evaluation and Optimization in Batch Settings | | | |
Efficiently Breaking the Curse of Horizon with Double Reinforcement Learning | | | |
Scaling Probabilistically Safe Learning to Robotics | | | |
Deep Reinforcement Learning in the Real World | | | |