awesome-offline-rl

An index of research papers, reviews, and algorithms for offline reinforcement learning (offline RL).

Topics: awesome, awesome-list, off-policy-evaluation, offline-rl, reinforcement-learning, research

awesome-offline-rl

Haruka Kiyohara (Cornell University)
Yuta Saito (Hanjuku-kaso Co., Ltd. / Cornell University)

awesome-offline-rl / Table of Contents

Papers
  Review/Survey/Position Papers
    Offline RL
    Off-Policy Evaluation and Learning
    Related Reviews
  Offline RL: Theory/Methods
  Offline RL: Benchmarks/Experiments
  Offline RL: Applications
  Off-Policy Evaluation and Learning: Theory/Methods
    Off-Policy Evaluation: Contextual Bandits
    Off-Policy Evaluation: Reinforcement Learning
    Off-Policy Learning
  Off-Policy Evaluation and Learning: Benchmarks/Experiments
  Off-Policy Evaluation and Learning: Applications
Open Source Software/Implementations
Blog/Podcast
  Blog
  Podcast
Related Workshops
Tutorials/Talks/Lectures

awesome-offline-rl / Papers / Review/Survey/Position Papers

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
A Survey on Offline Model-Based Reinforcement Learning
Foundation Models for Decision Making: Problems, Methods, and Opportunities
A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
A Review of Off-Policy Evaluation in Reinforcement Learning
On the Opportunities and Challenges of Offline Reinforcement Learning for Recommender Systems
Understanding Reinforcement Learning Algorithms: The Progress from Basic Q-learning to Proximal Policy Optimization
Offline Evaluation for Reinforcement Learning-based Recommendation: A Critical Issue and Some Alternatives
A Survey on Transformers in Reinforcement Learning
Deep Reinforcement Learning: Opportunities and Challenges
A Survey on Model-based Reinforcement Learning
Survey on Fair Reinforcement Learning: Theory and Practice
Accelerating Offline Reinforcement Learning Application in Real-Time Bidding and Recommendation: Potential Use of Simulation
A Survey of Generalisation in Deep Reinforcement Learning

awesome-offline-rl / Papers / Offline RL: Theory/Methods

Value-Aided Conditional Supervised Learning for Offline RL
Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning
DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching
Deep autoregressive density nets vs neural ensembles for model-based offline reinforcement learning
Context-Former: Stitching via Latent Conditioned Sequence Modeling
Adversarially Trained Actor Critic for offline CMDPs
Optimistic Model Rollouts for Pessimistic Offline Policy Optimization
Solving Continual Offline Reinforcement Learning with Decision Transformer
MoMA: Model-based Mirror Ascent for Offline Reinforcement Learning
Reframing Offline Reinforcement Learning as a Regression Problem
Efficient Two-Phase Offline Deep Reinforcement Learning from Preference Feedback
Policy-regularized Offline Multi-objective Reinforcement Learning
Differentiable Tree Search in Latent State Space
Learning from Sparse Offline Datasets via Conservative Density Estimation
Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model
PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning
Critic-Guided Decision Transformer for Offline Reinforcement Learning
CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning
Neural Network Approximation for Pessimistic Offline Reinforcement Learning
A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
The Generalization Gap in Offline Reinforcement Learning
Decoupling Meta-Reinforcement Learning with Gaussian Task Contexts and Skills
MICRO: Model-Based Offline Reinforcement Learning with a Conservative Bellman Operator
Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization
Using Curiosity for an Even Representation of Tasks in Continual Offline Reinforcement Learning
Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline Reinforcement Learning
Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees
Switch Trajectory Transformer with Distributional Value Approximation for Multi-Task Reinforcement Learning
Hierarchical Decision Transformer
Prompt-Tuning Decision Transformer with Preference Ranking
Context Shift Reduction for Offline Meta-Reinforcement Learning
Uni-O4: Unifying Online and Offline Deep Reinforcement Learning with Multi-Step On-Policy Optimization
Score Models for Offline Goal-Conditioned Reinforcement Learning
Offline RL with Observation Histories: Analyzing and Improving Sample Complexity
Expressive Modeling Is Insufficient for Offline RL: A Tractable Inference Perspective
Rethinking Decision Transformer via Hierarchical Reinforcement Learning
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models
SERA: Sample Efficient Reward Augmentation in offline-to-online Reinforcement Learning
Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage
Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning
CROP: Conservative Reward for Model-based Offline Policy Optimization
Towards Robust Offline Reinforcement Learning under Diverse Data Corruption
Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias
Boosting Continuous Control with Consistency Policy
Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning
Reward-Consistent Dynamics Models are Strongly Generalizable for Offline Reinforcement Learning
DiffCPS: Diffusion Model based Constrained Policy Search for Offline Reinforcement Learning
Self-Confirming Transformer for Locally Consistent Online Adaptation in Multi-Agent Reinforcement Learning
Learning to Reach Goals via Diffusion
Decision ConvFormer: Local Filtering in MetaFormer is Sufficient for Decision Making
Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning
Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning
Reasoning with Latent Diffusion in Offline Reinforcement Learning
Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with Expert Guidance
Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness
Robust Offline Reinforcement Learning -- Certify the Confidence Interval
Stackelberg Batch Policy Learning
H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
DOMAIN: MilDly COnservative Model-BAsed OfflINe Reinforcement Learning
Guided Online Distillation: Promoting Safe Reinforcement Learning by Offline Demonstration
Equivariant Data Augmentation for Generalization in Offline Reinforcement Learning
Multi-Objective Decision Transformers for Offline Reinforcement Learning
AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning
Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations
PASTA: Pretrained Action-State Transformer Agents
Towards A Unified Agent with Foundation Models
Goal-Conditioned Predictive Coding as an Implicit Planner for Offline Reinforcement Learning
Offline Reinforcement Learning with Imbalanced Datasets
LLQL: Logistic Likelihood Q-Learning for Reinforcement Learning
Elastic Decision Transformer
Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning
Is RLHF More Difficult than Standard RL?
Supervised Pretraining Can Learn In-Context Reinforcement Learning
Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching
Safe Reinforcement Learning with Dead-Ends Avoidance and Recovery
CLUE: Calibrated Latent Guidance for Offline Reinforcement Learning
Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting
Beyond OOD State Actions: Supported Cross-Domain Offline Reinforcement Learning
A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning
HIPODE: Enhancing Offline Reinforcement Learning with High-Quality Synthetic Data from a Policy-Decoupled Approach
Ensemble-based Offline-to-Online Reinforcement Learning: From Pessimistic Learning to Optimistic Exploration
In-Sample Policy Iteration for Offline Reinforcement Learning
Instructed Diffuser with Temporal Condition Guidance for Offline Reinforcement Learning
Offline Prioritized Experience Replay
Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding
Offline Meta Reinforcement Learning with In-Distribution Online Adaptation
Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning
Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism
MADiff: Offline Multi-agent Learning with Diffusion Models
Provable Offline Reinforcement Learning with Human Feedback
Think Before You Act: Decision Transformers with Internal Working Memory
Distributionally Robust Optimization Efficiently Solves Offline Reinforcement Learning
Offline Primal-Dual Reinforcement Learning for Linear MDPs
Federated Offline Policy Learning with Heterogeneous Observational Data
Offline Reinforcement Learning with Additional Covering Distributions
Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning
Stackelberg Decision Transformer for Asynchronous Action Coordination in Multi-Agent Systems
Federated Ensemble-Directed Offline Reinforcement Learning
IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies
Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments
Reinforcement Learning from Passive Data via Latent Intentions
Uncertainty-driven Trajectory Truncation for Model-based Offline Reinforcement Learning
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Batch Quantum Reinforcement Learning
Accelerating exploration and representation learning with offline pre-training
On Context Distribution Shift in Task Representation Learning for Offline Meta RL
Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning
Learning Excavation of Rigid Objects with Offline Reinforcement Learning
Goal-conditioned Offline Reinforcement Learning through State Space Partitioning
Merging Decision Transformers: Weight Averaging for Forming Multi-Task Policies
Deploying Offline Reinforcement Learning with Human Feedback
Synthetic Experience Replay
ENTROPY: Environment Transformer and Offline Policy Optimization
Graph Decision Transformer
Selective Uncertainty Propagation in Offline RL
Off-the-Grid MARL: a Framework for Dataset Generation with Baselines for Cooperative Offline Multi-Agent Reinforcement Learning
Skill Decision Transformer
Guiding Online Reinforcement Learning with Action-Free Offline Pretraining
SaFormer: A Conditional Sequence Modeling Approach to Offline Safe Reinforcement Learning
APAC: Authorized Probability-controlled Actor-Critic For Offline Reinforcement Learning
Designing an offline reinforcement learning objective from scratch
Behaviour Discriminator: A Simple Data Filtering Method to Improve Offline Policy Learning
Learning to View: Decision Transformers for Active Object Detection
Risk Sensitive Dead-end Identification in Safety-Critical Offline Reinforcement Learning
Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization
Contextual Conservative Q-Learning for Offline Reinforcement Learning
Offline Policy Optimization in RL with Variance Regularization
Transformer in Transformer as Backbone for Deep Reinforcement Learning
SPQR: Controlling Q-ensemble Independence with Spiked Random Model for Reinforcement Learning
Revisiting the Minimalist Approach to Offline Reinforcement Learning
Constrained Policy Optimization with Explicit Behavior Density for Offline Reinforcement Learning
Supported Value Regularization for Offline Reinforcement Learning
Conservative State Value Estimation for Offline Reinforcement Learning
Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning
Adversarial Model for Offline Reinforcement Learning
Percentile Criterion Optimization in Offline Reinforcement Learning
Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning
HIQL: Offline Goal-Conditioned RL with Latent States as Actions
Recovering from Out-of-sample States via Inverse Dynamics in Offline Reinforcement Learning
Offline RL with Discrete Proxy Representations for Generalizability in POMDPs
Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization
Bi-Level Offline Policy Optimization with Limited Exploration
Provably (More) Sample-Efficient Offline RL with Options
Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage
AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline Multi-Agent RL via Alternating Stationary Distribution Correction Estimation
Budgeting Counterfactual for Offline RL
Efficient Diffusion Policies for Offline Reinforcement Learning
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data
Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage
Provably Efficient Offline Reinforcement Learning in Regular Decision Processes
Provably Efficient Offline Goal-Conditioned Reinforcement Learning with General Function Approximation and Single-Policy Concentrability
On Sample-Efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling and Beyond
Conservative Offline Policy Adaptation in Multi-Agent Games
Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL
Survival Instinct in Offline Reinforcement Learning
Learning from Visual Observation via Offline Pretrained State-to-Go Transformer
Design from Policies: Conservative Test-Time Adaptation for Offline Policy Optimization
Learning to Influence Human Behavior with Offline Reinforcement Learning
Residual Q-Learning: Offline and Online Policy Customization without Value
Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning
Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets
Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL
Corruption-Robust Offline Reinforcement Learning with General Function Approximation
Learning to Modulate pre-trained Models in RL
Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning
One Risk to Rule Them All: A Risk-Sensitive Perspective on Model-Based Offline Reinforcement Learning
Goal-Conditioned Predictive Coding for Offline Reinforcement Learning
Mutual Information Regularized Offline Reinforcement Learning
Offline RL With Heteroskedastic Datasets and Support Constraints
Offline Reinforcement Learning with Differential Privacy
Accountability in Offline Reinforcement Learning: Explaining Decisions with a Corpus of Examples
Reining Generalization in Offline Reinforcement Learning via Representation Distinction
VOCE: Variational Optimization with Conservative Estimation for Offline Safe Reinforcement Learning
SafeDICE: Offline Safe Imitation Learning with Non-Preferred Demonstrations
Hierarchical Diffusion for Offline Decision Making
MAHALO: Unifying Offline Reinforcement Learning and Imitation Learning from Observations
Safe Offline Reinforcement Learning with Real-Time Budget Constraints
Near-optimal Conservative Exploration in Reinforcement Learning under Episode-wise Constraints
A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning
Anti-Exploration by Random Network Distillation
PASTA: Pessimistic Assortment Optimization
Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning
Supported Trust Region Optimization for Offline Reinforcement Learning
Principled Offline RL in the Presence of Rich Exogenous Information
Efficient Online Reinforcement Learning with Offline Data
Boosting Offline Reinforcement Learning with Action Preference Query
Model-based Offline Reinforcement Learning with Count-based Conservatism
Constrained Decision Transformer for Offline Safe Reinforcement Learning
Model-Bellman Inconsistency for Model-based Offline Reinforcement Learning
Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources
What is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL?
Policy Regularization with Dataset Constraint for Offline Reinforcement Learning
MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL
Distance Weighted Supervised Learning for Offline Interaction Data
Masked Trajectory Models for Prediction, Representation, and Control
Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models
Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap
Future-conditioned Unsupervised Pretraining for Decision Transformer
PAC-Bayesian Offline Contextual Bandits With Guarantees
Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL
Jump-Start Reinforcement Learning
Learning Temporally Abstract World Models without Online Experimentation
A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback
Revisiting the Linear-Programming Framework for Offline RL with General Function Approximation
Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories
Actor-Critic Alignment for Offline-to-Online Reinforcement Learning
Leveraging Offline Data in Online Reinforcement Learning
Offline Reinforcement Learning with Closed-Form Policy Improvement Operators
Offline Learning in Markov Games with General Function Approximation
Scaling Pareto-Efficient Decision Making Via Offline Multi-Objective RL
Confidence-Conditioned Value Functions for Offline Reinforcement Learning
Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes
Is Conditional Generative Modeling all you need for Decision-Making?
Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization
Extreme Q-Learning: MaxEnt RL without Entropy
Dichotomy of Control: Separating What You Can Control from What You Cannot
From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data
VIPeR: Provably Efficient Algorithm for Offline RL with Neural Function Approximation
Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian
The In-Sample Softmax for Offline Reinforcement Learning
VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training
Does Zero-Shot Reinforcement Learning Exist?
Behavior Prior Representation learning for Offline Reinforcement Learning
Mind the Gap: Offline Policy Optimization for Imperfect Rewards
Offline Congestion Games: How Feedback Type Affects Data Coverage Requirement
User-Interactive Offline Reinforcement Learning
Discovering Generalizable Multi-agent Coordination Skills from Multi-task Offline Data
Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient
Efficient Offline Policy Optimization with a Learned Model
Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning
When Data Geometry Meets Deep Function: Generalizing Offline Reinforcement Learning
In-sample Actor Critic for Offline Reinforcement Learning
Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement Learning
Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization
Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling
Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient
Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game
Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes
Hyper-Decision Transformer for Efficient Online Policy Adaptation
Efficient Planning in a Compact Latent Action Space
Preference Transformer: Modeling Human Preferences using Transformers for RL
Behavior Proximal Policy Optimization
Provably Efficient Neural Offline Reinforcement Learning via Perturbed Rewards
The Provable Benefits of Unsupervised Data Sharing for Offline Reinforcement Learning
Decision Transformer under Random Frame Dropping
Policy Expansion for Bridging Offline-to-Online Reinforcement Learning
Finetuning Offline World Models in the Real World
On the Sample Complexity of Vanilla Model-Based Offline Reinforcement Learning with Dependent Samples
Adaptive Policy Learning for Offline-to-Online Reinforcement Learning
Safe Policy Improvement for POMDPs via Finite-State Controllers
Behavior Estimation from Multi-Source Data for Offline Reinforcement Learning
On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation
Contrastive Example-Based Control
Curriculum Offline Reinforcement Learning
Offline Reinforcement Learning with On-Policy Q-Function Regularization
Model-based Offline Policy Optimization with Adversarial Network
Efficient experience replay architecture for offline reinforcement learning
Automatic Trade-off Adaptation in Offline RL
Offline Robot Reinforcement Learning with Uncertainty-Guided Human Expert Sampling
Latent Variable Representation for Reinforcement Learning
Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning
State-Aware Proximal Pessimistic Algorithms for Offline Reinforcement Learning
Masked Autoencoding for Scalable and Generalizable Decision Making
Improving TD3-BC: Relaxed Policy Constraint for Offline Learning and Stable Online Fine-Tuning
Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size
Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows
Model-based Trajectory Stitching for Improved Offline Reinforcement Learning
Offline Reinforcement Learning with Adaptive Behavior Regularization
Contextual Transformer for Offline Meta Reinforcement Learning
Wall Street Tree Search: Risk-Aware Planning for Offline Reinforcement Learning
ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data
Contrastive Value Learning: Implicit Models for Simple Offline RL
Optimistic Curiosity Exploration and Conservative Exploitation with Linear Reward Shaping
Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information
Provable Safe Reinforcement Learning with Binary Feedback
Learning on the Job: Self-Rewarding Offline-to-Online Finetuning for Industrial Insertion of Novel Connectors from Vision
Implicit Offline Reinforcement Learning via Supervised Learning
Robust Offline Reinforcement Learning with Gradient Penalty and Constraint Relaxation
Boosting Offline Reinforcement Learning via Data Rebalancing
ConserWeightive Behavioral Cloning for Reliable Offline Reinforcement Learning
State Advantage Weighting for Offline RL
Blessing from Experts: Super Reinforcement Learning in Confounded Environments
DCE: Offline Reinforcement Learning With Double Conservative Estimates
On the Opportunities and Challenges of using Animals Videos in Reinforcement Learning
Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes
Exploiting Reward Shifting in Value-Based Deep RL
Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation
C^2: Co-design of Robots via Concurrent Networks Coupling Online and Offline Reinforcement Learning
Strategic Decision-Making in the Presence of Information Asymmetry: Provably Efficient RL with Algorithmic Instruments
Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity
AdaCat: Adaptive Categorical Discretization for Autoregressive Models
Branch Ranking for Efficient Mixed-Integer Programming via Offline Ranking-based Policy Learning
Offline Reinforcement Learning at Multiple Frequencies
General Policy Evaluation and Improvement by Learning to Identify Few But Crucial States
Behavior Transformers: Cloning k modes with one stone
Contrastive Learning as Goal-Conditioned Reinforcement Learning
Federated Offline Reinforcement Learning
Provable Benefit of Multitask Representation Learning in Reinforcement Learning
Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward
Model-Based Reinforcement Learning Is Minimax-Optimal for Offline Zero-Sum Markov Games
Offline Reinforcement Learning with Causal Structured World Models
Incorporating Explicit Uncertainty Estimates into Deep Offline Reinforcement Learning
Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in Offline RL
Byzantine-Robust Online and Offline Distributed Reinforcement Learning
Model Generation with Provable Coverability for Offline Reinforcement Learning
You Can't Count on Luck: Why Decision Transformers Fail in Stochastic Environments
Multi-Game Decision Transformers
Hierarchical Planning Through Goal-Conditioned Offline Reinforcement Learning
Distance-Sensitive Offline Reinforcement Learning
No More Pesky Hyperparameters: Offline Hyperparameter Tuning for RL
How to Spend Your Robot Time: Bridging Kickstarting and Offline Reinforcement Learning for Vision-based Robotic Manipulation
Offline Visual Representation Learning for Embodied Navigation
Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers
BATS: Best Action Trajectory Stitching
Settling the Sample Complexity of Model-Based Offline Reinforcement Learning
PAnDR: Fast Adaptation to New Environments from Offline Experiences via Decoupling Policy and Environment Representations
Offline Reinforcement Learning Under Value and Density-Ratio Realizability: the Power of Gaps
Meta Reinforcement Learning for Adaptive Control: An Offline Approach
The Efficacy of Pessimism in Asynchronous Q-Learning
Reinforcement Learning for Linear Quadratic Control is Vulnerable Under Cost Manipulation
A Regularized Implicit Policy for Offline Reinforcement Learning
Reinforcement Learning in Possibly Nonstationary Environments
Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons
VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning
Retrieval-Augmented Reinforcement Learning
Online Decision Transformer
Transferred Q-learning
Settling the Communication Complexity for Distributed Offline Reinforcement Learning
Offline Reinforcement Learning with Realizability and Single-policy Concentrability
Rethinking Goal-conditioned Supervised Learning and Its Connection to Offline RL
Stochastic Gradient Descent with Dependent Data for Offline Reinforcement Learning
Can Wikipedia Help Offline Reinforcement Learning?
MOORe: Model-based Offline-to-Online Reinforcement Learning
Operator Deep Q-Learning: Zero-Shot Reward Transferring in Reinforcement Learning
Importance of Empirical Sample Complexity Analysis for Offline Reinforcement Learning
Single-Shot Pruning for Offline Reinforcement Learning
Monte Carlo Augmented Actor-Critic for Sparse Reward Deep Reinforcement Learning from Suboptimal Demonstrations
Data-Driven Offline Decision-Making via Invariant Representation Learning
Bellman Residual Orthogonalization for Offline Reinforcement Learning
A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP
RORL: Robust Offline Reinforcement Learning via Conservative Smoothing
On Gap-dependent Bounds for Offline Reinforcement Learning
Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus
Supported Policy Optimization for Offline Reinforcement Learning
When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning
Why So Pessimistic? Estimating Uncertainties for Offline RL through Ensembles, and Why Their Independence Matters
When does return-conditioned supervised learning work for offline reinforcement learning?
Pessimism for Offline Linear Contextual Bandits using ℓp Confidence Sets
RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning
When is Offline Two-Player Zero-Sum Markov Game Solvable?
Robust Reinforcement Learning using Offline Data
Bidirectional Learning for Offline Infinite-width Model-based Optimization
Mildly Conservative Q-Learning for Offline Reinforcement Learning
Bootstrapped Transformer for Offline Reinforcement Learning
LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation
Latent-Variable Advantage-Weighted Policy Optimization for Offline RL
Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination
Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions
Offline Goal-Conditioned Reinforcement Learning via f-Advantage Regression
Dual Generator Offline Reinforcement Learning
MoCoDA: Model-based Counterfactual Data Augmentation
A Policy-Guided Imitation Approach for Offline Reinforcement Learning
A Unified Framework for Alternating Offline Model Training and Policy Learning
Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief
S2P: State-conditioned Image Synthesis for Data Augmentation in Offline Reinforcement Learning
ASPiRe: Adaptive Skill Priors for Reinforcement Learning
Skills Regularized Task Decomposition for Multi-task Offline Reinforcement Learning
Offline Multi-Agent Reinforcement Learning with Knowledge Distillation
Shadow Knowledge Distillation: Bridging Offline and Online Knowledge Transfer
Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning
Offline RL Policies Should be Trained to be Adaptive
Adversarially Trained Actor Critic for Offline Reinforcement Learning
Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets
How to Leverage Unlabeled Data in Offline Reinforcement Learning
Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification
Learning Pseudometric-based Action Representations for Offline Reinforcement Learning
Offline Meta-Reinforcement Learning with Online Self-Supervision
Versatile Offline Imitation from Observations and Examples via Regularized State-Occupancy Matching
Constrained Offline Policy Optimization
Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations
Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision Processes
Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity
Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach
Prompting Decision Transformer for Few-Shot Policy Generalization
Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning
On the Role of Discount Factor in Offline Reinforcement Learning
Koopman Q-learning: Offline Reinforcement Learning via Symmetries of Dynamics
Representation Learning for Online and Offline RL in Low-rank MDPs
Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage
Revisiting Design Choices in Model-Based Offline Reinforcement Learning
DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization
COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation
POETREE: Interpretable Policy Learning with Adaptive Decision Trees
Planning in Stochastic Environments with a Learned Model
Offline Reinforcement Learning with Value-based Episodic Memory
When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?
Learning Value Functions from Undirected State-only Experience
Offline Reinforcement Learning with Implicit Q-Learning
RvS: What is Essential for Offline RL via Supervised Learning?
Pareto Policy Pool for Model-based Offline Reinforcement Learning
CrowdPlay: Crowdsourcing Human Demonstrations for Offline Learning
COPA: Certifying Robust Policies for Offline Reinforcement Learning against Poisoning Attacks
DARA: Dynamics-Aware Reward Augmentation in Offline Reinforcement Learning
Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism
Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning
Offline Neural Contextual Bandits: Pessimism, Optimization and Generalization
Generalized Decision Transformer for Offline Hindsight Information Matching
Model-Based Offline Meta-Reinforcement Learning with Regularization
AW-Opt: Learning Robotic Skills with Imitation and Reinforcement at Scale
Dealing with the Unknown: Pessimistic Offline Reinforcement Learning
You Only Evaluate Once: a Simple Baseline Algorithm for Offline RL
S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement Learning
A Workflow for Offline Model-Free Robotic Reinforcement Learning
Beyond Pick-and-Place: Tackling Robotic Stacking of Diverse Shapes
Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions
Offline Reinforcement Learning with Representations for Actions
Towards Off-Policy Learning for Ranking Policies with Logged Feedback
Safe Offline Reinforcement Learning Through Hierarchical Policies
TD3 with Reverse KL Regularizer for Offline Reinforcement Learning from Mixed Datasets
Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks
Model Selection in Batch Policy Optimization
Learning Contraction Policies from Offline Data
CoMPS: Continual Meta Policy Search
MESA: Offline Meta-RL for Safe Adaptation and Fault Tolerance
Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Conquers All StarCraftII Tasks
Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms
Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation
UMBRELLA: Uncertainty-Aware Model-Based Offline Reinforcement Learning Leveraging Planning
Exploiting Action Impact Regularity and Partially Known Models for Offline Reinforcement Learning
Batch Reinforcement Learning from Crowds
SCORE: Spurious COrrelation REduction for Offline Reinforcement Learning
Safely Bridging Offline and Online Reinforcement Learning
Efficient Robotic Manipulation Through Offline-to-Online Reinforcement Learning and Goal-Aware State Information
Value Penalized Q-Learning for Recommender Systems
Offline Reinforcement Learning with Soft Behavior Regularization
Planning from Pixels in Environments with Combinatorially Hard Search Spaces
StARformer: Transformer with State-Action-Reward Representations
Offline RL With Resource Constrained Online Deployment
Lifelong Robotic Reinforcement Learning by Retaining Experiences
Dual Behavior Regularized Reinforcement Learning
DCUR: Data Curriculum for Teaching via Samples with Reinforcement Learning
DROMO: Distributionally Robust Offline Model-based Policy Optimization
Implicit Behavioral Cloning
Reducing Conservativeness Oriented Offline Reinforcement Learning
Policy Gradients Incorporating the Future
Offline Decentralized Multi-Agent Reinforcement Learning
OPAL: Offline Preference-Based Apprenticeship Learning
Constraints Penalized Q-Learning for Safe Offline Reinforcement Learning
Where is the Grass Greener? Revisiting Generalized Policy Iteration for Offline Reinforcement Learning
The Least Restriction for Offline Reinforcement Learning
Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble
Causal Reinforcement Learning using Observational and Interventional Data
On the Sample Complexity of Batch Reinforcement Learning with Policy-Induced Data
Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL
On Multi-objective Policy Optimization as a Tool for Reinforcement Learning
Offline Reinforcement Learning as Anti-Exploration
Corruption-Robust Offline Reinforcement Learning
Offline Inverse Reinforcement Learning
Heuristic-Guided Reinforcement Learning
Reinforcement Learning as One Big Sequence Modeling Problem
Decision Transformer: Reinforcement Learning via Sequence Modeling
Model-Based Offline Planning with Trajectory Pruning
InferNet for Delayed Reinforcement Tasks: Addressing the Temporal Credit Assignment Problem
Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm
MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale
Distributional Offline Continuous-Time Reinforcement Learning with Neural Physics-Informed PDEs (SciPhy RL for DOCTR-L)
Regularized Behavior Value Estimation
Improved Context-Based Offline Meta-RL with Attention and Contrastive Learning
Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning
GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning
MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning
Continuous Doubly Constrained Batch Reinforcement Learning
Q-Value Weighted Regression: Reinforcement Learning with Limited Data
Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency
Fast Rates for the Regret of Offline Reinforcement Learning
Safe Policy Learning through Extrapolation: Application to Pre-trial Risk Assessment
Weighted Model Estimation for Offline Model-based Reinforcement Learning
A Minimalist Approach to Offline Reinforcement Learning (TD3+BC; see the sketch at the end of this list)
Conservative Offline Distributional Reinforcement Learning
Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL
Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning
Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs
Offline Reinforcement Learning as One Big Sequence Modeling Problem
Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism
Offline Reinforcement Learning with Reverse Model-based Imagination
Offline Meta Reinforcement Learning -- Identifiability Challenges and Effective Data Collection Strategies
Nearly Horizon-Free Offline Reinforcement Learning
Conservative Data Sharing for Multi-Task Offline Reinforcement Learning
Online and Offline Reinforcement Learning by Planning with a Learned Model
Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning
Offline RL Without Off-Policy Evaluation
Offline Model-based Adaptable Policy Learning
COMBO: Conservative Offline Model-Based Policy Optimization
PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators
Near-Optimal Offline Reinforcement Learning via Double Variance Reduction
Bellman-consistent Pessimism for Offline Reinforcement Learning
The Difficulty of Passive Learning in Deep Reinforcement Learning
Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble
Towards Instance-Optimal Offline Reinforcement Learning with Pessimism
EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills
Is Pessimism Provably Efficient for Offline RL?
Representation Matters: Offline Pretraining for Sequential Decision Making
Offline Reinforcement Learning with Pseudometric Learning
Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment
Offline Contextual Bandits with Overparameterized Models
Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning
Offline Reinforcement Learning with Fisher Divergence Critic Regularization
OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation
Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning
Vector Quantized Models for Planning
Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL
Instabilities of Offline RL with Pre-Trained Neural Representation
Offline Meta-Reinforcement Learning with Advantage Weighting
Model-Based Offline Planning
Batch Reinforcement Learning Through Continuation Method
Model-Based Visual Planning with Self-Supervised Functional Distances
Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization
DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs
What are the Statistical Limits of Offline RL with Linear Function Approximation?
Reset-Free Lifelong Learning with Skill-Space Planning
Risk-Averse Offline Reinforcement Learning
Finite-Sample Regret Bound for Distributionally Robust Offline Tabular Reinforcement Learning
Exploration by Maximizing Rényi Entropy for Reward-Free RL Framework
Efficient Self-Supervised Data Collection for Offline Robot Learning
Boosting Offline Reinforcement Learning with Residual Generative Modeling
BRAC+: Improved Behavior Regularized Actor Critic for Offline Reinforcement Learning
Behavior Constraining in Weight Space for Offline Reinforcement Learning
Finite-Sample Analysis For Decentralized Batch Multi-Agent Reinforcement Learning With Networked Agents
Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning?
Reinforcement Learning via Fenchel-Rockafellar Duality
AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting
Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient
Batch Value-function Approximation with Only Realizability
DRIFT: Deep Reinforcement Learning for Functional Software Testing
Causality and Batch Reinforcement Learning: Complementary Approaches To Planning In Unknown Domains
Goal-conditioned Batch Reinforcement Learning for Rotation Invariant Locomotion
Semi-Supervised Reward Learning for Offline Reinforcement Learning
Sample-Efficient Reinforcement Learning via Counterfactual-Based Data Augmentation
Offline Reinforcement Learning from Images with Latent Space Models
POPO: Pessimistic Offline Policy Optimization
Reinforcement Learning with Videos: Combining Offline Observations with Interaction
Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones
Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning
Batch Exploration with Examples for Scalable Robotic Reinforcement Learning
Learning Dexterous Manipulation from Suboptimal Experts
The Reinforcement Learning-Based Multi-Agent Cooperative Approach for the Adaptive Speed Regulation on a Metallurgical Pickling Line
Overcoming Model Bias for Robust Offline Deep Reinforcement Learning
Offline Meta Learning of Exploration
Hyperparameter Selection for Offline Reinforcement Learning
Interpretable Control by Reinforcement Learning
Efficient Evaluation of Natural Stochastic Policies in Offline Reinforcement Learning
Accelerating Online Reinforcement Learning with Offline Datasets
DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction
Critic Regularized Regression
Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration
Conservative Q-Learning for Offline Reinforcement Learning
BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning
MOPO: Model-based Offline Policy Optimization
MOReL: Model-Based Offline Reinforcement Learning
Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation
Multi-task Batch Reinforcement Learning with Metric Learning
Counterfactual Data Augmentation using Locally Factored Dynamics
On Reward-Free Reinforcement Learning with Linear Function Approximation
Constrained Policy Improvement for Safe and Efficient Reinforcement Learning
BRPO: Batch Residual Policy Optimization
Keep Doing What Worked: Behavior Modelling Priors for Offline Reinforcement Learning
COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning
Accelerating Reinforcement Learning with Learned Skill Priors
PLAS: Latent Action Space for Offline Reinforcement Learning
Scaling data-driven robotics with reward sketching and batch reinforcement learning
Quantile QT-Opt for Risk-Aware Vision-Based Robotic Grasping
Batch-Constrained Reinforcement Learning for Dynamic Distribution Network Reconfiguration
Behavior Regularized Offline Reinforcement Learning
Off-Policy Policy Gradient Algorithms by Constraining the State Distribution Shift
Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
AlgaeDICE: Policy Gradient from Arbitrary Experience
Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
Off-Policy Deep Reinforcement Learning without Exploration
Safe Policy Improvement with Baseline Bootstrapping
Information-Theoretic Considerations in Batch Reinforcement Learning
Batch Recurrent Q-Learning for Backchannel Generation Towards Engaging Agents
Safe Policy Improvement with Soft Baseline Bootstrapping
Importance Weighted Transfer of Samples in Reinforcement Learning
Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation
Off-Policy Policy Gradient with State Distribution Correction
Behavioral Cloning from Observation
Diverse Exploration for Fast and Safe Policy Improvement
Deep Exploration via Bootstrapped DQN
Safe Policy Improvement by Minimizing Robust Baseline Regret
Residential Demand Response Applications Using Batch Reinforcement Learning
Structural Return Maximization for Reinforcement Learning
Simultaneous Perturbation Algorithms for Batch Off-Policy Search
Guided Policy Search
Off-Policy Actor-Critic
PAC-Bayesian Policy Evaluation for Reinforcement Learning
Tree-Based Batch Mode Reinforcement Learning
Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method
Off-Policy Temporal-Difference Learning with Function Approximation
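
The methods listed above span several families (pessimistic value estimation, behavior regularization, sequence modeling, model-based planning), but most share the same constraint: the policy must improve against a fixed logged dataset while staying close to it. As a hedged illustration of the behavior-regularization idea, below is a minimal PyTorch sketch of the actor update from TD3+BC ("A Minimalist Approach to Offline Reinforcement Learning" above); the synthetic batch, network sizes, and the 2.5 coefficient are illustrative assumptions, not a reproduction of the paper's full training loop.

```python
# Minimal sketch of the TD3+BC actor update; synthetic data stands in for
# a logged dataset, and all dimensions/hyperparameters are illustrative.
import torch
import torch.nn as nn

obs_dim, act_dim, batch = 11, 3, 256
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

# One gradient step on a (synthetic) batch of logged transitions.
s = torch.randn(batch, obs_dim)
a_logged = torch.tanh(torch.randn(batch, act_dim))

pi = actor(s)
q = critic(torch.cat([s, pi], dim=-1))
# TD3+BC scales the Q term by alpha / mean|Q| so a single coefficient
# balances value maximization against behavioral cloning across tasks.
lam = 2.5 / q.abs().mean().detach()
loss = -(lam * q).mean() + ((pi - a_logged) ** 2).mean()

opt.zero_grad()
loss.backward()
opt.step()
print(f"actor loss: {loss.item():.3f}")
```

In the full algorithm this actor step alternates with standard TD3 critic updates computed on the same fixed dataset.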

awesome-offline-rl / Papers / Offline RL: Benchmarks/Experiments

ORL-AUDITOR: Dataset Auditing in Offline Deep Reinforcement Learning
Pearl: A Production-ready Reinforcement Learning Agent
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
Robotic Manipulation Datasets for Offline Compositional Reinforcement Learning
Datasets and Benchmarks for Offline Safe Reinforcement Learning
Improving and Benchmarking Offline Reinforcement Learning Algorithms
Benchmarks and Algorithms for Offline Preference-Based Reward Learning
Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks
CORL: Research-oriented Deep Offline Reinforcement Learning Library
Benchmarking Offline Reinforcement Learning on Real-Robot Hardware
Train Offline, Test Online: A Real Robot Learning Benchmark
Benchmarking Offline Reinforcement Learning Algorithms for E-Commerce Order Fraud Evaluation
Real World Offline Reinforcement Learning with Realistic Data Source
Mind Your Data! Hiding Backdoors in Offline Reinforcement Learning Datasets
B2RL: An open-source Dataset for Building Batch Reinforcement Learning
An Empirical Study of Implicit Regularization in Deep Offline RL
Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations
Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning
The Challenges of Exploration for Offline Reinforcement Learning
Offline Equilibrium Finding
Comparing Model-free and Model-based Algorithms for Offline Reinforcement Learning
Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data
Dungeons and Data: A Large-Scale NetHack Dataset
NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning
A Closer Look at Offline RL Agents
Beyond Rewards: a Hierarchical Perspective on Offline Multiagent Behavioral Analysis
On the Effect of Pre-training for Transformer in Different Modality on Offline Reinforcement Learning
Showing Your Offline Reinforcement Learning Work: Online Evaluation Budget Matters
d3rlpy: An Offline Deep Reinforcement Learning Library
Understanding the Effects of Dataset Characteristics on Offline Reinforcement Learning
Interpretable performance analysis towards offline reinforcement learning: A dataset perspective
Comparison and Unification of Three Regularization Methods in Batch Reinforcement Learning
RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning
Measuring Data Quality for Dataset Selection in Offline Reinforcement Learning
Offline Reinforcement Learning Hands-On
D4RL: Datasets for Deep Data-Driven Reinforcement Learning (see the loading sketch at the end of this list)
RL Unplugged: Benchmarks for Offline Reinforcement Learning
Benchmarking Batch Deep Reinforcement Learning Algorithms
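
Most of the suites above expose their logged data through a small Python API. As a hedged example, the sketch below loads a D4RL task through its Gym integration; it assumes the `gym` and `d4rl` packages are installed, and 'hopper-medium-v2' is one of D4RL's published task identifiers.

```python
# Minimal sketch of loading an offline dataset with D4RL
# (assumes `pip install gym d4rl`; the hopper tasks also require MuJoCo).
import gym
import d4rl  # importing d4rl registers the offline-RL tasks with gym

env = gym.make("hopper-medium-v2")

# get_dataset() returns the raw logged data as a dict of numpy arrays.
dataset = env.get_dataset()
print(dataset["observations"].shape)  # (N, obs_dim)
print(dataset["actions"].shape)       # (N, act_dim)

# qlearning_dataset() additionally aligns next_observations with
# observations, the layout most offline Q-learning code expects.
qdata = d4rl.qlearning_dataset(env)
print(sorted(qdata.keys()))
# typically: actions, next_observations, observations, rewards, terminals
```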

awesome-offline-rl / Papers / Offline RL: Applications

MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning
P2DT: Mitigating Forgetting in task-incremental Learning with progressive prompt Decision Transformer
Online Symbolic Music Alignment with Offline Reinforcement Learning
Advancing RAN Slicing with Offline Reinforcement Learning
Traffic Signal Control Using Lightweight Transformers: An Offline-to-Online RL Approach
Self-Driving Telescopes: Autonomous Scheduling of Astronomical Observation Campaigns with Offline Reinforcement Learning
A Fully Data-Driven Approach for Realistic Traffic Signal Control Using Offline Reinforcement Learning
Offline Reinforcement Learning for Wireless Network Optimization with Mixture Datasets
STEER: Unified Style Transfer with Expert Reinforcement
Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations
Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning
Offline Reinforcement Learning for Optimizing Production Bidding Policies
End-to-end Offline Reinforcement Learning for Glycemia Control
Leveraging Optimal Transport for Enhanced Offline Reinforcement Learning in Surgical Robotic Environments
Learning RL-Policies for Joint Beamforming Without Exploration: A Batch Constrained Off-Policy Approach
Uncertainty-Aware Decision Transformer for Stochastic Driving Environments
Boosting Offline Reinforcement Learning for Autonomous Driving with Hierarchical Latent Skills
Robotic Offline RL from Internet Videos via Value-Function Pre-Training
VAPOR: Holonomic Legged Robot Navigation in Outdoor Vegetation Using Offline Reinforcement Learning
RLSynC: Offline-Online Reinforcement Learning for Synthon Completion
Real Robot Challenge 2022: Learning Dexterous Manipulation from Offline Data in the Real World
Reinforced Self-Training (ReST) for Language Modeling
Aligning Language Models with Offline Reinforcement Learning from Human Feedback
Integrating Offline Reinforcement Learning with Transformers for Sequential Recommendation
Offline Skill Graph (OSG): A Framework for Learning and Planning using Offline Reinforcement Learning Skills
Improving Offline RL by Blending Heuristics
IQL-TD-MPC: Implicit Q-Learning for Hierarchical Model Predictive Control
Robust Reinforcement Learning Objectives for Sequential Recommender Systems
The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning
PROTO: Iterative Policy Regularized Offline-to-Online Reinforcement Learning
Matrix Estimation for Offline Reinforcement Learning with Low-Rank Structure
Offline Experience Replay for Continual Offline Reinforcement Learning
Causal Decision Transformer for Recommender Systems via Offline Reinforcement Learning
Data Might be Enough: Bridge Real-World Traffic Signal Control Using Offline Reinforcement Learning
User Retention-oriented Recommendation with Decision Transformer
Learning to Control Autonomous Fleets from Observation via Offline Reinforcement Learning
INVICTUS: Optimizing Boolean Logic Circuit Synthesis via Synergistic Learning and Search
Learning Vision-based Robotic Manipulation Tasks Sequentially in Offline Reinforcement Learning Settings
Winning Solution of Real Robot Challenge III
Learning-based MPC from Big Data Using Reinforcement Learning
Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management
Beyond Reward: Offline Preference-guided Policy Optimization
DevFormer: A Symmetric Transformer for Context-Aware Device Placement
On the Effectiveness of Offline RL for Dialogue Response Generation
Bidirectional Learning for Offline Model-based Biological Sequence Design
ChiPFormer: Transferable Chip Placement via Offline Decision Transformer
Semi-Offline Reinforcement Learning for Optimized Text Generation
Neural Constraint Satisfaction: Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement
Offline RL for Natural Language Generation with Implicit Language Q Learning
Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning
Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning
Dialog Action-Aware Transformer for Dialog Policy Learning
Can Offline Reinforcement Learning Help Natural Language Understanding?
NeurIPS 2022 Competition: Driving SMARTS
Controlling Commercial Cooling Systems Using Reinforcement Learning
Pre-Training for Robots: Offline RL Enables Learning New Tasks from a Handful of Trials
Towards Safe Mechanical Ventilation Treatment Using Deep Offline Reinforcement Learning
Learning-to-defer for sequential medical decision-making under uncertainty
Imitation Is Not Enough: Robustifying Imitation with Reinforcement Learning for Challenging Driving Scenarios
Dialogue Evaluation with Offline Reinforcement Learning
Multi-Task Fusion via Reinforcement Learning for Long-Term User Satisfaction in Recommender Systems
A Maintenance Planning Framework using Online and Offline Deep Reinforcement Learning
BCRLSP: An Offline Reinforcement Learning Framework for Sequential Targeted Promotion
Learning Optimal Treatment Strategies for Sepsis Using Offline Reinforcement Learning in Continuous Space
Rethinking Reinforcement Learning for Recommendation: A Prompt Perspective
ARLO: A Framework for Automated Reinforcement Learning
A Reinforcement Learning-based Volt-VAR Control Dataset and Testing Environment
CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning
Offline Reinforcement Learning for Safer Blood Glucose Control in People with Type 1 Diabetes
CIRS: Bursting Filter Bubbles by Counterfactual Interactive Recommender System
A Conservative Q-Learning approach for handling distribution shift in sepsis treatment strategies
Optimizing Trajectories for Highway Driving with Offline Reinforcement Learning
Offline Deep Reinforcement Learning for Dynamic Pricing of Consumer Credit
Offline Reinforcement Learning for Mobile Notifications
Offline Reinforcement Learning for Road Traffic Control
Sustainable Online Reinforcement Learning for Auto-bidding
Leveraging Factored Action Spaces for Efficient Offline Reinforcement Learning in Healthcare
Multi-objective Optimization of Notifications Using Offline Reinforcement Learning
Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning
GPT-Critic: Offline Reinforcement Learning for End-to-End Task-Oriented Dialogue Systems
Offline Reinforcement Learning for Visual Navigation
Semi-Markov Offline Reinforcement Learning for Healthcare
Automate Page Layout Optimization: An Offline Deep Q-Learning Approach
RL4RS: A Real-World Benchmark for Reinforcement Learning based Recommender System
Compressive Features in Offline Reinforcement Learning for Recommender Systems
Causal-aware Safe Policy Improvement for Task-oriented dialogue
Offline Contextual Bandits for Wireless Network Optimization
Identifying Decision Points for Safe and Interpretable Reinforcement Learning in Hypotension Treatment
Offline Reinforcement Learning for Autonomous Driving with Safety and Exploration Enhancement
Medical Dead-ends and Learning to Identify High-risk States and Treatments
An Offline Deep Reinforcement Learning for Maintenance Decision-Making
Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation
Offline-Online Reinforcement Learning for Energy Pricing in Office Demand Response: Lowering Energy and Data Costs
Offline reinforcement learning with uncertainty for treatment strategies in sepsis
Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL
Safe Model-based Off-policy Reinforcement Learning for Eco-Driving in Connected and Automated Hybrid Electric Vehicles
pH-RL: A personalization architecture to bring reinforcement learning to health practice
DeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement Learning
Personalization for Web-based Services using Offline Reinforcement Learning
BCORLE(λ): An Offline Reinforcement Learning and Evaluation Framework for Coupons Allocation in E-commerce Market
Safe Driving via Expert Guided Policy Optimization
A General Offline Reinforcement Learning Framework for Interactive Recommendation
Value Function is All You Need: A Unified Learning Framework for Ride Hailing Platforms
Discovering an Aid Policy to Minimize Student Evasion Using Offline Reinforcement Learning
Learning robust driving policies without online exploration
Engagement Rewarded Actor-Critic with Conservative Q-Learning for Speech-Driven Laughter Backchannel Generation
Network Intrusion Detection Based on Extended RBF Neural Network With Offline Reinforcement Learning
Towards Accelerating Offline RL based Recommender Systems
Offline Meta-level Model-based Reinforcement Learning Approach for Cold-Start Recommendation
Batch-Constrained Distributional Reinforcement Learning for Session-based Recommendation
An Empirical Study of Representation Learning for Reinforcement Learning in Healthcare
Learning from Human Feedback: Challenges for Real-World Reinforcement Learning in NLP
Remote Electrical Tilt Optimization via Safe Reinforcement Learning
An Optimistic Perspective on Offline Reinforcement Learning
Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning
Offline Contextual Multi-armed Bandits for Mobile Health Interventions: A Case Study on Emotion Regulation
Human-centric Dialog Training via Offline Reinforcement Learning
Definition and evaluation of model-free coordination of electrical vehicle charging with reinforcement learning
Optimal Tap Setting of Voltage Regulation Transformers Using Batch Reinforcement Learning
Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
Optimized cost function for demand response coordination of multiple EV charging stations using reinforcement learning
A Clustering-Based Reinforcement Learning Approach for Tailored Personalization of E-Health Interventions
Generating Interpretable Fuzzy Controllers using Particle Swarm Optimization and Genetic Programming
End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient
Batch Reinforcement Learning on the Industrial Benchmark: First Experiences
Policy Networks with Two-Stage Training for Dialogue Systems
Adaptive Treatment of Epilepsy via Batch-mode Reinforcement Learning

awesome-offline-rl / Papers / Off-Policy Evaluation and Learning: Theory/Methods

Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction
Distributionally Robust Policy Evaluation under General Covariate Shift in Contextual Bandits
Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling
Multiply Robust Off-policy Evaluation and Learning under Truncation by Death
Off-Policy Evaluation of Ranking Policies under Diverse User Behavior
Policy-Adaptive Estimator Selection for Off-Policy Evaluation
Variance-Optimal Augmentation Logging for Counterfactual Evaluation in Contextual Bandits
Offline Policy Evaluation in Large Action Spaces via Outcome-Oriented Action Grouping
Off-Policy Evaluation for Large Action Spaces via Policy Convolution
Distributional Off-Policy Evaluation for Slate Recommendations
Debiased Machine Learning and Network Cohesion for Doubly-Robust Differential Reward Models in Contextual Bandits
Doubly Robust Estimator for Off-Policy Evaluation with Large Action Spaces
Offline Policy Evaluation with Out-of-Sample Guarantees
Quantile Off-Policy Evaluation via Deep Conditional Generative Learning
Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model
Off-Policy Evaluation for Large Action Spaces via Embeddings
Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning
Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions
Conformal Off-Policy Prediction in Contextual Bandits
Off-Policy Evaluation with Policy-Dependent Optimization Response
Off-Policy Evaluation with Deficient Support Using Side Information
Towards Robust Off-Policy Evaluation via Human Inputs
Off-policy evaluation for learning-to-rank via interpolating the item-position model and the position-based model
Bayesian Counterfactual Mean Embeddings and Off-Policy Evaluation
Anytime-valid off-policy inference for contextual bandits
Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency
Off-Policy Evaluation in Embedded Spaces
Safe Exploration for Efficient Policy Evaluation and Comparison
Inverse Propensity Score based offline estimator for deterministic ranking lists using position bias
Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning
Control Variates for Slate Off-Policy Evaluation
Deep Jump Learning for Off-Policy Evaluation in Continuous Treatment Settings
Optimal Off-Policy Evaluation from Multiple Logging Policies
Off-policy Confidence Sequences
Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting
Off-Policy Evaluation Using Information Borrowing and Context-Based Switching
Identification of Subgroups With Similar Benefits in Off-Policy Policy Evaluation
Robust On-Policy Data Collection for Data-Efficient Policy Evaluation
Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits
Off-Policy Risk Assessment in Contextual Bandits
Off-Policy Evaluation of Slate Policies under Bayes Risk
A Practical Guide of Off-Policy Evaluation for Bandit Problems
Off-Policy Evaluation and Learning for External Validity under a Covariate Shift
Counterfactual Evaluation of Slate Recommendations with Sequential Reward Interactions
Doubly robust off-policy evaluation with shrinkage
Adaptive Estimator Selection for Off-Policy Evaluation
Distributionally Robust Policy Evaluation and Learning in Offline Contextual Bandits
Improving Offline Contextual Bandits with Distributional Robustness
Balanced Off-Policy Evaluation in General Action Spaces
Policy Evaluation with Latent Confounders via Optimal Balance
On the Design of Estimators for Bandit Off-Policy Evaluation
CAB: Continuous Adaptive Blending for Policy Evaluation and Learning
Focused Context Balancing for Robust Offline Policy Evaluation
When People Change their Mind: Off-Policy Evaluation in Non-Stationary Recommendation Environments
Policy Evaluation and Optimization with Continuous Treatments
Confounding-Robust Policy Improvement
Balanced Policy Evaluation and Learning
Offline Evaluation of Ranking Policies with Click Models
Effective Evaluation using Logged Bandit Feedback from Multiple Loggers
Off-policy Evaluation for Slate Recommendation
Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
Data-Efficient Policy Evaluation Through Behavior Policy Search
Doubly Robust Policy Evaluation and Optimization
Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms
Distributional Off-policy Evaluation with Bellman Residual Minimization
Future-Dependent Value-Based Off-Policy Evaluation in POMDPs
Marginal Density Ratio for Off-Policy Evaluation in Contextual Bandits
State-Action Similarity-Based Representations for Off-Policy Evaluation
Off-Policy Evaluation for Human Feedback
Counterfactual-Augmented Importance Sampling for Semi-Offline Policy Evaluation
An Instrumental Variable Approach to Confounded Off-Policy Evaluation
Semiparametrically Efficient Off-Policy Evaluation in Linear Markov Decision Processes
Distributional Offline Policy Evaluation with Predictive Error Guarantees
The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation
Revisiting Bellman Errors for Offline Model Selection
Scaling Marginalized Importance Sampling to High-Dimensional State-Spaces via State Abstraction
Variational Latent Branching Model for Off-Policy Evaluation
Multiple-policy High-confidence Policy Evaluation
Off-Policy Evaluation with Online Adaptation for Robot Exploration in Challenging Environments
Conservative Exploration for Policy Optimization via Off-Policy Policy Evaluation
Robust Offline Policy Evaluation and Optimization with Heavy-Tailed Rewards
When is Offline Policy Selection Sample Efficient for Reinforcement Learning?
Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks
Evaluation of Active Feature Acquisition Methods for Static Feature Settings
Distributional Shift-Aware Off-Policy Interval Estimation: A Unified Error Quantification Framework
Marginalized Importance Sampling for Off-Environment Policy Evaluation
Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning
Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments
Off-policy Evaluation in Doubly Inhomogeneous Environments
Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data
π2vec: Policy Representations with Successor Features
Conformal Off-Policy Evaluation in Markov Decision Processes
Hallucinated Adversarial Control for Conservative Offline Policy Evaluation
Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders
Minimax Weight Learning for Absorbing MDPs
Improving Monte Carlo Evaluation with Offline Data
First-order Policy Optimization for Robust Policy Evaluation
A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes
On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation
Learning Bellman Complete Representations for Offline Policy Evaluation
Supervised Off-Policy Ranking
Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory
Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions
Oracle Inequalities for Model Selection in Offline Reinforcement Learning
Off-Policy Evaluation for Episodic Partially Observable Markov Decision Processes under Non-Parametric Models
Off-Policy Evaluation for Action-Dependent Non-stationary Environments
Stateful Offline Contextual Policy Evaluation and Learning
Off-Policy Risk Assessment for Markov Decision Processes
Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information
Offline Policy Evaluation and Optimization under Confounding
Bridging the Gap Between Offline and Online Reinforcement Learning Evaluation Methodologies
Safe Evaluation For Offline Learning: Are We Ready To Deploy?
Low Variance Off-policy Evaluation with State-based Importance Sampling
Statistical Estimation of Confounded Linear MDPs: An Instrumental Variable Approach
Offline Estimation of Controlled Markov Chains: Minimax Nonparametric Estimators and Sample Efficiency
Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks
A Sharp Characterization of Linear Estimators for Offline Policy Evaluation
A Multi-Agent Reinforcement Learning Framework for Off-Policy Evaluation in Two-sided Markets
A Theoretical Framework of Almost Hyperparameter-free Hyperparameter Selection Methods for Offline Policy Evaluation
SOPE: Spectrum of Off-Policy Estimators
Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation
Variance-Aware Off-Policy Evaluation with Linear Function Approximation
Universal Off-Policy Evaluation
Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning
Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings
State Relevance for Off-Policy Evaluation
Bootstrapping Fitted Q-Evaluation for Off-Policy Inference
Deeply-Debiased Off-Policy Interval Estimation
Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization
Minimax Model Learning
Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders
High-Confidence Off-Policy (or Counterfactual) Variance Estimation
Debiased Off-Policy Evaluation for Recommendation Systems
Pessimistic Model Selection for Offline Deep Reinforcement Learning
Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes
Off-Policy Evaluation in Partially Observed Markov Decision Processes
A Spectral Approach to Off-Policy Evaluation for POMDPs
Projected State-action Balancing Weights for Offline Reinforcement Learning
Active Offline Policy Selection
On Instrumental Variable Regression for Deep Offline Policy Evaluation
Average-Reward Off-Policy Policy Evaluation with Function Approximation
Sequential causal inference in a single world of connected units
Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding
CoinDICE: Off-Policy Confidence Interval Estimation
Off-Policy Interval Estimation with Lipschitz Value Iteration
Off-Policy Evaluation via the Regularized Lagrangian
Minimax Value Interval for Off-Policy Evaluation and Policy Optimization
GenDICE: Generalized Offline Estimation of Stationary Values
Infinite-horizon Off-Policy Policy Evaluation with Multiple Behavior Policies
Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation
Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning
GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values
Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation
Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions
Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation
Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling
Minimax Weight and Q-Function Learning for Off-Policy Evaluation
Accountable Off-Policy Evaluation With Kernel Bellman Statistics
Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning
Batch Stationary Distribution Estimation
Towards Off-policy Evaluation as a Prerequisite for Real-world Reinforcement Learning in Building Control
Defining Admissible Rewards for High Confidence Policy Evaluation in Batch Reinforcement Learning
Offline Policy Selection under Uncertainty
Near-Optimal Provable Uniform Convergence in Offline Policy Evaluation for Reinforcement Learning
Optimal Mixture Weights for Off-Policy Evaluation with Multiple Behavior Policies
Kernel Methods for Policy Evaluation: Treatment Effects, Mediation Analysis, and Off-Policy Planning
Statistical Bootstrapping for Uncertainty Estimation in Off-Policy Evaluation
Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning
Off-Policy Evaluation in Partially Observable Environments
Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning
Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling
Off-Policy Evaluation via Off-Policy Classification
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy
Batch Policy Learning under Constraints
More Efficient Off-Policy Evaluation through Regularized Targeted Learning
Combining parametric and nonparametric models for off-policy evaluation
Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models
Importance Sampling Policy Evaluation with an Estimated Behavior Policy
Representation Balancing MDPs for Off-policy Policy Evaluation
Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
More Robust Doubly Robust Off-policy Evaluation
Importance Sampling for Fair Policy Selection
Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing
Consistent On-Line Off-Policy Evaluation
Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
High Confidence Policy Improvement
High Confidence Off-Policy Evaluation
Eligibility Traces for Off-Policy Policy Evaluation
Sequential Counterfactual Risk Minimization
Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning
Multi-Task Off-Policy Learning from Bandit Feedback
Exponential Smoothing for Off-Policy Learning
Counterfactual Learning with General Data-generating Policies
Distributionally Robust Policy Gradient for Offline Contextual Bandits
Oracle-Efficient Pessimism: Offline Policy Optimization in Contextual Bandits
Pessimistic Off-Policy Multi-Objective Optimization
Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective
Uncertainty-Aware Off-Policy Learning
Fair Off-Policy Learning from Observational Data
Interpretable Off-Policy Learning via Hyperbox Search
Offline Policy Optimization with Eligible Actions
Towards Robust Off-policy Learning for Runtime Uncertainty
Safe Optimal Design with Applications in Off-Policy Learning
Off-Policy Actor-critic for Recommender Systems
MGPolicy: Meta Graph Enhanced Off-policy Learning for Recommendations
Distributionally Robust Policy Learning with Wasserstein Distance
Local Policy Improvement for Recommender Systems
Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality
Fast Offline Policy Optimization for Large Scale Recommendation
Practical Counterfactual Policy Learning for Top-K Recommendations
Boosted Off-Policy Learning
Semi-Counterfactual Risk Minimization Via Neural Networks
IMO^3: Interactive Multi-Objective Off-Policy Optimization
Pessimistic Off-Policy Optimization for Learning to Rank
Non-Stationary Off-Policy Optimization
Learning from eXtreme Bandit Feedback
Generalizing Off-Policy Learning under Sample Selection Bias
Conservative Policy Construction Using Variational Autoencoders for Logged Data with Missing Values
Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies
From Importance Sampling to Doubly Robust Policy Gradient
Efficient Policy Learning from Surrogate-Loss Classification Reductions
Off-policy Bandits with Deficient Support
Off-policy Learning in Two-stage Recommender Systems
More Efficient Policy Learning via Optimal Retargeting
Learning When-to-Treat Policies
Doubly Robust Off-Policy Learning on Low-Dimensional Manifolds by Deep Neural Networks
Bandit Overfitting in Offline Policy Learning
Counterfactual Learning of Continuous Stochastic Policies
Top-K Off-Policy Correction for a REINFORCE Recommender System
Semi-Parametric Efficient Policy Learning with Continuous Actions
Efficient Counterfactual Learning from Bandit Feedback
Deep Learning with Logged Bandit Feedback
The Self-Normalized Estimator for Counterfactual Learning
Counterfactual Risk Minimization: Learning from Logged Bandit Feedback
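
Many of the estimators indexed above refine the same basic building block: importance-weighted value estimation from logged bandit feedback. As a rough orientation only (this sketch is illustrative and not taken from any single paper above), vanilla inverse propensity scoring (IPS) and its self-normalized variant (SNIPS) look like this:

```python
import numpy as np

# Illustrative IPS / SNIPS estimators on synthetic logged bandit feedback
# with two actions. All quantities here are made up for demonstration.
rng = np.random.default_rng(0)
n = 100_000

# Logging policy picks action 1 with prob 0.3; target policy with prob 0.8.
logged_actions = rng.binomial(1, 0.3, size=n)
# Reward depends on the action: action 1 pays off more often.
rewards = rng.binomial(1, np.where(logged_actions == 1, 0.7, 0.4)).astype(float)

# Importance weights pi_e(a|x) / pi_0(a|x) for the logged actions.
pi0 = np.where(logged_actions == 1, 0.3, 0.7)
pie = np.where(logged_actions == 1, 0.8, 0.2)
w = pie / pi0

ips = np.mean(w * rewards)               # unbiased, but higher variance
snips = np.sum(w * rewards) / np.sum(w)  # self-normalized: biased, stabler

print(f"true target value: {0.8 * 0.7 + 0.2 * 0.4:.3f}")
print(f"IPS:   {ips:.3f}")
print(f"SNIPS: {snips:.3f}")
```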

awesome-offline-rl / Papers / Off-Policy Evaluation and Learning: Benchmarks/Experiments

Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation
SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation
Offline Policy Comparison with Confidence: Benchmarks and Baselines
Extending Open Bandit Pipeline to Simulate Industry Challenges
Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation
Evaluating the Robustness of Off-Policy Evaluation
Benchmarks for Deep Off-Policy Evaluation
Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning

awesome-offline-rl / Papers / Off-Policy Evaluation and Learning: Applications

HOPE: Human-Centric Off-Policy Evaluation for E-Learning and Healthcare
When is Off-Policy Evaluation Useful? A Data-Centric Perspective
Counterfactual Evaluation of Peer-Review Assignment Policies
Balanced Off-Policy Evaluation for Personalized Pricing
Multi-Action Dialog Policy Learning from Logged User Feedback
CFR-p: Counterfactual Regret Minimization with Hierarchical Policy Abstraction, and its Application to Two-player Mahjong
Reward Shaping for User Satisfaction in a REINFORCE Recommender
Data-Driven Off-Policy Estimator Selection: An Application in User Marketing on An Online Content Delivery Service
Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach
Model Selection for Offline Reinforcement Learning: Practical Considerations for Healthcare Settings
Off-Policy Evaluation of Probabilistic Identity Data in Lookalike Modeling
Offline Evaluation to Make Decisions About Playlist Recommendation
Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration Matters
Evaluating Reinforcement Learning Algorithms in Observational Health Settings
Towards a Fair Marketplace: Counterfactual Evaluation of the trade-off between Relevance, Fairness & Satisfaction in Recommendation Systems
Offline A/B testing for Recommender Systems
Offline Comparative Evaluation with Incremental, Minimally-Invasive Online Feedback
Handling Confounding for Realistic Off-Policy Evaluation
Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising

awesome-offline-rl / Open Source Software/Implementations

SCOPE-RL: A Python library for offline reinforcement learning, off-policy evaluation, and selection 114 8 months ago
Open Bandit Pipeline: a research framework for bandit algorithms and off-policy evaluation 645 6 months ago (see the usage sketch after this list)
pyIEOE: Towards An Interpretable Evaluation for Offline Evaluation 31 about 3 years ago
d3rlpy: An Offline Deep Reinforcement Learning Library 1,327 13 days ago (see the usage sketch after this list)
MINERVA: An out-of-the-box GUI tool for data-driven deep reinforcement learning 95 over 3 years ago
Minari 294 14 days ago
CORL: Clean Offline Reinforcement Learning 482 10 months ago
COBS: Caltech OPE Benchmarking Suite 61 over 2 years ago
Benchmarks for Deep Off-Policy Evaluation 85 4 months ago
DICE: The DIstribution Correction Estimation Library 99 4 months ago
RL Unplugged: Benchmarks for Offline Reinforcement Learning 13,250 26 days ago
D4RL: Datasets for Deep Data-Driven Reinforcement Learning 1,346 18 days ago
V-D4RL: Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations 95 6 months ago
Benchmarking Offline Reinforcement Learning on Real-Robot Hardware 17 10 months ago
RLDS: Reinforcement Learning Datasets 293 about 2 months ago
OEF: Offline Equilibrium Finding 3 over 2 years ago
ExORL: Exploratory Data for Offline Reinforcement Learning 105 almost 3 years ago
RL4RS: A Real-World Benchmark for Reinforcement Learning based Recommender System 220 10 months ago
NeoRL: Near Real-World Benchmarks for Offline Reinforcement Learning
The Industrial Benchmark Offline RL Datasets 126 over 1 year ago
ARLO: A Framework for Automated Reinforcement Learning 10 over 2 years ago
RecoGym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising 467 over 3 years ago
MARS-Gym: A Gym framework to model, train, and evaluate Recommender Systems for Marketplaces 51 10 months ago
A Reinforcement Learning-based Volt-VAR Control Dataset 20 over 2 years ago
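
To give a feel for the listed libraries, here is a minimal off-policy evaluation sketch with Open Bandit Pipeline (obp). This is hedged: it follows obp's documented public API as we understand it, and the uniform-random target policy is a stand-in chosen purely for illustration; consult the repository for the current interface.

```python
import numpy as np
from obp.dataset import SyntheticBanditDataset
from obp.ope import OffPolicyEvaluation, InverseProbabilityWeighting

# Simulate logged bandit feedback from a synthetic logging policy.
dataset = SyntheticBanditDataset(n_actions=10, random_state=12345)
bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=10_000)

# Stand-in target policy: uniform-random action-choice probabilities,
# with shape (n_rounds, n_actions, len_list).
action_dist = np.full((bandit_feedback["n_rounds"], 10, 1), 1.0 / 10)

# Estimate the target policy's value via inverse propensity weighting.
ope = OffPolicyEvaluation(
    bandit_feedback=bandit_feedback,
    ope_estimators=[InverseProbabilityWeighting()],
)
print(ope.estimate_policy_values(action_dist=action_dist))
```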
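And a minimal offline RL training sketch with d3rlpy, assuming its v2 config-based API (earlier versions instantiated algorithms directly, e.g. `d3rlpy.algos.DiscreteCQL()`); check the docs for your installed version.

```python
import d3rlpy

# Built-in demo dataset (CartPole transitions) plus the matching env.
dataset, env = d3rlpy.datasets.get_cartpole()

# Discrete-action Conservative Q-Learning, trained purely from the dataset.
cql = d3rlpy.algos.DiscreteCQLConfig().create(device="cpu")
cql.fit(
    dataset,
    n_steps=10_000,
    # Online rollouts are possible here only because the demo env exists.
    evaluators={"environment": d3rlpy.metrics.EnvironmentEvaluator(env)},
)
```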

awesome-offline-rl / Blog/Podcast / Blog

Counterfactual Evaluation for Recommendation Systems
Offline Reinforcement Learning: How Conservative Algorithms Can Enable New Applications
AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
D4RL: Building Better Benchmarks for Offline Reinforcement Learning
Does On-Policy Data Collection Fix Errors in Off-Policy Reinforcement Learning?
Tackling Open Challenges in Offline Reinforcement Learning
An Optimistic Perspective on Offline Reinforcement Learning
Decisions from Data: How Offline Reinforcement Learning Will Change How We Use Machine Learning
Introducing completely free datasets for data-driven deep reinforcement learning
Offline (Batch) Reinforcement Learning: A Review of Literature and Applications
Data-Driven Deep Reinforcement Learning

awesome-offline-rl / Blog/Podcast / Podcast

AI Trends 2023: Reinforcement Learning – RLHF, Robotic Pre-Training, and Offline RL with Sergey Levine
Bandits and Simulators for Recommenders with Olivier Jeunen
Sergey Levine on Robot Learning & Offline RL
Off-Line, Off-Policy RL for Real-World Decision Making at Facebook
Xianyuan Zhan | TalkRL: The Reinforcement Learning Podcast
MOReL: Model-Based Offline Reinforcement Learning with Aravind Rajeswaran
Trends in Reinforcement Learning with Chelsea Finn
Nan Jiang | TalkRL: The Reinforcement Learning Podcast
Scott Fujimoto | TalkRL: The Reinforcement Learning Podcast

awesome-offline-rl / Related Workshops

CONSEQUENCES (RecSys 2023)
Offline Reinforcement Learning (NeurIPS 2022)
Reinforcement Learning for Real Life (NeurIPS 2022)
CONSEQUENCES + REVEAL (RecSys 2022)
Offline Reinforcement Learning (NeurIPS 2021)
Reinforcement Learning for Real Life (ICML 2021)
Reinforcement Learning Day 2021
Offline Reinforcement Learning (NeurIPS 2020)
Reinforcement Learning from Batch Data and Simulation
Reinforcement Learning for Real Life (RL4RealLife 2020)
Safety and Robustness in Decision Making (NeurIPS 2019)
Reinforcement Learning for Real Life (ICML 2019)
Real-world Sequential Decision Making (ICML 2019)

awesome-offline-rl / Tutorials/Talks/Lectures

Reinforcement Learning with Large Datasets: Robotics, Image Generation, and LLMs
Counterfactual Evaluation and Learning for Interactive Systems
Representation Learning for Online and Offline RL in Low-rank MDPs
Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation
Safe Policy Learning through Extrapolation: Application to Pre-trial Risk Assessment
Deep Reinforcement Learning with Real-World Data
Planning with Reinforcement Learning
Imitation learning vs. offline reinforcement learning
Tutorial on the Foundations of Offline Reinforcement Learning
Counterfactual Learning and Evaluation for Recommender Systems: Foundations, Implementations, and Recent Advances
Offline Reinforcement Learning
Offline Reinforcement Learning
Fast Rates for the Regret of Offline Reinforcement Learning
Bellman-consistent Pessimism for Offline Reinforcement Learning
Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage
Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism
Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm
Is Pessimism Provably Efficient for Offline RL?
Adaptive Estimator Selection for Off-Policy Evaluation
What are the Statistical Limits of Offline RL with Linear Function Approximation?
Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL
A Gentle Introduction to Offline Reinforcement Learning
Principles for Tackling Distribution Shift: Pessimism, Adaptation, and Anticipation
Offline Reinforcement Learning: Incorporating Knowledge from Data into RL
Offline RL
Learning a Multi-Agent Simulator from Offline Demonstrations
Towards Reliable Validation and Evaluation for Offline RL
Batch RL Models Built for Validation
Offline Reinforcement Learning: From Algorithms to Practical Challenges
Data Scalability for Robot Learning
Statistically Efficient Offline Reinforcement Learning
Near Optimal Provable Uniform Convergence in Off-Policy Evaluation for Reinforcement Learning
Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation
Beyond the Training Distribution: Embodiment, Adaptation, and Symmetry
Combining Statistical methods with Human Input for Evaluation and Optimization in Batch Settings
Efficiently Breaking the Curse of Horizon with Double Reinforcement Learning
Scaling Probabilistically Safe Learning to Robotics
Deep Reinforcement Learning in the Real World
