# awesome-offline-rl

Maintained by:
- Haruka Kiyohara (Cornell University)
- Yuta Saito (Hanjuku-kaso Co., Ltd. / Cornell University)
## Table of Contents

- Papers
  - Review/Survey/Position Papers
    - Offline RL
    - Off-Policy Evaluation and Learning
    - Related Reviews
  - Offline RL: Theory/Methods
  - Offline RL: Benchmarks/Experiments
  - Offline RL: Applications
  - Off-Policy Evaluation and Learning: Theory/Methods
    - Off-Policy Evaluation: Contextual Bandits
    - Off-Policy Evaluation: Reinforcement Learning
    - Off-Policy Learning
  - Off-Policy Evaluation and Learning: Benchmarks/Experiments
  - Off-Policy Evaluation and Learning: Applications
- Open Source Software/Implementations
- Blog/Podcast
  - Blog
  - Podcast
- Related Workshops
- Tutorials/Talks/Lectures
## Papers

### Review/Survey/Position Papers

- Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
- A Survey on Offline Model-Based Reinforcement Learning
- Foundation Models for Decision Making: Problems, Methods, and Opportunities
- A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems
- Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
- A Review of Off-Policy Evaluation in Reinforcement Learning
- On the Opportunities and Challenges of Offline Reinforcement Learning for Recommender Systems
- Understanding Reinforcement Learning Algorithms: The Progress from Basic Q-learning to Proximal Policy Optimization
- Offline Evaluation for Reinforcement Learning-based Recommendation: A Critical Issue and Some Alternatives
- A Survey on Transformers in Reinforcement Learning
- Deep Reinforcement Learning: Opportunities and Challenges
- A Survey on Model-based Reinforcement Learning
- Survey on Fair Reinforcement Learning: Theory and Practice
- Accelerating Offline Reinforcement Learning Application in Real-Time Bidding and Recommendation: Potential Use of Simulation
- A Survey of Generalisation in Deep Reinforcement Learning
### Offline RL: Theory/Methods

- Value-Aided Conditional Supervised Learning for Offline RL
- Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning
- DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching
- Deep autoregressive density nets vs neural ensembles for model-based offline reinforcement learning
- Context-Former: Stitching via Latent Conditioned Sequence Modeling
- Adversarially Trained Actor Critic for offline CMDPs
- Optimistic Model Rollouts for Pessimistic Offline Policy Optimization
- Solving Continual Offline Reinforcement Learning with Decision Transformer
- MoMA: Model-based Mirror Ascent for Offline Reinforcement Learning
- Reframing Offline Reinforcement Learning as a Regression Problem
- Efficient Two-Phase Offline Deep Reinforcement Learning from Preference Feedback
- Policy-regularized Offline Multi-objective Reinforcement Learning
- Differentiable Tree Search in Latent State Space
- Learning from Sparse Offline Datasets via Conservative Density Estimation
- Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model
- PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning
- Critic-Guided Decision Transformer for Offline Reinforcement Learning
- CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning
- Neural Network Approximation for Pessimistic Offline Reinforcement Learning
- A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
- The Generalization Gap in Offline Reinforcement Learning
- Decoupling Meta-Reinforcement Learning with Gaussian Task Contexts and Skills
- MICRO: Model-Based Offline Reinforcement Learning with a Conservative Bellman Operator
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization
- Using Curiosity for an Even Representation of Tasks in Continual Offline Reinforcement Learning
- Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline Reinforcement Learning
- Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees
- Switch Trajectory Transformer with Distributional Value Approximation for Multi-Task Reinforcement Learning
- Hierarchical Decision Transformer
- Prompt-Tuning Decision Transformer with Preference Ranking
- Context Shift Reduction for Offline Meta-Reinforcement Learning
- Uni-O4: Unifying Online and Offline Deep Reinforcement Learning with Multi-Step On-Policy Optimization
- Score Models for Offline Goal-Conditioned Reinforcement Learning
- Offline RL with Observation Histories: Analyzing and Improving Sample Complexity
- Expressive Modeling Is Insufficient for Offline RL: A Tractable Inference Perspective
- Rethinking Decision Transformer via Hierarchical Reinforcement Learning
- Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
- GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models
- SERA: Sample Efficient Reward Augmentation in offline-to-online Reinforcement Learning
- Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage
- Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning
- CROP: Conservative Reward for Model-based Offline Policy Optimization
- Towards Robust Offline Reinforcement Learning under Diverse Data Corruption
- Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias
- Boosting Continuous Control with Consistency Policy
- Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning
- Reward-Consistent Dynamics Models are Strongly Generalizable for Offline Reinforcement Learning
- DiffCPS: Diffusion Model based Constrained Policy Search for Offline Reinforcement Learning
- Self-Confirming Transformer for Locally Consistent Online Adaptation in Multi-Agent Reinforcement Learning
- Learning to Reach Goals via Diffusion
- Decision ConvFormer: Local Filtering in MetaFormer is Sufficient for Decision Making
- Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning
- Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning
- Reasoning with Latent Diffusion in Offline Reinforcement Learning
- Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with Expert Guidance
- Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness
- Robust Offline Reinforcement Learning -- Certify the Confidence Interval
- Stackelberg Batch Policy Learning
- H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps
- Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
- DOMAIN: MilDly COnservative Model-BAsed OfflINe Reinforcement Learning
- Guided Online Distillation: Promoting Safe Reinforcement Learning by Offline Demonstration
- Equivariant Data Augmentation for Generalization in Offline Reinforcement Learning
- Multi-Objective Decision Transformers for Offline Reinforcement Learning
- AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning
- Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations
- PASTA: Pretrained Action-State Transformer Agents
- Towards A Unified Agent with Foundation Models
- Goal-Conditioned Predictive Coding as an Implicit Planner for Offline Reinforcement Learning
- Offline Reinforcement Learning with Imbalanced Datasets
- LLQL: Logistic Likelihood Q-Learning for Reinforcement Learning
- Elastic Decision Transformer
- Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning
- Is RLHF More Difficult than Standard RL?
- Supervised Pretraining Can Learn In-Context Reinforcement Learning
- Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching
- Safe Reinforcement Learning with Dead-Ends Avoidance and Recovery
- CLUE: Calibrated Latent Guidance for Offline Reinforcement Learning
- Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting
- Beyond OOD State Actions: Supported Cross-Domain Offline Reinforcement Learning
- A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning
- HIPODE: Enhancing Offline Reinforcement Learning with High-Quality Synthetic Data from a Policy-Decoupled Approach
- Ensemble-based Offline-to-Online Reinforcement Learning: From Pessimistic Learning to Optimistic Exploration
- In-Sample Policy Iteration for Offline Reinforcement Learning
- Instructed Diffuser with Temporal Condition Guidance for Offline Reinforcement Learning
- Offline Prioritized Experience Replay
- Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding
- Offline Meta Reinforcement Learning with In-Distribution Online Adaptation
- Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning
- Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism
- MADiff: Offline Multi-agent Learning with Diffusion Models
- Provable Offline Reinforcement Learning with Human Feedback
- Think Before You Act: Decision Transformers with Internal Working Memory
- Distributionally Robust Optimization Efficiently Solves Offline Reinforcement Learning
- Offline Primal-Dual Reinforcement Learning for Linear MDPs
- Federated Offline Policy Learning with Heterogeneous Observational Data
- Offline Reinforcement Learning with Additional Covering Distributions
- Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning
- Stackelberg Decision Transformer for Asynchronous Action Coordination in Multi-Agent Systems
- Federated Ensemble-Directed Offline Reinforcement Learning
- IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies
- Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments
- Reinforcement Learning from Passive Data via Latent Intentions
- Uncertainty-driven Trajectory Truncation for Model-based Offline Reinforcement Learning
- RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
- Batch Quantum Reinforcement Learning
- Accelerating exploration and representation learning with offline pre-training
- On Context Distribution Shift in Task Representation Learning for Offline Meta RL
- Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning
- Learning Excavation of Rigid Objects with Offline Reinforcement Learning
- Goal-conditioned Offline Reinforcement Learning through State Space Partitioning
- Merging Decision Transformers: Weight Averaging for Forming Multi-Task Policies
- Deploying Offline Reinforcement Learning with Human Feedback
- Synthetic Experience Replay
- ENTROPY: Environment Transformer and Offline Policy Optimization
- Graph Decision Transformer
- Selective Uncertainty Propagation in Offline RL
- Off-the-Grid MARL: a Framework for Dataset Generation with Baselines for Cooperative Offline Multi-Agent Reinforcement Learning
- Skill Decision Transformer
- Guiding Online Reinforcement Learning with Action-Free Offline Pretraining
- SaFormer: A Conditional Sequence Modeling Approach to Offline Safe Reinforcement Learning
- APAC: Authorized Probability-controlled Actor-Critic For Offline Reinforcement Learning
- Designing an offline reinforcement learning objective from scratch
- Behaviour Discriminator: A Simple Data Filtering Method to Improve Offline Policy Learning
- Learning to View: Decision Transformers for Active Object Detection
- Risk Sensitive Dead-end Identification in Safety-Critical Offline Reinforcement Learning
- Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization
- Contextual Conservative Q-Learning for Offline Reinforcement Learning
- Offline Policy Optimization in RL with Variance Regularization
- Transformer in Transformer as Backbone for Deep Reinforcement Learning
- SPQR: Controlling Q-ensemble Independence with Spiked Random Model for Reinforcement Learning
- Revisiting the Minimalist Approach to Offline Reinforcement Learning
- Constrained Policy Optimization with Explicit Behavior Density for Offline Reinforcement Learning
- Supported Value Regularization for Offline Reinforcement Learning
- Conservative State Value Estimation for Offline Reinforcement Learning
- Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning
- Adversarial Model for Offline Reinforcement Learning
- Percentile Criterion Optimization in Offline Reinforcement Learning
- Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning
- HIQL: Offline Goal-Conditioned RL with Latent States as Actions
- Recovering from Out-of-sample States via Inverse Dynamics in Offline Reinforcement Learning
- Offline RL with Discrete Proxy Representations for Generalizability in POMDPs
- Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization
- Bi-Level Offline Policy Optimization with Limited Exploration
- Provably (More) Sample-Efficient Offline RL with Options
- Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage
- AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline Multi-Agent RL via Alternating Stationary Distribution Correction Estimation
- Budgeting Counterfactual for Offline RL
- Efficient Diffusion Policies for Offline Reinforcement Learning
- Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
- Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data
- Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage
- Provably Efficient Offline Reinforcement Learning in Regular Decision Processes
- Provably Efficient Offline Goal-Conditioned Reinforcement Learning with General Function Approximation and Single-Policy Concentrability
- On Sample-Efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling and Beyond
- Conservative Offline Policy Adaptation in Multi-Agent Games
- Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL
- Survival Instinct in Offline Reinforcement Learning
- Learning from Visual Observation via Offline Pretrained State-to-Go Transformer
- Design from Policies: Conservative Test-Time Adaptation for Offline Policy Optimization
- Learning to Influence Human Behavior with Offline Reinforcement Learning
- Residual Q-Learning: Offline and Online Policy Customization without Value
- Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning
- Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets
- Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL
- Corruption-Robust Offline Reinforcement Learning with General Function Approximation
- Learning to Modulate pre-trained Models in RL
- Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning
- One Risk to Rule Them All: A Risk-Sensitive Perspective on Model-Based Offline Reinforcement Learning
- Goal-Conditioned Predictive Coding for Offline Reinforcement Learning
- Mutual Information Regularized Offline Reinforcement Learning
- Offline RL With Heteroskedastic Datasets and Support Constraints
- Offline Reinforcement Learning with Differential Privacy
- Accountability in Offline Reinforcement Learning: Explaining Decisions with a Corpus of Examples
- Reining Generalization in Offline Reinforcement Learning via Representation Distinction
- VOCE: Variational Optimization with Conservative Estimation for Offline Safe Reinforcement Learning
- SafeDICE: Offline Safe Imitation Learning with Non-Preferred Demonstrations
- Hierarchical Diffusion for Offline Decision Making
- MAHALO: Unifying Offline Reinforcement Learning and Imitation Learning from Observations
- Safe Offline Reinforcement Learning with Real-Time Budget Constraints
- Near-optimal Conservative Exploration in Reinforcement Learning under Episode-wise Constraints
- A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning
- Anti-Exploration by Random Network Distillation
- Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning
- PASTA: Pessimistic Assortment Optimization
- Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning
- Supported Trust Region Optimization for Offline Reinforcement Learning
- Principled Offline RL in the Presence of Rich Exogenous Information
- Efficient Online Reinforcement Learning with Offline Data
- Boosting Offline Reinforcement Learning with Action Preference Query
- Model-based Offline Reinforcement Learning with Count-based Conservatism
- Constrained Decision Transformer for Offline Safe Reinforcement Learning
- Model-Bellman Inconsistency for Model-based Offline Reinforcement Learning
- Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources
- What is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL?
- Policy Regularization with Dataset Constraint for Offline Reinforcement Learning
- MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL
- Distance Weighted Supervised Learning for Offline Interaction Data
- Masked Trajectory Models for Prediction, Representation, and Control
- Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning
- Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models
- Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap
- Future-conditioned Unsupervised Pretraining for Decision Transformer
- PAC-Bayesian Offline Contextual Bandits With Guarantees
- Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL
- Jump-Start Reinforcement Learning
- Learning Temporally Abstract World Models without Online Experimentation
- A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback
- Revisiting the Linear-Programming Framework for Offline RL with General Function Approximation
- Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories
- Actor-Critic Alignment for Offline-to-Online Reinforcement Learning
- Leveraging Offline Data in Online Reinforcement Learning
- Offline Reinforcement Learning with Closed-Form Policy Improvement Operators
- Offline Learning in Markov Games with General Function Approximation
- Offline Meta Reinforcement Learning with In-Distribution Online Adaptation
- Scaling Pareto-Efficient Decision Making Via Offline Multi-Objective RL
- Confidence-Conditioned Value Functions for Offline Reinforcement Learning
- Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes
- Is Conditional Generative Modeling all you need for Decision-Making?
- Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization
- Extreme Q-Learning: MaxEnt RL without Entropy
- Dichotomy of Control: Separating What You Can Control from What You Cannot
- From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data
- VIPeR: Provably Efficient Algorithm for Offline RL with Neural Function Approximation
- Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian
- The In-Sample Softmax for Offline Reinforcement Learning
- VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training
- Does Zero-Shot Reinforcement Learning Exist?
- Behavior Prior Representation learning for Offline Reinforcement Learning
- Mind the Gap: Offline Policy Optimization for Imperfect Rewards
- Offline Congestion Games: How Feedback Type Affects Data Coverage Requirement
- User-Interactive Offline Reinforcement Learning
- Discovering Generalizable Multi-agent Coordination Skills from Multi-task Offline Data
- Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient
- Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting
- Efficient Offline Policy Optimization with a Learned Model
- Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning
- When Data Geometry Meets Deep Function: Generalizing Offline Reinforcement Learning
- In-sample Actor Critic for Offline Reinforcement Learning
- Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement Learning
- Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization
- Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling
- Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient
- Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game
- Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes
- Hyper-Decision Transformer for Efficient Online Policy Adaptation
- Efficient Planning in a Compact Latent Action Space
- Preference Transformer: Modeling Human Preferences using Transformers for RL
- Behavior Proximal Policy Optimization
- Provably Efficient Neural Offline Reinforcement Learning via Perturbed Rewards
- The Provable Benefits of Unsupervised Data Sharing for Offline Reinforcement Learning
- Decision Transformer under Random Frame Dropping
- Policy Expansion for Bridging Offline-to-Online Reinforcement Learning
- Finetuning Offline World Models in the Real World
- On the Sample Complexity of Vanilla Model-Based Offline Reinforcement Learning with Dependent Samples
- Adaptive Policy Learning for Offline-to-Online Reinforcement Learning
- Safe Policy Improvement for POMDPs via Finite-State Controllers
- Behavior Estimation from Multi-Source Data for Offline Reinforcement Learning
- On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation
- Contrastive Example-Based Control
- Curriculum Offline Reinforcement Learning
- Offline Reinforcement Learning with On-Policy Q-Function Regularization
- Model-based Offline Policy Optimization with Adversarial Network
- Efficient experience replay architecture for offline reinforcement learning
- Automatic Trade-off Adaptation in Offline RL
- Offline Robot Reinforcement Learning with Uncertainty-Guided Human Expert Sampling
- Latent Variable Representation for Reinforcement Learning
- Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning
- State-Aware Proximal Pessimistic Algorithms for Offline Reinforcement Learning
- Masked Autoencoding for Scalable and Generalizable Decision Making
- Improving TD3-BC: Relaxed Policy Constraint for Offline Learning and Stable Online Fine-Tuning
- Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size
- Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows
- Model-based Trajectory Stitching for Improved Offline Reinforcement Learning
- Offline Reinforcement Learning with Adaptive Behavior Regularization
- Contextual Transformer for Offline Meta Reinforcement Learning
- Wall Street Tree Search: Risk-Aware Planning for Offline Reinforcement Learning
- ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data
- Contrastive Value Learning: Implicit Models for Simple Offline RL
- Optimistic Curiosity Exploration and Conservative Exploitation with Linear Reward Shaping
- Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian
- Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information
- Provable Safe Reinforcement Learning with Binary Feedback
- Learning on the Job: Self-Rewarding Offline-to-Online Finetuning for Industrial Insertion of Novel Connectors from Vision
- Implicit Offline Reinforcement Learning via Supervised Learning
- Robust Offline Reinforcement Learning with Gradient Penalty and Constraint Relaxation
- Boosting Offline Reinforcement Learning via Data Rebalancing
- ConserWeightive Behavioral Cloning for Reliable Offline Reinforcement Learning
- State Advantage Weighting for Offline RL
- Blessing from Experts: Super Reinforcement Learning in Confounded Environments
- DCE: Offline Reinforcement Learning With Double Conservative Estimates
- On the Opportunities and Challenges of using Animals Videos in Reinforcement Learning
- Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes
- Exploiting Reward Shifting in Value-Based Deep RL
- Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation
- C^2: Co-design of Robots via Concurrent Networks Coupling Online and Offline Reinforcement Learning
- Strategic Decision-Making in the Presence of Information Asymmetry: Provably Efficient RL with Algorithmic Instruments
- Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity
- AdaCat: Adaptive Categorical Discretization for Autoregressive Models
- Branch Ranking for Efficient Mixed-Integer Programming via Offline Ranking-based Policy Learning
- Offline Reinforcement Learning at Multiple Frequencies
- General Policy Evaluation and Improvement by Learning to Identify Few But Crucial States
- Behavior Transformers: Cloning k modes with one stone
- Contrastive Learning as Goal-Conditioned Reinforcement Learning
- Federated Offline Reinforcement Learning
- Provable Benefit of Multitask Representation Learning in Reinforcement Learning
- Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward
- Model-Based Reinforcement Learning Is Minimax-Optimal for Offline Zero-Sum Markov Games
- Offline Reinforcement Learning with Causal Structured World Models
- Incorporating Explicit Uncertainty Estimates into Deep Offline Reinforcement Learning
- Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in Offline RL
- Byzantine-Robust Online and Offline Distributed Reinforcement Learning
- Model Generation with Provable Coverability for Offline Reinforcement Learning
- You Can't Count on Luck: Why Decision Transformers Fail in Stochastic Environments
- Multi-Game Decision Transformers
- Hierarchical Planning Through Goal-Conditioned Offline Reinforcement Learning
- Distance-Sensitive Offline Reinforcement Learning
- No More Pesky Hyperparameters: Offline Hyperparameter Tuning for RL
- How to Spend Your Robot Time: Bridging Kickstarting and Offline Reinforcement Learning for Vision-based Robotic Manipulation
- Offline Visual Representation Learning for Embodied Navigation
- Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers
- BATS: Best Action Trajectory Stitching
- Settling the Sample Complexity of Model-Based Offline Reinforcement Learning
- PAnDR: Fast Adaptation to New Environments from Offline Experiences via Decoupling Policy and Environment Representations
- Offline Reinforcement Learning Under Value and Density-Ratio Realizability: the Power of Gaps
- Meta Reinforcement Learning for Adaptive Control: An Offline Approach
- The Efficacy of Pessimism in Asynchronous Q-Learning
- Reinforcement Learning for Linear Quadratic Control is Vulnerable Under Cost Manipulation
- A Regularized Implicit Policy for Offline Reinforcement Learning
- Reinforcement Learning in Possibly Nonstationary Environments
- Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons
- VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning
- Retrieval-Augmented Reinforcement Learning
- Online Decision Transformer
- Transferred Q-learning
- Settling the Communication Complexity for Distributed Offline Reinforcement Learning
- Offline Reinforcement Learning with Realizability and Single-policy Concentrability
- Rethinking Goal-conditioned Supervised Learning and Its Connection to Offline RL
- Stochastic Gradient Descent with Dependent Data for Offline Reinforcement Learning
- Can Wikipedia Help Offline Reinforcement Learning?
- MOORe: Model-based Offline-to-Online Reinforcement Learning
- Operator Deep Q-Learning: Zero-Shot Reward Transferring in Reinforcement Learning
- Importance of Empirical Sample Complexity Analysis for Offline Reinforcement Learning
- Single-Shot Pruning for Offline Reinforcement Learning
- Monte Carlo Augmented Actor-Critic for Sparse Reward Deep Reinforcement Learning from Suboptimal Demonstrations
- Data-Driven Offline Decision-Making via Invariant Representation Learning
- Bellman Residual Orthogonalization for Offline Reinforcement Learning
- A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP
- RORL: Robust Offline Reinforcement Learning via Conservative Smoothing
- On Gap-dependent Bounds for Offline Reinforcement Learning
- Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus
- Supported Policy Optimization for Offline Reinforcement Learning
- When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning
- Why So Pessimistic? Estimating Uncertainties for Offline RL through Ensembles, and Why Their Independence Matters
- When does return-conditioned supervised learning work for offline reinforcement learning?
- Pessimism for Offline Linear Contextual Bandits using ℓp Confidence Sets
- RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning
- When is Offline Two-Player Zero-Sum Markov Game Solvable?
- Robust Reinforcement Learning using Offline Data
- Bidirectional Learning for Offline Infinite-width Model-based Optimization
- Mildly Conservative Q-Learning for Offline Reinforcement Learning
- Bootstrapped Transformer for Offline Reinforcement Learning
- LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation
- Latent-Variable Advantage-Weighted Policy Optimization for Offline RL
- Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination
- Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions
- Offline Goal-Conditioned Reinforcement Learning via f-Advantage Regression
- Dual Generator Offline Reinforcement Learning
- MoCoDA: Model-based Counterfactual Data Augmentation
- A Policy-Guided Imitation Approach for Offline Reinforcement Learning
- A Unified Framework for Alternating Offline Model Training and Policy Learning
- Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief
- S2P: State-conditioned Image Synthesis for Data Augmentation in Offline Reinforcement Learning
- ASPiRe: Adaptive Skill Priors for Reinforcement Learning
- Skills Regularized Task Decomposition for Multi-task Offline Reinforcement Learning
- Offline Multi-Agent Reinforcement Learning with Knowledge Distillation
- Shadow Knowledge Distillation: Bridging Offline and Online Knowledge Transfer
- Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning
- Offline RL Policies Should be Trained to be Adaptive
- Adversarially Trained Actor Critic for Offline Reinforcement Learning
- Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets
- How to Leverage Unlabeled Data in Offline Reinforcement Learning
- Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification
- Learning Pseudometric-based Action Representations for Offline Reinforcement Learning
- Offline Meta-Reinforcement Learning with Online Self-Supervision
- Versatile Offline Imitation from Observations and Examples via Regularized State-Occupancy Matching
- Constrained Offline Policy Optimization
- Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations
- Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision Processes
- Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity
- Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach
- Prompting Decision Transformer for Few-Shot Policy Generalization
- Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning
- On the Role of Discount Factor in Offline Reinforcement Learning
- Koopman Q-learning: Offline Reinforcement Learning via Symmetries of Dynamics
- Representation Learning for Online and Offline RL in Low-rank MDPs
- Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage
- Revisiting Design Choices in Model-Based Offline Reinforcement Learning
- DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization
- COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation
- POETREE: Interpretable Policy Learning with Adaptive Decision Trees
- Planning in Stochastic Environments with a Learned Model
- Offline Reinforcement Learning with Value-based Episodic Memory
- When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?
- Learning Value Functions from Undirected State-only Experience
- Rethinking Goal-Conditioned Supervised Learning and Its Connection to Offline RL
- Offline Reinforcement Learning with Implicit Q-Learning
- RvS: What is Essential for Offline RL via Supervised Learning?
- Pareto Policy Pool for Model-based Offline Reinforcement Learning
- CrowdPlay: Crowdsourcing Human Demonstrations for Offline Learning
- COPA: Certifying Robust Policies for Offline Reinforcement Learning against Poisoning Attacks
- DARA: Dynamics-Aware Reward Augmentation in Offline Reinforcement Learning
- Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism
- Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning
- Offline Neural Contextual Bandits: Pessimism, Optimization and Generalization
- Generalized Decision Transformer for Offline Hindsight Information Matching
- Model-Based Offline Meta-Reinforcement Learning with Regularization
- AW-Opt: Learning Robotic Skills with Imitation and Reinforcement at Scale
- Dealing with the Unknown: Pessimistic Offline Reinforcement Learning
- You Only Evaluate Once: a Simple Baseline Algorithm for Offline RL
- S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement Learning
- A Workflow for Offline Model-Free Robotic Reinforcement Learning
- Beyond Pick-and-Place: Tackling Robotic Stacking of Diverse Shapes
- Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions
- Offline Reinforcement Learning with Representations for Actions
- Towards Off-Policy Learning for Ranking Policies with Logged Feedback
- Safe Offline Reinforcement Learning Through Hierarchical Policies
- TD3 with Reverse KL Regularizer for Offline Reinforcement Learning from Mixed Datasets
- Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks
- Model Selection in Batch Policy Optimization
- Learning Contraction Policies from Offline Data
- CoMPS: Continual Meta Policy Search
- MESA: Offline Meta-RL for Safe Adaptation and Fault Tolerance
- Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Conquers All StarCraftII Tasks
- Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms
- Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation
- UMBRELLA: Uncertainty-Aware Model-Based Offline Reinforcement Learning Leveraging Planning
- Exploiting Action Impact Regularity and Partially Known Models for Offline Reinforcement Learning
- Batch Reinforcement Learning from Crowds
- SCORE: Spurious COrrelation REduction for Offline Reinforcement Learning
- Safely Bridging Offline and Online Reinforcement Learning
- Efficient Robotic Manipulation Through Offline-to-Online Reinforcement Learning and Goal-Aware State Information
- Value Penalized Q-Learning for Recommender Systems
- Offline Reinforcement Learning with Soft Behavior Regularization
- Planning from Pixels in Environments with Combinatorially Hard Search Spaces
- StARformer: Transformer with State-Action-Reward Representations
- Offline RL With Resource Constrained Online Deployment
- Lifelong Robotic Reinforcement Learning by Retaining Experiences
- Dual Behavior Regularized Reinforcement Learning
- DCUR: Data Curriculum for Teaching via Samples with Reinforcement Learning
- DROMO: Distributionally Robust Offline Model-based Policy Optimization
- Implicit Behavioral Cloning
- Reducing Conservativeness Oriented Offline Reinforcement Learning
- Policy Gradients Incorporating the Future
- Offline Decentralized Multi-Agent Reinforcement Learning
- OPAL: Offline Preference-Based Apprenticeship Learning
- Constraints Penalized Q-Learning for Safe Offline Reinforcement Learning
- Where is the Grass Greener? Revisiting Generalized Policy Iteration for Offline Reinforcement Learning
- The Least Restriction for Offline Reinforcement Learning
- Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble
- Causal Reinforcement Learning using Observational and Interventional Data
- On the Sample Complexity of Batch Reinforcement Learning with Policy-Induced Data
- Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL
- On Multi-objective Policy Optimization as a Tool for Reinforcement Learning
- Offline Reinforcement Learning as Anti-Exploration
- Corruption-Robust Offline Reinforcement Learning
- Offline Inverse Reinforcement Learning
- Heuristic-Guided Reinforcement Learning
- Reinforcement Learning as One Big Sequence Modeling Problem
- Decision Transformer: Reinforcement Learning via Sequence Modeling
- Model-Based Offline Planning with Trajectory Pruning
- InferNet for Delayed Reinforcement Tasks: Addressing the Temporal Credit Assignment Problem
- Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm
- MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale
- Distributional Offline Continuous-Time Reinforcement Learning with Neural Physics-Informed PDEs (SciPhy RL for DOCTR-L)
- Regularized Behavior Value Estimation
- Improved Context-Based Offline Meta-RL with Attention and Contrastive Learning
- Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning
- GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning
- MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning
- Continuous Doubly Constrained Batch Reinforcement Learning
- Q-Value Weighted Regression: Reinforcement Learning with Limited Data
- Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency
- Fast Rates for the Regret of Offline Reinforcement Learning
- Safe Policy Learning through Extrapolation: Application to Pre-trial Risk Assessment
- Weighted Model Estimation for Offline Model-based Reinforcement Learning
- A Minimalist Approach to Offline Reinforcement Learning
- Conservative Offline Distributional Reinforcement Learning
- Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL
- Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning
- Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
- Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs
- Offline Reinforcement Learning as One Big Sequence Modeling Problem
- Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism
- Offline Reinforcement Learning with Reverse Model-based Imagination
- Offline Meta Reinforcement Learning -- Identifiability Challenges and Effective Data Collection Strategies
- Nearly Horizon-Free Offline Reinforcement Learning
- Conservative Data Sharing for Multi-Task Offline Reinforcement Learning
- Online and Offline Reinforcement Learning by Planning with a Learned Model
- Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning
- Offline RL Without Off-Policy Evaluation
- Offline Model-based Adaptable Policy Learning
- COMBO: Conservative Offline Model-Based Policy Optimization
- PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators
- Near-Optimal Offline Reinforcement Learning via Double Variance Reduction
- Bellman-consistent Pessimism for Offline Reinforcement Learning
- The Difficulty of Passive Learning in Deep Reinforcement Learning
- Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble
- Towards Instance-Optimal Offline Reinforcement Learning with Pessimism
- EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
- Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills
- Is Pessimism Provably Efficient for Offline RL?
- Representation Matters: Offline Pretraining for Sequential Decision Making
- Offline Reinforcement Learning with Pseudometric Learning
- Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment
- Offline Contextual Bandits with Overparameterized Models
- Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning
- Offline Reinforcement Learning with Fisher Divergence Critic Regularization
- OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation
- Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning
- Vector Quantized Models for Planning
- Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL
- Instabilities of Offline RL with Pre-Trained Neural Representation
- Offline Meta-Reinforcement Learning with Advantage Weighting
- Model-Based Offline Planning
- Batch Reinforcement Learning Through Continuation Method
- Model-Based Visual Planning with Self-Supervised Functional Distances
- Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
- Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization
- DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs
- What are the Statistical Limits of Offline RL with Linear Function Approximation?
- Reset-Free Lifelong Learning with Skill-Space Planning
- Risk-Averse Offline Reinforcement Learning
- Finite-Sample Regret Bound for Distributionally Robust Offline Tabular Reinforcement Learning
- Exploration by Maximizing Rényi Entropy for Reward-Free RL Framework
- Efficient Self-Supervised Data Collection for Offline Robot Learning
- Boosting Offline Reinforcement Learning with Residual Generative Modeling
- BRAC+: Improved Behavior Regularized Actor Critic for Offline Reinforcement Learning
- Behavior Constraining in Weight Space for Offline Reinforcement Learning
- Finite-Sample Analysis For Decentralized Batch Multi-Agent Reinforcement Learning With Networked Agents
- Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning?
- Reinforcement Learning via Fenchel-Rockafellar Duality
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
- Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
- A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting
- Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient
- Batch Value-function Approximation with Only Realizability
- DRIFT: Deep Reinforcement Learning for Functional Software Testing
- Causality and Batch Reinforcement Learning: Complementary Approaches To Planning In Unknown Domains
- Goal-conditioned Batch Reinforcement Learning for Rotation Invariant Locomotion
- Semi-Supervised Reward Learning for Offline Reinforcement Learning
- Sample-Efficient Reinforcement Learning via Counterfactual-Based Data Augmentation
- Offline Reinforcement Learning from Images with Latent Space Models
- POPO: Pessimistic Offline Policy Optimization
- Reinforcement Learning with Videos: Combining Offline Observations with Interaction
- Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones
- Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
- OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning
- Batch Exploration with Examples for Scalable Robotic Reinforcement Learning
- Learning Dexterous Manipulation from Suboptimal Experts
- The Reinforcement Learning-Based Multi-Agent Cooperative Approach for the Adaptive Speed Regulation on a Metallurgical Pickling Line
- Overcoming Model Bias for Robust Offline Deep Reinforcement Learning
- Offline Meta Learning of Exploration
- EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
- Hyperparameter Selection for Offline Reinforcement Learning
- Interpretable Control by Reinforcement Learning
- Efficient Evaluation of Natural Stochastic Policies in Offline Reinforcement Learning
- Accelerating Online Reinforcement Learning with Offline Datasets
- DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction
- Critic Regularized Regression
- Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration
- Conservative Q-Learning for Offline Reinforcement Learning (a minimal sketch of its conservative penalty follows this list)
- BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning
- MOPO: Model-based Offline Policy Optimization
- MOReL: Model-Based Offline Reinforcement Learning
- Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation
- Multi-task Batch Reinforcement Learning with Metric Learning
- Counterfactual Data Augmentation using Locally Factored Dynamics
- On Reward-Free Reinforcement Learning with Linear Function Approximation
- Constrained Policy Improvement for Safe and Efficient Reinforcement Learning
- BRPO: Batch Residual Policy Optimization
- Keep Doing What Worked: Behavior Modelling Priors for Offline Reinforcement Learning
- COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning
- Accelerating Reinforcement Learning with Learned Skill Priors
- PLAS: Latent Action Space for Offline Reinforcement Learning
- Scaling data-driven robotics with reward sketching and batch reinforcement learning
- Quantile QT-Opt for Risk-Aware Vision-Based Robotic Grasping
- Batch-Constrained Reinforcement Learning for Dynamic Distribution Network Reconfiguration
- Behavior Regularized Offline Reinforcement Learning
- Off-Policy Policy Gradient Algorithms by Constraining the State Distribution Shift
- Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
- AlgaeDICE: Policy Gradient from Arbitrary Experience
- Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
- Off-Policy Deep Reinforcement Learning without Exploration
- Safe Policy Improvement with Baseline Bootstrapping
- Information-Theoretic Considerations in Batch Reinforcement Learning
- Batch Recurrent Q-Learning for Backchannel Generation Towards Engaging Agents
- Safe Policy Improvement with Soft Baseline Bootstrapping
- Importance Weighted Transfer of Samples in Reinforcement Learning
- Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation
- Off-Policy Policy Gradient with State Distribution Correction
- Behavioral Cloning from Observation
- Diverse Exploration for Fast and Safe Policy Improvement
- Deep Exploration via Bootstrapped DQN
- Safe Policy Improvement by Minimizing Robust Baseline Regret
- Residential Demand Response Applications Using Batch Reinforcement Learning
- Structural Return Maximization for Reinforcement Learning
- Simultaneous Perturbation Algorithms for Batch Off-Policy Search
- Guided Policy Search
- Off-Policy Actor-Critic
- PAC-Bayesian Policy Evaluation for Reinforcement Learning
- Tree-Based Batch Mode Reinforcement Learning
- Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method
- Off-Policy Temporal-Difference Learning with Function Approximation
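Many of the conservative and pessimistic value-based methods above share one core idea: penalize Q-values on actions outside the dataset. As a concrete anchor, here is a minimal sketch of the conservative penalty from Conservative Q-Learning (CQL), referenced in the list above. It assumes a discrete action space and PyTorch; the names (`q_net`, `target_net`, the batch layout) are illustrative placeholders, not taken from any released implementation.

```python
import torch
import torch.nn.functional as F

def cql_loss(q_net, target_net, batch, alpha=1.0, gamma=0.99):
    # One step of the discrete-action CQL(H) objective: a standard TD error
    # plus a conservative penalty that pushes Q-values down on all actions
    # while pushing them back up on the actions actually in the dataset.
    s, a, r, s_next, done = batch  # tensors sampled from the offline dataset

    q_all = q_net(s)                                     # (B, num_actions)
    q_taken = q_all.gather(1, a.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        target = r + gamma * (1.0 - done) * target_net(s_next).max(dim=1).values

    td_loss = F.mse_loss(q_taken, target)

    # logsumexp over actions upper-bounds the max; subtracting the dataset
    # Q-value gives the CQL(H) regularizer.
    conservative_penalty = (torch.logsumexp(q_all, dim=1) - q_taken).mean()

    return td_loss + alpha * conservative_penalty
```

The coefficient `alpha` trades conservatism off against the TD objective; the paper also derives a Lagrangian variant that adapts it automatically, and continuous-action versions estimate the logsumexp by sampling actions from the policy.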
awesome-offline-rl / Papers / Offline RL: Benchmarks/Experiments |
| ORL-AUDITOR: Dataset Auditing in Offline Deep Reinforcement Learning | | | |
| Pearl: A Production-ready Reinforcement Learning Agent | | | |
| LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models | | | |
| Robotic Manipulation Datasets for Offline Compositional Reinforcement Learning | | | |
| Datasets and Benchmarks for Offline Safe Reinforcement Learning | | | |
| Improving and Benchmarking Offline Reinforcement Learning Algorithms | | | |
| Benchmarks and Algorithms for Offline Preference-Based Reward Learning | | | |
| Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks | | | |
| CORL: Research-oriented Deep Offline Reinforcement Learning Library | | | [ ] |
| Benchmarking Offline Reinforcement Learning on Real-Robot Hardware | | | [ ] |
| Train Offline, Test Online: A Real Robot Learning Benchmark | | | |
| Benchmarking Offline Reinforcement Learning Algorithms for E-Commerce Order Fraud Evaluation | | | |
| Real World Offline Reinforcement Learning with Realistic Data Source | | | [ ] [ ] |
| Mind Your Data! Hiding Backdoors in Offline Reinforcement Learning Datasets | | | |
| B2RL: An open-source Dataset for Building Batch Reinforcement Learning | | | |
| An Empirical Study of Implicit Regularization in Deep Offline RL | | | |
| Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations | | | |
| Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning | | | [ ] |
| The Challenges of Exploration for Offline Reinforcement Learning | | | |
| Offline Equilibrium Finding | | | [ ] |
| Comparing Model-free and Model-based Algorithms for Offline Reinforcement Learning | | | |
| Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data | | | |
| Dungeons and Data: A Large-Scale NetHack Dataset | | | |
| NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning | | | [ ] [ ] |
| A Closer Look at Offline RL Agents | | | |
| Beyond Rewards: a Hierarchical Perspective on Offline Multiagent Behavioral Analysis | | | |
| On the Effect of Pre-training for Transformer in Different Modality on Offline Reinforcement Learning | | | |
| Showing Your Offline Reinforcement Learning Work: Online Evaluation Budget Matters | | | |
| d3rlpy: An Offline Deep Reinforcement Learning Library | | | [ ] |
| Understanding the Effects of Dataset Characteristics on Offline Reinforcement Learning | | | [ ] |
| Interpretable performance analysis towards offline reinforcement learning: A dataset perspective | | | |
| Comparison and Unification of Three Regularization Methods in Batch Reinforcement Learning | | | |
| RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning | | | [ ] |
| Measuring Data Quality for Dataset Selection in Offline Reinforcement Learning | | | |
| Offline Reinforcement Learning Hands-On | | | |
| D4RL: Datasets for Deep Data-Driven Reinforcement Learning | | | [ ] [ ] [ ] |
| RL Unplugged: Benchmarks for Offline Reinforcement Learning | | | [ ] [ ] |
| Benchmarking Batch Deep Reinforcement Learning Algorithms | | | |
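Many of the benchmark entries above build on D4RL-style datasets. A hedged loading sketch, assuming the public `d4rl` package and its Gym registration (this particular task also requires MuJoCo); dataset names and dictionary keys follow the published D4RL convention:

```python
# Load a D4RL offline dataset as flat numpy arrays.
import gym
import d4rl  # importing registers the offline environments with gym

env = gym.make("halfcheetah-medium-v2")
dataset = d4rl.qlearning_dataset(env)  # dict of numpy arrays

print(dataset["observations"].shape)
print(dataset["actions"].shape)
print(dataset["rewards"].shape)
print(dataset["terminals"].shape)
```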
awesome-offline-rl / Papers / Offline RL: Applications |
| MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning | | | |
| P2DT: Mitigating Forgetting in task-incremental Learning with progressive prompt Decision Transformer | | | |
| Online Symbolic Music Alignment with Offline Reinforcement Learning | | | |
| Advancing RAN Slicing with Offline Reinforcement Learning | | | |
| Traffic Signal Control Using Lightweight Transformers: An Offline-to-Online RL Approach | | | |
| Self-Driving Telescopes: Autonomous Scheduling of Astronomical Observation Campaigns with Offline Reinforcement Learning | | | |
| A Fully Data-Driven Approach for Realistic Traffic Signal Control Using Offline Reinforcement Learning | | | |
| Offline Reinforcement Learning for Wireless Network Optimization with Mixture Datasets | | | |
| STEER: Unified Style Transfer with Expert Reinforcement | | | |
| Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations | | | |
| Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning | | | |
| Offline Reinforcement Learning for Optimizing Production Bidding Policies | | | |
| End-to-end Offline Reinforcement Learning for Glycemia Control | | | |
| Leveraging Optimal Transport for Enhanced Offline Reinforcement Learning in Surgical Robotic Environments | | | |
| Learning RL-Policies for Joint Beamforming Without Exploration: A Batch Constrained Off-Policy Approach | | | |
| Uncertainty-Aware Decision Transformer for Stochastic Driving Environments | | | |
| Boosting Offline Reinforcement Learning for Autonomous Driving with Hierarchical Latent Skills | | | |
| Robotic Offline RL from Internet Videos via Value-Function Pre-Training | | | |
| VAPOR: Holonomic Legged Robot Navigation in Outdoor Vegetation Using Offline Reinforcement Learning | | | |
| RLSynC: Offline-Online Reinforcement Learning for Synthon Completion | | | |
| Real Robot Challenge 2022: Learning Dexterous Manipulation from Offline Data in the Real World | | | |
| Reinforced Self-Training (ReST) for Language Modeling | | | |
| Aligning Language Models with Offline Reinforcement Learning from Human Feedback | | | |
| Integrating Offline Reinforcement Learning with Transformers for Sequential Recommendation | | | |
| Offline Skill Graph (OSG): A Framework for Learning and Planning using Offline Reinforcement Learning Skills | | | |
| Improving Offline RL by Blending Heuristics | | | |
| IQL-TD-MPC: Implicit Q-Learning for Hierarchical Model Predictive Control | | | |
| Robust Reinforcement Learning Objectives for Sequential Recommender Systems | | | |
| The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning | | | |
| PROTO: Iterative Policy Regularized Offline-to-Online Reinforcement Learning | | | |
| Matrix Estimation for Offline Reinforcement Learning with Low-Rank Structure | | | |
| Offline Experience Replay for Continual Offline Reinforcement Learning | | | |
| Causal Decision Transformer for Recommender Systems via Offline Reinforcement Learning | | | |
| Data Might be Enough: Bridge Real-World Traffic Signal Control Using Offline Reinforcement Learning | | | |
| User Retention-oriented Recommendation with Decision Transformer | | | |
| Learning to Control Autonomous Fleets from Observation via Offline Reinforcement Learning | | | |
| INVICTUS: Optimizing Boolean Logic Circuit Synthesis via Synergistic Learning and Search | | | |
| Learning Vision-based Robotic Manipulation Tasks Sequentially in Offline Reinforcement Learning Settings | | | |
| Winning Solution of Real Robot Challenge III | | | |
| Learning-based MPC from Big Data Using Reinforcement Learning | | | |
| Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management | | | |
| Beyond Reward: Offline Preference-guided Policy Optimization | | | |
| DevFormer: A Symmetric Transformer for Context-Aware Device Placement | | | |
| On the Effectiveness of Offline RL for Dialogue Response Generation | | | |
| Bidirectional Learning for Offline Model-based Biological Sequence Design | | | |
| ChiPFormer: Transferable Chip Placement via Offline Decision Transformer | | | |
| Semi-Offline Reinforcement Learning for Optimized Text Generation | | | |
| Neural Constraint Satisfaction: Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement | | | |
| Offline RL for Natural Language Generation with Implicit Language Q Learning | | | |
| Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning | | | |
| Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning | | | |
| Dialog Action-Aware Transformer for Dialog Policy Learning | | | |
| Can Offline Reinforcement Learning Help Natural Language Understanding? | | | |
| NeurIPS 2022 Competition: Driving SMARTS | | | |
| Controlling Commercial Cooling Systems Using Reinforcement Learning | | | |
| Pre-Training for Robots: Offline RL Enables Learning New Tasks from a Handful of Trials | | | [ ] |
| Towards Safe Mechanical Ventilation Treatment Using Deep Offline Reinforcement Learning | | | |
| Learning-to-defer for sequential medical decision-making under uncertainty | | | |
| Imitation Is Not Enough: Robustifying Imitation with Reinforcement Learning for Challenging Driving Scenarios | | | |
| Dialogue Evaluation with Offline Reinforcement Learning | | | |
| Multi-Task Fusion via Reinforcement Learning for Long-Term User Satisfaction in Recommender Systems | | | |
| A Maintenance Planning Framework using Online and Offline Deep Reinforcement Learning | | | |
| BCRLSP: An Offline Reinforcement Learning Framework for Sequential Targeted Promotion | | | |
| Learning Optimal Treatment Strategies for Sepsis Using Offline Reinforcement Learning in Continuous Space | | | |
| Rethinking Reinforcement Learning for Recommendation: A Prompt Perspective | | | |
| ARLO: A Framework for Automated Reinforcement Learning | | | |
| A Reinforcement Learning-based Volt-VAR Control Dataset and Testing Environment | | | |
| CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning | | | |
| Offline Reinforcement Learning for Safer Blood Glucose Control in People with Type 1 Diabetes | | | [ ] |
| CIRS: Bursting Filter Bubbles by Counterfactual Interactive Recommender System | | | [ ] |
| A Conservative Q-Learning approach for handling distribution shift in sepsis treatment strategies | | | |
| Optimizing Trajectories for Highway Driving with Offline Reinforcement Learning | | | |
| Offline Deep Reinforcement Learning for Dynamic Pricing of Consumer Credit | | | |
| Offline Reinforcement Learning for Mobile Notifications | | | |
| Offline Reinforcement Learning for Road Traffic Control | | | |
| Sustainable Online Reinforcement Learning for Auto-bidding | | | |
| Leveraging Factored Action Spaces for Efficient Offline Reinforcement Learning in Healthcare | | | |
| Multi-objective Optimization of Notifications Using Offline Reinforcement Learning | | | |
| Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning | | | |
| GPT-Critic: Offline Reinforcement Learning for End-to-End Task-Oriented Dialogue Systems | | | |
| Offline Reinforcement Learning for Visual Navigation | | | |
| Semi-Markov Offline Reinforcement Learning for Healthcare | | | |
| Automate Page Layout Optimization: An Offline Deep Q-Learning Approach | | | |
| RL4RS: A Real-World Benchmark for Reinforcement Learning based Recommender System | | | [ ] [ ] |
| Compressive Features in Offline Reinforcement Learning for Recommender Systems | | | |
| Causal-aware Safe Policy Improvement for Task-oriented dialogue | | | |
| Offline Contextual Bandits for Wireless Network Optimization | | | |
| Identifying Decision Points for Safe and Interpretable Reinforcement Learning in Hypotension Treatment | | | |
| Offline Reinforcement Learning for Autonomous Driving with Safety and Exploration Enhancement | | | |
| Medical Dead-ends and Learning to Identify High-risk States and Treatments | | | |
| An Offline Deep Reinforcement Learning for Maintenance Decision-Making | | | |
| Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation | | | |
| Offline-Online Reinforcement Learning for Energy Pricing in Office Demand Response: Lowering Energy and Data Costs | | | |
| Offline reinforcement learning with uncertainty for treatment strategies in sepsis | | | |
| Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL | | | |
| Safe Model-based Off-policy Reinforcement Learning for Eco-Driving in Connected and Automated Hybrid Electric Vehicles | | | |
| pH-RL: A personalization architecture to bring reinforcement learning to health practice | | | |
| DeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement Learning | | | [ ] |
| Personalization for Web-based Services using Offline Reinforcement Learning | | | |
| BCORLE(λ): An Offline Reinforcement Learning and Evaluation Framework for Coupons Allocation in E-commerce Market | | | |
| Safe Driving via Expert Guided Policy Optimization | | | [ ] [ ] |
| A General Offline Reinforcement Learning Framework for Interactive Recommendation | | | |
| Value Function is All You Need: A Unified Learning Framework for Ride Hailing Platforms | | | |
| Discovering an Aid Policy to Minimize Student Evasion Using Offline Reinforcement Learning | | | |
| Learning robust driving policies without online exploration | | | |
| Engagement Rewarded Actor-Critic with Conservative Q-Learning for Speech-Driven Laughter Backchannel Generation | | | |
| Network Intrusion Detection Based on Extended RBF Neural Network With Offline Reinforcement Learning | | | |
| Towards Accelerating Offline RL based Recommender Systems | | | |
| Offline Meta-level Model-based Reinforcement Learning Approach for Cold-Start Recommendation | | | |
| Batch-Constrained Distributional Reinforcement Learning for Session-based Recommendation | | | |
| An Empirical Study of Representation Learning for Reinforcement Learning in Healthcare | | | |
| Learning from Human Feedback: Challenges for Real-World Reinforcement Learning in NLP | | | |
| Remote Electrical Tilt Optimization via Safe Reinforcement Learning | | | |
| An Optimistic Perspective on Offline Reinforcement Learning | | | [ ] [ ] |
| Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning | | | |
| Offline Contextual Multi-armed Bandits for Mobile Health Interventions: A Case Study on Emotion Regulation | | | |
| Human-centric Dialog Training via Offline Reinforcement Learning | | | |
| Definition and evaluation of model-free coordination of electrical vehicle charging with reinforcement learning | | | |
| Optimal Tap Setting of Voltage Regulation Transformers Using Batch Reinforcement Learning | | | |
| Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog | | | |
| Optimized cost function for demand response coordination of multiple EV charging stations using reinforcement learning | | | |
| A Clustering-Based Reinforcement Learning Approach for Tailored Personalization of E-Health Interventions | | | |
| Generating Interpretable Fuzzy Controllers using Particle Swarm Optimization and Genetic Programming | | | |
| End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient | | | |
| Batch Reinforcement Learning on the Industrial Benchmark: First Experiences | | | |
| Policy Networks with Two-Stage Training for Dialogue Systems | | | |
| Adaptive Treatment of Epilepsy via Batch-mode Reinforcement Learning | | | |
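Several application entries above (e.g., in sepsis treatment and dialogue) build on conservative Q-learning. Below is a minimal sketch of the CQL-style conservative penalty on a toy Q-table: push Q-values down on all actions via a log-sum-exp while pushing Q up on the logged actions. Shapes, `alpha`, and the data are illustrative assumptions, not the reference implementation.

```python
# CQL-style conservative regularizer on synthetic Q-values.
import numpy as np

def cql_penalty(q_values, logged_actions, alpha=1.0):
    """q_values: (batch, n_actions); logged_actions: (batch,) int indices."""
    logsumexp = np.log(np.exp(q_values).sum(axis=1))  # soft-max over actions
    q_logged = q_values[np.arange(len(logged_actions)), logged_actions]
    return alpha * (logsumexp - q_logged).mean()

q = np.random.default_rng(1).normal(size=(4, 3))
print(cql_penalty(q, np.array([0, 2, 1, 0])))
```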
awesome-offline-rl / Papers / Off-Policy Evaluation and Learning: Theory/Methods |
| Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction | | | |
| Distributionally Robust Policy Evaluation under General Covariate Shift in Contextual Bandits | | | |
| Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling | | | |
| Multiply Robust Off-policy Evaluation and Learning under Truncation by Death | | | |
| Off-Policy Evaluation of Ranking Policies under Diverse User Behavior | | | |
| Policy-Adaptive Estimator Selection for Off-Policy Evaluation | | | |
| Variance-Optimal Augmentation Logging for Counterfactual Evaluation in Contextual Bandits | | | |
| Offline Policy Evaluation in Large Action Spaces via Outcome-Oriented Action Grouping | | | |
| Off-Policy Evaluation for Large Action Spaces via Policy Convolution | | | |
| Distributional Off-Policy Evaluation for Slate Recommendations | | | |
| Debiased Machine Learning and Network Cohesion for Doubly-Robust Differential Reward Models in Contextual Bandits | | | |
| Doubly Robust Estimator for Off-Policy Evaluation with Large Action Spaces | | | |
| Offline Policy Evaluation with Out-of-Sample Guarantees | | | |
| Quantile Off-Policy Evaluation via Deep Conditional Generative Learning | | | |
| Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model | | | [ ] |
| Off-Policy Evaluation for Large Action Spaces via Embeddings | | | [ ] [ ] |
| Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning | | | |
| Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions | | | |
| Conformal Off-Policy Prediction in Contextual Bandits | | | |
| Off-Policy Evaluation with Policy-Dependent Optimization Response | | | |
| Off-Policy Evaluation with Deficient Support Using Side Information | | | |
| Towards Robust Off-Policy Evaluation via Human Inputs | | | |
| Off-policy evaluation for learning-to-rank via interpolating the item-position model and the position-based model | | | |
| Bayesian Counterfactual Mean Embeddings and Off-Policy Evaluation | | | |
| Anytime-valid off-policy inference for contextual bandits | | | |
| Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency | | | |
| Off-Policy Evaluation in Embedded Spaces | | | |
| Safe Exploration for Efficient Policy Evaluation and Comparison | | | |
| Inverse Propensity Score based offline estimator for deterministic ranking lists using position bias | | | |
| Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning | | | |
| Control Variates for Slate Off-Policy Evaluation | | | |
| Deep Jump Learning for Off-Policy Evaluation in Continuous Treatment Settings | | | |
| Optimal Off-Policy Evaluation from Multiple Logging Policies | | | [ ] |
| Off-policy Confidence Sequences | | | |
| Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting | | | [ ] |
| Off-Policy Evaluation Using Information Borrowing and Context-Based Switching | | | |
| Identification of Subgroups With Similar Benefits in Off-Policy Policy Evaluation | | | |
| Robust On-Policy Data Collection for Data-Efficient Policy Evaluation | | | |
| Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits | | | |
| Off-Policy Risk Assessment in Contextual Bandits | | | |
| Off-Policy Evaluation of Slate Policies under Bayes Risk | | | |
| A Practical Guide of Off-Policy Evaluation for Bandit Problems | | | |
| Off-Policy Evaluation and Learning for External Validity under a Covariate Shift | | | |
| Counterfactual Evaluation of Slate Recommendations with Sequential Reward Interactions | | | |
| Doubly robust off-policy evaluation with shrinkage | | | |
| Adaptive Estimator Selection for Off-Policy Evaluation | | | [ ] |
| Distributionally Robust Policy Evaluation and Learning in Offline Contextual Bandits | | | |
| Improving Offline Contextual Bandits with Distributional Robustness | | | |
| Balanced Off-Policy Evaluation in General Action Spaces | | | |
| Policy Evaluation with Latent Confounders via Optimal Balance | | | |
| On the Design of Estimators for Bandit Off-Policy Evaluation | | | |
| CAB: Continuous Adaptive Blending for Policy Evaluation and Learning | | | |
| Focused Context Balancing for Robust Offline Policy Evaluation | | | |
| When People Change their Mind: Off-Policy Evaluation in Non-Stationary Recommendation Environments | | | |
| Policy Evaluation and Optimization with Continuous Treatments | | | |
| Confounding-Robust Policy Improvement | | | |
| Balanced Policy Evaluation and Learning | | | |
| Offline Evaluation of Ranking Policies with Click Models | | | |
| Effective Evaluation using Logged Bandit Feedback from Multiple Loggers | | | |
| Off-policy Evaluation for Slate Recommendation | | | |
| Optimal and Adaptive Off-policy Evaluation in Contextual Bandits | | | |
| Data-Efficient Policy Evaluation Through Behavior Policy Search | | | |
| Doubly Robust Policy Evaluation and Optimization | | | |
| Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms | | | |
| Distributional Off-policy Evaluation with Bellman Residual Minimization | | | |
| Future-Dependent Value-Based Off-Policy Evaluation in POMDPs | | | |
| Marginal Density Ratio for Off-Policy Evaluation in Contextual Bandits | | | |
| State-Action Similarity-Based Representations for Off-Policy Evaluation | | | |
| Off-Policy Evaluation for Human Feedback | | | |
| Counterfactual-Augmented Importance Sampling for Semi-Offline Policy Evaluation | | | |
| An Instrumental Variable Approach to Confounded Off-Policy Evaluation | | | |
| Semiparametrically Efficient Off-Policy Evaluation in Linear Markov Decision Processes | | | |
| Distributional Offline Policy Evaluation with Predictive Error Guarantees | | | |
| The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation | | | |
| Revisiting Bellman Errors for Offline Model Selection | | | [ ] |
| Scaling Marginalized Importance Sampling to High-Dimensional State-Spaces via State Abstraction | | | |
| Variational Latent Branching Model for Off-Policy Evaluation | | | |
| Multiple-policy High-confidence Policy Evaluation | | | |
| Off-Policy Evaluation with Online Adaptation for Robot Exploration in Challenging Environments | | | |
| Conservative Exploration for Policy Optimization via Off-Policy Policy Evaluation | | | |
| Robust Offline Policy Evaluation and Optimization with Heavy-Tailed Rewards | | | |
| When is Offline Policy Selection Sample Efficient for Reinforcement Learning? | | | |
| Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks | | | |
| Evaluation of Active Feature Acquisition Methods for Static Feature Settings | | | |
| Distributional Shift-Aware Off-Policy Interval Estimation: A Unified Error Quantification Framework | | | |
| Marginalized Importance Sampling for Off-Environment Policy Evaluation | | | |
| Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning | | | |
| Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments | | | |
| Off-policy Evaluation in Doubly Inhomogeneous Environments | | | |
| Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data | | | |
| π2vec: Policy Representations with Successor Features | | | |
| Conformal Off-Policy Evaluation in Markov Decision Processes | | | |
| Hallucinated Adversarial Control for Conservative Offline Policy Evaluation | | | |
| Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders | | | |
| Minimax Weight Learning for Absorbing MDPs | | | |
| Improving Monte Carlo Evaluation with Offline Data | | | |
| First-order Policy Optimization for Robust Policy Evaluation | | | |
| A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes | | | |
| On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation | | | |
| Learning Bellman Complete Representations for Offline Policy Evaluation | | | |
| Supervised Off-Policy Ranking | | | |
| Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory | | | |
| Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions | | | |
| Oracle Inequalities for Model Selection in Offline Reinforcement Learning | | | |
| Off-Policy Evaluation for Episodic Partially Observable Markov Decision Processes under Non-Parametric Models | | | |
| Off-Policy Evaluation for Action-Dependent Non-stationary Environments | | | |
| Stateful Offline Contextual Policy Evaluation and Learning | | | |
| Off-Policy Risk Assessment for Markov Decision Processes | | | |
| Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information | | | |
| Offline Policy Evaluation and Optimization under Confounding | | | |
| Bridging the Gap Between Offline and Online Reinforcement Learning Evaluation Methodologies | | | |
| Safe Evaluation For Offline Learning: Are We Ready To Deploy? | | | |
| Low Variance Off-policy Evaluation with State-based Importance Sampling | | | |
| Statistical Estimation of Confounded Linear MDPs: An Instrumental Variable Approach | | | |
| Offline Estimation of Controlled Markov Chains: Minimax Nonparametric Estimators and Sample Efficiency | | | |
| Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks | | | |
| A Sharp Characterization of Linear Estimators for Offline Policy Evaluation | | | |
| A Multi-Agent Reinforcement Learning Framework for Off-Policy Evaluation in Two-sided Markets | | | [ ] |
| A Theoretical Framework of Almost Hyperparameter-free Hyperparameter Selection Methods for Offline Policy Evaluation | | | |
| SOPE: Spectrum of Off-Policy Estimators | | | |
| Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation | | | |
| Variance-Aware Off-Policy Evaluation with Linear Function Approximation | | | |
| Universal Off-Policy Evaluation | | | |
| Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning | | | |
| Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings | | | |
| State Relevance for Off-Policy Evaluation | | | |
| Bootstrapping Fitted Q-Evaluation for Off-Policy Inference | | | |
| Deeply-Debiased Off-Policy Interval Estimation | | | |
| Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization | | | |
| Minimax Model Learning | | | |
| Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders | | | |
| High-Confidence Off-Policy (or Counterfactual) Variance Estimation | | | |
| Debiased Off-Policy Evaluation for Recommendation Systems | | | |
| Pessimistic Model Selection for Offline Deep Reinforcement Learning | | | |
| Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes | | | |
| Off-Policy Evaluation in Partially Observed Markov Decision Processes | | | |
| A Spectral Approach to Off-Policy Evaluation for POMDPs | | | |
| Projected State-action Balancing Weights for Offline Reinforcement Learning | | | |
| Active Offline Policy Selection | | | |
| On Instrumental Variable Regression for Deep Offline Policy Evaluation | | | |
| Average-Reward Off-Policy Policy Evaluation with Function Approximation | | | |
| Sequential causal inference in a single world of connected units | | | |
| Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding | | | |
| CoinDICE: Off-Policy Confidence Interval Estimation | | | |
| Off-Policy Interval Estimation with Lipschitz Value Iteration | | | |
| Off-Policy Evaluation via the Regularized Lagrangian | | | |
| Minimax Value Interval for Off-Policy Evaluation and Policy Optimization | | | |
| GenDICE: Generalized Offline Estimation of Stationary Values | | | |
| Infinite-horizon Off-Policy Policy Evaluation with Multiple Behavior Policies | | | |
| Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation | | | |
| Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning | | | |
| GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values | | | |
| Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation | | | |
| Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions | | | |
| Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation | | | |
| Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling | | | |
| Minimax Weight and Q-Function Learning for Off-Policy Evaluation | | | |
| Accountable Off-Policy Evaluation With Kernel Bellman Statistics | | | |
| Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning | | | |
| Batch Stationary Distribution Estimation | | | |
| Towards Off-policy Evaluation as a Prerequisite for Real-world Reinforcement Learning in Building Control | | | [ ] |
| Defining Admissible Rewards for High Confidence Policy Evaluation in Batch Reinforcement Learning | | | |
| Offline Policy Selection under Uncertainty | | | |
| Near-Optimal Provable Uniform Convergence in Offline Policy Evaluation for Reinforcement Learning | | | |
| Optimal Mixture Weights for Off-Policy Evaluation with Multiple Behavior Policies | | | |
| Kernel Methods for Policy Evaluation: Treatment Effects, Mediation Analysis, and Off-Policy Planning | | | |
| Statistical Bootstrapping for Uncertainty Estimation in Off-Policy Evaluation | | | |
| Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning | | | |
| Off-Policy Evaluation in Partially Observable Environments | | | |
| Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning | | | |
| Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling | | | |
| Off-Policy Evaluation via Off-Policy Classification | | | |
| DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections | | | [ ] |
| Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy | | | |
| Batch Policy Learning under Constraints | | | [ ] [ ] |
| More Efficient Off-Policy Evaluation through Regularized Targeted Learning | | | |
| Combining parametric and nonparametric models for off-policy evaluation | | | |
| Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models | | | |
| Importance Sampling Policy Evaluation with an Estimated Behavior Policy | | | |
| Representation Balancing MDPs for Off-policy Policy Evaluation | | | |
| Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation | | | |
| More Robust Doubly Robust Off-policy Evaluation | | | |
| Importance Sampling for Fair Policy Selection | | | |
| Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing | | | |
| Consistent On-Line Off-Policy Evaluation | | | |
| Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation | | | |
| Doubly Robust Off-policy Value Evaluation for Reinforcement Learning | | | |
| Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning | | | |
| High Confidence Policy Improvement | | | |
| High Confidence Off-Policy Evaluation | | | |
| Eligibility Traces for Off-Policy Policy Evaluation | | | |
| Sequential Counterfactual Risk Minimization | | | |
| Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning | | | |
| Multi-Task Off-Policy Learning from Bandit Feedback | | | |
| Exponential Smoothing for Off-Policy Learning | | | |
| Counterfactual Learning with General Data-generating Policies | | | |
| Distributionally Robust Policy Gradient for Offline Contextual Bandits | | | |
| Oracle-Efficient Pessimism: Offline Policy Optimization in Contextual Bandits | | | |
| Pessimistic Off-Policy Multi-Objective Optimization | | | |
| Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective | | | |
| Uncertainty-Aware Off-Policy Learning | | | |
| Fair Off-Policy Learning from Observational Data | | | |
| Interpretable Off-Policy Learning via Hyperbox Search | | | |
| Offline Policy Optimization with Eligible Actions | | | |
| Towards Robust Off-policy Learning for Runtime Uncertainty | | | |
| Safe Optimal Design with Applications in Off-Policy Learning | | | |
| Off-Policy Actor-critic for Recommender Systems | | | |
| MGPolicy: Meta Graph Enhanced Off-policy Learning for Recommendations | | | |
| Distributionally Robust Policy Learning with Wasserstein Distance | | | |
| Local Policy Improvement for Recommender Systems | | | |
| Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality | | | |
| Fast Offline Policy Optimization for Large Scale Recommendation | | | |
| Practical Counterfactual Policy Learning for Top-K Recommendations | | | |
| Boosted Off-Policy Learning | | | |
| Semi-Counterfactual Risk Minimization Via Neural Networks | | | |
| IMO^3: Interactive Multi-Objective Off-Policy Optimization | | | |
| Pessimistic Off-Policy Optimization for Learning to Rank | | | |
| Non-Stationary Off-Policy Optimization | | | |
| Learning from eXtreme Bandit Feedback | | | |
| Generalizing Off-Policy Learning under Sample Selection Bias | | | |
| Conservative Policy Construction Using Variational Autoencoders for Logged Data with Missing Values | | | |
| Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies | | | |
| From Importance Sampling to Doubly Robust Policy Gradient | | | |
| Efficient Policy Learning from Surrogate-Loss Classification Reductions | | | [ ] |
| Off-policy Bandits with Deficient Support | | | |
| Off-policy Learning in Two-stage Recommender Systems | | | |
| More Efficient Policy Learning via Optimal Retargeting | | | |
| Learning When-to-Treat Policies | | | |
| Doubly Robust Off-Policy Learning on Low-Dimensional Manifolds by Deep Neural Networks | | | |
| Bandit Overfitting in Offline Policy Learning | | | |
| Counterfactual Learning of Continuous Stochastic Policies | | | |
| Top-K Off-Policy Correction for a REINFORCE Recommender System | | | |
| Semi-Parametric Efficient Policy Learning with Continuous Actions | | | |
| Efficient Counterfactual Learning from Bandit Feedback | | | |
| Deep Learning with Logged Bandit Feedback | | | |
| The Self-Normalized Estimator for Counterfactual Learning | | | |
| Counterfactual Risk Minimization: Learning from Logged Bandit Feedback | | | |
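For orientation, the contextual-bandit portion of this section revolves around a few classic estimators. The following self-contained NumPy sketch compares IPS, self-normalized IPS, and the doubly robust estimator on synthetic logs; every quantity here (policies, rewards, reward model) is a made-up assumption, not a result from any listed paper.

```python
# IPS, SNIPS, and DR estimators for contextual-bandit OPE on synthetic data.
import numpy as np

rng = np.random.default_rng(2)
n, n_actions = 10_000, 4
actions = rng.integers(0, n_actions, size=n)           # logged actions
pi_b = np.full(n, 1.0 / n_actions)                     # logging propensities
rewards = rng.binomial(1, 0.3 + 0.1 * (actions == 1))  # logged rewards
pi_e = np.where(actions == 1, 0.7, 0.1)                # target prob. of logged action
q_hat = 0.3 + 0.1 * (np.arange(n_actions) == 1)        # reward model per action

w = pi_e / pi_b                                        # importance weights
ips = np.mean(w * rewards)
snips = np.sum(w * rewards) / np.sum(w)                # self-normalized IPS

# DR: model-based baseline over all actions plus a weighted residual correction.
pi_e_full = np.tile(np.array([0.1, 0.7, 0.1, 0.1]), (n, 1))
baseline = (pi_e_full * q_hat).sum(axis=1)
dr = np.mean(baseline + w * (rewards - q_hat[actions]))
print(f"IPS={ips:.3f}  SNIPS={snips:.3f}  DR={dr:.3f}")
```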
awesome-offline-rl / Papers / Off-Policy Evaluation and Learning: Benchmarks/Experiments |
| Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation | | | |
| SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation | | | |
| Offline Policy Comparison with Confidence: Benchmarks and Baselines | | | |
| Extending Open Bandit Pipeline to Simulate Industry Challenges | | | |
| Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation | | | [ ] [ ] |
| Evaluating the Robustness of Off-Policy Evaluation | | | [ ] |
| Benchmarks for Deep Off-Policy Evaluation | | | [ ] |
| Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning | | | [ ] |
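Open Bandit Pipeline (listed above and under Open Source Software) is a common entry point for reproducing such experiments. A hedged usage sketch based on OBP's documented interface, evaluating a uniform-random target policy with IPW; exact class and argument names can differ across versions:

```python
# Off-policy evaluation with Open Bandit Pipeline on its bundled sample data.
import numpy as np
from obp.dataset import OpenBanditDataset
from obp.ope import InverseProbabilityWeighting, OffPolicyEvaluation

dataset = OpenBanditDataset(behavior_policy="random", campaign="all")
bandit_feedback = dataset.obtain_batch_bandit_feedback()

# Uniform-random target policy: (n_rounds, n_actions, len_list).
action_dist = np.ones(
    (bandit_feedback["n_rounds"], dataset.n_actions, dataset.len_list)
) / dataset.n_actions

ope = OffPolicyEvaluation(
    bandit_feedback=bandit_feedback,
    ope_estimators=[InverseProbabilityWeighting()],
)
print(ope.estimate_policy_values(action_dist=action_dist))
```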
awesome-offline-rl / Papers / Off-Policy Evaluation and Learning: Applications |
| HOPE: Human-Centric Off-Policy Evaluation for E-Learning and Healthcare | | | |
| When is Off-Policy Evaluation Useful? A Data-Centric Perspective | | | |
| Counterfactual Evaluation of Peer-Review Assignment Policies | | | |
| Balanced Off-Policy Evaluation for Personalized Pricing | | | |
| Multi-Action Dialog Policy Learning from Logged User Feedback | | | |
| CFR-p: Counterfactual Regret Minimization with Hierarchical Policy Abstraction, and its Application to Two-player Mahjong | | | |
| Reward Shaping for User Satisfaction in a REINFORCE Recommender | | | |
| Data-Driven Off-Policy Estimator Selection: An Application in User Marketing on An Online Content Delivery Service | | | |
| Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach | | | |
| Model Selection for Offline Reinforcement Learning: Practical Considerations for Healthcare Settings | | | |
| Off-Policy Evaluation of Probabilistic Identity Data in Lookalike Modeling | | | |
| Offline Evaluation to Make Decisions About Playlist Recommendation | | | |
| Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration Matters | | | |
| Evaluating Reinforcement Learning Algorithms in Observational Health Settings | | | |
| Towards a Fair Marketplace: Counterfactual Evaluation of the trade-off between Relevance, Fairness & Satisfaction in Recommendation Systems | | | |
| Offline A/B testing for Recommender Systems | | | |
| Offline Comparative Evaluation with Incremental, Minimally-Invasive Online Feedback | | | |
| Handling Confounding for Realistic Off-Policy Evaluation | | | |
| Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising | | | |
awesome-offline-rl / Open Source Software/Implementations |
| SCOPE-RL: A Python library for offline reinforcement learning, off-policy evaluation, and selection | 117 | over 1 year ago | [ ] [ ] [ ] |
| Open Bandit Pipeline: a research framework for bandit algorithms and off-policy evaluation | 648 | over 1 year ago | [ ] [ ] [ ] |
| pyIEOE: Towards An Interpretable Evaluation for Offline Evaluation | 31 | about 4 years ago | [ ] |
| d3rlpy: An Offline Deep Reinforcement Learning Library | 1,349 | 12 months ago | [ ] [ ] [ ] |
| MINERVA: An out-of-the-box GUI tool for data-driven deep reinforcement learning | 98 | over 4 years ago | [ ] [ ] |
| Minari | 310 | 11 months ago | |
| CORL: Clean Offline Reinforcement Learning | 491 | almost 2 years ago | [ ] |
| COBS: Caltech OPE Benchmarking Suite | 61 | over 3 years ago | [ ] |
| Benchmarks for Deep Off-Policy Evaluation | 85 | over 1 year ago | [ ] |
| DICE: The DIstribution Correction Estimation Library | 99 | over 1 year ago | [ ] |
| RL Unplugged: Benchmarks for Offline Reinforcement Learning | 13,329 | 12 months ago | [ ] [ ] |
| D4RL: Datasets for Deep Data-Driven Reinforcement Learning | 1,371 | 12 months ago | [ ] [ ] |
| V-D4RL: Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations | 94 | over 1 year ago | [ ] |
| Benchmarking Offline Reinforcement Learning on Real-Robot Hardware | 17 | almost 2 years ago | [ ] |
| RLDS: Reinforcement Learning Datasets | 302 | about 1 year ago | [ ] |
| OEF: Offline Equilibrium Finding | 3 | over 3 years ago | [ ] |
| ExORL: Exploratory Data for Offline Reinforcement Learning | 105 | almost 4 years ago | [ ] |
| RL4RS: A Real-World Benchmark for Reinforcement Learning based Recommender System | 220 | almost 2 years ago | [ ] [ ] |
| NeoRL: Near Real-World Benchmarks for Offline Reinforcement Learning | | | [ ] [ ] |
| The Industrial Benchmark Offline RL Datasets | 126 | over 2 years ago | [ ] |
| ARLO: A Framework for Automated Reinforcement Learning | 10 | over 3 years ago | [ ] |
| RecoGym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising | 469 | over 4 years ago | [ ] |
| MARS-Gym: A Gym framework to model, train, and evaluate Recommender Systems for Marketplaces | 51 | almost 2 years ago | [ ] [ ] |
| A Reinforcement Learning-based Volt-VAR Control Dataset | 20 | over 3 years ago | [ ] |
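As a quick starting point with the libraries above, here is a hedged d3rlpy training sketch using its v1-style API (v2 moved to config objects such as `DiscreteCQLConfig`, so adapt to your installed version):

```python
# Train a discrete-action CQL agent purely from logged data with d3rlpy (v1-style API).
from d3rlpy.algos import DiscreteCQL
from d3rlpy.datasets import get_cartpole

dataset, env = get_cartpole()  # small bundled offline dataset and its environment
algo = DiscreteCQL()           # discrete-action conservative Q-learning
algo.fit(dataset, n_epochs=1)  # learn entirely from the logged transitions
```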
awesome-offline-rl / Blog/Podcast / Blog |
| Counterfactual Evaluation for Recommendation Systems | | | |
| Offline Reinforcement Learning: How Conservative Algorithms Can Enable New Applications | | | |
| AWAC: Accelerating Online Reinforcement Learning with Offline Datasets | | | |
| D4RL: Building Better Benchmarks for Offline Reinforcement Learning | | | |
| Does On-Policy Data Collection Fix Errors in Off-Policy Reinforcement Learning? | | | |
| Tackling Open Challenges in Offline Reinforcement Learning | | | |
| An Optimistic Perspective on Offline Reinforcement Learning | | | |
| Decisions from Data: How Offline Reinforcement Learning Will Change How We Use Machine Learning | | | |
| Introducing completely free datasets for data-driven deep reinforcement learning | | | |
| Offline (Batch) Reinforcement Learning: A Review of Literature and Applications | | | |
| Data-Driven Deep Reinforcement Learning | | | |
awesome-offline-rl / Blog/Podcast / Podcast |
| AI Trends 2023: Reinforcement Learning – RLHF, Robotic Pre-Training, and Offline RL with Sergey Levine | | | |
| Bandits and Simulators for Recommenders with Olivier Jeunen | | | |
| Sergey Levine on Robot Learning & Offline RL | | | |
| Off-Line, Off-Policy RL for Real-World Decision Making at Facebook | | | |
| Xianyuan Zhan | TalkRL: The Reinforcement Learning Podcast | | | |
| MOReL: Model-Based Offline Reinforcement Learning with Aravind Rajeswaran | | | |
| Trends in Reinforcement Learning with Chelsea Finn | | | |
| Nan Jiang | TalkRL: The Reinforcement Learning Podcast | | | |
| Scott Fujimoto | TalkRL: The Reinforcement Learning Podcast | | | |
awesome-offline-rl / Related Workshops
| CONSEQUENCES (RecSys 2023) | | | |
| Offline Reinforcement Learning (NeurIPS 2022) | | | |
| Reinforcement Learning for Real Life (NeurIPS 2022) | | | |
| CONSEQUENCES + REVEAL (RecSys 2022) | | | |
| Offline Reinforcement Learning (NeurIPS 2021) | | | |
| Reinforcement Learning for Real Life (ICML 2021) | | | |
| Reinforcement Learning Day 2021 | | | |
| Offline Reinforcement Learning (NeurIPS 2020) | | | |
| Reinforcement Learning from Batch Data and Simulation | | | |
| Reinforcement Learning for Real Life (RL4RealLife 2020) | | | |
| Safety and Robustness in Decision Making (NeurIPS 2019) | | | |
| Reinforcement Learning for Real Life (ICML 2019) | | | |
| Real-world Sequential Decision Making (ICML 2019) | | | |
awesome-offline-rl / Tutorials/Talks/Lectures |
| Reinforcement Learning with Large Datasets: Robotics, Image Generation, and LLMs | | | |
| Counterfactual Evaluation and Learning for Interactive Systems | | | |
| Representation Learning for Online and Offline RL in Low-rank MDPs | | | |
| Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation | | | |
| Safe Policy Learning through Extrapolation: Application to Pre-trial Risk Assessment | | | |
| Deep Reinforcement Learning with Real-World Data | | | |
| Planning with Reinforcement Learning | | | |
| Imitation learning vs. offline reinforcement learning | | | |
| Tutorial on the Foundations of Offline Reinforcement Learning | | | |
| Counterfactual Learning and Evaluation for Recommender Systems: Foundations, Implementations, and Recent Advances | | | [ ] |
| Offline Reinforcement Learning | | | |
| Offline Reinforcement Learning | | | |
| Fast Rates for the Regret of Offline Reinforcement Learning | | | |
| Bellman-consistent Pessimism for Offline Reinforcement Learning | | | |
| Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage | | | |
| Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism | | | |
| Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm | | | |
| Is Pessimism Provably Efficient for Offline RL? | | | |
| Adaptive Estimator Selection for Off-Policy Evaluation | | | |
| What are the Statistical Limits of Offline RL with Linear Function Approximation? | | | |
| Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL | | | |
| A Gentle Introduction to Offline Reinforcement Learning | | | |
| Principles for Tackling Distribution Shift: Pessimism, Adaptation, and Anticipation | | | |
| Offline Reinforcement Learning: Incorporating Knowledge from Data into RL | | | |
| Offline RL | | | |
| Learning a Multi-Agent Simulator from Offline Demonstrations | | | |
| Towards Reliable Validation and Evaluation for Offline RL | | | |
| Batch RL Models Built for Validation | | | |
| Offline Reinforcement Learning: From Algorithms to Practical Challenges | | | |
| Data Scalability for Robot Learning | | | |
| Statistically Efficient Offline Reinforcement Learning | | | |
| Near Optimal Provable Uniform Convergence in Off-Policy Evaluation for Reinforcement Learning | | | |
| Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation | | | |
| Beyond the Training Distribution: Embodiment, Adaptation, and Symmetry | | | |
| Combining Statistical methods with Human Input for Evaluation and Optimization in Batch Settings | | | |
| Efficiently Breaking the Curse of Horizon with Double Reinforcement Learning | | | |
| Scaling Probabilistically Safe Learning to Robotics | | | |
| Deep Reinforcement Learning in the Real World | | | |