Awesome Deep Reinforcement Learning / General guidance |
| Awesome Offline RL | 942 | over 1 year ago | |
| Reinforcement Learning Today | | | |
| Multiagent Reinforcement Learning by Marc Lanctot RLSS @ Lille | | | 11 July 2019 |
| RLDM 2019 Notes by David Abel | | | 11 July 2019 |
| A Survey of Reinforcement Learning Informed by Natural Language | | | 10 Jun 2019 |
| Challenges of Real-World Reinforcement Learning | | | 29 Apr 2019 |
| Ray Interference: a Source of Plateaus in Deep Reinforcement Learning | | | 25 Apr 2019 |
| Principles of Deep RL by David Silver | | | |
| University AI's general introduction to deep RL (in Chinese) | | | |
| OpenAI's Spinning Up | | | |
| The Promise of Hierarchical Reinforcement Learning | | | 9 Mar 2019 |
| Deep Reinforcement Learning that Matters | | | 30 Jan 2019 |
Awesome Deep Reinforcement Learning / 2024 |
| Foundation Policies with Hilbert Representations | | | 23 Feb 2024 |
Awesome Deep Reinforcement Learning / 2022 |
| Reinforcement Learning with Action-Free Pre-Training from Videos | | | |
Awesome Deep Reinforcement Learning / Generalist policies |
| Foundation Policies with Hilbert Representations | | | 23 Feb 2024 |
Awesome Deep Reinforcement Learning / Foundations and theory |
| General non-linear Bellman equations | | | 9 July 2019 |
| Monte Carlo Gradient Estimation in Machine Learning | | | 25 Jun 2019 |
Awesome Deep Reinforcement Learning / General benchmark frameworks |
| Brax | 2,397 | 11 months ago | |
| Android-Env | 1,029 | 11 months ago | |
| MuJoCo | | | |
| Unsupervised RL Benchmark | 335 | about 3 years ago | |
| Dataset for Offline RL | 1,371 | 12 months ago | |
| Spriteworld: a flexible, configurable python-based reinforcement learning environment | 369 | over 5 years ago | |
| Chainerrl Visualizer | 54 | almost 3 years ago | |
| Behaviour Suite for Reinforcement Learning | | | 13 Aug 2019 |
| Quantifying Generalization in Reinforcement Learning | | | 20 Dec 2018 |
| S-RL Toolbox: Environments, Datasets and Evaluation Metrics for State Representation Learning | | | 25 Sep 2018 |
| dopamine | 10,591 | about 1 year ago | |
| StarCraft II | 8,046 | over 1 year ago | |
| tfrl | 3,136 | almost 3 years ago | |
| chainerrl | 1,179 | over 4 years ago | |
| PARL | 3,296 | over 1 year ago | |
| DI-engine: a generalized decision intelligence engine. It supports various Deep RL algorithms | 3,143 | 11 months ago | |
| PPO x Family: Course in Chinese for Deep RL | 2,017 | over 1 year ago | |
Awesome Deep Reinforcement Learning / Unsupervised |
| URLB: Unsupervised Reinforcement Learning Benchmark | | | 28 Oct 2021 |
| APS: Active Pretraining with Successor Feature | | | 31 Aug 2021 |
| Behavior From the Void: Unsupervised Active Pre-Training | | | 8 Mar 2021 |
| Reinforcement Learning with Prototypical Representations | | | 22 Feb 2021 |
| Efficient Exploration via State Marginal Matching | | | 12 Jun 2019 |
| Self-Supervised Exploration via Disagreement | | | 10 Jun 2019 |
| Exploration by Random Network Distillation | | | 30 Oct 2018 |
| Diversity is All You Need: Learning Skills without a Reward Function | | | 16 Feb 2018 |
| Curiosity-driven Exploration by Self-supervised Prediction | | | 15 May 2017 |
Awesome Deep Reinforcement Learning / Offline |
| PerSim: Data-efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators | | | 10 Nov 2021 |
| A General Offline Reinforcement Learning Framework for Interactive Recommendation | | | AAAI 2021 |
Awesome Deep Reinforcement Learning / Value based |
| Harnessing Structures for Value-Based Planning and Reinforcement Learning | | | 5 Feb 2020 |
| Recurrent Value Functions | | | 23 May 2019 |
| Stochastic Lipschitz Q-Learning | | | 24 Apr 2019 |
| TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning | | | 8 Mar 2018 |
| Distributed Prioritized Experience Replay | | | 2 Mar 2018 |
| Rainbow: Combining Improvements in Deep Reinforcement Learning | | | 6 Oct 2017 |
| Learning from Demonstrations for Real World Reinforcement Learning | | | 12 Apr 2017 |
| Dueling Network Architecture | | | |
| Double DQN | | | |
| Prioritized Experience Replay | | | |
| Deep Q-Networks | | | |
Awesome Deep Reinforcement Learning / Policy gradient |
| Phasic Policy Gradient | | | 9 Sep 2020 |
| An operator view of policy gradient methods | | | 22 Jun 2020 |
| Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces | | | 14 Jun 2019 |
| Policy Gradient Search: Online Planning and Expert Iteration without Search Trees | | | 7 Apr 2019 |
| Supervised Policy Update for Deep Reinforcement Learning | | | 24 Dec 2018 |
| PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation | | | 5 Oct 2018 |
| Clipped Action Policy Gradient | | | 22 Jun 2018 |
| Expected Policy Gradients for Reinforcement Learning | | | 10 Jan 2018 |
| Proximal Policy Optimization Algorithms | | | 20 July 2017 |
| Emergence of Locomotion Behaviours in Rich Environments | | | 7 July 2017 |
| Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning | | | 1 Jun 2017 |
| Equivalence Between Policy Gradients and Soft Q-Learning | | | |
| Trust Region Policy Optimization | | | |
| Reinforcement Learning with Deep Energy-Based Policies | | | |
| Q-Prop: Sample-Efficient Policy Gradient with an Off-Policy Critic | | | |
Awesome Deep Reinforcement Learning / Explorations |
| Entropic Desired Dynamics for Intrinsic Control | | | 2021 |
| Self-Supervised Exploration via Disagreement | | | 10 Jun 2019 |
| Approximate Exploration through State Abstraction | | | 24 Jan 2019 |
| The Uncertainty Bellman Equation and Exploration | | | 15 Sep 2017 |
| Noisy Networks for Exploration | | | 30 Jun 2017 |
| Count-Based Exploration in Feature Space for Reinforcement Learning | | | 25 Jun 2017 |
| Count-Based Exploration with Neural Density Models | | | 14 Jun 2017 |
| UCB and InfoGain Exploration via Q-Ensembles | | | 11 Jun 2017 |
| Minimax Regret Bounds for Reinforcement Learning | | | 16 Mar 2017 |
| Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models | | | |
| EX2: Exploration with Exemplar Models for Deep Reinforcement Learning | | | |
Awesome Deep Reinforcement Learning / Actor-Critic |
| Generalized Off-Policy Actor-Critic | | | 27 Mar 2019 |
| Soft Actor-Critic Algorithms and Applications | | | 29 Jan 2019 |
| The Reactor: A Sample-Efficient Actor-Critic Architecture | | | 15 Apr 2017 |
| Sample Efficient Actor-Critic with Experience Replay | | | |
| Reinforcement Learning with Unsupervised Auxiliary Tasks | | | |
| Continuous control with deep reinforcement learning | | | |
Awesome Deep Reinforcement Learning / Model-based |
| Self-Consistent Models and Values | | | 25 Oct 2021 |
| When to use parametric models in reinforcement learning? | | | 12 Jun 2019 |
| Model Based Reinforcement Learning for Atari | | | 5 Mar 2019 |
| Model-Based Stabilisation of Deep Reinforcement Learning | | | 6 Sep 2018 |
| Learning model-based planning from scratch | | | 19 July 2017 |
Awesome Deep Reinforcement Learning / Model-free + Model-based |
| Imagination-Augmented Agents for Deep Reinforcement Learning | | | 19 July 2017 |
Awesome Deep Reinforcement Learning / Hierarchical |
| Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning? | | | 23 Sep 2019 |
| Language as an Abstraction for Hierarchical Deep Reinforcement Learning | | | 18 Jun 2019 |
Awesome Deep Reinforcement Learning / Option |
| Variational Option Discovery Algorithms | | | 26 July 2018 |
| A Laplacian Framework for Option Discovery in Reinforcement Learning | | | 16 Jun 2017 |
Awesome Deep Reinforcement Learning / Connection with other methods |
| Robust Imitation of Diverse Behaviors | | | |
| Learning human behaviors from motion capture by adversarial imitation | | | |
| Connecting Generative Adversarial Networks and Actor-Critic Methods | | | |
Awesome Deep Reinforcement Learning / Connecting value and policy methods |
| Bridging the Gap Between Value and Policy Based Reinforcement Learning | | | |
| Policy gradient and Q-learning | | | |
Awesome Deep Reinforcement Learning / Reward design |
| End-to-End Robotic Reinforcement Learning without Reward Engineering | | | 16 Apr 2019 |
| Reinforcement Learning with Corrupted Reward Channel | | | 23 May 2017 |
Awesome Deep Reinforcement Learning / Unifying |
| Multi-step Reinforcement Learning: A Unifying Algorithm | | | |
Awesome Deep Reinforcement Learning / Faster DRL |
| Neural Episodic Control | | | |
Awesome Deep Reinforcement Learning / Multi-agent |
| No Press Diplomacy: Modeling Multi-Agent Gameplay | | | 4 Sep 2019 |
| Options as responses: Grounding behavioural hierarchies in multi-agent RL | | | 6 Jun 2019 |
| Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination | | | 18 Jun 2019 |
| A Regularized Opponent Model with Maximum Entropy Objective | | | 17 May 2019 |
| Deep Q-Learning for Nash Equilibria: Nash-DQN | | | 23 Apr 2019 |
| Malthusian Reinforcement Learning | | | 3 Mar 2019 |
| Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning | | | 4 Nov 2018 |
| Intrinsic Social Motivation via Causal Influence in Multi-Agent RL | | | 19 Oct 2018 |
| QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning | | | 30 Mar 2018 |
| Modeling Others using Oneself in Multi-Agent Reinforcement Learning | | | 26 Feb 2018 |
| The Mechanics of n-Player Differentiable Games | | | 15 Feb 2018 |
| Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments | | | 10 Oct 2017 |
| Learning with Opponent-Learning Awareness | | | 13 Sep 2017 |
| Counterfactual Multi-Agent Policy Gradients | | | |
| Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments | | | 7 Jun 2017 |
| Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games | | | 29 Mar 2017 |
Awesome Deep Reinforcement Learning / New design |
| IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures | | | 9 Feb 2018 |
| Reverse Curriculum Generation for Reinforcement Learning | | | |
| Trial without Error: Towards Safe Reinforcement Learning via Human Intervention | | | |
| Learning to Design Games: Strategic Environments in Deep Reinforcement Learning | | | 5 July 2017 |
Awesome Deep Reinforcement Learning / Multitask |
| Kickstarting Deep Reinforcement Learning | | | 10 Mar 2018 |
| Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning | | | 7 Nov 2017 |
| Distral: Robust Multitask Reinforcement Learning | | | 13 July 2017 |
Awesome Deep Reinforcement Learning / Observational Learning |
| Observational Learning by Reinforcement Learning | | | 20 Jun 2017 |
Awesome Deep Reinforcement Learning / Meta learning |
| Discovery of Useful Questions as Auxiliary Tasks | | | 10 Sep 2019 |
| Meta-learning of Sequential Strategies | | | 8 May 2019 |
| Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables | | | 19 Mar 2019 |
| Some Considerations on Learning to Explore via Meta-Reinforcement Learning | | | 11 Jan 2019 |
| Meta-Gradient Reinforcement Learning | | | 24 May 2018 |
| ProMP: Proximal Meta-Policy Search | | | 16 Oct 2018 |
| Unsupervised Meta-Learning for Reinforcement Learning | | | 12 Jun 2018 |
Awesome Deep Reinforcement Learning / Distributional |
| GAN Q-learning | | | 20 July 2018 |
| Implicit Quantile Networks for Distributional Reinforcement Learning | | | 14 Jun 2018 |
| Nonlinear Distributional Gradient Temporal-Difference Learning | | | 20 May 2018 |
| Distributed Distributional Deterministic Policy Gradients | | | 23 Apr 2018 |
| An Analysis of Categorical Distributional Reinforcement Learning | | | 22 Feb 2018 |
| Distributional Reinforcement Learning with Quantile Regression | | | 27 Oct 2017 |
| A Distributional Perspective on Reinforcement Learning | | | 21 July 2017 |
Awesome Deep Reinforcement Learning / Planning |
| Search on the Replay Buffer: Bridging Planning and Reinforcement Learning | | | 12 Jun 2019 |
Awesome Deep Reinforcement Learning / Safety |
| Robust Reinforcement Learning for Continuous Control with Model Misspecification | | | 18 Jun 2019 |
| Verifiable Reinforcement Learning via Policy Extraction | | | 22 May 2018 |
Awesome Deep Reinforcement Learning / Inverse RL |
| Addressing Sample Inefficiency and Reward Bias in Inverse Reinforcement Learning | | | 9 Sep 2018 |
Awesome Deep Reinforcement Learning / No reward RL |
| Fast Task Inference with Variational Intrinsic Successor Features | | | 2 Jun 2019 |
| Curiosity-driven Exploration by Self-supervised Prediction | | | 15 May 2017 |
Awesome Deep Reinforcement Learning / Time |
| Interval timing in deep reinforcement learning agents | | | 31 May 2019 |
| Time Limits in Reinforcement Learning | | | |
Awesome Deep Reinforcement Learning / Adversarial learning |
| Sample-efficient Adversarial Imitation Learning from Observation | | | 18 Jun 2019 |
Awesome Deep Reinforcement Learning / Use Natural Language |
| Using Natural Language for Reward Shaping in Reinforcement Learning | | | 31 May 2019 |
Awesome Deep Reinforcement Learning / Generative and contrastive representation learning |
| Unsupervised State Representation Learning in Atari | | | 19 Jun 2019 |
Awesome Deep Reinforcement Learning / Belief |
| Shaping Belief States with Generative Environment Models for RL | | | 24 Jun 2019 |
Awesome Deep Reinforcement Learning / PAC |
| Provably Convergent Off-Policy Actor-Critic with Function Approximation | | | 11 Nov 2019 |
Awesome Deep Reinforcement Learning / Applications |
| Benchmarks for Deep Off-Policy Evaluation | | | 30 Mar 2021 |
| Learning Reciprocity in Complex Sequential Social Dilemmas | | | 19 Mar 2019 |
| DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills | | | 9 Apr 2018 |
| Tuning Recurrent Neural Networks with Reinforcement Learning | | | |