Statistically and computationally efficient reinforcement learning
If you want to make good decisions from data, you need good data. Traditional statistics and machine learning have made great progress in learning from fixed datasets, but even an optimal algorithm for learning from a fixed dataset can be arbitrarily bad when the decisions it makes affect the data it gets. I'm trying to design algorithms that learn to take good actions (and so may affect their environment) in a manner that is simultaneously computationally tractable and statistically efficient.
For a full list, see Google Scholar.
Statistically efficient RL requires "deep exploration".
Previous approaches to deep exploration have not been computationally tractable beyond small-scale problems.
This dissertation presents an alternative approach through the use of randomized value functions.
A journal paper that combines the best parts of my PhD thesis with several other pieces of work from our group over the past few years.
Bootstrapping for uncertainty estimates has an obvious flaw in "sparse reward" environments... why wouldn't you just always predict zero?
We provide a simple solution: add a random untrainable "prior function" to each ensemble member.
This matches exact Bayesian inference for linear systems, with good empirical results scaling to Montezuma's Revenge!
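To make the idea concrete, here is a minimal sketch in the linear setting (my own toy illustration with made-up names, not the paper's code): each ensemble member fits a trainable model to a bootstrap resample of the data, offset by a fixed random "prior function" that is never trained, and predicts trainable + beta * prior.

```python
# Minimal sketch of randomized prior functions in the linear setting (toy illustration).
import numpy as np

rng = np.random.default_rng(0)

def random_prior(rng, n_features):
    """A fixed, untrainable random linear function acting as the prior."""
    w = rng.normal(size=n_features)
    return lambda X: X @ w

def fit_member(X, y, prior, beta, rng):
    """Train on a bootstrap resample, fitting the residual y - beta * prior(X)."""
    idx = rng.integers(0, len(X), size=len(X))          # bootstrap resample
    Xb, yb = X[idx], y[idx]
    target = yb - beta * prior(Xb)                      # trainable part fits the residual
    w, *_ = np.linalg.lstsq(Xb, target, rcond=None)
    return lambda Xq: Xq @ w + beta * prior(Xq)         # prediction = trainable + prior

def features(x):                                        # simple polynomial features [1, x, x^2]
    return np.stack([np.ones_like(x), x, x**2], axis=1)

x_train = rng.uniform(-1, 0, size=20)                   # data only on the left half
y_train = np.sin(3 * x_train) + 0.1 * rng.normal(size=20)
X = features(x_train)

beta = 1.0
ensemble = [fit_member(X, y_train, random_prior(rng, X.shape[1]), beta, rng)
            for _ in range(10)]

x_query = np.array([-0.5, 0.5])                         # in-data vs. out-of-data point
preds = np.array([m(features(x_query)) for m in ensemble])
print("std near data:   ", preds[:, 0].std())           # typically small: members agree
print("std far from data:", preds[:, 1].std())          # typically large: priors disagree
```

The point is exactly the sparse-reward fix above: far from the data, the trainable parts cannot cancel their distinct priors, so ensemble disagreement stays large even when every observed target is zero.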
Imagine you have a team of agents working in parallel: how can you do efficient exploration for the group as a whole?
It turns out that "randomized value functions" extend naturally to this setting... if you do it the right way!
An introduction to, and overview of, the Thompson sampling principle: what does it mean, what are its benefits, and how can it be applied? A great overview of what this algorithm really does, without getting too drawn into "regret bounds" and analysis... lots of good examples!
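As a concrete illustration of the principle (the standard Beta-Bernoulli bandit, not code from the tutorial itself): maintain a posterior per arm, sample one plausible world from the posterior, and act optimally for that sample.

```python
# Thompson sampling for a Bernoulli bandit with conjugate Beta posteriors.
import numpy as np

rng = np.random.default_rng(1)
true_probs = [0.3, 0.5, 0.7]                 # unknown to the agent
successes = np.ones(3)                       # Beta(1, 1) uniform priors
failures = np.ones(3)

total_reward = 0.0
for t in range(2000):
    samples = rng.beta(successes, failures)  # one posterior sample per arm
    arm = int(np.argmax(samples))            # act greedily w.r.t. the sampled world
    reward = rng.random() < true_probs[arm]  # Bernoulli reward
    successes[arm] += reward                 # conjugate posterior update
    failures[arm] += 1 - reward
    total_reward += reward

print("average reward:", total_reward / 2000)
print("pulls per arm: ", successes + failures - 2)
```

Exploration falls out of the posterior itself: uncertain arms sometimes produce the largest sample and get tried, while clearly bad arms are sampled less and less.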
We very rarely want to deploy an RL system without any prior knowledge of how to behave.
We look at a simple + scalable approach to incorporating expert knowledge with deep RL.
This leads to many state-of-the-art scores on Atari 2600 games.
Some of the previously published results for posterior sampling without episodic reset are incorrect.
This note clarifies some of the issues in this space and presents some conjectures towards future solutions.
A previously published proof of lower bounds on what is possible for any reinforcement learning algorithm is incorrect.
This note clarifies some of the issues and presents further conjectures on what might be true in this space.
Computational results demonstrate that PSRL dramatically outperforms UCRL2. We provide insight into the extent of this performance boost and the phenomenon that drives it.
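For intuition, here is a minimal toy version of the PSRL loop (my own sketch under simplifying assumptions, e.g. a known-variance Gaussian reward posterior, not the paper's code): each episode, sample one MDP from the posterior, solve it exactly, and act greedily for the episode.

```python
# Toy tabular PSRL: Dirichlet posterior over transitions, Gaussian over mean rewards.
import numpy as np

rng = np.random.default_rng(2)
S, A, H = 5, 2, 10                           # states, actions, horizon

# True (unknown) environment.
P_true = rng.dirichlet(np.ones(S), size=(S, A))
R_true = rng.uniform(size=(S, A))

# Posterior statistics: Dirichlet counts for P, (sum, count) for mean rewards.
p_counts = np.ones((S, A, S))                # Dirichlet(1, ..., 1) prior
r_sum = np.zeros((S, A))
r_cnt = np.ones((S, A))                      # unit-strength prior at mean 0

def solve(P, R):
    """Finite-horizon value iteration; returns the greedy policy per step."""
    Q = np.zeros((H, S, A))
    V = np.zeros(S)
    for h in reversed(range(H)):
        Q[h] = R + P @ V                     # one-step lookahead
        V = Q[h].max(axis=1)
    return Q.argmax(axis=2)                  # policy[h, s]

for episode in range(200):
    # Sample a single MDP from the posterior and solve it.
    P_hat = np.array([[rng.dirichlet(p_counts[s, a]) for a in range(A)]
                      for s in range(S)])
    R_hat = rng.normal(r_sum / r_cnt, 1.0 / np.sqrt(r_cnt))
    policy = solve(P_hat, R_hat)

    s = 0
    for h in range(H):                       # act greedily, update posterior counts
        a = policy[h, s]
        s_next = rng.choice(S, p=P_true[s, a])
        r = R_true[s, a] + 0.1 * rng.normal()
        p_counts[s, a, s_next] += 1
        r_sum[s, a] += r
        r_cnt[s, a] += 1
        s = s_next
```

Note there is no optimism bonus anywhere: a single posterior sample per episode is what drives exploration, and it only requires one planning step per episode rather than optimizing over a confidence set as in UCRL2.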
Deep exploration and deep reinforcement learning. Takes the insight of efficient exploration via randomized value functions and attains state-of-the-art results on Atari. Includes some sweet vids.
A principled approach to efficient exploration with generalization that can be implemented for deep learning models at scale. We use an augmented bootstrap to approximate the posterior distribution.
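A minimal sketch of the bootstrap bookkeeping (illustrative names, not the paper's DQN code): tag each transition with a random binary mask over K heads so every head trains on its own resampled dataset, and commit to one randomly chosen head per episode, which gives temporally consistent ("deep") exploration.

```python
# Bootstrap masks over a shared replay buffer, with per-episode head selection.
import numpy as np

rng = np.random.default_rng(3)
K = 10                                        # number of bootstrap heads
replay = []                                   # list of (transition, mask) pairs

def store(transition, p=0.5):
    """Tag each transition with a Bernoulli(p) mask over the K heads."""
    mask = rng.random(K) < p
    replay.append((transition, mask))

def minibatch_for_head(k, batch_size=32):
    """Sample only transitions whose mask includes head k."""
    eligible = [t for t, m in replay if m[k]]
    idx = rng.integers(0, len(eligible), size=batch_size)
    return [eligible[i] for i in idx]

# Demo: store a few dummy transitions, pick one head for the whole episode,
# and draw that head's training batch.
for i in range(100):
    store(("s%d" % i, "a", 0.0, "s%d" % (i + 1)))
active_head = rng.integers(K)                 # acts greedily all episode long
batch = minibatch_for_head(active_head, batch_size=4)
print("episode head:", active_head, "| batch size:", len(batch))
```

Because each head sees a different bootstrapped subset of the data, the ensemble's disagreement approximates posterior uncertainty over the value function, without ever fitting an explicit model.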
You can combine efficient exploration and generalization, all without a model-based planning step. Some cool empirical results and also some theory. My favorite paper.
The first general analysis of model-based RL in terms of the dimensionality, rather than the cardinality, of the system. Several new state-of-the-art results, including linear systems.
If the environment is a structured graph (aka a factored MDP), then you can exploit that structure to learn quickly. You can adapt UCB-style approaches for this; posterior sampling gets it for free.
You don't need to use loose UCB-style algorithms to get regret bounds for reinforcement learning. Posterior sampling is more efficient in terms of computation and data, and shares similar guarantees.
We apply deep learning techniques to energy load forecasting across 20 geographic regions.
We found that recurrent network architectures were particularly suited to this task.
Class project for CS 229 in my first quarter at Stanford.
Follow the links below.
Or, check out my CV.