About me

Ian Osband

I lead the Science of Post Training team at Google DeepMind in London. My research is on reinforcement learning, with a particular focus on exploration: how to make good decisions when you do not yet know the world. For a longer take, see this podcast interview.

Before this, I spent two years at OpenAI working on reinforcement learning and reasoning for ChatGPT. Earlier, I was a staff research scientist at Google DeepMind, first in London and then in San Francisco, where I helped build the efficient agent team. I completed my Ph.D. at Stanford University under Benjamin Van Roy. My thesis, Deep Exploration via Randomized Value Functions, won second place in the Dantzig Dissertation Award.

Before Stanford, I studied maths at Oxford and worked as a credit derivatives strategist for J.P. Morgan in London and New York. For the latest, see my Twitter.

Research Highlights

For a full list, see Google Scholar.

Talk RL Podcast

Interview & Research Overview

Conversation with Robin Chauhan about my research. Great place to start for a big picture overview.

Epistemic Neural Networks

NeurIPS 2023 Spotlight

Scalable uncertainty estimation in deep learning: better performance than an ensemble of size 100 at the computational cost of fewer than two particles.

The Neural Testbed: Evaluating Joint Predictions

NeurIPS 2022 Spotlight

A benchmark for the prediction quality of approximate posterior inference in deep learning. Popular Bayesian deep learning approaches perform poorly on it.

Deep Exploration via Randomized Value Functions

JMLR 2019

Journal paper collecting the best research from my PhD, together with related work from our group.

A Tutorial on Thompson Sampling

Foundations and Trends in Machine Learning 2020

A thoughtful tutorial on Thompson sampling and how it balances exploration with exploitation.
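For a flavour of the idea, here is a minimal sketch (not from the tutorial itself) of Thompson sampling on a Beta-Bernoulli bandit: sample a plausible success rate for each arm from its posterior, then act greedily with respect to that sample. The arm probabilities and step count are illustrative choices.

```python
import random

def thompson_sampling(true_probs, n_steps=2000, seed=0):
    """Beta-Bernoulli Thompson sampling on a multi-armed bandit.

    true_probs: the (unknown to the agent) success probability of each arm.
    Returns the total reward collected over n_steps pulls.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Beta(1, 1) uniform prior on each arm's success probability,
    # tracked via pseudo-counts of successes and failures.
    successes = [1] * n_arms
    failures = [1] * n_arms
    total_reward = 0
    for _ in range(n_steps):
        # Sample one plausible success rate per arm from its posterior...
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(n_arms)]
        # ...and act greedily with respect to the sampled rates.
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward

reward = thompson_sampling([0.2, 0.5, 0.8])
```

Early on, the wide posteriors make every arm likely to be sampled highest (exploration); as evidence accumulates, the posterior concentrates and the best arm is pulled almost always (exploitation).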

Why is Posterior Sampling Better than Optimism for Reinforcement Learning?

ICML 2016 (full oral), EWRL 2016

Computational results show that PSRL dramatically outperforms UCRL2; we provide insight into why.

Behaviour Suite for Reinforcement Learning

ICLR 2020 Spotlight

A methodical benchmark for the scaling properties of RL algorithms, with open-source code.

Reinforcement Learning, Bit by Bit

Foundations and Trends in Machine Learning 2023

An elegant perspective on reinforcement learning through the lens of information theory.

Want more?

Follow the links below.

Or, check out my CV.