Please note: This seminar will be given online.
Ruosong Wang
School of Computer Science, Carnegie Mellon University
In recent years, reinforcement learning algorithms have achieved strong empirical success on a wide variety of real-world problems. However, these algorithms usually require a huge number of samples, even to solve simple tasks. It is unclear whether there are fundamental statistical limits on such methods, or whether this sample complexity burden can be alleviated by better algorithms. In this talk, I will give an overview of my research efforts toward bridging the gap between the theory and the practice of reinforcement learning.
In the first part of the talk, I will show that even under conditions that permit sample-efficient supervised learning, any offline reinforcement learning algorithm still requires an exponential number of samples information-theoretically, due to a geometric amplification of the estimation error. Moreover, through extensive experiments on a range of tasks, I will show that substantial error amplification does occur in practical scenarios. Our results highlight a crucial difference between offline reinforcement learning and supervised learning. I will conclude this part by suggesting possible ways to improve the performance of practical reinforcement learning systems based on these new insights.
In the second part of the talk, I will focus on the horizon dependence of the sample complexity of tabular reinforcement learning. I will present the first tabular reinforcement learning algorithm whose sample complexity is completely independent of the horizon length. This result resolves a fundamental open problem in reinforcement learning theory.
Bio: Ruosong Wang is currently a Ph.D. student at Carnegie Mellon University, advised by Prof. Ruslan Salakhutdinov. He completed his undergraduate studies in the Yao Class at Tsinghua University. He has also spent time at the Simons Institute and Microsoft Research. He is broadly interested in the theory and practice of modern machine learning paradigms, with a focus on reinforcement learning.