WNCG Seminar: Statistical State Compression and Primal-Dual Pi Learning

Seminar
Friday, January 25, 2019
11:00am - 12:00pm
EER 3.646 - Blaschke Conference Room

*PLEASE NOTE CORRECTION: Seminar will take place in EER 3.646 (North Tower)

This talk focuses on the statistical sample complexity and model reduction of Markov decision processes (MDPs). We begin by surveying recent advances on the complexity of solving MDPs without any dimension reduction. In the first part, we study the statistical state compression of general Markov processes. We propose a spectral state compression method for learning state features and aggregation structures from data. The method is able to “sketch” a black-box Markov process from its empirical data, and we provide both minimax statistical guarantees and scalable computational tools for it. In the second part, we propose a bilinear primal-dual pi learning method for finding the optimal policy, which utilizes given state and action features. The method is motivated by a saddle point formulation of the Bellman equation. Its sample complexity depends only on the number of parameters and is invariant with respect to the dimension of the problem, making high-dimensional reinforcement learning possible using “small” data.
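
For readers unfamiliar with the idea of spectral state compression, the following is a minimal illustrative sketch and not the speaker's algorithm. It assumes a finite state space and a single observed trajectory, estimates the transition matrix by counting, and takes a truncated SVD to obtain low-dimensional state features. The function names, the uniform fallback for unvisited states, and the choice of left singular vectors as features are all assumptions made for illustration.

```python
import numpy as np


def empirical_transition_matrix(trajectory, n_states):
    """Estimate a row-stochastic transition matrix from one state trajectory."""
    counts = np.zeros((n_states, n_states))
    for s, s_next in zip(trajectory[:-1], trajectory[1:]):
        counts[s, s_next] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    # Assumption: states never visited get a uniform row so every row sums to 1.
    return np.where(row_sums > 0, counts / np.maximum(row_sums, 1.0), 1.0 / n_states)


def spectral_state_features(P_hat, rank):
    """Truncated SVD of the empirical matrix (hypothetical helper).

    Returns the top `rank` left singular vectors as state features and the
    corresponding rank-`rank` approximation of P_hat.
    """
    U, s, Vt = np.linalg.svd(P_hat, full_matrices=False)
    features = U[:, :rank]
    P_low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    return features, P_low_rank
```

Given a trajectory of integer state indices, one would call `empirical_transition_matrix(trajectory, n_states)` and pass the result to `spectral_state_features` with a chosen rank. Note that the truncated matrix is generally not itself a valid transition matrix; the method presented in the talk may handle this differently.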

Speaker

Mengdi Wang
Assistant Professor
Princeton University

Mengdi Wang is an assistant professor in the Department of Operations Research and Financial Engineering at Princeton University. She is also affiliated with the Department of Computer Science and Princeton’s Center for Statistics and Machine Learning. Her research focuses on data-driven stochastic optimization and applications in machine and reinforcement learning. She received her PhD in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology in 2013. At MIT, Mengdi was affiliated with the Laboratory for Information and Decision Systems and was advised by Dimitri P. Bertsekas. Mengdi became an assistant professor at Princeton in 2014. She received the Young Researcher Prize in Continuous Optimization of the Mathematical Optimization Society in 2016 (awarded once every three years), the Princeton SEAS Innovation Award in 2016, the NSF CAREER Award in 2017, and the Google Faculty Award. She is currently serving as an associate editor for Operations Research.