报告题目: Fast Linear Programming Based Approaches for Solving Markov Decision Processes
报 告 人:陈彩华 副教授 南京大学
照 片:
邀 请 人: 刘三阳 教授
报告时间: 2020年11月18日(星期三)14:30-16:30
报告地点: 南校区信远二区206会议室
报告人简介:陈彩华,副教授,南京大学理学博士,新加坡国立大学联合培养博士,曾赴新加坡国立大学、香港中文大学等学习与访问。主持/完成的基金包括国家自然科学基金面上项目、青年项目,江苏省自然科学基金面上项目、青年项目,参与国家自然科学基金重点项目,代表作发表在《Mathematical Programming》,《SIAM Journal on Optimization》,《SIAM Journal on Imaging Science》及CVPR、NIPS等国际知名学术期刊与会议,其中多篇论文入选ESI高被引论文。获华人数学家联盟最佳论文奖(2017、2018连续两年),中国运筹学会青年科技奖(2018),南京大学青年五四奖章(2019),入选首批南京大学仲英青年学者(全校10人,2018)及江苏省社科优青(2019)。
报告摘要:Markov Decision Processes (MDPs) are widely applied in the study of sequential decision making, whose applications are almost endless. In practice, MDPs are often solved using dynamic programming method such as policy iteration and value iteration. Compared with the dynamic programming method, the linear programming based approach for MDPs is powerful in theory but often not quite efficient in practice as a solution method. To improve its practicability, this paper studies the linear programming based approach for MDPs from a computational perspective. Specifically, we consider a discrete stage, infinite horizon discounted MDP, which has a nice property ensuring the existence of deterministic optimal policies. In this respect, we introduce a weakly and a strongly constrained sparse formulation for MDPs to find optimal deterministic policies approximately. By exploiting the sparse structure in the formulations, we propose a class of multi-block alternating direction methods of multipliers (ADMM) to solve MDPs efficiently. The numerical study shows that, ADMM applied to the strongly constrained formulation achieves a comparable runtime as policy iteration, and beats other methods by a large margin. To the best of our knowledge, this is the first linear programming based method that achieves a similar performance as policy iteration. Theoretically, we also prove the convergence of the multi-block ADMM using recent techniques of convex and nonconvex optimization. These findings are encouraging that we take a first step in showing that linear programming based methods can be efficient for solving MDPs, which offers a different perspective, including its elegant theory and great toolbox, to the study of sequential decision making.