
Risk-Sensitive Markov Decision Processes and Reinforcement Learning

Research Seminar Series

  • Date: 11 Nov 2021

  • Organiser: Department of Industrial and Systems Engineering, PolyU

  • Time: 10:00 - 11:20

  • Venue: Online via Zoom


Prof. Li Xia


Meeting link will be sent to successful registrants



Reinforcement learning (RL) has received intensive research attention since the notable success of AlphaGo. Markov decision processes (MDPs) serve as the mathematical model of RL. However, most current RL and MDP studies focus on optimizing cumulative discounted rewards. Risk-related objectives are also important for many practical systems, such as risk management in finance. In this talk, we will introduce some theoretical results on long-run variance optimization problems in the framework of MDPs, where variance is a widely used metric for measuring risk. Because the variance function is quadratic, the long-run variance cost function depends on the whole policy and is not Markovian. The long-run variance optimization problem is therefore not a standard MDP model, and the classical Bellman optimality equation does not hold. We study this problem from a new perspective called sensitivity-based optimization, which helps us derive several new results. By defining a pseudo-variance quantity, we derive a variance difference formula that quantifies, in an elegant form, the difference between the long-run variances under any two policies. Based on this formula, we obtain a necessary and sufficient condition for a local optimum in the mixed policy space, which is only a necessary condition for the global optimum. We further derive the so-called Bellman local optimality equation and prove the optimality of deterministic policies. We also develop a policy-iteration-type algorithm to minimize the long-run variance and prove its local convergence. Finally, we extend the theoretical results to RL algorithmic studies and apply them to several practical problems, including power fluctuation reduction in wind and battery storage systems and portfolio management in financial engineering.
Our latest research shows that this work can be extended to MDPs with CVaR (Conditional Value-at-Risk) objectives.
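
As a rough illustration of why the variance cost depends on the whole policy, the sketch below evaluates the long-run mean and variance of rewards for one fixed policy. It assumes the long-run variance is the steady-state average of the squared deviation of the reward from the long-run mean; the abstract does not spell out this definition, so it is an assumption here. The two-state transition matrix `P`, the reward vector `r`, and both function names are hypothetical and not taken from the talk.

```python
import numpy as np


def stationary_distribution(P):
    """Solve pi @ P = pi with sum(pi) = 1 for an ergodic chain (assumed)."""
    n = P.shape[0]
    # Stack the balance equations with the normalization constraint.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def long_run_mean_and_variance(P, r):
    """Long-run mean and (assumed steady-state) variance under a fixed policy."""
    pi = stationary_distribution(P)
    eta = pi @ r                  # long-run average reward
    # The per-step cost (r - eta)^2 contains eta, which itself depends on
    # the policy in every state, so the cost is not a fixed function of
    # the current state and action alone.
    var = pi @ (r - eta) ** 2
    return eta, var


# Hypothetical chain induced by one fixed policy.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
r = np.array([1.0, 3.0])

eta, var = long_run_mean_and_variance(P, r)
print(f"long-run mean = {eta:.4f}, long-run variance = {var:.4f}")
```

Changing the action in a single state changes `eta`, which in turn changes the cost `(r - eta) ** 2` in every state. This is one way to see why a standard per-state Bellman recursion does not directly apply to this objective.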

Keynote Speaker

Prof. Li Xia


Professor, School of Business
Sun Yat-Sen University 
Guangzhou, China.


Li Xia is a professor with the Business School, Sun Yat-Sen University, Guangzhou, China. He received his bachelor's and Ph.D. degrees in control theory, both from Tsinghua University, Beijing, China, in 2002 and 2007, respectively. After completing his Ph.D., he worked at IBM Research China and the King Abdullah University of Science and Technology (KAUST), Saudi Arabia. He returned to Tsinghua University as a faculty member in 2011. In 2019, he joined Sun Yat-Sen University as a full professor. He has been a visiting scholar at Stanford University and the Hong Kong University of Science and Technology, among others. He serves as an associate editor of IEEE Transactions on Automation Science and Engineering and Discrete Event Dynamic Systems, among others. His research interests include methodology research in Markov decision processes, reinforcement learning, and queueing theory, as well as application research in areas such as energy systems and financial technology.
