Deep Reinforcement Learning for Aging Cheese Inventory Management Under Correlated Prices

Distinguished Research Seminar Series

Date

09 Mar 2026
Organiser

Department of Industrial and Systems Engineering, PolyU
Time

15:30 - 17:00
Venue

Online via ZOOM

Speaker

Prof. Martin Grunow

Remarks

The meeting link will be sent to successful registrants. If you have enquiries regarding the e-certificate after the seminar, please contact david.kuo@polyu.edu.hk.

Summary

In aging cheese inventory management, producers must jointly consider auto-regressive and correlated prices for raw milk and for age-differentiated cheese products. Purchasing decisions determine the youngest inventory additions, while production and issuance decisions balance immediate revenues against further maturation. Issuance flexibility arises because the cheese age label represents a minimum maturation time, so producers may deliberately sell cheese that is older than its label requires.
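To make the idea of issuance flexibility concrete, the following is a minimal sketch of one possible issuance rule, not the rule used in the talk. It assumes fixed per-label sales quantities and integer stock levels; the function name `issue_with_flexibility` and the "youngest eligible stock first" heuristic are hypothetical choices made only for illustration.

```python
def issue_with_flexibility(inventory, demand):
    """Serve each age label from stock that is at least that old.

    inventory[a]: units currently in age class a (0 = youngest).
    demand[a]:    units to sell under the label "aged at least a".
    Returns (issuance, remaining, unmet), where issuance[label][age] is the
    number of units of age class `age` sold under `label`.
    Hypothetical heuristic for illustration only.
    """
    n = len(inventory)
    remaining = list(inventory)
    issuance = [[0] * n for _ in range(n)]
    unmet = [0] * n
    # Serve the oldest labels first: they have the fewest eligible classes.
    for label in range(n - 1, -1, -1):
        need = demand[label]
        # Use the youngest eligible stock first, so older cheese is kept
        # for further maturation whenever possible.
        for age in range(label, n):
            take = min(need, remaining[age])
            issuance[label][age] = take
            remaining[age] -= take
            need -= take
            if need == 0:
                break
        unmet[label] = need
    return issuance, remaining, unmet


issuance, remaining, unmet = issue_with_flexibility([5, 2, 4], [3, 4, 1])
# Label 1 is served partly from age class 2, i.e. with older cheese
# than the label strictly requires.
print(issuance[1], remaining, unmet)
```

In this example the demand for label 1 exceeds the stock in age class 1, so the rule fills the gap with cheese from age class 2. Without issuance flexibility, that part of the demand would go unmet.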

We develop a generic Markov decision process model for ameliorating inventory systems that captures auto-regressive and correlated purchase and sales prices. Because the state and action spaces grow exponentially with the number of age classes, we solve the problem using deep reinforcement learning (DRL). However, standard actor-critic DRL algorithms struggle with the high-dimensional action spaces induced by issuance flexibility. We therefore introduce and compare three strategies to address this challenge. First, we reduce the dimensionality of the action space by curtailing issuance flexibility. Second, we reduce dimensionality via actor pipelining, which allows issuance actions to be determined by simple decision rules. Third, we analyze multi-agent actor designs that distribute learning across multiple neural networks.
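As an illustration of what "auto-regressive and correlated" prices can look like, the sketch below simulates two mean-reverting AR(1) price series driven by correlated Gaussian shocks. The abstract does not specify the price model, so the functional form, the function name `simulate_prices`, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def simulate_prices(steps, mean, phi, sigma, rho, start=None, seed=0):
    """Simulate (milk, cheese) prices as AR(1) processes with correlated shocks.

    mean, phi, sigma: pairs of long-run means, AR(1) coefficients, and
    shock standard deviations for the two prices (all assumed values).
    rho: correlation between the two shocks, in [-1, 1].
    """
    rng = random.Random(seed)
    milk, cheese = start if start is not None else mean
    path = [(milk, cheese)]
    for _ in range(steps):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Build correlated shocks from two independent standard normals.
        e_milk = sigma[0] * z1
        e_cheese = sigma[1] * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        # Each price is pulled back toward its mean at a rate of (1 - phi).
        milk = mean[0] + phi[0] * (milk - mean[0]) + e_milk
        cheese = mean[1] + phi[1] * (cheese - mean[1]) + e_cheese
        path.append((milk, cheese))
    return path


path = simulate_prices(
    steps=12, mean=(0.40, 6.00), phi=(0.8, 0.6), sigma=(0.02, 0.30), rho=0.7
)
print(path[-1])
```

A smaller `phi` corresponds to faster mean reversion and a larger `sigma` to higher volatility, which are the two dimensions along which the value of issuance flexibility is said to vary.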

Our results show that actor pipelining provides the best trade-off between policy quality and computational effort, outperforming multi-agent designs. Compared to a rolling-horizon planning benchmark, our DRL policies increase profits by 16.7% on average. Moreover, modeling cheese-specific price dynamics yields a 7.1% profit improvement relative to policies based on simplified price assumptions. Full factorial experiments reveal that, in contrast to prior work, the benefit of issuance flexibility is limited under cheese-specific price processes and becomes substantial only in highly volatile, fast mean-reverting environments.

Keynote Speaker

Prof. Martin Grunow

Professor
School of Management, Technical University of Munich, Germany

Martin Grunow optimizes production systems and supply chains with data-driven methods from operations research, mathematics, and computer science. His research covers the automotive and electronics industries and various branches of the process industry, including the chemical, pharmaceutical, and food sectors. His work ranges from AI-based real-time control of unit operations to the configuration of production and logistics systems and the robust design of global value chains with stochastic optimization. Martin Grunow studied industrial engineering at TU Berlin, where he earned his PhD on the optimization of electronics assembly lines. He later conducted research at Degussa AG, a producer of specialty chemicals, and returned to TU Berlin to complete his postdoctoral teaching qualification (habilitation). After several international appointments, he joined TU Denmark, where he was head of the Operations Management Department. Since 2010, he has been Professor of Production and Supply Chain Management at Technical University of Munich. Martin Grunow teaches MOOCs on Quality Engineering and Lean to more than 400,000 participants worldwide. He is head of the German Operations Research Society’s Supply Chain Management Section, a co-author of more than 50 publications in leading international research journals, and a co-author of a textbook on Advanced Planning Systems. He was an editor of three international research journals.
