From Audio Reasoning to Emotionally Expressive Generation: Toward Next-Generation Audio Intelligence
-
Date
31 Mar 2026
-
Organiser
Department of Land Surveying and Geo-Informatics (LSGI) & Research Institute for Land and Space (RILS)
-
Time
14:30 - 15:30
-
Venue
N002
Speaker
Prof. Li LIU
Remarks
Prof. Xintao LIU, Associate Professor, LSGI, member of RILS
Summary
Recent progress in large audio-language models has greatly improved audio perception, yet two important capabilities remain underexplored: deep reasoning over complex acoustic scenes and fine-grained emotionally expressive audio generation. These capabilities are crucial for building next-generation audio AI systems that can not only recognize sounds, but also interpret, explain, and generate them in a more human-like way.
In this talk, I will present our recent work along these two directions. On the understanding side, I will introduce AudioDeepThinker, a reinforcement learning framework for audio-grounded chain-of-thought reasoning, and AsymAudio, a capability-aware multi-agent framework for deep audio reasoning. On the generation side, I will present EmoSteer-TTS, a training-free method for fine-grained and continuous emotion control in text-to-speech. I will also briefly introduce PhyAVBench, a benchmark for evaluating the physical grounding ability of audio-visual generation models. Together, these works point toward a new generation of audio AI systems that can reason over complex auditory scenes and generate expressive, controllable, and physically grounded audio.
Keynote Speaker
Prof. Li LIU
Assistant Professor
Hong Kong University of Science and Technology (Guangzhou)
Li LIU is an Assistant Professor at the Hong Kong University of Science and Technology (Guangzhou). She received her Ph.D. from Université Grenoble Alpes (GIPSA-lab), France, and was previously a postdoctoral researcher at Ryerson University, Canada. Her research focuses on speech processing, audio-visual understanding and generation, and trustworthy AI. She has published more than 70 papers as first or corresponding author in leading journals and conferences, including IEEE TPAMI, IEEE TMM, IEEE TASLP, and NeurIPS. She currently serves as Chair of the Member Nominations and Election Subcommittee of the IEEE Machine Learning for Signal Processing Technical Committee (MLSP TC), and has served as Local Chair for ICASSP 2022 (China site) and Area Chair for ICASSP 2024-2026. She has led multiple competitive research projects, including the NSFC General Program, NSFC Young Scientist Fund, an NSFC Key Project subtask, the CCF-Tencent Rhino-Bird Research Fund, the CCF-Kuaishou Large Model Explorer Fund, and the Alibaba Innovation Research Program. Her work has received multiple honors, including the IEEE Multimedia Signal Processing Rising Star Runner-up Award, the Sephora Berrebi Women in Science Award (France), and the CCF-Tencent Rhino-Bird Research Excellence Award. Her team won First Place in the Audio Reasoning Challenge (Single-Model Track) at Interspeech 2026.