
From Audio Reasoning to Emotionally Expressive Generation: Toward Next-Generation Audio Intelligence

  • Date

    31 Mar 2026

  • Organiser

    Department of Land Surveying and Geo-Informatics (LSGI) & Research Institute for Land and Space (RILS)

  • Time

    14:30 - 15:30

  • Venue

    N002

Speaker

Prof. Li LIU

Remarks

Prof. Xintao LIU, Associate Professor, LSGI, member of RILS

Summary

Recent progress in large audio-language models has greatly improved audio perception, yet two important capabilities remain underexplored: deep reasoning over complex acoustic scenes and fine-grained emotionally expressive audio generation. These capabilities are crucial for building next-generation audio AI systems that can not only recognize sounds, but also interpret, explain, and generate them in a more human-like way.

In this talk, I will present our recent work along these two directions. On the understanding side, I will introduce AudioDeepThinker, a reinforcement learning framework for audio-grounded chain-of-thought reasoning, and AsymAudio, a capability-aware multi-agent framework for deep audio reasoning. On the generation side, I will present EmoSteer-TTS, a training-free method for fine-grained and continuous emotion control in text-to-speech. I will also briefly introduce PhyAVBench, a benchmark for evaluating the physical grounding ability of audio-visual generation models. Together, these works point toward a new generation of audio AI systems that can reason over complex auditory scenes and generate expressive, controllable, and physically grounded audio.


Keynote Speaker

Prof. Li LIU

Assistant Professor

Hong Kong University of Science and Technology (Guangzhou) 

Li LIU is an Assistant Professor at the Hong Kong University of Science and Technology (Guangzhou). She received her Ph.D. from Université Grenoble Alpes (GIPSA-lab), France, and was previously a postdoctoral researcher at Ryerson University, Canada. Her research focuses on speech processing, audio-visual understanding and generation, and trustworthy AI. She has published more than 70 papers as first or corresponding author in leading journals and conferences, including IEEE TPAMI, IEEE TMM, IEEE TASLP, and NeurIPS. She currently serves as Chair of the Member Nominations and Election Subcommittee of the IEEE Machine Learning for Signal Processing Technical Committee (MLSP TC), and has served as Local Chair for ICASSP 2022 (China site) and Area Chair for ICASSP 2024-2026. She has led multiple competitive research projects, including the NSFC General Program, the NSFC Young Scientist Fund, an NSFC Key Project subtask, the CCF-Tencent Rhino-Bird Research Fund, the CCF-Kuaishou Large Model Explorer Fund, and the Alibaba Innovation Research Program. Her work has received multiple honors, including the IEEE Multimedia Signal Processing Rising Star Runner-up Award, the Sephora Berribi Women in Science Award (France), and the CCF-Tencent Rhino-Bird Research Excellence Award. Her team won First Place in the Audio Reasoning Challenge (Single-Model Track) at Interspeech 2026.
