
From Audio Reasoning to Emotionally Expressive Generation: Toward Next-Generation Audio Intelligence

  • Date

    31 Mar 2026

  • Organiser

    Department of Land Surveying and Geo-Informatics (LSGI) & Research Institute for Land and Space (RILS)

  • Time

    14:30 - 15:30

  • Venue

    N002

Speaker

Prof. Li LIU

Remarks

Prof. Xintao LIU, Associate Professor, LSGI, member of RILS

Summary

Recent progress in large audio-language models has greatly improved audio perception, yet two important capabilities remain underexplored: deep reasoning over complex acoustic scenes and fine-grained emotionally expressive audio generation. These capabilities are crucial for building next-generation audio AI systems that can not only recognize sounds, but also interpret, explain, and generate them in a more human-like way.

In this talk, I will present our recent work along these two directions. On the understanding side, I will introduce AudioDeepThinker, a reinforcement learning framework for audio-grounded chain-of-thought reasoning, and AsymAudio, a capability-aware multi-agent framework for deep audio reasoning. On the generation side, I will present EmoSteer-TTS, a training-free method for fine-grained and continuous emotion control in text-to-speech. I will also briefly introduce PhyAVBench, a benchmark for evaluating the physical grounding ability of audio-visual generation models. Together, these works point toward a new generation of audio AI systems that can reason over complex auditory scenes and generate expressive, controllable, and physically grounded audio.


Keynote Speaker

Prof. Li LIU

Assistant Professor

Hong Kong University of Science and Technology (Guangzhou) 

Li LIU is an Assistant Professor at the Hong Kong University of Science and Technology (Guangzhou). She received her Ph.D. from Université Grenoble Alpes (GIPSA-lab), France, and was previously a postdoctoral researcher at Ryerson University, Canada. Her research focuses on speech processing, audio-visual understanding and generation, and trustworthy AI. She has published more than 70 papers as first or corresponding author in leading journals and conferences, including IEEE TPAMI, IEEE TMM, IEEE TASLP, and NeurIPS. She currently serves as Chair of the Member Nominations and Election Subcommittee of the IEEE Machine Learning for Signal Processing Technical Committee (MLSP TC), and has served as Local Chair for ICASSP 2022 (China site) and Area Chair for ICASSP 2024-2026. She has led multiple competitive research projects, including the NSFC General Program, the NSFC Young Scientist Fund, an NSFC Key Project subtask, the CCF-Tencent Rhino-Bird Research Fund, the CCF-Kuaishou Large Model Explorer Fund, and the Alibaba Innovation Research Program. Her work has received multiple honors, including the IEEE Multimedia Signal Processing Rising Star Runner-up Award, the Sephora Berribi Women in Science Award (France), and the CCF-Tencent Rhino-Bird Research Excellence Award. Her team won First Place in the Audio Reasoning Challenge (Single-Model Track) at Interspeech 2026.
