News | PolyU Academy for Artificial Intelligence (PAAI)

PAAI co-hosted the 19th International Congress on Logistics and SCM Systems (ICLS 2025) on AI-empowered logistics and supply chain.

PAAI co-hosted the 19th International Congress on Logistics and SCM Systems (ICLS 2025) on AI-empowered logistics and supply chain. The conference featured a rich program, including panel discussions, workshops, and special sessions, with contributions from academia, industry, and management agencies. Attendees engaged in dynamic exchanges of ideas, fostering interdisciplinary collaboration to enhance the efficiency, resilience, and sustainability of global supply chains.

With its high-level discussions and networking opportunities, the conference attracted worldwide attention from scholars and industry talents, reinforcing Hong Kong’s role as a hub for innovation and international exchange. By bridging academia and industry, the event facilitated meaningful dialogue on AI-driven solutions for resilient and sustainable supply chains.

Prof. Ming Li, Assistant Director of PAAI, remarked, "This conference exemplified the power of AI in revolutionizing logistics and SCM. Through our partnership, we have strengthened the synergy between research and real-world applications, paving the way for smarter global supply chains."

The organizers extended their gratitude to all speakers, attendees, sponsors, and partners for making the event a milestone in the field. The congress concluded on a high note, setting the stage for future innovations.

9 Jul, 2025

Scholarly Engagement

Prof. Hongxia Yang's Team Unlocks SOTA Multimodal Math Reasoning with Small Language Models

In the field of artificial intelligence, large language models (LLMs) have made remarkable strides in reasoning capabilities. However, when these capabilities are extended to multimodal scenarios, where models must process both text and images, researchers face considerable challenges. These challenges are especially pronounced for small multimodal language models with limited parameter counts.

A research team led by Professor Hongxia Yang at The Hong Kong Polytechnic University has proposed a training framework called Infi-MMR, which uses a three-phase reinforcement learning strategy. The framework unlocks the multimodal reasoning potential of small language models, achieving state-of-the-art (SOTA) performance across several mathematical reasoning benchmarks and even surpassing some larger models.

The team's findings are detailed in their recent preprint titled “Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models”, now available on arXiv. The paper lists Zeyu Liu, a research assistant at The Hong Kong Polytechnic University, and Yuhang Liu, a master's student at Zhejiang University, as co-first authors. Professor Hongxia Yang is the corresponding author.

The team aims to extend rule-based reinforcement learning achievements from the text domain (such as those from DeepSeek-R1) to the multimodal domain, while addressing inherent challenges in multimodal reinforcement learning. Small language models (SLMs), due to their limited number of parameters, face three core challenges:

Low-Quality Multimodal Reasoning Data: Rule-based reinforcement learning requires verifiable answers. However, most multimodal tasks focus on image captioning, description, or visual question answering, which lack rigorous reasoning elements. Existing datasets rarely offer complex reasoning tasks paired with verifiable outputs.

Degradation of Core Reasoning Abilities: When multimodal LLMs integrate visual and textual data, they often compromise their core reasoning skills, a problem especially severe in smaller models. Moreover, the complexity of cross-modal fusion can disrupt structured reasoning, leading to reduced task performance.

Complex but Unreliable Reasoning Paths: When trained directly on multimodal data using reinforcement learning, models tend to generate overly complex and often inaccurate reasoning processes.

The Infi-MMR framework addresses these issues through a three-stage curriculum learning approach:

Stage 1, Foundational Reasoning Activation: Instead of using multimodal inputs directly, this phase uses high-quality textual reasoning data to activate the model's reasoning capabilities through reinforcement learning. This builds a solid logical reasoning foundation and mitigates the degradation seen in standard multimodal models.

Stage 2, Cross-Modal Reasoning Adaptation: With the foundation in place, this phase gradually transitions the model to the multimodal domain using question-answer pairs supplemented with explanatory textual information. This helps the model adapt its reasoning skills to handle multimodal inputs.

Stage 3, Multimodal Reasoning Enhancement: To simulate real-world multimodal scenarios, where image descriptions may be missing, this stage removes textual hints and trains the model to reason directly from raw visual inputs. This reduces linguistic bias and promotes robust multimodal reasoning.

Notably, the team introduced caption-augmented multimodal data, which helps the model transfer its text-based reasoning skills to multimodal contexts and enables more reliable cross-modal reasoning.

Using the Infi-MMR framework, the team fine-tuned Qwen2.5-VL-3B into Infi-MMR-3B, a small multimodal model focused on mathematical reasoning. The results are striking. On the MathVerse benchmark, which spans domains such as algebra and geometry, Infi-MMR-3B achieved 43.68% accuracy, outperforming models of the same scale and even surpassing some 8-billion-parameter models. On the MathVista benchmark, which assesses comprehensive reasoning ability, it achieved 67.2% accuracy, a 3.8% improvement over the baseline. Its MathVerse score is also higher than the 39.4% cited for the proprietary GPT-4o.

These achievements validate the effectiveness of the Infi-MMR framework and demonstrate the successful transfer of reasoning capabilities to the multimodal domain. The team emphasizes that while Infi-MMR-3B is tailored for mathematical reasoning, its core reasoning abilities are generalizable to other fields that require complex decision-making, such as education, healthcare, and autonomous driving. Looking ahead, the team will continue exploring ways to enhance reasoning in multimodal models, aiming to empower small models with robust and transferable reasoning capabilities.
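The three curriculum stages and the rule-based reward described above can be illustrated with a minimal Python sketch. This is not the team's implementation: the `Sample` fields, the `build_inputs` helper, and the convention that the final answer appears inside `\boxed{...}` are all assumptions made for illustration. The sketch only shows which inputs the model would see at each stage and how a verifiable answer allows a simple automatic reward.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """One training example (hypothetical schema, for illustration only)."""
    question: str
    answer: str                       # verifiable ground-truth answer
    image_path: Optional[str] = None  # None for text-only reasoning data
    caption: Optional[str] = None     # textual description of the image


def build_inputs(stage: int, sample: Sample) -> dict:
    """Select what the model sees at each curriculum stage."""
    if stage == 1:
        # Stage 1: text-only reasoning data, no visual input.
        return {"text": sample.question}
    if stage == 2:
        # Stage 2: image plus a caption that acts as an explanatory textual hint.
        return {"image": sample.image_path,
                "text": f"{sample.caption}\n{sample.question}"}
    if stage == 3:
        # Stage 3: caption removed, the model must reason from the raw image.
        return {"image": sample.image_path, "text": sample.question}
    raise ValueError(f"unknown stage: {stage}")


# Assumed answer format: the final answer is wrapped in \boxed{...}.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def rule_based_reward(response: str, gold: str) -> float:
    """Return 1.0 if the last boxed answer matches the gold answer, else 0.0."""
    matches = BOXED.findall(response)
    if not matches:
        return 0.0  # no verifiable answer was produced
    return 1.0 if matches[-1].strip() == gold.strip() else 0.0
```

Because the reward depends only on an exact-match check against a known answer, it needs no learned reward model, which is why verifiable data matters for this style of training.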

24 Jun, 2025

PAAI Research Achievement

Prof. Yang Hongxia Wins Prestigious Funding Support from the RAISe+ Scheme

We are proud to announce that Prof. YANG Hongxia, Executive Director of PolyU Academy for Artificial Intelligence and Associate Dean (Global Engagement) of Faculty of Computer and Mathematical Sciences, has been selected for funding support under the second batch of the Research, Academic and Industry Sectors One-plus (RAISe+) Scheme. This recognition underscores Prof. Yang’s exceptional research achievements and her leadership in advancing innovation in computing technologies.

Prof. Yang’s funded project, titled “Reallm: World-leading Enterprise GenAI Infrastructure Solution”, aims to develop a comprehensive Generative Artificial Intelligence (GenAI) infrastructure tailored for enterprise applications. The project will:

Establish a decentralised architecture for pretraining and post-training to support distributed model training frameworks;

Develop a domain-adaptive continual training system that optimises large language models using domain-specific unlabelled data, enabling seamless adaptation to target domain distributions;

Design a low-bit training framework that requires only half the computational and storage resources of traditional training, while still achieving high-quality, end-to-end training from pretraining to post-training, significantly lowering the entry barrier for enterprises.

Ultimately, the project will launch an enterprise-grade GenAI platform to facilitate cross-domain collaboration, offering services across Software-as-a-Service, Platform-as-a-Service, and Infrastructure-as-a-Service.

Inaugurated in 2023, the RAISe+ Scheme aims to provide funding, on a matching basis, for at least 100 research teams from universities funded by the University Grants Committee which demonstrate strong potential to evolve into successful startups. Each approved project will receive funding support ranging from HK$10 million to HK$100 million.
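The storage side of the "half the resources" claim can be made concrete with a small calculation. The announcement does not state which precisions are involved, so the sketch below assumes a move from 16-bit to 8-bit weights and a hypothetical 7-billion-parameter model. It counts weight storage only, not optimizer state or activations.

```python
def weight_bytes(num_params: int, bits_per_param: int) -> int:
    """Bytes needed to store the model weights alone at a given precision."""
    return num_params * bits_per_param // 8


PARAMS = 7_000_000_000  # hypothetical model size, not taken from the project

baseline = weight_bytes(PARAMS, 16)  # assumed conventional 16-bit weights
low_bit = weight_bytes(PARAMS, 8)    # assumed 8-bit low-bit weights

print(f"16-bit weights: {baseline / 2**30:.1f} GiB")
print(f" 8-bit weights: {low_bit / 2**30:.1f} GiB")
print(f"storage ratio:  {low_bit / baseline:.2f}")
```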

20 Jun, 2025

Funding & Donation

Small Team, Big Breakthrough: Prof. Yang Hongxia’s Team Overcomes the Three Major Barriers to Large Model Fusion with InfiFusion Framework

With just 100 GPUs, scientists have dismantled the “three walls” of large model fusion, making it possible to build stronger models from any open-source foundation. This breakthrough comes from the team led by Prof. Hongxia Yang, and is being hailed as a milestone in scalable, efficient large model integration.

I. Breaking Through the “Three Walls” of Large Model Fusion

According to the team, early attempts at large model fusion in the AI community often revolved around naïvely “stitching together” the parameters of multiple models. However, this approach quickly ran into three major barriers:

Distillation mismatch caused by differing vocabularies across models.

Semantic noise resulting from conflicting styles among multiple teacher models.

Persistent concerns over values and safety even after capabilities were distilled.

To address these, the team introduced a three-part fusion strategy:

InfiFusion: Tackles the vocabulary mismatch using Universal Logit Distillation (ULD) with Top-K selection and logit standardization, achieving stable and effective cross-vocabulary distillation with minimal computational cost.

InfiGFusion: Recognizes that aligning probability distributions is not enough, because teacher models often encode different “syntactic skeletons.” This method treats logits as graphs and uses the Gromov-Wasserstein distance to perform structure-level alignment, resolving the second barrier.

InfiFPO: Focuses on preference alignment in the final stage using a modified RLHF (Reinforcement Learning from Human Feedback) framework. By introducing multi-source probability fusion, length normalization, and probability truncation, it ensures the resulting model is not only capable and coherent but also safe and aligned with human values.

“The trilogy of papers was designed to strengthen the three pillars of fusion: capability, structure, and value,” the team explained.

II. From “Reinforcing Foundations” to “Correcting Course”

Why were the three papers released in the order of distillation → structure → preference, rather than bundled together? According to the team, this reflects the rhythm of reinforcing foundations before correcting course.

Initially, the team set out to fuse the strengths of three stylistically distinct teacher models (Qwen-Coder, Qwen-Instruct, and Mistral-Small) into a central model, Phi-4. But their first experiments revealed a major roadblock: vocabulary mismatches. The same Chinese idiom would be tokenized completely differently by each teacher, often using obscure suffix tokens.

They focused first on the foundational distillation problem. In InfiFusion, they systematically swept the Top-K parameter and found that K = 10 captured almost all probability mass while minimizing gradient noise. They also applied Z-score standardization to logits before distillation, allowing the student model to focus on relative rankings rather than absolute values. “These technical details may seem trivial, but they’re what turn a ‘working’ distillation into a robust one,” the team noted.

Once capability was firmly established, the next hurdle emerged: conflicting reasoning structures. For instance, in a multi-step reasoning task, one teacher model might filter sets first before calculating values, while another does the reverse. Though probabilities aligned, the solution paths clashed. InfiGFusion addressed this by modeling logits as graph structures and aligning them using Gromov-Wasserstein distance, helping the student learn not just probabilities but reasoning chains.

With capability and structure integrated, they turned to preference alignment, a stage often ignored in model fusion. Existing techniques like RLHF and DPO focus on optimizing outputs using human preference data but don’t consider how to fuse preferences from multiple teacher models. To solve this, InfiFPO fuses probabilistic preferences from all teachers, applies length normalization and max-margin stabilization, and achieves safer, more aligned outputs. As a result, the fused Phi-4 model improved its aggregate score from 79.95 to 83.33.

“We didn’t split the trilogy just for show. Each stage exposed new bottlenecks that informed the next step,” the team said. “Every improvement fed directly into the following phase.”

They also recalled the night they finalized the distillation loss function. After testing over 20 loss variants, from temperature-scaled KL divergence to OT-based Wasserstein-KL hybrids, they realized the flashy methods couldn’t scale due to memory and time constraints on large models. Ultimately, they returned to a more elegant and practical solution: Universal Logit Distillation (ULD) loss, which converges faster than KL and boosts training speed by nearly 30%, without increasing GPU memory usage.

III. Building a Fused Phi-4 in 20 Hours: Democratizing Model Fusion for SMEs

In practical terms, the team reports that using an 8×H800 NVIDIA server, it took only 20 hours to transform Phi-4 into a fused version using their pipeline. On math reasoning tasks (GSM8K and MATH), the fused Phi-4 achieved 3% higher accuracy than the standalone InfiFusion model. In code generation, its pass rate improved by about 2%. In multi-turn instruction following, refusal rates dropped dramatically, from nearly 50% to under 10%. Most importantly, compute costs fell from millions of GPU hours to just a hundred, enabling smaller teams to integrate “expert collectives” into a single model deployable even on an 80GB GPU.

Two main application routes have emerged:

Vertical industries like finance, healthcare, and law, which have proprietary expert models but need a unified generalist interface. The three-step fusion packs capability, structure, and values without requiring shared weights.

Small and medium enterprises (SMEs) with limited compute and annotation resources. With this pipeline, they can simply plug in open-source teacher models and a small amount of domain-specific data to obtain a “custom expert team.”

Looking ahead, the team aims to extend this approach beyond text models into vision and speech, allowing cross-modal fusion through the same streamlined pipeline. They are also working on tensor-level plug-and-play distillation, reducing inference costs to under 70% of the original model, which would make mobile deployment feasible.

Will “fusion” become a product? The answer is yes. Prof. Yang’s team has already developed a “Fuse-as-a-Service” middleware platform, where users can upload models and minimal domain data, and the system automatically completes the three-stage pipeline, returning a lightweight fused model. “We’re currently piloting with three industry partners and aiming for a public beta of PI next year,” the team told DeepTech.

In their view, the ultimate future of large models may not lie in training a single all-knowing behemoth but in fusing thousands of specialized experts into one unified force. “Our InfiFusion series is just the first brick laid,” they concluded. “The true path to infinite fusion still lies ahead.”
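The distillation step described for InfiFusion (Top-K selection, Z-score standardization of logits, and a ULD-style loss that works across different vocabularies) can be sketched in plain Python. This is an illustrative reading, not the team's code. It assumes the common formulation of ULD in which teacher and student probabilities are sorted in descending order, zero-padded to equal length, and compared with an L1 distance, and it assumes standardization is applied to the Top-K logits. All function names are hypothetical.

```python
import math
from typing import List


def zscore(logits: List[float]) -> List[float]:
    """Standardize logits so only their relative spacing matters."""
    mean = sum(logits) / len(logits)
    var = sum((x - mean) ** 2 for x in logits) / len(logits)
    std = math.sqrt(var) or 1.0  # guard against identical logits
    return [(x - mean) / std for x in logits]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_sorted_probs(logits: List[float], k: int = 10) -> List[float]:
    """Keep the K largest logits, standardize them, and return
    probabilities in descending order (token identity is discarded)."""
    top = sorted(logits, reverse=True)[:k]
    return softmax(zscore(top))


def uld_style_loss(teacher_logits: List[float],
                   student_logits: List[float],
                   k: int = 10) -> float:
    """L1 distance between sorted, zero-padded probability vectors.

    Because only sorted probabilities are compared, the teacher and
    student may use different vocabularies of different sizes.
    """
    t = top_k_sorted_probs(teacher_logits, k)
    s = top_k_sorted_probs(student_logits, k)
    width = max(len(t), len(s))
    t += [0.0] * (width - len(t))
    s += [0.0] * (width - len(s))
    return sum(abs(a - b) for a, b in zip(t, s))
```

Under these assumptions the loss is unaffected by shifting or rescaling either model's logits, which matches the article's point that standardization lets the student focus on relative rankings rather than absolute values.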

12 Jun, 2025

PAAI Research Achievement

PAAI co-hosted the inaugural PolyU Master Lecture by Prof. Zhang Wenhong, Director of the National Medical Centre for Infectious Diseases.

The PolyU Academy for Artificial Intelligence (PAAI) co-hosted the inaugural PolyU Master Lecture by Prof. Zhang Wenhong, Director of the National Medical Centre for Infectious Diseases and Head of the Institute of Infection and Health at Fudan University. In his keynote speech, titled “The Race between Evolving Infectious Diseases and Human Technology,” Prof. Zhang shared insights on how medical innovation and technology can rapidly anticipate and counteract the challenges posed by the unpredictable progress of infectious diseases before the next pandemic emerges.

Co-organised with the Department of Health Technology and Informatics (HTI) and the PolyU Academy for Interdisciplinary Research (PAIR), the event attracted approximately 450 participants, including PolyU faculty members, students, alumni, healthcare professionals and members of the public.

At the event, Prof. Jin-Guang TENG, PolyU President, expressed gratitude to Prof. Zhang for sharing his profound insights on the prevention and control of infectious diseases, which enriched participants’ understanding. He remarked, “During the COVID-19 pandemic, the virus genome monitoring system developed by a PolyU research team became a pivotal tool for the HKSAR Government’s precise pandemic response.”

Prof. Zhang, a globally respected expert in infectious disease control, currently serves in key leadership roles at Fudan University and has been appointed as an Honorary Professor at PolyU’s Department of Health Technology and Informatics. He also serves on the Expert Advisory Committee for the proposed medical school. Widely recognised for his contributions during the COVID-19 pandemic, Prof. Zhang has received multiple national accolades for his work in infectious disease prevention and medical innovation.

The lecture concluded with a Q&A session moderated by Prof. YANG Hongxia, Executive Director of PAAI, and Prof. Gilman SIU of HTI. Prof. Zhang engaged in a lively exchange with the audience, sparking thoughtful discussion and inspiring all attendees.

28 May, 2025

Scholarly Engagement

Prof. Hongxia Yang’s Project “Enhancing Edge-based Foundation Models for Advanced Reasoning” Approved under Cyberport Artificial Intelligence Subsidy Scheme

Professor Yang Hongxia, Associate Dean (Global Engagement) of the Faculty of Computer and Mathematical Sciences and Professor in the Department of Computing at The Hong Kong Polytechnic University (PolyU), has announced a major milestone in AI research and healthcare innovation. Professor Yang's team has developed the project “Enhancing Edge-based Foundation Models for Advanced Reasoning,” which leverages the computing power of Cyberport’s Artificial Intelligence Supercomputing Centre (AISC). The project adopts an innovative approach by integrating high-quality small language models to efficiently train large-scale models. This strategy significantly reduces dependency on centralized computing infrastructure while improving the accuracy of generated information by 28%. The research has also made substantial progress in medical application scenarios, particularly in the field of cancer treatment. The team is currently working in close collaboration with leading hospitals in Hong Kong and Mainland China, applying vertical large models and domain-specific models in tandem with supercomputing resources to enhance data analysis for oncology. This AI-driven approach enables more accurate and localized treatment planning, effectively reducing the need for complex diagnostic tests, and thus alleviating the physical and psychological burden on patients. Furthermore, it brings substantial efficiency gains by saving human resources and reducing time costs associated with clinical testing. This development underscores PolyU’s commitment to interdisciplinary AI innovation with tangible societal impact, particularly in healthcare. The initiative exemplifies how edge AI and federated modeling can help democratize access to powerful AI tools while addressing real-world challenges.

30 Apr, 2025

PAAI Research Achievement

Exploring Decentralized Artificial Intelligence: Advancing the Democratization of GenAI

With the rapid development of Generative AI (GenAI) technologies, such as Large Language Models (LLMs), Multimodal Large Language Models (MLLMs), and Stable Diffusion, AI is increasingly permeating and transforming various industries, including life sciences, energy, finance, and entertainment. These technological breakthroughs not only accelerate innovation and enable personalized services but also significantly improve the efficiency of workflows. According to market forecasts, the global GenAI market is expected to grow from USD 40 billion in 2022 to USD 1.3 trillion over the next decade.

Challenges to Widespread Adoption and Strategic Countermeasures

Despite its promise, the widespread adoption of GenAI faces substantial challenges. One of the most pressing issues is the concentration of GPU resources among major technology firms, which restricts the capabilities of research institutions and enterprises in developing their own models. Many organizations are forced to rely on API-based solutions, which not only introduce latency and security risks but also limit the customizability of models. Although open-source models offer some flexibility, they are often not sufficiently adaptable to domain-specific knowledge, hindering deep engagement by researchers in the pretraining phase, a critical stage for creating powerful and domain-aligned models.

In response, The Hong Kong Polytechnic University is pioneering an innovative GenAI infrastructure that enables enterprises and applications to independently pretrain their own GenAI models. This is achieved through a novel "Model over Models" (MoM) methodology to build foundation models. Specifically, global knowledge is divided into thousands of domains, with relatively lightweight Small Language Models (SLMs) trained for each. These smaller models demand far fewer resources: for example, a 7-billion-parameter model can be continually pretrained using just 64 to 128 GPUs. Eventually, these SLMs can be integrated via the MoM framework to construct affordable and scalable Artificial General Intelligence (AGI) models, significantly lowering barriers to entry and enabling global participation in foundation model development.

For more details, please visit: https://www.stheadline.com/knowledge/3406043/
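As a quick sanity check on the market forecast quoted above, the implied compound annual growth rate can be computed directly. The sketch assumes "over the next decade" means exactly ten years from the 2022 figure, which the article does not state; the function name is illustrative.

```python
def implied_cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate implied by a start value, end value, and horizon."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# USD 40 billion growing to USD 1.3 trillion, assumed to take 10 years.
rate = implied_cagr(40e9, 1.3e12, 10)
print(f"implied annual growth: {rate:.1%}")
```

Under that assumption the forecast corresponds to growth of roughly 42% per year, sustained for a decade.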

30 Nov, 2024

Event & Publicity

Prof. Yang Hongxia, Director of PAAI: “Aims to Drive World-Class AI Innovation in the Greater Bay Area”

On November 19, the Boao Forum for Asia Youth Forum 2024 Hong Kong Conference was held at the Hong Kong Convention and Exhibition Centre under the theme "Leading the Future: The Role and Contribution of Youth." Focusing on Artificial Intelligence (AI) and climate change, the event gathered over 30 experts and scholars from around the world. Through roundtable discussions and international youth dialogues, the forum aimed to advance Hong Kong’s AI industry and foster cross-border exchanges and collaboration among young people across Asia.

At the event, Professor Yang Hongxia, a renowned AI scientist and faculty member at The Hong Kong Polytechnic University, was interviewed by Guangzhou Daily. Professor Yang highlighted the promising prospects and robust talent pool for AI development in Hong Kong. She expressed her hope to collaborate with institutions and enterprises in the Guangdong-Hong Kong-Macao Greater Bay Area to achieve world-leading technological breakthroughs in AI.

Professor Yang previously served as Chief Data Scientist at Yahoo and has held senior research and leadership roles at IBM, Alibaba DAMO Academy, and ByteDance. She has authored over 100 publications in top-tier conferences and journals, and holds more than 50 patents in the U.S. and China. She joined PolyU in July 2024.

Commenting on her transition from industry to academia, Professor Yang candidly shared her motivation to address the barriers to entry in training large-scale AI models. “Training a large model often requires thousands of high-end GPUs over extended periods of time, resources that are far beyond the reach of universities and startups. Only tech giants have the capacity to do this,” she noted. Although more than one million professionals work in the AI field globally, fewer than 1,000 have access to the core processes of large model development. “Technology evolution inevitably moves from centralization to decentralization, just like how early computers were massive machines, and now everyone can own one. This is the direction I hope to see,” she emphasized.

Regarding her decision to relocate to Hong Kong, Professor Yang expressed strong confidence in the city’s future in AI development. She cited Hong Kong’s world-class education system, which nurtures high-caliber talent for the AI sector, as well as the government’s strong policy and financial support for AI initiatives. “Having recently participated in several project applications, I can say the support has been incredibly beneficial,” she remarked. She also praised policies such as the Top Talent Pass Scheme, which she believes will attract more top-tier AI professionals and scholars from around the world to work and conduct research in Hong Kong. “When talent and policy come together, it creates an ideal environment for AI development. This synergy is extremely valuable,” she added.

During her time in industry, Professor Yang had already collaborated extensively with leading universities in mainland China. Since relocating to Hong Kong, she has continued to foster academic partnerships, including with Southern University of Science and Technology (SUSTech) and local institutions such as The University of Hong Kong and The Hong Kong University of Science and Technology. “Looking ahead, we hope to work closely with schools and enterprises across the Greater Bay Area to achieve world-class outcomes in artificial intelligence research and innovation,” she concluded.

For more details, please visit: https://huacheng.gz-cmc.com/pages/2024/11/20/86858f7b766c4daf9422b1bac1e954de.html

21 Nov, 2024

Media Coverage