
News

Updated 12 Jun 2025

Small Team, Big Breakthrough: Prof. Yang Hongxia’s Team Overcomes the Three Major Barriers to Large Model Fusion with InfiFusion Framework

With just 100 GPUs, scientists have dismantled the “three walls” of large model fusion, making it possible to build stronger models from any open-source foundation. This breakthrough comes from the team led by Prof. Hongxia Yang, and is being hailed as a milestone in scalable, efficient large model integration.

I. Breaking Through the “Three Walls” of Large Model Fusion

According to the team, early attempts at large model fusion in the AI community often revolved around naively “stitching together” the parameters of multiple models. However, this approach quickly ran into three major barriers:

1. Distillation mismatch caused by differing vocabularies across models.
2. Semantic noise resulting from conflicting styles among multiple teacher models.
3. Persistent concerns over values and safety even after capabilities were distilled.

To address these, the team introduced a three-part fusion strategy:

- InfiFusion tackles the vocabulary mismatch using Universal Logit Distillation (ULD) with Top-K selection and logit standardization, achieving stable and effective cross-vocabulary distillation at minimal computational cost.
- InfiGFusion recognizes that aligning probability distributions is not enough: teacher models often encode different “syntactic skeletons.” It treats logits as graphs and uses the Gromov-Wasserstein distance to perform structure-level alignment, resolving the second barrier.
- InfiFPO focuses on preference alignment in the final stage, using a modified RLHF (Reinforcement Learning from Human Feedback) framework. By introducing multi-source probability fusion, length normalization, and probability truncation, it ensures the resulting model is not only capable and coherent but also safe and aligned with human values.

“The trilogy of papers was designed to strengthen the three pillars of fusion: capability, structure, and value,” the team explained.

II. From “Reinforcing Foundations” to “Correcting Course”

Why were the three papers released in the order distillation → structure → preference, rather than bundled together? According to the team, this reflects a rhythm of reinforcing foundations before correcting course.

Initially, the team set out to fuse the strengths of three stylistically distinct teacher models (Qwen-Coder, Qwen-Instruct, and Mistral-Small) into a central model, Phi-4. Their first experiments revealed a major roadblock: vocabulary mismatches. The same Chinese idiom would be tokenized completely differently by each teacher, often into obscure suffix tokens.

They therefore focused first on the foundational distillation problem. In InfiFusion, they systematically swept the Top-K parameter and found that K = 10 captured almost all of the probability mass while minimizing gradient noise. They also applied Z-score standardization to the logits before distillation, allowing the student model to focus on relative rankings rather than absolute values. “These technical details may seem trivial, but they’re what turn a ‘working’ distillation into a robust one,” the team noted.

Once capability was firmly established, the next hurdle emerged: conflicting reasoning structures. In a multi-step reasoning task, for instance, one teacher model might filter sets before calculating values while another does the reverse; the probabilities aligned, but the solution paths clashed. InfiGFusion addressed this by modeling logits as graph structures and aligning them with the Gromov-Wasserstein distance, helping the student learn not just probabilities but reasoning chains.

With capability and structure integrated, they turned to preference alignment, a stage often ignored in model fusion. Existing techniques such as RLHF and DPO optimize outputs against human preference data, but they do not consider how to fuse preferences from multiple teacher models.
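The two distillation details the team credits for robustness, Top-K truncation and Z-score standardization of logits, can be sketched in a few lines of plain Python. This is a minimal illustration with our own function names, not the team's code:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def zscore(logits):
    """Standardize logits to zero mean and unit variance, so the
    student matches the teacher's relative rankings rather than
    its absolute logit scale."""
    mean = sum(logits) / len(logits)
    var = sum((x - mean) ** 2 for x in logits) / len(logits)
    std = math.sqrt(var) if var > 0 else 1.0
    return [(x - mean) / std for x in logits]

def topk_probs(logits, k=10):
    """Keep only the K largest probabilities before distillation;
    the team reports K = 10 captures almost all probability mass."""
    return sorted(softmax(logits), reverse=True)[:k]

# Example: a peaked next-token distribution keeps nearly all of its
# mass in the top 10 entries, so truncation loses little signal.
teacher = [9.0, 7.5, 6.0, 4.0] + [0.1] * 50
print(f"mass in top 10: {sum(topk_probs(teacher, k=10)):.3f}")
```

Truncating to the top K entries both cuts gradient noise from the long tail of near-zero probabilities and reduces the cost of the distillation loss, which is consistent with the team's account of why K = 10 worked well.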
To solve this, InfiFPO fuses probabilistic preferences from all teachers and applies length normalization and max-margin stabilization, yielding safer, better-aligned outputs. As a result, the fused Phi-4 model improved its aggregate score from 79.95 to 83.33.

“We didn’t split the trilogy just for show: each stage exposed new bottlenecks that informed the next step,” the team said. “Every improvement fed directly into the following phase.”

They also recalled the night they finalized the distillation loss function. After testing more than 20 loss variants, from temperature-scaled KL divergence to OT-based Wasserstein-KL hybrids, they realized the flashier methods could not scale because of memory and time constraints on large models. Ultimately, they returned to a more elegant and practical solution: the Universal Logit Distillation (ULD) loss, which converges faster than KL and boosts training speed by nearly 30% without increasing GPU memory usage.

III. Building a Fused Phi-4 in 20 Hours: Democratizing Model Fusion for SMEs

In practical terms, the team reports that on a single 8×H800 NVIDIA server, it took only 20 hours to transform Phi-4 into a fused version using their pipeline. On math reasoning tasks (GSM8K and MATH), the fused Phi-4 achieved 3% higher accuracy than the standalone InfiFusion model. In code generation, its pass rate improved by about 2%. In multi-turn instruction following, refusal rates dropped dramatically, from nearly 50% to under 10%. Most importantly, compute costs fell from millions of GPU hours to just a hundred, enabling smaller teams to integrate “expert collectives” into a single model deployable even on an 80 GB GPU.

Two main application routes have emerged:

- Vertical industries such as finance, healthcare, and law, which have proprietary expert models but need a unified generalist interface. The three-step fusion packs capability, structure, and values into one model without requiring shared weights.
- Small and medium enterprises (SMEs) with limited compute and annotation resources. With this pipeline, they can simply plug in open-source teacher models and a small amount of domain-specific data to obtain a “custom expert team.”

Looking ahead, the team aims to extend the approach beyond text models into vision and speech, enabling cross-modal fusion through the same streamlined pipeline. They are also working on tensor-level plug-and-play distillation to reduce inference costs to under 70% of the original model’s, making mobile deployment feasible.

Will “fusion” become a product? The answer is yes. Prof. Yang’s team has already developed a “Fuse-as-a-Service” middleware platform: users upload models and minimal domain data, and the system automatically runs the three-stage pipeline and returns a lightweight fused model. “We’re currently piloting with three industry partners and aiming for a public beta of PI next year,” the team told DeepTech.

In their view, the ultimate future of large models may lie not in training a single all-knowing behemoth but in fusing thousands of specialized experts into one unified force. “Our InfiFusion series is just the first brick laid,” they concluded. “The true path to infinite fusion still lies ahead.”
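For readers curious how a loss can compare models with different vocabularies at all, here is a minimal sketch in the spirit of the ULD loss mentioned above. It is our illustration, not the published formulation, which may pad and weight terms differently: the idea is to compare the sorted probability values of student and teacher, so token identities and vocabulary sizes never need to match.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def uld_loss(student_logits, teacher_logits, k=10):
    """Cross-vocabulary distillation loss in the spirit of ULD:
    sort each model's probabilities and compare the top-k values
    position by position (an L1, optimal-transport-style match).
    Because only sorted magnitudes are compared, the two models
    may use entirely different tokenizers and vocabulary sizes."""
    s = sorted(softmax(student_logits), reverse=True)
    t = sorted(softmax(teacher_logits), reverse=True)
    n = min(k, len(s), len(t))
    return sum(abs(a - b) for a, b in zip(s[:n], t[:n]))

# Example: vocabularies of different sizes are compared directly.
loss = uld_loss([5.0, 1.0], [7.0, 3.0, -2.0, -2.0])
print(f"ULD-style loss: {loss:.4f}")
```

When the student already matches the teacher's probability profile, the loss is zero regardless of which tokens carry the mass; mismatched profiles are penalized in proportion to the probability that would have to be moved.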

12 Jun, 2025

PAAI Research Results

Updated 28 May 2025

PAAI co-hosted the inaugural PolyU Master Lecture by Prof. Zhang Wenhong, Director of the National Medical Centre for Infectious Diseases.

The PolyU Academy for Artificial Intelligence (PAAI) co-hosted the inaugural PolyU Master Lecture by Prof. Zhang Wenhong, Director of the National Medical Centre for Infectious Diseases and Head of the Institute of Infection and Health at Fudan University. In his keynote speech, titled “The Race between Evolving Infectious Diseases and Human Technology,” Prof. Zhang shared insights on how medical innovation and technology can rapidly anticipate and counteract the challenges posed by the unpredictable progress of infectious diseases before the next pandemic emerges. Co-organised with the Department of Health Technology and Informatics (HTI) and the PolyU Academy for Interdisciplinary Research (PAIR), the event attracted approximately 450 participants, including PolyU faculty members, students, alumni, healthcare professionals and members of the public.

At the event, Prof. Jin-Guang TENG, PolyU President, expressed gratitude to Prof. Zhang for sharing his profound insights on the prevention and control of infectious diseases, which enriched participants’ understanding. He remarked, “During the COVID-19 pandemic, the virus genome monitoring system developed by a PolyU research team became a pivotal tool for the HKSAR Government’s precise pandemic response.”

Prof. Zhang, a globally respected expert in infectious disease control, currently serves in key leadership roles at Fudan University and has been appointed as an Honorary Professor at PolyU’s Department of Health Technology and Informatics. He also serves on the Expert Advisory Committee for the proposed medical school. Widely recognised for his contributions during the COVID-19 pandemic, he has received multiple national accolades for his work in infectious disease prevention and medical innovation.

The lecture concluded with a Q&A session moderated by Prof. YANG Hongxia, Executive Director of PAAI, and Prof. Gilman SIU of HTI. Prof. Zhang engaged in a lively exchange with the audience, sparking thoughtful discussion and inspiring all attendees.

28 May, 2025

PAAI Scholarly Engagement

Updated 30 Apr 2025

Prof. Hongxia Yang’s Project “Enhancing Edge-based Foundation Models for Advanced Reasoning” Approved under Cyberport Artificial Intelligence Subsidy Scheme

Professor Yang Hongxia, Associate Dean (Global Engagement) of the Faculty of Computer and Mathematical Sciences and Professor in the Department of Computing at The Hong Kong Polytechnic University (PolyU), has announced a major milestone in AI research and healthcare innovation.

Professor Yang’s team developed the project “Enhancing Edge-based Foundation Models for Advanced Reasoning,” which leverages the computing power of Cyberport’s Artificial Intelligence Supercomputing Centre (AISC). The project takes an innovative approach, integrating high-quality small language models to efficiently train large-scale models. This strategy significantly reduces dependency on centralized computing infrastructure while improving the accuracy of generated information by 28%.

The research has also made substantial progress in medical applications, particularly in cancer treatment. The team is collaborating closely with leading hospitals in Hong Kong and Mainland China, applying vertical large models and domain-specific models, in tandem with supercomputing resources, to enhance data analysis for oncology.

This AI-driven approach enables more accurate and localized treatment planning, effectively reducing the need for complex diagnostic tests and thus alleviating the physical and psychological burden on patients. It also brings substantial efficiency gains by saving human resources and reducing the time costs associated with clinical testing.

This development underscores PolyU’s commitment to interdisciplinary AI innovation with tangible societal impact, particularly in healthcare. The initiative exemplifies how edge AI and federated modeling can help democratize access to powerful AI tools while addressing real-world challenges.

30 Apr, 2025

PAAI Research Results

Updated 30 Nov 2024

Exploring Decentralized Artificial Intelligence: Advancing the Democratization of GenAI

With the rapid development of Generative AI (GenAI) technologies, such as Large Language Models (LLMs), Multimodal Large Language Models (MLLMs), and Stable Diffusion, AI is increasingly permeating and transforming industries including life sciences, energy, finance, and entertainment. These breakthroughs not only accelerate innovation and enable personalized services but also significantly improve workflow efficiency. According to market forecasts, the global GenAI market is expected to grow from USD 40 billion in 2022 to USD 1.3 trillion over the next decade.

Challenges to Widespread Adoption and Strategic Countermeasures

Despite its promise, the widespread adoption of GenAI faces substantial challenges. One of the most pressing is the concentration of GPU resources among major technology firms, which restricts the ability of research institutions and enterprises to develop their own models. Many organizations are forced to rely on API-based solutions, which introduce latency and security risks and limit model customizability. Although open-source models offer some flexibility, they are often not sufficiently adaptable to domain-specific knowledge, hindering deep engagement by researchers in the pretraining phase, a critical stage for creating powerful, domain-aligned models.

In response, The Hong Kong Polytechnic University is pioneering an innovative GenAI infrastructure that enables enterprises and applications to independently pretrain their own GenAI models. This is achieved through a novel “Model over Models” (MoM) methodology for building foundation models: global knowledge is divided into thousands of domains, and a relatively lightweight Small Language Model (SLM) is trained for each. These smaller models demand far fewer resources; a 7-billion-parameter model, for example, can be continually pretrained using just 64 to 128 GPUs. Eventually, these SLMs can be integrated via the MoM framework to construct affordable and scalable Artificial General Intelligence (AGI) models, significantly lowering barriers to entry and enabling global participation in foundation model development.

For more details, please visit: https://www.stheadline.com/knowledge/3406043/

30 Nov, 2024

PAAI Publicities

Updated 21 Nov 2024

Prof. Yang Hongxia, Director of PAAI: “Aims to Drive World-Class AI Innovation in the Greater Bay Area”

On November 19, the Boao Forum for Asia Youth Forum 2024 Hong Kong Conference was held at the Hong Kong Convention and Exhibition Centre under the theme “Leading the Future: The Role and Contribution of Youth.” Focusing on Artificial Intelligence (AI) and climate change, the event gathered over 30 experts and scholars from around the world. Through roundtable discussions and international youth dialogues, the forum aimed to advance Hong Kong’s AI industry and foster cross-border exchanges and collaboration among young people across Asia.

At the event, Professor Yang Hongxia, a renowned AI scientist and faculty member at The Hong Kong Polytechnic University, was interviewed by Guangzhou Daily. Professor Yang highlighted the promising prospects and robust talent pool for AI development in Hong Kong, and expressed her hope to collaborate with institutions and enterprises in the Guangdong-Hong Kong-Macao Greater Bay Area to achieve world-leading technological breakthroughs in AI.

Professor Yang previously served as Chief Data Scientist at Yahoo and has held senior research and leadership roles at IBM, Alibaba DAMO Academy, and ByteDance. She has authored over 100 publications in top-tier conferences and journals and holds more than 50 patents in the U.S. and China. She joined PolyU in July this year.

Commenting on her transition from industry to academia, Professor Yang candidly shared her motivation to lower the barriers to training large-scale AI models. “Training a large model often requires thousands of high-end GPUs over extended periods of time, resources far beyond the reach of universities and startups. Only tech giants have the capacity to do this,” she noted. Although more than one million professionals work in AI globally, fewer than 1,000 have access to the core processes of large model development. “Technology evolution inevitably moves from centralization to decentralization, just as early computers were massive machines and now everyone can own one. This is the direction I hope to see,” she emphasized.

Regarding her decision to relocate to Hong Kong, Professor Yang expressed strong confidence in the city’s future in AI development. She cited Hong Kong’s world-class education system, which nurtures high-caliber talent for the AI sector, as well as the government’s strong policy and financial support for AI initiatives. “Having recently participated in several project applications, I can say the support has been incredibly beneficial,” she remarked. She also praised policies such as the Top Talent Pass Scheme, which she believes will attract more top-tier AI professionals and scholars from around the world to work and conduct research in Hong Kong. “When talent and policy come together, it creates an ideal environment for AI development. This synergy is extremely valuable,” she added.

During her time in industry, Professor Yang had already collaborated extensively with leading universities in mainland China. Since relocating to Hong Kong, she has continued to foster academic partnerships, including with Southern University of Science and Technology (SUSTech) and local institutions such as The University of Hong Kong and The Hong Kong University of Science and Technology. “Looking ahead, we hope to work closely with schools and enterprises across the Greater Bay Area to achieve world-class outcomes in artificial intelligence research and innovation,” she concluded.

For more details, please visit: https://huacheng.gz-cmc.com/pages/2024/11/20/86858f7b766c4daf9422b1bac1e954de.html

21 Nov, 2024

PAAI Media Coverage
