PolyU Gen AI app updates on 1 June 2026 | Information Technology Services Office

GPT-5.5 (2026-04-24)

OpenAI released GPT-5.5 on 24 April 2026. GPT‑5.5 excels at writing and debugging code, conducting online research, analyzing data, creating documents and spreadsheets, and moving across agentic tools until a task is completed.

GPT-5.5 supports a context size of up to 1 million tokens, excels in complex reasoning and long-context analysis, and delivers higher-quality outputs with fewer retries and thus fewer tokens. The model’s training data covers information up to December 2025.

The PolyU Gen AI app released GPT-5.5 (2026-04-24) on 1 June 2026, replacing GPT-5.1 (2025-11-13).

Hosted on Azure Cloud and accessed via the PolyU GenAI app, GPT-5.5 consumes:

5 credits per 1K tokens for text or image input
30 credits per 1K tokens for text output, including chain-of-thought tokens
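As a rough illustration of the rates above, the following minimal sketch estimates the credits consumed by one GPT-5.5 request. It assumes billing is proportional to the token count (no rounding up to whole 1K blocks) and that chain-of-thought tokens are counted as output tokens; the function name and the example token counts are hypothetical.

```python
# Rates taken from the announcement (credits per 1K tokens).
GPT55_INPUT_RATE = 5.0
GPT55_OUTPUT_RATE = 30.0


def gpt55_credits(input_tokens: int, output_tokens: int) -> float:
    """Estimate credits for one request.

    Assumption: cost scales linearly with tokens, and output_tokens
    already includes any chain-of-thought tokens.
    """
    input_cost = input_tokens / 1000 * GPT55_INPUT_RATE
    output_cost = output_tokens / 1000 * GPT55_OUTPUT_RATE
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: 2,000 input tokens and 500 output tokens.
    print(gpt55_credits(2000, 500))
```

Because output tokens cost six times as much as input tokens, a lower reasoning effort setting can noticeably reduce the credits used for simple questions.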

Staff and students can leverage the “reasoning effort” parameter to control the extent of reasoning (inference looping) performed by the GPT-5.5 model. Additionally, the “Reasoning summary” feature is enabled for all users to display summaries of the model's chain-of-thought reasoning.

For more details on GPT-5.5, please refer to Introducing GPT‑5.5 and OpenAI's GPT-5.5 in Microsoft Foundry.

Gemini 3.5 Flash (GA)

Google Cloud released Gemini 3.5, its latest family of models combining frontier intelligence with action, on 19 May 2026.

Gemini 3.5 Flash is designed for the agentic era. It excels at sub-agent deployment, multi-step workflows, and long-horizon tasks at scale. This model is particularly effective for rapid agentic loops involving complex coding cycles and iterations. Its training data covers information up to January 2025.

The PolyU Gen AI app released the Gemini 3.5 Flash model on 1 June 2026, replacing Gemini 3 Flash Preview.

Hosted on Google Cloud and accessed via the PolyU GenAI app, Gemini 3.5 Flash consumes:

1.5 credits per 1K tokens for text or image input
9 credits per 1K tokens for text output, including chain-of-thought tokens

Staff and students can leverage the “reasoning effort” parameter to control the extent of reasoning (inference looping) carried out by the Gemini 3.5 Flash model. The “Reasoning summary” feature is also enabled for all users to display summaries of the model's chain-of-thought reasoning.

In addition, the Gemini 3.5 Flash model costs 1.68 credits per image input, regardless of image size.
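To show how the per-token and per-image rates might combine, here is a minimal sketch for one Gemini 3.5 Flash request. It assumes that each attached image is charged a flat 1.68 credits on top of the token-based charges, and that output tokens include chain-of-thought tokens; the announcement does not spell out how the two image rates interact, so treat the formula, the function name, and the example figures as assumptions.

```python
# Rates taken from the announcement.
GEMINI_INPUT_RATE = 1.5    # credits per 1K input tokens
GEMINI_OUTPUT_RATE = 9.0   # credits per 1K output tokens
GEMINI_IMAGE_RATE = 1.68   # credits per image, regardless of size


def gemini_flash_credits(input_tokens: int, output_tokens: int, images: int = 0) -> float:
    """Estimate credits for one request (assumed additive billing)."""
    token_cost = (input_tokens / 1000 * GEMINI_INPUT_RATE
                  + output_tokens / 1000 * GEMINI_OUTPUT_RATE)
    image_cost = images * GEMINI_IMAGE_RATE
    # Round to two decimals to avoid floating-point noise in the display.
    return round(token_cost + image_cost, 2)


if __name__ == "__main__":
    # Hypothetical request: 4,000 input tokens, 1,000 output tokens, 2 images.
    print(gemini_flash_credits(4000, 1000, images=2))
```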

For more details, please refer to the Introduction to Gemini 3.5 Flash, Gemini 3.5 Flash Official Blog and the Gemini 3.5 Flash Model Card.

DeepSeek-V4-Pro & DeepSeek-V4-Flash

DeepSeek AI introduced DeepSeek-V4-Pro & DeepSeek-V4-Flash on 24 April 2026, supporting a context length of one million tokens.

The DeepSeek-V4 series incorporates several key upgrades in architecture and optimization:

Hybrid Attention Architecture:
A hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) is designed to significantly improve long-context efficiency. In the 1M-token context setting, DeepSeek-V4-Pro requires only 27% of the single-token inference FLOPs and 10% of the KV cache compared with DeepSeek-V3.2.
Manifold-Constrained Hyper-Connections (mHC):
mHC allows the model to deeply decouple memory and compute, meaning that it can keep large and varied amounts of information active in parallel without destabilizing the training loss.
Muon Optimizer:
The Muon optimizer is employed to enable faster convergence and greater training stability.
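The efficiency figures quoted for the hybrid attention architecture can be read as reduction factors. The short sketch below converts the stated ratios (27% of the FLOPs, 10% of the KV cache, relative to DeepSeek-V3.2 at a 1M-token context) into "times smaller" factors. The 100 GB baseline is a made-up number used only to illustrate the arithmetic, not a measured value.

```python
# Ratios quoted in the announcement (DeepSeek-V4-Pro relative to DeepSeek-V3.2).
FLOPS_RATIO = 0.27      # single-token inference FLOPs
KV_CACHE_RATIO = 0.10   # KV cache size


def reduction_factor(ratio: float) -> float:
    """Convert a 'requires only X of the baseline' ratio into a reduction factor."""
    return 1.0 / ratio


def scaled_kv_cache(baseline_gb: float) -> float:
    """KV cache size implied by the quoted ratio for a given baseline size."""
    return baseline_gb * KV_CACHE_RATIO


if __name__ == "__main__":
    print(f"FLOPs reduced by about {reduction_factor(FLOPS_RATIO):.1f}x")
    print(f"KV cache reduced by about {reduction_factor(KV_CACHE_RATIO):.0f}x")
    # Hypothetical baseline of 100 GB, for illustration only.
    print(f"Illustrative KV cache: {scaled_kv_cache(100.0):.0f} GB")
```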

The PolyU GenAI app introduced DeepSeek-V4-Pro and DeepSeek-V4-Flash on 1 June 2026, replacing DeepSeek-V3.2.

The DeepSeek-V4-Pro service, powered by Alibaba Cloud Bailian, incurs a usage fee of:

1.74 credits per 1,000 tokens for input
3.48 credits per 1,000 tokens for output plus the cost of chain-of-thought tokens

Users may choose to enable or disable the thinking mode of DeepSeek-V4-Pro and DeepSeek-V4-Flash hosted on Alibaba Cloud Bailian.

Thinking Mode (Max): The model takes time to reason step by step before delivering the final answer. The mode is ideal for complex problems requiring deeper thought.
Non-Thinking Mode: The model provides quick, near-instant responses, making it suitable for simpler questions where speed is more important than depth.
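The choice of mode also affects credit usage, because thinking mode generates chain-of-thought tokens before the final answer. The sketch below compares the two modes for DeepSeek-V4-Pro using the rates stated above. It assumes chain-of-thought tokens are billed at the same rate as output tokens; the function name and the token counts are hypothetical.

```python
# DeepSeek-V4-Pro rates taken from the announcement (credits per 1,000 tokens).
V4_PRO_INPUT_RATE = 1.74
V4_PRO_OUTPUT_RATE = 3.48


def v4_pro_credits(input_tokens: int, answer_tokens: int, reasoning_tokens: int = 0) -> float:
    """Estimate credits for one request.

    Assumption: reasoning (chain-of-thought) tokens are charged at the
    output rate; reasoning_tokens is 0 in non-thinking mode.
    """
    billed_output = answer_tokens + reasoning_tokens
    cost = (input_tokens / 1000 * V4_PRO_INPUT_RATE
            + billed_output / 1000 * V4_PRO_OUTPUT_RATE)
    return round(cost, 2)


if __name__ == "__main__":
    # Hypothetical request: 1,000 input tokens and a 1,000-token answer.
    print("non-thinking:", v4_pro_credits(1000, 1000))
    # Same request with a hypothetical 3,000 reasoning tokens in thinking mode.
    print("thinking:", v4_pro_credits(1000, 1000, reasoning_tokens=3000))
```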

For more information, please refer to the DeepSeek-V4-Pro Official Repository, DeepSeek-V4-Flash Official Repository, and DeepSeek-V4 Technical Report.

Prompt Enhancer & Guardrail Layer for Stable Image Ultra

A new Text-to-Image Prompt Enhancer has also been introduced for Stable Image Ultra.

This feature is designed to improve the quality, clarity, and usability of user-provided prompts before they are sent to an image generation model. Its primary goal is to preserve the user’s original intent while transforming brief, incomplete, or ambiguous input into a more structured and visually descriptive prompt that can produce stronger and more consistent image outputs.

In practice, the enhancer helps by:

identifying the core subject and required elements of the request
enriching the scene with concrete visual details such as composition, lighting, materials, atmosphere, and spatial relationships
ensuring that any text to appear inside the image is described explicitly and accurately
applying safety and compliance controls so that inappropriate, harmful, or disallowed content is filtered or reformulated into acceptable alternatives
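The four steps above can be pictured with a small toy sketch. This is purely illustrative: the announcement does not describe how the enhancer is implemented, so the class, the function, the fixed visual details, and the word-replacement table below are all hypothetical stand-ins for whatever the real feature does.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical reformulation table; the real safety rules are not published here.
SAFE_ALTERNATIVES = {"gory": "dramatic"}


@dataclass
class EnhancedPrompt:
    subject: str                                              # core subject of the request
    visual_details: List[str] = field(default_factory=list)   # added scene details
    image_text: Optional[str] = None                          # text that must appear in the image

    def render(self) -> str:
        """Join the parts into one descriptive prompt string."""
        parts = [self.subject] + self.visual_details
        if self.image_text is not None:
            parts.append(f'with the exact text "{self.image_text}" visible')
        return ", ".join(parts)


def enhance(user_prompt: str, image_text: Optional[str] = None) -> EnhancedPrompt:
    """Toy enhancer: reformulate flagged words, keep the subject, add details."""
    words = [SAFE_ALTERNATIVES.get(w.lower(), w) for w in user_prompt.split()]
    subject = " ".join(words)
    # Fixed example details; a real enhancer would choose these per request.
    details = ["centered composition", "soft natural lighting"]
    return EnhancedPrompt(subject, details, image_text)


if __name__ == "__main__":
    print(enhance("a red bicycle", image_text="PolyU").render())
```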

More broadly, this serves as a prompt-optimization layer between the user and the image generation system. Rather than requiring users to manually write highly detailed prompts, the enhancer helps bridge the gap between natural user input and the level of precision that text-to-image models typically require for high-quality generation. This improves accessibility for general users, increases consistency in output quality, and supports more reliable downstream image creation workflows.

Although the current implementation is available for Stable Image Ultra only, the longer-term objective is to expand this capability to additional image generation models. This will provide a more consistent prompt enhancement experience across a broader range of image generation platforms.