Journal Paper Published

Research

UltraUNet: Real-Time Ultrasound Tongue Segmentation for Diverse Linguistic and Imaging Conditions

Myrgyyassov, A., Song, Z., Sun, Y., Wang, B. X., Wong, M. N., & Zheng, Y.* (2026). UltraUNet: Real-Time Ultrasound Tongue Segmentation for Diverse Linguistic and Imaging Conditions. IEEE Journal of Biomedical and Health Informatics.
 
DOI: https://doi.org/10.1109/JBHI.2026.3691369

 

Abstract

Ultrasound tongue imaging (UTI) provides a non-invasive, cost-effective modality for investigating speech articulation, speech motor control, and speech-related disorders. However, real-time tongue contour segmentation remains a significant challenge due to the inherently low signal-to-noise ratio, variability in imaging conditions, and the computational demands of real-time performance. In this study, we proposed UltraUNet, a lightweight and efficient encoder-decoder architecture specifically optimized for real-time segmentation of tongue contours in ultrasound images. UltraUNet introduces several domain-informed innovations, including lightweight Squeeze-and-Excitation blocks for channel-wise feature recalibration in deeper layers, Group Normalization for enhanced stability in small-batch training, and summation-based skip connections to minimize memory and computational overhead. These architectural refinements enabled UltraUNet to achieve high segmentation accuracy while maintaining a processing speed of 250 frames per second, making it suitable for real-time clinical workflows. UltraUNet also integrates ultrasound-specific augmentation techniques, including denoising and blur simulation using a point spread function. Additionally, we annotated UTI images from 8 different datasets acquired under various imaging conditions. Comprehensive evaluations demonstrated the model's robustness and precision, with superior segmentation metrics on single-dataset testing (Dice = 0.855, MSD = 0.993 px) compared to established architectures. Furthermore, cross-dataset testing, in which the model was trained on 1 dataset and evaluated on 7 unseen datasets, revealed UltraUNet's generalization capabilities and high accuracy, with average Dice scores of 0.734 and 0.761 in Experiments 1 and 2, respectively.
The proposed framework offers a competitive solution for time-critical applications in speech research, speech motor disorder analysis, and clinical diagnostics, enabling real-time tongue functional analysis in diverse medical and research settings.
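
For readers unfamiliar with the Dice score reported in the abstract, the following is a minimal sketch of the standard definition, Dice = 2|A ∩ B| / (|A| + |B|), applied to two binary masks. It is an illustration only, not the authors' evaluation code: the function name `dice_score`, the nested-list mask format, and the toy masks are assumptions made for this example.

```python
def dice_score(pred, target):
    """Dice overlap between two binary masks given as equal-shaped nested lists.

    Hypothetical helper for illustration; not taken from the UltraUNet paper.
    """
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")

    # Count foreground pixels in each mask and in their intersection.
    intersection = 0
    pred_area = 0
    target_area = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            pred_area += 1 if p else 0
            target_area += 1 if t else 0
            intersection += 1 if (p and t) else 0

    # Convention assumed here: two empty masks count as a perfect match.
    if pred_area + target_area == 0:
        return 1.0
    return 2.0 * intersection / (pred_area + target_area)


# Toy 2x4 masks: 3 predicted pixels, 3 annotated pixels, 2 shared.
predicted = [[0, 1, 1, 0],
             [0, 1, 0, 0]]
annotated = [[0, 1, 1, 1],
             [0, 0, 0, 0]]
score = dice_score(predicted, annotated)  # 2 * 2 / (3 + 3)
```

A score of 1.0 means the predicted and annotated masks coincide exactly, and 0.0 means they share no foreground pixels. Thin structures such as tongue contours are sensitive to small offsets under this metric, which is one reason a distance-based measure such as the MSD is typically reported alongside it.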

 

Keywords

Deep Learning, Medical Image Analysis, Squeeze-and-Excitation Blocks, Tongue Contour Segmentation, Ultrasound














