Journal Paper Published
UltraUNet: Real-Time Ultrasound Tongue Segmentation for Diverse Linguistic and Imaging Conditions
Myrgyyassov, A., Song, Z., Sun, Y., Wang, B. X., Wong, M. N., & Zheng, Y.* (2026). UltraUNet: Real-Time Ultrasound Tongue Segmentation for Diverse Linguistic and Imaging Conditions. IEEE Journal of Biomedical and Health Informatics.
DOI: https://doi.org/10.1109/JBHI.2026.3691369
Abstract
Ultrasound tongue imaging (UTI) provides a non-invasive, cost-effective modality for investigating speech articulation, speech motor control, and speech-related disorders. However, real-time tongue contour segmentation remains a significant challenge due to the inherently low signal-to-noise ratio, variability in imaging conditions, and the computational demands of real-time performance. In this study, we propose UltraUNet, a lightweight and efficient encoder-decoder architecture specifically optimized for real-time segmentation of tongue contours in ultrasound images. UltraUNet introduces several domain-informed innovations, including lightweight Squeeze-and-Excitation blocks for channel-wise feature recalibration in deeper layers, Group Normalization for enhanced stability in small-batch training, and summation-based skip connections to minimize memory and computational overhead. These architectural refinements enabled UltraUNet to achieve high segmentation accuracy while maintaining a processing speed of 250 frames per second, making it suitable for real-time clinical workflows. UltraUNet also integrates ultrasound-specific augmentation techniques, including denoising and blur simulation using a point spread function. Additionally, we annotated UTI images from 8 different datasets covering various imaging conditions. Comprehensive evaluations demonstrated the model's robustness and precision, with superior segmentation metrics on single-dataset testing (Dice = 0.855, MSD = 0.993 px) compared to established architectures. Furthermore, cross-dataset testing, with training on a single dataset and evaluation on 7 unseen datasets, revealed UltraUNet's generalization capabilities and high accuracy, achieving average Dice scores of 0.734 and 0.761 in Experiments 1 and 2, respectively.
The proposed framework offers a competitive solution for time-critical applications in speech research, speech motor disorder analysis, and clinical diagnostics, delivering real-time tongue functional analysis across diverse medical and research settings.
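The abstract names three architectural choices: Squeeze-and-Excitation recalibration, Group Normalization, and summation-based skip connections. The following is a minimal, framework-free NumPy sketch of what each operation does to a single feature map. It is not the authors' implementation: the function names, the tensor shape `(C, H, W)`, the group count, the reduction ratio, and the random weights are all assumptions chosen for illustration, and the learned scale and shift of Group Normalization are omitted.

```python
import numpy as np


def group_norm(x, num_groups, eps=1e-5):
    """Normalize a (C, H, W) feature map within groups of channels.

    Statistics are computed per sample, so they do not depend on batch size.
    The learned per-channel scale and shift are left out of this sketch.
    """
    c, h, w = x.shape
    assert c % num_groups == 0, "channels must divide evenly into groups"
    grouped = x.reshape(num_groups, -1)          # one row per channel group
    mean = grouped.mean(axis=1, keepdims=True)
    var = grouped.var(axis=1, keepdims=True)
    return ((grouped - mean) / np.sqrt(var + eps)).reshape(c, h, w)


def squeeze_excite(x, w1, w2):
    """Rescale each channel of x (C, H, W) by a gate between 0 and 1.

    w1 has shape (C // r, C) and w2 has shape (C, C // r), where r is a
    hypothetical reduction ratio.
    """
    squeezed = x.mean(axis=(1, 2))               # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)      # bottleneck + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid gate -> (C,)
    return x * gate[:, None, None]


def sum_skip(decoder_feat, encoder_feat):
    """Additive skip connection: the channel count stays the same."""
    return decoder_feat + encoder_feat


# Illustrative shapes only: 8 channels, 4x4 spatial grid, reduction ratio 4.
rng = np.random.default_rng(0)
encoder_feat = rng.normal(size=(8, 4, 4))
decoder_feat = rng.normal(size=(8, 4, 4))
w1 = rng.normal(size=(2, 8))
w2 = rng.normal(size=(8, 2))

normed = group_norm(encoder_feat, num_groups=2)
recalibrated = squeeze_excite(normed, w1, w2)
fused = sum_skip(decoder_feat, recalibrated)
concatenated = np.concatenate([decoder_feat, recalibrated], axis=0)

print(fused.shape)         # (8, 4, 4)
print(concatenated.shape)  # (16, 4, 4)
```

The last two lines illustrate the memory argument: summation keeps the fused map at 8 channels, whereas the concatenation used in a conventional U-Net doubles it to 16, which also enlarges the convolution that follows.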
Keywords