Conference Paper Published

Evaluating Automatic Speech Recognition Pipelines for Mandarin-English Bilingual Child Language Assessment in Telehealth

Wu, H., Du, Y., Li, Z., Gu, Y., Jayaprakash, D. T., & Sheng, L. (2025). Evaluating Automatic Speech Recognition Pipelines for Mandarin-English Bilingual Child Language Assessment in Telehealth. In Proceedings of Interspeech 2025, 3075-3079.

DOI: https://doi.org/10.21437/Interspeech.2025-2430

Abstract

Bilingualism is rising worldwide, yet bilingual child assessments face major challenges. A shortage of bilingual clinicians and the labor-intensive nature of speech data annotation often cause misdiagnoses, delaying care and research. Using a Mandarin-English adult-child speech dataset (53 telehealth sessions), we explore how speech models can automate the annotation of clinical data involving multiple languages, multiple speakers, children's speech, and code-switched utterances. Findings indicate that simple pre-processing improves automatic speech recognition (ASR) accuracy. Specifically, integrating speaker diarization with OpenAI's Whisper medium model reduces word error rates to 35% for child speech and 30% for code-switching, rivaling fine-tuned transformer models. As the first ASR pipeline evaluation for a Mandarin-English clinical dataset, our study highlights model limitations, establishes a benchmark for bilingual speech technology, and improves clinical services.
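
For readers unfamiliar with the metric reported above, word error rate (WER) is the number of word substitutions, deletions, and insertions needed to turn the ASR output into the reference transcript, divided by the number of reference words. The following is a minimal illustrative sketch, not the scoring procedure used in the paper. The function name `word_error_rate` is hypothetical, and the sketch assumes both transcripts are already split into whitespace-separated tokens; Mandarin text would need word segmentation or character-level scoring first, and this listing does not state which the authors used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: tokens are separated by whitespace in both strings.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference token missing from output
                curr[j - 1] + 1,     # insertion: extra token in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up code-switched example: one of four reference tokens is dropped.
print(word_error_rate("我 想 要 apple", "我 想 apple"))
```

Under this definition, a WER of 35% means roughly one error for every three reference words. WER can exceed 100% when the output contains many inserted words.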

Keywords

clinical application, multi-language, multi-speaker, speech model