
Conference Paper Published

Research

Evaluating Automatic Speech Recognition Pipelines for Mandarin-English Bilingual Child Language Assessment in Telehealth

Wu, H., Du, Y., Li, Z., Gu, Y., Jayaprakash, D. T., & Sheng, L. (2025). Evaluating Automatic Speech Recognition Pipelines for Mandarin-English Bilingual Child Language Assessment in Telehealth. In Proceedings of Interspeech 2025, 3075-3079.
 
DOI: https://doi.org/10.21437/Interspeech.2025-2430

 

Abstract

Bilingualism is rising worldwide, yet bilingual child assessments face major challenges. A shortage of bilingual clinicians and the labor-intensive nature of speech data annotation often cause misdiagnoses, delaying care and research. Using a Mandarin-English adult-child speech dataset (53 telehealth sessions), we explore how speech models can automate the annotation of clinical data involving multiple languages, multiple speakers, children's speech, and code-switched utterances. Our findings indicate that simple pre-processing improves automatic speech recognition (ASR) accuracy. Specifically, integrating speaker diarization with OpenAI's Whisper medium model reduces word error rates to 35% for child speech and 30% for code-switching, rivaling fine-tuned transformer models. As the first ASR pipeline evaluation for a Mandarin-English clinical dataset, our study highlights model limitations, establishes a benchmark for bilingual speech technology, and improves clinical services.
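
For readers unfamiliar with the metric, word error rate (WER) is the number of word substitutions, deletions, and insertions needed to turn the ASR output into the reference transcript, divided by the number of reference words. The Python sketch below illustrates that standard definition only; it is not the authors' evaluation code. The function name `word_error_rate` is hypothetical, and splitting on whitespace is an assumption: this record does not say how Mandarin text was tokenized for scoring, and unsegmented Mandarin would need word or character segmentation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: tokens are separated by whitespace. Mandarin text that is
    not pre-segmented would need a segmentation step before calling this.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# Made-up code-switched utterance: one substituted word out of four.
print(word_error_rate("I want 苹果 please", "I want apple please"))
```

Under this definition, a WER of 35% corresponds to roughly one error for every three reference words.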

 

Keywords

clinical application, multi-language, multi-speaker, speech model

 

 


