Conference Paper Published
| Wu, H., Du, Y., Li, Z., Gu, Y., Jayaprakash, D. T., & Sheng, L. (2025). Evaluating Automatic Speech Recognition Pipelines for Mandarin-English Bilingual Child Language Assessment in Telehealth. In Proceedings of Interspeech 2025, 3075-3079. |
| DOI: https://doi.org/10.21437/Interspeech.2025-2430 |
| Abstract: Bilingualism is rising worldwide, yet bilingual child assessments face major challenges. A shortage of bilingual clinicians and the labor-intensive nature of speech data annotation often lead to misdiagnoses, delaying care and research. Using a Mandarin-English adult-child speech dataset (53 telehealth sessions), we explore how speech models can automate the annotation of clinical data involving multiple languages, multiple speakers, children's speech, and code-switched utterances. Findings indicate that simple pre-processing improves automatic speech recognition (ASR) accuracy. Specifically, integrating speaker diarization with OpenAI's Whisper medium model reduces word error rates to 35% for child speech and 30% for code-switched speech, rivaling fine-tuned transformer models. As the first ASR pipeline evaluation for a Mandarin-English clinical dataset, our study highlights model limitations, establishes a benchmark for bilingual speech technology, and informs improvements to clinical services. |
| Keywords: clinical application, multi-language, multi-speaker, speech model |