Chao Prize Lecture 2026: TalkBank and AI
Conference/Seminar
-
Date
08 May 2026
-
Organiser
Faculty of Humanities
-
Time
17:15 - 18:15
-
Venue
Function Room 5-6, Basement 1, Hotel ICON
Remarks
The lecture will be conducted in English and recorded for promotional and educational purposes.
Summary
Abstract
The training and fine-tuning of AI systems depend on large amounts of accurately recorded data. For spoken language data, the largest open-access source is the TalkBank system, which is becoming increasingly prominent in the automatic analysis of language disorders, language acquisition, and neurolinguistic modeling. Language development data in TalkBank are being used to train better automatic speech recognition for children. Data from people who stutter are being used to train systems that recognize stuttered speech. Systems are using TalkBank data to detect cognitive decline, understand psychosis, profile types of aphasia, track recovery from traumatic brain injury, and follow patterns of code-switching. TalkBank also provides methods for the detailed analysis of conversational interactions in classrooms and air traffic control, as well as of pragmatic deficits. Although current data are heavily skewed toward English and European languages, data from other languages are growing rapidly through increasingly extensive data sharing.
About the speaker
Prof. Brian MacWhinney's research focuses on understanding language structure, processing, and learning as emerging from competitive processes that operate across a variety of time and process scales, each with its own constraints. He applies this perspective in studies of first language acquisition, second language acquisition, language typology, sociolinguistics, conversational interaction, language disorders, and neurolinguistics. He created the TalkBank system (the world's largest open-access database of spoken language), which is now an essential component in the development of AI models for understanding human language.