
Journal Paper Published

Research

Multilingual prediction of semantic norms with language models: a study on English and Chinese

Peng, B., Hsu, Y.-Y., Chersoni, E.*, Qiu, L., & Huang, C.-R. (2025). Multilingual prediction of semantic norms with language models: a study on English and Chinese. Language Resources and Evaluation, 59(4), 3911-3937. 
 
DOI: https://doi.org/10.1007/s10579-025-09866-9

 

Abstract

Lexical semantic norms characterize each lexical concept in terms of a set of semantic features for the words of a language. They provide essential resources for behavioral, computational, and neuro-cognitive studies of language and human cognition. Recent research advocates for cognitively motivated feature sets, arguing that semantic representations grounded in human cognition can facilitate cross-linguistic modeling and even enable the prediction of a word’s semantic features based on its translation in another language. In this study, we present a new dataset of brain-based, Binder-style semantic norms for Chinese. Using the corresponding English dataset and the representational power of multilingual language models, we conduct systematic experiments on semantic norm prediction both within and across languages. We evaluate monolingual and English-Chinese cross-lingual norm prediction using two different methods: embedding-based regression and prompting with large language models. Our results show that bidirectional models from the BERT family and GPT-4 achieve a good level of accuracy, with moderate-to-high correlations with human ratings. Notably, in the cross-lingual setting, the best and the worst predicted features align with the higher and lower ends of human agreement when comparing the norms of words and their translations. Our results support a novel computational approach for supplementing and expanding cognitive semantic norms, highlighting the potential of language models to bridge cross-linguistic semantic representations.
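
As a rough illustration of what embedding-based regression for norm prediction can look like, the following minimal sketch fits a ridge regression from word vectors to feature ratings and scores each feature by Spearman correlation on held-out words. It is not the authors' implementation: the random arrays stand in for real embeddings and human ratings, and the regressor, the dimensions, the train/test split, and the helper names `fit_ridge` and `spearman` are all assumptions made for this example.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression without intercept: W = (X^T X + alpha*I)^-1 X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

def spearman(a, b):
    """Spearman correlation computed as the Pearson correlation of ranks (assumes no ties)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])

# Synthetic stand-ins (assumption): 200 "words", 32-dim "embeddings", 5 "semantic features".
rng = np.random.default_rng(0)
n_words, dim, n_features = 200, 32, 5
X = rng.normal(size=(n_words, dim))
W_true = rng.normal(size=(dim, n_features))
Y = X @ W_true + 0.5 * rng.normal(size=(n_words, n_features))

# Fit on the first 150 words, evaluate on the remaining 50.
train, test = slice(0, 150), slice(150, 200)
W = fit_ridge(X[train], Y[train], alpha=1.0)
pred = X[test] @ W

# One correlation per feature between predicted and "human" ratings.
per_feature_rho = [spearman(pred[:, j], Y[test][:, j]) for j in range(n_features)]
print(np.round(per_feature_rho, 2))
```

In a cross-lingual variant of this sketch, the training rows would come from words in one language and the test rows from their translations in the other, with both embedded by the same multilingual model.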

 

Keywords

Cognitive modeling, Large language models, Multilinguality, Psycholinguistics, Semantic norms, Word embeddings

 

 


