Conference Paper Published

Research

Reasoning or Memorization? Investigating LLMs' Capability in Restoring Chinese Internet Homophones

Ma, J., Feng, Z., Song, H., Chersoni, E., & Chen, Z. (2025). Reasoning or Memorization? Investigating LLMs' Capability in Restoring Chinese Internet Homophones. In Proceedings of the 3rd Workshop on Towards Knowledgeable Foundation Models (KnowFM), 120-139.
 
DOI: https://doi.org/10.18653/v1/2025.knowllm-1.11

 

Abstract

Chinese homophones, prevalent in Internet culture, bring rich linguistic twists that are challenging for language models. While native speakers disambiguate them through phonological reasoning and contextual understanding, it remains untested how well LLMs perform on this task and whether LLMs also achieve this via similar reasoning processes or merely through memorization of homophone-original word pairs during training. In this paper, we present HomoP-CN, the first Chinese Internet homophones dataset with systematic perturbations for evaluating LLMs’ homophone restoration capabilities. Using this benchmark, we investigated the influence of semantic, phonological, and graphemic features on LLMs’ restoration accuracy, measured the reliance levels of each model on memorization during restoration through consistency ratios under controlled perturbations, and assessed the effectiveness of various prompting strategies, including contextual cues, pinyin augmentation, few-shot learning, and thought-chain approaches.
