Conference Paper Published
Qiu, L., Chersoni, E., & Villavicencio, A. (2025). ChengyuSTS: An Intrinsic Perspective on Mandarin Idiom Representation. In Proceedings of the 14th Joint Conference on Lexical and Computational Semantics (*SEM 2025), 1-12.
DOI: https://doi.org/10.18653/v1/2025.starsem-1.1
Abstract: Chengyu, or four-character idioms, are ubiquitous in both spoken and written Chinese. Despite their importance, chengyu are often underexplored in NLP tasks, and existing evaluation frameworks are primarily based on extrinsic measures. In this paper, we introduce an intrinsic evaluation task for Chinese idiomatic understanding: idiomatic semantic textual similarity (iSTS), which evaluates how well models can capture the semantic similarity of sentences containing idioms. To this end, we present a curated dataset: ChengyuSTS. Our experiments show that current pre-trained sentence Transformer models generally fail to capture the idiomaticity of chengyu in a zero-shot setting. We then report results for models fine-tuned with the SimCSE contrastive learning framework, which show promise for handling idiomatic expressions. We also present the results of DeepSeek for reference.
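To make the idea of an intrinsic similarity evaluation concrete, the sketch below scores a model the way STS benchmarks are commonly scored: cosine similarity between the two sentence embeddings of each pair, then Spearman rank correlation against human ratings. This is an illustration under stated assumptions, not the authors' protocol: the abstract does not specify the metric, and the vectors and gold scores here are made-up toy values, not ChengyuSTS data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy embeddings standing in for a sentence encoder's output.
toy_pairs = [
    ([1.0, 0.0], [1.0, 0.1]),  # near-paraphrase
    ([1.0, 0.0], [0.5, 0.5]),  # partially related
    ([1.0, 0.0], [0.0, 1.0]),  # unrelated
]
# Hypothetical human similarity ratings for the same three pairs.
toy_gold = [4.8, 3.0, 0.5]

predicted = [cosine(u, v) for u, v in toy_pairs]
score = spearman(predicted, toy_gold)
print(f"Spearman correlation on toy data: {score:.3f}")
```

In this setup, a model that treats a chengyu literally would assign high similarity to a literal paraphrase and low similarity to the idiomatic one, which lowers its correlation with human ratings.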