
Journal Paper Published

Research

Attention-Enabled Multi-layer Subword Joint Learning for Chinese Word Embedding

Xue, P., Xiong, J., Tan, L.*, Liu, Z., & Liu, K. (2025). Attention-Enabled Multi-layer Subword Joint Learning for Chinese Word Embedding. Cognitive Computation, 17(2), 75. 
 
DOI: https://doi.org/10.1007/s12559-025-10431-3

 

Abstract

In recent years, Chinese word embeddings have attracted significant attention in the field of natural language processing (NLP). The complex structures and diverse influences of Chinese characters present distinct challenges for semantic representation. As a result, Chinese word embeddings are primarily investigated in conjunction with characters and their subcomponents. Previous research has demonstrated that word vectors frequently fail to capture the subtle semantics embedded within the complex structure of Chinese characters. Furthermore, they often neglect the varying contributions of subword information to semantics at different levels. To tackle these challenges, we present a weight-based word vector model that takes into account the internal structure of Chinese words at various levels. The model categorizes the internal structure of Chinese words into six layers of subword information: words, characters, components, pinyin, strokes, and structures. The semantics of Chinese words can be derived by integrating the subword information from these layers. Moreover, the model considers the varying contribution of each subword layer to the semantics of Chinese words. It employs an attention mechanism to determine the weights between and within the subword layers, facilitating the comprehensive extraction of word semantics. The word-level subwords serve as the attention query for the subwords in the other layers, enabling them to learn semantic bias. Experimental results show that the proposed word vector model achieves improvements across various evaluation tasks, including word similarity, word analogy, text categorization, and case studies.
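The core mechanism described above can be illustrated with a minimal sketch: the word-level embedding acts as the attention query, and the summary vectors of the other subword layers (characters, components, pinyin, strokes, structures) are scored and combined by a softmax-weighted sum. All names, dimensions, and the scaled dot-product scoring are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend_over_layers(word_vec, layer_vecs):
    """Sketch of inter-layer attention (assumed form, not the paper's code).

    word_vec:   (d,) word-level embedding, used as the attention query.
    layer_vecs: list of (d,) summary vectors, one per subword layer
                (e.g. character, component, pinyin, stroke, structure).
    Returns the attention-weighted combination and the layer weights.
    """
    d = word_vec.shape[0]
    # Scaled dot-product score between the query and each layer summary.
    scores = np.array([word_vec @ v / np.sqrt(d) for v in layer_vecs])
    weights = softmax(scores)                      # inter-layer attention weights
    combined = sum(w * v for w, v in zip(weights, layer_vecs))
    return combined, weights

# Toy usage with random vectors standing in for learned embeddings.
rng = np.random.default_rng(0)
d = 8
word = rng.normal(size=d)
layers = [rng.normal(size=d) for _ in range(5)]    # five non-word subword layers
vec, w = attend_over_layers(word, layers)
```

In a full model the same idea would also apply *within* each layer (weighting individual characters, strokes, etc. before summarizing), and the combined vector would feed a skip-gram-style training objective; this sketch shows only the inter-layer weighting step.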

 

Keywords

Attention mechanism, Chinese word embedding, Feature substring, Morphological information, Pronunciation information, Semantic analysis
