Journal Paper Published

Decoding and controlling emotion in LLMs through human-aligned representational geometry with enhanced interpretability

Wu, X., Wang, H., Yan, Z., Tang, X., Xu, P., Siok, W. T., Li, P., Gao, J. H.*, Lyu, B.*, & Qin, L.* (2026). Decoding and controlling emotion in LLMs through human-aligned representational geometry with enhanced interpretability. Computers in Human Behavior, 183, 109051.

DOI: https://doi.org/10.1016/j.chb.2026.109051

Abstract

Aligning the internal states of large language models (LLMs) with human emotion is a fundamental challenge for safety and interpretability of artificial intelligence (AI). However, whether the high-dimensional representations within LLMs encode emotion in a way that is structurally analogous to human affective perception, and whether these features can be used to causally control the emotional tone of model outputs, remains unknown. Here, we develop a concept-driven approach to extract a dictionary of interpretable, human-centric emotion features from multilingual LLMs. We show that these features form a high-dimensional emotion space that is remarkably structured by the core psychological dimensions of valence and arousal. This representational geometry is robust across different model families (Gemma and Llama) and generalizes across languages (English and Chinese), revealing a shared basis for cross-linguistic affective semantics. Crucially, we provide causal evidence for the functional relevance of these features by demonstrating that they can be used as ‘steering vectors’ to precisely and reliably control the emotional tone of model outputs in generative tasks. Our work provides a computational basis for human-aligned emotional representations in AI, offering a generalizable framework for identifying and controlling complex conceptual representations, thereby paving the way for safer and more interpretable models.

Keywords

Alignment, Emotion, GenAI, Human behaviour, Model steering, Sparse autoencoder
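
Illustrative sketch

The abstract describes using emotion features as 'steering vectors' to control the emotional tone of model outputs. The general idea of activation steering can be sketched as below. This is a toy illustration, not the paper's implementation: it assumes an emotion feature can be represented as a direction in a model's hidden-state space, and the function names (unit, projection, steer), the three-dimensional vectors, and the strength value are all hypothetical.

```python
import math


def unit(vector):
    """Return the vector scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in vector]


def projection(hidden, direction):
    """Scalar projection of a hidden state onto a feature direction."""
    d = unit(direction)
    return sum(h * di for h, di in zip(hidden, d))


def steer(hidden, direction, strength):
    """Shift a hidden state along a feature direction by `strength`."""
    d = unit(direction)
    return [h + strength * di for h, di in zip(hidden, d)]


# Toy hidden state and a hypothetical emotion direction (assumed values).
hidden_state = [1.0, 2.0, 2.0]
emotion_direction = [0.0, 3.0, 4.0]

before = projection(hidden_state, emotion_direction)
steered_state = steer(hidden_state, emotion_direction, strength=2.0)
after = projection(steered_state, emotion_direction)

# Steering raises the projection onto the direction by exactly `strength`.
print(before, after)
```

In a real model the shift would be applied to the activations of a chosen layer during generation, and the direction would come from the extracted feature dictionary; both details are outside the scope of this sketch.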