Large language models (LLMs) are trained to process and generate human-like text using vast amounts of data. Research suggests that they can form meaningful conceptual representations through language alone, contrasting with cognition theorists’ views that physical, sensory experiences are necessary for concept formation.

 

PolyU researchers, in collaboration with scholars from Ohio State University, Princeton University and the City University of New York, explored the similarities between LLMs and human representations. Their findings, published in Nature Human Behaviour, highlight how language shapes complex conceptual knowledge and how sensory input can enhance understanding.

 

Assessing LLM performance through grounding techniques

Led by PolyU Professor Li Ping, Sin Wai Kin Foundation Professor in Humanities and Technology, Dean of the Faculty of Humanities and Associate Director of the PolyU-Hangzhou Technology and Innovation Research Institute, the research team analysed conceptual word ratings from advanced LLMs, including ChatGPT (GPT-3.5, GPT-4) and Google LLMs (PaLM, Gemini). They compared these ratings with human-generated data for about 4,500 words across non-sensorimotor, sensory and motor domains, using the validated Glasgow and Lancaster Norms datasets.
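As an illustration of how this kind of rating comparison can be carried out, the short Python sketch below correlates LLM-elicited ratings with human norms dimension by dimension. The file names, column layout and choice of Spearman correlation are assumptions made for illustration only, not the team's actual pipeline.

```python
# Minimal sketch of a rating comparison, not the authors' actual analysis code.
# Assumes hypothetical CSV files with one row per word and one column per rated
# dimension (e.g., valence, olfactory strength, hand/arm action).
import pandas as pd
from scipy.stats import spearmanr

human = pd.read_csv("human_norms.csv", index_col="word")  # Glasgow/Lancaster-style ratings
llm = pd.read_csv("llm_ratings.csv", index_col="word")    # ratings elicited from an LLM

# Align on the shared vocabulary and shared rating dimensions before comparing.
shared_words = human.index.intersection(llm.index)
shared_dims = human.columns.intersection(llm.columns)

for dim in shared_dims:
    rho, p = spearmanr(human.loc[shared_words, dim], llm.loc[shared_words, dim])
    print(f"{dim}: Spearman rho = {rho:.2f} (p = {p:.3g})")
```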

 

Initially, the team compared individual human and LLM ratings to assess similarity, using human pairs as a benchmark. However, this method might overlook how multiple dimensions contribute to word representation. For instance, while “pasta” and “roses” may have similar olfactory ratings, “pasta” is more closely related to “noodles” in terms of appearance and taste. To gain a deeper understanding, the researchers conducted representational similarity analyses across various attributes.
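The sketch below illustrates the general idea of a representational similarity analysis: build a matrix of pairwise word distances from the human ratings, build another from the LLM ratings, and then correlate the two structures. The data layout and the correlation-distance metric are assumptions carried over from the previous sketch, not the published method.

```python
# Minimal sketch of a representational similarity analysis (RSA), assuming the
# same hypothetical human_norms.csv / llm_ratings.csv layout as above.
import pandas as pd
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

human = pd.read_csv("human_norms.csv", index_col="word")
llm = pd.read_csv("llm_ratings.csv", index_col="word")
words = human.index.intersection(llm.index)
dims = human.columns.intersection(llm.columns)

# Each word is a vector of ratings across dimensions; pdist returns the
# condensed matrix of pairwise distances between all word vectors.
human_rdm = pdist(human.loc[words, dims].to_numpy(), metric="correlation")
llm_rdm = pdist(llm.loc[words, dims].to_numpy(), metric="correlation")

# RSA score: rank correlation between the two pairwise-distance structures.
rho, _ = spearmanr(human_rdm, llm_rdm)
print(f"Human-LLM representational similarity: rho = {rho:.2f}")
```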

 

The results showed that LLM representations were most similar to human representations in the non-sensorimotor domain, less so in sensory domains and least similar in motor domains. This indicates LLM limitations in capturing human conceptual understanding, particularly in areas involving sensory information and embodied experiences.

 

To investigate whether grounding could enhance LLM performance, the researchers compared LLMs trained on both language and visual input (GPT-4, Gemini) with those trained on language alone (GPT-3.5, PaLM). The grounded models demonstrated significantly higher similarity to human representations.

 

Advancing LLMs with multimodal learning and sensory input

Professor Li noted, “The availability of both LLMs trained on language alone and those trained on language and visual input provides a unique setting for research into the effect of sensory input on human conceptualisation.” He emphasised the potential of multimodal learning to foster more human-like representations and performance in LLMs.

 

The researchers envision a future where LLMs equipped with grounded sensory input – such as through humanoid robotics – can actively interpret and interact with the physical world. Professor Li stated, “These advances may enable LLMs to fully capture embodied representations that mirror the complexity and richness of human cognition, and a rose in LLM representation will then be indistinguishable from that of a human.”

 

This finding aligns with previous research on representational transfer, which has shown how visual and tactile experiences influence object-shape knowledge.

 
