How do LLMs know the meaning of words: Exploring the limits of purely relational semantics

This is a draft by Dominik Lukeš in response to the question of whether LLMs can “understand”. It is more about how to think about meaning (and semantics) than how Large Language Models work. There are no references but it is based on an approach to language practiced by Construction Grammar and Cognitive Semantics. Comments welcome via @techczech or [email protected].


What do we mean by meaning

Semantics is a fancy word for meaning and, confusingly, also the word for the study of meaning. And just like all of the most famous terms associated with language, “meaning” does not have a clear, unambiguous definition. I will not try to give one here because it would be futile. But I’ll try to contrast three ways of looking at meaning and show that much of the debate about Large Language Models and semantics comes from confusing these three distinct perspectives.

Representational semantics

Representational semantics is how most people intuitively use the word “meaning”, as in “what does this word mean?” In this view, meaning is the sort of thing that shows up on the right-hand side of a dictionary entry. This kind of meaning is something you can point to, show a picture of, give a straightforward description of. It’s the kind of thing that when you say you see it, you’re either right or wrong.

The first meanings of words we acquire as children are representational. We point at things and people and say the word: “mama”, “truck”, “apple”. We may even describe feelings (“sad”, “happy”, “hungry”) or actions (“run”, “play”, “cry”). Later we expand this and include words like “government” or “desolate” or even abstractions like “love”. They are harder to point to but are still fairly representational.

One property of “representational” semantics is that it is easy to translate. We can say “dog” is “chien” in French and “Hund” in German. We can even say it for things like “in”, which is “v” in Czech.

Another property of representational semantics is that we can usually tell a very straightforward story of how we learned the representational meanings of words.

Relational semantics

But representational semantics soon runs into problems. A simple glance at the right side of any dictionary entry reveals that pretty much no word has just one “meaning”. They all have multiple related or completely unrelated meanings. So “cat” could be a “domestic feline of genus something or other” or a “jazz musician”, but the latter only when used by some people. And all of a sudden saying that “cat” is “chat” in French does not seem as straightforward.

But there are more difficulties. Many words do not have a straightforward representation: “the” or “in” or even words like “get”. We cannot learn them by pointing at something or giving a single example. We can only learn them from context. And many of them, we could not even begin to define. What is the meaning of “the”? Why do we use it in a sentence like “I saw a book with all the pages torn out” or “Ah, you are the Mr Smith!”?

And asking “how do you say ‘the’ in German” is a lot less informative, as is asking how you say it in Russian, which does not even have a definite article. This can be very confusing to people. I once had a student ask me, “How do you say ‘have’ in Czech? As in ‘I have arrived’.” But that’s our representational instincts leaking into relational semantics.

When we learn something by pointing at it, we’re learning a sort of name for a type of thing or action or property. But when we learn things from context, we learn a complex set of relationships. From these, subtle ways of looking at the world often emerge. And this is one of the reasons becoming proficient at another language is so damn hard.

Let’s take the humble duo “in” and “on”. They have a very clear distinction in meaning: “inside of” vs “on top of”. And they are also very easy to translate into Czech as “v” and “na” respectively. But what about “a bird in a tree”, “a crowd in the streets”, “a person working in the garden”? Czech uses “na” (“on”) for all of these. English thinks of trees, streets, and gardens as a sort of container, and Czech as just a plain surface.

How would you learn this? Well, from context and a lot of it. Some of it may seem somewhat random but there are many patterns to this. For example, Czech uses “in” for most countries except islands, for which it uses “on” because they somehow feel less like containers. It uses “in” for small towns and villages except ones that are on top of a hill (and yes, that means sometimes locals use “on” and outsiders use “in”).
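As a rough illustration of what “learning from context” could look like, here is a minimal Python sketch that counts which preposition shows up in front of which noun. The tiny list of phrases (the English examples above plus a few made-up ones), the word lists, and the function names are all assumptions invented for this illustration. It is a toy, not a description of how any actual language model or any child learns prepositions.

```python
from collections import Counter, defaultdict

# Toy "corpus": a handful of English phrases. Illustrative only, not real data.
PHRASES = [
    "a bird in a tree",
    "a crowd in the streets",
    "a person working in the garden",
    "a book on the table",
    "a cat on the roof",
    "a bird in the garden",
]

PREPOSITIONS = {"in", "on"}
ARTICLES = {"a", "the"}


def count_preposition_contexts(phrases):
    """Count how often each preposition comes before each noun."""
    counts = defaultdict(Counter)
    for phrase in phrases:
        words = phrase.split()
        for i, word in enumerate(words):
            if word in PREPOSITIONS:
                # Skip articles to find the first content word after the preposition.
                following = [w for w in words[i + 1:] if w not in ARTICLES]
                if following:
                    counts[following[0]][word] += 1
    return counts


def preferred_preposition(counts, noun):
    """Return the preposition seen most often with this noun, or None if unseen."""
    if noun not in counts:
        return None
    return counts[noun].most_common(1)[0][0]


counts = count_preposition_contexts(PHRASES)
print(preferred_preposition(counts, "garden"))  # in
print(preferred_preposition(counts, "roof"))    # on
print(preferred_preposition(counts, "island"))  # None (never seen in context)
```

Nothing in this sketch knows what a garden or a roof is. All it has is a record of which words keep company with which, and a noun it has never met in context leaves it with nothing to say.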

Comparing two languages throws up a lot of relational weirdness but looking wider is even more puzzling. Some languages have two different words for “in”, one of which means a tight fit, as in “water in a glass”, and the other a loose fit, as in “a pebble in a glass”. Now, this is the easy representational bit, but then how do they decide which of the many uses are tight or loose fit? That’s entirely relational.

And that’s just relatively simple prepositions that have a clear core meaning we can draw a picture of. But most words do not even have that. “The” is one perfect example: the most frequent word in a typical English text does not have a meaning we could draw a picture of.

And of course, we don’t just have to learn the meanings of words like “the”. There are also endings like “-ing” and tenses such as “to have learned”. Outside things like saying “cat” and pointing at it, there are pretty much no utterances we ever say that rely purely on representational meaning.

Logical semantics

To confuse the picture slightly, there’s also logical semantics. Logical semantics is really the study of the truth conditions of statements. It is not concerned with the meanings of words but with how we can tell whether a statement is true or not. It is concerned with the difference between sentences like “there are two pears AND five apples” vs “there are two pears OR five apples” or “SOME apples are green” vs “ALL apples are red”. This may look very trivial, but when a lot of these statements are combined, having a way to work through the complexity is incredibly useful.
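To make “truth conditions” a little more concrete, here is a small Python sketch that checks the four example statements against a made-up fruit bowl. The bowl, the helper names, and the decision to read “two pears” as “exactly two pears” are all simplifying assumptions for illustration, not a claim about how logical semantics is formally done.

```python
# A tiny "world": a hypothetical fruit bowl, invented for illustration.
bowl = [
    {"kind": "pear", "colour": "green"},
    {"kind": "pear", "colour": "yellow"},
    {"kind": "apple", "colour": "green"},
    {"kind": "apple", "colour": "red"},
    {"kind": "apple", "colour": "red"},
]


def count(world, kind):
    """How many items of this kind are in the world?"""
    return sum(1 for item in world if item["kind"] == kind)


def some(world, kind, colour):
    """True if at least one item of this kind has this colour."""
    return any(item["colour"] == colour for item in world if item["kind"] == kind)


def every(world, kind, colour):
    """True if all items of this kind have this colour.

    Note: this is also True when there are no items of that kind at all.
    """
    return all(item["colour"] == colour for item in world if item["kind"] == kind)


two_pears = count(bowl, "pear") == 2      # True: there are two pears
five_apples = count(bowl, "apple") == 5   # False: there are only three apples

pears_and_apples = two_pears and five_apples  # AND needs both parts to be true
pears_or_apples = two_pears or five_apples    # OR needs only one part to be true

print(pears_and_apples)               # False
print(pears_or_apples)                # True
print(some(bowl, "apple", "green"))   # True
print(every(bowl, "apple", "red"))    # False
```

The code never needs to know what a pear is or what green looks like. It only needs a world to check against and rules for combining the answers, which is exactly the narrow sense of “meaning” that logical semantics works with.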