Experimental pragmatics


Throughout this class so far, the method we've been using to analyze pragmatics has mostly been introspection: we think about what various utterances might mean and we think about what we might say in various situations, and we use those ideas to support or criticize theories of pragmatics. (This approach is sometimes derisively called "armchair linguistics", because you can do it while relaxing in an armchair.) Another method we've used is observation: noticing interesting examples in the real world and thinking about how they support or challenge pragmatic theories.

In recent decades, though, there has been increasing interest in doing controlled empirical experiments to study pragmatics. This has given rise to a new field of study, experimental pragmatics, in which researchers use the methods and techniques of experimental psychology to figure out how pragmatics works and to try to answer questions that traditional methods of introspection and observation have not been able to resolve. "Experimental pragmatics" started to be recognized as a field in the early 2000s, and really exploded into a major area of research in the 2010s and onward; for example, in 2021 the XPrag research network organized a symposium commemorating "20 years of experimental pragmatics".

(Note that here I'm using the term "experiment" to mean collecting data from lots of people in a controlled survey or test. It's different from "observation" because it involves the researcher setting up different conditions (e.g., seeing how people interpret a certain sentence in different contexts), whereas observation depends on just finding things as they are in the real world. I'm also treating it as different from "introspection" because it involves collecting observable data from other people, rather than collecting the researcher's own intuitions. Technically, from a philosophy-of-science perspective, introspection might also be an "experiment", but that's not the sense in which I'm using the term here. Practically speaking, and at the risk of oversimplifying a bit, "experiments" usually involve computers or at least paper-and-pencil surveys.)

Experimental pragmatics is a vast topic which we could easily spend a whole semester (or more) on. It's also a topic that's near and dear to my own heart, because it's the kind of research I've done; my own doctoral dissertation is a set of pragmatics experiments (even though I have since come to believe that much of my initial research was misguided). For this module, though, let's limit ourselves to two closely related case studies. (For more information on experimental pragmatics you should read Noveck, which is a great overview of the field; chapters 6 and 7 in particular are about the same sorts of cases that this module will examine.)

To learn about experimental pragmatics, we'll look at two experiments that are nice examples of this field. Both experiments are about scalar implicatures. So before we look at the experiments, let's take a quick look at what scalar implicatures are and at some of the challenging questions about them that needed to be addressed through experimentation.

Scalar implicatures

Throughout many of the previous modules we've revisited the same made-up example, in which Rebecca says "Josh is smart" and thereby implicates that she thinks Josh is not brilliant.

This is an example of one of the most widely discussed phenomena in pragmatics, so-called "scalar implicatures". As we've seen, scalar implicatures are those that supposedly come from thinking about a stronger alternative that the speaker could have said but chose not to say (i.e., Rebecca could have said "Josh is brilliant" but for some reason she didn't). In particular, "scalar" implicatures are ones where this stronger alternative comes from replacing one word in the utterance with a stronger word from an ordered "scale" of related words that exists in a person's vocabulary. For instance, in this example, maybe we assume that Rebecca and the listener both know that the words brilliant, smart, and average form a scale <brilliant, smart, average>, ordered from strongest to weakest. When Rebecca uses a word that's not the strongest one on the scale, the listener wonders why she didn't use a stronger word and infers that she doesn't think the stronger word applies. Many other words also belong to scales which may trigger scalar implicatures; for instance, an utterance like "X or Y" can implicate "X or Y but not both" because or is weaker than its scalemate and (<and, or> scale); maybe or probably can implicate that something is not certain (<certainly, probably, maybe> scale); and an <always, sometimes> scale can make sometimes implicate "not all the time", as illustrated near the beginning of this funny scene from the movie Best in Show.
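The scale-based reasoning described above can be sketched in a few lines of Python. This is a toy model under simplifying assumptions: the scale lexicon `SCALES` and the functions `stronger_alternatives` and `scalar_implicatures` are hypothetical names for illustration, the scales are just the ones mentioned in the paragraph above, and the listener is assumed to negate every stronger scalemate regardless of context (roughly the generalized, default view of scalar implicature).

```python
# Hypothetical scale lexicon: each tuple is ordered from strongest to weakest,
# following the scales mentioned in the text.
SCALES = [
    ("brilliant", "smart", "average"),
    ("and", "or"),
    ("certainly", "probably", "maybe"),
    ("always", "sometimes"),
]


def stronger_alternatives(word):
    """Return the scalemates ranked above `word`, or [] if it is strongest or unlisted."""
    for scale in SCALES:
        if word in scale:
            return list(scale[: scale.index(word)])
    return []


def scalar_implicatures(sentence, word):
    """Negate each alternative built by swapping `word` for a stronger scalemate."""
    tokens = sentence.split()
    if word not in tokens:
        raise ValueError(f"{word!r} does not occur in {sentence!r}")
    implicatures = []
    for alt in stronger_alternatives(word):
        # Replace every occurrence of the scalar word with the stronger one.
        alternative = " ".join(alt if token == word else token for token in tokens)
        implicatures.append(f"not ({alternative})")
    return implicatures
```

For example, `scalar_implicatures("Josh is smart", "smart")` returns `["not (Josh is brilliant)"]`, and `scalar_implicatures("I sometimes remember to floss", "sometimes")` returns `["not (I always remember to floss)"]`. A real listener would also consider things like whether the speaker is likely to know if the stronger statement is true, which this sketch ignores.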

Scalar implicatures are probably the biggest topic studied in experimental pragmatics, at least in terms of the number of experiments that have focused on them. A good friend of mine once derisively told me that Cui Jian's "Nothing to My Name" (一无所有) is the most over-analyzed song in the history of Chinese rock and roll; in the same way, scalar implicatures are the most over-analyzed thing in pragmatics. After spending a number of years doing research on them, I no longer believe they're anything special (see, e.g., Geurts for a discussion of how scalar implicatures just follow from the same principles as other implicatures). Nevertheless, they are often a useful test case for designing experiments to test various theories of pragmatics. Below we will examine two experiments that were designed to test two different issues about scalar implicatures. For more detailed information about these and other issues, excellent summaries of scalar implicature research are available in Noveck & Sperber (2007), Katsos & Cummins (2010), Sauerland (2012), Chemla & Singh (2014), van Tiel et al. (2016, 2019), and Geurts.

Case study 1: "embedded implicatures"

So far our discussions of scalar implicature have assumed that they are conversational implicatures which arise via the maxim of quantity. Specifically, our treatment has generally assumed that scalar implicatures are a sort of generalized conversational implicature which follows from some automatic inferential steps that happen in any context; but a very similar analysis could still work even if we treat scalar implicatures as particularized conversational implicatures. Either way, this is the most traditional view of scalar implicatures and it's based on Gricean ideas (although people often call the generalized-conversational-implicature view "neo-Gricean", because it's essentially based on Gricean pragmatics but was further developed by other researchers after Grice).

This view is controversial, though, and there are other theories that propose very different explanations of how scalar implicatures work. The one most relevant to this experiment is the "grammatical theory", which claims that "scalar implicatures" are not implicatures at all, but rather interpretations based on syntax and semantics. According to this theory, scalar words have their literal meaning (e.g., sometimes means "at least some of the time", smart means "at least smart", etc.), but that meaning can be enriched by a grammatical operator inserted into the syntactic structure of the sentence. You can imagine this operator as being like a silent "only"; the idea is that when someone says "Josh is smart", there is actually a silent operator in the sentence that causes it to be interpreted as "Josh is ONLY smart" (i.e., smart but not brilliant). Likewise for any other "scalar" term; for instance, an utterance like "I sometimes remember to floss before brushing" gets interpreted as "I ONLY sometimes remember to floss before brushing" because of the insertion of this silent operator. Crucially, according to the grammatical theory, this process happens in syntax and/or semantics, not in pragmatics, and thus "scalar implicatures" are not actually implicatures; they are grammatical (syntactic and/or semantic) enrichments.
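The silent "only" can be pictured as a function that takes a literal meaning plus a list of stronger alternatives and returns an enriched meaning. This is only an illustration: the names `exh`, `some_of`, `all_of`, and `only_some` are hypothetical, a "world" is reduced to a count of how many of n relevant things hold, and the grammatical theory's actual operator is defined more carefully (in particular, about exactly which alternatives it negates) than this simplified version.

```python
from typing import Callable, List, Tuple

# Simplifying assumption: a "world" records only how many of n relevant things
# hold, e.g. (2, 5) means John hosts 2 of his 5 relatives. Assumes n >= 1.
World = Tuple[int, int]
Proposition = Callable[[World], bool]


def some_of(world: World) -> bool:
    """Literal 'some': at least one, possibly all."""
    k, _ = world
    return k >= 1


def all_of(world: World) -> bool:
    """'All': every one of the n things."""
    k, n = world
    return k == n


def exh(prop: Proposition, stronger: List[Proposition]) -> Proposition:
    """Hypothetical silent 'only': prop holds and no stronger alternative does."""
    def enriched(world: World) -> bool:
        return prop(world) and not any(alt(world) for alt in stronger)
    return enriched


# "ONLY some": some, but not all.
only_some = exh(some_of, [all_of])
```

Here `some_of((3, 3))` is true (the literal "at least one, possibly all" meaning), while `only_some((3, 3))` is false and `only_some((2, 3))` is true, mirroring the "ONLY some" interpretation.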

We don't have enough time or space here to get into all the details of the debate between these approaches. (But see Geurts, chapters 7-8, and Noveck, chapters 6-7, for much more discussion of these.) Let's just focus on one. One major point of contention between these theories has been the question of whether or not "scalar implicatures" occur in embedded contexts. All these theories agree that "scalar implicatures" can happen in matrix sentences; i.e., "I regret some of the things I said" may be taken to mean "I regret some, but not all, of the things I said". Where the theories disagree is on whether that "some" will still get interpreted as "not all" if it's in a syntactically embedded clause (e.g., "I think the president regrets some of the things he said"—according to the grammatical theory, this should mean that the speaker thinks the president does not regret all the things he said), or in the scope of another semantic operator (e.g., "All of the people involved in the scandal regret some of the things they said."—according to the grammatical theory this should mean that no person involved regrets everything they said, i.e., maybe person A regrets 80% of what he said and person B regrets 50% of what she said, but there is no person who regrets 100% of what he said.) Some of the pragmatic theories, on the other hand, might not predict "some" to get interpreted as "not all" in these situations, because implicatures are supposed to happen at the level of the full utterance rather than the level of a particular clause within the utterance (see, however, Geurts, chapters 7-8, for an explanation of how these theories can accommodate so-called "embedded implicatures" if those sorts of implicatures do indeed occur).

This is not really a question that can be answered by introspection, because people have very different intuitions about what these utterances mean. Researchers tend to be willing to interpret these weird sentences in whatever way ends up consistent with their theory. I don't trust my own intuitions with very complicated sentences like this; I feel like after a few years of doing pragmatics I've seen so many of these that my intuitions are messed up and are not the same as what "normal" people's intuitions are.

Geurts and Pouscoulous (2009) tested this by designing a clever survey. They showed volunteers scenarios and sentences like the one depicted in the figure below. In this example (copied from their paper), there is a picture showing three squares and three circles. Let's call the square at the top "square A", and the two squares at the bottom "square B" and "square C". Square B is connected to two of the circles, and Square C is also connected to two of the circles. Crucially, though, Square A is connected to all three circles. Along with this picture, the volunteers saw the sentence "All of the squares are connected with some of the circles", i.e., a typical "embedded implicature" sort of example sentence like the ones we've seen above. The volunteers were asked to decide whether that sentence is true or false, with respect to the picture.

Figure from Geurts & Pouscoulous experiment; see main text for description.

Recall that the "grammatical theory" predicts that the scalar implicature should be realized in the embedded context; in other words, people should interpret this sentence as meaning "all of the squares are connected with not-all of the circles", i.e., no square can be connected to all of the circles. Since Square A is connected to all of the circles, people should consider this sentence "false", according to the grammatical theory. On the other hand, the Gricean theory supposedly predicts that people won't get scalar implicatures in this context; in other words, they won't think that "some" has to mean "not all" here. Therefore, according to that theory, volunteers should consider this sentence "true", even though Square A is connected to all of the circles.
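To make the competing predictions concrete, here is a small Python sketch that checks the sentence "All of the squares are connected with some of the circles" against the picture under different readings. Which particular circles squares B and C are linked to is an assumption (the description above only says each is linked to two), and the `global_reading` function, which adds the utterance-level implicature "not all of the squares are connected with all of the circles", is an illustrative addition rather than something taken from Geurts and Pouscoulous's paper.

```python
# Assumed connections: square A touches all three circles; squares B and C
# each touch two (which two is an arbitrary choice for illustration).
CIRCLES = {"c1", "c2", "c3"}
CONNECTIONS = {
    "A": {"c1", "c2", "c3"},
    "B": {"c1", "c2"},
    "C": {"c2", "c3"},
}


def some_literal(linked, circles):
    """'some of the circles' read literally: at least one."""
    return len(linked & circles) >= 1


def some_not_all(linked, circles):
    """'some of the circles' with an embedded implicature: some but not all."""
    return 1 <= len(linked & circles) < len(circles)


def literal_reading(connections, circles):
    """Every square is connected with at least one circle."""
    return all(some_literal(linked, circles) for linked in connections.values())


def local_reading(connections, circles):
    """Grammatical-theory prediction: 'not all' applies inside 'all of the squares'."""
    return all(some_not_all(linked, circles) for linked in connections.values())


def global_reading(connections, circles):
    """Literal meaning plus 'not all squares are connected with all circles'."""
    every_square_linked_to_all = all(circles <= linked for linked in connections.values())
    return literal_reading(connections, circles) and not every_square_linked_to_all
```

With these connections, `literal_reading(CONNECTIONS, CIRCLES)` and `global_reading(CONNECTIONS, CIRCLES)` are both `True`, while `local_reading(CONNECTIONS, CIRCLES)` is `False` because square A is connected to all three circles.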

In Geurts and Pouscoulous's experiment, every volunteer marked this sentence as "true". This seems like strong evidence in favor of the Gricean theory and against the grammatical theory. (Or, at the very least, strong evidence that people don't interpret "some" as "not all" in this context. Whether the theories actually predict and explain that is a separate question that has been hotly debated in the years since; proponents of the grammatical theory may claim either (a) that their theory also predicts this outcome, so Geurts and Pouscoulous actually misrepresented the predictions of the grammatical theory, or (b) that there was some problem with the way the experiment was designed, so its results are wrong.)

What makes these results so compelling and important is that Geurts and Pouscoulous didn't just sit down and say "I think this sentence doesn't mean 'not all', so my theory is right." They went to the trouble of designing an experiment and actually collecting data, to show that the interpretation predicted by their theory is really the way that real people interpret these sentences.

One limitation of this sort of research is that it often relies on very complex and crazy sentences which seem quite unusual for normal life. These sorts of experiments always remind me of Bilbo Baggins's birthday speech with its ridiculously complicated set of embedded quantifiers ("I don't know half of you half as well as I should like, and I like less than half of you half as well as you deserve!"). We can't be sure that the way people understand these crazy sentences is a good reflection of the way they understand normal language. In our next case study below, we'll briefly see how we can use psycholinguistic techniques to measure how people process more normal, natural utterances.

Case study 2: is there a processing cost for implicatures?

Above, we learned about the debate over whether "scalar implicatures" are really implicatures. Another, quite separate, debate focuses on whether or not there is an extra cognitive processing cost to understanding implicatures. According to one view, understanding an implicature should take more time and more cognitive effort. Understanding the literal meaning of a sentence just takes semantics, but understanding an implicature takes extra steps of logical inference (recall that, as we saw in the module on weak and strong implicatures, recovering a strong scalar implicature is supposedly a four-step inferential process), and each of these steps must take some time and effort. Therefore, some researchers argue that if scalar implicatures are processed according to Gricean pragmatic reasoning, it should take more time and effort to interpret "some" as meaning "not all" (i.e., to interpret it pragmatically) than it does to interpret "some" as meaning "at least one and possibly all" (i.e., to interpret it literally/semantically).

To test that, we need some way of measuring how long it takes someone to interpret some utterance, and/or how much effort they are using to understand the utterance. Fortunately, psycholinguistics provides us with many tools for doing that. We can use special sorts of equipment to measure people's brain activity and/or eye movements while they're reading or hearing a sentence, or we can even use simple computer programs to time how long it takes them to read a sentence.

One of my own experiments did just that. We had people read very short stories like the ones below, and we used a computer to time how long it took them to read each phrase. (This experiment was inspired by a very similar 2006 experiment by Breheny, Katsos, & Williams. Their 2006 experiment was groundbreaking, in that it figured out this clever way to test how people read sentences with scalar implicatures, and it inspired a whole generation of follow-up research such as the one I'm describing below. But I'm using my own experiment as the example for this module because I think the sentences are a bit more straightforward; the Breheny et al. 2006 experiment is very similar but there is an extra difference between the key sentences which complicates the design.)

  1. Upper-bounded: Mary asked John whether he intended to host all of his relatives in his tiny apartment. John replied that he intended to host some of his relatives. The rest would stay in a nearby hotel.
  2. Lower-bounded: Mary asked John whether he intended to host any of his relatives in his tiny apartment. John replied that he intended to host some of his relatives. The rest would stay in a nearby hotel.

These two stories are almost completely identical, except for one word. In the "upper-bounded" context, Mary asks a question about whether John will host all of his relatives. In the "lower-bounded" context, on the other hand, Mary asks a question about whether John will host any of his relatives. This tiny difference should cause a difference in how the word "some" gets interpreted later. In the upper-bounded context, it is likely that readers would interpret John's response "some" as meaning that he will host "not all" of his relatives (since Mary directly asked a question about whether he'd host all of his relatives, and he didn't just say "yes"). On the other hand, in the lower-bounded context, it is likely that readers would not interpret "some" in that way, and they might think John is making no commitment about whether or not he would host all of them.

In other words, we expected that in the upper-bounded context readers would interpret "some" with an implicature, and in the lower-bounded context they would not (they would interpret it literally).

Having set up a way to get people to read the same sentence with or without an implicature (by reading the same sentence in different contexts), we could then measure how long they took to read each phrase in the sentence. The key phrase in the above examples is "some of". As I mentioned above, some theories predict that it should take people extra time and effort to understand an implicature; if so, then people should need more time to read "some of" in the upper-bounded context than in the lower-bounded context.
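The logic of this comparison can be sketched in Python. The sketch assumes a phrase-by-phrase self-paced reading display in which each keypress reveals the next phrase, so the time spent on a phrase is the gap between the keypress that revealed it and the next keypress. The function names, the way the sentence is split into phrases, and the condition labels are hypothetical; no data from the actual experiment are included, and a published analysis would use proper statistics over many participants and items rather than a simple difference of means.

```python
from statistics import mean


def region_reading_times(regions, keypress_times_ms):
    """Time (ms) spent on each phrase in a phrase-by-phrase reading display.

    Assumes keypress_times_ms[i] is when phrase i appeared and
    keypress_times_ms[i + 1] is the keypress that moved past it.
    Phrase labels are assumed to be unique within a sentence.
    """
    if len(keypress_times_ms) != len(regions) + 1:
        raise ValueError("expected exactly one more keypress than phrases")
    return {
        region: later - earlier
        for region, earlier, later in zip(regions, keypress_times_ms, keypress_times_ms[1:])
    }


def mean_rt_by_condition(trials, region):
    """Average reading time for one phrase, grouped by condition label.

    `trials` is a list of (condition, {phrase: reading_time_ms}) pairs.
    """
    by_condition = {}
    for condition, rts in trials:
        by_condition.setdefault(condition, []).append(rts[region])
    return {condition: mean(values) for condition, values in by_condition.items()}


def implicature_cost(means, implicature="upper-bounded", literal="lower-bounded"):
    """Implicature-context mean minus literal-context mean.

    A positive value would fit the prediction that implicatures slow reading.
    """
    return means[implicature] - means[literal]
```

Feeding `mean_rt_by_condition` one `(condition, reading_times)` pair per trial and passing the result to `implicature_cost` gives the upper-bounded minus lower-bounded difference for "some of"; the processing-cost view predicts a positive number.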

You can see the key results from the experiment in the figure below. The graph shows the reading time for each phrase in the sentence; the reading times for the key phrase, "some of", are highlighted by a green circle. We can see that, contrary to the prediction, people did not take more time to read this phrase in the upper-bounded context than in the lower-bounded context; if anything, they read it slightly faster in the upper-bounded context.

Figure from Politzer-Ahles & Fiorentino experiment; see main text for description.

I actually found results similar to these in later studies measuring people's brain waves and eye movements as they heard or read sentences like these. Across all these studies, I never found evidence that people need extra time and effort to understand scalar implicatures. (But I may be in the minority here; most researchers writing about this topic seem to assume that it's now widely accepted that implicatures take time and effort. See Noveck, chapter 6, for a good and nuanced discussion of much of this research taken together.)

One of the benefits of this type of research is that we don't have to explicitly ask people what they think a sentence means, and thus we don't have to worry about subjective responses or the possibility that people might not be able to accurately report what they think something means (or, even worse, that our very act of asking the question will influence how they interpret the sentence). We also don't need to present volunteers with crazy and complicated sentences. We can just give people relatively normal utterances, and have them listen to or read them in a relatively natural way. As long as we have a good experiment design, we can then use their objective behavioural or neural responses to make inferences about how pragmatics works. The drawback to all this is that experiments themselves involve a lot of assumptions and complicated methodology, and the link between the experiment and the phenomena we want to test may break down at any one of these assumptions (i.e., using the experiment results to draw conclusions about how scalar implicatures work requires assuming that our experiment is accurately measuring how people process scalar implicatures, so if there's any reason to doubt that assumption, that doubt also extends to anything we learn about pragmatics through that experiment).

In-class activities

Take any pragmatics question from earlier in this class (it could be something students have raised before, something from one of the modules, or something you've thought of on your own). Have students try to think of a way they could design an experiment to test it.


by Stephen Politzer-Ahles. Last modified on 2022-03-17. CC-BY-4.0.