How to score cognitive tasks (3 hours)


So far, we have thought a lot about different cognitive tasks for measuring psychological mechanisms, and about which mechanisms each task might reflect (e.g., we might use a count span task to measure working memory, we might use a flanker task to measure executive function, etc.). But we haven't spent much time examining exactly how we can use those tasks to measure people's cognitive ability.

Ultimately, the purpose of these tasks is to label people. We want to be able to look at someone's performance on a reading span task (for example) and say "that's a person with high working memory". Or we want to be able to look at someone's performance on a Stroop task (for example) and say "that's a person with mid-low executive function". To do that, we need to be able to calculate a score for each person's performance on the task. Only after we have scores can we compare people (e.g., say "Person A got a score of 88, and Person C got a score of 62, so Person A has better working memory") or group/label them (e.g., say "These six people have high executive function, those eight people have medium-level executive function, and these last five people have low executive function").

So let's take a look at how we might turn results into scores.

Below I have put someone's responses from the Daneman & Carpenter reading span task (if you don't remember how this task works, check the example slides from the "working memory" section of this module). The column on the left shows the words that the person was supposed to recall in each block. The column on the right shows the words that the person actually recalled.

Should have said...                             | Said...
affair, surprise, context, support              | affair, surprise, context
reason, factory, excuse, pencil, illness        | reason, illness, pencil
picture, pain, unity                            | picture, pain, unity
boat, wound, address, student, doctor, shoulder | shoulder, doctor
owner, fear, training, cupboard                 | cupboard, owner, fear, training
officer, train, desire                          | officer, train, desire
husband, wonder, flower, eyes                   | wonder, flower, eyes

I want you to make up a system for scoring these results.

There is no right or wrong answer here. There's no rule for what scale these should be on (you can give points, you can give a score out of 100%, or any other idea you have). I have no rules for what aspects you should or should not count when you make your score. That's up to you. Just think of some way you can assign this person a specific, numerical score which will reflect how well you think they did at the task.

Once you've figured that out, continue to the reflection questions below.

Describe your scoring system, in as much detail as possible. I.e., describe exactly what things you looked at and describe how you arrived at a score.

Also, state what score you gave this person.

When you had to make a system to give people a score, you probably had to deal with several issues or questions. Here are a few issues you might have encountered:

How did you handle the issues you encountered? Did you think about any of these concerns? If so, how did you decide how to handle them in your scoring system? Did you come up with any other problems that I didn't mention?

All of the scoring choices you make will have consequences—sometimes, drastic consequences.

Let's imagine that we decided we do want to consider correct order when we make our scores. In other words, we want people who remember words in the right order to get better scores than people who don't remember them in the right order.

If your previous scoring system already did this, then you don't need to change it now. If your previous scoring system did not consider order, then think about how you can include order in the scoring system now.


Now that you've done that, let's examine the results from four different people. Imagine that the participants were supposed to remember affair, surprise, context, support, in that order. Now, here are the words that four different people recalled:

Score these four participants strictly according to your scoring system. Then answer the following two questions:

  1. Who got the highest score?
  2. Do you think that's fair? (i.e., do you think it accurately reflects these people's working memory ability?)

Let's keep considering these participants' responses (when they were supposed to recall affair, surprise, context, support):

One popular way to calculate scores and include order is position-based scoring. In other words, a person gets a point if they recall "affair" in the first position, they get a point if they recall "surprise" in the second position, they get a point if they recall "context" in the third position, and they get a point if they recall "support" in the fourth position.
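To make the rule concrete, here is a minimal Python sketch of position-based scoring as described above. The function name `position_score` and the three responses are made up for illustration; they are not the four participants' actual responses, and this is only one way the rule could be written down.

```python
def position_score(targets, recalled):
    """Count the words recalled in exactly the position they were presented in."""
    # zip() stops at the shorter list, so positions the person never
    # filled in simply earn no points.
    return sum(1 for target, word in zip(targets, recalled) if target == word)


targets = ["affair", "surprise", "context", "support"]

# Hypothetical responses, invented for this sketch.
responses = {
    "dropped the last word": ["affair", "surprise", "context"],
    "dropped the second word": ["affair", "context", "support"],
    "swapped the middle words": ["affair", "context", "surprise", "support"],
}

for label, recalled in responses.items():
    print(label, position_score(targets, recalled))
```

With these made-up responses, dropping the last word costs one point, swapping two words costs two points, and dropping the second word costs three, because every later word shifts into the wrong position.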

Under this scoring system, here are the points earned:

Does this seem fair?

One might argue that Participants A and B performed about equally well: each of them forgot just one word. But Participant B lost more points because they forgot an early word (which means the rest of the words get pushed to the wrong position), and Participant A lost fewer points because they forgot a late word, leaving the early words in the correct place.

One might also argue that Participant D got a pretty bad deal. They lost two points, because they had two words in the wrong position. But it's likely that they did that because they switched the order of two words. So in a way, it sounds like they're losing two points for one mistake. Does it really sound right that Participant D, who remembered all words but just made the mistake of switching the order, gets a lower score than Participant A, who made the [arguably] more serious mistake of forgetting a whole word entirely?

I have only discussed position-based scoring. There may be other ways to score this, too. You might have scored it in a different way.

The most important thing in the end is the ranking (which participant has the highest score, which has the second-highest score, etc.): eventually we would use these scores to identify who is a "high" scorer, who is a "low" scorer, etc. In the scoring method I just mentioned, Participant A got the top score, followed by Participant D, then Participant B, and the loser is Participant C.

Is there any other way you could score these, which would result in a different ranking of the participants? If you think of one, explain the scoring system and tell me what the ranking would be. (It's possible that your scoring system from the previous questions already does this.)

There are two things I hope you have learned from the previous reflection questions:

Conway et al. (2005) offer a very detailed discussion of different scoring schemes for working memory span tasks, and the pros and cons of each.
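As a rough illustration of how much the choice of scheme matters, here is a Python sketch of four schemes along the lines of those discussed in that paper: partial-credit vs. all-or-nothing scoring, each either unit-weighted (every block counts equally) or load-weighted (longer blocks count more). The function names, the two example blocks, and the choice to express every score as a proportion are my assumptions for this sketch; check the paper itself for the exact definitions.

```python
def block_hits(targets, recalled):
    """Number of target words recalled in the correct serial position."""
    return sum(1 for t, r in zip(targets, recalled) if t == r)


def span_scores(blocks):
    """blocks: list of (targets, recalled) pairs, one pair per block."""
    hits = [block_hits(t, r) for t, r in blocks]
    sizes = [len(t) for t, _ in blocks]
    perfect = [h == n for h, n in zip(hits, sizes)]
    return {
        # Mean proportion of words correct per block.
        "partial_credit_unit": sum(h / n for h, n in zip(hits, sizes)) / len(blocks),
        # Proportion of blocks recalled perfectly.
        "all_or_nothing_unit": sum(perfect) / len(blocks),
        # Words correct across all blocks, out of all words presented.
        "partial_credit_load": sum(hits) / sum(sizes),
        # Words from perfectly recalled blocks only, out of all words presented.
        "all_or_nothing_load": sum(n for n, p in zip(sizes, perfect) if p) / sum(sizes),
    }


# Two illustrative blocks (an assumption made for this sketch).
blocks = [
    (["picture", "pain", "unity"], ["picture", "pain", "unity"]),
    (["affair", "surprise", "context", "support"], ["affair", "surprise", "context"]),
]

for scheme, score in span_scores(blocks).items():
    print(f"{scheme}: {score:.3f}")
```

The same two blocks of responses produce four different numbers, which is exactly the problem: the score depends on the scoring decisions as well as on the person's memory.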

But overall, this illustrates a much more serious problem with research in general. When we do research, we are usually interested in constructs. A construct is an abstract thing in the world that we are trying to study: something like "intelligence", "language proficiency", "working memory ability", "how difficult a sentence is to understand", etc. Generally a construct is something that you can intuitively feel (e.g., in the "Intro to psycholinguistics" module, you could maybe feel that the center-embedded sentence is harder to understand than the non-center-embedded one) but can't measure exactly. In fact, constructs can be "felt" but can almost never be directly observed or exactly measured.

Consider some examples. If you want to know someone's intelligence, you might measure it with an IQ test. If you want to know how hard a sentence was to understand, you might measure it by recording how long a person takes to read it. If you want to know how good someone's English level is, you might measure it by looking at their TOEFL, IELTS, or HKDSE scores.

The problem, however, is that measures can never perfectly reflect constructs.

Do you know anyone who has a great TOEFL score but sucks at English? Do you know anyone who's really smart, but doesn't get good grades or test scores? Those are both examples of times when measures don't reflect constructs. Maybe someone gets a high IELTS score because they cram for the test a lot and memorize monologues they can recite, even though their English is not very good. Maybe a smart person gets bad grades because the university does not recognize their learning style, or because they had many other life events preventing them from being able to do their school work. In any case, the measure (test score, class grades) is not accurately indicating the underlying abstract construct (English proficiency, intelligence).

This issue applies to measurements of working memory, executive function, and everything else we do in psycholinguistics. (Really it applies to things in most sciences, and especially to psychological sciences, which are interested in measuring stuff about the human mind; the human mind is infamously complicated and hard to measure.) Working memory test scores might not accurately reflect someone's working memory because of the choices we made in how to calculate the scores. Measures of how long someone took to read a sentence might not accurately reflect how difficult the sentence is, because the person might have gotten distracted during reading or something like that.

Unfortunately there's no easy solution to this. The only thing we can do is always remember that there is a difference between measures and constructs, and we should always be skeptical and always ask whether the measure we are using (IELTS test scores, working memory scores, etc.) is really a good measurement of the abstract construct (language proficiency, memory capacity, etc.) that we are actually interested in.

Can you think of one more example of a situation where a measure does not accurately reflect a construct?

Last question! Please write a self-reflection about what you learned in this module. That could mean summarizing the main points in your own words, or it could mean raising questions about something you didn't understand, problems or criticisms, pointing out something you disagreed with, suggesting some further issue that builds off of the things in this module, etc.

When you finish this activity, you are done with the module (assuming all your work on this and the previous tasks has been satisfactory). If you are interested in leading a discussion on this module, you can go on to see the suggested discussion topics. Otherwise, you can return to the module homepage to review this module, or return to the class homepage to select a different module or assignment to do now.


by Stephen Politzer-Ahles. Last modified on 2021-07-13. CC-BY-4.0.