What is Voice Onset Time? (2 hours)

This experiment is related to something called voice onset time. Let's figure out what that means.

To see what voice onset time is, we're going to record and analyze our own speech. To do this, you will need to download a computer programme called Praat. This is free software for doing linguistic analysis of speech ("praat" is the Dutch word for "talk", and the software was developed by Paul Boersma and David Weenink at the University of Amsterdam).

Towards the end of this page I have also uploaded a video demo of this activity.

Set up Praat

Praat is available for both Windows and Mac; make sure you download the version that matches your computer. It's very easy to set up: you don't need to do any special "installation" procedure. Just download the file, "unzip" it, and place the Praat programme (on Windows this is the "Praat.exe" file; its icon looks like a pink mouth and ear) on your Desktop.

Once you have done that, double-click the Praat icon to open the programme. You should see something like this:

A screenshot of the Praat GUI. On the left side is a window labelled "Praat Objects", and on the right side is a pink window labelled "Praat picture"

I usually close the window that says "Praat Picture"; we are not going to need it. All our work will use the window that says "Praat Objects".

Record your voice

Now it's time to record yourself speaking. Make sure you're in a relatively quiet environment. You will also need to be using a computer with a microphone; most modern laptops have a built-in microphone, or if you have an external microphone you can plug that in.

In the Praat Objects window, click "New" (in the menu at the top of the window), and then click "Record mono Sound...". This will open a "SoundRecorder" window.

If you click "Record" and make some noise, you should see a green bar moving up and down. That is how you know the microphone is working. If you don't see any green bar, or if the green bar doesn't go up and down when you talk, that probably means your microphone is not working.

Your task now is to record yourself saying "pa". I usually record myself saying it several times—in case one recording is "bad" (e.g., if there's some background noise or my voice cracks), then I can have some extras to use. So usually I say "pa pa pa pa" or something like that.

When you are ready, click "Record", and speak into Praat. After you're done, click "Stop". You can give the sound a name if you want (the default is just "untitled"), and then click "Save to list & Close". Once you have done this, you will see the new sound available and highlighted in the "Praat Objects" window:

A screenshot of the "Praat Objects" window. Under the "Objects:" list, there is one item called "Sound untitled", selected and highlighted in blue.

Listen to your sound

When this sound is selected/highlighted, click "View & Edit". You will see a window like this:

Screenshot of a sound waveform and spectrogram opened in Praat

For now, we only need to worry about the top half of the image, which shows the sound wave. When there's a straight horizontal line in the middle, that's when your voice was silent. When the wave swings far up and down, that's when your voice was loud.
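
If you are curious how "silent" and "loud" stretches could be told apart by a computer, here is a minimal sketch in Python. It is not part of Praat and you don't need it for this activity. The signal is synthetic (a stretch of zeros followed by a 200 Hz tone), and the sampling rate, the 10-millisecond frame length, and the 0.05 loudness cutoff are all assumptions chosen for illustration.

```python
import math

SAMPLE_RATE = 44100  # Hz; assumed recording rate, for illustration only

def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def frame_rms(samples, frame_len):
    """RMS of each consecutive, non-overlapping frame of frame_len samples."""
    return [rms(samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

# Synthetic stand-in for a recording: 0.1 s of silence, then 0.1 s of a tone.
n = SAMPLE_RATE // 10
silence = [0.0] * n
tone = [0.5 * math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(n)]
signal = silence + tone

levels = frame_rms(signal, frame_len=441)      # 441 samples = 10 ms frames
is_loud = [level > 0.05 for level in levels]   # 0.05 is an arbitrary cutoff
print(is_loud)
```

The flat line in the waveform corresponds to frames whose level is at or near zero; the big swings correspond to frames with a high level.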

You can listen to this whole sound by clicking the area that says "Visible part" or the area that says "Total duration". But we only really need to focus on one "pa". So I usually like to "zoom in" and select just one sound to listen to. To do this, click at the beginning of one sound, hold down the mouse button, and drag your cursor to the end of the sound. This will highlight the sound in pink, as shown below:

Same as before, but now one syllable is selected and highlighted in pink

If you want to listen to just this sound, you can click the relevant section in the bar below. (Clicking "total duration" will play the whole recorded sound. Clicking "visible part" will play the whole part of the sound that's visible on the screen; right now the whole sound is visible on the screen so this "visible" part will play the same thing as "total duration", but if you zoom in further then the visible part will be less than the total duration. Finally, clicking the section that in my screenshot says "0.698558" will play just the selected sound, which happens to be 0.698558 seconds long.)
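
The durations Praat shows are in seconds. As a small aside, the sketch below converts the selection length from my screenshot into milliseconds and into a number of samples. The 44100 Hz sampling rate is an assumption; your own recording may use a different rate.

```python
SAMPLE_RATE = 44100  # Hz; assumed recording rate, check your own recording

def seconds_to_ms(seconds):
    """Convert a duration in seconds to milliseconds."""
    return seconds * 1000.0

def seconds_to_samples(seconds, sample_rate=SAMPLE_RATE):
    """Approximate number of audio samples in a duration."""
    return round(seconds * sample_rate)

selection_s = 0.698558  # selection length shown in the screenshot
selection_ms = seconds_to_ms(selection_s)
selection_samples = seconds_to_samples(selection_s)
print(f"{selection_ms:.1f} ms, about {selection_samples} samples")
```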

In my recording, some of the sounds actually sounded more like "ha" than "pa"; this happened because my room was a bit noisy when I recorded, and I didn't speak very close to the microphone, so the "p" wasn't very audible. This might happen to you as well. I recommend you try to find the "best" pa you can in your recording; you can select different sounds and listen to them, and choose the one that sounds the clearest.

Finally, I like to zoom in to see this sound clearly. After you have selected the sound so it's highlighted in pink, click the "sel" button at the bottom of the screen. This will focus the screen on the sound you've selected so that you can see it clearly. It will look something like this:

Same as before, but now zoomed in to a single syllable, with waveform and spectrogram visible.

Now we really can see the "pa" sound clearly.

Examine the sound

When you look at this recording of "pa", there are three parts that are clearly different. We can think about what these mean, and how they are related to what our mouth does when we pronounce "pa".

First, at the very beginning (and again at the end), there is silence. Say "pa" to yourself, and pay close attention to what your mouth does. (If you want to make it even clearer, say a vowel before you say "pa"; e.g., say "aaaaaapa".) You should notice that the first part of pronouncing "pa" is closing your mouth. Specifically, you close your lips together when you want to pronounce "pa". (Try to pronounce "pa" without closing your lips; it's impossible!)

Next, there is a period where the recording is not silent, but the sound wave shows only small, random-looking patterns. In fact we can barely even call it a "wave"; the line is not going up and down regularly like waves do, and it looks more like random noise. Again, say "pa" to yourself, and pay close attention to what your mouth does. After the first step (closing your mouth), the next step is that your mouth opens and releases a strong puff of air. (You can feel this if you put your hand in front of your mouth; you will feel warm breath hitting your hand when the puff of air comes out.)

Finally, there is a long period with larger, more regular sound waves. This part is much louder, and the sound waves aren't in a random pattern; they look more like repetitive "spikes". If you zoom in even closer at the sound waves in the middle of this part, you will see that they're showing the same pattern repeating over and over, as shown here:

Periodic sound waves from the middle of a vowel

This is the vowel. Again, pronounce "pa" to yourself and pay close attention to what happens. First there is silence, when your mouth is closed. Then there is a puff of air, the "p". Finally, there is the vowel "aaaaa". The puff of air causes a pretty much random burst of energy, whereas the vowel causes a smooth, repetitive pattern of air vibrations (you can feel this if you touch your throat while you say "aaaaa"; you will feel your throat vibrating quickly).
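
To make the contrast between "random" and "repetitive" more concrete, here is a rough sketch in Python that scores how strongly a stretch of sound repeats itself, using a normalized autocorrelation (comparing the wave with a shifted copy of itself). This is only an illustration under stated assumptions, not how Praat works internally: the "vowel" is a plain 125 Hz sine wave, the "puff of air" is random noise, and the sampling rate and the 75 to 400 Hz search range are values I picked for the example.

```python
import math
import random

SAMPLE_RATE = 8000  # Hz; kept low so the pure-Python loops stay fast (assumption)

def periodicity(samples, min_hz=75, max_hz=400):
    """Highest normalized autocorrelation over lags in a plausible pitch range.

    Values near 1 mean the wave repeats itself; values near 0 mean it does not.
    """
    best = 0.0
    for lag in range(SAMPLE_RATE // max_hz, SAMPLE_RATE // min_hz + 1):
        a = samples[:-lag]   # the wave
        b = samples[lag:]    # the same wave, shifted by `lag` samples
        energy = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        if energy > 0:
            best = max(best, sum(x * y for x, y in zip(a, b)) / energy)
    return best

n = 800  # 0.1 s of signal at 8000 Hz
# Stand-in for the vowel: a perfectly repeating 125 Hz wave.
vowel_like = [math.sin(2 * math.pi * 125 * t / SAMPLE_RATE) for t in range(n)]
# Stand-in for the puff of air: random noise (seeded so runs are repeatable).
rng = random.Random(0)
puff_like = [rng.uniform(-1.0, 1.0) for _ in range(n)]

vowel_score = periodicity(vowel_like)
puff_score = periodicity(puff_like)
print(f"vowel-like: {vowel_score:.2f}, puff-like: {puff_score:.2f}")
```

The vowel-like signal should score close to 1 and the noise should score much lower, which mirrors what you see by eye in the waveform.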

Record yourself saying "ba ba ba" for comparison

Repeat the steps above to record yourself saying "ba" instead of "pa", listen to the recording, zoom in on one syllable, and examine it.

What is the biggest difference you notice between "ba" and "pa"?

Voice Onset Time

Hopefully you noticed that the long "puff of air" present in "pa" is much shorter (or missing entirely) in "ba".

We can measure how long that puff of air is. Specifically, we can measure how much time passes from the beginning of the sound (i.e., when the "silence" ended and the puff of air started) until the beginning of the vowel (the point when the repetitive waves start to emerge). We call this measure the Voice Onset Time: how much time passes until the onset (the beginning) of voicing (the vowel part of the sound). You can measure it by clicking at the beginning of the sound and dragging to the beginning of the vowel; the top part of the Praat window will show the duration, in seconds. Here is a screenshot of my measurement of the Voice Onset Time for my recording of "pa", which I measured as being 0.099375 seconds (i.e., about 99 milliseconds):

As before, but with the aspiration selected and highlighted

And here is a screenshot of my measurement for "ba", just 7 milliseconds:

As before, but with the aspiration selected and highlighted

So the crucial difference we see is: pa has a long voice onset time (a long puff of air before the vowel), and ba has a very short one.
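
The arithmetic behind the measurement is just a subtraction: the time where the vowel starts minus the time where the puff of air starts. Here is a small sketch in Python. The cursor times below are made up (hypothetical); I chose them only so that the differences match the durations I measured above (0.099375 seconds for "pa" and 7 milliseconds for "ba").

```python
def vot_ms(release_s, voicing_s):
    """Voice onset time in milliseconds, from two cursor times in seconds.

    release_s: time where the silence ends and the puff of air starts
    voicing_s: time where the repetitive vowel waves start
    """
    return (voicing_s - release_s) * 1000.0

# Hypothetical cursor positions, not read from the real recordings.
pa_vot = vot_ms(release_s=0.250000, voicing_s=0.349375)
ba_vot = vot_ms(release_s=0.250, voicing_s=0.257)

print(f"pa: {pa_vot:.0f} ms, ba: {ba_vot:.0f} ms")
print(f"pa is about {pa_vot / ba_vot:.0f} times longer")
```

With your own recordings, replace the cursor times with the start and end of the interval you selected in Praat.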

We often call that puff of air aspiration. We say that pa is an "aspirated" sound (it has a lot of aspiration—a long Voice Onset Time), and we say that ba is an "unaspirated" sound (it has little or no aspiration—a very short Voice Onset Time).

Here is a video of how to record and measure voice onset time for "pa":

Now that you understand what voice onset time is, it's time to link this concept back to the experiment you did before. Continue on to the next task: "Analyze your data".


by Stephen Politzer-Ahles. Last modified on 2021-07-13. CC-BY-4.0.