Filler words, filled pauses, hesitation markers, thinking sounds, call them what you will: these little sounds season and serve as added ingredients in our spoken word salads. We all use them and we all observe and have opinions about them. So what function do they serve?

Repetitive sounds, filler words, and discourse markers are universal and ubiquitous, serving cognitive and interactive functions. Their usage may be unconscious on the part of the speaker, while aiding others in processing and digesting what is being said. In this sense, filler words are essential elements of spoken discourse for both speaker and listener. They serve to hold the floor or maintain a turn, and signal that something is about to be said.

Michael Barbaro of the New York Times podcast The Daily contends that verbal fillers such as “hmm” can be a way of “punctuating interviews in ways that reminded you that two humans were having a real conversation,” and can articulate interest without applying judgment, show curiosity, and keep the conversation going. Verbal fillers and conversational devices range from phonemic (the level of sound) to morphological (words) to syntactic (short phrases), and serve purposes that include formulating thoughts, showing that one is listening, being polite, buying wait time, hedging, softening, and signaling approval or disdain, among others.

While small and seemingly harmless, verbal fillers can serve mighty syntactic roles. They keep us going when speaking, by helping form thoughts and moving the conversation along. Here are some examples:

I’m listening to you…

Yeah (yeah)
Go on
I am here for that
I love it!
I cannot.

I’m talking to you…

Like, I mean, um, actually
Okay, basically, literally, totally
Do you know what I mean?
Does that make sense?

To take over a turn:


At its simplest level, language is sound (phonetics and phonology); the most layered and complex is discourse and pragmatics. If you visualize language as a series of concentric circles, it traverses simple to complex—from the inner layers of phonetics and sounds to syllables, words and sentences, to the outer layer of discourse and pragmatics.

The Ubiquitous So, Well, Um

In spoken language, we see that many elements are universal, one being the way speakers listen and take turns in a conversation. These markers or thinking sounds (uh, uh huh, huh, hmm, er, like, right?) may be collections of sounds with meaningless lexical value, yet they pack a pragmatic punch.

They can be perceived through a range of filters: neutral or positive ones, such as creating connection, agreement, and unity; or more negative ones, such as a crutch, tic, parasitic word, or distracting habit. These exist in every language. The French utter eh bien; Portuguese speakers have então, ta, pois; Japanese speakers have えーと (“eeto”) and なんか (“nanka”); Spanish speakers have mira and vale, among others.

Known by several labels, verbal fillers and hesitation markers are some of these universal elements, a type of discourse marker. Interjections and rejoinders also come to mind: words a listener uses to keep the conversation going and to show the speaker that he or she understands and even sympathizes. For language learners, using them adeptly (with the right syntactic placement, intonation, and timing) may demonstrate further fluency.

Rejoinder Examples

Happy: Really? Wow! That’s cool.
Sad: Oh no! Aww…that’s too bad.
Surprising: No way! For real? Wait, what? Are you serious?
Neutral: I see; Mm-hmm; Interesting; That’s nice.

When the speaker is not visible, verbal “signposts” can be helpful, as they take the role, in part, that nonverbals do. Politicians, public speakers, and those making a formal presentation, on the other hand, might gather their thoughts and hold the floor with a “Soooo…,” a “Look,” or an “Ummm” in between points. Learners of more than one language might hear these signals early on, and start incorporating them, purposefully or not, to create added fluency and confidence in the new target language.

Judgments of and reactions to these elements of speech range from perceiving a mark of personality—it’s simply what the speaker does—to a more visceral judgment (distracting, inept, unprofessional, unpracticed). Examples of these utterances include the well-known “you know” and “like,” and the increasingly more frequent “right?”

“You know” creates an exchange structure focusing the listener’s attention on a specific piece of information provided by the speaker: “That seems like a lot to pay for a coffee, you know?”

“Like” is a big one for protecting oneself from potential disagreement: “Do you, like, want to have Indian food for lunch?” perhaps really means “Will you have Indian food for lunch with me?” It can also function to reduce uncertainty and perhaps avoid rebuttal or being wrong in an utterance:

It’s, like, supposed to rain all afternoon.

Right: Has this one crept in quickly or slowly? The peppering of a “right(?)” in the telling of a story or sharing new information can take the place of “Do you know what I mean?” Additionally, depending on placement and tone, it can connote a spoken yet implied request that the listener agree, while the listener might be thinking, “Must I agree?” (I don’t, actually: I just learned this; this is his story; or I don’t agree at all.)

Consider the following examples:

We were driving, and on the way down there was a ton of traffic, right?
We’re creating this new AI system, right, that maps these utterances…

This instance is slightly different; Speaker B is articulating agreement, but perhaps as if he or she had thought of the idea first:

Speaker A: It’s so cold out!
Speaker B: I know, riiight?

Infrequent usage of “right” with a pause gives the listener a chance to concur, and can lead to recognition that is collaborative and consensus building, rather than coming across as authoritative, didactic, or presumptive. Frequent habitual usage, on the other hand, could lead a listener into tacit “agreement,” especially if its delivery is painted with a particular intonation, cadence, and energy.

Depending on tone, when the interlocutor (speaker) takes one too many turns using “right,” signifying that an agreement exists between speaker and listener, agreement may, in fact, not exist at all. Further, its usage could erode future potential agreement, as the conversational partner has been presumptively assigned the role of observer to a series of thoughts the speaker is sharing, rather than being embraced as an active participant with a differing opinion or stance. Perhaps the speaker is attempting to hold the listener’s attention while concurrently (and consciously or not) seeking agreement for what he or she is saying.

Hesitation markers are ubiquitous in both native and nonnative speakers of language. The force of these is that they signal a pause and that a turn is not done, and the listener should continue listening.

Imagine how challenging it is to use verbal fillers accurately in a second or third language. Or, um, rather, since there is already a more spacious canvas, maybe one should not even bother sprinkling them in. Indeed, interpreters are trained to omit fillers, so as not to be perceived as uncertain or to create doubt about the accuracy of an interpretation. Ironically, using filled pauses naturally, “accurately,” where they are expected in conversation, can result in the speaker being perceived as fluent, or at least communicatively competent, in a second or third language.

The Demographics of English Filler Usage

An analysis of hundreds of transcriptions suggests that filler word use and discourse markers, used at comparable rates across age and gender, can be potential social and personality markers.

Mark Liberman discovered, through his parsing of 14,000 phone conversations, that “uh” increases with age, but that at every age, male speakers use it more than female speakers do; whereas “um” decreases with age, but female speakers use it more than male ones do at each stage in life.

The state of affairs is actually simple and one of a common ground: we each have our own idiolect, a term coined by Bernard Bloch from the Greek idio- (personal, distinct) + -lect (social variety of a language), which is the unique speech of an individual. This term refers to the theory that, while being influenced by or sharing a common dialect, sociolect, culture, and environment, no two individuals have the exact same linguistic tastes and idiosyncratic features in their personal inventory of variants.

JSTOR is a digital library for scholars, researchers, and students. JSTOR Daily readers can access the original research behind our articles for free on JSTOR.

