122: The Interaction Engine (with Stephen Levinson)

How did language start? What do all languages have in common? How does language really work? Many answers have been proposed to these questions, but one thing is for sure: interaction is the combustion chamber where everything happens. We’re having a chat with linguistic lion Stephen Levinson, author of The Interaction Engine.

Timestamps

Introductions: 0:19
These fascinating facts about language will make you (or Dr Levinson) a hit at any party: 3:47
The mechanics of speech production: 6:01
What’s going on when we’re talking or listening? 8:46
Cultural differences in conversational norms: 20:33
Universals of interaction: 22:10
Metaphors of space may have been a motivator for language: 25:53
The role of gesture in language development: 28:47
Cooperation and empathy in language: 34:59
What one thing explains the most about language?: 45:56

Disclosure: Hedvig has been employed at the Max Planck Institute for Psycholinguistics, where Dr Levinson is an emeritus director, and she currently works at the Max Planck Institute for Evolutionary Anthropology.



TikTok promo

@becauselangpod

Language is a wonder. As in: We wonder what makes it all work. So we're talking to linguistic lion Stephen Levinson of the Max Planck Institute for Psycholinguistics about his new book The Interaction Engine. (The book is open source, by the way!) Episode coming soon.


Patreon supporters

A huge thanks to all our patrons.

Become a Patreon supporter yourself and get access to bonus episodes and more!


Show notes

The Interaction Engine (2025)
https://www.cambridge.org/core/services/aop-cambridge-core/content/view/A1B3AC730A9F185E51EBFA4E86EFD235/9781009570329AR.pdf/The_Interaction_Engine.pdf?event-type=FTLA

The Dark Matter of Pragmatics (2024)
https://www.cambridge.org/core/elements/dark-matter-of-pragmatics/D1982FE5FE9E5E60F1ADA27F797B114A

Timing in turn-taking and its implications for processing models of language
https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2015.00731/full

Breathing for answering: the time course of response planning in conversation
https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2015.00284/full

Eye blinks are perceived as communicative signals in human face-to-face interaction
https://pubmed.ncbi.nlm.nih.gov/30540819/

Gene Lerner: Turn-Sharing: The Choral Co-Production of Talk in Interaction
https://scholar.google.com/citations?view_op=view_citation&hl=en&user=9BCRDccAAAAJ&citation_for_view=9BCRDccAAAAJ:zYLM7Y9cAGgC

Transcript

[Transcript provided by SpeechDocs Podcast Transcription]

[BECAUSE LANGUAGE THEME]

DANIEL: Hello and welcome to Because Language, a show about linguistics, the science of language. My name’s Daniel Midgley. With me now is Hedvig Skirgård. Hey, Hedvig.

HEDVIG: Hello. I’m so excited to be here today.

DANIEL: I’m excited too because of our special guest.

HEDVIG: Yes. Yeah.

STEPHEN LEVINSON: You want me to introduce myself? Okay, my name is…

HEDVIG: No, no. I think Daniel…

DANIEL: Okay, that didn’t work. That didn’t work. I was going to say hello to Hedvig a bit more than I did, but I don’t know…

HEDVIG: Oh, I see.

DANIEL: …what to say. I feel like I know you already.

HEDVIG: There’s nothing more to say. Hello. Daniel and I have been recording a lot lately, so I feel like we’ve run out of small talk, I think, maybe.

DANIEL: [LAUGHS] Well, then let’s just…

STEPHEN: I’ve heard all about your cats.

[LAUGHTER]

DANIEL: Oh, no. He listens.

HEDVIG: One of them’s sleeping right here.

STEPHEN: Yep.

DANIEL: We have a very special guest for this episode. He’s the emeritus director of the Max Planck Institute for Psycholinguistics. He’s the author of over 400 publications on language, and he’s the author of The Interaction Engine: Language and Social Life and Human Evolution. It’s Professor Stephen Levinson. Hey, Steve. Thanks for coming on the show.

STEPHEN: Oh, it’s a pleasure to be here, as they say. I always wonder why “here,” because I’m at a great distance from you.

DANIEL: And yet, here is a state of mind.

HEDVIG: We’re all here in the virtual space.

STEPHEN: [LAUGHS] Yeah.

HEDVIG: That is true.

DANIEL: Well, yes. What is the meaning of here? Hmm, that’s an interesting conundrum.

HEDVIG: Yeah. Stephen’s done a lot of thinking about space, haven’t you?

STEPHEN: I have. [LAUGHS]

HEDVIG: Yes, indeed.

DANIEL: Might have been one of the catalysts for why we even have language at all.

STEPHEN: Mm-hmm.

HEDVIG: Yes, indeed. So, you’ve just written this book, The Interaction Engine.

STEPHEN: Yes, and by the way, I actually just got my first copy one hour ago.

DANIEL: Oh, nice.

STEPHEN: And you see that re-enlivened my enthusiasm for it, which is nice at this particular point. [LAUGHS] Yeah.

HEDVIG: How long have you been writing this book?

STEPHEN: Well, I wrote it, the bulk of it, during the lockdown phase, around 2020, 2022. But then, I had some friends read it, and then it went off to reviewers. And then, there were lots of delays that the poor press suffered. One of those sort of… Actually, I don’t really know, but I think it was held to ransom by folks. And so, it was offline for months. Yeah, so it’s delayed, but…

HEDVIG: That’s a lot of time.

STEPHEN: Yeah.

HEDVIG: And it also summarises work that you’ve been working on for the last 30, 40… How many years are we talking about? What do you think is the first publication of yours that you reference in your book?

STEPHEN: Probably my pragmatics textbook. I don’t remember, but that is, god, 40 odd years ago. Yes. Yeah.

HEDVIG: I have it in front of me here, so I believe…

DANIEL: What’s the title of that one?

HEDVIG: …1978, Sociolinguistic Universals, unpublished manuscript.

STEPHEN: Oh no, that’s true. Oh, gosh.

DANIEL: Oh, wow.

HEDVIG: Is the oldest one.

STEPHEN: Yeah. Yes, yes.

HEDVIG: Okay. All right.

HEDVIG: So, this is really a lot of things you’re summarising and it’s not… I think this is a good thing. It’s not a super thick book. It’s a very hard challenge you’ve had, summarising so much knowledge into under 200 pages, right?

STEPHEN: Yes. I just think people don’t really have the stomach for great, big books anymore. I mean, this is the age of the quickie. [LAUGHS]

HEDVIG: Well, also, depending on what kind of audience you’re after.

STEPHEN: Yes. Yes. Yes.

DANIEL: Yeah. Well, you certainly do have a lot of shorter pieces. I gave a presentation today and as I looked through the articles that I was using, I found that a rather indecent number of them had your name on them. You’ve looked at a lot of the base level things. Here are some of the titles that I used. Timing in Turn Taking and Its Implications for Processing Models of Language, in which you point out that there’s no way that we could possibly plan what to say after someone has already stopped saying their bit. We have to have been working on it while we were still listening to them. That was interesting.

HEDVIG: This is with Francisco Torreira, and this is also one of my favorite papers of yours, Steve, if I may say so. And I also used it when I did Intro to Linguistics class last year. It’s so much fun and people can relate to it.

DANIEL: Yep.

STEPHEN: Yeah, the turn taking just plays a rather important role in the book because it is a sort of a foundational fact. It seems to me it’s been rather neglected in the cognitive sciences, less so in interactional linguistics, but still.

DANIEL: Okay.

STEPHEN: Yes.

DANIEL: Another one, Breathing for Answering: The Time Course of Response Planning in Conversation, in which you point out that we hardly know how a sentence that we begin is going to end. And yet, we somehow take enough breath beforehand — like I’ve just done it — to say about how long it was going to be. So, we must have known before we even launched into the sentence.

STEPHEN: Yes, yes. That’s one of those topics that’s still hugely underresearched. The whole sort of control of breathing, how it evolved. We know that apes don’t have it and we can tell a little bit from the fossils about when our ancestors got the breathing control.

DANIEL: How do we know that?

STEPHEN: Oh, that has to do with the canal through the backbone. And that has to be enlarged to give us the extra nerve control of the thoracic muscles.

DANIEL: Okay.

STEPHEN: Unfortunately, backbone is not generally very well preserved in the fossil record. But where it is, you could see, oh, 1.4 million years ago, that breathing control doesn’t seem to be there. By 800,000 or so, it must have been there, so yeah.

DANIEL: Okay.

HEDVIG: That’s very interesting.

STEPHEN: Yeah.

HEDVIG: And also, I spend some time around people who are Deaf since birth and sometimes they’ve been trained to make speech with varying degrees of accuracy or whatever you want to call it. And volume and breathing control is very hard for many…

STEPHEN: Yes, I can imagine.

HEDVIG: …because you don’t get any feedback. You don’t know what you sound like. And also, you don’t need to try to speak that often, hopefully.

DANIEL: The third one was Eye Blinks Are Perceived As Communicative Signals in Human Face-To-Face Interaction. When someone gives us a long, slow blink, we take that as evidence that they have understood us, but we’re not consciously aware of this. We just do it. I mean, there’s a lot of things that you have sort of been working on and unearthing that. Do you ever just drop these into conversations? “Hey, did you know that we use eye blinks when we’re figuring out if people understood us?”

STEPHEN: Probably not. [LAUGHS]

DANIEL: No? No. Okay.

HEDVIG: Because I was going to say, I think Steve has a knack for finding things that are relatable and interesting both for laypeople and linguists, which is kind of extraordinary because there are a lot of research questions that linguists care a lot about that laypeople couldn’t care less about, but like the nuances and finer points of how conversation and the pragmatics of conversation work, I think everyone can relate to times when it has failed or times when it really succeeded, but they don’t know why. So, I would believe that you would be a hit at a party with talking about your research, is that true?

STEPHEN: [LAUGHS] I don’t think so, really, because for the reason that I sort of sketch in the book that interaction is the medium we sort of swim in. It’s our natural medium. I don’t think we think about it objectively, very much. We don’t have a sort of metalinguistic awareness, tremendous… Unless we’re embarrassed or perhaps, one’s very tentative about interacting with somebody very famous or something. But I think generally speaking, it’s really pretty subconscious for what we’re doing. Yeah.

DANIEL: Okay, well, then you’re also the author of The Dark Matter of Pragmatics, in which you lay out the cognitive demands that conversation puts on us, which we’re also not consciously aware of. So, let’s just say that we’re having a conversation maybe face-to-face or maybe just right here. What’s going on with me in my brain and my body while I’m talking and while I’m not talking? What’s going on here?

STEPHEN: Well, that’s a very complicated question. So, the actual speech production mechanism has been well researched by my colleagues at the Max Planck Institute for Psycholinguistics. So, Pim Levelt in particular, but many people have worked with him, has sort of laid out the amazing sort of time course, millisecond by millisecond, of what happens in your brain and your mind as you crank out a sentence, as it were. So, we know something about each of these little stages of encoding a thought… Selecting a thought, encoding a thought, and then finally getting it into a sort of sound clothing, as it were.

But that is the sort of mechanics of it. And we also know that part of this is running in parallel. So, you sort of start a process and that starts clanking away, and then you start another little process and so on. And so, these are running in parallel. It’s a complicated system. But anyway, the delineation of it is reasonably clear now, I think, the details people quarrel about, but the general picture seems kind of well worked out. So, we know that’s what is involved in speech.

I mean, I think one of the questions that we don’t quite know about is how much is really going on in parallel. To what extent, for example, you discard a possible sentence structure as you choose another. So, how much of that sort of stuff is really going on in the background? I don’t think we have a clue. I mean, if you look at speech errors, there is some indication that perhaps people are entertaining two rival structures. [LAUGHS] But anyway, so that’s what’s going on once you start speaking.

HEDVIG: So, just to get a clarity on the beginning of that process, is it the case that we have phonological words in our heads that we store in different places, and we put them together and then we get the mechanics of the mouth moving? Or, do we have sort of abstract semantic things that then get linked to phonology? I think maybe I have an idea, but I think we should maybe lay it out a bit for the audience of what goes on.

STEPHEN: I’m not sure I’d be the person to really… You should talk to a processing psycholinguist if you want the full facts. But I think that it’s clear that there has to be… You know, you’re starting off with meaning in some way. You might start off with a visual image of what you want to say and then… But anyway, somehow there’s a conceptual system that has to then look for the words that might get encoded and that’s of course partly a language-specific matter as to what you can say using words. So, that’s going to get filtered then, through vocabulary. You’ll look for the meaning of the words and then the words start to call up syntactic frames of various kinds into which they would fit and so on.

And then, you start to think about the… Since you’ve got one end of the word, the meaning end, that’ll give you the sound end and now you’re going to try and encode that into actual articulatory movements and then you kind of call on a sort of syllabary of pre-coded syllables, this is probably how it works. And then they’re going to come up and feed one by one, they get encoded linearly. And again, in speech errors, you sometimes see them switched around and so on. So, you could get some clues from speech errors as to how all this is done.

HEDVIG: Yeah, I tend to confuse things that are semantically close but can sometimes even be opposites, like FOREHEAD and CHIN or something like that. I’ll say one and mean the other. They must be stored close to each other.

STEPHEN: Yeah, I mean one of the puzzles that actually Pim Levelt spent a bit of time sort of puzzling about is how to avoid the fact that when you’re searching for DOG, you might just end up with ANIMAL because that’s a more general term, so how do you avoid that and come to the right level and so on. So yeah, there’s lots and lots of interesting puzzles in there.

HEDVIG: But of course, what we’re also doing while we’re doing all of this is there’s what we’re encoding in the linguistic signal itself. But as you talk about in the pragmatics book in particular, we are using a lot of other things than just the linguistic signal itself to get information across to another person. We are imagining their minds and thinking about what they might know. We are using our gestures, our hands, our eyebrows, our feet if we could if it was close enough to our face, I bet we would. So, what are all the different ways besides the signal, the sounds, the syllables itself that we can utilise?

STEPHEN: Well, all the things that you just mentioned. I think the whole human body, when we became bipedal, the whole front of our bodies became a signaling system. [LAUGHS] And…

DANIEL: Because these are free. Hands are free.

STEPHEN: Yes, hands are free. Body, torso, angles, nodding, head. And the other thing is, of course, relative hairlessness of humans makes the facial musculature much more visible than it is in apes. And so then, you’ve got all of these subtle facial expressions that can be encoded.

DANIEL: Mm. And you mentioned eyes as well.

STEPHEN: Yes, gaze, tremendously important.

DANIEL: We can see the whites of our eyes, so we can track each other’s eye gaze.

STEPHEN: Yes, yes. In The Dark Matter of Pragmatics, I sort of likened this to a kind of orchestra. And the interesting thing is, how is that all coordinated? In the actual speech production system or language production system, how is that all coordinated? I think we don’t have a clue, actually. [LAUGHS]

HEDVIG: No.

DANIEL: [LAUGHS]

STEPHEN: I mean, because it’s really puzzling. I mean, some of it’s obviously a matter of pre-rehearsal and there are chunks of this kind of behavior. So, if you smile, then your eyes are going to also wrinkle and so on. Part of it is sort of coordinated by habit. But no, a lot of it is just an amazing coordination of independent articulators doing things, signaling things.

HEDVIG: But sometimes, also a lack of coordination. So apparently, I’ve been told that sometimes when someone tells me something and it’s kind of complicated to understand, I make a certain facial expression that looks like I’m very upset with the other person. But it’s just my “I’m really trying to think” face. And sometimes people think… People say to me, “Oh, you really didn’t like what I said.” I’m like, “No, no! I’m just desperately trying to think,” but I’m involuntarily sending out the signal that the other person is a terrible person or something. It’s very hard.

DANIEL: Okay, well, now that you’ve mentioned listening, Hedvig, we’ve talked about what this is like for the person who’s speaking.

STEPHEN: Yes, yes, yes.

DANIEL: But for the person who’s listening, there’s a lot of different stuff going on. I’m trying to put your words back into concepts, but then I’m also getting bored or distracted. I’m also making a lot of predictions about what you’re going to say. And then, I’m also planning my own utterance.

STEPHEN: Yes.

DANIEL: Have I missed anything? What else is that?

STEPHEN: No. I think probably you have things that you want to say anyway that you’re stacking up, as it were. [LAUGHS] So, let’s suppose you haven’t seen somebody for a while, you want to tell them about things. So, you’ve got an agenda of some kind, and then so you want to try and edge that in somehow, bit by bit. So anyway, this is all the considerations that are going on and then you’re dealing with the current last turn and wanting to respond to that. And as you say, we know from brain imaging that partway through, maybe even halfway through, I’m already sort of beginning to lower my attention to what you’re actually saying and switching over to how I’m going to respond.

HEDVIG: And is that because we can already predict the end of what you’re going to say?

STEPHEN: That seems to be the case. We have to predict in order to keep the timing. We have this extraordinarily close timing in human interaction of about 200 milliseconds, 250 milliseconds between turns, which is very close to human minimal response time. So, we’re sort of the athletes of communication, as it were. I mean, it’s a puzzle why that is, which is another matter. But that means that I cannot wait till I’ve got the turn-ending cues that will come from you eventually to do with intonation and clause closure and so on. So, the syntax and prosody are going to tell me. And there’s always declination over the sort of voice group, the breath group. All of these things are sort of telling me, “Oh, you’re coming to an end,” but I can’t wait till then. It’ll be too late. [LAUGHTER] So, I have got to guess what you’re going to say and how you’re going to end up and plan my response.

HEDVIG: Have you seen these…? There’s two series of viral videos that have been going around a lot lately. One of them is two twins who are telling a story about an event that happened on the news. You might have seen this?

STEPHEN: No. I don’t think so.

HEDVIG: They are speaking in unison a lot.

STEPHEN: Mm-hmm.

HEDVIG: They’re telling actually a story about a violent event that, like, someone threatened their mum with a gun or something, but they keep… like, they get through… one of them gets through about half of the sentence and then the other one starts speaking and they say exactly the same thing and it is a little bit spooky. But when I looked at it and I thought of you… [LAUGHS]

STEPHEN: Yes. Yes.

HEDVIG: …I was going to talk to you. There’s another, a daughter and a mother, and the daughter just likes to tease the mum by saying the same things. So, she shows her something or she says something, and she knows her mother so well, she knows exactly what she’s going to say and she teases her mum by saying it and it drives her mother mad. [LAUGHTER] But I thought it was very beautiful. Maybe some people would say, “Oh, that’s so boring that they’re so predictable,” but I just took it as a sign of like ultimate intimacy. So, there’s some high performance, but a little bit of predicting probably also non-twins do all the time.

STEPHEN: This has actually been studied by some of the conversation analysts. Gene Lerner at UCSB has written book-length manuscripts concerning this ability of people to finish other people’s sentences midstream. As he pointed out, it’s often occasioned by the fact that you have a bi-clausal structure of some kind. So, you say “if” and then the other person does the “then” part. You say “because” and then the other person does the consequence. But it’s actually a fairly common thing, so it happens. So again, it shows that we are really in this process of predicting what the other person is going to say.

HEDVIG: But I wanted to ask you a question about that because you were saying that, and I think a lot of people can sympathise with the idea, you’re in a conversation and someone’s saying something and you think, “Oh, I have a related thought. Oh, I have a question. Oh, I want to get to my turn soon.” But in different cultures and different communities there are different sort of pragmatic norms of how to have conversations. And me and my mother, for example, will often interrupt each other and talk about three things at the same time. And a lot of people don’t like to do that. They want one person to talk about one topic until they finish talking about it and then the other person takes over.

And I’m not saying that either of these styles is necessarily better or worse, but I have experienced the friction when they clash. When you talk to someone who’s from one style and then the other, or like someone who expects a lot of humming and hawing and this kind of back-channel work, but the other one is expecting that you’re supposed to stay silent when the other one is talking. How many of these different pragmatic sort of normative cultures have you heard of?

STEPHEN: I don’t have a number in mind. [LAUGHS] Deborah Tannen has written a bit about just subcultures in the USA and how they differ in that sort of respect. Like New Yorkers, famously quick to overlap and be brutal, as it were, interactively. No, I don’t really think we have a clue about that. But to what extent does some of this stuff really vary culturally, really systematically? I don’t think we really know.

DANIEL: Well, let’s go on to that because linguists have often looked for universals in language because we want to know what’s the same across languages. And very often this search has gone to syntax. We want to find syntactic universals. What’s the same across languages, what are the principles, what are the parameters? But I’ve often thought that if there are universals, they would be interactional. That’s where I put it. So, you’ve mentioned one interactional universal already, that there’s a tolerance for silence, only about 200 milliseconds. Some cultures may be a little longer, but not that much longer.

STEPHEN: Yes. No, no, correct.

DANIEL: It’s going to be like 200, 500, somewhere around there. So, that’s something that all humans do. What are some other universals that are pretty much the same across cultures for language?

STEPHEN: Well, I think the book does spell it out a bit insofar as we know. But one of the things is repair structure that’s studied especially by some of my colleagues, including Nick Enfield there in Sydney. So, repair structures seem to be very strongly universal. There seem to be these sort of various systems that are involved, a reluctance to do the kind of nuclear repair, which is, “Huh?”

DANIEL: “What?”

STEPHEN: And try to be more specific to try to help the other guy by saying, “John’s going to do what?”

DANIEL: Right.

STEPHEN: So, you produce as much of the prior sentence as you did hear or understand and then just request clarification of the bit you didn’t. So, that sort of thing seems to be… There’s a preference for helping the other guy or if you like, not requesting too much when you have an understanding problem.

DANIEL: And you’ve got to do it right then…

STEPHEN: Yes.

DANIEL: …as soon as you notice that a repair needs to happen, it’s got to be — BOOM — you’re on it.

STEPHEN: Yeah.

HEDVIG: So, we generally in conversation obviously want to collaborate and we want to cooperate. So, we’re trying to… If something is vague, we’re thinking, “Oh, well, what could Steve possibly be referring to? It must be this.” So, we try and fill it in or ask for repair. Are there any situations…? I was trying to think of situations where people engage in non-cooperative communication. I guess it would be something like politicians and journalists when they’re trying to be obtuse or something.

STEPHEN: Yes.

HEDVIG: But generally, humans want to cooperate, most of the time.

STEPHEN: Let’s just say that the system is built for that, but it doesn’t mean that we always cooperate. So, quarrels, quite fascinating to study actually, because…

HEDVIG: Ooh.

STEPHEN: …it has to be… Some stuff has to be presupposed and agreed in order to have a quarrel. And then the actual… What we are going to disagree about then comes up and of course then you’ll find that things that you thought we did agree about turn out that we don’t agree about and so on. So, quarrels are actually quite interesting because they throw up, actually to what extent the system is really built for cooperating because the whole thing sort of starts to break down when it doesn’t work. Yeah.

HEDVIG: Well, a quarrel, I would almost think of it as some kind of cooperation because if you weren’t interested in…

STEPHEN: True.

HEDVIG: Like, you’re actually in a conversation with someone. If you weren’t interested in the conversation at all, you’d just not talk to them.

STEPHEN: Yes, yes, right. So, the channel…

HEDVIG: There’s some minimal level of…

STEPHEN: Yes, yes, true.

HEDVIG: …something, but yeah, quarreling is hard.

STEPHEN: Yeah.

DANIEL: You mentioned also in the book, metaphors of space and how they were probably really important in language getting started. Could you tell me a bit more about that? I know that we say, “Oh, I’m going through a hard time…”

STEPHEN: Yes, exactly.

DANIEL: …when I’m not actually going anywhere. Is that one of those universals too?

STEPHEN: Well, it’s very pervasive in language. So, I was pointing this out. So, nothing original with me, but it goes way back into the 19th century, actually. Observers pointed out the extent to which grammatical structure seems to be sort of based on spatial analogies, shall we say? And I was interested in it just because I was thinking about the evolution of language and when you think about the evolution of language, you start to sort of look at the other great apes and then you have this sort of fundamental puzzle. Wait a minute, at first sight they seem like they’re in a different universe, but then people like Michael Tomasello and Josep Call and others working on ape communication pointed out that actually their gesture system has the kind of flexible quick response character that human conversation has to some extent. And it has timing that we now know, I don’t think they knew that originally, is almost identical to the sort of human timing, that 200-millisecond timing. So, you begin to say, “Oh, wait a minute, that’s sort of interesting,” because maybe we inherited that timing from our forebears.

And all of the great apes except us are predominantly gesturers. We have gestures too, but except in sign language, it’s a subsidiary system. So, the presumption I made was simply, “Well, yes, okay.” So then, presumably proto-language was gestural in humans… in the human line.

HEDVIG: All right. A stance! We love it.

STEPHEN: That seems like… Well, it’s a very old theory, of course, goes way back and resisted by various people for various reasons that we could get into. But if that’s so, then that would give some account of why spatial concepts play such a critical role in language structure. Because gesture is a spatial language about space. It’s normally… If you started to describe your apartment or something, you would immediately start gesturing because it is a medium that’s really great for talking about space. Or if you wanted to describe the shape of a vase or something, you would start using your hands. It’s just very hard otherwise. So, I think that could be part of, if you like, the fossils in our system. We have a gesture system still, but the gesture may have brought along with it all of this kind of preoccupation with spatial concepts.

DANIEL: Okay, so let me talk about gesture then for a second, because in the book, you put the start of language a little bit farther away than I normally do. I think, Hedvig, we’ve said on the show probably 100,000 to 200,000 years humans have been using language?

HEDVIG: It has to be older than 100,000… We really want to be before out of Africa. So, 200,000.

DANIEL: That’s 80,000 years.

HEDVIG: Yes.

STEPHEN: Well, actually, guys, just get real. We’ve been out of Africa at least four or five times [LAUGHS] as you start with Homo erectus or even earlier. And then, there are various… We’ve got Neanderthals, and then we’ve got many others, it seems now, many other species of hominid floating around.

DANIEL: But you put this at 750,000 maybe instead of 200,000.

STEPHEN: Yes.

DANIEL: You put it way farther back than I thought.

STEPHEN: Yes.

DANIEL: And that’s because of the danger of getting stuck in gesture. Please tell me about that.

STEPHEN: No. Okay, I will try to explain. So, first of all…

DANIEL: [LAUGHS] Sorry.

STEPHEN: …that dating assumes that it’s based really on this transition towards the vocal stream, to the vocal medium from a gestural medium. And that, as I said, we got some inkling about because of fossil vertebrae and the vertebral column and the thoracic innervation.

HEDVIG: Yeah.

STEPHEN: For breathing, that was, right? That was for breathing. So now, we know that 1.4 million years ago, that wasn’t there. We know that proto-Neanderthals, yes, it’s there. So, Neanderthals and us, now we’re looking back at their common ancestor. So, that gets us back about 700,000. I mean, it’s controversial. It’s somewhere between 500,000 and 700,000. Would be the common ancestor for the two branches of humans that went on until yesterday, as it were, 80,000 years ago. [LAUGHS] Yes.

HEDVIG: I mean, I think most of the time when people say 100 to 200, I think a lot of linguists would assume that’s a lower bound.

STEPHEN: No, I think they’re just thinking anatomically modern humans are the…

HEDVIG: Yeah. First and only.

STEPHEN: Yeah. They are the only language-bearing species.

HEDVIG: Yeah.

STEPHEN: But biologically that’s very unlikely given how closely related we are to the other humans. So, I think we have this kind of human exceptionalism that we’re very loath to give up, thinking, “Oh, we are just fantastic and special and god given,” and all those things.

HEDVIG: It’s funny to me sometimes because I agree with you, people have a great human exceptionalism. I mean, maybe it started with people being very afraid of the idea of knowing that they were at all related to primates. But we’re talking about hominid species that are no longer found today that we only have through fossil records. And personally, I don’t find that so weird to imagine that I have something in common with them. Maybe it’s easier because they don’t exist anymore. So, I find it a bit funny when people…

STEPHEN: Well, in a very peculiar sense, about 50% of the Neanderthal DNA is walking around today. In each of us, I mean, we only have a little fraction, but if you put all the fractions together, [LAUGHS] there’s a Neanderthal… half a Neanderthal walking around.

DANIEL: Mm-hmm.

HEDVIG: Yeah.

STEPHEN: So, we’re too close for that really to be thinkable that Neanderthals didn’t have a language very like us, I think. Well, the people who are still pushing for human exceptionalism do know various small gene changes that have to do probably with neurogenesis, certain differences in the brain structure and perhaps in the speed of learning and all that kind of thing. I’m sure there are going to be some quantitative differences rather than qualitative differences between us and the other hominids. But anyway, we’ll wait and see because this is such an exciting part of science nowadays, is work on the evolution of our own species, but of other species too.

HEDVIG: I even have seen an even lower bound. So, I was using a textbook in a class, and in the textbook, one of the students came up to me and said, “Hedvig, this one says language is 40,000 years old.” And I was like, “Huh, now that does seem very recent.” And they were like… And it turned out that this author was just, for reasons I don’t understand, extremely conservative and just wanted to be as safe as possible, but I don’t think that’s a safe choice really.

STEPHEN: Well, if you just go back, whatever it would be, I suppose… Anyway, before the DNA revolution, 20 years before we had archeo DNA, people thought that… They looked at the Upper Paleolithic in Western Europe and the cave paintings and the amazing art and tool use and so on, and they thought, “Wow, these guys, something different between these guys and their predecessors.” And they say, “Oh, language, modern mind.” [LAUGHS] Of course, Western Europe, we thought, “Oh, yeah, Western, began here. Not in Africa.” [LAUGHS] And now, the Chinese are saying, “Oh, yeah, but look at these very early fossils in China.” So, this is all, yeah, nationalism. [LAUGHS]

HEDVIG: Yeah, tied up with a lot of politics very easily.

STEPHEN: But I think it’s pretty incontrovertible, I think, that language as we know it goes way back. I agree that it’s going to be hard to ever know precisely. But as I say, I think the archeo DNA will clarify whatever conceptual cognitive differences there were between Neanderthals and us. That’ll become clearer quite soon, I imagine.

HEDVIG: And besides the fossil record, do you also think that evidence of sort of cooperative culture…? Like, there’s this quote by… Is it Margaret Mead? That they find someone who has had a bone. What’s it called in English, when you help a broken bone with a wood?

DANIEL: A splint.

STEPHEN: Splint, yes.

HEDVIG: Splint?

STEPHEN: Yes.

HEDVIG: That they find a splint. And she says, I think this is one of the earliest signs of civilisation, that someone would splint someone else, that there’s cooperation. You have burials, you have maybe decorative culture. Do you think that those are necessarily tied up with language as well, that we can learn about that?

STEPHEN: I don’t. I don’t. I think this is where the people who sort of got all worked up about the Upper Paleolithic in Western Europe, forgot to look at the ethnographic record. If you just look across the ethnographic record, you see people who’re devoid of representational art. To this day, you could find folks who just don’t do it, right?

HEDVIG: Mm-hmm.

STEPHEN: Jack Goody, years ago, looked into Africa and saw that, “Oh, there’s a correlation between ancestor worship,” tribes that did ancestor worship and representational art, and other tribes weren’t into it. So, I think that again, if you just look at the ethnographic record, it’s clear there’s a huge variety of the degrees to which people use symbolic expressions and materialised.

HEDVIG: Fair enough. But what about the cooperative nature of like, yes, splinting a bone or something like that, because I have some peers here at the institute who work with some of the monkeys at our zoo and they sometimes tell me that they see bonobos behave towards their children in ways that are not very kind and friendly. And humans seem… I mean, for all the wars we start and quarrels we have, we seem to generally like and try and help each other. And do you think that’s necessarily tied up with developing a language?

STEPHEN: Well, I think that empathy I sketched [LAUGHS] in the book, it does have something to do rather clearly with language. But just going back to the facts, there are cases of Neanderthals clearly where, I think, there were cases where people didn’t have any teeth and clearly the food had been prepared for them or people had broken bones again. So, it goes way back. Yes, I do think that some kind of empathy is rather a peculiar human trait. That’s to say it’s clear that apes grieve their dead and elephants and other species. So, there is that kind of empathetic thing, but it’s quite restricted in other species. And in our species, it’s clear that if we passed a stranger kid lost on the street, we’d stop and help them. And this is just unthinkable, I think, from a cross-species point of view, probably get eaten if it was a baby chimp. [LAUGHS]

HEDVIG: Yeah, wouldn’t be surprised.

STEPHEN: And I related this to the fact that we got into the business, as some other species have, but not the great apes, of sharing child rearing. And if you’re going to share child rearing, so a chimpanzee mum can’t let its baby out of its sight because it’ll likely get eaten or played with to death. And so, if you lend your kid, your precious kid, out to some other childcare, there has to be some way in which that other person cares about the infant and is going to tend to its needs and so on.

HEDVIG: Yeah.

STEPHEN: And so, that sharing of empathy was a precondition, it seems to me, to being able to outsource childcare. And outsourcing childcare was the key to our demographic success because we reproduce at double the speed of a chimp, because mum is freed up for the next one by having a younger sister or whatever look after the baby or grandma.

DANIEL: And then, when you’ve got empathy, then that leads to being able to sense the feelings of others. And then, you can do things like give and understand signs, understand intention, maybe predict what somebody’s going to do. It sounds like that led to all the things that we’re talking about.

STEPHEN: Yes, yes, I agree. Thank you. Yeah, so that’s exactly right. So, being able to put yourself in the other guy’s shoes, [LAUGHS] as it were, is rather critical to trying to figure out what people mean. And I think one of the reasons why linguists haven’t thought so much about this, although the pragmatists have, of course, but the sort of structural linguist, is that they think language is fully carried by the symbolic code. But the point that I tried to make in The Dark Matter of Pragmatics was, “Oh, wait a minute, it’s not really.” I mean, it is, but it’s sort of triggered by the symbolic code. It’s a lot of inferences that actually carry the full message. And without that inference mechanism, the symbolic code actually wouldn’t work.

HEDVIG: It’s very poor. It’s very…

DANIEL: It’s situational, it’s not syntactic.

HEDVIG: I find it kind of amazing that people would have another outlook, but maybe I’m very unusual, I don’t know. But I remember, like, my first year in linguistics, I was very interested in conversation and pragmatics and how people… Because if you look at what people actually say, and when information gets transferred, there’s a big disconnect. And in my native language, Swedish, there’s a little particle you can use to mean, “I’m saying a thing, and I believe we both know it.” It’s like a marker that means, “You know this, don’t you?” And it doesn’t occur in written language very often because you don’t know who’s reading it, because you can’t really use it in written language that much unless you’re chatting directly with someone.

Also, some people think it’s like an awful spoken language thing that’s slang or something. But it’s this beautiful thing and we don’t understand it and it does so much work. And conversation in general, I’m amazed that it works. I don’t know, I feel like someone like high on drugs sometimes when I think about it. You look at your house, I look at two people talking and I’m like, “How the hell did that get from there to there?”

STEPHEN: Well, I think that wonder is absolutely correct. And one of the things I was trying to convey in the book is that we ought to have this wonder for it because of the incredible multitasking that’s involved. All of these things that we’re trying to do at once and it’s all squeezed into this sort of very compressed little turn, and it has to be compressed because of this kind of bandwidth of our speech production system. And we leave the rest over to sort of the comprehension system to enrich what we said to make sense of it. But yeah, no, I think it’s a total wonder really.

One of the wonders because I’ve looked a lot at transcripts of conversation, that’s what conversation analysts do, and just so rare that people really mis-talked and then a year or a lot later, they say, “Wait a minute, when you were talking about John, you meant that John?” That happens very rarely. So, there are self-corrective mechanisms like repair that sort of… And just the very turn-taking system itself is a sort of self-correcting mechanism, because if I misunderstood you when I respond, then you get a quick chance to fix it there and then. But I think it’s just really miraculous that it works as well as it does. Yeah.

HEDVIG: And it doesn’t work perfectly, but it works pretty darn well.

DANIEL: Enough of the time.

HEDVIG: It’s almost like…

STEPHEN: Enough of the time, yes.

HEDVIG: We sometimes joke on this show and say that language is involuntary telepathy because you can just say things and then an image appears in someone else’s head. I can say, like, a kangaroo riding a zebra and now you all have it in your head.

DANIEL: Stop it. Stop it. Get out of my head. Dang it, Hedvig.

HEDVIG: But you know, it’s not a perfect telepathy and there’s so many more channels that are going on, like we talked about today but also, we talked about recently with Lauren Gawne and her book on gesture as well that we also in conversation we can do so many things that we can’t do in written of course that get unfortunately lost. And we have to do things like… Well, the conversation analysts have all these little symbols, don’t they, of like this dot means one-second pause and this kind of thing means that they tilted their head a little and they’re desperately trying to transcribe all this information that we take for granted. It’s amazing to look at the transcripts.

STEPHEN: The very first people who looked at film in the 1950s were looking at… This is Bateson and Pittenger and people, they took a film of a therapy session, I think it was, I’m sure it was, and it was a 35-millimeter film. And then, they cranked it back slowly through and did analyses of it. So, this was like the first early thing. And they all tried to invent a sort of notation, a bit like people who notate dance movements and so on, trying to invent for describing everything. But in fact, it’s like so much is happening, it’s pretty much beyond us, I think, to actually sort of notate it all. So, my colleagues at the Max Planck who do work on multimodal interaction, they use multiple, multiple tiers of analyses on any video, looking at the two hands and noting exactly where the two hands are and so on and the eyes and the gaze. But there’s too much going on. It’s just extraordinary, really. Yeah.

HEDVIG: You have to pick your battles and decide, “Okay, I’m going to study mother, child, gaze direction of child,” and note that and nothing else. Otherwise, it’s impossible.

DANIEL: Well, now I’m going to ask kind of a bad question because one time on the show, we challenged each other to answer the question, what one thing explains the most about language? What one thing explains the most about language? And I have challenged you to answer this question too. What would your answer be?

STEPHEN: Well, I, honestly, [LAUGHTER] it seems like a sort of college entrance exam. [LAUGHTER]

DANIEL: Or working at Google or something.

STEPHEN: [LAUGHS] I think I’m going to fail this.

DANIEL: [LAUGHS]

STEPHEN: I think that there’s… That’s exactly the point, there’s not one thing, [LAUGHS] but I suppose that…

HEDVIG: Empathy and collaboration.

STEPHEN: I think… Turn taking is actually pretty interesting if you…

DANIEL: Okay.

STEPHEN: Because… I mean, it seems like a totally mechanical… it’s as far away from syntax as you can get kind of thing. But hey, consider that the length of an average turn is about 1.8 seconds, which is basically the length of a sort of basic transitive clause in most languages. Is that accidental? Probably not. That’s to say, given that the turn-taking system probably antedates… we were arguing that the apes seem to have a sort of turn-taking system that seems quite remarkably similar in certain ways. So, if it antedates language, then it’s the syntax that is adapting to the turn-taking system, not the other way around and maybe you could push that. [LAUGHS]

And then, I’ve also pointed out in the book that quite a lot of sort of the language operation, the need to get things into a particular order and so on, may well have to do with wanting to get bits in the right place for turn-taking reasons. So, if you’ve asked me a question, topicalising something, now I want to get that topical element or the answer to it in the right spot, which is rhematic as they say in your language, whichever organisation that is.

DANIEL: So, topicalising would be, instead of saying, “I saw the UFO,” you would say something like “The UFO, I saw it.”

STEPHEN: Yes, exactly.

DANIEL: And then, I put things in a way that you will understand me.

STEPHEN: Yeah.

DANIEL: Okay.

STEPHEN: So, the need to shovel bits around has to do partly with this sort of thematic structure of the conversation and the topic that we happen to be talking about. So, a lot of the sort of mechanisms that syntacticians worry about, you could ask, why do they exist? Well, they exist for that sort of reason. They also exist for reasons I also pointed out in the book, for politeness reasons [LAUGHS] because if you’re going to ask somebody, well, borrow 20 bucks or something, now you got to think, “Wait a minute, how am I going to put this?”, and you put in some complicated way.

HEDVIG: [LAUGHS]

STEPHEN: So, the operations that take a clause and transform it, [LAUGHS] transformationally as it were, are often motivated by interactional considerations, turn taking one of them.

DANIEL: Wow.

HEDVIG: So, I think, if I may summarise some of what you said, is that something that we maybe haven’t said that’s not an elephant in the room, but maybe unsaid, which is that you believe, Steve, that interaction is what language is for, and it is what explains language, and it is what we do with it and what it’s for. And thinking and our cognitive structures, if anything, is adapted from that interaction. Rather than, that we have a way of forming thoughts in our heads, and then secondarily, language can have a communicative use, which makes so much sense to me. It seems like the only possible stance to have, but it is not the stance that everyone has, is it?

STEPHEN: True.

HEDVIG: So, I propose, Daniel, if I may, that for title of this episode, we could do something controversial like interaction instead of merge or what’s a better one?

STEPHEN: [LAUGHS]

DANIEL: Titles are hard.

HEDVIG: Titles are hard. Empathy instead of merge is even better, I think, but it’s not as precise.

STEPHEN: [LAUGHS]

DANIEL: Well, it’s certainly a bag of tools that we don’t hear about often in, for example, an intro to linguistics class or really just anywhere. We hear about the P’s and the S’s, phonetics, syntax, semantics, phonology, pragmatics. But really the tools that we use are not so much syntactic or not so much phonological, they’re interactional. And I’m convinced that’s where it happens.

STEPHEN: Hmm. Well, I think that’s what’s motivating the system. I often have these quarrels with my psychological colleagues. They sort of say, “We don’t want to know where the system comes from or why it’s there. We just want to understand it.” And I think in a way, that’s what syntacticians, they’re treating the system a little bit devoid of its function, really, and of its motivations for how it got to be the way it is. They just want to know, how does the engine work, right?

DANIEL: Mm-hmm.

HEDVIG: Mm-hmm.

STEPHEN: And I think that’s just a different kind of study, which I respect. I’ve just… written a grammar of a language, and I went through the effort of sort of trying to crank out all of those little nuts and bolts that make the language, I respect that. And I think it has its own fun, little eureka moments and so on when you see, “Oh, gosh, there’s a logic to the structure of this system.” But I think that if you’re asking, how come languages of the world have this kind of family resemblance from one to another, then I think you have to ask, what led to them? How did they arise? What motivates the structure? And so, then you’re into a different… You’re like into a different sort of game. But I think it’s important also for…

And perhaps, would make intro linguistics a lot more interesting for people who haven’t yet got the bug [LAUGHTER] if you sort of look at the matrix that generates the whole thing. If you think about the infant learning a language and you think about their motivations to get into the system, that’s what drives language learning and why infants are so brilliant at it. But the system that drives, that gives children access is this interactional ability which they have. Long before they know any words, they can already engage in a sort of give and take with mum or the caretaker.

HEDVIG: And they have to if they want to get food or snacks or diapers changed, they have no other…

STEPHEN: So, I look at it as the sort of the machine tool that produces languages [LAUGHS] in the right context. Yeah.

DANIEL: Well, in that case, I cannot think of a better working title for this episode than The Interaction Engine, which coincidentally is the title of the book we’re talking about today. The Interaction Engine is available now from Cambridge University Press. We’ve been talking with the author, Professor Steve Levinson, linguistic lion. Steve, how can people follow you?

STEPHEN: Oh.

DANIEL: Well, I think so. Or do you want people to follow you around? [LAUGHS]

STEPHEN: I don’t have a social media presence, I’m afraid. I’m a dinosaur. I date from before that era. [LAUGHS]

HEDVIG: That’s fair.

DANIEL: Enviable man.

STEPHEN: Yeah, well, it takes too much time for one thing. I…

DANIEL: And it’s no fun. It’s no goddamn fun.

STEPHEN: Okay, okay. Yeah. Yeah.

DANIEL: So, you come to us from the time before 2006, that’s amazing.

HEDVIG: [LAUGHS]

STEPHEN: [LAUGHS]

DANIEL: Oh, my gosh.

HEDVIG: I just wanted to say as well for a lot of our listeners here who have maybe taken an intro to linguistics class many years ago or are taking one currently, if you’re looking for an update on some of the pragmatics and looking for maybe going beyond… I think some of you might be familiar with Grice’s four maxims. If you want to get an update, both… The Interaction Engine and The Dark Matter of Pragmatics as well, that Steve has produced, that’s two books in not a very long span of time. And you just finished your grammar as well. Are you having any more books in the pipeline?

STEPHEN: I do have another book in the pipeline. Yes, I’m writing a book. Yeah, tentatively titled Mind Tools. It’s partly about what language does to cognition, but more actually about thinking about [LAUGHS] the other things that externalise cognition, of which language I think is probably our first great tool in that way, but…

DANIEL: All right.

HEDVIG: Very cool.

DANIEL: Well, we’ll be looking forward to that. Steve, thank you so much for coming on the show and sharing your knowledge with us. It’s been a real honour to be able to chat with you.

STEPHEN: Yeah, well, pleasure.

[BECAUSE LANGUAGE THEME]

[Transcript provided by SpeechDocs Podcast Transcription]
