Welcome to Entanglements, a new podcast at Undark. In this episode, join hosts Brooke Borel and Anna Rothschild as they explore the question: Will AI Kill Us All? To answer, they’ve brought two experts with differing opinions together on the show, in an effort to find some common ground. The point isn’t to both-sides an issue or to try to force agreement. Instead, the show aims to explore points that can get overlooked in heated online forums or in debate-style media.
Their guests this week are Daniel Kokotajlo, a philosopher and former forecaster at OpenAI, and Arvind Narayanan, a computer scientist at Princeton University and the director of the school’s Center for Information Technology Policy.
Below is the full transcript of the podcast, lightly edited for clarity. New episodes drop every Monday through the end of the year. You can also subscribe to Entanglements at Apple Podcasts and Spotify.
Brooke Borel: Welcome to Entanglements, the show where we wade into the most complicated scientific debates in the news today.
Anna Rothschild: Debate is really important for good science to thrive, but it can often get heated and contentious.
Brooke Borel: So we wanted to try something different. Can we bring people together who have similar expertise but vastly different opinions to try and find some common ground?
Anna Rothschild: By the way, I’m Anna Rothschild, I’m a science journalist.
Brooke Borel: And I’m Brooke Borel, articles editor at Undark. Now I want to be very clear: We are not here to debunk silly claims like the Earth is flat or climate change is a hoax.
Anna Rothschild: Right, absolutely none of that. But in seeing where two respected experts agree and disagree, we hope we can help our listeners find some truth in the midst of what can often feel like chaos. So let’s get started. Brooke, what is today’s episode about?
Brooke Borel: This episode, we’re asking the question: Will AI kill us all?
Anna Rothschild: Yeah, this feels really science fiction-y, but, you know, I guess we do live in a world now where ChatGPT can write poetry and screenplays. And it’s becoming a, you know, much bigger part of our lives.
Brooke Borel: Yeah, so pretty far away from doom, right? But still, it is getting smarter and it’s getting more useful. Even five years ago, some people working with AI would have scoffed at this idea that it’s dangerous. But there have also been longtime fears over this technology. And in 2023, some 300 AI experts signed an open letter. It was a little alarming. It had just one sentence. Do you want me to read it to you?
Anna Rothschild: Oh, please, yes.
Brooke Borel: Okay. Here it is: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”
Anna Rothschild: I mean, that just couldn’t be more terrifying.
Brooke Borel: Yeah, I mean, right. We already know what it’s like to live through a pandemic. Yeah. And we’re probably all afraid of nuclear war, I would assume. So it is kind of intense. So I talked to two experts with these opposing opinions.
Anna Rothschild: Ooh, I’m really excited to hear.
Brooke Borel: Yeah. It was great. So let’s start with our so-called AI Doomer.
[Music]
Brooke Borel: Do you think that AI is going to kill us all?
Daniel Kokotajlo: I mean, I think it might kill us all. I think I’d probably put, like, I don’t know, 20 percent chance of that. And then I would say something like 70 percent chance that some sort of catastrophe happens. Although, possibly not a catastrophe that involves killing us all.
Brooke Borel: So that’s Daniel Kokotajlo. He’s a philosopher and he used to work at OpenAI, which is the company that developed ChatGPT and DALL-E and other AI products. Daniel mostly did forecasting at the company, which is basically predicting how a tech may evolve and how we might use it.
While he was at OpenAI, he signed that big open letter claiming that AI is an existential threat. And he wasn’t the only person in the company to do that. Even Sam Altman, the CEO, signed the letter. A bunch of other people did as well. But unlike a lot of them, Daniel quit OpenAI kind of famously earlier this year. He told me that he lost confidence that the company would behave responsibly as this tech improves.
Daniel Kokotajlo: I think that they’re sort of in ‘move fast and break things tech company startup mode’ which is, you know, fine for regular startups but, I think it’s utterly unacceptable for artificial general intelligence, something that powerful. Imagine if they were building nukes, for example.
Anna Rothschild: Wait, can I pause this for a sec? Okay, so if Daniel’s going to be making this comparison to, like, nuclear warheads, I think we probably should define artificial general intelligence, you know?
Brooke Borel: Yeah. That’s fair. So that’s the point at which AI reaches the same intelligence as humans.
Anna Rothschild: Okay, gotcha.
Brooke Borel: And to be clear, we are not there yet. Okay. So at the moment in AI research, computer scientists are mainly using this approach called machine learning and they’re building algorithms that can basically learn on their own.
Anna Rothschild: Okay. I think I understand this, but can you give an example just to be sure?
Brooke Borel: Yes. So let’s say you have an algorithm and you want to teach it how to identify photos of horses.
Anna Rothschild: Okay.
Brooke Borel: Yeah. Why not, right? Everyone needs that. So you show a picture of a horse to the algorithm, and it’s labeled, here is a horse, and you keep doing that. Computer scientists call this training. And then after a while, the algorithm can just look at new pictures of horses and say, hey, it’s a horse.
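What Brooke is describing is a basic supervised learning loop: show the model labeled examples, measure how wrong it is, nudge its parameters, and repeat. For readers who like to see the mechanics, here is a minimal sketch in Python using PyTorch. The “horse” images are just random stand-in tensors and the tiny model is hypothetical, so this only illustrates the shape of training, not a real classifier.

```python
# A minimal sketch of the "show it labeled pictures" training loop described above.
# The data here is random stand-in tensors (a hypothetical horse/not-horse dataset);
# real training would use labeled photos and many more steps.
import torch
from torch import nn

# Fake "images": 64 samples of 3x32x32 pixels, each labeled 1 (horse) or 0 (not horse).
images = torch.randn(64, 3, 32, 32)
labels = torch.randint(0, 2, (64,))

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 128),
    nn.ReLU(),
    nn.Linear(128, 2),  # two outputs: "not horse" vs. "horse"
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# "Training": repeatedly show the labeled examples and nudge the weights.
for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()

# After training, the model can score a brand-new image it has never seen.
new_image = torch.randn(1, 3, 32, 32)
prediction = model(new_image).argmax(dim=1)  # 1 means "looks like a horse"
```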
Anna Rothschild: Right okay. Brooke, important question. Side note. Were you a horse girl?
Brooke Borel: I was. I am. Should I admit that? That’s fine. It’s fine. Now everyone knows.
Anna Rothschild: Own it. Own it. I mean, that sounds great if you’re a horse girl, but it doesn’t sound so powerful yet, right?
Brooke Borel: Sure. Yeah, it’s a silly example, but these sorts of algorithms are used in things like self-driving cars, and Netflix uses them to predict what movies you might like. And companies like OpenAI also use this approach, more or less, to build their own products like ChatGPT, the chatbot that’s really popular, and stuff like that.
So what Daniel and these other scientists who signed that letter are worried about is a future step, artificial general intelligence, or AGI.
So basically, this is an AI that can accomplish all economically valuable tasks on the planet on its own—or at least that’s what the experts I spoke with for this episode agreed on for a definition. So we’ll stick with that. And then after AGI, we maybe hit something called ASI, or artificial superintelligence. And that’s when it goes beyond even our intellectual capabilities.
Daniel Kokotajlo: At some point in the future it seems like we will have AI systems that outcompete humans at all the relevant cognitive tasks. They can be better lawyers, they can be better coders, they can be better scientists, they can be better…
Brooke Borel: Journalists?
Daniel Kokotajlo: Journalists. You know, just like go through, go through all the different cognitive tasks. There will be AI systems that are better than the best humans at all of those tasks at some point in the future.
Perhaps it can’t dance because it doesn’t have, you know, legs. But all the tasks you can do by thinking or talking, all the tasks you can do on your computer, it’s as good as the best humans at it. And then the idea behind ASI, or superintelligence, is that it’s not just as good as the best humans, it’s much better.
Brooke Borel: But it still can’t dance.
Daniel Kokotajlo: But maybe it still can’t dance because maybe it still doesn’t have legs.
Brooke Borel: Okay.
Daniel Kokotajlo: Although, probably it’ll get legs pretty fast in that sort of scenario. You know, the robots exist and you can build them and hook them up and so forth. It is very much a sort of science fiction concept. However, it’s also going to be science fact in the near future.
Brooke Borel: And just how soon does Daniel think we’ll reach artificial general intelligence?
Daniel Kokotajlo: Fifty percent chance by 2027. That’s the bottom line for my current estimate for how far away AGI is.
Brooke Borel: Fifty percent that by 2027 we’ll have AGI—
Daniel Kokotajlo: By the end of 2027.
Brooke Borel: By that year.
Daniel Kokotajlo: Someone will have AGI.
Brooke Borel: That’s pretty soon. That’s pretty soon.
Anna Rothschild: Wait, just to make sure I understand. So that’s Daniel’s estimate for when we’ll create AGI, but that doesn’t mean that our extinction is imminent, right? Like we could make a good AGI, right?
Brooke Borel: Yeah, like an AGI that’s not going to kill us all. Yeah, I’d like that. So I asked him about this and he says he could see a range of possibilities, but his main point is this.
Daniel Kokotajlo: You know, humans have driven lots of species to extinction. Almost always it’s not deliberate. It’s rather just a side effect of us doing what we do want and not actually caring that much about whether these species live or die. And so, similarly, I would say that the reason to be concerned that we could all die from loss of control to AI would be if our alignment techniques fail in a way that results in an AI that doesn’t particularly care about whether we live or die.
Brooke Borel: So here you may be wondering, what the heck is alignment?
Anna Rothschild: Yes I was.
Brooke Borel: I figured, I figured. So this is something Daniel worked on a bit at OpenAI, and it’s an area of research where the goal is to make sure that the algorithm is aligned with its creator’s intentions. So in other words, will this algorithm obey its human overlords?
Anna Rothschild: Yeah, that sounds pretty important.
Brooke Borel: Yeah yeah. So the worry here is what happens if we think our algorithms are aligned, but they’re not, and we trust them to perform a task that we assume will be done with our best interest in mind, and they have other plans.
Daniel Kokotajlo: One thing leads to another, and then you end up with these very powerful, very smart agents that have a different vision for what the world should be like than humans, you know.
Brooke Borel: And they’re in control.
Daniel Kokotajlo: That’s right. And then, you know, things could not actually be that bad, right? It depends on what their different vision is, right? It’s also entirely possible, I would think, that it wouldn’t involve that. And then that’s where you get into, like, the all-humans-getting-killed sort of scenario.
Brooke Borel: One of the things that Daniel is worried about is that not only will AGI have alignment issues, but humans won’t be able to fix them because we don’t even really understand how our own algorithms work.
Anna Rothschild: Wait. Are you kidding?
Brooke Borel: Yeah, no. No.
Anna Rothschild: That’s crazy! How is that even possible?
Brooke Borel: I know it’s bananas. Just listen.
Brooke Borel: So if we’re building this tech—we, you know, computer scientists, whatever, are building this tech—how do they not know how it works?
Daniel Kokotajlo: The main thing to say here is that we’re working with artificial neural nets now. That’s what modern AI is. So it’s sort of like code that nobody wrote, but rather that evolved or grew.
Brooke Borel: To clarify, neural networks are one approach used in machine learning where the algorithms are sort of inspired by the neurons in our brains, and then they make new connections as they learn new things, just like we do.
Daniel Kokotajlo: So, you know, the neural nets start off as a random bag of, like, a random spaghetti tangle of circuitry. And so at the end of lots of training, your bag of circuits has sort of evolved into a shape that’s very effective at scoring highly in this environment. But you don’t know how it does that.
Brooke Borel: Basically, computer scientists may know how an algorithm starts out. But once they start training it — so showing those pictures of the horses or whatever — the AI is making new connections and it’s starting to recognize patterns. And then at some point, computer scientists can’t really go back to understand all the pathways that the algorithm is using because it’s just too complex to trace.
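To make that concrete, here is a toy illustration in Python with PyTorch (a hypothetical example, not one of the models the guests discuss): after a tiny network is trained on a made-up task, “opening it up” just yields arrays of numbers, with no human-readable rule for how it does what it does.

```python
# A toy illustration of why trained networks are hard to interpret: after training,
# all you have is arrays of numbers, not human-readable rules.
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# Train it briefly on a made-up task: predict the sum of four inputs.
data = torch.randn(256, 4)
target = data.sum(dim=1, keepdim=True)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
for _ in range(200):
    opt.zero_grad()
    loss = ((model(data) - target) ** 2).mean()
    loss.backward()
    opt.step()

# "Opening up" the trained model just yields tangles of weights.
for name, param in model.named_parameters():
    print(name, param.shape)
    # e.g. "0.weight torch.Size([8, 4])" -- nothing here says *how* it adds numbers.
```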
Daniel Kokotajlo: There is a field of interpretability that’s trying to design better methods for taking these trained neural nets and understanding what and how they’re thinking. But we’re not there yet.
Brooke Borel: When do you think we will be there?
Daniel Kokotajlo: I bet 10 more years at the current pace of interpretability progress would probably be enough, you know? That’s my current guess, but obviously it’s hard to predict.
Brooke Borel: All of this is why Daniel thinks we need regulation now to get ahead of AGI, a development that, remember, he thinks is only a few years away.
Brooke Borel: For people who argue that the tech is nowhere near advanced enough to warrant mitigating against an existential threat, what is your response to them?
Daniel Kokotajlo: If you thought that AGI was still, you know, 15 years away, then it would be too early to really regulate it. We want to instead just see how the situation develops, right? However, I think that if you pay attention to what’s happening in the field, you should think that there is a decent chance that it could happen very soon.
[Music]
Anna Rothschild: Brooke, I have to admit that I was much more scared by that conversation than I thought I was going to be. Like, I assumed Daniel thought there was a small chance of catastrophe, obviously, um, given, you know...
Brooke Borel: Yeah. He’s our, he’s our guy.
Anna Rothschild: But the fact that he thinks there’s a 70 percent chance that there’s some sort of horrible catastrophe, I mean, that is just so much higher than I was expecting.
Brooke Borel: Yeah, so that number is called P(Doom) and it’s a probability of doom or a probability of a catastrophic event.
Anna Rothschild: I mean, even that sounds scary, P(Doom).
Brooke Borel: I know, and his P(Doom) is quite high. But before you throw your computer out and move off grid to a cabin in the woods or whatever, I do have someone to introduce you to who has a more optimistic view.
Anna Rothschild: Okay. Awesome. Yes. That would be great.
[Music]
Brooke Borel: So the big question of the day is: Will AI kill us all?
Arvind Narayanan: I would have to say no.
Brooke Borel: This is Arvind Narayanan. He’s a computer scientist at Princeton University and the director of the school’s Center for Information Technology Policy. And his job is basically to understand the societal impacts of technology — and in particular, artificial intelligence. And notably, he did not sign that big open letter that came out in 2023. The one, again, framing AI as a threat on the level of pandemics and nukes.
Arvind Narayanan: Before talking about the future, let’s talk about history for a moment. The people claiming that AGI is near, and people talking about AI doom have a track record of being wrong for 70 years.
Brooke Borel: Take when the early neural networks were built in the late 1950s.
Arvind Narayanan: The New York Times declared, quoting some of the people behind this effort, that computers were soon going to be able to walk, talk, see, be conscious of their existence, reproduce, and do various other things. There is a very long history of overoptimistic forecasts. And it’s not just that they were wrong about timing. They were wrong in a deeper categorical sense. We know now that the kind of AI architectures that people were working with back then were just fundamentally incapable of coming anywhere close to the flexibility of human intelligence, of coming anywhere close to the level that would pose a threat to humanity. So why do we think they’re right this time around?
Anna Rothschild: You know, I see Arvind’s point here, and it does make me feel a little bit better, but, you know, isn’t our technology just actually a lot better today than it used to be?
Brooke Borel: Well, yes and no. So Arvind says the best AIs built today are things like large language models. And this is a subcategory of machine learning; these are algorithms that can basically produce text, and they’re trained on these massive sets of text-based material.
Anna Rothschild: Yeah, I’ve heard of this before.
Brooke Borel: Yeah, so for example, ChatGPT, that’s an example of a large language model. And they call this an LLM for short. He argues that while LLMs are impressive, they really aren’t that smart.
Arvind Narayanan: Something that’s remarkable about language models is that they’re reasonably good at playing chess. And the reason they’re good at playing chess is that there are billions of games, literally billions of chess games, recorded online.
Brooke Borel: So the AI has gotten a text recap of all these pre-recorded games, and that’s how it knows the difference between a good move and a bad one. Let’s say you change the rules of chess though, and now the rook moves completely differently on the board than it used to. A human chess player, they’re going to be reasonably good at adapting to that new set of rules.
Arvind Narayanan: Whereas language models, they’re just qualitatively, fundamentally, categorically incapable of doing this kind of thing, right? They’re just going to be very strongly constrained by what’s in their training data, and they’re not going to be able to adapt in the way that people can.
Brooke Borel: And this goes for all LLMs, not just those that are focused on chess.
Anna Rothschild: Okay, fine. But, you know, LLM or anything else, I mean, don’t AIs learn as we use them? At least I thought they did. Maybe I’m wrong.
Brooke Borel: Yeah, according to Arvind, the chatbots and other AI tools that we use today are actually not learning from their interactions with us. I mean, they kind of are, but not in real time, right? They might learn from a static database of previous interactions in some cases, but even then the chatbots are just relying on that training material. They’re not learning in real time from whatever we are typing into the computer.
Anna Rothschild: Oh interesting.
Brooke Borel: Yeah, so the companies that say otherwise, maybe they’re marketing something to you, maybe making it sound a little smarter than it is. In fact, scientists have been trying to build these more flexible AIs that can learn from users, but it hasn’t worked so far.
Arvind Narayanan: So when you look at that kind of flexibility, these AI systems, you know, are not even at the level of a toddler.
Brooke Borel: So I’m not a computer scientist or a science historian, but it does seem like over that long history, there’s been an acceleration in these advances. Is that not right?
Arvind Narayanan: It depends on what one means by acceleration. Certainly, by some definition, there’s acceleration going on. There’s no question about it. So, you know, a few decades ago, the kind of acceleration that most people were talking about was that the clock speeds of computers were increasing. And so they were wondering what this meant for AI, you know, faster and faster computers, what would they be able to do?
But pretty soon we realized that faster calculation only gets you so far and you need new approaches. And now with language models, there is acceleration in the sense that companies are throwing more and more data at them.
But now, companies are already throwing all of the data on the internet at these language models, right? And so the more sophisticated way to look at this acceleration is what is called stacked S curves.
Brooke Borel: Think about the shape of the letter S. At the bottom, the curve starts out rising, almost exponentially, but eventually it just levels out at the top. So then scientists develop a new technology and a new S curve starts to rise from there.
Arvind Narayanan: And so when you’re standing in any one particular S curve, it always feels like you’re in an exponential. It always feels like you’re at a special moment in history. But throughout, people have felt that they were at that special moment in history.
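One way to picture Arvind’s “stacked S curves” is with a few logistic curves laid end to end: from inside any single curve, recent growth looks exponential even though that curve is about to flatten. The sketch below, in plain Python with made-up numbers, is only an illustration of the idea, not a model of real AI progress.

```python
# A rough sketch of the "stacked S curves" idea: each technology follows a logistic
# (S-shaped) curve that eventually flattens, and overall progress is the sum of
# successive curves starting at later times. All numbers are illustrative.
import math

def s_curve(t, start, ceiling, steepness=1.0):
    """Logistic curve: near zero before `start`, rising toward `ceiling` afterwards."""
    return ceiling / (1 + math.exp(-steepness * (t - start)))

def stacked_progress(t):
    # Three hypothetical waves, e.g. faster chips, then new architectures, then LLMs.
    return (s_curve(t, start=5, ceiling=1)
            + s_curve(t, start=15, ceiling=2)
            + s_curve(t, start=25, ceiling=3))

for t in range(0, 35, 5):
    # From inside any one wave, recent growth looks exponential even though
    # each individual curve is about to level off.
    print(t, round(stacked_progress(t), 2))
```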
Brooke Borel: Do you think that if we did reach AGI, that would be something to be worried about?
Arvind Narayanan: I wouldn’t say worried about, but I would say, you know, the implications for humanity would be huge, hard to overstate. And what I mean by AGI is AI that’s capable of automating any economically valuable task. And you know, again, by definition, if you could do that, then you have a world where people technically don’t need to work.
So that could be a utopia or a dystopia or anywhere in between, right? So, you know, we might all have so much time for leisure and creative pursuits and AI would be working for us, or AI is going to put us all out of a job and we don’t have social safety nets, or, you know, any other outcome is also possible.
I think this is primarily a matter of how we govern AI and what policies we put in place. It’s less a matter of how the technology behaves.
[Music]
Anna Rothschild: So, Arvind makes it sound like we should be more concerned about policy than actually how smart the algorithms may or may not get, right?
Brooke Borel: Yeah, that’s right.
Anna Rothschild: Okay. But what about the alignment stuff that Daniel was talking about? Like, it still blows my mind that we actually don’t know how these things work. That seems super important. Like, don’t we need safety measures to make sure the algorithms are actually on our side?
Brooke Borel: Yeah. Yeah. So I promise we’re getting there. I got Daniel and Arvind together to discuss just that and a lot more.
[Music]
Daniel Kokotajlo: Maybe what I should do is I should describe the future as I project it. And then you can describe the future as you project it and we can see where they start to diverge.
Arvind Narayanan: Yeah let’s do it.
Daniel Kokotajlo: What I would say is going to happen, is sometime in the next few years, we’ll get to sort of like automated research engineer stuff.
Brooke Borel: In other words, Daniel predicts that soon we’ll have AIs that can write code. And then they’ll have the ability to help with artificial intelligence research.
Daniel Kokotajlo: I think that that accelerates things, so then you’ll be in the regime where it’s like, oh, our AI assistants are great at writing all this code for us. But they’re not so good at research taste. Like, if you tell them to just automate the research process, they’ll end up doing a bunch of experiments that don’t actually teach you much, or they’ll misinterpret the experiments that they’re doing.
Brooke Borel: But Daniel says that big AI companies will train their AI to have better research taste. That is the ability to choose the right problems to work on, the right things to learn, and then they’ll be able to fully automate the AI research process.
Daniel Kokotajlo: They’re still not AGI because they still don’t have, maybe, some other real-world skills. Maybe they’re bad at being a lawyer. Maybe they’re bad at being a therapist.
But they’ve worked out all the stuff involved at doing AI research. Now I think the pace of AI research going on inside one of these companies will be significantly faster than it is today. Like at least 10 times faster.
Brooke Borel: And at that point, Daniel says it could just be a hop, skip, and a jump to learning those other skills like being a lawyer or a therapist.
Daniel Kokotajlo: That’s my prediction for how the technical side will go. In terms of timelines, I would say something like probably less than a year from that research engineer milestone to something that I would call superintelligence. If 2030 comes along and this hasn’t happened yet, then I’m wrong. Okay, over to you. What do you think the next, you know, five, six years are going to look like? What do you think the next ten years are going to look like?
Arvind Narayanan: Yeah, there’s a lot in there. Let’s unpack that. So the first part of what you talked about is automating AI research. And I agree that there’s a nontrivial chance that that might happen in the next five years, the next ten years.
I think currently the ability to do that is crap. Pardon my language, but, to me, AI research is one of the most automatable things. It’s like nowhere near the complexity of some of the other things we’ve talked about, like being a doctor or something like that. So that’s where I think my predictions really start to diverge from yours, not just in terms of timeline, but in a more qualitative sense. My prediction is that we’re going to find some of these social factors to be intrinsic barriers to further technical development. I think the ability to extrapolate from this AI researcher to the skills that are needed in thousands of different professions is a huge gap.
Brooke Borel: In other words, there are limitations on how well these things are going to be able to adapt to the many, many different thought-based tasks that we humans are able to do.
Daniel Kokotajlo: I think that by the time you have the automated AI researchers, you’ll be able to make new paradigm shifts, new major advances that take you toward much faster learning, for example, learning that’s comparable to the speed of human learning, so that you can learn on the job.
Arvind Narayanan: All of that is possible. I, you know, can’t logically argue that that cannot happen. My prediction is that it’s unlikely.
Daniel Kokotajlo: I’d be curious, what’s your credence that, you know, we get to AGI by 2029 or 2030?
Arvind Narayanan: That’s an easy one. I mean, I can only give an upper bound. I’m comfortable giving an upper bound of 1 percent.
Daniel Kokotajlo: One percent. Okay.
Brooke Borel: So very different than Daniel.
Daniel Kokotajlo: Well, that’s a big difference. Yeah.
Brooke Borel: And Daniel, remind us what yours is.
Daniel Kokotajlo: I say 50 percent by the end of 2027. So we’ll see who’s right.
Brooke Borel: We can meet back here if we’re all still alive. Kidding!
Arvind Narayanan: To me, that’s only an upper bound. If someone said, you know, 10 to the minus 6, I would say that’s a perfectly reasonable thing to believe. I would think 10 to the minus 6, to me, is more believable. You could more easily convince me of a value of 10 to the minus 6 than you could of 50 percent, I think.
Daniel Kokotajlo: Interesting. But when I gave my story, you were like, yeah, that’s possible. But by that’s possible, you mean like 10 to the minus 6 possible, not like 10 percent possible?
Arvind Narayanan: I mean, yeah, I would say, maybe there’s a 1 percent chance of something like that, but that’s just one of many, many things that need to happen, each of which I think is independently unlikely.
Brooke Borel: I want to ask a question about another thing that you may not agree on. There’s this whole area of research called alignment, and that’s basically making sure that these algorithms are obeying us and doing what we intended for them to do. And then there’s also this idea of interpretability. Can you talk to each other about that? I am so curious to hear what you have to say to each other on that.
Arvind Narayanan: My position is not that they’re not important for safety. I think they are important, and I’m very glad that people are working on those. It’s just that I don’t think our ability to understand the internals of AI models should be on the critical path toward achieving the safety objectives that we want in society.
Brooke Borel: Arvind thinks that we should approach AI safety the same way we keep ourselves safe from other humans. Our brains are also black boxes, and we can’t trust that everyone’s intentions are aligned with what’s best for humanity, right? We still have systems in place to protect ourselves from a rogue person. We put locks on our doors, for example. So he thinks it’s more important to have systems and policies in place than to make sure that every last algorithm is going to behave.
Arvind Narayanan: Our interventions cannot be targeted at AI models and systems because we don’t control the AI models and systems that our adversaries are using.
Daniel Kokotajlo: I would say that before we get to AGI, we don’t really have to worry that much about loss of control. And when we don’t have to worry that much about loss of control, we can basically treat these things as tools, and the question is like, who has them and what do they do with them?
And I think that in that sort of regime, the answer is we should try to spread out the power so that it’s not all concentrated in a tiny group of individuals, and so that society can gradually start to adjust and wake up to the new stuff.
I would also say we should try to keep it out of the hands of the bad actors, but I think that, as you say, that’s going to be really hard to do very effectively. So I think it’s sort of like, make decisions on a case by case basis, you know. But I think we might be on the same page pre-AGI. We might also be on the same page post-AGI, too. I don’t know what you’d say we should do once we actually get to AGI.
Arvind Narayanan: So, just to understand your position, is it that pre-AGI we should be comfortable with, you know, proliferation?
Daniel Kokotajlo: Yes.
Arvind Narayanan: But then once we reach this threshold, assuming we can define what it is, then we have to get into a new mode of governance where, like, all the governments of the world unite and keep AI out of the hands of quote unquote bad actors? Okay.
Daniel Kokotajlo: Uh no actually. So, I think we’ll have bigger problems than bad actors at that point.
Arvind Narayanan: Okay. Yeah. Yeah. Okay.
Brooke Borel: Is it that alignment maybe doesn’t matter so much before we hit AGI, but then after, it does become increasingly important? Is that something you would both agree on?
Daniel Kokotajlo: I would say so. Yeah.
Brooke Borel: And Arvind?
Arvind Narayanan: No. I mean, I think if we’re ever in a position where we’re relying on alignment to keep us safe, then we’ve already lost.
Brooke Borel: Okay.
Arvind Narayanan: I think even after AGI…
Daniel Kokotajlo: That’s... okay, I like that. Maybe I’ve been too gaslit by working at OpenAI for two years. Maybe I should have higher standards for levels of safety after we get to AGI.
Brooke Borel: I want to hear more about that.
Daniel Kokotajlo: It would be nice if we didn’t have to rely on the alignment of our AGI superintelligence, you know? It’d be nice if we could have some story for why this is all fine that’s not just like ‘We trained it in the following way and therefore it has the goals we wanted it to have. And then it’s going to be let out into the world and do all sorts of crazy things. But it’s going to be fine because it has the right goals.’ I think that’s sort of what you were saying, Arvind, right?
Brooke Borel: So right here, Arvind starts talking about some unpublished research from his group.
Arvind Narayanan: Our fundamental idea is that superintelligence already presumes a loss of control. So, if you’ve not designed AI systems so that you give them this autonomy, if you’ve not even put yourself in a position where you have rogue power-seeking AIs, then I think the constraints that we put on AI systems from the outside, instead of tinkering with the models through alignment, are going to ensure that they don’t rise to a level that, in our definition, constitutes superintelligence.
Daniel Kokotajlo: Nice. Okay. I’m excited to read your paper.
Brooke Borel: I like how much agreement we have. It sounds like one of the main disagreements is when we’re going to hit AGI. I am curious, is there any middle ground on policies for when we reach that, or is it even possible to talk about that if one of you is predicting it’s going to happen decades from now, if at all, and the other one perhaps in the next few years?
Arvind Narayanan: My intuition is that it might be actually easier to get agreement on policy in many ways than some of the empirical claims about either the timeline or the behavior of AGI.
Daniel Kokotajlo: Nice. Let’s try. So I think that’s actually quite a bold claim that you just made. Because how could we possibly agree on policy when we have such different views about the, like, sheer magnitude of the technology?
But maybe we can. What I’ve recently been advocating for is basically transparency requirements. You know, if AGI is coming soon, then laws that force companies to keep the public and the government informed about their progress towards AGI are good for spreading out the power and helping everyone to wake up and realize what’s happening before it’s too late.
Arvind Narayanan: So strongly support, yeah, strongly support transparency. I think they’re very helpful against the worlds that I’m concerned about.
Brooke Borel: I’m curious to hear your thoughts on how you would approach that on an international level, because certainly within a country, maybe that would be slightly easier to do. But if we’re talking about these technologies that could affect us globally, and if one country is doing things very differently and more secretively than another, what happens then?
Daniel Kokotajlo: So, here’s what I would like to see happen. And, you know, I say 70 percent chance of doom, or whatever. But there’s a 30 percent there. Like, I can see ways that this could go quite well.
The world sort of, like, pulls together and handles it responsibly. And here’s an example of how that might go. We get the transparency requirements, at least in the U.S., and all the major leading corporations, you know, need to keep the public and the government updated on their progress. Not necessarily showing their algorithms or anything secret like that, but just saying, like, here’s what our systems are capable of.
Brooke Borel: And his hope is that what happens here in the U.S. will have a big influence on what happens abroad, since at least right now, the major AI companies are based here.
Daniel Kokotajlo: If the public is informed then I think that there’s a good chance that there’ll be a nice big public conversation about this, and various interest groups will argue with each other, and some sort of deal will be reached for how to manage this sort of thing in a sort of more democratic way where lots of different people get to weigh in on it. Not just the tech company and not just the executive branch, Manhattan-Project-type people.
Brooke Borel: Arvind, what do you think?
Arvind Narayanan: Yeah, I mean, that all sounds reasonable to me. I think getting international cooperation on anything is going to be hard, but, you know, a prerequisite for getting there is public support, and obviously this is an issue that the public is paying attention to. Whatever type of regulatory regime you’re thinking about, getting support for transparency, however hard it is, is easier than getting some UN-type organization to actually control AI development. You know, it seems worthwhile to shoot for.
Daniel Kokotajlo: I expect no major international agreement to be reached until we’re already in a huge crisis. But crises have a way of motivating these sorts of things.
Brooke Borel: I have one more question to ask. I’m so curious if the two of you have agreed more than you expected to.
Daniel Kokotajlo: Yes.
Arvind Narayanan: Yes.
Brooke Borel: Do you think that gives you any hope for this? Because this has become a very pitched debate, especially online, but also in other forums. Does this give you hope for these kinds of debates going forward?
Arvind Narayanan: I think so. Yeah, I think a lot of the polarization is an artifact of social media.
Brooke Borel: And then something very sweet happened when we were wrapping up.
Daniel Kokotajlo: Thank you very much, Arvind, this has been much more pleasant than I expected, and I had high expectations. If you ever want to meet up afterwards and just talk about stuff, I would be very interested in that.
Arvind Narayanan: That sounds great. Yeah. And for my part, thank you, Daniel. I have to admit, I was a little jittery about 70 percent P(Doom). I was worried that this was going to be a conversation full of messianic warnings, but I think this was really fun. We talked substance and we identified the sources of our disagreement.
Brooke Borel: Look at this. You thought it was going to be a debate and now everyone’s making friends. It’s great.
Daniel Kokotajlo: Yeah. It’s great.
[Music]
Anna Rothschild: Oh, I love that. That makes me feel so much better.
Brooke Borel: Doesn’t it?
Anna Rothschild: Yeah.
Brooke Borel: Okay, so I have a big question for you though. How did you feel when you started this episode in terms of will AI kill us all? And how do you feel now?
Anna Rothschild: Right. So, I will say that I sort of went into this assuming that the claims that AI was going to kill us all were sort of overblown. Like, it just seemed too kind of science fiction dystopia to me, you know? So I guess my P(Doom) was pretty low.
Brooke Borel: It was low, yeah. It was low.
Anna Rothschild: Yes. Um, but I knew, I have to say, very little about this topic, to be fair. So starting off by hearing Daniel, um, have a very high P(Doom), that was way scarier than I assumed. Like, I assumed that a high P(Doom) would be something like 20 percent, you know, something like that. But that being said, you know, hearing the two of them talk and find a lot of points of agreement actually made me feel a lot better. So I guess maybe I am slightly more scared than where I started off. But in a way that makes me feel, I guess, like a better-informed citizen, you know. I feel like I actually can have a take on AI policy at this point.
Brooke Borel: Great. My job here is done.
Anna Rothschild: Good.
Brooke Borel: So I felt more nervous at first. I think my P(Doom) was higher than yours, probably. Not to the level of Daniel’s P(Doom), but higher than yours. And I do feel a little bit better now, but I’m still unsure if we actually have the political will to do anything about AI.
There’s no federal regulation really looking at this in any big way, specifically regarding the development of the technology. There was a proposal out in California recently that would have put such a law in place, but it got shut down.
Anna Rothschild: Oh okay.
Brooke Borel: Yeah. And there are a lot of concerns that we didn’t get into in this episode too, right? Like actual things that are happening now: environmental issues around the servers, and labor issues. And if we do get to the point where these chatbots or these other AIs are able to do work, will we have universal guaranteed income if they come to take our jobs? Right?
Anna Rothschild: Right.
Brooke Borel: But when it comes to AGI specifically and like the existential threat that they’re talking about in that big open letter, I do feel a little bit better. And I feel as though people who are actually really involved in this world may have some more common ground to actually get somewhere interesting in these heated conversations.
Anna Rothschild: Yeah, I mean, I really do think the fact that two people with very different points of view could come together and actually agree and want to get coffee, that really does make me feel like there’s hope, you know?
[Music]
Brooke Borel: Yeah. Let’s end on hope.
Anna Rothschild: Perfect, yes, let’s end on hope. That’s it for this episode of Entanglements, brought to you by Undark magazine, which is published by the Knight Science Journalism program at MIT.
Brooke Borel: The show is fact-checked by Undark deputy editor Jane Reza. Our production editor is Amanda Grennell and Adriana Lacy is our audience engagement editor. Special thanks to our editor-in-chief, Tom Zeller Jr. You can reach us at [email protected]
Anna Rothschild: Thanks for listening. See you next time.
[Music]