
Notes on Existential Risk from Artificial Superintelligence

Michael Nielsen

Astera Institute
Sep 18, 2023

With minor revisions: Sep 28, 2023

Earlier this year I decided to take a few weeks to figure out what I think about the existential risk from Artificial Superintelligence (ASI xrisk). It turned out to be much more difficult than I thought. After several months of reading, thinking, and talking with people, what follows is a discussion of a few observations arising during this exploration.

Introduction

What follows is presented in an interview format. It's not actually an interview, but rather an idealized distillation of conversations I've had with many people. I chose this unusual form after struggling with a more conventional essay or paper form; I think such forms imply more confidence than warranted in most discussions about ASI xrisk. An interview seems a more appropriate mix of evidence, argument, and opinion. Some of the material covers background that will be known to people well read on ASI xrisk1. However, there are also novel contributions – for example, the discussion of emergence2 and of the three xrisk persuasion paradoxes – that I believe are of interest.

"Do you believe there is an xrisk from ASI?" Yes, I do. I don't have strong feelings about how large that risk is, beyond being significant enough that it should be taken very seriously. ASI is likely to be both the most dangerous and the most enabling technology ever developed by humanity3. In what follows I describe some of my reasons for believing this. I'll be frank: I doubt such arguments will change anyone's mind. However, that discussion will lay the groundwork for a discussion of some reasons why thoughtful people disagree so much in their opinions about ASI xrisk. As we'll see, this is in part due to differing politics and tribal beliefs, but there are also some fundamental epistemic reasons intrinsic to the nature of the problem.

"So, what's your probability of doom?" I think the concept is badly misleading. The outcomes humanity gets depend on choices we can make. We can make choices that make doom almost inevitable, on a timescale of decades – indeed, we don't need ASI for that, we can likely4 arrange it in other ways (nukes, engineered viruses, …). We can also make choices that make doom extremely unlikely. The trick is to figure out what's likely to lead to flourishing, and to do those things. The term "probability of doom" began frustrating me after starting to routinely hear people at AI companies use it fatalistically, ignoring the fact that their choices can change the outcomes. "Probability of doom" is an example of a conceptual hazard5 – a case where merely using the concept may lead to mistakes in your thinking. Its main use seems to be as marketing: if widely-respected people say forcefully that they have a high or low probability of doom, that may cause other people to stop and consider why. But I dislike concepts which are good for marketing, but bad for understanding; they foster collective misunderstanding, and are likely to eventually lead to collective errors in action.

"I can't believe you take this ASI xrisk stuff seriously!": I have (very lowkey!) worried about it since being a teenager in the 1980s. The only thing that's changed recently is the urgency. Growing up, I was mostly extremely optimistic about science and technology, but also worried about many concomitant problems: the ozone hole, nuclear weapons, acid rain, climate change, ecosystem collapse, overpopulation, grey goo, peak oil, and many other risks, including what I would today call ASI xrisk. Those are mostly not xrisks, but they are (or were) reasonably thought to be potentially catastrophic, affecting hundreds of millions or billions of people. Many of my teachers in school were very pessimistic about issues like these, and I absorbed some of that pessimism. For instance, I grew up thinking nuclear Armageddon was likely, especially after reading the book "The Cold and the Dark" as a ~14-year old, about the consequences of a large-scale nuclear war.

Later, in my twenties, I noticed that many of the direst predictions I worried about as a child or teenager didn't seem to have come true. Sometimes that was because the timescale was longer than a decade or two, but in some cases the issue had been largely addressed, or had faded out of collective consciousness. The ozone hole was, for example, largely addressed by the Montreal Protocol. Acid rain seems to have faded as a concern, for reasons I only partly understand. But such issues have been replaced in public consciousness by other existential worries; indeed, existential worry seems more conserved among humans than one might a priori suspect. I gradually came to believe that there is often a type of hubris in pessimism: it's easy to confuse "I [and my friends] don't know of a solution" to some big problem with "there is no solution". Often, civilizational problems are solved in ways anticipated by very few people in advance.

As an example, in the 1970s and 1980s there was much fear about "the population bomb", the apparently unstoppable and likely calamitous exponential growth of the world population6. This very real fear is hard to remember today, when the population bomb has been mostly defused by urbanization, birth control, and improved food production and distribution – indeed, many people are today more concerned about the prospect of a population implosion. But in the 1970s and 1980s many people had feelings of doom about overpopulation akin to today's climate or AI pessimists. As another example, over the past decade I've been pleasantly surprised by progress on climate and renewable energy, especially the enormous drop in prices of solar and wind energy, of batteries, and of other modes of decarbonization. There's still a huge amount to be done, and I know climate pessimists will pooh-pooh my optimism; still, I think the optimism is correct. There will be (indeed, already are) considerable impacts of climate change, but those impacts will be far less severe than many suspected ten years ago.

For decades I hoped and believed a similar kind of unexpected progress would make worries about ASI xrisk obsolete. And so I put aside my concerns; if anything, I was very optimistic about the long-run prospects for AI, though I thought major progress was likely some way off7. Well, major progress is now here8. And today the situation looks to me to be very challenging; rather than the problem being solved, it looks like it's getting rapidly worse.

"That wasn't an argument for ASI xrisk!" True, it wasn't. Indeed, one of the things that took me quite a while to understand was that there are very good reasons it's a mistake to expect a bulletproof argument either for or against xrisk. I'll come back to why that is later. I will make some broad remarks now though. I believe that humanity can make ASI, and that we are likely to make it soon – within three decades, perhaps much sooner, absent a disaster or a major effort at slowdown. Many able people and many powerful people are pushing very hard for it. Indeed: enormous systems are starting to push for it. Some of those people and systems are strongly motivated by the desire for power and control. Many are strongly motivated by the desire to contribute to humanity. They correctly view ASI as something which will do tremendous good, leading to major medical advances, materials advances, educational advances, and more. I say "advances", which has come to be something of a marketing term, but I don't mean Nature-press-release-style-(usually)-minor-advances. I mean polio-vaccine-transforming-millions-of-lives-style-advances, or even larger. Such optimists view ASI as a technology likely to produce incredible abundance, shared broadly, and thus enriching everyone in the world.

But while that is wonderful and worth celebrating, those advances seem to me likely to have a terrible dark side. There is a sense in which human understanding is always dual use: genuine depth of understanding makes the universe more malleable to our will in a very general way. For example, while the insights of relativity and quantum mechanics were crucial to much of modern molecular biology, medicine, materials, computing, and in many other areas, they also helped lead to nuclear weapons. I don't think this is an accident: such dual uses are very near inevitable when you greatly increase your understanding of the stuff that makes up the universe.

As an aside on the short term – the next few years – I expect we're going to see rapidly improving multi-modal foundation models which mix language, mathematics, images, video, sound, action in the world, as well as many specialized sources of data, things like genetic data about viruses and proteins, data from particle physics, sensor data from vehicles, from the oceans, and so on. Such models will "know" a tremendous amount about many different aspects of the world, and will also have a raw substrate for abstract reasoning – things like language and mathematics; they will get at least some transfer between these domains, and will be far, far more powerful than systems like GPT-4. This does not mean they will yet be true AGI or ASI! Other ideas will almost certainly be required; it's possible those ideas are, however, already extant. No matter what, I expect such models will be increasingly powerful as aids to the discovery of powerful new technologies. Furthermore, I expect it will be very, very difficult to obtain the "positive" capabilities, without also obtaining the negative. You can't just learn the "positive" consequences of quantum mechanics; they come as a package deal with the negative. Guardrails like RLHF will help suppress the negative, but as I discuss later it will also be relatively simple to remove those guardrails.

Returning to the medium-and-longer-term: many people who care about ASI xrisk are focused on ASI taking over, as some kind of successor species to humanity. But even focusing on ASI purely as a tool9, ASI will act as an enormous accelerant on our ability to understand, and thus will be an enormous amplifier of our power. This will be true both for individuals and for groups. This will result in many, many very good things. Unfortunately, it will also result in many destructive things, no matter how good the guardrails. It is by no means clear that questions like "Is there a trivially easy-to-follow recipe to genocide [a race]?" or "Is there a trivially easy-to-follow recipe to end humanity?" don't have affirmative answers, which humanity is merely (currently and fortunately) too stupid to answer, but which an ASI could answer.

Historically, we have been very good at evolving guardrails to curb and control powerful new technologies. That is genuine cause for optimism. However, I worry that in this case we won't be able to evolve guardrails sufficient to match the increase in power. The nuclear buildup from the 1940s through the 1980s is a cautionary example: reviewing the evidence it is clear we have only just barely escaped large-scale nuclear war so far – and it's still early days! It seems likely that ASI will create many such threats, in parallel, on a much faster timescale, and far more accessible to individuals and small groups. The world of intellect simply provides vastly scalable leverage: if you can create one artificial John von Neumann, then you can produce an army of them, some of whom may be working for people we'd really rather not have access to that kind of capacity. Many people like to talk about making ASI systems safe and aligned; quite apart from the difficulty in doing that (or even sensibly defining it), it seems it must be done for all ASI systems, ever. That seems to require an all-seeing surveillance regime, a fraught path. Perhaps such a surveillance regime can be implemented not merely by government or corporations against the populace, but in a much more omnidirectional way, a form of ambient sousveillance10.

On practical alignment with real systems

"What do you think about the practical alignment work that's going on – RLHF, Constitutional AI, and so on?": The work is certainly technically interesting. It's interesting to contrast to prior systems, like Microsoft's Tay, which could easily be made to do many terrible things. You can make ChatGPT and Claude do terrible things as well, but you have to work harder; the alignment work on those systems has created somewhat stable guardrails. This kind of work is also striking as a case where safety-oriented people have done detailed technical work to improve real systems, with hard feedback loops and clear criteria for success and failure, as opposed to the abstract philosophizing common in much early ASI xrisk work. It's certainly much easier to improve your ideas in the former case, and easier to fool yourself in the latter case.

With all that said: practical alignment work is extremely accelerationist. If ChatGPT had behaved like Tay, AI would still be getting minor mentions on page 19 of The New York Times. These alignment techniques play a role in AI somewhat like the systems used to control when a nuclear bomb goes off. If such bombs just went off at random, no-one would build nuclear bombs, and there would be no nuclear threat to humanity. Practical alignment work makes today's AI systems far more attractive to customers, far more usable as a platform for building other systems, far more profitable as a target for investors, and far more palatable to governments. The net result is that practical alignment work is accelerationist. There's an extremely thoughtful essay by Paul Christiano, one of the pioneers of both RLHF and AI safety, where he addresses the question of whether he regrets working on RLHF, given the acceleration it has caused. I admire the self-reflection and integrity of the essay, but ultimately I think, like many of the commenters on the essay, that he's only partially facing up to the fact that his work will considerably hasten ASI, including extremely dangerous systems.

Over the past decade I've met many AI safety people who speak as though "AI capabilities" and "AI safety/alignment" work is a dichotomy. They talk in terms of wanting to "move" capabilities researchers into alignment. But most concrete alignment work is capabilities work. It's a false dichotomy, and another example of how a conceptual error can lead a field astray. Fortunately, many safety people now understand this, but I still sometimes see the false dichotomy misleading people, sometimes even causing systematic effects through bad funding decisions.

A second point about alignment is that no matter how good the guardrails, they are intrinsically unstable, and easily removed. I often meet smart AI safety people who have inventive schemes they hope will make ASI systems safe. Maybe they will, maybe they won't. But the more elaborate the scheme, the more unstable the situation. If you have a magic soup recipe which requires 123 different ingredients, but all must be mixed accurate to within 1% by weight, and even a single deviation will make it deadly poisonous, then you really shouldn't cook and eat your "safe" soup. One of the undercooks forgets to put in a leek, and poof, there goes the village.
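
To make the fragility point concrete, here is a toy calculation (a sketch only: the 123 steps, the 99% per-step reliability, and the 50 independent deployments are made-up illustrative numbers, not claims about any real alignment scheme), showing how quickly compounded requirements become fragile:

    # Toy illustration of compounded fragility. All numbers are hypothetical.
    n_steps = 123        # hypothetical number of ingredients/steps in the "recipe"
    p_step = 0.99        # hypothetical chance each step is done correctly
    p_recipe_ok = p_step ** n_steps
    print(f"P(all {n_steps} steps correct) = {p_recipe_ok:.2f}")  # roughly 0.29

    # And if many independent groups each cook their own "soup":
    n_groups = 50        # hypothetical number of independent deployments
    p_everyone_ok = p_recipe_ok ** n_groups
    print(f"P(all {n_groups} groups succeed) = {p_everyone_ok:.1e}")  # vanishingly small

Even with 99% reliability at every single step, on these toy numbers the village eats poisoned soup more often than not.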

You see something like this with Stable Diffusion. Initial releases were, I am told, made (somewhat) safe. But, of course, people quickly figured out how to make them unsafe, useful for generating deep fake porn or gore images of non-consenting people. And there's all sorts of work going on finetuning AI systems, including to remove items from memory, to add items into memory, to remove RLHF, to poison data, and so on. Making a safe AI system unsafe seems to be far easier than making a safe AI system. It's a bit as though we're going on a diet of 100% magic soup, provided by a multitude of different groups, and hoping every single soup has been made absolutely perfectly.

Put another way: even if we somehow figure out how to build AI systems that everyone agrees are perfectly aligned11, that will inevitably result in non-aligned systems. Part of the problem is that AI systems are mostly made up of ideas. Suppose the first ASI systems are made by OpenAnthropicDeepSafetyBlobCorp, and they are absolutely 100% safe (whatever that means). But those ideas will then be used by other people to make less safe systems, either due to different ideologies about what safe should mean, or through simple incompetence. What I regard as safe is very unlikely to be the same as what Vladimir Putin regards as safe; and yet if I know how to build ASI systems, then Putin must also be able to build such systems. And he's likely to put very different guardrails in. It's not even the same as with nuclear weapons, where capital costs and limited access to fissionable materials makes enforcement of non-proliferation plausible. In AI, rapidly improving ideas and dropping compute costs mean that systems which today require massive resources to build can be built for tuppence tomorrow. You see this with systems like GPT-3, which just a few years ago cost large sums of money and took large teams; now, small open source groups can get better results with modest budgets.

Summing up: a lot of people are trying to figure out how to align systems. Even if successful, such efforts will: (a) accelerate the widespread use and proliferation of such systems, by making them more attractive to customers and governments, and exciting to investors; but then (b) be easily circumvented by people whose idea of "safe" may be very, very different than yours or mine. This will include governments and criminal or terrorist organizations of ill intent.

"Does this mean you oppose such practical work on alignment?" No! Not exactly. Rather, I'm pointing out an alignment dilemma: do you participate in practical, concrete alignment work, on the grounds that it's only by doing such work that humanity has a chance to build safe systems? Or do you avoid participating in such work, viewing it as accelerating an almost certainly bad outcome, for a very small (or non-existent) improvement in chances the outcome will be good? Note that this dilemma isn't the same as the by-now common assertion that alignment work is intrinsically accelerationist. Rather, it's making a different-albeit-related point, which is that if you take ASI xrisk seriously, then alignment work is a damned-if-you-do-damned-if-you-don't proposition.

Unfortunately, I am genuinely torn on the alignment dilemma! It's a very nasty dilemma, since it divides two groups who ought to be natural collaborators, on the basis of some uncertain future event. And apart from that point about collaboration and politics, it has nasty epistemic implications. It is, as I noted earlier, easiest to make real progress when you're working on concrete practical problems, since you're studying real systems and can iteratively test and improve your ideas. It's not impossible to make progress through more abstract work – there are important ideas like the vulnerable world hypothesis12, existential risk13 and so on, which have come out of the abstract work on ASI xrisk. But immediate practical work is a far easier setting in which to make intellectual progress.

"Some thoughtful open source advocates believe the pursuit of AGI and ASI will be safer if carried out in the open. Do you buy that?": Many of those people argue that the tech industry has concentrated power in an unhealthy way over the past 30 years. And that open source mitigates some of that concentration of power. This is sometimes correct, though it can fail: sometimes open source systems are co-opted or captured by large companies, and this may protect or reinforce the power of those companies. Assuming this effect could be avoided here, I certainly agree that open source approaches might well help with many important immediate concerns about the fairness and ethics of AI systems. Furthermore, addressing those concerns is an essential part of any long-term work toward alignment. Unfortunately, though, this argument breaks down completely over the longer term. In the short term, open source may help redistribute power in healthy, more equitable ways. Over the long term the problem is simply too much power available to human beings: making it more widely available won't solve the problem, it will make it worse.

ASI xrisk persuasion paradoxes

"A lot of online discussion of ASI xrisk seems of very low quality. Why do you think that is?" I'll answer that indirectly. Something I love about most parts of science and mathematics is that nature sometimes forces you to change your mind about fundamental things that you really believe. When I was a teenager my mind recoiled at the theories of relativity and quantum mechanics. Both challenged my sense of the world in fundamental ways. Ideas like time dilation and quantum indeterminacy seemed obviously wrong! And yet I eventually realized, after much wrestling, that it was my intuitions about the world that were wrong. These weren't conclusions I wanted to come to: they were forced, by many, many, many facts about the world, facts that I simply cannot explain if I reject ideas like time dilation and quantum indeterminacy. This doesn't mean relativity and quantum mechanics are the last word in physics, of course. But they are at the very least important stepping stones to making sense of a world that wildly violates our basic intuitions.

I love this kind of change in thinking. I think it's one of the most remarkable achievements of science. And something I find depressing in discussions of ASI xrisk is how little it's happening. Imagine if, with time dilation, physicists had split into two tribes, with one tribe believing in time dilation, and the other not. It'd be analogous to the situation today, where AI experts (and others with relevant expertise) disagree enormously on the question of whether AI systems pose xrisk over the next few decades. There is no similar debate amongst serious physicists about whether time dilation (or something very similar) is occurring; that question was decisively settled with remarkable speed. Any "debate" today is the realm of cranks14.

With that said, people in the discussion of ASI xrisk do seem to change their mind about small things. But I see few people changing their mind about big things. If their initial intuition is strongly "very little xrisk", then the only thing that seems likely to change that intuition is their tribe – if their friends think differently – not evidence. Ditto if their initial intuition is "considerable xrisk"15. This sounds like a critique of those people, but it's not meant that way: I think it's an often-sensible response to the weakness of our evidence and arguments. Put another way: I think the current arguments both for and against ASI xrisk aren't very good. That, in turn, sounds like a critique of the quality of work done. That's (rather snottily) what I initially thought was likely to be true. But I've come to think that's not true, either. I just think the problems are intrinsically extremely difficult. It reminds me of the (ongoing) situation with the foundations of quantum mechanics, and the understanding of quantum measurement, where the problems are so difficult that even extraordinarily capable and well-informed people have trouble making progress.

Unfortunately, in the case of ASI xrisk the result is that people aren't deciding their positions due to the quality of the evidence and argument. Instead, many seem to be choosing based on vibe, tribe, power, and self-interest. And so you end up with a public conversation based on persuasion and charisma and overconfidence and power. Intuition pumps, not genuine understanding. Having a plausible story is not the same as having a compelling, well-tested model. I suppose it's true that it's often easier to be convincing when a position is grounded in genuine understanding. (This would be true for either the case for or against xrisk.) But still: the quality of the arguments today just isn't that strong.

I've found this all quite discouraging. It's why I've used this informal pseudo-interview form: as I said at the start, an essay or paper seems to imply more confidence than warranted. However, one thing that has helped my thinking is gradually understanding some reasons it's intrinsically hard to analyze the case for and against xrisk.

"What are those intrinsic reasons it's hard to make a case either for or against xrisk?" There are three xrisk persuasion paradoxes that make it difficult. Very briefly, these are:

  1. The most direct way to make a strong argument for xrisk is to convincingly describe a detailed concrete pathway to extinction. The more concretely you describe the steps, the better the case for xrisk. But of course, any "progress" in improving such an argument actually creates xrisk. "Here's detailed instructions for how almost anyone can easily and inexpensively create an antimatter bomb: [convincing, verifiable instructions]" makes a more compelling argument for xrisk than speculating that: "An ASI might come up with a cheap and easy recipe by which almost anyone can easily create antimatter bombs." Or perhaps you make "progress" by filling in a few of the intermediate steps an ASI might have to do. Maybe you show that antimatter is a little easier to make than one might a priori have thought. Of course, you should avoid making such arguments entirely, or working on them. It's a much more extreme version of your boss asking you to make a case for why you should be fired: that's a very good time to exhibit strategic incompetence. The case for ASI xrisk is often made in very handwavy and incomplete ways; critics of xrisk then dismiss those vague arguments. I recently heard an AI investor complain: "The doomers never describe in convincing detail how things will go bad". I certainly understand their frustration; at the same time, that vagueness is something to celebrate and preserve.

  2. Any sufficiently strong argument for xrisk will likely alter human actions in ways that avert xrisk. The stronger the argument, paradoxically, the more likely it is to avert xrisk. Suppose that in the late 1930s or early 1940s someone had found and publicized a truly convincing argument that nuclear weapons would set the world's atmosphere on fire. If that had been the case, the bombs would have been much less likely to be developed and used. Similarly, as our understanding of human-caused climate change has improved it has gradually caused enormous changes in human action. As one of many examples, in recent years our use of renewable energy has typically grown at a rate of about 15% per year, whereas the rate of fossil fuel growth is no more than a few percent per year, and sometimes much less. That relative increase is not due entirely to climate fears, but those fears have certainly helped drive investment and progress in renewables.

  3. By definition, any pathway to xrisk which we can describe in detail doesn't require superhuman intelligence. A variation of this problem shows up often in fiction: it is difficult for authors to convincingly write characters who are far smarter than the author. I love Vernor Vinge's description of this as a general barrier making it hard to write science fiction, "an opaque wall across the future" in Vinge's terms16. Of course, it's not an entirely opaque wall, in that any future ASI will be subject to many constraints we can anticipate today. They won't be able to prove that 2+2=5; they won't be able to violate the laws of physics; they likely (absent very unexpected changes) won't be able to solve the halting problem or to solve NP-complete problems in polynomial time.

More broadly: these three obstructions hold regardless of whether concerns about ASI xrisk are well-founded or total baloney. That is, they can be regarded as obstructions to both making good arguments for and good arguments against xrisk17. And that difficulty in making a dispositive argument about ASI xrisk makes it clearer why people so often fall back instead on vibe and tribe.

(Incidentally, you may note that persuasion paradox 1 asserts that certain types of argument for xrisk may create xrisk, while paradox 2 asserts that arguments for xrisk may avert xrisk. This may seem inconsistent, but of course both can be true. The broader point: human action is contingent upon our understanding; any change to that understanding may change our actions, and thus our outcomes. This entangling of argument with actions and outcomes makes xrisk difficult to reason about.)

Recipes for ruin

"Okay, I think I see why you believe it's difficult to obtain strong results about xrisk. But surely there are strong a priori reasons to doubt. New technologies often carry risks, and humanity sometimes struggles to deal with them, but we usually more or less get there in the end, albeit often with a too-high cost. Many countries have banned things like asbestos and lead in fuel; we've greatly improved how we deal with pesticides and airline and automobile safety; while we're (too) slowly improving on greenhouse gases, we are genuinely improving. Why are you concerned about xrisk from ASI? Won't the same sorts of approaches to making systems safe work there?" Yeah, it's an excellent question. I'll approach it indirectly however. I want you to forget about AI and ASI entirely for a bit, and imagine instead that you have been placed in front of an all-knowing oracle. And you get to ask the oracle questions like:

Can you give me a simple, easy-to-follow recipe, where with $10,000, a month's work, and without arousing any security concerns, a bright teenager can build a device that will destroy the entire world?

Or:

What is the minimal cost where, with a month's work, and using an easy-to-follow recipe, a bright teenager can build a device that will destroy the entire world?

These are doomsday questions, asking for simple recipes for ruin. I emphasize again that they have nothing directly to do with existing AI systems, or even with future ASI systems. They are simply questions about what is physically possible, and what can be achieved with today's technologies. However, I've talked over such doomsday questions with many people, and it's surprising how much people's intuitions about such questions vary, and how those intuitions seem to influence how they think about ASI xrisk. I'll come back to the ASI connection later. First, though, I want to describe a few more variant questions for our oracle:

Is there an arrangement of 1,000 protons, neutrons, and electrons that, if created, would destroy the world?

Or, if we want to be even more pessimistic:

Is there a state of a single electron [say], that, if created, would destroy the world?

In fact, suppose you were somehow able to accelerate a single electron to a trillion, trillion, trillion, trillion, trillion electron volts. What would be the impact of aiming it at the Earth? I asked this question on Twitter. While many of the replies were poorly informed, a few people had relevant expertise in physics, and I was surprised to see that there wasn't agreement between them. Part of the issue is that this energy is far past the Planck energy18, and it's plausible that any good answer would require a theory of quantum gravity. However, it seems almost certain that if such a particle were to interact with the Earth, and dissipate any substantial fraction of its energy widely through the Earth, it would be enormously disruptive.
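
For a rough sense of scale, here is a back-of-envelope sketch (treat every figure as order-of-magnitude only; the comparison with the Earth uses the simple uniform-density formula 3GM^2/(5R) for gravitational binding energy):

    # Rough scale of a 10^60 eV particle ("a trillion, trillion, trillion,
    # trillion, trillion electron volts") vs the Planck energy and the Earth.
    eV = 1.602e-19                      # joules per electron volt
    E_particle = 1e60 * eV              # ~1.6e41 J
    E_planck = 1.22e28 * eV             # Planck energy, ~2 gigajoules

    # Crude estimate of Earth's gravitational binding energy, 3GM^2/(5R):
    G, M, R = 6.674e-11, 5.97e24, 6.37e6
    E_bind_earth = 3 * G * M**2 / (5 * R)   # ~2e32 J

    print(f"particle energy        ~ {E_particle:.1e} J")
    print(f"Planck energy          ~ {E_planck:.1e} J")
    print(f"ratio to Planck energy ~ {E_particle / E_planck:.1e}")
    print(f"Earth binding energy   ~ {E_bind_earth:.1e} J")

The energy in question is some 30 orders of magnitude past the Planck energy, and roughly a billion times the energy needed to gravitationally unbind the Earth, which is why dissipating any substantial fraction of it in the planet would be so disruptive.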

It seems a bit like cheating to imagine a single particle of such high energy. So you might ask the oracle instead:

Is there an easily-achievable low-energy state of a single electron [say], that, if created, could destroy the world? By low energy let us set the limit at a million electron volts.

Now, you might argue, "oh, can't you prove that's not possible? Surely if you're going to set off a lot of destruction you need a great deal of energy?" In fact, that argument is deeply unsatisfactory. We'll see why in a moment. First, let's refine one of our earlier questions:

Is there an easily-achievable, low-energy arrangement of 1,000 protons, neutrons, and electrons that, if created, would destroy the world?

A scenario like this was part of the premise of Kurt Vonnegut's novel "Cat's Cradle". He imagined a phase of water, dubbed Ice-9, which had a few unusual properties: (1) Ice-9 is solid at ordinary room temperature; and (2) it's a type of seed crystal, which, when it comes into contact with ordinary liquid water, also at room temperature, causes the ordinary water to also freeze and turn into more Ice-919.

That innocuous-sounding pair of properties is disastrous. It means that once Ice-9 comes into contact with the ocean it will cause all the world's oceans to freeze over20, with horrendous consequences, wiping out not just human life but much of the biosphere. It would be a world-ending catastrophe. And it would end the world with no need for very large energy sources: rather, it would use the crystallization process to rearrange one form of matter into another (much less desirable) form of matter of similar energy. So a more specific variant doomsday(-esque) question for our oracle is:

Is it possible to prepare a phase of water with properties like Ice-9 in Kurt Vonnegut's novel "Cat's Cradle"?

Of course, everyone who reads "Cat's Cradle" wonders whether something like Ice-9 can exist. A rebuttal which I've heard occasionally21 is that if such a polymorph of water were possible, then it would already have occurred by chance. In more detail: the Earth's oceans contain lots of water molecules whizzing around, and taking on all sorts of configurations by chance. If there was a seed crystal for Ice-9, it would have occurred by chance by now, and the Earth would be covered in Ice-9. The fact it isn't implies this is extremely unlikely.

This argument is somewhat reassuring. Certainly, it means it's unlikely Ice-9 will occur by chance any time in the next thousand years, say. But it's not entirely reassuring: it says very little about whether or not it's possible to design a configuration of water molecules with the required properties. Such a configuration wouldn't need to be very complicated before it was extremely unlikely to have ever occurred by chance on Earth.
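
To see the scale of the discrepancy, here is a deliberately crude sketch (every number is a rough assumption: the ocean's molecular inventory, a picosecond-scale rate of local rearrangement, and a toy model in which a seed needs N molecules each in one of k distinguishable local states):

    # Toy estimate: random "trials" available to Earth's oceans vs the odds of
    # assembling even a modest designed configuration by chance. Numbers are rough.
    ocean_mass_kg = 1.4e21
    h2o_molecule_kg = 3.0e-26
    n_molecules = ocean_mass_kg / h2o_molecule_kg     # ~5e46 water molecules

    seconds = 4e9 * 3.15e7                            # ~4 billion years of oceans
    rearrange_rate = 1e12                             # ~picosecond local dynamics (rough)
    total_trials = n_molecules * seconds * rearrange_rate   # ~6e75 local configurations

    # Toy seed model: N molecules, each required to sit in one of k local states.
    N, k = 100, 10
    p_per_trial = float(k) ** -N                      # 1e-100

    print(f"random trials available ~ {total_trials:.0e}")
    print(f"chance per trial        ~ {p_per_trial:.0e}")
    print(f"expected chance hits    ~ {total_trials * p_per_trial:.0e}")

On these cartoonish assumptions, even a seed of only ~100 molecules, each specified to one-in-ten precision, lies far beyond anything the oceans could stumble on by chance – so the absence of naturally occurring Ice-9 says little about whether such a configuration could be designed.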

In fact, something at least a little like Ice-9 has occurred on Earth before. About 2.4 billion years ago the Earth's atmosphere contained just trace amounts of oxygen. It was at that time that evolution accidentally discovered the modern photosynthetic pathway, with some cyanobacteria evolving the ability to convert CO2 and sunlight into energy and oxygen. This was terrific for those lifeforms, giving them a much improved energy source. But it was very likely catastrophic for other lifeforms, who were poisoned by the oxygen, causing a mass extinction. It must have seemed like a slow suffocation; it was much slower than the takeover by Ice-9 described by Vonnegut, but it was perhaps even more enveloping22. It was perhaps the first victory by a grey goo23 on Earth.

Reiterating my earlier point yet again: none of these oracle questions has anything directly to do with AI or ASI. They're all just questions about the laws of physics, and existing human capabilities. Still, as I said above I find that people differ very strongly in their intuitions about the answers. Many people just can't conceive that a recipe for ruin is even possible. But some find it quite easy. Many of those who find it easy do so because they've heard stories in this vein. But a story can be compelling without being true – think of all the science fiction stories that use faster-than-light mechanisms which have a surface plausibility, but which would cause severe problems in our actual universe24. So hearing a compelling story on its own means little.

What I find more concerning is one group who often seem to think a recipe for ruin is plausible: imaginative scientists with a broad and deep view of the long run of technology. I've spoken with many about this, and quite a few (not all) find such recipes for ruin plausible. It's not so surprising: they tend to understand well how miraculous and unlikely many properties of matter would have seemed before their discovery: from fire to superconductivity; from the steam engine to fission chain reactions; from hot air balloons to gravitational wave detectors; and so on, and on, and on. Humanity keeps discovering possibilities latent within matter, which make easy what was formerly impossible. Imagine describing fire to someone who had never seen it – a substance that consumes all before it, turning it into blackened char; can move with lightning speed; is exothermic, which means it can be self-sustaining; can destroy areas the size of countries. It would sound ludicrous. Indeed, if the Earth's atmosphere had much higher levels of oxygen, fire might well (at least approximately) be an example of a recipe for ruin. It would become far easier to achieve horrendous, all-consuming firestorms like those seen in the bombings of Tokyo and Dresden. We take fire for granted, yet it seems to mostly be chance25 that it is not a recipe for ruin; certainly, it is adjacent.

One reason I like this framing in terms of hypothetical oracles is that it avoids a problem common in discussions of ASI xrisk. Often, in such discussions, people posit Godlike capabilities for ASIs. This naturally and reasonably triggers skepticism: "Oh, ASI won't be able to do that". The oracle framing avoids that response, by breaking things down into two steps. The question about whether recipes for ruin exist is a perfectly reasonable question about the laws of physics and human technological capabilities. Maybe you can find an argument for why no such recipe for ruin can exist.

However, suppose you regard it as plausible that such a recipe exists. Then a natural follow-on question is: could an ASI discover it26? It is by no means obvious that, even if recipes for ruin exist, ASIs will be able to discover them: the gap between current human understanding of the universe and a complete understanding seems likely to be (extremely!) large. It may be that ASI can radically speed up scientific and technological progress, while leaving recipes for ruin safely unknown for the foreseeable future. However, my intuition is to be extremely concerned that recipes for ruin exist, and that a near-future ASI will be perfectly capable of discovering them.

"Are ASIs finding recipes for ruin the main concern you have about xrisk?" No, that's just one lens on the problem. It's meant to be illustrative of the kinds of dangers that very powerful and widely accessible technologies could pose, even without rogue AIs taking over. But it's easy to conjure up many other scenarios, too; no doubt you've seen some. In the case of recipes for ruin, you end up in a situation where you're hoping that: (a) such recipes don't exist; (b) if they do exist, ASIs won't be able to develop them; (c) if they can find them, the alignment guardrails will be so robust and impermeable that humans in practice won't be able to learn such recipes; (d) if the guardrails can be crossed, then no human will want or have the capability to use them. Unfortunately, I have strong doubts about all these. If it were as easy to cause mass destruction as it is to buy a gun and ammo from Walmart (in much of the US), I think we'd be in quite a bad state. At the very least, I think we'd start to see a lot more surveillance and authoritarianism and civil unrest, and the concomitant rise of demagogues. I don't mean of the kind seen in the US during the Trump years. I mean Stalin-Gulag-or-worse style. And that's the /optimistic/(!) scenario. The pessimistic scenario is far worse.

To what extent will ASI speed up science and technology? When is experiment a bottleneck? When can an ASI think its way to new discoveries and new capabilities?

"It seems, then, that much of your concern lies in the ability of ASI to speed up science and technology. How much do you think that will happen?" It's true, that is where much of my concern is. It's (very) personally uncomfortable: I've spent much of my career concerned with how to speed up science and technology. And now find myself wondering if I've been playing on the wrong team27. In response to your question about how much ASI will speed up science and technology: a useful model is in terms of three barriers. There's an intellectual barrier: maybe some discoveries are difficult simply because it is challenging for even an ASI to think the right thoughts. It seems plausible, for example, that even an ASI will have trouble solving NP-hard problems. Many problems of technical design or of finding good arguments (e.g., mathematical proofs) have this character, at least heuristically: they are easy to verify as correct, but hard to find. Put another way: while an ASI may be immensely more intellectually capable than a human for almost any problem, it may still fall vastly short of an omniscient oracle28. The gap between humanity and omniscience is vast!

The second barrier is a resource barrier. Although the people trying to discover the Higgs boson were very smart, they weren't able to use those smarts to avoid constructing a very large, very expensive particle accelerator. Of course, sometimes you can trade off resources and intelligence: intelligence did (for example) enable us to build much better particle detectors, no doubt reducing the cost of discovering the Higgs. And perhaps intelligence could have been used to speed things up more, or perhaps to acquire the resources faster29.

A particularly interesting barrier is the third barrier, the experimental barrier. People often point out that experiment is a bottleneck in AI-enabled scientific progress. Historically, some proponents of xrisk have adopted a naive view, in which by "just thinking" an ASI could radically speed up all scientific discovery. Experienced scientists then retort that this is ridiculous, that just being really, really smart doesn't eliminate the need for experiment. In fact, I think we can say something interesting about when experiment will be a bottleneck and when it won't be.

"So: when will experiment be a bottleneck?" Suppose we had a perfect theory of everything and infinite computational power. Even given those resources, our ability to do (say) medicine would be surprisingly constrained. The reason is that we simply can't deduce the nature of the human body (either in general, or in a specific instance) using those resources alone: it depends a tremendous amount on contingent facts: of our evolutionary history, of the origin of life, of our parents, of the environment we grew up in, and so on. Put another way: you can't use pure theory to discover that human beings are built out of DNA, RNA, the ribosome, and so on. Those are experimental facts we need to observe in the world.

Indeed, we keep discovering entire systems in the human body that we were previously unaware of; I won't be surprised if we discover many new systems in the future. In many ways, the history of medicine can be viewed as a history of experimentally developing a better and better model of how human beings work, and of interventions to change that. You might object that we could at least deduce the possibility of DNA, RNA, the ribosome (etc) from pure theory, given sufficient computational power and a theory of everything. I'm willing to buy that, subject to certain assumptions30. But that wouldn't single them out as what is actually going on in human beings. For that, you need a lot of experimental observation31.

This kind of bottleneck holds in any situation where historically contingent facts about the world crucially impact a phenomenon. So, for instance, this observation affects our understanding of the way the Earth's climate works, of psychology, of astrophysics, and of many other sciences. In each case we must make observations to see what is actually out there in the world; it isn't sufficient just to think, no matter how clever we are. This doesn't mean we can't reduce the need for experiments. But in many sciences contingent facts about the world dominate.

"So, when will ASI be able to think its way to new discoveries?" There's a flipside to the above, which is that ASI can be expected to excel in situations where we already have extremely accurate predictive theories; the contingencies are already known and incorporated into the theory, in detail. Indeed, there are already cases where humanity has used such theories to great advantage to make substantial further "progress"32, mostly through thinking ("theory and/or simulation") alone, perhaps augmented with a little experiment:

It's tempting to say that we are operating in an engineering regime when this kind of predictive theory can be done. But that's not right. You see this with the discovery of public-key cryptography: it's not so much engineering as the discovery of an entirely new phenomenon, in some sense lying hidden within existing ideas about computation and cryptography. Or consider: could we have predicted the existence of liquid water – and phenomena like the Navier-Stokes equations, turbulence, and so on – from the Schroedinger equation alone? Over the last 20 years the study of water from first principles has been a surprisingly active area of research in physics. And yet, as far as I know, we are not yet at the point where we can even deduce the Navier-Stokes equations, much less other phenomena of interest34. I believe it is principally science (not engineering) when such emergent phenomena and emergent principles are being discovered. It seems to me that a fundamental question is:

"When can ASI deduce emergent phenomena and emergent laws, from accurate underlying theories?" Of course, I don't know the answer to this question. But to the extent that ASI can make such deductions, it will give it qualitatively new types of power and capability, beyond what humans presently have. And although I cannot know for sure, I expect this will be common: that there are many, many such new types of emergent phenomena latent in our existing theories, and ASI will eventually discover many such phenomena. Indeed, I expect we're already rather close to this with AlphaFold 2 and similar systems. Perhaps, for example, AI-driven systems for protein design are discovering new principles for such design; indeed, perhaps they're even doing it unknown to their human creators.

Conclusion

ASI xrisk is complicated, and these notes omit many important issues. One such issue is the positive impact of AI and ASI. Most discussions of ASI xrisk focus almost entirely on scenarios in which things go badly wrong. This is understandable, but a mistake35. Any harms will largely (though not entirely) be a side effect of pursuing the positive goals motivating AI. Certainly, most of the people I know working toward ASI don't just want to be personally successful, they also desire a better, healthier, safer, more just world36. And so any thorough analysis must be deeply grounded in an understanding of the positive goals, methods, and impacts of AI, a vision of a Great AI Flourishing for Humanity. I've written one short essay on this, and have extensive notes, but an integrated presentation would be far better. One benefit of simultaneously exploring multiple scenarios is that understanding optimistic scenarios enriches and changes one's understanding of negative scenarios; and vice versa. Another benefit is that it makes the stakes clearer if worriers (like me) are wrong to worry: slowing AI down may be a very high-cost-to-humanity activity.

Like many, I was stunned by ChatGPT and GPT-4. While GPT-4 falls well short of ASI, it makes urgent the need to better understand ASI xrisk. These notes are my first in-depth exploration of ASI xrisk, written to help me metabolize prior work, and to contribute a few ideas of my own. I've made progress more slowly than I expected, and am frustrated at still not knowing what to do about ASI xrisk. How can one have high-leverage impact here? A wise friend and colleague, Catherine Olsson, once remarked that in any endeavor, no matter how pessimistic you may be about the overall situation, positive progress is founded in optimistic plans. This is true even when tackling challenges as daunting as ASI xrisk: optimism not as naive hope, but as a constructive framework for action. Aiming at a deeper understanding seems a helpful step toward such a framework.

Acknowledgments

Thanks to Scott Aaronson, Josh Achiam, Josh Albrecht, Alexander Berger, Gwern Branwen, Adam Brown, Steven Byrnes, David Chapman, François Chollet, Brian Christian, Patrick Collison, Andrew Critch, Laura Deming, David Deutsch, Allison Duettmann, Mike Freedman, Katja Grace, Jeremy Howard, Adam Marblestone, Andy Matuschak, Neel Nanda, Richard Ngo, Chris Olah, Catherine Olsson, Toby Ord, Kanjun Qiu, Terry Rudolph, Caitlin Sikora, Daisy Stanton, Alexander Tamas, Umesh Vazirani, and Bret Victor for conversations which informed these notes. Of course, I expect that many of these people would disagree (perhaps vociferously!) with the points of view expressed herein. Thanks to Terry Rudolph for comments on a draft, and to Hannu Rajaniemi for pointing out an error in the published version. Thanks especially to Alexander Berger, Adam Marblestone, Toby Ord, and Lenny Susskind for encouragement, and to Andy Matuschak for both a lot of encouragement and assistance at a point of lost confidence, and for comments on a draft. Finally, a point I've reflected on often while thinking about ASI xrisk: the notes (and all my work) have benefited enormously from all the amazing cognitive technologies humanity has developed, from writing to scientific papers to Google to ChatGPT and Claude. John von Neumann observed that "for progress there is no cure". Perhaps he was correct.

Citation information

For attribution in academic contexts, please cite this work as:

Michael Nielsen, "Notes on Existential Risk from Artificial Superintelligence", https://michaelnotebook.com/xrisk/index.html, San Francisco (2023).

Footnotes


  1. See, for example: Nick Bostrom, "Superintelligence" (2014); Stuart Russell, "Human Compatible" (2019); Brian Christian, "The Alignment Problem" (2020); David Chapman, "Better Without AI" (2023). There is a lot of other stimulating writing on the subject, which I won't attempt to review, so this is necessarily an unfairly short list. I will say this: those pieces all make a case for extraordinary risks from AI (albeit in different ways); I am somewhat surprised that I have not been able to find a work of similar intellectual depth arguing that the risks posed by ASI are mostly of "ordinary" types which humanity knows how to deal with. This is often asserted as "obviously" true, and given a brief treatment; unfortunately, the rebuttal is often mere proof by ridicule, or by lack of imagination (often from people whose main motivation appears to be that people they don't like are worried about ASI xrisk). It's perhaps not so surprising: "the sky is not falling" is not an obvious target for a serious book-length treatment. Still, I hope someone insightful and imaginative will fill the gap. Three brief-but-stimulating shorter treatments are: Anthony Zador and Yann LeCun, Don't Fear the Terminator (2019); Katja Grace, Counterarguments to the basic AI x-risk case (2022); and David Krueger, A list of good heuristics that the case for AI x-risk fails (2019).↩︎

  2. Note that this is not in the sense in which large language models are sometimes said to show "emergent" properties, but in the original sense of the term. This will be defined later.↩︎

  3. Describing it as "a" technology gives a somewhat misleading sense of fixedness. It will change, a lot. Indeed, it's that very change which is part of what makes it dangerous.↩︎

  4. Some people have argued that it's very difficult to achieve complete extinction using either nuclear weapons or engineered viruses. I won't too strenuously argue the point; I've no doubt what can be done through these means would be utterly horrifying to anyone reasonable.↩︎

  5. Adapted from Nick Bostrom's useful term "information hazard": Nick Bostrom, Information Hazards: A Typology of Potential Harms from Knowledge (2011).↩︎

  6. Those fears were perhaps most notably stoked by Paul Ehrlich's book "The Population Bomb". It is a curious fact that those fears mirrored the response to Thomas Malthus's "An Essay on the Principle of Population", published in 1798. The response to Ehrlich is perhaps more understandable, given the extraordinary growth rate of the world population when his book was published, in 1968.↩︎

  7. Roughly a decade ago I wrote an introductory technical book about neural networks and deep learning. It's striking in retrospect how muted it is both on the subject of ASI and ASI xrisk. The former is discussed briefly; the latter only mentioned in passing. Certainly, neither seemed looming to me at the time. That is no longer the case.↩︎

  8. "How long to AGI?" and "How long to ASI?" are "so, what about this weather we've been having?" questions in much of Silicon Valley. Of course, they have the same problem as the concept of the probability of doom: they implicitly assume a fatalistic attitude which denies human agency to change outcomes. They only make sense conditional on assumptions about human behavior. So: conditional on no major changes in attitude among our technocratic class (including no especially restrictive policies, in any AI-oriented country), no major wars or other disasters, I think time-to-ASI is plausibly estimated at anything from a few years to a few decades. As I will discuss below in more detail, very large multi-modal unsupervised learning of foundation models seems very promising to me. There are many problems with that paradigm but it's by no means clear that combining such models with a few simple (and perhaps already extant) ideas won't solve those problems.↩︎

  9. In general, I wish there were more focus on misuse, and less on malevolent ASI takeover: if a malevolent, agentic ASI can use capability X to take over the world, then a human being using an ASI purely as a non-agentic tool can likely also use capability X (or something similar) in a very damaging way. While I personally find agentic ASI quite plausible, it blocks a lot of people from thinking about ASI. A very different group of people seem to find it fascinating for the theatrics – there are far more movies about robot uprisings than there are about humans using massive swarms of drones as weapons, despite the fact that the latter is already happening, while the former is not. Theatrical narrative interest is a poor reason to focus on a scenario, yet I believe it is part of the reason for the interest in malevolent ASI takeover.↩︎

  10. See: David Brin, "The Transparent Society" (1998).↩︎

  11. An incoherent concept, of course. Being aligned with Vladimir Putin isn't the same thing as being aligned with your friendly neighbour. Different people have different values; power has always been exercised to impose values. I doubt that's going to change. However, there's a more optimistic version, which is that ASI challenges us to massively upgrade our systems of governance.↩︎

  12. Nick Bostrom, "The Vulnerable World Hypothesis" (2019).↩︎

  13. Nick Bostrom, "Existential Risks" (2002). A recent book on the subject is: Toby Ord, "The Precipice" (2021).↩︎

  14. I'm not referring to work on successor theories to relativity, which may modify time dilation in important ways, or make it non-fundamental, only recoverable in some approximation. That is, it still appears in such theories, but only in an approximate or non-fundamental form. In this sense time dilation is assimilated into all serious modern contender theories of physics.↩︎

  15. Gradually increasing concern about xrisk seems, however, not uncommon. This seems associated with the risk becoming more concrete, not with dispositive evidence.↩︎

  16. In his astonishing essay: Vernor Vinge, "The Coming Technological Singularity: How to Survive in the Post-Human Era" (1993).↩︎

  17. I've mostly framed them in one direction, as being about why it's difficult to argue for xrisk. For the reverse direction, consider for the first: for many types of xrisk a reasonable argument against that xrisk is "but you don't have any detailed, concrete scenarios". You'd think this for meteor strikes, for example, or natural pandemics. In this particular case, however, we have good reasons to avoid making detailed, concrete arguments, so their absence isn't especially compelling. A similar, though milder, version of this argument holds for the second paradox, which is much less specialized to ASI. And I think the third is obviously a reason it's difficult to argue either for or against ASI xrisk.↩︎

  18. Note that the Planck energy is about 2 gigajoules – for comparison, a human body burns through about 10 megajoules per day, so the Planck energy is very roughly your energy use over half a year. It's not intrinsically an especially unusual energy scale. However, when that energy (or far more, as in the text) is wrapped up in a single elementary particle, it's at least plausible that interesting things happen.↩︎
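      A rough back-of-envelope check of that comparison, writing $E_P$ for the Planck energy (about $1.96 \times 10^{9}$ joules):

      $$\frac{E_P}{\text{daily human energy use}} \approx \frac{1.96 \times 10^{9}\,\mathrm{J}}{10^{7}\,\mathrm{J/day}} \approx 200\ \text{days},$$

      i.e. a bit over six months.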

  19. Apparently this was suggested to Vonnegut by his brother, Bernard Vonnegut, who in turn learned of it from the Nobel chemist Irving Langmuir. I can't resist mentioning a chain of related associations: Langmuir and Bernard Vonnegut both worked on a genuine phenomenon rather similar to Ice-9: cloud seeding to control the weather. And Bernard was also interested in perhaps applying related techniques to control tornadoes. He collaborated in this with the nuclear physicist Stirling Colgate. Colgate, in turn, may have had his interest stimulated in part by his role in overseeing diagnostics for the US's Castle Bravo thermonuclear test. That test was extremely concerning, because its yield was several times higher than estimated: the physicists had overlooked a crucial reaction during design. All these phenomena – beginning from small seeds which then self-amplify to produce a gigantic impact – are rather concerning.↩︎

  20. Assuming, of course, that salinity and similar impurities don't affect this property of Ice-9.↩︎

  21. I first heard it in grad school – I do not remember from whom – but I'm not sure where it was first written down, or where the strongest form of the argument has been made. One place making this argument is: https://www.askamathematician.com/2012/11/q-could-kurt-vonegets-ice-9-catastrophy-happen/↩︎

  22. It's been speculated that this may have been the evolutionary pressure that triggered symbiogenesis, and thus the rise of multicellular eukaryotic life.↩︎

  23. I've noticed that if one speaks of "grey goo" scenarios, people often react very strongly that "That's not grey goo! Because [it violates some condition the author thinks a grey goo must satisfy]". This is fair enough, although many such objectors seem to disagree with one another on what a grey goo is. For present purposes I just mean: a grey goo is a submicron assembly that, once loose in the world, gradually reshapes and alters the entire world. The SARS-CoV-2 virus is arguably a (very, very mild) example; the rise of oxygen due to photosynthesis is a far stronger example. Life itself is another example.↩︎

  24. E.g., things like the ability to send messages backward in time. It may be that there are ways of rescuing physics, but at the very least this is quite a challenge to make consistent.↩︎

  25. Or perhaps better described as anthropic post-selection: we likely wouldn't be here if enormous firestorms were commonplace.↩︎

  26. It is interesting to ponder whether Elon Musk's proposed TruthGPT would aspire to provide such recipes for ruin.↩︎

  27. Right now, I think not: the question is whether humanity's frameworks for adjusting to technology can cope with any acceleration. And I think they can do a good job, up to a point – a point well beyond where we currently are (but likely not beyond what ASI will make possible).↩︎

  28. A staple of discussions of ASI and the Singularity is, of course, bootstrapping computing devices, each generation able to rapidly design far more capable successors (or extensions) to itself. A very interesting variant of this argument is to imagine sequences of qualitatively new types of computer. Perhaps the first (classical) ASI builds a quantum ASI, whose advantage is not merely a few more capabilities, but a fundamentally different class of problems it can plausibly attack at all. And then, fancifully, perhaps it discovers still more powerful classes of computer, based on novel physics. Such a change is not just bootstrapping in the sense that GPT-4 will help build GPT-5; it actually changes the fundamental nature of computation.↩︎

  29. It is amusing to imagine a hedge fund set up principally to fund the next generation of particle accelerators. The success of Renaissance Technologies suggests this may actually be a useful strategy.↩︎

  30. Though it's interesting to ponder the ways in which this might fail. I haven't given this a tremendous amount of thought. It depends in part on the nature of a theory of everything, and on the truth (or otherwise) of the Church-Turing-Deutsch Principle.↩︎

  31. Of course, with sufficient intelligence you might require much less experimental observation than we humans have required. This is one of the things that makes the Sherlock Holmes stories so entertaining, albeit sometimes unrealistic: he is able to make so much out of tiny scraps of evidence. On Twitter I once opined: "Eg, a superintelligence in 1800 couldn't have thought its way to nukes: it needed a lot of experiments and much better models of the world before even the possibility would have seemed plausible." Gwern Branwen pointed out that in fact: "The sun is all you need to raise the possibility of large explosions to meaningful probability. The anomaly of 'what powers the sun/stars' is so striking that a century before 1800, Newton is resorting to hypothesizing angels pushing comets into the sun." While neither Newton nor Branwen is a superintelligence, the point ought to be clear: sometimes very smart entities can make surprising deductions from scant evidence.↩︎

  32. I use "progress" here very narrowly to mean: increases in scientific understanding or technological capability. I think the invention of atomic and hydrogen bombs was an instance of enormous regress in human civilization. And things like the stealth fighter certainly deserve careful thought as to whether they are progress or regress. Yet all are "progress" in the narrow sense I mean here.↩︎

  33. I was told this story by Carl Caves, who was there. Indeed, I believe Caves told me the story a few hours after the seminar (which I unfortunately couldn't be at). Assuming my memory is accurate, Anderson was incorrect, since Bose likely deserves more of the credit than Einstein. But it's too funny and interesting a line to let pass.↩︎

  34. I am reminded of a famous passage from the Feynman Lectures on Physics: "When we look at a rainbow, it looks beautiful to us. Everybody says, “Ooh, a rainbow.” (You see how scientific I am. I am afraid to say something is beautiful unless I have an experimental way of defining it.) But how would we describe a rainbow if we were blind?… Then one day the physical review of the blind men might publish a technical article with the title “The Intensity of Radiation as a Function of Angle under Certain Conditions of the Weather.” In this article there might appear a graph such as the one in Fig. 20-5…Now do we find the graph of Fig. 20–5 beautiful? It contains much more detail than we apprehend when we look at a rainbow, because our eyes cannot see the exact details in the shape of a spectrum. The eye, however, finds the rainbow beautiful. Do we have enough imagination to see in the spectral curves the same beauty we see when we look directly at the rainbow? I don’t know."↩︎

  35. I first heard this emphasized by Christine Peterson.↩︎

  36. I sometimes disagree sharply with their personal priorities for what human flourishing means, but the desire for a better world often seems to me to be deeply felt. Of course, it can be easy, in such a situation, to confuse what is good for oneself or one's friends or one's company or one's tribe with what is good for humanity.↩︎