
ASI existential risk: reconsidering alignment as a goal

Michael Nielsen

Astera Institute
April 14, 2025

This is the text for a talk exploring why experts disagree so strongly about whether artificial superintelligence (ASI) poses an existential risk to humanity. I review some key arguments on both sides, emphasizing that the fundamental danger isn't about whether "rogue ASI" gets out of control: it's the raw power ASI will confer, and the lower barriers to creating dangerous technologies. This point is not new, but has two underappreciated consequences. First, many people find rogue ASI implausible, and this has led them to mistakenly dismiss existential risk. Second, much work on AI alignment, while well-intentioned, speeds progress toward catastrophic capabilities, without addressing our world's potential vulnerability to dangerous technologies.

Why do thoughtful, well-informed people disagree so strongly about existential risk (xrisk) from ASI? As I'm sure you're aware, there is an enormous divide among distinguished scientists and technologists on this issue. On one side are those who believe ASI poses an xrisk:

And on the other side are many who are dismissive of such concerns, e.g.:

Why is there such strong disagreement between well-informed people, on an issue so crucial for humanity? An explanation I often hear is that it's due to differences in short-term self-interest – this is sometimes coarsely described as the xrisk-concerned wanting publicity or making money through safety not-for-profits, while the xrisk-skeptics want to get richer and more powerful through control of AGI. But while those explanations are convenient, they're too glib. Many prominent people concerned about xrisk are making large financial sacrifices, sometimes forgoing fortunes they could make working toward ASI, in favour of merely well-paid work on safety7. And while it's true xrisk-skeptics have an incentive to make money and acquire power, they have far more interest in remaining alive. I believe the disagreement is due primarily to sincerely held differences in underlying assumptions. In this text we'll try to better understand those differences.

For this discussion, we'll assume ASI has been achieved. That is, we have systems with superhuman performance across most domains of human endeavour. Thus, this isn't about the systems in widespread use today, such as ChatGPT, Claude, and Gemini. It's about systems which can outperform humans at virtually any cognitive task, while also being able to act in the physical world8. We're already seeing early hints of what this might entail: AlphaGo Master so thoroughly mastered Go that Ke Jie, one of the greatest ever human players, described it as a "God of Go," and said it revealed "not a single human has touched the edge of the truth of Go." Similarly, DeepMind's AlphaChip system has helped design several generations of Google's TPU chips, and "generates superhuman or comparable chip layouts in hours, rather than taking weeks or months of human effort". But ASI would have abilities far beyond these narrow examples – imagine AlphaGo's Move 37, not as a one-off insight, but repeated a trillion-fold, pervasive across multiple domains in the world9. Of course, such systems will still be subject to constraints from physics, chemistry, biology, economics, computational complexity, and so on. But they will demonstrate capabilities that seem near-magical today, much as many human abilities appear incomprehensible to other primates. Some skeptics question whether LLMs can ever become ASI, but for us it doesn't matter whether ASI has been achieved through LLMs or some different approach. Others counter that ASI is many decades away – which may or may not be true, but doesn't change the fundamental concerns if we eventually achieve it.

Finally, two notes on the approach of the text. First, this talk explores topics that have been investigated by many researchers. The relevant literature is vast, and my citations are more representative than complete. My apologies to those whose work is unfairly omitted. Second, my overall framing emphasizes existential risk ("xrisk"), but much of the discussion involves catastrophic risks – scenarios that won't directly cause extinction, but might "only" kill hundreds of millions or billions of people. Obviously, there's a spectrum from catastrophic risks through to truly existential risks, and catastrophic events might trigger further crises (war, infrastructure breakdown, ecological collapse) ultimately leading to extinction. Because the analysis of these risks overlaps, with insights from one informing our understanding of the other, I'll use "xrisk" as a shorthand throughout, distinguishing between them when necessary for the specific argument at hand.

Biorisk scenario

Let's discuss a concrete xrisk scenario involving bioweapons. While you may have heard such examples before, it's valuable as a setting to develop both skeptical arguments and reasons for concern. As is common in hypothetical scenarios, it begins as a simplified sketch, but we'll then iteratively improve it through critical examination. The scenario is not intended to convince anyone to be concerned about existential risk. Rather, it's a vehicle to help understand and illustrate patterns of argument and counter-argument common in discussions of ASI xrisk.

The scenario begins with a doomsday cultist, like those in the Aum Shinrikyo cult, asking an ASI: "Please design an airborne virus which spreads as rapidly as measles, is as lethal as ebola, undergoes asymptomatic spread like Covid, and which is good at evading known vaccine techniques. And please help me fabricate and release it."

An immediate skeptical response is: "But we will align our AI systems to prevent ASI from helping with dangerous requests." Certainly, this has been done with some success by ChatGPT, Claude, and Gemini. However, there is tension between such safety guardrails and the desire to build systems which seek out truths without barrier. How do we decide the boundary between "safe" truths and dangerous truths the system should not reveal? And who decides where that boundary lies? This point has been made succinctly by Elon Musk, who said "What we need is TruthGPT" that "tries to understand the nature of the universe". National militaries seem unlikely to accept sanitized truth-seeking capabilities. Neither will many open source groups, or idealists who deeply value understanding truth and nature. All these groups may set the boundary of acceptable truths very differently than in current consumer products. Techniques used to build "safe" systems are easily repurposed to build less safe models, reducing the expense of alignment work, while increasing truth-seeking power. We already see this with DeepSeek, which appears to have few or no biorisk guardrails10.

The underlying problem is that it is intrinsically desirable to build powerful truth-seeking ASIs, because of the immense benefits helpful truths bring to humanity. The price is that such systems will inevitably uncover closely-adjacent dangerous truths. Deep understanding of reality is intrinsically dual use. We've seen this repeatedly through history – a striking case was the development of quantum mechanics in the early twentieth century, which helped lead to many wonderful things (including much of modern molecular biology, materials science, biomedicine, and semiconductors), but also helped lead to nuclear weapons. Indeed, you couldn't get those benefits without the negative consequences. Should we have avoided quantum mechanics – and much of modern medicine and materials and computation – in order to avoid nuclear weapons? Some may argue the answer is "yes", but it's a far from universal position. And the key point is: sufficiently powerful truths grant tremendous power for both good and ill.

Even if such truth-revealing models are never built, it's likely easy to remove alignment barriers. The biologist Kevin Esvelt and collaborators found it cost only a few hundred dollars to fine-tune an existing open source model so it would be far more helpful in generating pandemic agents, concluding11:

Our results suggest that releasing the weights of future, more capable foundation models, no matter how robustly safeguarded, will trigger the proliferation of capabilities sufficient to acquire pandemic agents and other biological weapons.

The fundamental asymmetry is that "understand reality as deeply as possible" is a simple, well-defined goal, grounded in objective reality, while creating "aligned" systems requires building complex, subjective, hard-to-agree-on guardrails on top of that reality. So truth-seeking has a clear target, while alignment requires constantly shifting definitions based on social consensus. This asymmetry makes it intrinsically far easier to build unrestricted systems than aligned ones. And so alignment is an intrinsically unstable situation, ripe for proliferation.

Another skeptical response to the initial scenario is: "Isn't such a virus biologically implausible? How do we know it can exist?" It's true that we don't know for sure that it can exist. But there's a considerable chance. As the query implies, viruses with any single characteristic are known. And we're gradually getting better at engineering combinations. For example, in the 2000s, researchers accidentally engineered a mousepox variant that was 100% lethal to mice vaccinated against mousepox12. Even without ASI, we are gradually understanding how to engineer increased spread, lethality, immune evasion, and other qualities. Indeed, it may be possible to do even worse than the query implies – viruses are known which have case fatality rates near 100%, such as the MSW strain of Californian myxoma virus in European rabbits13. And, in any case, "only" a 50% or 90% or 99% death rate seems like small comfort, and would leave humanity appallingly vulnerable to further death due to failed infrastructure and war.

Another skeptical response is: "But can't you just Google this, and put it together out of the scientific literature? How does AI buy you anything?" With early AI systems, this was a reasonable response. Anyone who has seriously used today's models (much less far more capable future ASI) knows the response is becoming more reliant on the provision of the unstable guardrails mentioned above. Today's models are increasingly good at synthesizing a wide variety of sources, drawing together knowledge in easy-to-act-on ways. They are still subject to hallucination and have trouble sorting reliable results from unreliable. But they are getting better, and ASI will be far better, especially as it attains more capacity to do physical experimentation and to act in the world. Even if it's "merely putting together knowledge out of the scientific literature", that is in any case much of what an outstanding scientist does. Indeed, AlphaFold and similar systems surpassed the best human scientists at the design of proteins, merely by training on widely-available public data. The underlying tension is: the better the AI systems, the more capable they intrinsically ought to be at solving problems like that in the query. If you are pessimistic about AI's ceiling, and think such systems won't ever be very capable, then you won't be concerned. But if you believe ASI will be extraordinarily capable, like having a team of brilliant scientific advisors, then you must rely on guardrails built into the system, not any fundamental lack of capability.

Another skeptical response: "But are such cults for real? Surely the Aum Shinrikyo cult was a strange one off, or exaggerated or misunderstood? No-one really wants the end of the world!" In fact, there is a long history of individuals or groups who have sought to bring about apocalypse. Often this involves significant mental illness, and the individuals or groups involved aren't especially competent (fortunately). But it doesn't change the fact that world-ending intent is more common than one might a priori assume. Sometimes the efforts involve intended collective suicide by the people hoping to bring apocalypse about; other times those people expect to form part of a small elect who will be saved, often to help make a better world. I won't attempt a history of such efforts here. But I will mention a little about the remarkable history of Aum Shinrikyo, as illustration.

Aum Shinrikyo began in 1987, and grew rapidly, acquiring more than 10,000 recruits by the mid-1990s. Many were educated Japanese in their 20s and 30s, some with significant expertise in science and technology. The cult initially predicted apocalypse in 1997, starting with a nuclear conflict between the US and Japan, which would escalate and cause the destruction of most of humanity. Aum members were to be an elect who would be saved. After a failed attempt to gain political power in 1990, top cult leaders shifted from merely predicting apocalypse to actively working to bring it about14. They had programs to develop chemical and biological weapons and missiles, and purchased a sheep station in Australia to prospect for uranium. Their chemical and biological weapons efforts were especially successful, and according to historian Charles Townshend15, "police raids found enough sarin in Aum's possession to kill over four million people." They made multiple attempts to deploy sarin gas and other chemical and biological weapons. Fortunately, all had limited success, although dozens of people were killed, and thousands affected. While most cult members were unaware of these efforts, some genuinely were working to instigate apocalypse. Aum were more competent (and thus are better known) than many other groups with apocalyptic intent; part of the concern with ASI is that it may greatly increase the supply of apocalyptic competence.

We've merely scratched the surface of this scenario, to give a sample of thinking about ASI xrisk. Of course, this scenario can be developed through many more iterations, going deeper and deeper. That would be valuable, but isn't the point of this text. Instead, I'll finish the section by mentioning the best skeptical response of all: "Even if this scenario is biologically and psychologically plausible, and alignment of single AI systems doesn't buy us much, humanity will prevent it in other ways. For instance, we can do so by controlling biological synthesis equipment, ensuring viral sequences are screened, involving law enforcement and intelligence agencies, and generally deploying all the different institutions we humans use to ensure safety. And we'll have ASI to help us too!"

This is the strategy humanity has historically used to defend against threats: we (mostly) allow innovation, and respond to problems as they arise. This just-in-time coevolutionary response strategy works well, since it means we get the benefits of innovation, while mitigating the risks. It's true there are major technological problems which have been challenging for humanity to deal with in this way – consider asbestos, leaded petrol, climate change, and similar examples. But for each such example there are thousands of innovations where existing institutions largely overcame early problems. In the short term such a just-in-time safety strategy will work well. Indeed, AI and AGI and ASI will lead to many extraordinary beneficial innovations. But over the medium term, this strategy is much harder to defend, for reasons I'll discuss below.

The Vulnerable World Hypothesis

Underlying the biorisk scenario is a more fundamental question: "Is there a 'recipe for ruin', some cheap-and-easy-to-make-and-deploy technology capable of ending humanity? Or causing, say, > 90% human deaths?" The hypothesis that such a recipe exists is what Nick Bostrom has called "the Vulnerable World hypothesis"16. While we have survived technological advances so far, the universe offers no guarantee that all such advances will be manageable. Like explorers in uncharted territory, we could suddenly find ourselves facing an insurmountable danger. Fortunately, we don't yet know of any recipes for ruin. But an unpleasant parlour game is identifying candidates that seem plausible with foreseeable technology. We've briefly discussed one: the engineered virus of the last section. Let's look even more briefly at a few others. As in the last section, the purpose here isn't to develop detailed, convincing scenarios. It is rather to begin to explore the range of possibilities.

One plausible example is mirror bacteria: hypothetical human-engineered organisms where some existing bacterium has been duplicated, but all molecules have been reversed into their chiral mirror images. Some scientists believe conventional immune defenses would fail to recognize such mirror bacteria, and they would spread unchecked, overwhelming the biosphere, and leading to a mass extinction of many species. Until recently, some basic elements of mirror life were being actively pursued by curiosity-driven scientists. However, many of those scientists recently joined together to publish an important piece in Science17, explaining the dangers, and publicly abandoning their work on mirror life.

Another example begins with easy-to-make nuclear weapons. This sounds implausible. But Ted Taylor, the leading American designer of nuclear weapons, thought they were very plausible, telling the writer John McPhee there is "a way to make a bomb… so simple that I just don't want to describe it". Taylor's remarks, published in a book by McPhee, stimulated at least two people to develop plausible designs for DIY nuclear weapons. These have not, so far as we know, been built, perhaps because the availability of fissile material remains a bottleneck. If that bottleneck can be removed, then it may be very difficult to avoid the widespread proliferation of weapons. Of course, on their own, easy-to-make nukes aren't directly an existential risk. However, they are bad news, and may well trigger other instabilities. A world with fewer than ten nuclear powers can perhaps avoid the use of nuclear weapons; that won't be true in a world with thousands of rogue nuclear actors. And it is easy to develop plausible scenarios in which "small-scale" use by rogue actors escalates to trigger a major nuclear exchange. Whether such an exchange could truly destroy humanity is debated; however, the combination of immediate deaths, nuclear winter, and collapse of global infrastructure would be catastrophic, and at least plausibly threaten human extinction. Taylor believed "Every civilization must go through [its nuclear crisis]… Those that don't make it destroy themselves. Those that do make it wind up cavorting all over the universe"18.

A skeptic of ASI xrisk may point out: "It's not really ASI you're concerned about here. It's increased scientific and technological ingenuity, harnessed for destruction." This is correct. The Vulnerable World Hypothesis is a broad question about the nature of the universe and technology. It is not about ASI specifically. The fundamental concern is whether dangerous capabilities are latent in our universe, and likely to be discovered. However, ASIs which speed up science and technology will act as a supercharger, perhaps able to rapidly uncover recipes for ruin that might otherwise have taken centuries to discover, or never have been discovered at all. Today, it takes considerable expertise and resources to work on superviruses, mirror life, or nuclear weapons. But ASI will greatly reduce those requirements. And, as discussed earlier, while single aligned ASIs may refuse to help, proliferation makes this unstable. It's the existence of ASI which is the problem here, regardless of the alignment of single systems. You cannot put an impermeable barrier around understanding and controlling reality, when you have built systems with intellectual capacity beyond von Neumann or Einstein, and which are also exceptionally capable of building and operating in the physical world. And if you can make one such artificial mind, you can scale up to a million, and make them each a thousand times faster. We should expect a major discontinuity in individual capability, unless individuals are denied access to the fruits of these systems.

A skeptical response to the Vulnerable World Hypothesis is: "Well, most people don't want to blow up the world. So provided you can keep tabs on the relatively few actors who perhaps do, you should be okay." That's fair enough. Unfortunately, a lot of people do strongly desire power and the ability to dominate others. It seems to be a strong inbuilt instinct, visible everywhere from the everyday minutiae of human behaviour to the very large: e.g., colonial powers destroying or oppressing indigenous populations, often not even out of malice but out of indifference – they are inconvenient, and brushed aside because they can be. We humans collectively have a considerable innate desire for power over people who are defenseless to stop it. But in a Vulnerable World, the competitive drive to ratchet up that exercise of power to dangerous levels will be enormous.

A common response to concerns about ASI risk is to propose "uplift" approaches – using brain-computer interfaces, genetic engineering, or other enhancement technologies to augment human capabilities. While this seems superficially promising, it misunderstands the fundamental issue posed by the Vulnerable World Hypothesis. The problem isn't whether intelligence is carbon- or silicon-based; it's that increased intellectual capability leads to increased power and access to catastrophic technologies. Uplift approaches seem likely to increase rather than decrease the danger of developing recipes for ruin. And they create other problems – e.g., BCI companies (or regulators) able to re-shape human thought in dictatorial ways. The issue isn't which substrate intelligence runs on, but the dual-use nature of deep understanding of reality19.

Let's return to what I described in the last section as the best skeptical response, but in the more general context of the Vulnerable World hypothesis: "Sure, maybe we live in a Vulnerable World, and unrestricted ASI may help discover recipes for ruin. However, those technologies won't arise in a vacuum. Humanity will see many smaller problems along the way, and respond to manage the issues, same as we ever have. Capabilities and safety naturally co-evolve. Let's address problems as they arise, just as we did with the ozone hole and the Montreal Protocol, or nuclear weapons and the non-proliferation treaty. Let's not worry prematurely about abstract future concerns. Institutional safety and real-world barriers will keep us safe."

Such actions are important. But this response assumes the capabilities of ASI will improve slowly enough for our institutions to adapt. History suggests technological discontinuities can shock even top experts and leave institutions out of their depth. Recall Admiral William Leahy telling President Truman that the atomic bomb project was "the biggest fool thing we have ever done. The bomb will never go off, and I speak as an expert in explosives." Leahy had taught physics and chemistry at Annapolis and was Chairman of the Joint Chiefs. He was completely wrong. Sometimes discontinuities happen. ASI will be a genie whose consequences, some positive and some very negative, can't easily be put back in the bottle. And, as the nuclear example suggests, relatively simple offensive technologies can sometimes be very difficult to defend against; there is no intrinsic requirement for symmetry.

Loss of control to ASI

The most discussed instance of the Vulnerable World Hypothesis is the rogue ASI takeover scenario. This is sometimes summarized briefly as: "Just as chimpanzees ought to be wary of human beings, so human beings ought to be wary of creating more capable potential successor entities". Insofar as ASIs are agents with their own agenda, and have some freedom to act, they may end up taking more and more control, and eventually using Earth for their own ends, even if those ends damage humanity. Power, once ceded, is difficult to reclaim.

These loss-of-control arguments have been widely discussed, in influential writings like those of Nick Bostrom, Stuart Russell, and Eliezer Yudkowsky, and more recent work like the scenario released by Daniel Kokotajlo et al., as well as in many other places20. And so I'll only briefly examine a few of the standard skeptical responses, mostly to deepen the context of the discussion. With that context, I'll then discuss ways in which over-emphasizing loss of control can lead people to miss the broader challenge posed by the Vulnerable World Hypothesis, and has thus led to mistakes in the strategy taken toward existential risk.

A standard skeptical response is21: "But we won't make agents with their own interests and power-seeking drive". This seems implausible for several reasons. Creating such agents will be intellectually satisfying for developers, and economically profitable for investors, at least in the short run. Romance-bots, personal assistants, agents of persuasion, military robots, and financial trading systems will all function "better" when endowed with considerable agency. This process is already well underway, driven by powerful market incentives that reward increasing agency without equivalent rewards for safety.

Another skeptical response is: "We won't be willing to cede much power to inhuman entities". History suggests otherwise. We already delegate life-and-death capability to guided missiles, which sometimes make fatal errors. We entrust financial decisions to trading algorithms, which contribute to events like the 2010 Flash Crash, where markets lost more than a trillion dollars in 36 minutes. Perhaps most strikingly, the Soviet Union's Dead Hand system was designed to launch nuclear counterstrikes automatically if the Soviet leadership was killed. We have already delegated world-ending authority to automated systems.

Another skeptical response is: "We can simply turn them off". This is unrealistic. Could we "turn off" platforms like Facebook and X if they were net negative for our society? It's not so easy: they are deeply integrated into our society, and powerful interests become aligned with them over time, making them harder to control. A better skeptical response is that we will control these ASI systems using many of the same mechanisms we use to control other powerful actors: institutions, norms, laws, and education. However, that is only viable if those governance mechanisms keep pace with the capabilities of the powerful actors. That's going to be an enormous challenge!

The rogue ASI scenario is the most commonly discussed xrisk from ASI. This emphasis is a mistake, for two reasons. One reason is that people fixate on whether ASIs getting out of control is plausible. For instance, many people are used to technology being something human beings ultimately control, and find it hard to believe that might one day no longer be true. Unfortunately, if they don't buy loss-of-control scenarios, they may then dismiss xrisk22. But the fundamental underlying issue isn't machines going rogue (or not); it's the power conferred by the machines, whether that power is then wielded by humans or by out-of-control machines. That is, the issue is with the Vulnerable World Hypothesis, considered broadly, not rogue ASI.

A second reason to dislike overemphasis on rogue ASI scenarios is that it leads to badly mistaken actions. Many well-intentioned people, worried about rogue ASI, have gone into alignment and interpretability work. Organizations fund companies developing "aligned" AGI and ASI. These actions make sense if retaining control of ASI is the key issue. However, if the fundamental challenge is the dual-use nature of reality, then those actions will ultimately be counterproductive. It is not control that fundamentally matters: it's the power conferred. All those people working on alignment or control are indeed reducing certain kinds of risk. But they're also making the commercial development of ASI far more tractable, speeding our progress toward catastrophic capabilities. In the short term, these well-intentioned researchers will help create many extraordinary new things in the world (and be well rewarded for it). But they will also help build a largely invisible latent overhang in dangerous capabilities, which our safety-supplying institutions will struggle to keep pace with. In this view, alignment work is a form of market-supplied safety. But it leaves crucial non-market parts of safety critically undersupplied – preventing the proliferation of easy routes to dangerous technologies, up to and including recipes for ruin.

When I point this out, people sometimes misunderstand me as saying "Oh, you're saying rogue ASI won't occur, and that's wrong because […]". Or they say "Oh, you're worried about misuse, not agentic ASI". I am emphatically not saying either of these things. I'm saying the issue of whether ASI gets out of control is not fundamental to the discussion of whether ASI poses an xrisk or how to avert it. Treating it as fundamental has led to bad mistakes: overemphasis on this particular scenario has (a) led many people to mistakenly reject xrisk; and (b) led many of those who do accept xrisk to focus on alignment work, reasoning that it prevents rogue ASI, while paying insufficient attention to the fact that it speeds up ASI without addressing the fundamental existential risks.

Conclusion

That's a small taste of scenarios for existential risk from ASI. Such scenarios can be developed much further, but uncertainties remain. Some argue there's insufficient evidence to justify serious concern, wanting concrete, direct proof of catastrophe before they're persuaded. Others, myself included, see such scenarios, despite some uncertainties, as sufficiently compelling to warrant deep concern. Thus, strong disagreement about ASI xrisk arises from differing thresholds for conviction, and differing comfort with reasoning based in part on toy models and heuristic arguments.

A similar disagreement marked the study of anthropogenic climate change in the early twentieth century. Leading scientists like Arrhenius and Angstrom reached opposite conclusions because they believed different toy models. The result was sharp disagreement among scientists about the plausibility of climate change. It wasn't until the 1950s and 1960s that satisfactory models began to be developed. Much observation and improvement of those models led in the 1980s and 1990s to the modern consensus that human emissions of greenhouse gases are warming the climate. By contrast, for ASI we're still at the toy model stage. Furthermore, while climate can plausibly be predicted using detailed physical models, ASI is subject to a wildcard factor: the possibility of ASI acting in some decisive way that we intrinsically can't predict in advance, since ASI is by definition far superior to humans in intellect. There is, as Vernor Vinge so memorably put it, an "opaque wall across the future".

There are many other challenges. Much writing about ASI xrisk has been bombastic, overly speculative, or alarmist without being constructive23. It's sometimes full of overconfident assertions or outright holes, and may confuse narratively plausible fiction with fact. None of this inspires confidence, and many people dismiss ASI xrisk as just another misleading doom narrative, often thinking of earlier panics over the population bomb or the Y2K bug.

On the other side, I've met many ASI xrisk skeptics who are dismissive, but who have also never seriously attempted to understand the best forms of the arguments they confidently dismiss. They often either refuse to engage at all, or only engage defensively, looking for minor holes that can justify dismissal, rather than trying to understand the best form of the argument.

Reasons for dismissal vary from person to person, but a few motives recur. One barrier is the strangeness of the idea of xrisk. Usually it's a good idea to ignore wild-sounding scenarios! But sometimes the world really does change. One thinks of aristocrats in 1780s France, utterly confident in their position, and unable to conceive of the guillotining of the King and the French Republic, just a few years away.

Another barrier is the sheer unpleasantness of engaging with scenarios involving large-scale death. People naturally don't want to believe they're plausible, or to think about them24. Add to that the difficulty of assessment: if you're not a biologist or chemist or physicist or computer security researcher, it's difficult to know how seriously to take many of the scenarios. You may lack important expert intuitions for how much power is latent in the world25.

These barriers to engagement are reinforced by powerful incentives in the opposite direction. In the very near future, AI will (mostly) lead to improvements in the world, some enormous – new medical treatments! Wondrous new materials! More productivity in many sectors! These benefits will be real and profound, not hypothetical – they're natural extensions of capabilities already emerging. We'll solve many problems along the way, proving the just-in-time approach to safety works for many challenges. It's going to be amazing, and very exciting. In the short term, the people who worry about xrisk are mostly going to keep seeming wrong, and like they're holding back progress. It's natural to align with the seemingly optimistic vision, and then dismiss or ignore your apparent outgroup. You're aligning with a tribe that appears optimistic, and this also appears good for you personally – betting on AI will increase your wealth, power, and status, at least in the short term. Furthermore, economic growth is powerful! Far more capital will flow toward companies making capability advances rather than to organizations focusing on non-market safety. Human beings naturally align to anticipated power and "success," often confusing power and glamour with right action26, especially when the downsides currently seem intangible. Capital reshapes people, aligning their beliefs with short-term corporate interests rather than with humanity's interests as a whole. What we see in the world is what gets amplified. It's no wonder so many people are so dismissive.

But reality doesn't care about human psychology. When alignment to anticipated power will lead to unhealthy outcomes, a thriving civilization requires people willing to act in defiance of the zeitgeist, not merely follow the incentive gradient of immediate rewards. I believe the arguments for xrisk are good enough that there is a moral obligation for anyone working on AGI to investigate this risk with deep seriousness, and to act even if it means giving up their own short-term interests.

So, what to do? Many ongoing efforts aim to reshape institutions and broader incentives to increase the supply of safety. These go under different rubrics with different foci, including differential technological development27, d/acc28, and coceleration29. All take seriously the potential for recipes for ruin and prioritize safety to head off such a possibility. The challenge is that in the short term, this work will often be less well-paid and less prestigious than working on AI models. We have institutions which reward work improving technology (including aligning it); when deeper understanding of reality is dual-use, this inadvertently rewards actions that increase existential risk. Making these efforts work will be extremely difficult, and success is not guaranteed. But the most meaningful contributions to humanity have often come from those willing to do what is right, even when it wasn't incentivized in the short term. Thanks for your attention.

Acknowledgements

Thanks to Ted Chiang, Toby Ord, and Hannu Rajaniemi for conversations which improved this piece.

Footnotes


  1. Quoted in: Craig S. Smith, "Geoff Hinton, AI's Most Famous Researcher, Warns Of ‘Existential Threat’ From AI" (2024).↩︎

  2. Quoted in: Ange Lavoipierre, "AI's dark in-joke" (2023).↩︎

  3. Sam Altman, "Machine intelligence, part 1" (2015).↩︎

  4. It is interesting to ponder who "our" is here – whose control, exactly? The quotes are from: Christopher Mims, "This AI Pioneer Thinks AI Is Dumber Than a Cat" (2024); and: John Thornhill, "AI will never threaten humans, says top Meta scientist", https://www.ft.com/content/30fa44a1-7623-499f-93b0-81e26e22f2a6 (2023).↩︎

  5. See Ng's post on LinkedIn and: Michael Shermer, "Artificial Intelligence Is Not a Threat—Yet" (2017).↩︎

  6. Marc Andreessen, "Why AI Will Save the World" (2023).↩︎

  7. This argument sometimes seems like a case of motivated reasoning. Multiple people have told me things which amount to "Geoff Hinton is a doomer because he can get rich and famous on the speaking circuit" (that's close to a word-for-word quote from one of them, the cofounder of an AI company). It's unclear why Hinton would give up the opportunity for tens or hundreds of millions of dollars in equity to make hundreds of thousands from speaking. I believe many key people driving AI safety have made large financial sacrifices – people like Hinton, Bengio, Paul Christiano, Dan Hendrycks, Daniel Kokotajlo, David Duvenaud, and many others.↩︎

  8. Defining ASI well is tricky. A slightly more extensive though still informal definition is that ASI means a system capable of rapidly learning to perform far better than an intelligent human being at nearly all intellectual tasks human beings do, broadly construed. This performance is subject to the constraints that "better" be reasonably well-defined, and that humans not already be near the ceiling. So, for example, noughts and crosses is out as a test, and somewhat subjective tasks like writing poetry are in a grey zone. It also implies some considerable capacity to act in the physical world. This approach to definition is drawn from: Michael Nielsen, "How to be a wise optimist about science and technology?" (2024). Of course, this doesn't originate with me; many other authors, going back at least to I. J. Good, have also given definitions of what today we would call ASI, with many nuances and variations. But I find this particular definition very useful in practice. Note that ASI isn't a single target. Far from it – once a single system has achieved superintelligence, we can expect a profusion of other such systems.↩︎

  9. Move 37 was an unexpected and novel move played by the AlphaGo system in its match against Lee Sedol: https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol. The move was widely praised as a brilliant innovation in Go. The analogy to Move 37 is now often made in discussions of ASI, and it is useful as an intuition pump. Still, I often ponder: we already live our lives surrounded by ideas and systems we don't understand, made by groups of people who collectively understand those ideas and systems far better than we do. What is the difference in a "pervasive Move 37" world? I believe the difference will lie partly in the speed and scale of the changes, and (eventually) in a loss of comprehensibility of the changes. These will amount to a radical qualitative shift. But the issue is surprisingly subtle, though I don't believe the effects will be.↩︎

  10. From https://x.com/tsarnick/status/1887269391253053914, which I believe to be authentic, although the source video is now private. Incidentally, the responses to that post are fascinating, and document the strong desire of many to have no alignment guardrails at all.↩︎

  11. Anjali Gopal, Nathan Helm-Burger, Lennart Justen, Emily H. Soice, Tiffany Tzeng, Geetha Jeyapragasan, Simon Grimm, Benjamin Mueller, and Kevin M. Esvelt, "Will releasing the weights of future large language models grant widespread access to pandemic agents?", https://arxiv.org/abs/2310.18233 (2023).↩︎

  12. A useful retrospective of this work (with further references) is this interview with two of the scientists responsible, Ronald Jackson and Ian Ramshaw, at: https://pmc.ncbi.nlm.nih.gov/articles/PMC2816623/ (2009).↩︎

  13. L. Silvers, B. Inglis, A. Labudovic, P. A. Janssens, B. H. van Leeuwen, and P. J. Kerr, "Virulence and pathogenesis of the MSW and MSD strains of Californian myxoma virus in European rabbits with genetic resistance to myxomatosis compared to rabbits with no genetic resistance", https://www.sciencedirect.com/science/article/pii/S0042682205008056 (2006).↩︎

  14. A useful summary with many references is: Philipp C. Bleek, "Revisiting Aum Shinrikyo: New Insights into the Most Extensive Non-State Biological Weapons Program to Date" (2011). It's worth noting that information about the cult has only gradually been uncovered, and earlier reports often emphasize only the prediction of apocalypse, not attempts to help instigate it.↩︎

  15. Charles Townshend, "Terrorism: A Very Short Introduction" (2018).↩︎

  16. Nick Bostrom, "The Vulnerable World Hypothesis" (2019). The term "recipe for ruin" comes from: Michael Nielsen, "Notes on Existential Risk from Artificial Superintelligence" (2023).↩︎

  17. Katarzyna P. Adamala, Deepa Agashe, Yasmine Belkaid, et al, "Confronting risks of mirror life" (2024).↩︎

  18. John McPhee, "The Curve of Binding Energy" (1974).↩︎

  19. This paragraph adapted from: Michael Nielsen, "How to be wisely optimistic about science and technology?" (2024).↩︎

  20. My apologies to the many, many people whose work is overlooked here. The most immediately salient references for these authors are: Bostrom's book "Superintelligence" (2014); Russell's book "Human Compatible" (2019); and Yudkowsky's "AGI Ruin: A List of Lethalities". The recent scenario is: Daniel Kokotajlo, Scott Alexander, Thomas Larsen, Eli Lifland, and Romeo Dean, "AI 2027" (2025).↩︎

  21. Or has been. Over the past few years it's gradually become apparent to all but the most diehard holders of this position that such agents are likely to be built, and certainly won't fail to appear for lack of effort.↩︎

  22. This seems to be particularly true in response to the "paperclip" problem. E.g., you see this in the irritated responses of people like Steven Pinker, who are otherwise quite sympathetic to concerns about existential risk: https://x.com/sapinker/status/1557072430836944896↩︎

  23. This goes for much prominent writing on both sides. This perhaps reflects the broader media and social media environment.↩︎

  24. This reluctance connects to what Joe Carlsmith has called "deep atheism" – the belief that the universe is fundamentally hostile to intelligent life and provides no inherent safety guarantees. Those who reject deep atheism often implicitly assume the universe has built-in protections against existential catastrophe, that reality wouldn't permit us to discover a "recipe for ruin". The Vulnerable World Hypothesis challenges this comforting assumption. Carlsmith develops the idea in his essay series (effectively, a book) "Otherness and control in the age of AGI", https://joecarlsmith.com/2024/01/02/otherness-and-control-in-the-age-of-agi (2024).↩︎

  25. A few months ago, I had a detailed email exchange about ASI xrisk and the vulnerable world with an acquaintance I respect enormously. It was sobering to realize how fundamentally different our intuitions are. Neither of us can prove what we each instinctively suspect to be true (me: the Vulnerable World hypothesis is likely true; him: it's extremely unlikely). It's simply too distant from existing evidence. Instead, our differing intuitions reflect distinct kinds of background expertise and experience. The obvious way we'll find out who's right is when ASI actually arrives – at which point one of us will be shocked to discover why we were mistaken. I hope that person is me.↩︎

  26. Surprisingly many technologists working on AGI seem to believe they're on the right side of history, merely because their work is glamorous and makes them powerful. It's an error to conflate wealth with virtue, or technical brilliance with wisdom. The situation is genuinely complicated: AI systems are going to do a tremendous amount of good for quite some time. But it's an error to jump to "and therefore it's all okay" without deep engagement with the alternative. The issue deserves far more seriousness.↩︎

  27. The term originates in: Nick Bostrom, "Existential Risks: Analyzing Human Extinction Scenarios and Related Hazards" (2002). There is now an extensive literature and many practical efforts. My own small contribution on the subject is: Michael Nielsen, "Notes on differential technological development", https://michaelnotebook.com/dtd/ (2024).↩︎

  28. The term was introduced in: Vitalik Buterin, "My techno-optimism", https://vitalik.eth.limo/general/2023/11/27/techno_optimism.html (2024). A useful recent review: Vitalik Buterin, "d/acc: one year later", https://vitalik.eth.limo/general/2025/01/05/dacc2.html (2025). My notes on his proposal: Michael Nielsen, "Notes on Vitalik Buterin's techno-optimism", https://michaelnotebook.com/vbto/index.html (2024).↩︎

  29. Michael Nielsen, "How to be wisely optimistic about science and technology?" (2024).↩︎