Notes on the Vulnerable World Hypothesis

Michael Nielsen

Astera Institute
December 16, 2023

Of course, not all scientists accept the notion that other advanced civilizations exist. A few who have speculated on this subject lately are asking: if extraterrestrial intelligence is abundant, why have we not already seen its manifestations?… Why have these beings not restructured the entire Galaxy for their convenience?… Why are they not here? The temptation is to deduce that there are at most only a few advanced extraterrestrial civilizations – either because we are one of the first technical civilizations to have emerged, or because it is the fate of all such civilizations to destroy themselves before they are much further along.

It seems to me that such despair is quite premature. – Carl Sagan (1978)

In 2019, Nick Bostrom published an important paper on "The Vulnerable World Hypothesis" in Global Policy. Loosely speaking, the Vulnerable World Hypothesis (VWH) is the idea that it may be near-inevitable that intelligent species discover near-ungovernable technologies which cause them to wipe themselves out. This idea has been discussed by many prior authors, but to my knowledge Bostrom's paper is the first in-depth analysis. What follows are my working notes on the paper. The notes are not at all comprehensive – they riff on a few elements of the paper, and briefly sketch some ideas inspired by it. The notes discuss, among other things: (1) The Friendly World Hypothesis as an alternative to the VWH; and how attitudes toward the VWH and FWH shape attitudes toward ASI xrisk; (2) A reframing of the Alignment Problem as the problem of aligning the values and institutions of a liberal society (including, crucially, the market) with differential technology development; (3) As a possible solution to the Alignment Problem, I sketch the idea of provably beneficial surveillance; and (4) A sketch of a planetary civilization in which the VWH would be true, but would be unsuspected for a long time.

It is tempting to view the VWH mainly through the lens of concerns about risk from Artificial Superintelligence (ASI). This is due in part to the currency of such concerns, and in part due to Bostrom's well-known contributions on ASI risk1. However, I believe the VWH is of both broader and independent interest, and it's a mistake to approach it primarily through the lens of ASI. Rather, one should first engage with the VWH directly, on its own merits, and only secondarily consider the relationship to ASI. That's the philosophy I shall (for the most part) use in these notes.

I've previously done some work related to the VWH, discussing what I call recipes for ruin, in my notes on existential risk from ASI2, and in a brief sketch of my current interests3. By recipes for ruin I mean simple, easily executed, immensely destructive recipes that are near ungovernable, and could end humanity, or at least wreak catastrophic world-changing damage. That work was done largely before reading Bostrom's paper, and did not engage significantly enough with Bostrom's work. These notes partially redress that error. Although there are differences in our respective conceptions, those differences are not my primary interest here, which is rather to explore the VWH.

Bostrom defines the VWH as follows:

If technological development continues then a set of capabilities will at some point be attained that make the devastation of civilization extremely likely, unless civilization sufficiently exits the semianarchic default condition.

He further defines the "semi-anarchic default condition" to mean a world with very limited capacities for policing and governance, and many actors with diverse motivations:

  1. Limited capacity for preventive policing. States do not have sufficiently reliable means of real-time surveillance and interception to make it virtually impossible for any individual or small group within their territory to carry out illegal actions – particularly actions that are very strongly disfavored by > 99 per cent of the population.

  2. Limited capacity for global governance. There is no reliable mechanism for solving global coordination problems and protecting global commons – particularly in high-stakes situations where vital national security interests are involved.

  3. Diverse motivations. There is a wide and recognizably human distribution of motives represented by a large population of actors (at both the individual and state level) – in particular, there are many actors motivated, to a substantial degree, by perceived self-interest (e.g. money, power, status, comfort and convenience) and there are some actors (‘the apocalyptic residual’) who would act in ways that destroy civilization even at high cost to themselves.

(Parenthetically, let me say that this is a very interesting definition, since in the first two parts it is defining a type of civilization mostly by the absence of certain qualities. It's also a somewhat problematic definition, since for lifeforms very different from humans, categorizations like "governance" or "diverse motivations" might have very different and perhaps rather unclear meanings. With that caveat in mind, let us accept Bostrom's broad framing, and continue.)

It is tempting to dismiss the VWH as idle speculation, or perhaps even as needless catastrophizing. But it's a mistake to dismiss it merely because it seems outside our usual experience. One is reminded of Admiral William Leahy, Chief of Staff of the US armed forces at the end of World War II, who said of the atomic bomb: "That is the biggest fool thing we have ever done. The bomb will never go off, and I speak as an expert in explosives." Today, we shake our heads, and may think of Leahy as ignorant. But in fact he was technically knowledgeable: he had taught physics and chemistry at Annapolis, and had extensive experience with the navy's weapons. And the Hiroshima bomb seems a priori implausible: take two inert bodies of material – each small enough to be carried by a single human being – and bring them together extremely rapidly. If you'd never heard of nuclear weapons, would you expect them to explode with city-destroying force? Intuitively, it's absurd. You need to understand a tremendous amount about the world – about relativity, particle physics, and branching processes – before it is plausible. It was that new and rather subtle understanding that revealed a latent possibility for immense destruction, one almost entirely unsuspected just a few decades earlier. Similarly, it is entirely conceivable that the VWH is true for reasons of which we humans are presently ignorant.

"Easy nukes", and a sketch of an advanced planetary civilization which is an unsuspecting Vulnerable World

As an intuition pump for the VWH, Bostrom explores a thought experiment describing an alternate reality in which there is an easy way to make nukes. Historically, the Manhattan Project cost more than $20 billion (in 2023 dollars). However, it is not at all a priori obvious that such a large effort was required. Perhaps there is some other physical or chemical process, of which we were (and are) merely ignorant, which would enable the easy construction of nuclear weapons. Bostrom explores in some detail the horrendous consequences that would likely have ensued had such a process been discovered in the 1930s or 1940s, concluding with considerable understatement: "We were lucky that making nukes turned out to be hard". I won't recount his comments here. I do want to make three brief observations:

  1. The energy cost of nukes is surprisingly low: Certainly, the intrinsic energy cost of a nuclear weapon is much lower than I a priori thought. A 1 kilotonne yield is a little over 4 trillion Joules. For comparison, 1 kilowatt hour of electricity – wholesale price likely in the ballpark of 5 cents – is almost 4 million Joules. If you could efficiently convert electricity to nuclear explosive energy, the cost would thus be on the order of $50,000 per kilotonne. Clearly, the reason for the immense cost of nuclear weapons isn't the energy input. In fact, modern nuclear weapons actually beat this benchmark. For instance, a Brookings Institution study4 estimated the cost of a B83 thermonuclear bomb (yield up to 1.2 megatonnes) at $4.9 million in 1996 dollars, including research, development, testing, and evaluation, though excluding operations and post-deployment costs. That's about $8,000 per kilotonne in 2023 dollars. I find this an astonishingly low figure! (A back-of-envelope version of these calculations is sketched in the code after this list.)

  2. Summoning such technologies by naming them: America's pre-eminent designer of nuclear weapons was the physicist Ted Taylor. The writer John McPhee wrote a remarkable book profiling Taylor, "The Curve of Binding Energy". An occasional theme of the book is the use of nuclear weapons to devastate targets in the US. Taylor and McPhee travel to the old World Trade Center in Manhattan, pre-9/11, and discuss the relative ease of building a small bomb to knock it down. Taylor says: "I can’t think in detail about this subject, considering what would happen to people, without getting very upset and not wanting to consider it at all… And there is a level of simplicity that we have not talked about, because it goes over my threshold to do so. A way to make a bomb. It is so simple that I just don't want to describe it. I will tell you this: Just to make a crude bomb with an unpredictable yield—but with a better than even chance of knocking this building down—all that is needed is about a dozen kilos of plutonium-oxide powder, high explosives (I don’t want to say how much), and a few things that anyone could buy in a hardware store." After reading the above passage, a physicist named Peter Zimmerman actually carried out the design5. It is fortunate for us that, then and now, no easy way of making plutonium-oxide powder is known, and its distribution is controlled under the Nuclear Non-Proliferation Treaty. In folk stories, demons can sometimes be summoned merely by saying their True Name; a somewhat similar effect may hold for recipes for ruin, where merely knowing a recipe's True Name seems to be almost enough to summon it. Certainly, this seems to have been true with AI safety, where early warnings of the dangers of AI seem to have inspired several very power-oriented people to start what have become the leading AI companies.

  3. The role of directed design: It's tempting to make an anthropic argument for why "easy nukes" are not possible: "If it were truly trivial, then they would already have happened, and humanity would not be here". It's interesting to ponder under what conditions this kind of argument is true, and under what conditions it is false. Essentially this argument has been used to justify why destabilizing states of matter like Ice-9 are not possible: if Ice-9 were easily possible, then it would have occurred by chance by now, and we wouldn't be around to ponder the issue6. We are, ergo it's not possible. This is a reasonable argument in some respects, but it falls down badly in one respect: the search space of configurations of water molecules is exponentially large, and random exploration only explores it extremely slowly. Even billions of years of Earth's existence is only enough to explore an infinitesimal fraction. But an adversary with a deep understanding of water won't explore the search space at random: they will instead use that understanding to focus on finding the problematic configurations. I certainly don't mean to suggest this is likely for either nuclear weapons or destabilizing states of matter. Rather, I'm making the point that this is generically true: dangerous new weapons don't usually occur by chance; rather, they are the result of improved understanding directing a search.
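
To make the arithmetic in the first item concrete, here is a back-of-envelope sketch in Python. The wholesale electricity price and the 1996-to-2023 inflation factor are rough assumptions of mine, not figures from Bostrom or the Brookings study:

```python
# Back-of-envelope sketch: the intrinsic energy cost of a nuclear yield,
# compared with the estimated per-kilotonne cost of an actual weapon.
# The electricity price and inflation factor below are rough assumptions.

KILOTONNE_IN_JOULES = 4.184e12   # 1 kt TNT-equivalent, standard convention
KWH_IN_JOULES = 3.6e6            # 1 kilowatt-hour
WHOLESALE_PRICE_PER_KWH = 0.05   # assumed wholesale electricity price, USD

# Cost per kilotonne if electricity could be converted directly to explosive yield
kwh_per_kilotonne = KILOTONNE_IN_JOULES / KWH_IN_JOULES
electricity_cost_per_kt = kwh_per_kilotonne * WHOLESALE_PRICE_PER_KWH
print(f"Energy-equivalent cost: ~${electricity_cost_per_kt:,.0f} per kilotonne")
# -> roughly $58,000 per kilotonne, i.e. "on the order of $50,000"

# Per-kilotonne cost of a B83 bomb, using the Brookings estimate quoted above
B83_COST_1996_USD = 4.9e6        # research, development, testing, evaluation
B83_YIELD_KT = 1200              # up to 1.2 megatonnes
INFLATION_1996_TO_2023 = 1.9     # assumed CPI adjustment factor

b83_cost_per_kt_2023 = B83_COST_1996_USD * INFLATION_1996_TO_2023 / B83_YIELD_KT
print(f"B83 cost: ~${b83_cost_per_kt_2023:,.0f} per kilotonne (2023 dollars)")
# -> roughly $8,000 per kilotonne
```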

It is not a primary purpose of this paper to argue that VWH is true. (I regard that as an open question, though it would seem to me unreasonable, given the available evidence, to be at all confident that VWH is false.)

This move of Bostrom's is very good: one doesn't need to know the answer to a question for the question itself to have value – in this case: do we live in a Vulnerable World or a Friendly World? I've found it helpful to use the term "Friendly World Hypothesis" (FWH) to mean a (very rough) converse to the VWH. One possible, rather simple, formulation:

It is extremely unlikely that a set of capabilities will be attained which make it near-inevitable that civilization will be extinguished, even with humanity in the semi-anarchic default state.

In practice it's not clear to me what the best formulation of the FWH is. The simplest possibility is to make it the logical negation of the VWH. The benefit of that approach is that you end up with a genuine dichotomy, but that's a minor benefit, and several things go wrong with it. It's better instead to pick an alternate position which is of intrinsic interest in and of itself, as I have tried to do above, and then to regard there as being a continuum of positions in between. I'll come back to this point below.

Three brief notes on recipes for ruin:

  1. A sketch of an advanced planetary civilization which is an unsuspecting Vulnerable World: Bostrom's "easy nukes" are in an alternate reality (as far as we know). But even in our universe it is possible to sketch planetary civilizations where this kind of scenario is possible. Consider, as a single example7, a planet with a CO2 atmosphere, and near-ambient magnesium dust grains in the environment – a sort of widespread sand, but where the sands are made of magnesium, rather than silica. On its own, magnesium is reactive and rather unstable, but suppose the grains are covered in a very thin layer of inert magnesium oxide, a remnant of an earlier time when the atmosphere still had small amounts of oxygen. Ordinarily, on such a planet you wouldn't see fires – CO2 is a very good fire suppressant. However, if you were able to make a little oxygen and an intense spark you could start a fire that caused the magnesium grains to ignite, even through the thin coating layer of magnesium oxide. Magnesium burns very hot, and the fire could spread to adjacent magnesium sands, with the intensity of the fire causing the ambient CO2 to be broken down to provide further oxygen. If the magnesium sands were widespread enough, the fire would spread widely and rapidly, consuming everything in its path. (A rough check of the energetics is sketched in the code after this list.) This would plausibly be a situation where: (a) a recipe for ruin really would be possible; (b) it would seem extremely a priori implausible, without knowledge of the relevant chemistry; and (c) the main barrier to the catastrophic event happening would be improved understanding of that chemistry. In a nutshell: on such a planet, fire would be essentially unknown, and the possibility would seem implausible until a great deal of chemistry was understood. But once understood, such a fire would be trivial to start, and would be far, far more devastating than (known types of) fire on Earth. This kind of scenario ought to give one pause. Are there similar recipes of which we on Earth are merely ignorant?

  2. The seductive nature of such power: A skeptic might respond that it's all very well to speculate about "easy nukes" or more general "recipes for ruin". But it's not terribly convincing as mere speculation. And, as a scientist, it's very tempting to want to prove recipes for ruin exist. There is something extraordinarily seductive about these problems. Is there an easy method of antimatter production? Of making nuclear weapons? Of finding a destabilizing state of matter? Of producing a pathogen? It's all very horrifying. And yet, with respect to these and similar questions I all-too-strongly feel – and I suspect many scientists feel – something analogous to what Freeman Dyson said of nuclear weapons: "I felt it myself, the glitter of nuclear weapons. It is irresistible if you come to them as a scientist. To feel it’s there in your hands. To release the energy that fuels the stars. To let it do your bidding. And to perform these miracles, to lift a million tons of rock into the sky, it is something that gives people an illusion of illimitable power, and it is in some ways responsible for all our troubles, I would say, this what you might call ‘technical arrogance’ that overcomes people when they see what they can do with their minds." Put another way: as a scientist it's an immensely interesting puzzle to prove that this is possible. But we ought to resist.

  3. The existence of recipes for ruin is not just a property of the laws of physics and our extant tech base: Early in my thinking I thought the existence or non-existence of recipes for ruin was a property of the laws of physics (albeit conditioned on the existing technology level of a civilization). But this is too simplistic a view. In fact, it depends upon contingent facts about the world. An illustration suffices to make the point: the "laws of biology" are not determined by physics alone, but also by some contingent (but very important, and not arbitrary) accidents in the history of the world. For instance, whether or not some molecule or complex acts as the root cause of a pandemic depends a great deal on the extant environment: among other things, how well existing immune systems can manufacture antibodies. The analogous statement holds much more generally for other classes of risk.
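
Regarding the magnesium scenario in the first item above: a quick way to see why such a fire could be self-sustaining is that magnesium burns in CO2 by stripping oxygen from the gas. The sketch below checks the energetics using standard tabulated enthalpies of formation; it is only the kind of basic sanity check mentioned in the footnote, not a detailed calculation of the scenario:

```python
# Sanity check of the energetics: magnesium burning in a CO2 atmosphere.
# Reaction: 2 Mg(s) + CO2(g) -> 2 MgO(s) + C(s)
# Standard enthalpies of formation (kJ/mol), from standard tables:
dHf_MgO = -601.6
dHf_CO2 = -393.5
# Elements in their standard states (Mg, C) have dHf = 0.

dH_reaction = 2 * dHf_MgO - dHf_CO2          # kJ per mol of CO2 consumed
dH_per_mol_Mg = dH_reaction / 2              # kJ per mol of Mg burned
MG_MOLAR_MASS = 24.3e-3                      # kg/mol
dH_per_kg_Mg = dH_per_mol_Mg / MG_MOLAR_MASS # kJ per kg of Mg

print(f"Reaction enthalpy: {dH_reaction:.0f} kJ per mol CO2")     # ~ -810 kJ
print(f"Heat released: {-dH_per_kg_Mg/1e3:.1f} MJ per kg of Mg")  # ~ 16.7 MJ/kg
# Strongly exothermic even with no free oxygen present, which is why CO2
# does not suppress a magnesium fire the way it suppresses ordinary fires.
```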

Aligning a liberal society with differential technology development, safety as a collective action problem, and an AI danger tax

Differential technology development (DTD) is, roughly, the idea of preferring to work on protective technologies over risk-increasing technologies. Here's Bostrom on DTD and the VWH:

While targeted [technological] regress might not be in the cards, we could aim to slow the rate of advancement towards risk-increasing technologies relative to the rate of advancement in protective technologies. This is the idea expressed by the principle of differential technological development. In its original formulation, the principle focuses on existential risk; but we can apply it more broadly to also encompass technologies with ‘merely’ devastational potential:

Principle of Differential Technological Development. Retard the development of dangerous and harmful technologies, especially ones that raise the level of existential risk; and accelerate the development of beneficial technologies, especially those that reduce the existential risks posed by nature or by other technologies (Bostrom, 2002).

The principle of differential technological development is compatible with plausible forms of technological determinism. For example, even if it were ordained that all technologies that can be developed will be developed, it can still matter when they are developed. The order in which they arrive can make an important difference – [examples]…

Correctly implementing differential technological development is clearly a difficult strategic task (Cf. Collingridge, 1980). Nevertheless, for an actor who cares altruistically about long-term outcomes and who is involved in some inventive enterprise (e.g. as a researcher, funder, entrepreneur, regulator, or legislator) it is worth making the attempt. Some implications, at any rate, seem fairly obvious: for instance, don’t work on laser isotope separation, don’t work on bioweapons, and don’t develop forms of geoengineering that would empower random individuals to unilaterally make drastic alterations to the Earth’s climate. Think twice before accelerating enabling technologies – such as DNA synthesis machines – that would directly facilitate such ominous developments. But boost technologies that are predominantly protective; for instance, ones that enable more efficient monitoring of disease outbreaks or that make it easier to detect covert WMD programs.

There is considerable overlap between Bostrom's emphasis on DTD and Vitalik Buterin's recent manifesto on d/acc. Buterin emphasizes alignment between the market and DTD, but both Buterin and Bostrom leave unsolved the problem of how to do that alignment. We don't want to leave collective safety to purely altruistic motivations: it's best if it's provided by self-interest as well. Yet many aspects of safety and security are (approximately) public goods or collective action problems8, and the market as currently constructed undersupplies them. There are many ways one might address this. I am a fan of ongoing efforts – including those by Buterin – to develop new financial instruments that help address such problems. A classic solution from economics is some form of Pigouvian tax on any technology with negative externalities. We're beginning to see such negative externalities from AI, for example: misinformation polluting the public commons, racism and other forms of bias, and many more9. It would be possible to tax AI companies for such harms10, including any harms incurred by open source models released by those companies11. The situation is complicated, since those models, both open source and not, may also sometimes have large positive spillovers contributing to the common good. A thorough analysis of this problem seems highly desirable. One significant issue is the difference between immediate harms incurred and future harms. In this respect the situation has some similarities to a carbon tax; it is interesting to ponder the possibility of some sort of AI danger tax.
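
As a purely illustrative toy model of how such an AI danger tax might be computed – my own sketch, not a proposal from Bostrom or Buterin, with every harm category, dollar figure, and rate below invented for the example – one could imagine a levy along these lines:

```python
# Toy sketch of a Pigouvian "AI danger tax": tax each deployer for the
# estimated external harm attributable to its models, with future harms
# discounted (as with proposals for a social cost of carbon).
# All categories, dollar figures, and rates below are hypothetical.

from dataclasses import dataclass

@dataclass
class Externality:
    name: str
    annual_harm_usd: float   # estimated external harm per year
    attribution: float       # fraction attributable to this deployer's models
    years_until_onset: int   # 0 for immediate harms, >0 for future harms

DISCOUNT_RATE = 0.03         # hypothetical social discount rate

def danger_tax(externalities: list[Externality]) -> float:
    """Present value of estimated external harms attributed to a deployer."""
    return sum(
        e.annual_harm_usd * e.attribution / (1 + DISCOUNT_RATE) ** e.years_until_onset
        for e in externalities
    )

# Hypothetical example for a single (fictional) model deployer:
harms = [
    Externality("misinformation in the public commons", 2.0e8, 0.05, 0),
    Externality("biased decision systems", 5.0e7, 0.10, 0),
    Externality("lowered barrier to misuse (future)", 1.0e9, 0.02, 10),
]
print(f"Annual levy: ~${danger_tax(harms):,.0f}")
```

The point is not the numbers, which are invented, but the structure: immediate harms enter at face value while remote but large harms are discounted – exactly the design question raised by the carbon-tax analogy.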

Friendly versus Vulnerable Worlds as a Crux in Discussion of Xrisk: are Recipes for Ruin Possible? And: A More Powerful Form of the Alignment Problem

Depending on whether you think we live in a Vulnerable World or a Friendly World you are likely to come to very different conclusions about the way the world should be governed. For most of my life I've implicitly believed the Friendly World Hypothesis, and explicitly held to the principle that12 science is on net good, benefits humanity, and should be pursued aggressively, with constraints only applied locally to obviously negative applications. We may, for example, ban things like leaded gasoline or asbestos, but we certainly don't ban the kind of basic research that forms the background against which discoveries like leaded gasoline and asbestos are made. But I take the Vulnerable World Hypothesis seriously, and it's leading me to doubt that the principle of open, aggressive exploration of basic science should be applied forever. And as I begin to doubt the timelessness of that principle, I wonder what it should be replaced with, and when.

Crucially, I'd like any replacement to give us most of the benefits of liberal values and institutions, including the benefits of science and technology, but without irreversibly exposing us to catastrophic danger. I mentioned above the problem of aligning the market and differential technology development. I suspect this is a more powerful way of expressing the Alignment Problem than more conventional formulations13. Indeed, since "the market" is itself mutable and may be redesigned – it obtains legitimacy only insofar as it genuinely promotes the good – the Alignment Problem is perhaps best phrased as alignment between liberal values and institutions, including the market, and differential technology development. Phrased this way, a solution to the Alignment Problem would automatically entail safe development of programmable biology, ASI, and many other technologies. I struggle with the question of how to do this, but I like that it is a fundamentally optimistic place to stand.

Returning to the Vulnerable-Friendly World dichotomy, I believe this is a fundamental crux underlying many discussions of xrisk, including the xrisk associated with both ASI and biotechnology. And I don't think most people working on advanced science and technology have seriously grappled with this14. Indeed, many people gung ho about technology refuse to engage at all, accepting as "obvious" the not-at-all-obvious incumbent position of a Friendly World. Taking that incumbent position made historical sense as we slowly emerged out of a technologically primitive state in which even medium-scale destruction was extremely difficult. But the thrust of the last 100 years of science and technology ought to make us seriously consider that the Friendly World Hypothesis is simply wrong.

Put another way, I think a fundamental question about technology is: do you believe recipes for ruin are possible, that is: do you think inexpensive, easy-to-follow recipes for building catastrophic technologies will one day be found, given sufficient understanding of science and technology? I initially thought people would fall into one of two camps on this question: people who instinctively feel the answer is "yes", and people who instinctively feel the answer is "no". But it turns out there is a third camp: people who really don't want to engage the question. In effect, they're choosing the incumbent position, but without the bother of having to give a good argument for it.

(I've been framing this as a dichotomy between a Friendly World and a Vulnerable World. Of course, as mentioned earlier, it's actually a continuum. Nonetheless, many positions on the governance of technology implicitly and sometimes explicitly take one or the other position.)

How Intuitions about (ASI) Xrisk and the VWH are Grounded in an Individual's Background Expertise

There is a related challenge in the discussion of ASI xrisk. There are no real "experts on ASI", for the same reason the 18th century had no real experts on automobiles. But experts in diverse fields are trying to use their (different) expertise to understand the impact of AI and ASI. Economists analyze AI and ASI as an economic problem: how will it restructure labour markets? Will it concentrate wealth? Sociologists analyze it as a sociological problem: will it cause unrest? What will it do to families and relationships? Lawyers as a legal problem: how should we think about copyright in the age of generative AI? Who will be culpable for abuses? Biologists as a biological problem: what new biothreats and biomedicines will AI enable? Physicists as a physical problem: are there new types of reaction, phase of matter, or material waiting to be discovered, perhaps with explosive consequences? Security experts as a security problem: what new vulnerabilities will ASI create? And so on15.

And, of course, the problem of ASI is none of these; or perhaps more accurately, it's all of these things, and much more. It affects our ability to change the universe, which transcends disciplines.

Thus, the type of problem ASI seems to pose depends in considerable part on someone's prior expertise. Even more than the details, different types of expert will bring different types of intuition about what kinds of phenomena are latent in nature, what is possible (or not), and what kinds of defenses may be worked out (or not)16. Much dismissiveness about xrisk comes from people without those intuitions: an economist, for example, simply will not have the same intuitions as a security expert, and vice versa. A physicist won't have the intuitions of an economist, and vice versa. Indeed, many people have very little intuition for emergent phenomena17 at all. A historically-minded physicist, chemist, or biologist comes to understand through repeated exposure the way very simple rules may give rise to surprising, even shockingly unanticipated phenomena – from fire to superfluidity to nuclear chain reactions to public key cryptography. This breeds a very healthy respect for both the power and the surprise latent in nature. I suspect most people have a much weaker sense of this latent power and surprise. Furthermore, I expect ASI will both discover and create new levels of emergent phenomena, governed by as-yet unsuspected rules; some types of problem will be entirely beyond our (present, and perhaps future) comprehension, involving as-yet undiscovered areas of expertise. It is also notable that many of the staunchest believers in ASI xrisk are not experts in any relevant science, but instead source their intuition from science fiction. This is a fine thing, but also something to be cautious of, since what is narratively plausible is often not the same as what is true. Nature is simultaneously more constrained and more imaginative than we are.

I've been framing this in terms of ASI xrisk, but similar remarks apply to the Vulnerable World Hypothesis. The extent to which one believes (or disbelieves) the VWH depends in considerable part on intuitions acquired from one's prior expertise. This creates many issues, and is obviously not a terribly satisfactory situation, but it is worth noting.

Responding to the VWH, and Provably Beneficial Surveillance

Suppose you accept the VWH as reasonably likely to be true. How should humanity respond? Bostrom investigates and rejects a number of possible solutions, as not sufficient to the task. He eventually argues that humanity would need much stronger policing and much stronger global governance:

To the extent, therefore, that we are concerned that VWH may be true, we must consider the remaining two possible ways of achieving stabilization:

  1. Create the capacity for extremely effective preventive policing. Develop the intra-state governance capacity needed to prevent, with extremely high reliability, any individual or small group – including ones that cannot be deterred – from carrying out any action that is highly illegal; and

  2. Create the capacity for strong global governance. Develop the inter-state governance capacity needed to reliably solve the most serious global commons problems and ensure robust cooperation between states (and other strong organizations) wherever vital security interests are at stake – even where there are very strong incentives to defect from agreements or refuse to sign on in the first place.

The two governance gaps reflected by (1) and (2), one at the micro-scale, the other at the macro-scale, are two Achilles’ heels of the contemporary world order. So long as they remain unprotected, civilization remains vulnerable to a potential technological black ball that would enable a strike to be directed there. Unless and until such a discovery emerges from the urn, it is easy to overlook how exposed we are.

What would be required to stabilize such vulnerabilities is an extremely well-developed preventive policing capacity. States would need the ability to monitor their citizens closely enough to allow them to intercept anybody who begins preparing an act of mass destruction.

The feasibility of such surveillance and interception depend on the specifics of the scenario: How long does it take to deploy the black-ball technology destructively? how observable are the actions involved? can they be distinguished from behavior that we don’t want to prohibit? But it is plausible that a considerable chunk of the Type-1 vulnerability spectrum could be stabilized by a state that deploys currently available technologies to the fullest extent. And expected advances in surveillance technology will greatly expand the achievable protection. For a picture of what a really intensive level of surveillance could look like, consider the following vignette [about a "High-tech Panopticon" in which everybody is fitted with a 'freedom tag' and monitored by 'freedom officers']

In combination, however, ubiquitous-surveillance-powered preventive policing and effective global governance would be sufficient to stabilize most vulnerabilities, making it safe to continue scientific and technological development even if VWH is true.

This is, as Bostrom says several times, an extreme solution, something best considered if the VWH is highly likely to be true, and the development of some catastrophic technology is imminent. It's certainly not at all clear that any benefit is worth the cost! "The good news is you don't need to worry about the world being destroyed; the bad news is you now live under Big Brother". We really ought to be extremely hesitant to accept such a regime, and, indeed, I suspect there are other ways out. (I won't discuss that here, since while it is easy to generate ideas, it's not yet clear to me which is most promising.) But I do want to think in more depth about Bostrom's proposal, and move beyond reflexive dismissal. Most of the trouble is caused by the proposal for ubiquitous surveillance. This has come to seem anathema, for good reason – the horrendous history of the Stasi, the KGB, the purges and the Gulag, and many other institutions built on surveillance. Orwell's "1984" was a warning, not an instruction manual, and ideas like "freedom officers" and "freedom tags" seem straight out of Orwell.

Despite this reflexive response, I think it's worth engaging imaginatively with the question: is it possible to design a surveillance system which genuinely enhances liberty? Not in some cynical, newspeak 1984 style, but truly? I know some people are so ideologically opposed to this that they will be unable to seriously explore the possibility, even as a hypothetical. To those people I advise skipping ahead to the final section.

To the others: I note that there are some surprisingly beneficial and unmalevolent surveillance regimes already in existence. I am rather surprised at how relatively little harm Apple, Google, Microsoft and the NSA seem to have done to individuals, given the extraordinary level of information they have about those individuals' activities. Apple knows far more about me than the Stasi did about the median citizen, or even most citizens of interest. And while Apple doesn't have life and death power over me, they could legally do some pretty horrible things. And yet, for the most part, such things seem to be extremely rare. I certainly don't mean to say abuses don't happen. But I do mean: if in 2000 you'd told me that by 2023 surveillance capitalism would have produced abuses orders of magnitude worse than those we've actually seen, I would not have been surprised. I find myself quite curious why there hasn't been a much worse slide.

I most certainly do not propose the NSA or surveillance capitalism as a model to emulate. However, it does suggest two questions: (1) why do they (seemingly) do much less harm than organizations like the Stasi; and (2) is it possible by design to prevent the harms they do, while protecting against the catastrophically damaging actions?

I asked above whether safe differential technology development is possible while aligned with the values and institutions of a liberal society. If we accept the VWH and are considering surveillance seriously, then we might ask: is it possible to develop provably beneficial surveillance? This is in response and counterpoint to calls for provably safe AI18. While provably safe AI is a stimulating idea, it seems to me to work only if ASIs are created by good actors. If bad actors start producing ASIs, then we're in a lot of trouble.

But suppose instead we replaced that goal by the goal of provably beneficial surveillance. Surveillance is much more naturally a monopoly than is computation; it is also much more naturally preventative. "Beneficial" would, of course, need to be defined, but would include many of the things we take for granted: freedoms and liberties to act of various kinds; the maintenance of many types of privacy; the ability to develop most science and technology. But it would also include the capacity to detect and suppress catastrophic risks. Might it be possible to develop a surveillance regime that guaranteed liberties as good as or better than those we enjoy today, and yet which simultaneously suppressed catastrophic risks19? Put in a nutshell: why not shift the focus from developing safe ASI to developing provably beneficial surveillance, with safe ASI as a consequence?

Note that the use of the term "provable" here is not (quite) meant in the sense of a mathematical proof. Indeed, I don't think that's possible, since it refers to the real world, not the perfect world of mathematics. Rather, it's meant to suggest very strong guarantees – mathematical proofs when those are appropriate, and very strong real-world safeguards when those are what is called for20.

How to develop provably beneficial surveillance? It would require extensive work beyond the scope of these notes. It is worth noting that most existing surveillance regimes are developed with little external oversight, either in conception, or operationally. They also rarely delegate work to actors with different motives in a decentralized fashion. And they often operate without effective competition. I take these facts to be extremely encouraging: they mean that there is a lot of low-hanging fruit to work with here, obvious levers by which many of the worst abuses of surveillance may be reduced. Classic surveillance regimes have typically prioritized the regime, not humanity at large, and that means the design space here is surprisingly unexplored. It's interesting to consider messaging apps like Signal as instances of what happens when the overseeing authority genuinely attempts to prioritize users. I am likely to return to this in more detail in the future.
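
To make the "external oversight" and "decentralized delegation" levers slightly more concrete, here is a minimal, purely illustrative sketch of one possible mechanism: a surveillance query that executes only when approved by a quorum of overseers drawn from institutions with different motives, with every request logged for audit. This is my own toy illustration, not a design from Bostrom's paper; the names are hypothetical, and a real system would need far stronger (ideally cryptographic) guarantees:

```python
# Toy illustration of one "provably beneficial surveillance" lever:
# a query over monitored data executes only with approval from a quorum
# of independent overseers, and every request is logged for audit.
# Purely illustrative; names, fields, and thresholds are hypothetical.

from dataclasses import dataclass, field

@dataclass
class Overseer:
    name: str
    institution: str   # overseers are drawn from bodies with different motives

@dataclass
class SurveillanceQuery:
    purpose: str                          # must name the catastrophic risk at issue
    scope: str                            # what data, how narrowly targeted
    approvals: set = field(default_factory=set)

@dataclass
class OversightGate:
    overseers: list
    quorum: int                           # m-of-n approvals required
    audit_log: list = field(default_factory=list)

    def approve(self, query: SurveillanceQuery, overseer: Overseer) -> None:
        # Count the approval only if it comes from a recognized overseer;
        # every attempt is logged for after-the-fact audit.
        if overseer in self.overseers:
            query.approvals.add(overseer.name)
        self.audit_log.append(("approval attempt", overseer.name, query.purpose))

    def execute(self, query: SurveillanceQuery) -> bool:
        allowed = len(query.approvals) >= self.quorum
        self.audit_log.append(("executed" if allowed else "refused", query.purpose))
        return allowed

# Hypothetical usage: three overseers from different institutions, 2-of-3 quorum.
gate = OversightGate(
    overseers=[Overseer("A", "judiciary"), Overseer("B", "civil society"),
               Overseer("C", "international body")],
    quorum=2,
)
q = SurveillanceQuery(purpose="detect covert synthesis of a known pathogen",
                      scope="flagged orders to DNA synthesis providers")
gate.approve(q, gate.overseers[0])
gate.approve(q, gate.overseers[1])
print(gate.execute(q))   # True only once the quorum is reached
```

The point of the m-of-n structure is that no single overseer – and no single institution – can either authorize an abusive query on its own or silently block a legitimate one.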

Miscellaneous thoughts

  1. Techno-capitalism and the transfer of power to the technologist class: Inherent in almost everything I've discussed is the idea that the solution to technology is still more technology. We may call this technological solutionism. Climate got you down? Do some geoengineering or carbon removal and sequestration. Biorisk blues? Let's have real-time vaccines! Or BSL-4 hardened cities21. Nukes still scaring you? Let's do more work on missile shields! Vulnerable World Hypothesis means you can't sleep at night? Let's get to work aligning our liberal values and differential technology development! And so on. This kind of technological solutionism is common among the many tribes of people thinking about the VWH and ASI both, from the startup world to EA to e/acc to doomer and so on. In all these views, there is an inherent transfer of power to people who have agency over technology, and away from people who don't. As Cory Doctorow has emphasized, technology inherently centralizes power in the designers and builders, and the penumbra of humanity around them. This transfer of power was written about memorably by H. G. Wells in his two-tier system of Eloi and Morlocks; Neal Stephenson has rendered it beautifully in "In the Beginning was the Command Line"; and Neil Postman has discussed it insightfully in "Technopoly". A partial solution may actually lie in ASI, insofar as it is a tool that makes technology more available and better expresses the will of any human being. But the trouble with that is that even expressing technological intent requires considerable expertise. So it is by no means clear that ASI closes this gap in agency over technology; my sense is that it will, rather, enlarge it. Regardless, I am extremely uneasy about this transfer of power to the technologist class.

  2. What kind of statement is the Vulnerable World Hypothesis? It's not an empirically testable statement, not in the usual sense. It's not quite a conventional axiomatic statement either. Now, you might try to argue that there's a sense in which the VWH is an empirical statement, by inventing a cheap technology that immediately wipes us out. In that sense the VWH (actually: the FWH) is falsifiable. Even that wouldn't show that devastation was near-inevitable, however. Moreover, the VWH matters most at times when it has not yet been empirically tested. It is, after all, about not just current technology, or near-future technology, but about all possible future technologies – a space far greater than we can possibly hope to explore22. In fact, it's not just about individual technologies, but about entire technia23, since the destructive power of a technology is not purely intrinsic. It instead depends on the relationship to other technologies and the complete environment: fire on its own is bad, but becomes much easier to deal with when fire-retardant materials and firefighting technology and institutions and laws are widely available. In that sense, the VWH is about which technia are accessible, given the governance24 choices we make, and whether it is likely we will pass through technia in which catastrophe is almost inevitable. But while the VWH is not obviously empirically testable in the usual sense, that doesn't mean we cannot regard it as more or less plausible, on the basis of knowledge acquired. Knowledge of things like fire, nuclear weapons, and pandemics ought to give us pause. These things were likely a priori unsuspected by human beings – they are, in some sense, emergent phenomena, hard to predict in advance – a fact which ought to make us very cautious about dismissing the VWH.

Acknowledgments

Thanks to Damon Binder, Laura Deming, Andy Matuschak, Evan Miyazono, and Andrew Snyder-Beattie for related conversations.

Footnotes


  1. Perhaps most notably: Nick Bostrom, Superintelligence: Paths, Dangers, Strategies (2014).↩︎

  2. Michael Nielsen, "Notes on Existential Risk from Artificial Superintelligence", https://michaelnotebook.com/xrisk/index.html (2023).↩︎

  3. Michael Nielsen, "Brief remarks on some of my creative interests", https://michaelnotebook.com/ti/index.html (2023).↩︎

  4. See: "What Nuclear Weapons Delivery Systems Really Cost", https://web.archive.org/web/20231020161223/https://www.brookings.edu/what-nuclear-weapons-delivery-systems-really-cost/↩︎

  5. See Michael Singer, David Weir, and Barbara Newman Canfield, "Nuclear Nightmare: America's Worst Fears Come True" (1979).↩︎

  6. I first heard this argument in grad school, but don't know where it was first written down, or where the strongest form of the argument has been made. Google shows one reasonable form of the argument here: https://www.askamathematician.com/2012/11/q-could-kurt-vonegets-ice-9-catastrophy-happen/↩︎

  7. I have not done the detailed calculations necessary to check this, apart from a few basic sanity checks about the way magnesium oxidizes, and the way magnesium burns in CO2. It would be an interesting challenge to do detailed calculations, and if those checked out, to do the experiments to test this. Note that while those experiments would require considerable individual safety precautions, on earth they wouldn't pose any significant large-scale risks.↩︎

  8. There are several senses in which this is the case. Speaking loosely and all-too-briefly, one axis is the following: for ASI development, everyone needs to refrain from doing something which is in their (short- and maybe longer-)term interest, but which might well be devastating to everyone else. That is, everyone individually has an incentive to participate (insofar as they can) in the creation of AGI and ASI, even if that poses an enormous collective risk.↩︎

  9. Many people are concerned about both long- and short-term risks from AI and ASI. However, I have met a few people concerned about long-term risks who are dismissive about short-term risks. I believe this is a (terrible) mistake. I don't think anyone who doesn't want to bend over backward to limit immediate harms from AI can be trusted to address long-term harms.↩︎

  10. I don't exactly anticipate enthusiasm for this among the less insightful AI boosters; the more insightful, of course, are seriously engaged with safety, and might be expected to engage more thoughtfully.↩︎

  11. A challenge is not-for-profit projects, perhaps organized by unfunded (or near-unfunded) ad hoc and perhaps even anonymous groups. I expect both that such projects will exist in the future and that they will do substantial harm. Still, it is perhaps best to separate concerns, and to analyze and (perhaps) deal with that possibility separately.↩︎

  12. The remainder of this paragraph is adapted from: Michael Nielsen, "Brief remarks on some of my creative interests", https://michaelnotebook.com/ti/index.html (2023).↩︎

  13. Allow me to mention Brian Christian's excellent book "The Alignment Problem" as an introduction to thinking on this topic.↩︎

  14. It remains a curious fact that the best cases against xrisk from ASI come from people who are themselves very concerned about xrisk from ASI. The two I have in mind are: Katja Grace, Counterarguments to the basic AI x-risk case (2022); and David Scott Krueger, A list of good heuristics that the case for AI x-risk fails (2019). I've yet to read a good case from someone who believes that there is very little xrisk from ASI. I suspect most such people simply haven't considered the possibility seriously, but prefer the default posture ("scientific advance is by default good"), which is why their dismissals tend to be so low quality. Incidentally, it is also true that many people worried about ASI xrisk understand the case against it poorly. Motivated reasoning is common on all sides of this discussion.↩︎

  15. Complementing each of these is also a focus on the opportunities of ASI. Again, what people see depends on their background. But that is not my main focus in connection with the VWH.↩︎

  16. This is especially interesting when an offensive technology may come from one type of expertise, but the relevant defense from another. One sees this often in current analysis of climate risks: it is common for paleoclimate experts to have a good understanding of climate change (on which they are an expert), and yet to apparently be rather uninformed about the possibilities for carbon capture and sequestration, which may require quite different expertise. This has changed over time, as knowledge has diffused across disciplines, but it seems to me to have been a real effect.↩︎

  17. I am using the term "emergent" here in the sense of Phil Anderson, not the (in many ways much weaker) sense which has recently become common in AI. This overlap in terminology is somewhat unfortunate for the point I'm making, since many of the AI people using the term emergent have, as far as I can tell, little intuition for the original form of emergence.↩︎

  18. Notable proposals include: Stuart Russell, "Provably Beneficial Artificial Intelligence", https://people.eecs.berkeley.edu/~russell/papers/russell-bbvabook17-pbai.pdf; and: Max Tegmark and Steve Omohundro, "Provably safe systems: the only path to controllable AGI", https://arxiv.org/abs/2309.01933 (2023). Ideas in a similar vein are being pursued by the Atlas Computing Initiative, and likely others of whom I am not aware.↩︎

  19. Cf: David Brin, "The Transparent Society" (1998); Bruce Schneier, "The Myth of 'The Transparent Society'" (2008); and Steve Mann, Jason Nolan, and Barry Wellman, "Sousveillance: Inventing and Using Wearable Computing Devices for Data Collection in Surveillance Environments" (2003).↩︎

  20. I considered many variations, along the lines of "Guaranteed-to-be-beneficial surveillance". They have an interesting problem, which is that words tend to lose some of their power over time. I felt that "provable" was the best compromise here.↩︎

  21. This is discussed in: Carl Shulman, "Envisioning a world immune to global catastrophic biological risks" (2020). Thank you to Andrew Snyder-Beattie and Damon Binder for pointing this out to me.↩︎

  22. It's not strictly relevant, but occurred to me while writing these notes: most technologies will never be discovered. This seems inevitable in a world where the search space for material and information objects is exponentially large, and NP-complete problems are likely to be intractable even on any foreseeable computer. Put another way: it seems likely there are many powerful technologies the universe affords, but which will never be discovered, no matter how much intelligence or understanding or computational power is brought to bear. It's as though the universe's capacity for technology – the set of possible affordances for the universe – is strictly larger than its capacity for discovery. I find this an extraordinary thought, and yet it's almost obviously true.↩︎

  23. The plural of Kevin Kelly's useful term technium, the entire accumulation of technologies that humans have created.↩︎

  24. There is, of course, a very long history of proposals for global governance and even a unipolar world order. Three that I've enjoyed recently are those of Oppenheimer, von Neumann, and Brand; all are in response to catastrophic or even existential risk.↩︎