Notes on the Vulnerable World Hypothesis

Michael Nielsen

Astera Institute
December 16, 2023

Of course, not all scientists accept the notion that other advanced civilizations exist. A few who have speculated on this subject lately are asking: if extraterrestrial intelligence is abundant, why have we not already seen its manifestations?… Why have these beings not restructured the entire Galaxy for their convenience?… Why are they not here? The temptation is to deduce that there are at most only a few advanced extraterrestrial civilizations – either because we are one of the first technical civilizations to have emerged, or because it is the fate of all such civilizations to destroy themselves before they are much further along.

It seems to me that such despair is quite premature. – Carl Sagan (1978)

In 2019, Nick Bostrom published an important paper on "The Vulnerable World Hypothesis" in Global Policy. Loosely speaking, the Vulnerable World Hypothesis (VWH) is the idea that it may be near-inevitable that intelligent species discover near-ungovernable technologies which cause them to wipe themselves out. This idea has been discussed by many prior authors, but to my knowledge Bostrom's paper is the first in-depth analysis. What follows are my working notes on the paper. The notes are not at all comprehensive – they riff on a few elements of the paper, and briefly sketch some ideas inspired by it. The notes discuss, among other things: (1) The Friendly World Hypothesis as an alternative to the VWH; and how attitudes toward the VWH and FWH shape attitudes toward ASI xrisk; (2) A reframing of the Alignment Problem as the problem of aligning the values and institutions of a liberal society (including, crucially, the market) with differential technology development; (3) As a possible solution to the Alignment Problem, I sketch the idea of provably beneficial surveillance; and (4) A sketch of a planetary civilization in which the VWH would be true, but would be unsuspected for a long time.

It is tempting to view the VWH mainly through the lens of concerns about risk from Artificial Superintelligence (ASI). This is due in part to the currency of such concerns, and in part due to Bostrom's well-known contributions on ASI risk1. However, I believe the VWH is of both broader and independent interest, and it's a mistake to approach it primarily through the lens of ASI. Rather, one should first engage with the VWH directly, on its own merits, and only secondarily consider the relationship to ASI. That's the philosophy I shall (for the most part) use in these notes.

I've previously done some work related to the VWH, discussing what I call recipes for ruin, in my notes on existential risk from ASI2, and in a brief sketch of my current interests3. By recipes for ruin I mean simple, easily executed, immensely destructive recipes that are near ungovernable, and could end humanity, or at least wreak catastrophic world-changing damage. That work was done largely before reading Bostrom's paper, and did not engage significantly enough with Bostrom's work. These notes partially redress that error. Although there are differences in our respective conceptions, those differences are not my primary interest here, which is rather to explore the VWH.

Bostrom defines the VWH as follows:

If technological development continues then a set of capabilities will at some point be attained that make the devastation of civilization extremely likely, unless civilization sufficiently exits the semianarchic default condition.

He further defines the "semi-anarchic default condition" to mean a world with very limited capacities for policing and governance, and many actors with diverse motivations:

  1. Limited capacity for preventive policing. States do not have sufficiently reliable means of real-time surveillance and interception to make it virtually impossible for any individual or small group within their territory to carry out illegal actions – particularly actions that are very strongly disfavored by > 99 per cent of the population.

  2. Limited capacity for global governance. There is no reliable mechanism for solving global coordination problems and protecting global commons – particularly in high-stakes situations where vital national security interests are involved.

  3. Diverse motivations. There is a wide and recognizably human distribution of motives represented by a large population of actors (at both the individual and state level) – in particular, there are many actors motivated, to a substantial degree, by perceived self-interest (e.g. money, power, status, comfort and convenience) and there are some actors (‘the apocalyptic residual’) who would act in ways that destroy civilization even at high cost to themselves.

(Parenthetically, let me say that this is a very interesting definition, since in the first two parts it is defining a type of civilization mostly by the absence of certain qualities. It's also a somewhat problematic definition, since for lifeforms very different from humans, categorizations like "governance" or "diverse motivations" might have very different and perhaps rather unclear meanings. With that caveat in mind, let us accept Bostrom's broad framing, and continue.)

It is tempting to dismiss the VWH as idle speculation, or perhaps even as needless catastrophizing. But it's a mistake to dismiss it merely because it seems outside our usual experience. One is reminded of Admiral William Leahy, Chief of Staff of the US armed forces at the end of World War II, who said of the atomic bomb: "That is the biggest fool thing we have ever done. The bomb will never go off, and I speak as an expert in explosives." Today, we shake our heads, and may think of Leahy as ignorant. But in fact he was technically knowledgeable: he had taught physics and chemistry at Annapolis, and had extensive experience with the navy's weapons. And the Hiroshima bomb seems a priori implausible: take two inert bodies of material – each small enough to be carried by a single human being – and bring them together extremely rapidly. If you'd never heard of nuclear weapons, would you expect them to explode with city-destroying force? Intuitively, it's absurd. You need to understand a tremendous amount about the world – about relativity, particle physics, and branching processes – before it is plausible. It was that new and rather subtle understanding that revealed a latent possibility for immense destruction, one almost entirely unsuspected just a few decades earlier. Similarly, it is entirely conceivable that the VWH is true for reasons of which we humans are presently ignorant.

"Easy nukes", and a sketch of an advanced planetary civilization which is an unsuspecting Vulnerable World

As an intuition pump for the VWH, Bostrom explores a thought experiment describing an alternate reality in which there is an easy way to make nukes. Historically, the Manhattan Project cost more than $20 billion (in 2023 dollars). However, it is not at all a priori obvious that such a large effort was required. Perhaps there is some other physical or chemical process, of which we were (and are) merely ignorant, which would enable the easy construction of nuclear weapons. Bostrom explores in some detail the horrendous consequences that would likely have ensued had such a process been discovered in the 1930s or 1940s, concluding with considerable understatement: "We were lucky that making nukes turned out to be hard". I won't recount his comments here. I do want to make three brief observations:

  1. The energy cost of nukes is surprisingly low: Certainly, the intrinsic energy cost of a nuclear weapon is much lower than I a priori thought. A 1 kilotonne yield is a little over 4 trillion Joules. For comparison, 1 kilowatt hour of electricity – wholesale price likely in the ballpark of 5 cents – is almost 4 million Joules. If you could efficiently convert electricity to nuclear explosive energy, the cost would thus be on the order of $50,000 per kilotonne. Clearly, the reason for the immense cost of nuclear weapons isn't the energy input. In fact, modern nuclear weapons actually beat this benchmark. For instance, a Brookings Institution study4 estimated the cost of a B83 thermonuclear bomb (yield up to 1.2 megatonnes) at $4.9 million in 1996 dollars, including research, development, testing, and evaluation, though excluding operations and post-deployment costs. That's about $8,000 per kilotonne in 2023 dollars. I find this an astonishingly low figure! (A back-of-envelope version of these calculations is sketched in the code after this list.)

  2. Summoning such technologies by naming them: America's pre-eminent designer of nuclear weapons was the physicist Ted Taylor. The writer John McPhee wrote a remarkable book profiling Taylor, "The Curve of Binding Energy". An occasional theme of the book is the use of nuclear weapons to devastate targets in the US. Taylor and McPhee travel to the old World Trade Center in Manhattan, pre-9/11, and discuss the relative ease of building a small bomb to knock it down. Taylor says: "I can’t think in detail about this subject, considering what would happen to people, without getting very upset and not wanting to consider it at all… And there is a level of simplicity that we have not talked about, because it goes over my threshold to do so. A way to make a bomb. It is so simple that I just don't want to describe it. I will tell you this: Just to make a crude bomb with an unpredictable yield—but with a better than even chance of knocking this building down—all that is needed is about a dozen kilos of plutonium-oxide powder, high explosives (I don’t want to say how much), and a few things that anyone could buy in a hardware store." After reading the above passage, a physicist named Peter Zimmerman actually carried out the design5. It is fortunate for us that, then and now, no easy way of making plutonium-oxide powder is known, and its distribution is controlled under the Nuclear Non-Proliferation Treaty. In folk stories, demons can sometimes be summoned merely by saying their True Name; a somewhat similar effect may hold for recipes for ruin, where merely knowing a recipe's True Name seems to be almost enough to summon it. Certainly, this seems to have been true with AI safety, where early warnings of the dangers of AI seem to have inspired several very power-oriented people to start what have become the leading AI companies.

  3. The role of directed design: It's tempting to make an anthropic argument for why "easy nukes" are not possible: "If it were truly trivial, then they would already have happened, and humanity would not be here". It's interesting to ponder under what conditions this kind of argument is true, and under what conditions it is false. Essentially this argument has been used to justify why destabilizing states of matter like Ice-9 are not possible: if Ice-9 were easily possible, then it would have occurred by chance by now, and we wouldn't be around to ponder the issue6. We are, ergo it's not possible. This is a reasonable argument in some respects, but it falls down badly in one respect: the search space of configurations of water molecules is exponentially large, and random exploration only explores it extremely slowly. Even billions of years of Earth's existence is only enough to explore an infinitesimal fraction. But an adversary with a deep understanding of water won't explore the search space at random: they will instead use that understanding to focus on finding the problematic configurations. I certainly don't mean to suggest this is likely for either nuclear weapons or destabilizing states of matter. Rather, I'm making the point that this is generically true: dangerous new weapons don't usually occur by chance; rather, they are the result of improved understanding directing a search.
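
To make the arithmetic in the first item concrete, here is a back-of-envelope sketch in Python. The wholesale electricity price and the 1996-to-2023 inflation factor are rough assumptions of mine, not figures from Bostrom or the Brookings study:

```python
# Back-of-envelope sketch: the intrinsic energy cost of a nuclear yield,
# compared with the estimated per-kilotonne cost of an actual weapon.
# The electricity price and inflation factor below are rough assumptions.

KILOTONNE_IN_JOULES = 4.184e12   # 1 kt TNT-equivalent, standard convention
KWH_IN_JOULES = 3.6e6            # 1 kilowatt-hour
WHOLESALE_PRICE_PER_KWH = 0.05   # assumed wholesale electricity price, USD

# Cost per kilotonne if electricity could be converted directly to explosive yield
kwh_per_kilotonne = KILOTONNE_IN_JOULES / KWH_IN_JOULES
electricity_cost_per_kt = kwh_per_kilotonne * WHOLESALE_PRICE_PER_KWH
print(f"Energy-equivalent cost: ~${electricity_cost_per_kt:,.0f} per kilotonne")
# -> roughly $58,000 per kilotonne, i.e. "on the order of $50,000"

# Per-kilotonne cost of a B83 bomb, using the Brookings estimate quoted above
B83_COST_1996_USD = 4.9e6        # research, development, testing, evaluation
B83_YIELD_KT = 1200              # up to 1.2 megatonnes
INFLATION_1996_TO_2023 = 1.9     # assumed CPI adjustment factor

b83_cost_per_kt_2023 = B83_COST_1996_USD * INFLATION_1996_TO_2023 / B83_YIELD_KT
print(f"B83 cost: ~${b83_cost_per_kt_2023:,.0f} per kilotonne (2023 dollars)")
# -> roughly $8,000 per kilotonne
```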

It is not a primary purpose of this paper to argue that VWH is true. (I regard that as an open question, though it would seem to me unreasonable, given the available evidence, to be at all confident that VWH is false.)

This move of Bostrom's is very good: one doesn't need to know the answer to a question for the question itself to have value – in this case: do we live in a Vulnerable World or a Friendly World? I've found it helpful to use the term "Friendly World Hypothesis" (FWH) to mean a (very rough) converse to the VWH. One possible, rather simple, formulation:

It is extremely unlikely that a set of capabilities will be attained which make it near-inevitable that civilization will be extinguished, even with humanity in the semi-anarchic default state.

In practice it's not clear to me what the best formulation of the FWH is. The simplest possibility is to make it the logical negation of the VWH. The benefit of that approach is that you end up with a genuine dichotomy, but that's a minor benefit, and several things go wrong with it. It's better instead to pick an alternate position which is of intrinsic interest in and of itself, as I have tried to do above, and then to regard there as being a continuum of positions in between. I'll come back to this point below.

Three brief notes on recipes for ruin:

  1. A sketch of an advanced planetary civilization which is an unsuspecting Vulnerable World: Bostrom's "easy nukes" are in an alternate reality (as far as we know). But even in our universe it is possible to sketch planetary civilizations where this kind of scenario is possible. Consider, as a single example7, a planet with a CO2 atmosphere, and near-ambient magnesium dust grains in the environment – a sort of widespread sand, but where the sands are made of magnesium, rather than silica. On its own, magnesium is reactive and rather unstable, but suppose the grains are covered in a very thin layer of inert magnesium oxide, a remnant of an earlier time when the atmosphere still had small amounts of oxygen. Ordinarily, on such a planet you wouldn't see fires – CO2 is a very good fire suppressant. However, if you were able to make a little oxygen and an intense spark you could start a fire that caused the magnesium grains to ignite, even through the thin coating layer of magnesium oxide. Magnesium burns very hot, and the fire could spread to adjacent magnesium sands, with the intensity of the fire causing the ambient CO2 to be broken down to provide further oxygen. If the magnesium sands were widespread enough, the fire would spread widely and rapidly, consuming everything in its path. (A rough check of the energetics is sketched in the code after this list.) This would plausibly be a situation where: (a) a recipe for ruin really would be possible; (b) it would seem extremely a priori implausible, without knowledge of the relevant chemistry; and (c) the main barrier to the catastrophic event happening would be improved understanding of that chemistry. In a nutshell: on such a planet, fire would be essentially unknown, and the possibility would seem implausible until a great deal of chemistry was understood. But once understood, such a fire would be trivial to start, and would be far, far more devastating than (known types of) fire on Earth. This kind of scenario ought to give one pause. Are there similar recipes of which we on Earth are merely ignorant?

  2. The seductive nature of such power: A skeptic might respond that it's all very well to speculate about "easy nukes" or more general "recipes for ruin". But it's not terribly convincing as mere speculation. And, as a scientist, it's very tempting to want to prove recipes for ruin exist. There is something extraordinarily seductive about these problems. Is there an easy method of antimatter production? Of making nuclear weapons? Of finding a destabilizing state of matter? Of producing a pathogen? It's all very horrifying. And yet, with respect to these and similar questions I all-too-strongly feel – and I suspect many scientists feel – something analogous to what Freeman Dyson said of nuclear weapons: "I felt it myself, the glitter of nuclear weapons. It is irresistible if you come to them as a scientist. To feel it’s there in your hands. To release the energy that fuels the stars. To let it do your bidding. And to perform these miracles, to lift a million tons of rock into the sky, it is something that gives people an illusion of illimitable power, and it is in some ways responsible for all our troubles, I would say, this what you might call ‘technical arrogance’ that overcomes people when they see what they can do with their minds." Put another way: as a scientist it's an immensely interesting puzzle to prove that this is possible. But we ought to resist.

  3. The existence of recipes for ruin is not just a property of the laws of physics and our extant tech base: Early in my thinking I thought the existence or non-existence of recipes for ruin was a property of the laws of physics (albeit conditioned on the existing technology level of a civilization). But this is too simplistic a view. In fact, it depends upon contingent facts about the world. An illustration suffices to make the point: the "laws of biology" are not determined by physics alone, but also by some contingent (but very important, and not arbitrary) accidents in the history of the world. For instance, whether or not some molecule or complex acts as the root cause of a pandemic depends a great deal on the extant environment: among other things, how well existing immune systems can manufacture antibodies. The analogous statement holds much more generally for other classes of risk.
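
Regarding the magnesium scenario in the first item above: a quick way to see why such a fire could be self-sustaining is that magnesium burns in CO2 by stripping oxygen from the gas. The sketch below checks the energetics using standard tabulated enthalpies of formation; it is only the kind of basic sanity check mentioned in the footnote, not a detailed calculation of the scenario:

```python
# Sanity check of the energetics: magnesium burning in a CO2 atmosphere.
# Reaction: 2 Mg(s) + CO2(g) -> 2 MgO(s) + C(s)
# Standard enthalpies of formation (kJ/mol), from standard tables:
dHf_MgO = -601.6
dHf_CO2 = -393.5
# Elements in their standard states (Mg, C) have dHf = 0.

dH_reaction = 2 * dHf_MgO - dHf_CO2          # kJ per mol of CO2 consumed
dH_per_mol_Mg = dH_reaction / 2              # kJ per mol of Mg burned
MG_MOLAR_MASS = 24.3e-3                      # kg/mol
dH_per_kg_Mg = dH_per_mol_Mg / MG_MOLAR_MASS # kJ per kg of Mg

print(f"Reaction enthalpy: {dH_reaction:.0f} kJ per mol CO2")     # ~ -810 kJ
print(f"Heat released: {-dH_per_kg_Mg/1e3:.1f} MJ per kg of Mg")  # ~ 16.7 MJ/kg
# Strongly exothermic even with no free oxygen present, which is why CO2
# does not suppress a magnesium fire the way it suppresses ordinary fires.
```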

Aligning a liberal society with differential technology development, safety as a collective action problem, and an AI danger tax

Differential technology development (DTD) is, roughly, the idea of preferring to work on protective technologies over risk-increasing technologies. Here's Bostrom on DTD and the VWH:

While targeted [technological] regress might not be in the cards, we could aim to slow the rate of advancement towards risk-increasing technologies relative to the rate of advancement in protective technologies. This is the idea expressed by the principle of differential technological development. In its original formulation, the principle focuses on existential risk; but we can apply it more broadly to also encompass technologies with ‘merely’ devastational potential:

Principle of Differential Technological Development. Retard the development of dangerous and harmful technologies, especially ones that raise the level of existential risk; and accelerate the development of beneficial technologies, especially those that reduce the existential risks posed by nature or by other technologies (Bostrom, 2002).

The principle of differential technological development is compatible with plausible forms of technological determinism. For example, even if it were ordained that all technologies that can be developed will be developed, it can still matter when they are developed. The order in which they arrive can make an important difference – [examples]…

Correctly implementing differential technological development is clearly a difficult strategic task (Cf. Collingridge, 1980). Nevertheless, for an actor who cares altruistically about long-term outcomes and who is involved in some inventive enterprise (e.g. as a researcher, funder, entrepreneur, regulator, or legislator) it is worth making the attempt. Some implications, at any rate, seem fairly obvious: for instance, don’t work on laser isotope separation, don’t work on bioweapons, and don’t develop forms of geoengineering that would empower random individuals to unilaterally make drastic alterations to the Earth’s climate. Think twice before accelerating enabling technologies – such as DNA synthesis machines – that would directly facilitate such ominous developments. But boost technologies that are predominantly protective; for instance, ones that enable more efficient monitoring of disease outbreaks or that make it easier to detect covert WMD programs.

There is considerable overlap between Bostrom's emphasis on DTD and Vitalik Buterin's recent manifesto on d/acc. Buterin emphasizes alignment between the market and DTD, but both Buterin and Bostrom leave unsolved the problem of how to do that alignment. We don't want to leave collective safety to purely altruistic motivations: it's best if it's provided by self-interest as well. Yet many aspects of safety and security are (approximately) public goods or collective action problems8, and the market as currently constructed undersupplies them. There are many ways one might address this. I am a fan of ongoing efforts – including those by Buterin – to develop new financial instruments that help address such problems. A classic solution from economics is some form of Pigouvian tax on any technology with negative externalities. We're beginning to see such negative externalities from AI, for example: misinformation polluting the public commons, racism and other forms of bias, and many more9. It would be possible to tax AI companies for such harms10, including any harms incurred by open source models released by those companies11. The situation is complicated, since those models, both open source and not, may also sometimes have large positive spillovers contributing to the common good. A thorough analysis of this problem seems highly desirable. One significant issue is the difference between immediate harms incurred and future harms. In this respect the situation has some similarities to a carbon tax; it is interesting to ponder the possibility of some sort of AI danger tax.
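
As a purely illustrative toy model of how such an AI danger tax might be computed – my own sketch, not a proposal from Bostrom or Buterin, with every harm category, dollar figure, and rate below invented for the example – one could imagine a levy along these lines:

```python
# Toy sketch of a Pigouvian "AI danger tax": tax each deployer for the
# estimated external harm attributable to its models, with future harms
# discounted (as with proposals for a social cost of carbon).
# All categories, dollar figures, and rates below are hypothetical.

from dataclasses import dataclass

@dataclass
class Externality:
    name: str
    annual_harm_usd: float   # estimated external harm per year
    attribution: float       # fraction attributable to this deployer's models
    years_until_onset: int   # 0 for immediate harms, >0 for future harms

DISCOUNT_RATE = 0.03         # hypothetical social discount rate

def danger_tax(externalities: list[Externality]) -> float:
    """Present value of estimated external harms attributed to a deployer."""
    return sum(
        e.annual_harm_usd * e.attribution / (1 + DISCOUNT_RATE) ** e.years_until_onset
        for e in externalities
    )

# Hypothetical example for a single (fictional) model deployer:
harms = [
    Externality("misinformation in the public commons", 2.0e8, 0.05, 0),
    Externality("biased decision systems", 5.0e7, 0.10, 0),
    Externality("lowered barrier to misuse (future)", 1.0e9, 0.02, 10),
]
print(f"Annual levy: ~${danger_tax(harms):,.0f}")
```

The point is not the numbers, which are invented, but the structure: immediate harms enter at face value while remote but large harms are discounted – exactly the design question raised by the carbon-tax analogy.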

Friendly versus Vulnerable Worlds as a Crux in Discussion of Xrisk: are Recipes for Ruin Possible? And: A More Powerful Form of the Alignment Problem

Depending on whether you think we live in a Vulnerable World or a Friendly World you are likely to come to very different conclusions about the way the world should be governed. For most of my life I've implicitly believed the Friendly World Hypothesis, and explicitly held to the principle that12 science is on net good, benefits humanity, and should be pursued aggressively, with constraints only applied locally to obviously negative applications. We may, for example, ban things like leaded gasoline or asbestos, but we certainly don't ban the kind of basic research that forms the background against which discoveries like leaded gasoline and asbestos are made. But I take the Vulnerable World Hypothesis seriously, and it's leading me to doubt that the principle of open, aggressive exploration of basic science should be applied forever. And as I begin to doubt the timelessness of that principle, I wonder what it should be replaced with, and when.

Crucially, I'd like any replacement to give us most of the benefits of liberal values and institutions, including the benefits of science and technology, but without irreversibly exposing us to catastrophic danger. I mentioned above the problem of aligning the market and differential technology development. I suspect this is a more powerful way of expressing the Alignment Problem than more conventional formulations13. Indeed, since "the market" is itself mutable and may be redesigned – it obtains legitimacy only insofar as it genuinely promotes the good – the Alignment Problem is perhaps best phrased as alignment between liberal values and institutions, including the market, and differential technology development. Phrased this way, a solution to the Alignment Problem would automatically entail safe development of programmable biology, ASI, and many other technologies. I struggle with the question of how to do this, but I like that it is a fundamentally optimistic place to stand.

Returning to the Vulnerable-Friendly World dichotomy, I believe this is a fundamental crux underlying many discussions of xrisk, including the xrisk associated with both ASI and biotechnology. And I don't think most people working on advanced science and technology have seriously grappled with this14. Indeed, many people gung ho about technology refuse to engage at all, accepting as "obvious" the not-at-all-obvious incumbent position of a Friendly World. Taking that incumbent position made historical sense as we slowly emerged out of a technologically primitive state in which even medium-scale destruction was extremely difficult. But the thrust of the last 100 years of science and technology ought to make us seriously consider that the Friendly World Hypothesis is simply wrong.

Put another way, I think a fundamental question about technology is: do you believe recipes for ruin are possible, that is: do you think inexpensive, easy-to-follow recipes for building catastrophic technologies will one day be found, given sufficient understanding of science and technology? I initially thought people would fall into one of two camps on this question: people who instinctively feel the answer is "yes", and people who instinctively feel the answer is "no". But it turns out there is a third camp: people who really don't want to engage the question. In effect, they're choosing the incumbent position, but without the bother of having to give a good argument for it.

(I've been framing this as a dichotomy between a Friendly World and a Vulnerable World. Of course, as mentioned earlier, it's actually a continuum. Nonetheless, many positions on the governance of technology implicitly and sometimes explicitly take one or the other position.)

How Intuitions about (ASI) Xrisk and the VWH are Grounded in an Individual's Background Expertise

There is a related challenge in the discussion of ASI xrisk. There are no real "experts on ASI", for the same reason the 18th century had no real experts on automobiles. But experts in diverse fields are trying to use their (different) expertise to understand the impact of AI and ASI. Economists analyze AI and ASI as an economic problem: how will it restructure labour markets? Will it concentrate wealth? Sociologists analyze it as a sociological problem: will it cause unrest? What will it do to families and relationships? Lawyers as a legal problem: how should we think about copyright in the age of generative AI? Who will be culpable for abuses? Biologists as a biological problem: what new biothreats and biomedicines will AI enable? Physicists as a physical problem: are there new types of reaction, phase of matter, or material waiting to be discovered, perhaps with explosive consequences? Security experts as a security problem: what new vulnerabilities will ASI create? And so on15.

And, of course, the problem of ASI is none of these; or perhaps more accurately, it's all of these things, and much more. It affects our ability to change the universe, which transcends disciplines.

Thus, the type of problem ASI seems to pose depends in considerable part on someone's prior expertise. Even more than the details, different types of expert will bring different types of intuition about what kinds of phenomena are latent in nature, what is possible (or not), and what kinds of defenses may be worked out (or not)16. Much dismissiveness about xrisk comes from people without those intuitions: an economist, for example, simply will not have the same intuitions as a security expert, and vice versa. A physicist won't have the intuitions of an economist, and vice versa. Indeed, many people have very little intuition for emergent phenomena17 at all. A historically-minded physicist, chemist, or biologist comes to understand through repeated exposure the way very simple rules may give rise to surprising, even shockingly unanticipated phenomena – from fire to superfluidity to nuclear chain reactions to public key cryptography. This breeds a very healthy respect for both the power and the surprise latent in nature. I suspect most people have a much weaker sense of this latent power and surprise. Furthermore, I expect ASI will both discover and create new levels of emergent phenomena, governed by as-yet unsuspected rules; some types of problem will be entirely beyond our (present, and perhaps future) comprehension, involving as-yet undiscovered areas of expertise. It is also notable that many of the staunchest believers in ASI xrisk are not experts in any relevant science, but instead source their intuition from science fiction. This is a fine thing, but also something to be cautious of, since what is narratively plausible is often not the same as what is true. Nature is simultaneously more constrained and more imaginative than we are.

I've been framing this in terms of ASI xrisk, but similar remarks apply to the Vulnerable World Hypothesis. The extent to which one believes (or disbelieves) the VWH depends in considerable part on intuitions acquired from one's prior expertise. This creates many issues, and is obviously not a terribly satisfactory situation, but it is worth noting.

Responding to the VWH, and Provably Beneficial Surveillance

Suppose you accept the VWH as reasonably likely to be true. How should humanity respond? Bostrom investigates and rejects a number of possible solutions, as not sufficient to the task. He eventually argues that humanity would need much stronger policing and much stronger global governance:

To the extent, therefore, that we are concerned that VWH may be true, we must consider the remaining two possible ways of achieving stabilization:

  1. Create the capacity for extremely effective preventive policing. Develop the intra-state governance capacity needed to prevent, with extremely high reliability, any individual or small group – including ones that cannot be deterred – from carrying out any action that is highly illegal; and

  2. Create the capacity for strong global governance. Develop the inter-state governance capacity needed to reliably solve the most serious global commons problems and ensure robust cooperation between states (and other strong organizations) wherever vital security interests are at stake – even where there are very strong incentives to defect from agreements or refuse to sign on in the first place.

The two governance gaps reflected by (1) and (2), one at the micro-scale, the other at the macro-scale, are two Achilles’ heels of the contemporary world order. So long as they remain unprotected, civilization remains vulnerable to a potential technological black ball that would enable a strike to be directed there. Unless and until such a discovery emerges from the urn, it is easy to overlook how exposed we are.

What would be required to stabilize such vulnerabilities is an extremely well-developed preventive policing capacity. States would need the ability to monitor their citizens closely enough to allow them to intercept anybody who begins preparing an act of mass destruction.

The feasibility of such surveillance and interception depend on the specifics of the scenario: How long does it take to deploy the black-ball technology destructively? how observable are the actions involved? can they be distinguished from behavior that we don’t want to prohibit? But it is plausible that a considerable chunk of the Type-1 vulnerability spectrum could be stabilized by a state that deploys currently available technologies to the fullest extent. And expected advances in surveillance technology will greatly expand the achievable protection. For a picture of what a really intensive level of surveillance could look like, consider the following vignette [about a "High-tech Panopticon" in which everybody is fitted with a 'freedom tag' and monitored by 'freedom officers']

In combination, however, ubiquitous-surveillance-powered preventive policing and effective global governance would be sufficient to stabilize most vulnerabilities, making it safe to continue scientific and technological development even if VWH is true.

This is, as Bostrom says several times, an extreme solution, something best considered if the VWH is highly likely to be true, and the development of some catastrophic technology is imminent. It's certainly not at all clear that any benefit is worth the cost! "The good news is you don't need to worry about the world being destroyed; the bad news is you now live under Big Brother". We really ought to be extremely hesitant to accept such a regime, and, indeed, I suspect there are other ways out. (I won't discuss that here, since while it is easy to generate ideas, it's not yet clear to me which is most promising.) But I do want to think in more depth about Bostrom's proposal, and move beyond reflexive dismissal. Most of the trouble is caused by the proposal for ubiquitous surveillance. This has come to seem anathema, for good reason – the horrendous history of the Stasi, the KGB, the purges and the Gulag, and many other institutions built on surveillance. Orwell's "1984" was a warning, not an instruction manual, and ideas like "freedom officers" and "freedom tags" seem straight out of Orwell.

Despite this reflexive response, I think it's worth engaging imaginatively with the question: is it possible to design a surveillance system which genuinely enhances liberty? Not in some cynical, newspeak 1984 style, but truly? I know some people are so ideologically opposed to this that they will be unable to seriously explore the possibility, even as a hypothetical. To those people I advise skipping ahead to the final section.

To the others: I note that there are some surprisingly beneficial and unmalevolent surveillance regimes already in existence. I am rather surprised at how relatively little harm Apple, Google, Microsoft and the NSA seem to have done to individuals, given the extraordinary level of information they have about those individuals' activities. Apple knows far more about me than the Stasi did about the median citizen, or even most citizens of interest. And while Apple doesn't have life and death power over me, they could legally do some pretty horrible things. And yet, for the most part, such things seem to be extremely rare. I certainly don't mean to say abuses don't happen. But I do mean: if in 2000 you'd told me that by 2023 surveillance capitalism would have produced abuses orders of magnitude worse than those we've actually seen, I would not have been surprised. I find myself quite curious why there hasn't been a much worse slide.

I most certainly do not propose the NSA or surveillance capitalism as a model to emulate. However, it does suggest two questions: (1) why do they (seemingly) do much less harm than organizations like the Stasi; and (2) is it possible by design to prevent the harms they do, while protecting against the catastrophically damaging actions?

I asked above whether safe differential technology development is possible while aligned with the values and institutions of a liberal society. If we accept the VWH and are considering surveillance seriously, then we might ask: is it possible to develop provably beneficial surveillance? This is in response and counterpoint to calls for provably safe AI18. While provably safe AI is a stimulating idea, it seems to me to work only if ASIs are created by good actors. If bad actors start producing ASIs, then we're in a lot of trouble.

But suppose instead we replaced that goal by the goal of provably beneficial surveillance. Surveillance is much more naturally a monopoly than is computation; it is also much more naturally preventative. "Beneficial" would, of course, need to be defined, but would include many of the things we take for granted: freedoms and liberties to act of various kinds; the maintenance of many types of privacy; the ability to develop most science and technology. But it would also include the capacity to detect and suppress catastrophic risks. Might it be possible to develop a surveillance regime that guaranteed liberties as good as or better than those we enjoy today, and yet which simultaneously suppressed catastrophic risks19? Put in a nutshell: why not shift the focus from developing safe ASI to developing provably beneficial surveillance, with safe ASI as a consequence?

Note that the use of the term "provable" here is not (quite) meant in the sense of a mathematical proof. Indeed, I don't think that's possible, since it refers to the real world, not the perfect world of mathematics. Rather, it's meant to suggest very strong guarantees – mathematical proofs when those are appropriate, and very strong real-world safeguards when those are what is called for20.

How to develop provably beneficial surveillance? It would require extensive work beyond the scope of these notes. It is worth noting that most existing surveillance regimes are developed with little external oversight, either in conception, or operationally. They also rarely delegate work to actors with different motives in a decentralized fashion. And they often operate without effective competition. I take these facts to be extremely encouraging: they mean that there is a lot of low-hanging fruit to work with here, obvious levers by which many of the worst abuses of surveillance may be reduced. Classic surveillance regimes have typically prioritized the regime, not humanity at large, and that means the design space here is surprisingly unexplored. It's interesting to consider messaging apps like Signal as instances of what happens when the overseeing authority genuinely attempts to prioritize users. I am likely to return to this in more detail in the future.
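
To make the "external oversight" and "decentralized delegation" levers slightly more concrete, here is a minimal, purely illustrative sketch of one possible mechanism: a surveillance query that executes only when approved by a quorum of overseers drawn from institutions with different motives, with every request logged for audit. This is my own toy illustration, not a design from Bostrom's paper; the names are hypothetical, and a real system would need far stronger (ideally cryptographic) guarantees:

```python
# Toy illustration of one "provably beneficial surveillance" lever:
# a query over monitored data executes only with approval from a quorum
# of independent overseers, and every request is logged for audit.
# Purely illustrative; names, fields, and thresholds are hypothetical.

from dataclasses import dataclass, field

@dataclass
class Overseer:
    name: str
    institution: str   # overseers are drawn from bodies with different motives

@dataclass
class SurveillanceQuery:
    purpose: str                          # must name the catastrophic risk at issue
    scope: str                            # what data, how narrowly targeted
    approvals: set = field(default_factory=set)

@dataclass
class OversightGate:
    overseers: list
    quorum: int                           # m-of-n approvals required
    audit_log: list = field(default_factory=list)

    def approve(self, query: SurveillanceQuery, overseer: Overseer) -> None:
        # Count the approval only if it comes from a recognized overseer;
        # every attempt is logged for after-the-fact audit.
        if overseer in self.overseers:
            query.approvals.add(overseer.name)
        self.audit_log.append(("approval attempt", overseer.name, query.purpose))

    def execute(self, query: SurveillanceQuery) -> bool:
        allowed = len(query.approvals) >= self.quorum
        self.audit_log.append(("executed" if allowed else "refused", query.purpose))
        return allowed

# Hypothetical usage: three overseers from different institutions, 2-of-3 quorum.
gate = OversightGate(
    overseers=[Overseer("A", "judiciary"), Overseer("B", "civil society"),
               Overseer("C", "international body")],
    quorum=2,
)
q = SurveillanceQuery(purpose="detect covert synthesis of a known pathogen",
                      scope="flagged orders to DNA synthesis providers")
gate.approve(q, gate.overseers[0])
gate.approve(q, gate.overseers[1])
print(gate.execute(q))   # True only once the quorum is reached
```

The point of the m-of-n structure is that no single overseer – and no single institution – can either authorize an abusive query on its own or silently block a legitimate one.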

Miscellaneous thoughts

  1. Techno-capitalism and the transfer of power to the technologist class: Inherent in almost everything I've discussed is the idea that the solution to technology is still more technology. We may call this technological solutionism. Climate got you down? Do some geoengineering or carbon removal and sequestration. Biorisk blues? Let's have real-time vaccines! Or BSL-4 hardened cities21. Nukes still scaring you? Let's do more work on missile shields! Vulnerable World Hypothesis means you can't sleep at night? Let's get to work aligning our liberal values and differential technology development! And so on. This kind of technological solutionism is common among the many tribes of people thinking about the VWH and ASI both, from the startup world to EA to e/acc to doomer and so on. In all these views, there is an inherent transfer of power to people who have agency over technology, and away from people who don't. As Cory Doctorow has emphasized, technology inherently centralizes power in the designers and builders, and the penumbra of humanity around them. This transfer of power was written about memorably by H. G. Wells in his two-tier system of Eloi and Morlocks; Neal Stephenson has rendered it beautifully in "In the Beginning was the Command Line"; and Neil Postman has discussed it insightfully in "Technopoly". A partial solution may actually lie in ASI, insofar as it is a tool that makes technology more available and better expresses the will of any human being. But the trouble with that is that even expressing technological intent requires considerable expertise. So it is by no means clear that ASI closes this gap in agency over technology; my sense is that it will, rather, enlarge it. Regardless, I am extremely uneasy about this transfer of power to the technologist class.

  2. What kind of statement is the Vulnerable World Hypothesis? It's not an empirically testable statement, not in the usual sense. It's not quite a conventional axiomatic statement either. Now, you might try to argue that there's a sense in which the VWH is an empirical statement, by inventing a cheap technology that immediately wipes us out. In that sense the VWH (actually: the FWH) is falsifiable. Even that wouldn't show that devastation was near-inevitable, however. Moreover, the VWH matters most at times when it has not yet been empirically tested. It is, after all, about not just current technology, or near-future technology, but about all possible future technologies – a space far greater than we can possibly hope to explore22. In fact, it's not just about individual technologies, but about entire technia23, since the destructive power of a technology is not purely intrinsic. It instead depends on the relationship to other technologies and the complete environment: fire on its own is bad, but becomes much easier to deal with when fire-retardant materials and firefighting technology and institutions and laws are widely available. In that sense, the VWH is about which technia are accessible, given the governance24 choices we make, and whether it is likely we will pass through technia in which catastrophe is almost inevitable. But while the VWH is not obviously empirically testable in the usual sense, that doesn't mean we cannot regard it as more or less plausible, on the basis of knowledge acquired. Knowledge of things like fire, nuclear weapons, and pandemics ought to give us pause. These things were likely a priori unsuspected by human beings – they are, in some sense, emergent phenomena, hard to predict in advance – a fact which ought to make us very cautious about dismissing the VWH.

Acknowledgments

Thanks to Damon Binder, Laura Deming, Andy Matuschak, Evan Miyazono, and Andrew Snyder-Beattie for related conversations.

Footnotes


  1. Perhaps most notably: Nick Bostrom, Superintelligence: Paths, Dangers, Strategies (2014).↩︎

  2. Michael Nielsen, "Notes on Existential Risk from Artificial Superintelligence", https://michaelnotebook.com/xrisk/index.html (2023).↩︎

  3. Michael Nielsen, "Brief remarks on some of my creative interests", https://michaelnotebook.com/ti/index.html (2023).↩︎

  4. See: "What Nuclear Weapons Delivery Systems Really Cost", https://web.archive.org/web/20231020161223/https://www.brookings.edu/what-nuclear-weapons-delivery-systems-really-cost/↩︎

  5. See Michael Singer, David Weir, and Barbara Newman Canfield, "Nuclear Nightmare: America's Worst Fears Come True" (1979).↩︎

  6. I first heard this argument in grad school, but don't know where it was first written down, or where the strongest form of the argument has been made. Google shows one reasonable form of the argument here: https://www.askamathematician.com/2012/11/q-could-kurt-vonegets-ice-9-catastrophy-happen/↩︎

  7. I have not done the detailed calculations necessary to check this, apart from a few basic sanity checks about the way magnesium oxidizes, and the way magnesium burns in CO2. It would be an interesting challenge to do detailed calculations, and if those checked out, to do the experiments to test this. Note that while those experiments would require considerable individual safety precautions, on earth they wouldn't pose any significant large-scale risks.↩︎

  8. There are several senses in which this is the case. Speaking loosely and all-too-briefly, one axis is the following: for ASI development, everyone needs to refrain from doing something which is in their (short- and maybe longer-)term interest, but which might well be devastating to everyone else. That is, everyone individually has an incentive to participate (insofar as they can) in the creation of AGI and ASI, even if that poses an enormous collective risk.↩︎

  9. Many people are concerned about both long- and short-term risks from AI and ASI. However, I have met a few people concerned about long-term risks who are dismissive about short-term risks. I believe this is a (terrible) mistake. I don't think anyone who doesn't want to bend over backward to limit immediate harms from AI can be trusted to address long-term harms.↩︎

  10. I don't exactly anticipate enthusiasm for this among the less insightful AI boosters; the more insightful, of course, are seriously engaged with safety, and might be expected to engage more thoughtfully.↩︎

  11. A challenge is not-for-profit projects, perhaps organized by unfunded (or near-unfunded) ad hoc and perhaps even anonymous groups. I expect both that such projects will exist in the future and that they will do substantial harm. Still, it is perhaps best to separate concerns, and to analyze and (perhaps) deal with that possibility separately.↩︎

  12. The remainder of this paragraph is adapted from: Michael Nielsen, "Brief remarks on some of my creative interests", https://michaelnotebook.com/ti/index.html (2023).↩︎

  13. Allow me to mention Brian Christian's excellent book "The Alignment Problem" as an introduction to thinking on this topic.↩︎

  14. It remains a curious fact that the best cases against xrisk from ASI come from people who are themselves very concerned about xrisk from ASI. The two I have in mind are: Katja Grace, Counterarguments to the basic AI x-risk case (2022); and David Scott Krueger, A list of good heuristics that the case for AI x-risk fails (2019). I've yet to read a good case from someone who believes that there is very little xrisk from ASI. I suspect most such people simply haven't considered the possibility seriously, but prefer the default posture ("scientific advance is by default good"), which is why their dismissals tend to be so low quality. Incidentally, it is also true that many people worried about ASI xrisk understand the case against it poorly. Motivated reasoning is common on all sides of this discussion.↩︎

  15. Complementing each of these is also a focus on the opportunities of ASI. Again, what people see depends on their background. But that is not my main focus in connection with the VWH.↩︎

  16. This is especially interesting when an offensive technology may come from one type of expertise, but the relevant defense from another. One sees this often in current analysis of climate risks: it is common for paleoclimate experts to have a good understanding of climate change (on which they are an expert), and yet to apparently be rather uninformed about the possibilities for carbon capture and sequestration, which may require quite different expertise. This has changed over time, as knowledge has diffused across disciplines, but it seems to me to have been a real effect.↩︎

  17. I am using the term "emergent" here in the sense of Phil Anderson, not the (in many ways much weaker) sense which has recently become common in AI. This overlap in terminology is somewhat unfortunate for the point I'm making, since many of the AI people using the term emergent have, as far as I can tell, little intuition for the original form of emergence.↩︎

  18. Notable proposals include: Stuart Russell, "Provably Beneficial Artificial Intelligence", https://people.eecs.berkeley.edu/~russell/papers/russell-bbvabook17-pbai.pdf; and: Max Tegmark and Steve Omohundro, "Provably safe systems: the only path to controllable AGI", https://arxiv.org/abs/2309.01933 (2023). Ideas in a similar vein are being pursued by the Atlas Computing Initiative, and likely others of whom I am not aware.↩︎

  19. Cf: David Brin, "The Transparent Society" (1998); Bruce Schneier, "The Myth of 'The Transparent Society'" (2008); and Steve Mann, Jason Nolan, and Barry Wellman, "Sousveillance: Inventing and Using Wearable Computing Devices for Data Collection in Surveillance Environments" (2003).↩︎

  20. I considered many variations, along the lines of "Guaranteed-to-be-beneficial surveillance". They have an interesting problem, which is that words tend to lose some of their power over time. I felt that "provable" was the best compromise here.↩︎

  21. This is discussed in: Carl Shulman, "Envisioning a world immune to global catastrophic biological risks" (2020). Thank you to Andrew Snyder-Beattie and Damon Binder for pointing this out to me.↩︎

  22. It's not strictly relevant, but occurred to me while writing these notes: most technologies will never be discovered. This seems inevitable in a world where the search space for material and information objects is exponentially large, and NP-complete problems are likely to be intractable even on any foreseeable computer. Put another way: it seems likely there are many powerful technologies the universe affords, but which will never be discovered, no matter how much intelligence or understanding or computational power is brought to bear. It's as though the universe's capacity for technology – the set of possible affordances for the universe – is strictly larger than its capacity for discovery. I find this an extraordinary thought, and yet it's almost obviously true.↩︎

  23. The plural of Kevin Kelly's useful term technium, the entire accumulation of technologies that humans have created.↩︎

  24. There is, of course, a very long history of proposals for global governance and even a unipolar world order. Three that I've enjoyed recently are those of Oppenheimer, von Neumann, and Brand; all are in response to catastrophic or even existential risk.↩︎