Anthropic reasons for digital mind scepticism?

Published 28 April 2024


14,673 words • 74 min read


Here’s a thought. If digital minds eventually vastly outnumber real humans, why aren’t we them?

I think that question can be read in two ways. First, we might accept that there will be many digital minds, but doubt whether we are in fact not-digital humans poised to create them, because in that case not-simulated experiences like ours would be far outnumbered by simulations of the real-seeming predicament we appear to be in, such that we can’t tell the difference. And if we are in one of those simulations, then presumably we are in a position of much less influence over digital minds.

Second, we might accept that we are the real and not-digital humans we appear to be, but use that as a reason to doubt that digital minds eventually vastly outnumber humans, on pain of finding ourselves in an inexplicably surprising — and weird — minority. This is similar to the ‘doomsday argument’ for believing that future humans will not vastly outnumber humans to date.

Either way, we might therefore doubt that we are really poised to create or influence many digital minds, despite appearances. And that might be relevant for (e.g.) how much we prioritise efforts to make sure things go well for digital minds we create.

In this post I try to figure out whether those arguments work. Ultimately, I think the answer is a qualified ‘no’. The main reason is that I think neither argument works, at least not decisively. But even if they worked to significantly undermine belief in our influence over digital minds, I think there remain good reasons to act as if we will have that influence.

This doesn’t need to be read in full — the parts on the simulation argument, and whether it would be surprising to be outnumbered by digital minds, can be read separately.

The intuitive arguments

Suppose it is soon possible to cheaply create digital minds, perhaps with experiences like our own. Then in the fullness of time, digital minds would greatly outnumber human minds. But if so, how come you’re not one of them? And shouldn’t that sense of surprise make the assumption feel less likely somehow?

This feels vague and confused, but it does raise some questions I find interesting.

First: to the extent you are concerned about the possibility of ‘simulating’ large numbers of humans, you might wonder about the grounds on which you can assume you are not simulated. But then you might think that most simulated humans can’t run their own simulations inside their simulated worlds. So either there are very few simulated humans, or otherwise worlds in which simulated humans are hard to instantiate outnumber worlds in which they are easy to instantiate, and in either case you might feel compelled in the end to think that you are not in a world that readily admits of creating simulated humans.

Or, second: even if many of the digital minds we could create wouldn’t be simulated humans, and would recognise themselves as digital minds; wouldn’t that still make it pretty weird that we then find ourselves in the tiny minority of non-digital minds? Aren’t we supposed to assume that we’re not special like that?

Ok. Let’s see if these thoughts lead anywhere.

The simulation argument

Nick Bostrom’s simulation argument (SA) argues that at least one of the following is true:

  1. The human species is very likely to go extinct before reaching a “posthuman” stage;
  2. Any posthuman civilisation is extremely unlikely to run a significant number of simulations of their evolutionary history (or variations thereof); so-called ‘ancestor simulations’;
  3. We are almost certainly living in a computer simulation (the ‘simulation hypothesis’)

If (1) and (2) are false, and the argument works, then (3) must be true, viz. that we are almost certainly living in a computer simulation. Bostrom’s reasoning here is based on the fraction of ‘human-type experiences’ that would then be simulated. Picking up the argument part way through with some embellishment, we begin with some notation:

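Following Bostrom's paper, let $f_P$ be the fraction of human-level civilisations that survive to reach a posthuman stage, $\bar{N}$ the average number of ancestor simulations run by a posthuman civilisation, and $\bar{H}$ the average number of individuals who live in a civilisation before it reaches a posthuman stage. The fraction of all observers with human-type experiences that live in simulations ($f_{sim}$) is then:

$$f_{sim} = \frac{f_P \bar{N} \bar{H}}{f_P \bar{N} \bar{H} + \bar{H}} = \frac{f_P \bar{N}}{f_P \bar{N} + 1}$$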

Now let $f_I$ stand for the fraction of posthuman civilisations interested in running ancestor simulations, and let $\bar{N}_I$ stand for the average number of such simulations run by ‘interested’ posthuman civilisations, such that:

$$\bar{N} = f_I \bar{N}_I$$

Substituting gives:

$$f_{sim} = \frac{f_P f_I \bar{N}_I}{f_P f_I \bar{N}_I + 1}$$

Now Bostrom argues that $\bar{N}_I$ is extremely large, since a posthuman civilisation interested in running ancestor simulations is likely to have at its disposal very large amounts of computing power: enough to simulate many civilisations’ worth of experiences per simulating civilisation. And if $\bar{N}_I$ is extremely large, then at least one of the following options must be true, corresponding to the options above:

  1. $f_P \approx 0$
  2. $f_I \approx 0$
  3. $f_{sim} \approx 1$

Let’s rule out (1) $f_P \approx 0$ and (2) $f_I \approx 0$. That is, let’s assume that (1) non-simulated civilisations at the level of technological advancement of our own do have a non-negligible shot at running ancestor simulations (among other kinds of simulation); and (2) a non-negligible fraction of such civilisations which become able to run such simulations choose to do so. It follows that almost all observers with human-type experiences are simulated.
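To get a feel for how the trilemma falls out of the formula, here is a minimal numerical sketch. The function name and every input value are illustrative assumptions, not figures from Bostrom; the only thing taken from the argument is the expression $f_{sim} = f_P f_I \bar{N}_I / (f_P f_I \bar{N}_I + 1)$.

```python
def fraction_simulated(f_p: float, f_i: float, n_i: float) -> float:
    """Fraction of human-type observers who are simulated.

    f_p: fraction of civilisations that reach a posthuman stage
    f_i: fraction of posthuman civilisations interested in ancestor simulations
    n_i: average number of ancestor simulations run by an interested civilisation
    """
    sims_per_civilisation = f_p * f_i * n_i
    return sims_per_civilisation / (sims_per_civilisation + 1.0)


# Made-up inputs: even small f_p and f_i are swamped by a very large n_i.
modest_fractions = fraction_simulated(f_p=0.01, f_i=0.01, n_i=1e9)

# If either fraction is (near) zero, the simulated share collapses instead.
no_survivors = fraction_simulated(f_p=0.0, f_i=0.5, n_i=1e9)

print(modest_fractions, no_survivors)
```

With these made-up numbers the first case lands very close to 1 and the second is exactly 0, which is the shape of the trilemma: unless $f_P$ or $f_I$ is negligible, a large $\bar{N}_I$ pushes $f_{sim}$ towards 1.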

An indifference principle

That doesn’t yet establish that we are most likely simulated. For that we also need some kind of “indifference principle”. Something like this general principle:

Given a way to divide some (finite) class of observers into one or more sub-classes such that the observers have no evidence that they are more or less likely to be in a particular sub-class relative to any other observer in that initial class, then each observer’s credence that they are in a particular sub-class should be equal to the number of observers in that sub-class as a fraction of all observers in the initial class.

This is probably too wordy. But hopefully the thought is clear, and Bostrom explains how it cashes out:

[I]f we knew that a fraction $x$ of all observers with human-type experiences live in simulations, and we don’t have any information that indicate that our own particular experiences are any more or less likely than other human-type experiences to have been implemented in vivo rather than in machina, then our credence that we are in a simulation should equal $x$.

(sidenote: A small note here: I think this is clearest if we get to assume that the observers know $f_{sim}$, otherwise it’s a little confusing what it means to say their credence that they are simulated should equal $f_{sim}$. But in the case where the observer doesn’t know $f_{sim}$, perhaps we’re saying that a rational observer should bring their credence into line with their best guess of $f_{sim}$. )

$$\text{Cr}(\text{Sim} \mid f_{sim} = x) = x$$
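One way to read the "best guess" case is as an average over candidate values of $f_{sim}$, weighted by how likely you think each is. The sketch below assumes that reading (taking "best guess" to mean an expectation is my assumption, not something Bostrom states), and the function name and example beliefs are hypothetical.

```python
from typing import Mapping


def credence_simulated(beliefs: Mapping[float, float]) -> float:
    """Credence in being simulated when f_sim itself is uncertain.

    `beliefs` maps each candidate value x of f_sim to the probability you
    assign to f_sim = x. Applying Cr(Sim | f_sim = x) = x to each candidate
    and averaging gives the overall credence.
    """
    total_probability = sum(beliefs.values())
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities over f_sim must sum to 1")
    return sum(x * p for x, p in beliefs.items())


# Hypothetical beliefs: 50/50 between "no one is simulated" and "everyone is".
print(credence_simulated({0.0: 0.5, 1.0: 0.5}))
```

If you are certain of a single value of $f_{sim}$, this reduces to the principle as stated: your credence just equals that value.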

Note that the general indifference principle I give, and the specific claim Bostrom makes, needn’t only apply to (sidenote: I also deliberately assume having a way to count observers to avoid paradoxes where the measure is ambiguous, like Bertrand’s paradox.): they can accommodate observers with different experiences, so long as those experiences don’t offer evidence about being simulated or not beyond just knowing $f_{sim}$.

If you can run simulations then maybe you’re in one?

Suppose you’re quite concerned that we may soon build (or we have already built) AI systems which count as ‘digital minds’. In the language of the simulation argument, I think you’d agree that (1) non-simulated civilisations at the level of technological advancement of our own do have a non-negligible shot at running ancestor simulations (among other kinds of simulation) (sidenote: ‘Ancestor simulations’ may be unnecessarily restrictive: there might be other kinds of simulated observers whose experiences do not offer evidence that they are more or less likely to be non-simulated among simulated and non-simulated minds including our own (in other words, digital minds that don’t know how to specifically tell they’re not human, and we don’t know how to specifically tell we’re not them).); and (2) a non-negligible fraction of such civilisations which become able to run such simulations choose to do so. In other words, it seems fair to move from “it seems pretty damn likely that this kind of civilisation will end up doing things like ancestor simulations” to “a non-negligible fraction of similar civilisations will”. Let’s call people inside ancestor simulations simulated humans.


But notice that we’ve reached a point where if we (i) buy SA, (ii) take the prospect of digital minds ‘seriously’, and (iii) agree that taking digital minds seriously implies thinking civilisations like ours run many ancestor simulations, then (by the structure of SA, above) it looks like we’re moved to accept that we are likely to be simulated ourselves.

In fewer words:

  1. Assume $f_P > 0$, $f_P \not\approx 0$
  2. Assume $f_I > 0$, $f_I \not\approx 0$
  3. Hence $f_{sim} \approx 1$

Of course, one response for the person who is (sidenote: For instance, someone who sees the prospect of digital sentience as a reason to care a lot and maybe work to improve how the development of artificial sentience goes, or prevent certain kinds of artificial minds from being developed at all.) here is to reject SA. But another response might be to bite the bullet, and accept that we’re likely in a simulation if digital minds are easy to make, but not to change your mind about it being really important to make sure the minds we simulate are looked after.

Emulate or be emulated?

But maybe that bullet-biting thought is missing something; roughly that (according to SA) not only are you likely in a simulation if you assume a non-negligible fraction of not-simulated worlds like the one you observe run ancestor simulations, but also that you should expect running the same kind of simulations in your world to be very difficult or impossible. That is, not only should most human-like observers expect to be simulated, but also most should not expect to be able to simulate many more human-like experiences.

Here’s a simple model where what I just said is true. Suppose that you just can’t run simulations of human-like experiences inside of simulations. Then all observers are either not simulated, in which case they can run simulations, or they are simulated, in which case they cannot run simulations. SA effectively says that either (i) simulations of human-like experiences are very rare, or (ii) the large majority of observers are simulated (and cannot themselves run sims). In either case, you, finding yourself with human-like experiences, should think it is very likely that running many simulations of human-like experiences in your world is difficult or impossible.

We could say:

Write $\bar{H}$ for the number of non-simulated human-like observers and $\bar{S}$ for the number of simulated ones. Now call simulating human-like observers ‘difficult’ for you just in case either (i) you’re not simulated, but fewer human-like observers will ever be simulated than there are non-simulated people, or (ii) simulation is impossible because you’re already simulated. Call the chance that simulating human-like observers will be difficult for you $S_\text{diff}$.

Assuming you are randomly drawn from all human-like observers, then:

$$S_{\text{diff}}=\begin{cases} 1 & \text{if }\bar{H}>\bar{S} \\ \frac{\bar{S}}{\bar{S}+\bar{H}} & \text{if }\bar{S}>\bar{H} \end{cases}$$

Notice $S_{\text{diff}}>\frac{1}{2}$ always, no matter $\bar{H}$ and $\bar{S}$. Also notice $S_{\text{diff}}\approx 1$ whenever $\frac{\bar{H}}{\bar{S}} \gg 1$ or $\frac{\bar{S}}{\bar{H}} \gg 1$, in other words when simulated human-like observers are either very rare or very common.
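The piecewise definition is easy to check numerically. This is a rough sketch with made-up population sizes; the function name is hypothetical, and the handling of the tie $\bar{H} = \bar{S}$ (which the definition above leaves open) is my own choice.

```python
def s_diff(h: float, s: float) -> float:
    """Chance that simulating human-like observers is 'difficult' for you.

    h: number of non-simulated human-like observers
    s: number of simulated human-like observers
    """
    if h < 0 or s < 0 or h + s == 0:
        raise ValueError("need non-negative counts and at least one observer")
    if h > s:
        # Simulated observers are the minority: simulation is difficult for
        # everyone, whether by condition (i) or by condition (ii).
        return 1.0
    # Otherwise only the simulated observers face difficulty (condition (ii)).
    # The tie h == s is assigned to this branch as an assumption.
    return s / (s + h)


for h, s in [(1000, 1), (1, 3), (1, 1_000_000)]:
    print(h, s, s_diff(h, s))
```

The value is lowest (approaching one half) when the two populations are about the same size, and near 1 at either extreme.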

Levels to it

Clear enough; except I’m not sure why we should assume that human-simulations-inside-simulations are strictly impossible. In particular, if you think (digital, computer) simulations of human experiences are possible at all, then you’re probably sympathetic to a computational view of mind, roughly that for any conscious experience, there is some computational process (being agnostic about how it’s implemented) sufficient to give rise to that experience, or which just is equivalent to it. And you probably think that every effective computation can be implemented by a Turing machine, and surely any simulated environment home to observers with full-blown conscious experiences is Turing complete, in the sense that if its inhabitants can interact with their simulated world at all then they can build computers in it.

But presumably there are limitations to how many simulated minds within simulated worlds within simulated worlds with simulated minds — and so on — you can nest. These are resource limitations. Of course, if you begin with finite memory or a finite number of operations you can do, then there’s a limit to the amount of anything you can simulate. If each simulated world itself contains on average more than one simulated world, then geometric growth in the number of simulated ‘worlds’ with levels of simulation is going to eat up that budget. More pointedly, we also tend to think that simulating some computation within a simulated world is more computationally expensive than just running the computation more ‘directly’.

Think about those people who build programmable computers inside of Minecraft. Those simulated computers have enough memory to add two single digit numbers in just the same way real computers can. Except, of course, the computer running Minecraft could add those same numbers far quicker by adding them ‘directly’. With enough time and memory, a computer within Minecraft could indeed run Minecraft, and within that Minecraft-within-Minecraft game a computer could be built to find the prime factors of 12. But, modulo some complications, we might imagine that the slowdown factor between base and simulated Minecraft should be similar to the slowdown factor between the base computer and base Minecraft.
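To make the compounding concrete, here is a toy calculation. It assumes a single constant slowdown factor per level of nesting, which is exactly the simplification flagged by "modulo some complications"; the factor of 1,000 is a made-up number, not a measurement of Minecraft.

```python
def nested_slowdown(slowdown_per_level: float, depth: int) -> float:
    """Total slowdown relative to the base computer after `depth` levels of
    nesting, assuming every level imposes the same multiplicative slowdown."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return slowdown_per_level ** depth


# Hypothetical factor: each simulated computer runs 1,000x slower than its host.
for depth in range(4):
    print(depth, nested_slowdown(1000, depth))
```

On this assumption the cost grows geometrically with depth, so a computation run two levels down is a million times slower than running it directly.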

And of course we can keep going with that. This incredible website shows how the Game of Life can be emulated, recursively, within the Game of Life.

Notice in the above video how each level up you rise, each time many cells zoom out to reveal they are part of a larger cell and a larger Game, the slower the new game relative to the clock speeds of the ‘base layer’ (if there is one).

Now, I read Bostrom’s original SA argument as just talking about the number of non-simulated compared to simulated human-type experiences, and not talking about what fraction are (sidenote: If the human species is decently likely to run ancestor simulations, then whether or not those simulations in turn run simulations (and so on…) you can always conclude you are likely in a simulation.).

But we could finesse the original argument to see what it might say about where in this stack of sims-within-sims you should expect to find yourself.

To begin with, we might guess that the original argument applies at every ‘level’ of simulation — a kind of ‘relative’ variant on the simulation argument. The first level encompasses just those observers in ‘base’ reality but not simulated by it; the second encompasses just those observers simulated by base reality but not simulated within any simulations, and so on.

A simple model:

Write $H$ for the number of non-simulated human-like experiences, and $f_i$ for the number of human-like experiences at the $i$-th level of simulation per experience one level below it. Then the number of simulated human-like experiences would be:

$$\sum_{N=1}^{\infty} H \prod_{i=1}^N f_i = H(f_1 + f_1f_2 + f_1f_2f_3 + \cdots)$$

Assuming $f_N$ is a constant $f$:

$$\sum_{N=1}^{\infty} H \prod_{i=1}^N f = H(f + f^2 + f^3 + \cdots)$$

The sum to infinity of a geometric series with first term $a$ and common ratio $r$ (where $|r| < 1$) is just:

$$S_{\infty} = \frac{a}{1-r}$$

So, when $f < 1$, the number of simulated human-like experiences is:

$$\sum_{N=1}^{\infty} H \prod_{i=1}^N f = \frac{Hf}{1-f}$$
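As a sanity check on the closed form, the sketch below compares it against a level-by-level partial sum. The function names and the example values of $H$ and $f$ are illustrative assumptions; the closed form only applies when $f < 1$, and the partial sums show what happens otherwise.

```python
def simulated_closed_form(h: float, f: float) -> float:
    """Total simulated experiences H*f/(1-f); valid only for 0 <= f < 1."""
    if not 0 <= f < 1:
        raise ValueError("the geometric series only converges for 0 <= f < 1")
    return h * f / (1 - f)


def simulated_partial_sum(h: float, f: float, levels: int) -> float:
    """Simulated experiences summed over the first `levels` levels of nesting."""
    total = 0.0
    level_size = float(h)  # level 0 is base reality and is not counted
    for _ in range(levels):
        level_size *= f    # each level holds f times the one below it
        total += level_size
    return total


print(simulated_closed_form(100, 0.5))      # limit of the series
print(simulated_partial_sum(100, 0.5, 60))  # approaches the same limit
print(simulated_partial_sum(1, 2, 10))      # f > 1: keeps growing with depth
```

For $f = 0.5$ the partial sums settle on the closed-form value, whereas for $f > 1$ they grow without bound as more levels are added.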

Now in the original argument, Bostrom is essentially arguing that $f_1$ is either (1) very large ($f_1\gg1$) or (2) very small ($f_1\approx 0$). But what if, as assumed, $f$ is constant for every layer? Then the ‘relative’ variant on the simulation argument might go something like: for any layer of simulation including base reality, one of the following is true:

  1. Civilisations in that layer are extremely unlikely to themselves eventually run a significant number of simulations of human-like experiences ($f\approx 0$);
  2. Almost all observers live in simulations relative to the layer being considered (such that you are almost certainly simulated relative to this layer) ($f\gg1$)

Option (1) (sidenote: Although needn’t be true for the reason Bostrom argues it would be true, namely that either almost no (in this case simulated) human-level civilisations would survive to post-humanity, or almost none would choose to simulate. Perhaps instead, on this ‘constant slowdown factor’ model, a simulated post-human civilisation might simply find it technologically difficult somehow.) on this extended version of the argument. But, of course, option (2) can’t be right. $f$ can’t be a constant $> 1$, on pain of always expecting to be ‘more simulated’ than any layer you’re considering, and proliferating experiences and observers without bound.

$f$ could also hover in the $0.5 < f < 1$ range, in which case most human-like experiences would be simulated despite the total number of experiences being finite. Though I do think, given the many orders of magnitude that $f$ could span over, $0.5 < f < 1$ would feel surprising. Note also, whenever $f < 1$, the ratio of simulated experiences to ‘same level’ experiences with respect to some level of simulation (call it $f_{\geq n}$) would equal the ratio of all simulated to all non-simulated experiences ($f_{\geq 0}$):

$$f_{\geq n} = f_{\geq 0} = \frac{f}{1-f}$$

And this is a somewhat cute result, that on this model, when $0.5 < f < 1$ any observer can expect more simulated experiences relative to their ‘level’, and that ratio doesn’t depend on whether they are themselves simulated.
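The claim that the ratio is the same at every level can be checked in a couple of lines. The derivation assumes, as in the constant-$f$ model, that level $n$ contains $H f^n$ experiences, with level $0$ standing for base reality. The symbol $D_n$, for the number of experiences strictly deeper than level $n$, is introduced here only for illustration.

```latex
\begin{align*}
D_n &= \sum_{k=n+1}^{\infty} H f^{k} = \frac{H f^{n+1}}{1-f}
      \qquad (0 \le f < 1), \\
f_{\geq n} &= \frac{D_n}{H f^{n}} = \frac{f}{1-f}, \\
f_{\geq n} > 1 &\iff f > \tfrac{1}{2}.
\end{align*}
```

The factor $H f^n$ cancels, which is why the ratio does not depend on $n$, and the last line is where the threshold $f > 0.5$ comes from.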

But it’s worth pausing here. On my model, I’m assuming it’s about as hard to simulate minds within simulations (including ‘ancestor simulations’) as it is to do them in the real world. So if simulations are cheap to run in the real world, then they are cheap to run within those simulations. Furthermore, we have good reasons to think that simulations plausibly will be cheap to run in the real world. But that implies they will be cheap at every level, and the number of simulated levels grows without bound. Something is clearly confused here. What gives?

I see two options. The first is to think that simulations are very cheap in the real world, but not within simulated worlds (sidenote: Imagine some future version of Minecraft where the non-playable characters or animals in the Minecraft world are complex enough and appropriately designed to count as ‘observers’ with ‘experiences’. For roughly the reason Bostrom gives in his original paper (roughly, silicon-based computation is cheaper and easier to scale than brains) I think it’s fair to say that we could bring more Minecraft chicken experiences into the Minecraft world than real chicken experiences into the real world. But, within the Minecraft world(s), we should not expect to be able to build computers within the game able to simulate more Minecraft chickens than base-Minecraft chickens; after all, computers in Minecraft are necessarily just vastly bigger than the chickens.). This would be a fundamental difference between the real world and a simulated world: there is an efficient way to simulate certain experiences in the real world, but not in the simulated world. In the model above, this would be like setting $f_1\gg1$, but $0 < f_n \ll 1$ for deeper levels $n > 1$.

But examples like that violate an assumption from the original simulation argument, namely that simulated human-like experiences are — by stipulation — indistinguishable from the perspective of those experiences from non-simulated human experiences (or indeed real non-human experiences). For instance, a character in a Minecraft-like simulation would have a way to test whether they are being simulated, viz. to try to build a computer running Minecraft faster than realtime.

What gives? Well, there is a second option, which is just to drop the assumption of a constant slowdown factor (which ff proxies for). In particular, we could imagine it being the case that people-within-simulations notice they seem close to creating simulated people cheaply, which could come to outnumber them. And indeed this is what I think you should picture when you picture an ‘ancestor simulation’: simulated humans in a simulated world, some of whom are beginning to notice that digital minds could soon proliferate. But also perhaps those simulated humans could succeed in creating digital minds within simulated worlds. And perhaps those sim-within-sim minds do eventually come to outnumber the simulated humans. There is no paradox here as long as things either stop or reach a limit eventually — once the (assumed) finite computational resources of the base-level simulation are exhausted. But before that point, more base-level resources could be spent on upholding the simulations-within-simulation than the simulated humans: a ‘top-heavy’ world that grows until it can’t grow any longer.

I think this second option is the better reply to the apparent tension between “ancestor simulations should always seem cheap to run from the perspective of a simulated human-like observer” and “computational resources are finite”.

Unfortunately, that means that we can’t rely on cute models like the constant-$f$ model I suggested above.

Do stakes and likelihood cancel out?

For simplicity, let’s loop back to an extremely simple two-level model. Let’s say there are $N_{\neg S}$ non-simulated human-like observers (viz. real humans) and $N_S$ simulated observers, which cannot run simulations. Let’s assume, finding yourself with human-like experiences, your credence in being in either group given the relative fraction of that group among all observers is just that fraction:

$$\text{Cr}\Big(\text{Sim} \,\Big|\, \frac{N_S}{N_{\neg S}+N_S}=x\Big)=x$$

$$\text{Cr}\Big(\neg \text{Sim} \,\Big|\, \frac{N_{\neg S}}{N_{\neg S}+N_S}=x\Big)=x$$

Now let’s use $\text{Inf}$ to stand for the number of simulated human-like experiences as a fraction of the number of observers in your layer, such that if you are already simulated, then $\text{Inf}=0$, else $\text{Inf}=N_S/N_{\neg S}$. I’m calling it $\text{Inf}$ because you could reasonably think of this ratio as a proxy for how much influence you have over simulated human-like experiences. Assume all non-simulated observers have influence over all simulated observers in proportion to the ratio of simulated to non-simulated observers. So: if you are not simulated, but there are or will be a large number of simulated observers, then you have a lot of influence. You can think of $\text{Inf}$ as the number of simulated observers per person from your perspective.

Use $\neg S$ to stand for “I am not simulated” and $S$ for “I am simulated”. Then:

$$\text{Inf}=\begin{cases} 0 & \text{if }S \\ \frac{N_S}{N_{\neg S}} & \text{if }\neg S \end{cases}$$

Of course, by stipulation, you don’t know whether you’re simulated or not. But let’s imagine you do know $N_S$ and $N_{\neg S}$. Then your credence in being not simulated $\text{Cr}(\neg S)$ is just:

$$\text{Cr}\big(\neg S\big)=\frac{N_{\neg S}}{N_S+N_{\neg S}}$$

Finally, we can represent your expected influence with respect to simulated minds, $\mathbb{E}(\text{Inf})$:

$$\mathbb{E}(\text{Inf}) = \text{Cr}(\neg S)\frac{N_S}{N_{\neg S}} = \frac{N_{\neg S}}{N_S+N_{\neg S}}\cdot\frac{N_S}{N_{\neg S}} = \frac{N_S}{N_S+N_{\neg S}} = \text{Cr}(S)$$

So, on this model, the expected number of simulated observers per person from your perspective is equal to the fraction of all observers who are simulated, which is also (given some kind of indifference principle) your credence that you are simulated.

Therefore, $\mathbb{E}(\text{Inf})<1$ no matter how many simulated and non-simulated people there are, and $\mathbb{E}(\text{Inf})$ is highest when it is least likely that you are non-simulated.
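The cancellation is easy to verify with exact arithmetic. This is a minimal sketch of the two-level model only; the function names and the population sizes are made up for illustration.

```python
from fractions import Fraction


def credence_not_simulated(n_sim: int, n_real: int) -> Fraction:
    """Cr(not S): the share of all observers who are non-simulated."""
    return Fraction(n_real, n_sim + n_real)


def expected_influence(n_sim: int, n_real: int) -> Fraction:
    """E(Inf): influence is N_S / N_notS if you are real, and 0 if simulated."""
    cr_real = credence_not_simulated(n_sim, n_real)
    influence_if_real = Fraction(n_sim, n_real)
    return cr_real * influence_if_real + (1 - cr_real) * 0


# Made-up populations: expected influence always equals Cr(S) and stays below 1.
for n_sim, n_real in [(1, 9), (9, 1), (10**6, 1)]:
    e_inf = expected_influence(n_sim, n_real)
    cr_sim = 1 - credence_not_simulated(n_sim, n_real)
    print(n_sim, n_real, e_inf, e_inf == cr_sim)
```

Using `Fraction` rather than floats keeps the equality check exact, so the identity $\mathbb{E}(\text{Inf}) = \text{Cr}(S)$ shows up as a literal `True` rather than an approximate match.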

In this sense, the less likely it is that you are not simulated, the more influence you would have if you weren’t, and here these two factors more or less cancel out. Of course this is a (sidenote: In particular, I’d be interested to see it extended to multi-level models.), but I think (sidenote: Something like: yes, credence in the prospect of digital minds might make it less likely you are not already a simulation of a human, but that can’t push down the expected value of trying to influence human simulations below some threshold.).

Taking stock

Here’s what I see as the back-and-forth so far:

Finally, I’ll note that I now think the simulation argument is a little confused, at least on Bostrom’s original framing. The core of the confusion is in the choice of ‘reference class’ from which we assume we’ve been sampled. We can’t just say “these exact experiences” because probably you are the only person with your exact experiences, so there’s no statistical argument about how most of your experiences are simulated. But then can’t I just say, “Sure, sims will have ‘human-like’ experiences, whatever that means. But they won’t be exactly these experiences. So why is it relevant that most ‘human-like’ experiences are simulated? Why am I supposed to feel forced into being uncertain whether I’m a sim or not?”. It is surprisingly unclear how to resolve this wrinkle, although ultimately much of the core intuition survives. If you’re interested, there’s a very good discussion in Joe Carlsmith’s article on simulation arguments.

Is it surprising to be outnumbered?

I hope these are valid points, but remember that the original motivation for all this discussion was a vague perplexity along the lines of “if digital minds will soon outnumber human ones, why aren’t we them?”. That is, “wouldn’t it be somehow surprising, or unlikely-feeling, if we were humans in a position to create and influence digital minds which outnumber our own?”.

Remember that all these points came from inside a particular framing suggested by Bostrom where we only consider specific kinds of sims; namely ‘ancestor simulations’. That specific focus makes it easier (possible, even) to talk about ‘indifference principles’ between different populations with relevantly similar observations. In particular, imagining that the sims in question cannot tell whether they are in a simulation makes it meaningful to ask whether we are sims.

But for reasons I’ll explain I think that might be too narrow a framing for the question we’re looking at. So unfortunately I don’t think they’re close to the whole story.

Two points here. First, just note how this narrow focus on ancestor sims sometimes gives the false impression that most sims (as in something like ‘coherent conscious digital experiences’) are ancestor sims. But Bostrom’s simulation argument doesn’t rely on that assumption or argue for it, and I can’t see why it would be true. In fact, when I imagine a world in which digital minds outnumber biological human minds, I sometimes think of emulated human brains interacting with the real world (not a simulated world), and I sometimes think of a much wider variety of minds: vast and small, discrete and tightly networked, familiar and grossly alien, interacting with one another, with humans, with the physical world, with their own various simulations. “Simulated humans in a simulated human world” is surely a small slice of that space of feasible digital minds.

Second, recall that motivating question: “if digital minds will soon outnumber human ones, why aren’t we them?”. That question is ambiguous. You could read it as (i) “How do we actually know we aren’t digital minds (sidenote: That is, specific kinds of digital minds.), despite appearances to the contrary?”. That’s the question which I think the simulation argument framing gets at. But you could also read the question as (ii) “Let’s say we’re not digital minds. Still, if we do eventually create digital minds which vastly outnumber us, wouldn’t it be kind of surprising or weird that we do in fact find ourselves as non-digital minds? And does that tell us anything about the likelihood that we do eventually create many digital minds?”. That’s the question I want to think about now.

What we have to work with here is just a vague feeling of weirdness, not a precise argument. In my head it is the same kind of weirdness that Elon Musk might feel upon waking up and looking in the mirror. Wouldn’t it simply feel kind of weird, or fishy, or surprising to look in the mirror and see Elon Musk look back? Maybe there is some upshot to this realisation for his beliefs, in particular maybe he should feel some degree less confident that things are as they seem. But let’s think about that initial sense of weirdness.

What counts as weird?

One difficulty here is figuring out exactly whether and why it is legitimate for Elon Musk to feel more surprised at his predicament than most other people. After all, you might think the chance that any living person finds themself as that person, given that they are some living person, is always roughly one in eight billion. Similarly, if you shuffled a pack of cards and found them perfectly sorted in order from aces through kings, that would feel more surprising, or weird, than some more “random-looking” deal, though no particular “random-looking” deal is more likely than the perfectly sorted deal after a fair shuffle. Every particular deal is one of $52!$ equally likely deals, after all. Yet I think it certainly would feel ‘weirder’ to deal a perfectly ordered deck, or to wake up as Elon Musk; like more of a cosmic coincidence.
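For a sense of scale, the number of possible orderings can be computed directly. A small sketch, assuming a fair shuffle so that every ordering is equally likely; the names below are only for illustration.

```python
import math
from fractions import Fraction

# Number of distinct orderings of a standard 52-card deck.
NUMBER_OF_DEALS = math.factorial(52)


def probability_of_exact_deal(deck_size: int = 52) -> Fraction:
    """Chance of any one specific ordering after a fair shuffle.

    The same number applies to the perfectly sorted deck and to any
    particular 'random-looking' ordering.
    """
    return Fraction(1, math.factorial(deck_size))


print(len(str(NUMBER_OF_DEALS)))           # how many digits 52! has
print(float(probability_of_exact_deal()))  # roughly 1 in 8 x 10^67
```

The calculation makes no distinction between the sorted deck and any other ordering, which is the puzzle: the probabilities are identical, yet only one of them feels like a coincidence.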

If this weirdness thing sounds vague, then I agree! Maybe it can be made more precise, maybe not. But I’ll point out that if this feeling of weirdness confronts you, you have a few options:

  1. Surprised acceptance. One living person in roughly eight billion just has to accept that they are Elon Musk. Presumably some people will shuffle playing cards into some kind of rare, bizarre, coincidental order. Similarly, most people don’t win the lottery, but some do, and those people don’t require much evidence to end up believing that they just got lucky — a triple-check of their winning numbers should do it. Lottery winners and Elon are kind of just forced to be surprised in this sense.
  2. Indifferent acceptance. I see this as a variant of acceptance. You accept that things are as they seem. But you also deny that you are ‘forced to be surprised’, or that this outcome is somehow ‘weird’ in any deep sense.
  3. Denial. You deny that things are as they seem, on the grounds that things would be too weird (somehow) if they were.

Often one of these options does feel more appropriate to me than another:

Notice how the first and second examples aren’t all that philosophically interesting. In both cases, you got a feeling of “huh, that’s surprising”, and then figured out what that feeling was tracking. In the magic case, you already had a strong prior that the magician was going to trick you. Once you condition on seeing that her prediction matched the card you ‘chose’, you correctly infer the most likely explanation: that she did not predict your free selection. In the poker case, on seeing your royal flush, you might upweight the possibility that the card dealing is somehow unfair, but not enough to change your mind that you got a royal flush. In both cases, you broadly know what’s going on in terms of how to update your beliefs in light of the evidence in front of you.

In the card shuffling example, your friend is correct that the chance of your particular deal was 1 in 52!, but his reaction of incredulity seems inappropriate. In particular, you see no special reason to update any of your broad beliefs about what’s going on, and no reason to doubt that you got the deal you appear to have dealt. Similarly in the stargazing example, your friend’s attitude seems less comically inappropriate (he’s gesturing at something), but you don’t see why or how you should update any of your beliefs based on his observation, and it’s unclear he’s even suggesting you should.
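To get a feel for the size of 52!, here is a minimal Python sketch. The names `N_DEALS` and `P_ANY_DEAL` are my own labels, and the only assumption is a fair shuffle, so that every ordering is equally likely.

```python
import math
from fractions import Fraction

# Number of distinct orderings of a standard 52-card deck.
N_DEALS = math.factorial(52)

# After a fair shuffle, every specific ordering has the same probability,
# whether it is the perfectly sorted deal or a "random-looking" one.
P_ANY_DEAL = Fraction(1, N_DEALS)

print(f"52! is roughly {N_DEALS:.3e}")
print(f"52! has {len(str(N_DEALS))} digits")
```

The point of the sketch is only that the number is the same for the sorted deal and for any other deal; whatever makes the sorted deal feel weird, it is not its probability.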

My question is how we should react to our apparent predicament with respect to digital minds, namely that it looks like we might eventually create very many of them. Should we think that it is somehow ‘weird’ or ‘surprising’ at all that we appear not to number among those digital minds? And if so, should we update our beliefs about that initial appearance?

Now, this question is almost exactly the same, structurally speaking, as the notorious ‘doomsday argument’: roughly, that if humanity survives a long time then we’d number among the very earliest people, and wouldn’t that be kind of weird? Again, here we have an intuitive “kind of weird” notion. If we ordered all human lives from the past to the future and assumed you are randomly selected from the order, it would be just as unlikely that you land in any particular narrow slice in the middle as among the very earliest (sidenote: Imagine humans living roughly in the middle of all human lives with no knowledge of human history beyond the last few preceding generations. They might wonder, “Some claim humanity has both a vast past behind it, and a vast future ahead of it. But if we are just as likely to have found ourselves as a human in the past or in the future, isn’t it weird that we find ourselves in this narrow slice of living humans? Isn’t that unlikely enough to reject the claim?”), but the latter would seem especially weird. Similarly, it might seem weird that we’re not digital minds if we soon make many of them, but it’s not at all clear what that is supposed to mean for whether we should believe that we will in fact make very many of them.

Rooms with buttons

Here’s another way of seeing the difference between the simulation argument framing, and the present framing:

Rooms with buttons: There are one hundred cubicles, each with a person inside, and a button on the wall. For every button, pressing the button creates one hundred identical rooms, each with a person and a button, but the buttons in those new rooms are ineffective. Otherwise the cubicles are identical, and you know that most people will press the button. You find yourself in a room facing a button. Will pressing it create one hundred people?

Of course this is supposed to mirror the original, simulation argument-derived line of thought. Here I suggest the answer is “probably not”. There are plenty of variants of this kind of thought experiment in the small literature on ‘anthropic’ principles, but this seems to me like one of the easy cases. (sidenote: Imagine a universe that consists of one hundred cubicles. In each cubicle, there is one person. Ninety of the cubicles are painted blue on the outside and the other ten are painted red. Each person is asked to guess whether she is in a blue or a red cubicle. (And everybody knows all this.) Now, suppose you find yourself in one of these cubicles. What color should you think it has? Since 90% of all people are in blue cubicles, and since you don’t have any other relevant information, it seems you should think that with 90% probability you are in a blue cubicle.)

But then comes the obvious point that most digital minds will probably be able to identify themselves as such. That suggests something like:

Rooms with painted walls: The Creator flips a fair coin. If it comes up heads, she creates ten cubicles, each with the inner walls painted visibly blue. In each cubicle, there is one person. If the coin comes up tails, she creates one thousand cubicles. The walls of ten are painted visibly blue, and the remaining 990 are painted visibly red. You wake up in a cubicle and see its blue walls. What’s the chance the coin came up heads?


Or, making the example feel more relevant:

Rooms with buttons and painted walls: The Creator flips a fair coin. She creates one hundred cubicles painted visibly blue on the inside, each with a button on the wall. If the coin comes up heads, she rigs the buttons so that when pressed, one thousand cubicles are created, each painted visibly red. If it comes up tails, the buttons do nothing. You know that basically everyone presses the button, whether or not it works. You wake up in a cubicle and see its blue walls and the button. What’s the chance the button works?

In this example, there are more ‘observers’ in the case where the buttons work, but the observers can totally tell if they are in a room created by a button or not. This is like the more-general-but-more-hand-wavey second framing I discuss, in which you’re confident you’re not simulated, but still worry it would be somehow fishy if you were in the vanishing minority of non-simulated observers (and less fishy if there just aren’t ever many digital minds). It is also an analogue of the Doomsday argument.

This case is less straightforward. There is an intuitive case for an answer of 50% (there are the same number of observers in rooms with buttons on either heads or tails, and heads and tails are equally likely), and an intuitive case for “very likely tails, i.e. very likely the buttons don’t do anything” (it is far more likely I’d find myself in a room with a button, rather than one without, if the coin came up tails). I won’t go into details here, but I think Joe Carlsmith’s essay on the different assumptions which give different answers (‘SSA’ vs ‘SIA’) is great.
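The two intuitive answers for the buttons-and-painted-walls case can be put into numbers with a rough Python sketch. It rests on assumptions that go beyond the thought experiment as stated: every one of the one hundred occupants presses, each working press creates its own thousand red cubicles, and each red cubicle holds one observer. The function and variable names are mine.

```python
from fractions import Fraction

BLUE_ROOMS = 100               # original cubicles, each with a button
RED_PER_PRESS = 1000           # red cubicles created by one working press
PRIOR_WORKS = Fraction(1, 2)   # fair coin

def world_counts(buttons_work):
    """Return (blue observers, all observers), assuming everyone presses."""
    red = BLUE_ROOMS * RED_PER_PRESS if buttons_work else 0
    return BLUE_ROOMS, BLUE_ROOMS + red

def posterior_works(weight):
    """P(buttons work | blue walls and a button), for a given weighting rule."""
    w_works = PRIOR_WORKS * weight(*world_counts(True))
    w_inert = (1 - PRIOR_WORKS) * weight(*world_counts(False))
    return w_works / (w_works + w_inert)

# Weight each world by the NUMBER of blue-room observers (same either way).
by_count = posterior_works(lambda blue, total: Fraction(blue))
# Weight each world by the FRACTION of its observers who are in blue rooms.
by_fraction = posterior_works(lambda blue, total: Fraction(blue, total))

print("count-based:", float(by_count))
print("fraction-based:", float(by_fraction))
```

Under these assumptions the count-based rule leaves the fair-coin prior untouched, while the fraction-based rule makes working buttons very unlikely, which is the split between the two intuitions described above.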

Also note that these questions can be decision-relevant: there are cases where your answer matters for what you should decide to do. Consider:

Saving the cubicles: The Creator flips a fair coin. If it comes up heads, she creates ten cubicles, each with the walls painted visibly blue, each with one person inside. If the coin comes up tails, she creates 1,000 cubicles. The walls of ten are painted visibly blue, and the remaining 990 are painted visibly red, again a person in each. You wake up in a cubicle and see its blue walls. Then you hear an announcement, which you know would have been played in either outcome: all the cubicles except yours will soon be destroyed. You can choose to save either all the blue-painted cubicles, or half of all the red-painted cubicles, if there are any. Red and blue cubicles count the same for you — all you care about is saving the most expected cubicles. Which colour cubicle do you save?

The so-called ‘self-sampling assumption’ (SSA) asks that you reason “as if you were a random sample from the set of all [actual] observers”. That puts a 99.01% probability of the coin having landed heads, and so 0.99% on tails (by Bayes’ rule). The expected number of red cubicles saved is therefore $0.0099 \cdot 990 \cdot 0.5 \approx 5 < 10$, so SSA recommends saving the blue rooms.

The so-called ‘self-indication assumption’ (SIA), roughly speaking, asks that you reason as if you were a random sample from among all possible observers. There are equally many blue rooms on either outcome from the coin flip, so conditioning on seeing blue gives 50% on each of heads and tails. The expected number of red cubicles saved is therefore $0.5 \cdot 990 \cdot 0.5 \approx 250 \gg 10$, so SIA recommends saving the red rooms.
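The SSA and SIA calculations for ‘Saving the cubicles’ can be reproduced with a short Python sketch. It follows the text in counting ten blue cubicles as saved by the blue option; the dictionary layout and function names are my own, and the SSA reference class is assumed to be all cubicle occupants in the actual world.

```python
from fractions import Fraction

PRIOR = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}  # fair coin
BLUE = {"heads": 10, "tails": 10}    # blue cubicles in each world
RED = {"heads": 0, "tails": 990}     # red cubicles in each world

def posterior(likelihood):
    """Bayes' rule over the coin outcome, given that I see blue walls."""
    unnorm = {w: PRIOR[w] * likelihood(w) for w in PRIOR}
    total = sum(unnorm.values())
    return {w: p / total for w, p in unnorm.items()}

def ssa_likelihood(world):
    # SSA: chance that a random actual observer in this world sees blue.
    return Fraction(BLUE[world], BLUE[world] + RED[world])

def sia_likelihood(world):
    # SIA: weight the world by how many observers in it see blue.
    return Fraction(BLUE[world])

def expected_saved(post):
    """Expected cubicles saved by each option, as (save blue, save half of red)."""
    blue = sum(post[w] * BLUE[w] for w in post)
    red = sum(post[w] * Fraction(RED[w], 2) for w in post)
    return blue, red

for name, lik in [("SSA", ssa_likelihood), ("SIA", sia_likelihood)]:
    post = posterior(lik)
    blue, red = expected_saved(post)
    print(name, "P(heads) =", float(post["heads"]),
          "| blue:", float(blue), "| red:", float(red))
```

Using exact fractions keeps the posterior at 100/101 rather than a rounded 99.01%, so the comparison between the two options does not depend on rounding.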

(Image: ‘The red room’. Caption: “Get it?”)

Again, there is already a rich discussion on which anthropic principle to use in cases like this (and some discussion about how similar questions may be ambiguous or confused somehow). So I won’t argue for a particular view at any length.

But think about how these thought experiments are relevant for digital minds. What would these two different perspectives have to say? Assuming there are roughly as many non-digital humans in worlds where we eventually make many digital minds versus worlds where we don’t, then on finding that you appear to be a not-digital human, SIA just doesn’t take that as evidence that you are more or less likely to be in a world which eventually gets filled with non-human observers. So all SIA cares about is the relative number of observers or ‘observer moments’ just like your own in either outcome.

SSA, by contrast, says you should reason as if you are a random sample from a set — a ‘reference class’ — of observers in whichever world materialised. In other words, it updates your prior on what world you’re in in proportion to the fraction of observers like you in some reference class in a given world; rather than the number of observers like you. So if the reference class contained digital minds as well as humans, then observing you are human would count as a major update against being in a world where we eventually create many digital minds. But notice SSA needs to answer a question which SIA doesn’t, which is what is this reference class and how am I supposed to know what it contains? In particular, why should it include many kinds of digital minds, which might be very different from our own? If you want to avoid the awkwardness of answering that question, that’s one reason to go in for SIA, which doesn’t take the observation that we are human as evidence either way.
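Stated compactly, the contrast between the two update rules looks something like the following. The notation is mine rather than standard: $w$ is a candidate world, $E$ is the evidence that I am an observer like this one, $n_E(w)$ is the number of observers in $w$ who share that evidence, and $N_R(w)$ is the size of the chosen reference class in $w$ (assumed to include those observers).

```latex
% SSA weights each world by the FRACTION of its reference class with my evidence;
% SIA weights each world by the NUMBER of observers with my evidence.
\[
P_{\mathrm{SSA}}(w \mid E) \;\propto\; P(w)\,\frac{n_E(w)}{N_R(w)},
\qquad
P_{\mathrm{SIA}}(w \mid E) \;\propto\; P(w)\, n_E(w).
\]
```

If $n_E(w)$ is about the same in worlds with and without many digital minds, the SIA posterior stays at the prior, whereas the SSA posterior drops in whichever world has the larger $N_R(w)$, and that depends entirely on whether digital minds are counted in the reference class.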

Again, there is much more to this discussion, and I just see myself as mapping the question about whether we should be surprised that we’re not digital minds to these existing questions about anthropics, rather than trying to answer those existing questions. But if you’re interested in my take, then I lean towards SIA, the view that observing we are human shouldn’t count as a major update against the possibility of being in a position to create very many digital minds.

Assume we are in such a position — and SIA allows us to think it’s likely, unlike SSA — should we view our position with surprised acceptance or indifferent acceptance? I think surprised acceptance. Although we might assume there are roughly as many human-like observers whether or not we eventually create very many digital minds, it is still a remarkable, potentially very influential, and — yes — weird predicament to find ourselves in. But I think it is less like being temporarily misled by a magic trick, and more like winning the lottery.

That’s my best guess, but I’m unsure. That feeling of weirdness lingers, and I still feel a pull to modify my actual beliefs until my predicament feels less vaguely weird, though thinking through possible sources of that weirdness feeling pushes me back towards surprised acceptance and away from denial.

Summing up

In the first discussion based on the simulation argument, we were asking a question that remains even if we’re confident that ‘real’ humans do create many digital minds. The question raised by the simulation hypothesis (i.e. the conclusion of the argument assuming that sims are created) is where to locate ourselves in such a world, and in particular whether we should expect to find ourselves in a simulated world. Then I’m asking what that means for our prospects of being able to ourselves influence other simulated humans, and suggesting that it’s complicated, but this line of thinking should probably limit our confidence that we are non-simulated humans vastly outnumbered by simulated humans with observations relevantly similar to our own. And that’s relevant to the extent we might (eventually) worry about preventing such ‘ancestor’ simulations, or improving how they are handled. But I also think this whole argument is shakier than it seems, because it’s unclear what counts as ‘relevantly similar observations’.

But we don’t need to be actually unsure whether we ourselves are simulated to think that, being confident that we’re not digital minds, the suggestion that we eventually create vastly many digital minds would need to explain what now seems like a kind of cosmic coincidence that we’re not one of them. In other words: “isn’t it fishy that we’re not digital minds, assuming we’ll make them?”. And that could be another distinct reason to doubt that we’ll eventually create many digital minds.

That second thought is structurally similar to the Doomsday argument, which is largely a question about which view to take on questions involving ‘anthropic’ selection effects. Those are deep and murky waters, but the view I find most elegant and least problematic (the ‘self-indication assumption’) suggests we are back where we started, relying on our prior guess before we decided to use our ‘appearing to not be digital minds’ as evidence.

In fact, feeling “back where I started” is how I feel overall, having written this. I do think that philosophical considerations like those above can often point in surprising and action-guiding directions, and I don’t as a rule think that philosophising always means taking a round trip, or “adds up to normality”. But while some of the arguments here are interesting and fun, they are not at all conclusive, and my takeaway is to pay most attention to more concrete arguments for (in this case) the possibility of digital minds. And, although I didn’t discuss them here, I think those arguments are very compelling.

