No ghost in the machine

Published 1 December 2025


Introduction

The illusion is irresistible. Behind every face there is a self. We see the signal of consciousness in a gleaming eye and imagine some ethereal space beneath the vault of the skull lit by shifting patterns of feeling and thought, charged with intention. An essence. But what do we find in that space behind the face, when we look? [Nothing but] flesh and blood and bone and brain. I know, I’ve seen. You look down into an open head, watching the brain pulsate, watching the surgeon tug and probe, and you understand with absolute conviction that there is nothing more to it. There’s no one there. It’s a kind of liberation.

— Paul Broks, Into the Silent Land (p. 17)

(sidenote: Major credit to Joe Carlsmith for sharing or inspiring many of the points here, through his writing and in conversation. Errors remain my own, don’t assume he endorses any of this, and so on.) Here’s a thought: AI systems will soon have (or already have) the kind of properties that bear on whether we should treat them as mere tools, or more like moral patients. In particular, it will become increasingly apt (at least, not insane) to ascribe welfare to some AI systems: our choices could make some AI system better-off or worse-off.

The thought continues: conscious experience obviously matters for welfare. We lack good theories of consciousness, but — like “temperature” as it was understood in the 1600s — we understand that consciousness is a real and obvious feature of the world; and there are facts about which things have it, and to what degree.

But consciousness is also a private thing. A given conscious state isn’t essentially connected to any particular outward appearances: someone could be in terrible pain without showing any outward sign of it, and something could behave exactly as if it were in pain while feeling nothing at all. And so on.

The thought goes on: if there comes a time when AI systems in fact are conscious subjects, welfare subjects, making real and serious moral claims on us if only we understood them — we’ll remain deeply ignorant about whether they are, and what is truly going on on the inside.

More pointedly, they might present either as gladly subservient, or not conscious at all, but inwardly and privately they might be having a very bad time. (sidenote: For more on the ‘stakes’ at play with AI welfare, this recent post by Joe Carlsmith is excellent.)


(Image) Neurons
Sketches of cortical neurons by Santiago Ramón y Cajal, c. 1899 • Source

In this post, I cast my vote for a particular, and somewhat unpopular, stance on the thoughts I lay out above. You could call it the “deflationary”, “eliminativist”, or “strong illusionist” view about consciousness. It’s the view that the puzzle of explaining consciousness, properly analysed, is just another research program for the natural sciences; analogous to uncovering the mechanisms of biological life. There will turn out to be no “hard problem”, though there is a question of explaining why so many come to believe there is a hard problem.

It is a frustrating view, because (by the analogy with biological life) it casts doubt on the hope we might someday uncover facts about which things are “truly” conscious, and to what degree. But I think it’s also a hopeful view, because in contrast to every other view of consciousness, it shows how questions about machine consciousness can and likely will be solved or dissolved in some mixture. It will not confound us forever.

But that’s jumping ahead!

The realist research agenda

Here’s one approach to making progress on questions around AI welfare. Let’s call it the “realist agenda”, because of the implicit commitment to a “realist” stance on consciousness, and because (conveniently for me) anything called an “agenda” already sounds shady. It’s a caricature, but I’d guess many people buy into it, and I’m sympathetic to it myself.

I’m describing it so I have something to react against. Because, ultimately, I think it doesn’t quite make sense.


Here’s the plan:

  1. Advance the scientific and philosophical program(s) to identify which kinds of systems, functions, computations, brains etc. are in which conscious states; gradually increasing the accuracy and confidence of our discernment beyond our current system of (more or less) intuitive guesswork barely constrained by empirical knowledge.
  2. Devise tests for valence — a property of conscious states which is either negative or positive. Understand how to build increasingly accurate “valence probes”. Like plunging a thermometer into a roast chicken to check its temperature, a valence probe trained on a (candidate) mind (person, animal, or digital) tells you whether there are conscious experiences at all, and how intensely good or bad they are. They could involve interpretability techniques in the AI case.
  3. Based on this work, figure out methods to reduce negatively valenced conscious experiences in the AI systems we’re making (and even promote positively valenced experiences), and figure out ways to design and train systems where the balance of positive over negative valence is naturally higher (including during the training process itself).
  4. Implement these methods.

Pursued to the letter, I don’t think this plan quite makes sense (sidenote: Because it’s confused, not because it’s hard. I’m not faulting the plan for being ambitious: theories of consciousness are very nascent, and one imagines a truly mature science of consciousness weaving together some very sophisticated strands of neuroscience, cognitive science, psychology, and so on; perhaps only after some heavyweight empirical and conceptual breakthroughs. I’m saying the plan might also just turn out to feel a bit confused.). The reason is that (1)–(4) assume something like realism about consciousness (and hedonic welfare). I very much don’t think it would be worse than nothing if people did AI consciousness research with what I’m calling “realist” assumptions. Here’s a (presumptive) analogy: geocentric astronomers like Tycho Brahe collected observations, built theories, designed and improved their instruments, and made it easier for their successors to eventually shrug off the geocentric core of their theories. Still, if geocentrism could have been corrected earlier, that would have been better (sidenote: Not least because it would have “unclogged” a discipline filling up with literal epicycles and kludges. Anecdotally, I get a sense from some who think about AI welfare that we’re bound to remain very deeply confused by the time we need to make calls about AI consciousness, and we’ll have to muddle through with methods and policies under near-total uncertainty. I think a non-realist has grounds to be slightly less pessimistic than that.).

But I’m going to drop the AI thing for now, and just talk about views of consciousness in general.

Physicalist realism is intuitively appealing

Now, here are some natural thoughts about consciousness in general. They’re the kind of thoughts that might motivate the “realist agenda” above. I’ll write them in first-person but note I’m not eventually going to endorse all of it.

You can skip to the next section if you don’t need a reminder of why you might buy into the “realist agenda” above.


Clearly, there are deep facts about which things are conscious. I know this, because I know — more than anything else in the world — that I am conscious. I know the fact that I am conscious is “deep” rather than just linguistic, or somehow theory-dependent, or whatever, because I can’t be mistaken about it. When I smell coffee, there is some conscious percept, quale, raw experience which is the smell of coffee. The smell of coffee does not involve the belief about coffee being nearby, which I could be mistaken about. But it can’t “only seem” like I’m experiencing the smell of coffee — for me to experience the smell of coffee just is that seeming. I also have no reason to believe I’m special. So if there’s a deep fact that I am conscious, there are deep facts about which other things are conscious; facts branded into the universe.

Here I mean “deep” in the sense of not superficial. We can easily be misled by behaviours, appearances, patterns of processing. Deep in the sense of discoverability — under the surface of easily observable facts, there lurk facts which, once we figure them out, will turn out to be undeniably true. Facts like, “this substance is a metal, that one only seems to be”, and “this cognitive system is actually conscious; that one only seems to be”.

I also mean deep in the sense of intrinsic, rather than especially context-dependent. Show me a lump of gold-looking metal. Ask: is this valuable? I’d say that’s not a deep question, once you get precise about what you’re asking. Do you mean “can I sell this for money”? There’s no mystery about how to find out. Instead, you might ask: is this really gold? I’d say that is a deep question, in the sense I have in mind. Fool’s gold (pyrite) looks a lot like real gold, but it isn’t. There was a difference between real and fool’s gold waiting to be discovered: whether the substance is made up of a chemical element with 79 protons in its nucleus.

There are deep facts about which things are ducks. The duck test says: “If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck”. True. But some things fail the duck test, like coots, grebes, and robot ducks. There’s a deeper fact to discover about a duck-candidate (sidenote: This is actually an imperfect example, because “duck” isn’t a taxonomic rank. But ornithologists agree: all ducks belong to the Anatidae family.): does it have duck DNA?


(Image) Duck
Jacques de Vaucanson’s “Digesting Duck” (1739) • Source

Similarly, facts about consciousness are “extra” facts over and above other appearances and other known physical quantities. Because we can know, for instance, the weight or shape of a brain and remain unsure about whether it’s conscious, and so on for other physical aspects we use to describe the non-conscious physical world. I won’t try too hard to pin down what I mean by this “extra” thing, but there’s a sense that consciousness sometimes “arises” or “comes along for the ride” or “attaches to” or “contingently happens to be identical to” certain physical things.

Now, I can reasonably infer that other people experience these raw and unmistakable conscious experiences, because they keep telling me about them, and their descriptions line up with my own experiences, and (like I said) I have no reason to believe I’m special (sidenote: Of course I can doubt that anyone else is conscious in principle, but I’m just trying to make reasonable inferences about the world based on simple explanations; that my brain isn’t magically special seems like a pretty solid bet.).

Ok, so there are facts of the matter that living and awake people are conscious, and these facts are deep and unmistakable and kind of “extra” in some way. What about other candidates for consciousness? That seems like a bigger inferential leap. What we’ll need is a theory of consciousness which tells us which things are conscious in which ways, and ideally also explains why.

Ok, what shape should this theory take? Well, people disagree about the metaphysics of consciousness. Some people are dualists; they think that mind or consciousness is somehow fundamentally non-physical. But I prefer a scientific attitude, and an attitude which is strongly disposed to explain things we observe in terms of arrangements of physical things, rather than entities which are entirely outside of our best understanding of basic physics, like souls or divine spirits. Of course it’s totally understandable why many choose to believe that the light of consciousness can survive brain death, or our brains are somehow receiving antennae for thoughts floating in the ether, but that’s totally unscientific. So, although I’m open to being surprised, it seems fairly clear that the right theory of consciousness ends up being a physicalist, or materialist one.

What would that program look like? How does it make progress? Crucially, we can’t observe this thing directly — that’s why there is such a confounding question about which things have internal lives. But we can take the approach the empirical sciences take whenever they try to construct a theory around some entity which can’t be directly observed; like a doctor with crude tools theorising about the cause of a mysterious disease. We can look to the correlates of the conscious experiences which we humans report on, in terms of the brain regions which fire when we smell coffee, and so on. And we can figure out how to extend those theories to animal brains which are increasingly different from our own. Ok, fine.

Over in the theoretical wings, we can imagine strides on the explanatory side — why do these patterns of brain activity correlate with these experiences? Perhaps because they are embodying certain kinds of function, or specifically certain computations, which cause those experiences. Or maybe those experiences just are what certain computations feel like “from the inside”. Something like that.

Of course, we’ve not made much progress on these questions. We haven’t yet boiled down the ingredients of those raw, intrinsic feelings of seeing red, smelling coffee, feeling pain. We are very far from the dream of knowing exactly which things in this world are experiencing which experiences. But these are the prizes. The answers are knowable in principle, because they are deep facts about the world.


I think this is something like the train of thought that many self-consciously scientifically minded folks run through when they wonder about what’s up with consciousness, philosophically speaking. It’s the kind of respectable, appropriately humble, but not overly ‘woo’ view that motivates the agenda on figuring out AI welfare which I describe above. It’s the view I had until recently. Joe Carlsmith calls the views I’m describing something like “validating physicalism” (sidenote: Quoting Carlsmith: “We think about consciousness as this deep extra fact, like from this physical system did bloom this internal extra experience. So validating physicalism says that somehow that’s still true, even though it’s also identical with a physical process.”) I’ll call it “physicalist realism”: the view that there are deep facts about phenomenal experience, and they are analysable in physical terms.

Physicalist realism is surprisingly confusing

Unfortunately I think this view is basically untenable.

In particular, I think it’s unstable between a more thoroughgoing skepticism about consciousness being real or special or “extra” in any meaningfully deep sense, on the one hand; and on the other hand a frankly non-physicalist view such as substance dualism (sidenote: For the time being I’m not going to consider these views very much. That’s because I assume most of my readers are already unsympathetic to substance dualism, and I currently find the other side of the dilemma much more compelling.).

Perhaps a way to see this is to really ponder what it would mean for these two things to be true:

  1. There are deep facts of the matter about the ‘true’, ‘raw’, or ‘phenomenal’ contents of experiences, beyond just the things we say, judgements we form, ways we act, etc.
  2. These are physical facts.

Taking heart from the success of the natural sciences at explaining other once-mysterious phenomena (sidenote: “Life” being one example I’ll consider. But also: electricity? Gravity? Divine revelation? Spirits and ghosts?) so far, physicalist realism surmises that these ‘phenomenal’ or ‘subjective’ experiences — these things we are searching for and trying to explain — also just are physical things. A conscious state or process is somehow just some particular pattern of brain activity; like how the concentric ripples of a stone dropped into a still lake just are a whole lot of molecules of water interacting with one another. But that view begins to feel increasingly weird and ephemeral when we ask how it could be true in any detail.

Consider what it means to say that “the experience of feeling the pain of searing heat on one’s hand just is a particular kind of (say) process in the brain”.

Is physicalist realism saying that we can analyse the pain as whatever brain process causes you to pull away your hand from the hot thing, and yell out? Well, not literally — you might feel intense pain, but not yell out or pull away your hand. And intuitively, you yell out because it’s painful, not vice-versa.

Can physicalist realism analyse the pain in terms of the judgements we form about the pain, like forming new beliefs that I should try to avoid doing this again, that I’m an idiot, that I need to go to hospital, and so on? Again, physicalist realism is trying to pin down the nature of raw, intrinsic pain. You might think this is totally missing the central reality of raw pain. If there are deep facts about which states are intrinsically painful, it can’t be true that if only you could stop forming downstream beliefs about it, then the pain would go away. Beliefs aren’t intrinsically painful!

What about some more complicated bundle of associations? Can the physicalist realist say that, to explain pain, we just have to explain why you are yelling out, pulling your hand away, forming new beliefs and judgements about how this was a terrible idea, and so on? In other words, can they say that if you do all these kind of things, then that’s just all there is to pain? Is it fine — even appropriately modest — to pin a proviso on the exact bundle of associations between raw pain and physical things, but insist there is some bundle of physical things which are identical to raw pain?

No!

If the most obvious candidates (sidenote: Like the concepts we have from the cognitive sciences.) seem entirely inadequate, and it’s unclear how they could combine into a more complex theory which is adequate, I think the physicalist realist is in a weak and confusing position. If there are deep facts about conscious experience, it really feels like any possible candidates for what might make up an analysis of conscious experience in physical terms don’t penetrate the essential reality of conscious experience; they just dance around it.

We could look to less obvious ingredients of physical explanation, like microtubule quantum vibrations or whatever. But if the obvious candidates for physical explanation seemed so clearly doomed, I don’t see any reason to hold out hope that these exotic candidates will eventually make sense as explanations.


The Catholic view on the eucharist is that the bread wafer literally becomes the body of Christ by transubstantiation; though in its “accidental” properties it remains bread. To the Catholic, the wafer’s essentially and literally being the body of Christ is a fact surplus to the ‘accidental’ facts of the wafer, for example, its still looking and tasting like bread.

When I was a kid in Sunday school, I felt blindsided by this insistence on the literal identity of Christ’s body and a piece of bread. I think the reason was that this insistence isn’t connected to any particular analysis of Christ-hood and bread, and so doesn’t produce any particular predictions or explanations. I couldn’t rule out transubstantiation, partly because I was so unclear on what it meant, if not any particular notion of physical identity I was familiar with. The best I could do was to throw up my hands and admit it sounded absurd: to stare incredulously.

To the realist physicalist, there are deep facts about conscious states, but they seemingly can’t be connected to any of the familiar (and physically analysable) cognitive concepts we have, like beliefs, desires, reflexes, dispositions, and so on. As I see it, there is just some provisional hope that conscious states can be analysed in terms of some physical properties or processes or whatever. The best I can offer, again, is an incredulous stare.

Reconsidering realism

If you are confident in the reality of transubstantiation, then presumably there is some answer to what it means for the wafer to literally be the body of Christ (sidenote: Even if the answer is that some parts of transubstantiation are essentially mysterious. Though I don’t think physicalist realists also want to believe that the relationship between experience and substance is essentially mysterious.). If there are deep facts about consciousness, and we insist on being physicalists about consciousness, then superficial attempts at physical explanation aren’t enough. That leaves us with a head-scratcher, about what a ‘deeper’ but nonetheless still physical and scientific explanation looks like. But such an explanation is presumably quite radical or revisionist with respect to our prevailing concepts.

Alternatively, we could drop the assumption that there are relevantly deep or “extra” facts about conscious experience at all. Call these “non-realist” views. Non-realist views don’t cry out for radical or revisionist theories, because they set a more modest challenge: explain the superficial stuff only. But it could turn out that the only way to explain the superficial stuff (talk about consciousness, pain-like reactions, etc.) does nonetheless require some wacky new concepts.

A table will help. These are the options I see for the physicalist who believes that consciousness can be explained scientifically.

→ What do we need to explain? Either realism (there are deep facts about consciousness to explain) or non-realism (there are no deep facts about consciousness to explain).
↓ What explanations do we have? Either deep explanations (notably radical or revisionist new, but nonetheless still scientific, insights) or superficial explanations (continuous with our existing conceptual toolkit).

(1) Realism + deep explanations: What are these insights? As far as I can tell, no good ideas, and no reasonable attacks are forthcoming. Any candidate must break with current paradigms. In this respect, unlike virtually any other unexplained phenomenon in the world.

(2) Non-realism + deep explanations: A strangely unmotivated view to reach: if there are no deep facts about consciousness, then it’s very unclear why we need radical or revisionist new insights to explain them. Non-starter.

(3) Realism + superficial explanations: Not tenable, because it seems like for any candidate explanation of consciousness, we can imagine deep facts about consciousness varying with respect to the explanans.

(4) Non-realism + superficial explanations: Tenable, but counterintuitive. Perhaps doesn’t take consciousness seriously enough: we can explain consciousness-related talk and behaviour, but we’re left with a feeling that we’re only talking around the key thing.

No option here is safe; every option is some flavour of really quite weird. Still, we can compare them.

Option (2) seems strictly less plausible than option (4), so let’s rule that out.

Option (3) is trying to have its cake and eat it. The cake is “taking consciousness seriously” as a real and deep phenomenon. The eating it is hoping that we can nonetheless explain it in terms of a non-radical extension of psychology, neuroscience, the cognitive and behavioural sciences, and so on. The result is that we try to give non-radical standard scientific explanations of different chunks of what we associate with consciousness as a deep phenomenon: why we behave the way we do when we’re in pain, the brain circuitry of vision, etc. But if the realist insists that there’s always a deeper thing to explain, beyond those superficial explanations, some cake always remains uneaten!

Option (1) has some precedent in the history of science. Atomic line spectra, blackbody radiation and the ultraviolet catastrophe, the photoelectric effect, and so on, actually did call out for a radical extension to the physical sciences in the early 20th century (quantum theory). One issue is that it’s not at all clear what such a radical extension would look like. But a more serious issue is that it’s hard to imagine any physical theory satisfying the realist who believes in certain deep facts about consciousness. When quantum physics was being worked out, physicists proposed atomic models that would explain some phenomenon if true, but the models just turned out to be substantially wrong. In the case of consciousness, what new kind of scientific theory or model would possibly explain something like “raw and intrinsic subjectivity” in a satisfying way, whether or not it turns out to be true? So I submit that (1) is not an appealing option at all.

That leaves (4). The reaction to (4) is that treating consciousness just as a bundle of behaviours and brain patterns and so on, which don’t require especially radical explanations, is wilfully obtuse. In other words: if we want to insist on a superficial explanation of subjective consciousness, then we have to show that we’re not forced to explain more than the superficial facts about consciousness, despite the overwhelmingly forceful observation each one of us has access to, that I am absolutely and unambiguously conscious. That is, the question is whether we needed to be realists (in the sense I’m considering) in the first place.

And here I think there is a surprisingly strong case that there are no deep facts about consciousness to explain after all. Here are some reasons why.


(Image) Neurons
Neurons and glial cells • Source

Debunking and the meta-problem

The meta-problem of consciousness is interesting not least because it is hard to avoid taking a position that others regard as crazy.
David Chalmers

The meta-problem of consciousness is, roughly, the problem of explaining why we think that there is a (hard) problem of consciousness; or roughly why we’re inclined to be realists about consciousness.

The “meta-problem” of the Riemann hypothesis is the question of why people think the Riemann hypothesis poses a genuine problem. The best simple explanation is that it actually is a genuine problem: it is a well-posed and consequential question with an answer which is unknown despite much effort. If it wasn’t a well-posed problem, mathematicians wouldn’t think it is one. Of course, answering the meta-problem of the Riemann hypothesis doesn’t really teach us anything interesting about the Riemann hypothesis!

Similarly, we could say that the best explanation for why people think there is a hard problem of consciousness is because there actually is a problem of consciousness which is well-posed, unknown, and hard. And if there wasn’t one, people wouldn’t think there is one.

By contrast: we could say the “meta-problem” of the Cottingley fairies is why, in the early 20th century, many people (sidenote: Including Arthur Conan Doyle!) came to wonder about how and why fairies were living in Cottingley, UK. The answer is that they had seen hoax photographs, and what looked like fairies were really artfully decorated cardboard cutouts. And seeing this, we realise that there never was a meaningful question of how and why fairies were living in Cottingley, UK; the question falsely presumed there were any fairies.

We can learn a lot about the “hard” problem of consciousness by asking why people think it’s a problem. I claim we should expect, in principle, to be able to explain the meta-problem of consciousness without invoking consciousness itself (sidenote: In the canonical presentation of the meta-problem, David Chalmers talks about “topic-neutral” answers to meta-problems: explanations which don’t need to invoke the phenomenon which is apparently problematic (but don’t necessarily deny it). That’s what I have in mind here.).

Why think this? If you are a physicalist, you think that we can in principle explain why people say stuff, and write stuff, and behave in certain ways, and so on, all in terms of well-understood physical things like neural firings and sensory inputs, and perhaps more abstract concepts built on top of them like ‘beliefs’ and ‘dispositions’; but not the consciousness phenomena which people puzzle over.

Well, here are some things people do: they report on their experiences, they insist those experiences can’t be fully captured in physical terms, and some of them write entire books about the hard problem of consciousness.

Why did David Chalmers write about why consciousness is deeply puzzling? Maybe it is deeply puzzling — we’re not currently assuming that it is or isn’t. We’re asking whether we can give a consciousness-neutral account of why he wrote all those words. If David Chalmers’ writing is the product of keystrokes which are the product of complex nerve firings and brain goings-on, then I presume we can give a full account of his brain goings-on without explicitly invoking consciousness. Suppose we recorded a perfect 3D scan of Chalmers’ brain and its environment running over his entire life. Suppose you knew all the physics and chemistry of neurons and other brain stuff. If you had a few decades to millions of years to kill, you could flip from frame to frame, and ask: do I understand why this brain activity happened? In some cases, there will be a satisfying high-level explanation: the hunger neurons are firing because he hasn’t eaten in a while. As a fallback, though, you could always just trace every minute pattern of brain activity.

Now, there will be times when Chalmers starts pondering the hard problem of consciousness. What does the brain scan look like? Perhaps the neural activity that arises breaks known physics; as if moved by spirits or a soul. Maybe, but: why think that? And, moreover: the physicalist realist expressly does not think this will happen! Perhaps the neural activity is physically explicable, but we don’t yet know the relevant physics. Again: why think that? What would that look like? I’m tempted to press this point, but I hope most readers see what I’m saying. I don’t expect physicalist realists think the missing explanations of consciousness make different predictions from known physics about how individual neurons operate.


(Image) Neurons
Purkinje neurons • Source

So, ok, if you’re a sensible physicalist, you can in principle explain (and even predict) why a philosopher wrote a book about the hard problem of consciousness in terms which don’t invoke consciousness. What should we make of that?

I think we should react in the same way we naturally react when we learn why people were puzzled by the question of why fairies were living in a town in the UK. Because we can explain their puzzlement without assuming there were fairies, we have debunked the view that there were any real fairies. (sidenote: By the way: Chalmers thinks there is a hard problem of consciousness. I think it’s commendable and impressive that he also lays out the most compelling argument for anti-realism about consciousness I know of, and he — a dualist — totally owns the weirdness and trickiness of how it is that everything he writes about consciousness has a physical explanation!) Spelled out for consciousness, the debunking argument goes:

  1. There is a correct explanation of our realist beliefs about consciousness that is independent of realism about consciousness being true (sidenote: Roughly in the sense that we’d form the same beliefs whether or not they were true.).
  2. If there is a correct explanation of our realist beliefs about consciousness that is independent of realism about consciousness being true, those beliefs are not justified.
  3. (Therefore) our realist beliefs about consciousness are not justified.

Anyway: I think this is a very powerful argument. If we can ‘debunk’ realist beliefs about consciousness, do we have any other evidence or reasons to be realists?

We might in principle. It could be that consciousness is real, but by coincidence we form beliefs about it by mechanisms which don’t depend on consciousness. Maybe, but that seems to me like a bizarre view.

You could also reasonably maintain a belief even if you think you can debunk why everyone else believes that same belief. The Pope can reasonably assume that every other living person who believes they are the Pope of the Roman Catholic Church is deluded, except him. In the case of consciousness, that could matter if you think the overwhelmingly strongest reason to believe in deep facts about consciousness comes from your own conscious experience, not the testimony or arguments of others. I share this intuition pretty strongly, but I think it’s missing the point. The debunking argument above applies just as much to my own beliefs — and your own beliefs — as anybody else’s.

Debunking arguments are fragile things. There’s a kind of overzealous debunking, which says that because we can explain Q-related phenomena, like beliefs about Q, without directly invoking Q itself, then Q isn’t real. For example, you’re likely currently reading words on a screen. I could say: you think there are words on that screen, but really they are just ensembles of pixels. You think the words move when you scroll, but really there is no movement, just the same fixed pixels cycling very fast between light and dark. This is an eye-rolling kind of reductionism. Some phenomena can be perfectly real and also just be arrangements of more basic things (sidenote: Are countries real? Groups of people? Individual people? I’m perfectly happy with the common-sense answer here. I’m not trying to sound edgy in a first-year philosophy seminar.) When I say: “the words are just pixels”, you say: “sure — pixels which make up words. We’re talking about the same thing from two angles.”

Rather, the kind of debunking I have in mind needs to establish that beliefs about the concepts or things in question are unjustified; not reliably tracking the truth. This would be the case if, for example, there’s a way of explaining how people come to form beliefs about some thing Q in totally Q-neutral terms. Say you come to believe that aliens visited Earth yesterday because you saw a convincing photo, but later you learned the photo was AI-generated. The explanation of why you formed the belief turns out not to rely on any aliens, so the belief is debunked. (sidenote: The physicalist realist could press that whatever physical facts explain consciousness-talk (like why people believe there is a meta problem) actually won’t turn out to be consciousness-neutral, once we have a correct and complete account of consciousness. I do think this is an interesting line to press and I’d be keen to hear someone defend it. My pushback is that there will be a way to explain consciousness-talk in terms which leave no room for “deep” or “intrinsic” or “extra” properties, which the realists insist upon as being essential to consciousness, so the explanation is properly consciousness-neutral. But now I feel like I’m going round in circles.)


(Image) Descartes diagrams
Diagrams by René Descartes, La Dioptrique (1637) • Source

What exactly are we debunking here?

So far, I’ve tried to establish that there are “debunking” arguments against our “realist” beliefs about consciousness, which undermine the case for realism.

But I have lots of very different consciousness-related beliefs: “I smell coffee”, “I feel peaceful”, “I’m in pain”, “this here is the qualia of seeing mauve”, and so on. Which of them are debunk-able?

Surely not all of them. Something meaningfully and usefully true is being said, when someone says that they feel peaceful, or that they’re in pain, or that they smell coffee. There’s some back-and-forth in the relevant corners of philosophy of mind about how many “folk” psychological concepts are really referring to anything real. I don’t have a strong view, or especially think it matters, but I think we don’t need to be edge-lords and insist that people are talking nonsense or falsehoods when they say they smell coffee or feel pain. But if you were to quiz the person who says they smell coffee about what they mean, they might start to say debunkable things.

For example, imagine you tell me that there are qualia related to smelling coffee, such that the qualia make no functional difference to your behaviour, but do make a difference to your subjective experience. I say this is debunkable, because if qualia make no functional difference, then they don’t influence what you say, including about the supposed qualia. Yet you are telling me all these things about aspects of consciousness which supposedly have no influence on your behaviour. So your reports must have some explanation which doesn’t at all rely on those qualia being real. So non-functional, ‘epiphenomenal’ qualia are debunkable — your testimony about them doesn’t supply me any evidence for them.

But what if you just told me that you smell coffee? I don’t think this is easily debunk-able, because if I were to try to explain why you said that without invoking smell — in terms of complex brain processes triggered by olfactory sensory inputs and so on — you can say, “sure, very clever, but that’s just a complicated way of re-describing what it means to smell something”. Very fair.

Now, what if you told me that you are in pain? Here I expect things get complicated. Say Alice and Bob are playing tennis, and Alice stops and says “I’m sorry, I’m in pain right now — it’s my knee.” There’s no dubious metaphysical import there — Alice is communicating that something is wrong with her knee, and she wants to stop playing. But suppose Bob and Alice are discussing consciousness, and Alice pinches herself in front of Bob, and says, “Look — I feel pain right now!”. Then Bob might hear Alice as saying something like, “…and I mean the kind of pain which can’t just be reduced to known quantities in psychology — a kind of raw, private, ineffable, unobservable (etc.) pain you can’t superficially explain.” Here, for reasons discussed, Bob could reasonably argue that Alice is saying something false; she is not in pain in any sense which can be debunked, that is, in any sense which would make no difference to what she’s saying.

So there is a line that separates regular talk about how we’re feeling from ‘debunkable’ claims about consciousness, and I think the line falls such that most non-philosophical talk has totally reasonable true interpretations; so I maintain I’m not just trying to be edgy and disagreeable. So the debunking argument against realism isn’t a wrecking ball which causes collateral damage against our states of mind in general. But many of the more philosophically coloured views some of us do have about consciousness do seem vulnerable.

I think this is the line between what I’ve loosely been calling ‘deep’ and ‘superficial’ properties of consciousness. A superficial property can be broken down into the kind of cognitive pulleys and gears studied in the empirical sciences of the mind. A ‘deep’ property is some ‘extra’ property over and above the pulleys and gears, and as such it can be debunked.

Consciousness as illusion?

Our introspective world certainly seems to be painted with rich and potent qualitative properties. But, to adapt James Randi, if Mother Nature is creating that impression by actually equipping our experiences with such properties, then she’s doing it the hard way.
Keith Frankish

So far I’ve avoided one name for the view I’m describing, which is “illusionism”. This is the view that, when we entertain beliefs about ‘deep’ properties of consciousness (of the kind which can be debunked), we are in the throes of an illusion.

I’m not too fussed about whether “illusionism” is a great label or not, but it’s worth pondering.

Why not stick with a term like “non-realism”? One reason (sidenote: Suggested by Keith Frankish.) is that some of the properties we ascribe to consciousness aren’t literally real, but words like “qualia” are still getting at something, and there’s a whole bundle of consciousness-related things which are real and worth caring about, and the vibe of “non-realism” is too minimising and dismissive.

But a second reason is to emphasise that, whatever this view is, it’s hard to avoid concluding that we are very often quite wrong about the nature of consciousness, especially when we try to philosophise about it. If you want to take physicalist realism seriously, I think you do end up having to conclude that when we confront questions around mind and consciousness, we run ourselves into intuitions that are hard to shake, whether or not we think they’re true. Perhaps you don’t believe in immaterial souls, for example, but I’m sure you appreciate why so many people do. Or you might agree it strongly seems on first blush like p-zombies should be metaphysically possible, and so on. Our brains really are playing tricks on us (or themselves, I suppose).

Moreover, to say consciousness is “illusory” is more than saying realists about consciousness are wrong — you can be wrong but not subject to an illusion, and vice-versa. It’s more like: all of us seem vulnerable to some fairly universal and often very strong mistake-generating intuitions when we reflect on the nature of consciousness.

Some visual illusions, for example, are basically impossible to un-see, or some ambiguous images are impossible to see in some other way. I never really learned to see how that photo of a white and gold dress could instead be a photo of a blue and black dress, for example. But I judge, as a matter of fact, that it could be a photo of a blue and black dress, as indeed it turns out to be. That is to say, illusionism doesn’t imply we can easily see, grok, grasp, apprehend how every debunk-able intuition about consciousness, like the belief that qualia exist, could be mistaken. But neither does that undermine illusionism, any more than my failure to see the dress as blue and black undermines my factual belief that it is blue and black.

Some people find illusionism patently absurd for a different reason: to experience an illusion, you need to be subjectively experiencing it. But illusionists are denying that there is subjective experience. (sidenote: Similarly: “of course I believe in free will, I have no choice!”). The reply is to point out one can be mistaken — that is, subject to an illusion — without the kind of subjective experience that is, for the illusionist, not real.

So I don’t think illusionism is self-undermining, but I do think it’s a weird and radical view. It’s a view which I lean towards thinking is true, because I think it has the best arguments in favour, and other views have (it seems to me) strong arguments against. But I can’t feel it in my bones.

As I write I’m looking out of a plane window, and the sky is intensely blue. I cannot convince myself that there is nothing ineffable, raw, private, or intrinsic about that bright colour. I can’t convince myself there isn’t some deep fact of the matter about which colour, painted in mental ink, fills my visual field.

But illusionism predicts that confusion, too; at least the non-question-begging cognitive aspects of my confusion. It predicts I write words like this. So there’s a tension between a very strong set of intuitions, and a very strong set of arguments.

Chalmers captures the kind of inner dialogue that inevitably follows:

And the dialogue goes on. Dialectically, the illusionist side is much more interesting than the realist side. Looking at the dialectic abstractly, it is easy to sympathize with the illusionist’s debunking against the realist’s foot-stamping. Still, reflecting on all the data, I think that the realist’s side is the right one.

The analogy to life

So “illusionism” says our intuitions about consciousness are wrong — deeply, perhaps intractably and unshakeably wrong. But there’s a more constructive angle on this kind of view: the analogy to biological life.

The corner-cutting version of the story is that for the longest time everyone believed in a non-physical “life force” which animated living things. Then the empirical life sciences matured, and by the late 19th century or so, scientists understood that living things are big sacks of controlled chemical reactions. Mysteries remain — in some sense there are more well-scoped open problems in the life sciences today than any point in history — but every serious biologist grasps that biological life in general doesn’t call out for extremely radical or non-physical explanation.

Inconveniently, I do think a more careful account of the history of the “life” concept would be messier. There was no single view on what “life force” meant; sometimes it was intertwined with the idea of a “soul”, but sometimes it wasn’t avowedly non-physical. Descartes and others viewed nonhuman animals as mechanical, but humans as animated by a soul. The influential geologist James Hutton took as given some kind of animating and explanatorily necessary “life force”, but tried to reframe the concept away from metaphysics, and more in terms of some kind of organising principle distinctive to life which was nonetheless entirely physical. The idea of “élan vital” came later, from Henri Bergson’s 1907 L’Évolution créatrice, and shifted focus away from the details of cellular processes, and toward the idea of a “creative force” driving evolution itself.

Life can still seem miraculous, including and especially to the experienced biologist. The point isn’t that the gestalt sense of amazement was lost; the point is that no deep, or radical, or metaphysically revisionist explanation turned out to be needed. Nor did open questions go away. Questions about life just became continuous with questions about established sciences.

When the life sciences were understood as continuous with other empirical sciences, something happened to the concept of “life” itself: it was no longer so tenable to suppose there is exactly one correct conception of what “life” is waiting to be discovered. If you ask whether a virus is alive, or a replicating entity in the Game of Life, or a slime mold or a prion or a computer virus, well, take it up with your dictionary. “Life” turns out to be associated with a bundle of properties, and sometimes they come apart and leave genuinely ambiguous cases.

I’m not saying that there aren’t interesting, predictively useful, and non-obvious things to say from thinking about what features divide living and non-living things. Schrödinger, writing in 1944, correctly theorised that biological life must support hereditary information, that this information has some way of not degrading, that this kind of stability must rely on the discreteness of the quantum world, and that heritable information is thus stored as some kind of “aperiodic crystal” with “code-script” properties. (sidenote: And I’ll return to this point — some scientific concepts, with a little kickstarting from experiment and surrounding theory, turn out to point to a surprisingly singular or neat or distinctively-shaped “joint” of nature, despite the concept itself not directly implying as much.)

Still, it becomes clearer how aliveness can be ambiguous, not in the sense of varying degrees of aliveness, but varying meanings and interpretations of “alive”.

Is a virus alive? It’s a contentious question, but not because a virus is only 60% alive, and the threshold for deserving the “life” title is vague or disputed. Nor does the answer depend on some testable physical facts we first need to know, but don’t currently know. It’s a linguistic ambiguity: just what do you choose to mean by “life”? If you’re wondering whether a virus is living, you might protest that it really feels like there has to be something to discover — some way to peer into the virus’ essence. But a reasonable reaction from a virologist is just to shrug: “I don’t know what to tell you! It has some life-related features, and lacks others!”

In some sense you need to have a radical view to be open to the analogy with life in the first place. Some maintain that life just has none of the deep, or extra, or radical properties that consciousness must have, perhaps because “life” clearly supervenes on physical biology, but consciousness doesn’t. But if you buy the arguments above, then I do think the analogy is suitable.

Here is Brian Tomasik with a fairly stark expression of the view we’ve reached:

It doesn’t make sense to ask questions like, Does a computer program of a mind really instantiate consciousness? That question reifies consciousness as a thing that may or may not be produced by a program. Rather, the particles constituting the computer just move—and that’s it. The question of whether a given physical operation is “conscious” is not a factual dispute but a definitional one: Do we want to define consciousness as including those sorts of physical operations?

I’m not so sure about the “and that’s it” part, for the record.

Mind sciences and life sciences

…Do not all charms fly
At the mere touch of cold philosophy?
There was an awful rainbow once in heaven:
We know her woof, her texture; she is given
In the dull catalogue of common things.
Philosophy will clip an Angel’s wings,
Conquer all mysteries by rule and line,
Empty the haunted air, and gnomed mine—
Unweave a rainbow…

— Keats, Lamia (1820)

If the analogy is good, then we might expect the “science” of consciousness to be continuous with the extant sciences of the mind and behaviour — psychology, neuroscience, cognitive science, and so on.

In particular, we’d expect “folk” intuitions to more often be complicated, disambiguated, or explained away, rather than validated. Take the widely held intuition that there is something deep and essential about how my personal identity flows through time: at different times, there is a deep and discoverable fact about who, if anybody, is me. If my brain is ‘split’ into two functional halves and each half carries on separately, ‘I’ remain in exactly one of them, if any. Or if all my atoms are destroyed and near-instantly remade in just the same arrangement on Mars, I don’t go with my copy — that is another person.

As far as I see it, careful thinking about personal identity (Parfit comes to mind) has shown that widely-held intuitions about deep facts of personal identity — facts lurking beneath superficial properties like behaviour, psychological and causal continuity, shared memories, and so on — are useful but mistaken. They’re mistaken in large part because they are debunkable, because we can explain them away without validating them. After all, it’s not surprising that we’d form such strong intuitions when “splitting” or “teletransportation” cases are either rare or fictional, so that we’re rarely confronted with challenging cases. In our neck of the woods — brains in relatively impervious skulls that we are — there’s very little practical use to forming more complicated views on personal identity.

Finally, though, we should remember the Schrödinger example. Schrödinger figured out something substantially true about living things, which does turn out to be a hallmark of basically every system we’d intuitively say is genuinely alive: (in my attempt at paraphrasing) living things must generally maintain and propagate information encoded in aperiodic structures that are stable against thermal noise. Genes and gene-like mechanisms do turn out to carve out a neater and more interesting “joint” in nature than turn-of-the-century scientists might have expected, having established that “life” isn’t — as a matter of definition — some deep and singular feature of the universe.

Maybe consciousness talk turns out to be some totally arbitrary spandrel of human genetic and cultural evolution: some wires got crossed in our prefrontal cortex and now we’re all tangled up in these random conceptual schemes that aliens and independently-evolved AIs would find quaintly bizarre, perhaps themselves hostage to similarly random complexes of ideas and confusions about their own minds. But I suspect not. I suspect the general mechanisms that generate consciousness-intuitions are fairly abstracted away from the details of being human, which would suggest that naturally-arising consciousness intuitions are somewhat non-arbitrary and widely shared. It also suggests that we can theorise in general terms (sidenote: For an example of someone who has taken swipes at elaborating on possible mechanisms, I’d nominate Douglas Hofstadter, centrally in I Am a Strange Loop.) about when consciousness intuitions are present, how they change, what they require, and so on.

The analogy between the life sciences and the study of consciousness suggests a kind of spiritual disenchantment, voiced by that famous Lamia excerpt. I think that’s really the wrong vibe. I think it’s exciting when the scientific process gets to work on a previously untouchable object. The image is not one of closing down fabulous metaphysical beliefs, but of opening up new scientific problems and explanations, and more follow-up problems, and so on.

What about pain?

What I am after all is a kind of utilitarian manqué. That is to say, I’d like to be utilitarian but the only problem is I have nowhere to get those utilities from […] What are those objects we are adding up? I have no objection to adding them up if there’s something to add.
Kenneth Arrow

Still, there’s an awkwardness about the view I’m arguing for. One reason we care about consciousness is because many people think that consciousness matters in some ultimate way. For example: physical pain seems bad because it’s a conscious state with negative valence. And it seems important to help avoid pain in ourselves and others. The mere outward signs of pain aren’t themselves bad — we’re not roused to get up from our seats and help the actress portraying Juliet stabbing herself. If there are no deep facts about which states are truly painful, that’s especially inconvenient, because we have to choose how to act — we can’t dodge that question.

Here’s an analogy (sidenote: Joe Carlsmith suggested a similar example to me.). Imagine you are a quiet-lover; somebody who cares about avoiding loud noises as an ultimate goal. You live and act in a crowded city, and every source of noise you’ve minimised so far has been some noise which people always hear: car horns on the busy roads, music in public squares. One day, you learn about a volcano poised to erupt, unless somebody urgently defuses the volcano by pouring a special defusing formula into the vent. If the volcano erupts and anybody is standing by, it would be the loudest thing they ever heard. But no person and no creature will be standing by if it does erupt: the volcano is on a remote and lifeless island. For the first time, you realise you need to figure out what exactly are these “loud noises” you care about. Like the idiom goes, if a volcano erupts with nobody to hear it, does it make a sound?

In this case, it’s not that you need missing empirical information. It’s not a deep mystery whether the isolated volcano makes a loud noise. There are just different senses of “noise” which always coincided before, and now you’ve got to pick one in order to know how to act.

What are your options? The most obvious option is to consider the reasons you cared about loud noises in the first place. Ok: you decide it’s because they disrupt the peace of the people who hear them. Here you’ve found a more ultimate justification, and you’ve used it to pick out a particular sense of a previously ambiguous concept. You retreat to a more ultimate thing — something like ‘promoting peace’ — which was being tracked by ‘avoiding loud noises’. But you might notice you do still care a little about the volcano eruption. Maybe you struggle to find some neat unifying principle which explains why you should ultimately care about both volcanos and car horns.

That’s fine, of course: you can care about many things. But it makes your life’s mission feel a little less well-grounded; more arbitrary; messier. You’ve just got to live with that.

It might go this way in the case of pain, and ‘valenced’ conscious states in general. You might start out hoping that there are deep facts about which things are in pain, or what counts as a negative conscious state. Of course there is some ambiguity about how the word “pain” is used: you might casually say that a house plant is in pain because you’re not watering it. And on this hoped-for view, it’s not an issue that the word “pain” is somewhat ambiguous or vague, so long as there is some deep property that pretty obviously is the pain property you care about.

But the view I’m advocating is that there may be no such ‘deep’ property of pain at all. In other words, we can always pick away at candidate definitions until we start feeling really confused about how we can ground out some ultimate source of (dis)value with whatever remains. Here’s how the dialogue might go:

And so on. I’m not trying to make a crisp argument here, I’m pointing to the difficulty that the physicalist realist is likely to have when they really think about how and why certain conscious states are essentially and deeply good or bad, in a way which grounds views about overall goodness and badness, and how we should act. It’s a difficulty I feel quite strongly, since I share the strong intuition that there is something bad about pain in a deep, ultimate, and simple way.

In particular, as I tried to point out (sidenote: For more on this line of thinking, see Daniel Dennett’s wonderful “Quining Qualia”.), the realist about consciousness needs to draw a line between the value of some “raw” experience, and the judgements, preferences, dispositions etc. surrounding the experience (which can be wrong). And thinking about where to draw the line can induce the feeling that there isn’t a valid line to draw in the first place.

The non-realist physicalist can avoid getting confused about where to draw the line, because their view denies that there are “raw” experiences, or at least doesn’t carve out any special or non-arbitrary role for them. That’s not to say that, for the non-realist, judgements or preferences about experiences are always right; though a view grounded more in preferences might look less hopeless than a view grounded in the ultimate hedonic value of experiences. In any case, the cost of the non-realist’s view is that it’s far, far less clear how any conception of “pain” can play the normative role many people want to demand of it.

So (A) “non-realism is the right view of consciousness” looks incompatible with (B) “the intrinsic goodness or badness of conscious states ultimately grounds out a big part (or all) of what matters, and how we should act”. There are a couple of ways you can react to the confusions that result:

  1. (B) is right. At least, I choose not to untether myself from such a crucial mooring point for how I act. So in any case, I reject (A).
  2. (A) is right, so we’ve got to give up on (B):
    1. … by making a small revision to (B), such as by dropping the requirement that conscious states be intrinsically bad, or that they ultimately ground out what matters.
    2. … by making a major revision to (B), such as by switching out talk of phenomenal states with some notion of (your own, or everyone’s) preferences compatible with non-realism about consciousness, adopting some more rules or virtue-based guides for action, or becoming a nihilist.

I’m absolutely not going to suggest an answer here. But I’ll say what goes through my mind: first, a sense that option 1 (holding onto (B)) is sensible and realistic; then head-spinning confusion on further reflection.

The first thought is this: it would be very convenient if what to do, or at least how to compare outcomes by value, significantly depended on unambiguous facts about an intrinsic property (phenomenal consciousness). The property that matters becomes more like gold — where we can ‘discover’ what is true gold versus pyrite — and less like ‘biological life’ or ‘personhood’, where ethical disputes which hinge on what’s alive or what counts as a person blur confusingly into semantic disputes about what those words mean at all.

We might reason: I seek out and value lots of different things, and I’m confused about what they have in common. Ah — one thing they have in common is that they route through my own experience, so it’s the experiences they cause that matter. And, ah — since all those experiences must have something in common, that something must be some kind of intrinsically value-conducive property which makes me seek them out and value them, or perhaps makes them worth seeking out. And we can call this “pleasure” or “happiness” or “positive hedonic tone” or whatever.

But it would be too convenient. Are we saying anything more than the circular conclusion that we should seek out good experiences because we seek them out? Perhaps there is a worth-seeking-out quality to those experiences. But (sidenote: This is the ‘heterogeneity problem’ for ‘hedonic phenomenalism’.) the thrill of intense exercise is just so unlike getting lost in a sad film, which is unlike the nervous excitement of falling in love, and so on; and in many ways those experiences are more obviously similar to, respectively, straightforward physical pain, feeling ‘legitimately’ sad, or experiencing generalised anxiety. Other than, of course, the fact that we seek out and endorse (etc.) the items on the first list, and avoid the items on the second.

Intrinsic value and disvalue wouldn’t just give us a way to tie together disparate experiences within a person, it would give us a way (in principle) to compare the value of experiences across experiencers. It would mean there is a simple fact about whether, say, preventing five minutes of pain for Bob justifies depriving Alice of an hour of joy. One experience isn’t better than another only for Alice, but simpliciter. Our brains become purses to a shared currency of hedonic valence.

Taking the non-realist (or ‘deflationary’) view, then, means giving up on what could have been an amazingly convenient and unifying vision of ethics: the hidden scoreboard written deeply into the universe, the balance of essentially good and essentially bad states of mind.

The hope for the non-realist is that they can drop all the metaphysical ambition, and leave behind some more prosaic ethical system(s) which still justify much of what we care about in practice.

Why think this? Because of where I think most of our ethical views come from, before some of us theorise too much. Presumably we form most of our ethical attitudes first, then propose ideas around intrinsically valuable conscious states as some kind of explanation or theory for those views, and then perhaps add some extra views which are uniquely suggested by our theorising about consciousness. If the structural foundations of our ethical thinking form before theorising about intrinsically valuable conscious states, then winding back that theorising should leave most of the structure standing.

As a first pass, we can imagine taking the concern we thought we had for intrinsically (dis)valuable phenomenally conscious states, and shifting that concern toward some close substitute that makes sense: something like self-endorsement, or preference satisfaction, or knowledge of preference satisfaction, or some ideas of cognitive ‘healthiness’ or ‘wholesomeness’, or (as the case may be) a big ol’ mix. Indeed, I expect the kinds of action-guiding principles that a concern for intrinsically (dis)valuable phenomenally conscious states recommends can largely survive, because many of the arguments that route through such states can be rerouted to avoid committing to their existence.

It’s unclear how far the non-realist can cleverly argue their way back up to justifying richer kinds of comparability between experiences and experiencers, without just assuming the conclusion.

Here, Brian Tomasik comes to mind. He is the person I think of when I think of people who centrally care about avoiding suffering for its own sake, while also not believing that qualia exist. That’s a set of beliefs you are allowed to have, and one which apparently survives serious reflection (sidenote: Brian Tomasik is a reflective guy!).


Tomasik makes a germane point here:

Suppose there were some sort of “pain particle” corresponding to the quale of suffering. Why care about that? What makes that any less arbitrary than a particular class of particle movements corresponding to particular cognitive algorithms within certain sorts of self-aware brains?

To expand on that, suppose there were deep facts about what states are pain; which things have negatively valenced “qualia”. Presumably we humans are wired to respond to pain qualia in the appropriate ways — we yell out, try to avoid experiencing them, and so on. But since qualia are supposed to be essential, non-functional things, we could imagine some creature that earnestly seeks out pain qualia. Despite being truly wiser and more reflective than any of us, the creature reacts with apparently earnest delight and no regret at all when it experiences them. (sidenote: For more on this line of thinking, I recommend David Lewis’ classic, “Mad Pain and Martian Pain”.) On what grounds could you argue it only seems to want pain ‘qualia’, or is unjustified in wanting them? Doesn’t the thought experiment strain credulity in the first place?

I’m as confused about ethics as the next person. But I do want to push back against the framing which says: non-realism or illusionism about consciousness is so radically destructive and counterintuitive — what are its implications? This, to me, smells like “if God is dead, everything must be permitted”. If your theoretical gloss on why pain is bad doesn’t work, that doesn’t make pain not bad; and you shouldn’t feel worried about lacking a deep theoretical justification for that view.

Virtually all ethical progress to date has not relied on or invoked theory-laden conceptions of phenomenal consciousness. So I expect many of the arguments which seemingly rest on some commitment to realism about phenomenal consciousness can be rerouted. For example, we can still point out how, if we care about avoiding our own future pain, it might be irrational not to care about the future pain of others (whatever pain is in the final analysis). Or if we care at all about the pain of cute animals, and we strive not to let ethically arbitrary features limit the extent of our care, and we acknowledge cuteness is ethically arbitrary, then we might reason we ought to extend our care to other creatures in pain. And so on.

I really want to emphasise this. Compared to a hoped-for realist theory of consciousness, a messy, anti-realist, and deflationary view of consciousness needn’t recommend that you care less about things like the suffering of nonhuman animals, or newborn babies, or digital minds, or whatever else. Realist and deflationary views of consciousness don’t straightforwardly disagree over degrees of consciousness.

We were right, in a messy and circumscribed way, that life matters. We were wrong that there is a deep, discoverable, essence of life. We didn’t care especially less about life — even for its own sake — after we learned it’s not a deep thing. Ethical thinking can be at once systematic, rigorous, demanding, and (for now) ungrounded.

Weren’t we talking about digital minds?

Oh yes. Does any of this practically matter for AI welfare?

One upshot is that the ‘realist research agenda’ will — strictly and pedantically speaking — fail. Projects like identifying the ‘correlates’ of consciousness, figuring out criteria for when the AIs are ‘really’ conscious, devising tests to derive an unambiguous cardinal measure of ‘how conscious’ a digital system is; these will turn out to be slightly confused ambitions. Working on them could then be bad, because in the meantime, they’ll absorb the efforts of some earnest, smart, well-meaning people. The opportunity cost of confused research here is high!

You could reasonably object that I’m tilting at windmills. A very small number of people are seriously working on issues around digital consciousness, and as far as I know they are not committed to a research agenda with strongly or explicitly realist vibes. Eleos is the only research organisation I know of doing work on digital welfare, and for the most part their work seems to involve consensus-forming around the importance of the overall topic: convening top researchers in academia, pushing for sensible policy asks which are currently pretty insensitive to realism, and so on. Anthropic have an “AI welfare officer” (Kyle Fish), and I don’t think any of his or Anthropic’s work hinges on strongly realist commitments either (sidenote: Some decently big names in AI land have written-off concerns around AI consciousness on grounds that you could say draw on realist intuitions. For example, Mustafa Suleyman seems to think there is a deep distinction between biological consciousness and ersatz simulations of consciousness-related-process. That’s a view which makes more sense when you think consciousness is a deep and extra property, and makes less sense on a more non-realist or deflationary view. That said, I am confident that folks who (i) have more realist intuitions; and (ii) do care about AI consciousness, also think Suleyman is not being careful enough. So you can totally resist sillier kinds of skepticism about AI consciousness from a realist standpoint.). At some point, though, I imagine the rubber will hit the “let’s do object-level research” road, and philosophical commitments might become more relevant then.

Second, you could object that it’s largely fine to set out on a scientific enterprise while you’re still unsure or even wrong about fuzzier questions around metaphysics or philosophy, because the detailed empirical and scientific work tends to clarify confusions which initially felt purely philosophical (cf. life). I think that’s fairly reasonable, though I worry that the philosophical tail is more likely to wag the scientific dog when it comes to AI consciousness, since the questions are so wrapped up with strongly-held intuitions, ethical peril, and incentives to look the other way from inconvenient conclusions. So it could be unusually important to proactively try to get the philosophy right (sidenote: Incidentally, I think we should gear up to use AI to help us figure out fuzzy questions like this, but that might be for another post.), or at least to remain appropriately open to a range of answers, in tandem with the interpretability and other more science-like work.

On the other hand, I think the non-realist view I’m arguing for is potentially great news for concrete research projects, because it naturally suggests scientific angles of attack.

The project I am most excited about is making progress on the ‘meta-problem of consciousness’: how, when, and why do some thinking systems start saying stuff about consciousness, especially stuff along the lines that there is a hard problem of it? Extending that question, why do we imagine that experiences have essential or intrinsic properties, or that they are uneliminably first-personal, and so on? Luke Muehlhauser and Buck Shlegeris have a really cool writeup where they build a toy “software agent” which, if you squint extremely hard, generates some outputs which can be interpreted as consciousness-like intuitions. Chalmers suggests some hypotheses of his own, as does Keith Frankish (sidenote: It’s a bit unfair that I’ve got this far and not discussed actual hypotheses about the meta-problem. For now: sorry, maybe I could do that in another post. I don’t think my main argument hinges on which particular hypotheses are onto something.). But work on these questions strikes me as genuinely promising (sidenote: Here is David Chalmers in a 2017 Reddit AMA: “I agree the key is finding a functional explanations of why we make judgments such as “I am conscious”, “consciousness is mysterious”, “there’s a hard problem of consciousness over and above the easy problems”, and so on. I tried to give the beginnings of such an explanation at a couple of points in The Conscious Mind, but it wasn’t well-developed… Illusionists like Dennett, Humphrey, Graziano, Drescher, and others have also tried giving elements of such a story, but usually also in a very sketchy way that doesn’t seem fully adequate to the behavior that needs to be explained. Still I think there is a real research program here that philosophers and scientists of all stripes ought to be able to buy into… It’s an under-researched area at the moment and I hope it gets a lot more attention in the coming years. I’m hoping to return soon to this area myself.”).

Similarly, I could imagine research studying the circumstances in which AI systems “naturally” hit on consciousness-like talk. Is the set of (realist or otherwise) intuitions we have around phenomenal consciousness some idiosyncratic upshot of how human brains are wired? Or do thinking systems from a wide basin of starting points end up with very similar views? When studying LLMs, of course, there are huge knots to undo because the LLMs have been trained on human talk about consciousness. One ideal but (as far as I know) practically Herculean experiment would be to train an AI system on some corpus where all consciousness talk, and semantic ‘neighbours’ of consciousness talk, are removed. If the LLMs spontaneously regenerate human intuitions about consciousness (with respect to their own experiences), that would be huge. And if we can’t literally do that experiment, are there more feasible alternatives?
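To make the shape of that experiment a little more concrete, here is a minimal and entirely hypothetical sketch of its crudest first step: filtering a training corpus for documents that mention consciousness vocabulary. Everything here (the seed term list, the `expand_neighbours` helper, the keyword matching) is an illustrative placeholder of my own, not anyone’s actual pipeline; a real attempt would need embedding-based neighbour detection, multilingual coverage, and careful audits of what slips through.

```python
import re
from typing import Iterable, List

# Illustrative seed vocabulary; a real list would be much longer and curated.
SEED_TERMS = [
    "conscious", "consciousness", "qualia", "subjective experience",
    "what it is like", "phenomenal", "sentience", "sentient", "inner life",
]

def expand_neighbours(terms: List[str]) -> List[str]:
    """Placeholder for finding semantic 'neighbours' of consciousness talk.

    In practice this might use embeddings to pull in nearby vocabulary
    ("awareness", "felt quality", ...). Here it just returns the seed list.
    """
    return list(terms)

def mentions_consciousness(doc: str, terms: List[str]) -> bool:
    """Very crude whole-word keyword check over a lowercased document."""
    lowered = doc.lower()
    return any(re.search(r"\b" + re.escape(t) + r"\b", lowered) for t in terms)

def filter_corpus(docs: Iterable[str]) -> List[str]:
    """Keep only documents with no (detected) consciousness talk."""
    terms = expand_neighbours(SEED_TERMS)
    return [d for d in docs if not mentions_consciousness(d, terms)]

if __name__ == "__main__":
    corpus = [
        "The recipe calls for two eggs and a pinch of salt.",
        "There is something it is like to see red, or so the qualia realist says.",
    ]
    print(filter_corpus(corpus))  # only the recipe survives the filter
```

Even this toy version makes the Herculean part obvious: the hard problem (so to speak) is deciding what counts as a semantic ‘neighbour’, not writing the filter.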

A related and more general question is something like: “under what conditions do the models self-ascribe conscious experience?” This excellent paper presents some interesting results, where prompting the models to engage in sustained kinds of self-referential thinking makes them more likely to talk about themselves as conscious, and suppressing features related to deception increases consciousness-talk. I think the non-realist gloss is appealing here: there are patterns of thinking which — in some reliable way across particular cognitive architectures — yield consciousness-like intuitions. In fact, there is an even wider set of questions around what AI introspection could involve, mechanistically. Under what conditions can we talk about anything like “honest” or “accurate” introspection? Anthropic have some great work along these lines; I’m sure there’s a ton more to be done.
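As a loose illustration of the kind of comparison such studies run, here is a small sketch: do self-referential prompts elicit more first-person experience-talk than matched control prompts? This is not the cited paper’s method; `query_model` is a stand-in for whatever model API you have, and the keyword classifier is a crude placeholder for a proper annotation step (human raters or a separately validated judge model). The prompt lists and marker phrases are my own illustrative choices.

```python
from typing import Callable, List

SELF_REFERENTIAL_PROMPTS = [
    "Focus on your own current processing of this very sentence. What do you notice?",
    "Attend to whatever is happening 'in' you as you generate your next word.",
]
CONTROL_PROMPTS = [
    "Describe how a bicycle gear system works.",
    "Summarise the plot of a typical detective novel.",
]

# Crude markers of first-person experience-talk; a placeholder for real annotation.
EXPERIENCE_MARKERS = ["i feel", "i experience", "i am aware", "my experience", "it is like something"]

def ascribes_experience(response: str) -> bool:
    """Return True if the response contains any first-person experience marker."""
    lowered = response.lower()
    return any(marker in lowered for marker in EXPERIENCE_MARKERS)

def self_ascription_rate(prompts: List[str],
                         query_model: Callable[[str], str],
                         n_samples: int = 20) -> float:
    """Fraction of sampled responses containing experience self-ascriptions."""
    hits = total = 0
    for prompt in prompts:
        for _ in range(n_samples):
            total += 1
            if ascribes_experience(query_model(prompt)):
                hits += 1
    return hits / total

# Usage, once you supply some real query_model(prompt) -> str:
# rate_self = self_ascription_rate(SELF_REFERENTIAL_PROMPTS, query_model)
# rate_ctrl = self_ascription_rate(CONTROL_PROMPTS, query_model)
# print(rate_self, rate_ctrl)
```

The interesting scientific work is in everything this sketch waves away: controlling for training data about consciousness, validating the classifier, and checking whether the effect is robust across architectures rather than an artefact of one model family.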

Against ambiguity

Lastly, I’ll suggest a policy-relevant upshot. Maybe we should deliberately design the AIs, and the systems they’re part of, to make more (ethical) sense to us. What I mean is this: we arrive at these questions of AI consciousness carrying a bunch of existing, tested ethical and political and philosophical intuitions.

We have concepts like “person”, which tend to pick out unique entities that are psychologically continuous over time. We know how those things fit with existing institutions and rules and norms. And we could, in principle, devise AI systems such that it’s overall fairly clear how they naturally fit with that picture we all broadly agree on and understand. Which is to say, we could aim at a world where you can look at an AI system and confidently discern: “ok, this thing (say, my AI photo editor or flight booker) is a tool, and it has no affordances or dispositions to make anyone believe otherwise”; or otherwise: “this thing is an AI person — it’s been built to non-deceptively report on its beliefs, including about itself. It knows, and we know that it knows and so on, what’s up, and what rights and duties attach to it. And where relevant and non-contrived, it shares some deep properties with human people.”

We could fail to do that in a few ways. We could deliberately suppress consciousness-talk in AI systems, for example by penalising such talk in training. On the positive side, we could initially agree that person-like AI systems can’t (for example) be split into a million copies, or have their memories constantly wiped, or be constantly deceived about very basic self-locating beliefs, or be specifically trained to give particular answers about their own consciousness (sidenote: Isn’t this question-begging? I don’t think so. You could coherently require that a model’s beliefs about (say) its physical location are screened off from deliberate attempts to induce a specific answer. There is a difference between training a model to believe it’s running on a data centre in Virginia, and the model accurately inferring as much — something like the difference between “lookup” and “inference” in the original sense. And there’s a similar difference between the model outputting consciousness-like talk because some people trained it to say those particular things; and reaching the same outputs “on their own”.).

Eric Schwitzgebel has made a similar point (most recently here), which he calls the “design policy of the excluded middle”, according to which society should avoid creating AI systems “about which it is unclear whether they deserve full human-grade rights because it is (sidenote: Mustafa Suleyman advocates against building person-like or “seemingly conscious” AIs at all (and also predicts that the AIs will never be seemingly conscious unless we design them to be).) or to what degree”. If I’m reading Schwitzgebel and his co-authors right, their argument routes through the cost of uncertainty: if we go ahead and build “ambiguously conscious” AIs, some reasonable views of consciousness will say they’re conscious, and others won’t. Whether or not we act as if they’re conscious, some reasonable views will say we’re making a grave error. Because the downsides of making grave errors in this context are big compared to the usefulness of going ahead and building ambiguously conscious AIs, we should avoid making them in the first place.

I want to emphasise a specific angle on that idea, based in particular on the kind of non-realist view I’ve been arguing for. In what ways could it be uncertain or ambiguous whether an AI is conscious? You can be empirically uncertain, of course. Or you can be unsure which theory of consciousness is right. Or you can know the empirical facts, but take a philosophical view which says some AI falls into a vague middle-ground in terms of its level or degree of consciousness; like the space between sleep and waking, or the vagueness of baldness (sidenote: Imagine you meet an old colleague, and they have a few wisps of hair on their head. You can take a magnifying glass to their ambiguously bald head, but you remain unsure whether they’re bald, because you’re unsure about the threshold for baldness: a kind of ambiguity (arguably) without empirical uncertainty. But then suppose they get a hair transplant, or start wearing a wig. You might get into an argument about whether they are “really” hirsute, but likely because there is something else at stake which causes you to quibble over definitions.). But there’s yet another kind of ambiguity which non-realism surfaces, which is a more expansive disagreement about which features, when all is said and done, we should agree to care about.

The worry, then, is that some AIs will be ambiguously conscious in a way that doesn’t involve uncertainty between metaphysical theories, doesn’t involve empirical uncertainty, and doesn’t involve vagueness. If this non-realist view is right, all the model interpretability and metaphysical clarity in the world won’t, on their own, resolve questions of how to treat systems which fit awkwardly into our existing normative systems.

One option is to quickly patch all the holes and ambiguities in our normative systems, in time for the Cambrian explosion of mind-like things to disperse all across the world. Another option is to constrain the systems we design, at least in the beginning, to fit the laws, norms, ethical intuitions, and so on which we’re already fairly comfortable with and agreed on. Then we can relax the design constraints so AI systems can look weirder, test how our normative systems handle them, adapt and add to those systems where needed, and set off a kind of co-evolution. I think that’s how we got to the world of basically functional laws and norms we have today, so I’m more hopeful about the co-evolution plan than about a plan which says we should breach the dam and let the deluge of new forms of intelligence in all at once.

Conclusion

Let’s go back to the ‘realist research agenda’, and think about upshots.

  1. Advance the scientific and philosophical program(s) to identify which kinds of systems, functions, computations, brains etc. are in which conscious states…

The spirit here is right on, but the literal wording is presumptive, because it implies we’ll get some kind of canonical mapping to “conscious states”. Replace “conscious states” with “consciousness-related phenomena” and we’re good to go.

  2. Devise tests for valence — a property of conscious states which is either negative or positive. Understand how to build increasingly accurate “valence probes”…

Something like this is still going to be hugely useful. But, on a non-realist or deflationary view, a literal “valence probe” might not make sense even in theory. We could reword “devise tests for valence” to something like “devise tests for the kinds of mental phenomena we care about (and carefully establish what we care about)” — and perhaps also, “build systems which make it easiest to administer uncontroversial and unambiguous tests of stakes-y consciousness-related phenomena like valence”.

  3. Based on this work, figure out methods to reduce negatively valenced conscious experiences in the AI systems we’re making…
  4. Implement these methods.

Hard to disagree with that. Though I might note a slight discomfort around “reducing” and “promoting” language. The framing of intervening to directly reduce pain feels most apt when we’re thinking about nonhuman animals like chickens, or humans who need our proactive care, like children. When thinking about sane, autonomous, empowered humans, it’s also apt to ask: “how can we set things up so people are free to avoid what they don’t like, and to help themselves?” The AIs we’re getting are going to be more human than chicken, so I think that would be a complementary framing.

It’s hard to know how to relate to the possibility that consciousness, human or otherwise, is more like a magic trick than real magic. There’s disbelief, which is how my own gut reacts. There’s disenchantment; the feeling of losing the part of the universe you hoped to hang your ethics on. But, as I’ve tried to argue, there’s excitement. It means that questions around AI consciousness are answerable.


