Which Computations Do I Care About?

By Brian Tomasik

First published: . Last nontrivial update: .

Summary

We rightly care about conscious suffering, whether in ourselves, fellow humans, or other higher animals. However, what are we to make of trickier cases where our intuitions are less refined? Should we care about smaller, simpler animals like insects? Digital minds? Simulations? I survey these questions from two angles: (1) If you were one of two minds with different constructions, would you be able to tell subjectively which mind you were? (2) Does the mind experience conscious emotions? I suggest that a broad class of abstract minds deserves to be cared about, including embodied and non-embodied digital minds, provided they're running the right algorithms. External, physical features of minds tend not to matter in comparison with the emergent algorithm that they implement. My discussion raises as many questions as answers, and I conclude by exploring a few of them.


Introduction

When we adopt the goal of reducing suffering, we need to establish what sorts of things in the world can suffer and by how much. This problem doesn't have a well-defined answer in the same way as, say, what sorts of things in the world have mass and how much. Instead, we need to choose what we mean by "suffering," i.e., choose what kinds of things we want to care about as being suffering. While in principle this answer could be anything, most of us feel an obligation to apply regularity conditions on our concern, such that, for instance, two things which only differ in trivial ways receive a similar degree of concern.

Types of caring

There are many levels on which we care about things. On the practical level, I do many things which imply that I care more about myself than other organisms, even after accounting for the significant instrumental value of personal health and willpower. This is unfortunate, but it's the reality on the ground. It doesn't, however, mean that I can't describe an "idealized" caring framework that I aim to shoot for with my altruistic actions. This idealized framework is what I explore in the current essay.

We also have different modes of altruistic concern. One is a quick emotional response—recoiling in disgust, crying with sadness, or yelling with anger when we see someone harmed. Usually it has decent precision, though sometimes it can misfire: e.g., when we see an already-dead animal being cut open, or when someone stabs a life-like doll. It also doesn't fire enough in many cases, such as when the organisms being injured are out of sight, when the harm is reported in numerical form ("The death of one man is a tragedy, the death of millions is a statistic."), when the person suffering is unattractive or evil-looking, or when the animal that's in pain is gross or scary (spiders, snakes, leeches).

We try to go beyond our visceral responses to suffering by thinking more deeply about what's going on. Even though it looks disgusting to cut open a human body, if that body is dead, then there's no one actually feeling the incision. Even though cockroaches look disgusting, we have to remember that they have some of the most sophisticated brains among insects. A recording of a baby's cry sounds awful, but it doesn't actually represent anyone in the immediate vicinity who needs help. These kinds of realizations constitute a reflective mode of concern, and most of us agree that these opinions should trump our immediate reactions. Over time, neural rewiring may indeed make these reflective sentiments become our more immediate responses.

A final level of caring comes when we consider modifying the boundaries of our more reflective concern itself. Many compassionate people feel sympathy for the plight of poor humans nearby them but don't think much about those far away. Many more are concerned with humans all over the world but don't give much thought to animals. Those who do also reflectively feel sympathy for animals need to determine the boundaries of that concern (for example, can insects suffer?). And the future holds the possibility of vast numbers of non-biological minds whose moral consideration will need to be sorted out. Upon thinking more about the similarities among types of minds and what at core we care about in other minds, we may change the scope of beings that we think can suffer.

Criterion of Subjective Indistinguishability

Suppose doctors make you unconscious. They put you into a machine that scans your body and creates an atom-for-atom copy. They put this copy on another table just like the one that you lay on before going to sleep. Now you wake up. Which person are you—the original or the copy? You can't tell. The two experiences are indistinguishable (at least until the doctors inform you of the answer).

In the moral realm, we might introduce what I'll call the

Criterion of Subjective Indistinguishability: Given two versions of your brain, if you can't subjectively tell which one you are, then the two minds deserve equal moral weight.

The "subjectively" adverb is important, because you might be able to distinguish two brains of yours by having outsiders examine them, by being told which is which, by looking in a mirror, etc. What matters is whether you can tell just "from the inside."

This principle seems innocuous enough, but its implications are dramatic—so much so that even I have some trouble accepting a few of them. Let's examine some below.

Substrate independence

David Chalmers has a famous "fading qualia" thought experiment for substrate independence of minds. While I sharply diverge from Chalmers on what consciousness is, this particular example works just as well for a reductionist about consciousness like myself. The scenario imagines that scientists have devised artificial silicon-based neurons that completely mimic the behavior of natural carbon-based ones.

The original neurons in your brain are one-by-one replaced with these artificial chips. Your thinking ability and behavior remain unchanged during the operation, since by assumption, the artificial neurons replicate all the important operations of regular neurons, and Occam's razor militates against anything further being involved in the regulation of your brain.a If this process were done to you while you were unconscious, upon waking up, you wouldn't be able to tell which kind of neurons you had. On the inside, it would "feel" the same.

Taking a slightly bigger leap, we might imagine that it doesn't matter if memory is stored in patterns of neural association or in electronic memory cells, as long as the content is the same when it's accessed by the brain. It also doesn't matter if the electronic memory is RAM, disk, tape, flash drive, or whatever. It could even be scribbles on a sheet of paper as long as the access speed would be sufficiently high (which of course it wouldn't be in that case). Here I'm assuming that the storage mechanism for neural memory is not a crucial component of consciousness. This seems plausible but should be explored further.

Other hardware components also wouldn't matter: What kinds of wires or wiring structure were used, what materials composed the processors, how densely the material was packed, etc.

We're familiar with a distinction between biological neurons and digital logic circuits, but in fact, the space of computing substrates is much larger. For instance, the field of reservoir computing uses recurrent neural networks in which inputs are processed in arbitrary, random, nonlinear ways by an intermediate cluster of neurons, and then these processed features can be used alongside the original inputs in making predictions. (This is reminiscent of kernel methods in machine learning.) In addition to using digital computers to calculate these intermediate neural activations, researchers have also employed (see pp. 3-4) analog electronics, crystallized networks, optical systems, and even physical water waves. As the discipline of field computation makes clear, physics allows for many substrates of computation.
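To make the reservoir idea concrete, here is a minimal echo-state-style sketch (assuming NumPy; the network sizes and the sine-wave task are arbitrary choices for illustration): the inputs are mixed by a fixed, random, nonlinear recurrent layer, and only a simple linear readout is trained.

    # A minimal sketch of the reservoir-computing idea: inputs are projected
    # through a fixed, random, nonlinear recurrent "reservoir", and only a
    # simple linear readout is trained on the resulting features.
    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_res = 1, 100

    W_in = rng.normal(scale=0.5, size=(n_res, n_in))    # fixed random input weights
    W_res = rng.normal(scale=0.1, size=(n_res, n_res))  # fixed random recurrent weights

    def reservoir_states(inputs):
        """Mix each input through the untrained, nonlinear reservoir and collect its states."""
        x = np.zeros(n_res)
        states = []
        for u in inputs:
            x = np.tanh(W_in @ np.atleast_1d(u) + W_res @ x)      # arbitrary nonlinear mixing
            states.append(np.concatenate([np.atleast_1d(u), x]))  # keep the original input too
        return np.array(states)

    # Toy task: predict the next value of a sine wave.
    t = np.linspace(0, 20, 500)
    u, y = np.sin(t[:-1]), np.sin(t[1:])
    S = reservoir_states(u)

    # Only this linear readout is trained (ridge regression via the normal equations).
    w = np.linalg.solve(S.T @ S + 1e-6 * np.eye(S.shape[1]), S.T @ y)
    print("training MSE:", np.mean((S @ w - y) ** 2))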

These proposals give new perspective on the (confused) anti-functionalist argument that digital computers can't be sentient because we can interpret various other physical systems as doing similar sorts of computations (e.g., "Why Computers Can't Feel Pain" by Mark Bishop). The anti-functionalist claims, "Computers can't have consciousness because other physical processes could be seen as doing analogous operations." The functionalist replies: "Who says you can't use those other physical processes to create conscious minds as well?" Computing with analog physics is not a complete response to the anti-functionalist concern (for more, see "What is a computation?" later in this piece), but it is a suggestive rebuttal to the incredulity that some philosophers feel when imagining that, for example, the physics of water could play any meaningful role in intelligent computations.

One way in which computing substrate might matter would be if certain types of computations were only possible with certain types of substrates. For instance, if continuous physical operations are actually an integral part of consciousness, then it might be legitimate to regard physical operations using that substrate differently from digital operations that present-day computers do. Of course, people might potentially hook up continuous physical processing to artificial computers, so the distinction is not between biology vs. artificial computation but just between digital vs. continuous computation. In any case, I don't think there's any clear evidence that the brain's computations require continuous rather than digital physics, and progress so far in AI shows that we can get very far with just digital algorithms. Even if our brains used continuous physical properties, why not also care about a digital AI that could perform similar functions?

Simulations

Suppose the sensory inputs to your brain were gradually and undetectably replaced by fake inputs, like those used in futuristic virtual reality. In place of real photons hitting your eyes, the visual signals from your optic nerves to your brain are created by artificial electrical impulses encoding the same data. Similarly for your ears and nose and tongue. Then for the neurons on your skin and internal organs. Reality feels the same as it always did because the inputs to your brain are indistinguishable from what they were before. There's no way you can subjectively tell this apart from a "real" world if the inputs are realistic enough.

Non-conscious algorithms don't matter

Your body does a lot of "computations" of which you're mostly unconscious: Blood-sugar management, hormone release, digestion, autonomic nervous regulation, cell repair, RNA transcription, etc. You can detect some of these through signals sent to your brain, but the details of how the computations are done are opaque. The same is true for a lot of what happens even in your brain: Many basic and sophisticated behaviors happen unconsciously. One fascinating illustration of this is blindsight. In fact, most of what the brain does is unconscious (the "iceberg" analogy).

As long as the inputs to your conscious mind are the same, you can't tell what algorithms produced those inputs, so it doesn't matter (to you). If the inputs are nearly identical, then it hardly matters at all how they were generated. For example, it's plausible that the human brain uses something like a neural-net architecture for classification problems. But what if we hooked up decision trees instead? Would that make a difference as long as the classification performance was similar? I guess it wouldn't. The same goes for support-vector machines, ensemble methods, boosting, bagging, or whatever else. The computation could be serial or parallel. It could be done locally or sent off to a computing cluster in the neighboring city (as long as the response latency would be low enough).
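As a toy illustration of this interchangeability, we could train several off-the-shelf classifier families on the same problem and check that they produce nearly the same input-output behavior, which is all that downstream processing ever sees. (This sketch assumes scikit-learn is available; the dataset is synthetic.)

    # Different classification algorithms can be swapped in while producing
    # nearly the same input-output behavior on the same task.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    classifiers = {
        "neural net": MLPClassifier(max_iter=1000, random_state=0),
        "decision tree": DecisionTreeClassifier(random_state=0),
        "SVM": SVC(random_state=0),
        "random forest": RandomForestClassifier(random_state=0),
    }

    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)
        print(f"{name:13s} test accuracy: {clf.score(X_test, y_test):.3f}")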

Clock speed

This section has moved here.

Brain size?

Naively it seems plausible that if the underlying algorithm that your brain is running is the same, then the physical size of the materials doing that computation shouldn't matter, because you can't tell how big the materials are. For example, if your brain was electronic and ran on conducting wires, it wouldn't matter if you doubled the thickness of those wires. Similarly, if a big brain uses many neurons to transmit the same signal that a small brain would transmit with a single neuron, then the number of neurons used doesn't matter.

Insulator thought experiment

Nick Bostrom has a thought experiment that can be taken to suggest why brain size should matter for amount of experience: Suppose you have a brain made of conducting wires, and you innocently slide an electrical insulator down the middle of the wires, to the point that electrons now don't cross the gap, all while the computation is running. Have you thereby doubled the amount of subjective experience? Many people think not. After all, they might say, if you were that brain beforehand, you wouldn't notice the insulator being slid into you, so the end product is the same as the original. Of course, this needn't be the case, and Bostrom himself thinks that something important does happen when the insulator is inserted. There's no law of conservation of consciousness analogous to the law of conservation of mass/energy. If the context changes, by converting one unified conscious actor into two, then our assessment can change.

Who is "you"?

This example highlights

Hidden assumption in the Criterion of Subjective Indistinguishability: Defining "you" when talking about "not being able to tell which brain you are" already bakes in notions of what "you" are and how many of "you" there are.

In the example with the electrical insulator, we saw that it's not clear after the insulation whether there's still just one "you" or two of them. In other cases, like that in the next section, some may dispute whether there is any "you" at all.

The problem with this "you" talk is that it runs close to the homunculus fallacy of picturing your consciousness as having a single "seat" where it resides and surveys everything that goes on. But in fact, the sense of self is a byproduct of parallel processes that have no definite center, beginning points, or ending points. Consciousness does not want to fit into our binary pigeonholes; the brain just does what it does and forces us to come up with the categorizations.

There's much more to say in this discussion, but it became too long for this essay, so I moved it to "Is Brain Size Morally Relevant?"

Physics simulations?

Suppose I have a software physics simulation. It's so accurate that it computes the position of every proton, neutron, and electron in your body and stores them in a big list of numbers.b For example: At time t = 8.2384, proton #1 is at (18.2348, 7.2374, 9.0992), ..., neutron #1 is at (1.2012, 97.2286, 34.1235), ..., and so on. If we run this program forward—advancing the time step, computing the updates to each particle's position, and replacing the old list of numbers with a new list of numbers—is the result a conscious instantiation of your mind? I have some trouble accepting this, because the numerical updates seem so unlike what happens in my brain. When we're just talking about different hardware, or differences in the non-essential subroutines, I can squint and say that the new brain seems like my own brain enough to imagine being that new brain. In this case, it doesn't even look like there's any mind at all; like with the Chinese room, a first-pass intuition says that all that's going on is a bunch of dumb updates unrelated to any neural processing, much less any thinking. You could try to deny that there is any mind here at all, so that the Criterion of Subjective Indistinguishability doesn't apply.
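To make the setup concrete, a toy version of such a simulation (with made-up numbers and with the actual force computations omitted) could look like the following, where the "world" is nothing but arrays of coordinates and running it forward just replaces old numbers with new numbers:

    # A toy version of the simulation described above: the "world" is a big
    # list of particle positions (and velocities), and running it forward
    # means replacing the old numbers with new numbers at each time step.
    import numpy as np

    rng = np.random.default_rng(0)
    n_particles, dt = 1000, 0.001

    positions = rng.uniform(0, 100, size=(n_particles, 3))   # e.g., (18.2348, 7.2374, 9.0992)
    velocities = rng.normal(size=(n_particles, 3))

    def step(positions, velocities, dt):
        """Advance every particle by one time step (interparticle forces omitted for brevity)."""
        return positions + velocities * dt, velocities

    t = 0.0
    for _ in range(100):
        positions, velocities = step(positions, velocities, dt)
        t += dt

    print(f"At time t = {t:.4f}, particle #1 is at {tuple(np.round(positions[0], 4))}")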

But think about it this way: Real physics is also just a bunch of dumb updates by tiny particles. These updates move together in such a way that if you look at a big enough scale, you see a structure emerging that looks like a mind. We begin to adopt a "phenomenal stance" (similar to the "intentional stance") to that ensemble of changes that are fundamentally just dumb particle updates. So, too, we can squint at this series of numbers and recognize a higher-level pattern emerging that's visible in its structure—a defined algorithm showing its face. Compare to the way in which ASCII art produces images we immediately recognize just from "dumb character sequences," or regular images from "dumb pixels."

The algorithm that we see emerging from the dumb numerical updates can be used to predict roughly what the macro-scale structure of the future numbers will be even without doing the micro-scale physics. It begins to seem plausible to regard this process as a mind of its own, and once we do, the Criterion of Subjective Indistinguishability implies that this mind can't tell itself apart from the same mind run in another manner.

There is a difference between a brain run in hardware versus a brain whose hardware is built through a software data representation of physics. But at the end of the day, it's the algorithmic structure of the operation that seems to be what we care about most. We've seen plenty of examples above in which specifics of the hardware don't matter. Anyway, before we become physics chauvinists, we should remember that maybe our own reality is itself just a mathematical representation.

Hardware vs. software

The good news about the question of physics simulations is that it may not be very important. Micro-level physics is extremely expensive to simulate, and almost all simulations will likely be at a higher level of abstraction.

This brings us to a more general and more important question of whether hardware-based simulations (where the mind is composed of hardware components that carry out its functioning) should be treated equally with software-based simulations (where the mind's components are represented by abstract data structures, and their interactions are carried out abstractly in software). Arguments analogous to those in the physics case suggest that software should be treated on an equal footing with hardware, though I'm still not totally sure of this.

Once an algorithm is well established for industrial use, it may be moved from software to hardware. One of many examples is hardware neural nets:

When the particular task at hand does not require super fast speed, most designers of neural network solutions find a software implementation on a PC or workstation with no special hardware add-ons a satisfactory solution. However, even the fastest sequential processor cannot provide real-time response and learning for networks with large numbers of neurons and synapses. Parallel processing with multiple simple processing elements (PEs), on the other hand, can provide tremendous speedups. Some specialized applications have motivated the use of hardware neural networks. For example, cheap dedicated devices, such as those for speech recognition in consumer products, and analog neuromorphic devices, such as silicon retinas, which directly implement the desired functions.

When implemented in hardware, neural networks can take full advantage of their inherent parallelism and run orders of magnitude faster than software simulations.

If the long-term move from software to hardware is a general principle, then we might expect many of the minds of the future to be implemented via hardware, in which case the hardware vs. software question wouldn't matter as much. Whether most future minds will be hardware-based is an interesting topic for further study.

Abstract computers

Could an arbitrary universal Turing machine give rise to consciousness? Does that include a man manipulating symbols in a closed-off room, or a Lego computer? What about a mechanical device performing a single Turing-machine operation every million years? Again, it seems like the answers are "yes, if the Turing machine is implementing the right algorithms" because you as that computation wouldn't be able to tell what you were running on if the environmental-input informational signals were identical to those of a biological human.

Update, July 2016: A critique of the Criterion of Subjective Indistinguishability

I wrote most of this piece in 2013. My views on consciousness have changed slightly since then, and I now (in 2016) regard the Criterion of Subjective Indistinguishability as slightly inadequate.

As hinted at in the "Hidden assumption in the Criterion of Subjective Indistinguishability" above, it's tricky to define which computations are instances of "you" and which aren't. Indeed, any physical process can be seen as instantiating "you" given some interpretation. But let's assume we have a framework for identifying those computations that most saliently constitute instances of "you".

Another question is what "subjective distinguishability" means. What's the standard for deciding whether a mind can distinguish itself from another mind? We could define "distinguishing yourself from another mind" to mean "making verbal statements that correctly identify that the algorithm implementing your thoughts is not some other algorithm". For example, a Windows computer might correctly output a statement that "I'm not running a Mac operating system". But why make verbal outputs the gold standard? What if a person has severe expressive aphasia and can't make any statements? Maybe an alternative could be that a person presses a button to correctly distinguish herself/himself. But what if the person also can't move? Or what if the person can't voluntarily control thoughts at all but still has proto-thoughts relating to subjective distinguishability? Or what if we consider more basic neuronal subprocesses in the person's brain to implicitly declare themselves to be distinct from other sorts of subprocesses in other brains? Drawing a sharp line around what it means to subjectively distinguish oneself from different algorithms is tricky.

Finally, rather than thinking about myself as one particular clump of matter, I find it more helpful to conceive of my abstract cognitive algorithms as determining the evolution of many physical systems jointly. In this case, the Criterion of Subjective Indistinguishability could be reformulated to say that "you should give equal weight to all the saliently instantiated versions of your brain, regardless of substrate or implementation details". However, this still begs the question of what counts as an instantiation of my brain. How much detail does it need to have? Does a computer simulation of my brain not count because it's not implementing atomic-level quantum mechanics, even if the high-level behavior is similar? Conversely, does a tiny model of my brain's general behavior count even if it doesn't contain neuron-level dynamics? The intuition behind the Criterion of Subjective Indistinguishability wants to say "an instantiation counts if it's subjectively indistinguishable from your own biological brain", but we already saw the difficulties of making that notion precise.

Criterion of Conscious Feeling

We've gotten surprisingly far without leaving our own mind! Inability to subjectively distinguish our mind among biological, artificial, simulation, non-essential-subroutine, and physics cases has brought a large class of brains into our moral sphere.

But the Criterion of Subjective Indistinguishability is clearly insufficient to cover all moral cases. For example, I can subjectively distinguish myself from you just by consulting my memory to check whether I know a certain obscure fact from my childhood. I can subjectively distinguish myself from a rural farmer in Mexico by checking whether I can do integral calculus. Yet it's very clear that you and rural Mexican farmers deserve moral consideration as well.

We need a different kind of principle that captures what it is about other minds that makes them valuable. This is extremely tricky, but colloquially, these minds matter because they have a "feeling of what it's like" to experience their emotions. For example, when I think to myself, "Wow, I'm conscious! It's amazing that there's a distinct texture to my experiences," this is getting at the thing that we value. Of course, it's not crucial that I be able to verbalize this as a sentence—doing so would exclude young children and nonverbal animals, but it seems likely that they too have this sensation in an implicit, nonverbal way.

Libraries could be filled with writings throughout the ages trying to capture the essence of what I'm getting at, so I won't endeavor to theorize in greater detail. Roughly, I'm proposing a

Criterion of Conscious Feeling: The organism can explicitly or implicitly reflect on its emotions as "something that it feels like" to have this experience. The emotions themselves are the source of value or disvalue.

The mirror test might be roughly sufficient for suggesting consciousness in present-day animals, though I'm pretty sure it's not a necessary condition. For example, dogs don't pass the mirror test, but this could be because they're not primarily visual animals, and instead, they can recognize their own scent through urine. Maybe other animals can't recognize physical signs of themselves at all but still have this ability to reflect on their emotions under the surface. D. M. Broom echoes this point (p. 100). It's not at all clear how far down the animal kingdom conscious feelings extend, but given the high level of uncertainty of what consciousness even is, I wouldn't rule it out for crustaceans, insects, worms, etc. I would mostly rule it out for, say, "introspection" and "reflection" in a programming language (unless the language is implementing an actual brain-like algorithm) because they seem not to capture the full extent of what's involved in conscious self-reflection. A thermostat, too, "reflects" on its temperature and aims to adjust it, but I'm pretty sure I don't care about a thermostat (or I care at most to an extremely small degree).

Similar brains matter similarly

The Criterion of Conscious Feeling implies that if brains only differ in non-emotional ways, then they matter equally. For example, if you've read the final Harry Potter book and I haven't, then except for the difference in emotions produced by this fact, it has no bearing on the value of the emotional experiences we undergo. A smart person's suffering is only morally different from that of a less smart person insofar as intelligence directly impinges upon the experience of the suffering. A pig running a nearly identical algorithm for "feeling satisfied" as Socrates has comparable moral worth, at least at the level of that algorithm if not the whole brain (which may indeed be materially different).

Non-hedonic computations don't matter

In general, only the hedonically valenced conscious computations in a brain matter morally. (This is surely a controversial statement, but it's how hedonistic utilitarians feel.) Thus, we could hypothetically strip out the non-hedonic parts of a brain without affecting how much we care about it. Depending on empirical details, this might include parts of the cortex and many other brain regions that don't do as much emotional processing. Of course, when doing this stripping away, the brain needs to remain conscious, or else the moral importance is lost.

The cortex, for instance, is often seen as important in consciousness, though apparently it's not essential:

The evidence for the non-conscious nature of the cerebral cortex consists of lesion studies in which large amounts of cortex can be removed without removing consciousness and physiological studies in which it is demonstrated that the cerebral cortex can be active without conscious experience.

Lesion studies have shown that up to 60% of the cerebral cortex can be removed without abolishing consciousness (Austin and Grant 1958). An entire hemisphere can be removed or much of the front or back of the cerebral cortex can be cut off yet consciousness persists.

Fiset et al. (1999) and Cariani (2000) have shown that cortical activity can be normal or even elevated during the unconscious state of general anaesthesia. Alkire et al. (1996) also showed that cortical activity related to word recognition occurred during general anaesthesia.

Our intuitive assessments of when a brain is able to perform a certain function are not always accurate.

External appearances don't matter

As discussed in the opening, we don't want to extend our sympathies willy-nilly, because this isn't fair to the things that we actually care about. So, for example, we shouldn't care about realistic dolls, characters in fiction books, actors feigning injury, already-dead animals, people being cut open under general anaesthesia, and so on. The reason is that these things do not have conscious feelings.

Similarly, attractiveness, cuteness, disgustingness, criminal guilt, mental health, and so forth are also not factors that affect the intrinsic importance of someone's emotions. (There can of course be instrumental reasons to treat people differently in some of these cases.)

Giant lookup tables

We've seen that features of a mind besides its core algorithms for conscious emotions don't seem to matter. What are we to make of a mind that has a very different algorithm for controlling its behavior—e.g., one that determines its next move using a giant lookup table based on its past history and current state? I think this mind is probably not conscious, because while it may indeed exhibit all external, behavioral signs of consciousness, it's not running the right algorithm on the inside. It seems like the important part with consciousness is actually running through the algorithm, not just finding out the answer that the algorithm would yield. What matters is the journey, not the destination.
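Schematically, the contrast I have in mind is something like this (the class names, actions, and appraisal function below are purely illustrative):

    class AlgorithmicAgent:
        """Determines its next move by actually running through a decision procedure."""

        def next_move(self, history, state):
            # Genuine intermediate processing: candidate actions get appraised
            # in light of the current state and past history before one is chosen.
            scores = {action: self.appraise(action, history, state)
                      for action in ("flee", "approach", "wait")}
            return max(scores, key=scores.get)

        def appraise(self, action, history, state):
            # Stand-in for a real appraisal computation; `history` is assumed
            # to be a tuple of past observations.
            return -abs(hash((action, history, state)) % 10)


    class LookupTableAgent:
        """Determines its next move by retrieving a precomputed answer."""

        def __init__(self, table):
            # `table` maps (history, state) pairs to moves, all worked out
            # ahead of time by someone (or something) else.
            self.table = table

        def next_move(self, history, state):
            # No journey, only the destination: the answer is simply looked up.
            return self.table[(history, state)]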

Does my position on giant lookup tables violate the Criterion of Subjective Indistinguishability? Would I not be able to tell if I were a giant lookup table? My objection to such an argument comes down to the "hidden assumption" caveat: It's not clear there would be a "you" as a giant lookup table. What exactly constitutes something as "you" or not? Is it just behavior to an external observer? Certainly the lookup table would claim to be you and would claim not to be able to tell itself apart from you. However, I think "being you" also has to do with something going on inside. After all, I could paint a sign on a rock that says "I feel indistinguishable from you," but this obviously isn't sufficient. I think a giant lookup table needn't be "you" either. I agree this is a gray area, and the fact that one can bake desired conclusions into the definition of "you" suggests that the Criterion of Subjective Indistinguishability, while a good intuition pump, maybe isn't carving nature at its joints.

Further questions

What are the boundaries of a mind?

In the brain-size discussion, we saw this question crop up. For example, when the metal-wired brain was split by the electrical insulator, did it become two minds instead of one? We could ask a similar question for split-brain patients, although because they remain hooked up to the same body, one might wonder whether the separation is full enough to count.

What if there are relatively isolated subsystems within a large brain (or even within the peripheral nervous system), and if those subsystems were separated, they would behave like autonomous creatures? Are there such subsystems in the human nervous system? Do the subsystems count as separate minds, or not because they're hooked up to the larger system? Are they separate "you"s in the Criterion of Subjective Indistinguishability? Do they have to actually be separate agents acting on the world before they count as possible "you"s?

Perhaps the thinking behind these questions is influenced by the homunculus fallacy of picturing a mind as a specific point inside the brain where consciousness lives. Certainly some parts of the brain are more important for consciousness than others, but there's not a single point at which the important thing happens. Moreover, it's possible consciousness algorithms take place at a macro-scale and also at smaller micro-scales. Maybe small structures that we would independently call conscious contribute to a bigger whole that we also want to call conscious.

An example of this is a China brain. How should we count the weight of the China brain compared against all the individual Chinese citizens who comprise it? Is the big brain merely equal to any given constituent person? This is a really confusing problem, but remember that there's nothing physically mysterious going on. The China brain is what it is: A material machine doing things. The weirdness comes from our attempts to delineate boundaries of minds and count them, to say how many there are and how much they matter. That process is fuzzy because we're sort of trying to fit a square peg into a round hole. Nature just is as it is; our attempts at pigeonholing sometimes get confused. Anyway, we have to do the best we can unless we want to radically revise our program for delineating and weighing various minds.

No clear separation of "conscious" and "non-conscious" processing

As the homunculus discussion above highlights, there's not a distinct point in the brain that produces consciousness while everything else is non-conscious. Lots of parts work together to do the processes that overall we call consciousness. It does seem clear that some parts are not very important (e.g., much of a person's body, many peripheral nerves, some cortical regions, etc.), but as we keep stripping away things that look non-essential, we risk having nothing left. By way of analogy, I imagine looking in a big box full of stuffing for a tiny ring, only to find that there is no ring and that the stuffing itself was the content of the gift. (In the case of the brain, it's not the stuffing itself that matters but the structure and algorithmic behavior.)

I suggested above that we can replace the non-conscious parts of a brain with anything we want without affecting subjective experience, as long as the inputs and outputs of the non-conscious subroutines are identical. But what are the boundaries of the "non-conscious" parts? There aren't necessarily precise ones. Does this then mean that we could replace the whole brain with anything else so long as the inputs and outputs were the same? I don't think so, because, for instance, I don't think a giant lookup table should count as conscious. Say you're conducting the Turing test against a robot that uses a giant lookup table. You insult the robot, and it responds by crying, with tears coming down its cheeks. This might arouse our immediate sympathies, but on reflection, I don't think we've caused the robot to suffer, because the response seems like an automatic reflex, and the important intermediate processing hasn't gone on under the hood.

So the character of the algorithms does matter, not just the inputs and outputs. Maybe the algorithms matter more for the parts of the brain that are more closely related to what we think of as consciousness, and the more extraneous a subroutine is, the less I care about its algorithm. In this case, perhaps I'm not completely indifferent between neural nets vs. support-vector machines (vs. lookup mechanisms like memoization), but I'm almost completely indifferent if it turns out that these algorithms are not near the "core" of conscious brain processing (admitting the fuzziness in defining a "core").

"Hedonic work"

Eliezer Yudkowsky asks: "For that matter, IEEE 754 provides special representations for +/-INF, that is to say, positive and negative infinity. What happens if a bit flip makes you experience infinite pleasure? Does that mean you Win The Game?"

There are two angles from which to approach this. One is behavioral. A rational agent maximizing expected utility will modify its behavior according to the landscape of possible utility outcomes. Once the utility of an outcome becomes large enough, that outcome will dominate the calculations. At this point, any further increase in the utility value is indistinguishable from a behavioral standpoint. So, given a utility landscape with +INF as a possibility, there exist other utility landscapes with really big finite rewards that generate the exact same behavior (unless the probabilities themselves are allowed to be infinitesimal). The +INF symbol matters through its implications for the behavior of the algorithm but beyond that, it's just a symbol, so it's not obvious we should take it to literally mean infinity in our own value assessments.
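To illustrate the behavioral point with a toy expected-utility maximizer (the probabilities and utilities below are invented), the chosen action is the same whether the jackpot's utility is a large finite number or IEEE 754 +INF:

    # Once one outcome's utility is large enough, an expected-utility
    # maximizer's choice is the same whether that utility is a huge finite
    # number or IEEE 754 positive infinity.
    def best_action(actions):
        """Pick the action with the highest expected utility.
        `actions` maps names to lists of (probability, utility) pairs."""
        def expected_utility(outcomes):
            return sum(p * u for p, u in outcomes)
        return max(actions, key=lambda a: expected_utility(actions[a]))

    for jackpot in (1e6, 1e300, float("inf")):
        actions = {
            "safe": [(1.0, 100.0)],
            "gamble": [(0.001, jackpot), (0.999, -50.0)],
        }
        print(f"jackpot = {jackpot}: choose {best_action(actions)}")

    # All three runs pick "gamble": beyond a point, increasing the utility
    # value further makes no behavioral difference.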

A second perspective is hedonic. It's unclear whether an actually conscious hedonic mind could have an input so simple as a numerical representation like this. The human brain does appear to trigger pleasure with relatively rudimentary stimulation (e.g., certain drugs leading to dopamine release), but the actual pleasure process may be more complex, involving additional "glossing" procedures to be performed. Presumably the amount of pleasure is somehow proportional to the amount of this glossing action. We might think of the glossing operations as "hedonic work": The brain needs to do a lot of things to generate the pleasure feeling, and a simple number encoding this is not enough. To generate lots of pleasure, it seems you'd need lots of computational activity to take place. +INF isn't enough. I should add that this is all very speculative, and I welcome corrections based on actual findings in neuroscience.

Similarly, static mathematical representations of an appropriate computation are not the same as actually performing the computation. Of course, the universe might ultimately be a timeless mathematical structure in which temporal operations are embedded, but this is different from Plato merely imagining that there exist computations that correspond to conscious emotion, which doesn't count as conscious emotion (except insofar as it physically induces conscious emotions in Plato's own brain).

What is a computation?

A more thorough discussion of the ideas in this section can be found in "How to Interpret a Physical System as a Mind".

In The Rediscovery of Mind, John Searle claimed:

the wall behind my back is right now implementing the Wordstar program, because there is some pattern of molecule movements that is isomorphic with the formal structure of Wordstar. But if the wall is implementing Wordstar, if it is a big enough wall it is implementing any program, including any program implemented in the brain.

In Good and Real, pp. 54-55, Gary Drescher terms these "joke interpretations" of consciousness. As a trivial example, if our computer produces the output "11," does this mean eleven (in base 10) or three (in binary)? Or -1 in 2-bit two's complement? Or, for that matter, -187.2634 in some bizarre mapping scheme? Mark Bishop gives another simple example (p. 7) in which the same input-output pairs can represent either the AND function or the OR function depending on the interpretation. Similarly, if a brain performs some operation, does this correspond to happiness or suffering? There may be some contorted interpretation of symbols on which what appears to be agony is actually bliss. This is called the triviality argument against functionalism. The idea that physical, formal, or symbolic manipulations don't carry intrinsic semantic content was a main point behind Searle's Chinese room and has relevance in domains other than philosophy of mind, such as Haskell Curry's philosophy of formalist mathematics.
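These ambiguities are easy to reproduce on a real computer. The snippet below (a toy illustration) reads the same two characters under three numeral conventions and then reinterprets a single voltage table as either AND or OR merely by swapping which voltage level counts as "true":

    bits = "11"
    print(int(bits, 10))            # 11, read as decimal
    print(int(bits, 2))             # 3, read as binary
    print(int(bits, 2) - (1 << 2))  # -1, read as 2-bit two's complement

    # Bishop-style example: the same voltage table is an AND gate under the
    # mapping {hi: True, lo: False} and an OR gate under the opposite mapping
    # {hi: False, lo: True} (by De Morgan's laws).
    voltage_table = {("hi", "hi"): "hi", ("hi", "lo"): "lo",
                     ("lo", "hi"): "lo", ("lo", "lo"): "lo"}
    as_true = {"hi": True, "lo": False}
    as_false = {"hi": False, "lo": True}

    for (a, b), out in voltage_table.items():
        print(f"AND reading: {as_true[a]}, {as_true[b]} -> {as_true[out]}   "
              f"OR reading: {as_false[a]}, {as_false[b]} -> {as_false[out]}")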

In "The Rise and Fall of Computational Functionalism," Oron Shagrir reviews what he calls the "Realization Problem." In particular, Shagrir cites the proof in Hilary Putnam's, "Representation and Reality" "that every ordinary open system is a realization of every abstract finite automaton." Shagrir continues:

Differently put, if a functional organization of a certain complexity is sufficient for having a mind, as the functionalist claims, then the rock too should be deemed to have a mind. In fact, almost everything, given that it realizes this automaton, has a mind. Moreover, if Putnam's theorem is true, then my brain simultaneously implements infinitely many different functional organizations, each constituting a different mind. It thus seems that I should simultaneously be endowed with all possible minds!

This is worth contemplating. However, it's nothing mysterious. Physics is physics—particles move around. It's up to us how we want to respond to particular sorts of particle movements. One proposal I've heard is to care about all physical operations with a different measure—that is, in proportion to how "naturally" they can be interpreted as the algorithm we care about. This measure is not physical like the measure proposed by the many-worlds interpretation (MWI) of quantum mechanics; rather, it's just a "degree of resemblance." Since "consciousness" simply refers to a cluster of physical operations within thing-space, we can decide to use a fuzzy logic-style approach in saying that some things are closer to what we mean by "conscious" than others.

When interpreting a physical process as a computation, should we only use the simplest interpretation, or should we use all of them? In theory I would care about all interpretations inversely weighted by their complexity, but in practice, this would probably be dominated by the simplest interpretation. In an analogous way, the algorithmic probability of a string is theoretically a sum over all possible programs that might have produced it, but in practice, this sum is dominated by the shortest program (see "Discrete Universal A Priori Probability").
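In symbols, the discrete universal a priori probability of a string x is

m(x) = Σ_{p : U(p) = x} 2^(−length(p)),

summing over all programs p that output x on a universal prefix machine U. Because each program of length n contributes only 2^(−n), the sum is dominated, up to a constant factor, by the shortest such program, whose length is the Kolmogorov complexity K(x). The analogous move for interpretations would be to weight each one by something like 2 raised to the negative of its description complexity, so that the simplest viable interpretation carries almost all of the weight.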

How do we define a "simple" interpretation? I haven't thought about this question in depth, because typically if an interpretation is strained, you know it when you see it. Of course, it's possible that what we think of as "simple" might reflect the idiosyncrasies of our own environmental circumstances. Maybe one proposal for assessing the reasonableness of an interpretation could be to compute

P(interpretation | algorithm) ∝ P(algorithm | interpretation) P(interpretation).

Here, P(algorithm | interpretation) would say how easy it would be to write the algorithm if you just told someone the interpretation, i.e., the "gist" of what was going on in it. And P(interpretation) would reflect how common that interpretation is in the multiverse. Interpretations are labeled clusters in algorithm-space, so this equation is essentially saying that the probability of a cluster given an algorithm is proportional to the fraction of algorithms of that type within the cluster times the total number of algorithms in the cluster.
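As a toy numerical sketch of this weighting (every number below is invented purely for illustration), a natural interpretation with a modest likelihood and prior easily swamps a contorted one:

    # Posterior weight on each interpretation of an observed algorithm,
    # proportional to P(algorithm | interpretation) * P(interpretation).
    interpretations = {
        # name: (P(algorithm | interpretation), P(interpretation))
        "natural reading": (0.2, 0.1),
        "contorted reading": (0.0001, 0.0001),
    }

    unnormalized = {name: likelihood * prior
                    for name, (likelihood, prior) in interpretations.items()}
    total = sum(unnormalized.values())
    for name, weight in unnormalized.items():
        print(f"{name}: posterior weight {weight / total:.8f}")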

In addition to deciding whether or not a computation is "conscious," a similar point applies for determining what kind of emotion a system exhibits. There is some contorted mapping of symbols according to which what appears to be suffering is actually bliss, but especially for non-trivial computational processes, one of the interpretations is clearly more natural than the other. For instance, suppose you interpret a video-game character that employs sophisticated cognitive algorithms either as suffering when it gets shot or as happy when it gets shot. But if it were happy to be shot, why would it run away? You could make up some interpretation, such as that running away is the character's way of expressing its delight. Or perhaps the character wants to savor each bullet and so space them out to enjoy each one to its fullest. And the expressions of agony on its face are its alien way of showing elation. This interpretation can work, but it's much more strained than the simple interpretation that the character actually doesn't like being shot. The bullet-savoring interpretation would have a low P(interpretation) because creatures that run away from rewards to savor them are rare in the multiverse compared with creatures that try to escape painful stimuli. Also, the video-game character doesn't have any code that would make it compute thoughts like, "I want to run away to savor the bullets." It just has code that makes it run away and that reduces its reward level when it gets injured. Finally, even if the character were running away to savor bullets, the fact remains that by shooting it, you would be doing something it doesn't want to happen; it doesn't want a bullet now because it's trying to savor them for later. (Thanks to a conversation with Saulius Šimčikas for inspiring this and the previous two paragraphs.)

The preceding account might seem weird. "How can it be a matter of interpretation whether something is happy?!" But remember that "happiness" is inherently a complex, fuzzy concept, just like "freedom", "justice", or "consciousness". "Happiness" doesn't refer to any singular, uniform property that somehow imbues certain systems. Rather, different systems just behave differently, and we have to choose which ones we think count as happy to what degrees. (Of course, self-reports by sufficiently intelligent systems can play an important role in such assessments.) The exact processes happening in my brain when I'm happy are not identical to those happening in your brain when you're happy, so an identity approach to defining happiness can't work. We need to get a little messy by considering a general class of algorithms, and at that point, assessing happiness requires fuzzy categorization. Even for relatively simple systems, we may be able to say that certain processes look plausibly more like happiness than suffering—not the full-blown, deep happiness and suffering that we know from our own experiences (since simple systems lack most of the neural machinery to implement such detailed textures), but still crude forms that we may consider marginally valuable and disvaluable.

The infinite monkey theorem implies that, for example, your brain is encoded somewhere in the digits of pi (infinitely often, in fact). If we can interpret digits as corresponding to your brain via some encoding, does pi contain infinitely many moments of your experiencing bliss and agony? In a sense, yes, but remember that pi does not exist "out there" in the world. Of course, some computers print millions of digits of pi, but there probably don't exist physical instantiations of enough digits of pi to contain an encoding of your brain according to some simple, natural coding scheme. But even if there were a googolplex digits of pi written out somewhere, it would be up to us to decide how much we cared about the brains implicitly encoded thereby. After all, they wouldn't be moving dynamically in time, and it's not clear how much an instantaneous snapshot of a brain matters. In "A Computer Scientist’s View of Life, the Universe, and Everything", Jürgen Schmidhuber imagines massive bitstrings produced by the Great Programmer, some subsets of which can be interpreted as life.

There's further literature on the topic of discerning what counts as a computation, e.g., David Chalmers, "On implementing a computation," and the "Similar books and articles" at the bottom of that page.

"Interpreting a physical process as being a particular computation" doesn't have to be fundamental to deciding what we care about. An interpretation is just a way that we conceptualize and talk about a process and how we feel about it. It's analogous to using words that describe some underlying physical system. Words and interpretations can help us make sense of and organize our thoughts and feelings, but they're not required. We could, for instance, just intuitively compare a given physical process X to a physical process Y that happens when we suffer and to a physical process Z that happens when we feel joy, and then if the neural networks in our brains decide that similarity(X,Y) is much greater than similarity(X,Z), we count the process as suffering. Or we could even just respond to seeing some instance of a physical process by directly valuing or disvaluing it. We are fundamentally functions that map input stimuli to output responses, including assessments of how much we like or dislike things. Using a framework of interpreting computations as being instances of higher-level concepts like suffering or aversion or goal-seeking can help us make what we feel are more sophisticated and enlightened assessments (rather than, say, seeing a fly eating a rotting animal, finding it disgusting, and deciding that it's bad), but there's nothing magical going on when we do this. Words and interpretations are just clusters in idea-space that we make up. They can influence how we think about things (the Sapir-Whorf hypothesis is not completely false), but our views and actions are also influenced by non-verbal, non-conceptual features of the world.

Functionalism is for ethics, not ontology

Contemplating the possibility of many interpretations for the same physical process can be confusing. One might think: "I can feel that I'm myself. Sure, maybe my brain can be interpreted as executing some different mind on some contorted mapping scheme, but I know subjectively that I'm experiencing my experiences, not that one. How can it be that there are other interpretations? This is the right one!"

In fact, this intuition is largely correct. You are what you are, which is physical atoms moving around. At bottom, all that's happening is that physics is playing out. Talking about "whose mind you are" is an artificial discussion, an abstraction that we impose on the underlying reality in a similar way as humans draw country boundaries over maps of the Earth, even though all that's really there is the Earth itself. Who are you? You are the atoms that are moving in that spot. This is perfectly clear. Everything else we say is abstraction.

Some abstractions are more useful than others. For instance, suppose I want to predict where you'll go for lunch. I would achieve better prediction by modeling the clump of atoms that is you with a model that includes a preference for Val's Vegan House than if my model contains a preference for Dan's Dogfood Store. Sure, there are contorted interpretations of your brain under which it's implementing the algorithms of your dog's mind, but these are less helpful for accurate predictions. This is one sense in which some interpretations of minds are more "natural" and have higher "measure" than others: using them makes better predictions.

Another use of abstraction is to define your "self-identity." There are interpretations of your brain on which you are running George Washington's thoughts and feelings, but presumably you'd rather create your self-identity based on a more common-sense view of who you are.

A final use of abstraction is for ethics: Do I care about this thing, and if so, how much? If your ethical concern for something is at least partly based on the algorithms it implements, then you get into the business of deciding how much a given physical process instantiates the algorithm you care about. You have to decide on interpretational weights, though in practice it's often pretty obvious which algorithm is the dominant interpretation for any nontrivial system.

Thus, functionalism—seeing mental algorithms in physical operations—is useful for abstract things, like prediction and deciding how much we care. It doesn't refer to something "out there" in base-level reality.

"But," one might say, "it feels like something to be me. Does it also feel like something to be my wall? Either it does or it doesn't. I'd really like to know whether my wall is a subject of conscious experience!" This question is confused. Your wall is what it is—a bunch of atoms. Whether there's something it "feels like" to be a wall is an abstract question, in the realm of representations and models rather than base reality. My answer is that there are only extremely weak interpretations on which a wall does higher-level mental operations, so it basically doesn't feel like anything to be a wall (though this answer is definitely subject to revision upon further discoveries in science). You can choose a different answer if you want.

"But I'm certain it feels like something to be me," one might reply. "How could it be a matter of interpretation?" Well, look at that wooden thing in your kitchen that has legs and is supporting plates of food. I'm certain that's a table. How could I be wrong? How could it be a matter of interpretation? Yet, "table" is not an ontological primitive of the universe; it's an abstraction that we impose on underlying reality. So too with the feeling of what it's like to be conscious.

Note that when I say "the feeling of what it's like is an abstraction," this doesn't just mean a verbal, intellectual abstraction. More fundamentally, it's an abstraction that our brains generate even without our trying. Our brains contain representations of themselves—their own thoughts and emotions and experiences. It seems so fundamental to think about the raw feeling of being conscious because our brains are constantly generating that abstraction for us. Similarly, it feels obvious that the image we see on our screen is a face, even though it's actually just a bunch of pixels, which are actually just a bunch of photons emitted by our computer screen. The image, and our feeling of what emotions are like, are unified experiences, because one of the functions of the conscious brain is to unify disparate features into larger, conceptual, holistic representations. Our brains are abstraction generators.

It's worth noting that functionalism in the philosophy of mind shares superficial resemblance to structuralism in the philosophy of science: Both views assert that relations are what matter, not the relata. However, functionalism is not identical with structuralism, because (ontic) structural realism is a statement about ontology and hence must refer to fundamental physics, not to high-level processes like consciousness. Functionalism is, in my view, an ethical/conceptual statement about how we want to think about higher-level processes that emerge from fundamental physics. As an example of the distinction, functionalism suggests that whether a computer program is implemented using a 32-bit or 64-bit computer shouldn't affect whether we call it conscious, as long as it performs the same functional operations. But we can see that the physical operations between the two cases differ, and hence the underlying ontic structures moving around differ.

Counterfactual robustness

How important is the counterfactual robustness of the particle movements constituting a mind? For instance, what if some dust particles, by sheer chance, were blown into a configuration that mimicked the neuronal firings of a conscious brain for a fleeting second? We might say that because the particles did the right moves, they count as conscious. On the other hand, we might feel that the brain they constituted wasn't "real" because if, counterfactually, the brain had been put in even slightly different conditions, its brain-like behavior would have broken down.

David Chalmers and other philosophers believe that counterfactuals matter, even if those branches of possibility are never executed. Typically this is a move to avoid joke-interpreting any random sequence of occurrences as being isomorphic to a particular sequence of executed brain states in a conscious mind. I think the same penalty to joke interpretations can be applied using the weighting described in the previous section, in which interpretations are weighted by how simply and naturally they map on to the physical process, although counterfactuals may certainly play a role in this judgment call.

I personally think Boltzmann brains matter ethically. The good news is that these are probably extremely rare relative to normal minds.

A modified Turing test for consciousness

Whether we want to call a mind "conscious" is fundamentally an issue for our hearts to decide. That said, there are criteria we can develop to help us along, especially when our intuitions are lacking.

One place from which to draw inspiration can be assessments of intelligence, which is different from though related to consciousness. The Turing test is a standard proposal for determining whether a computer has human-level intelligence. In this test, a human and computer both communicate with an interrogator, and the computer passes the test if the interrogator can't tell which responses are from the human and which are from the computer. Talking remotely via text chat can screen off irrelevant features like physical appearance.

As a criterion for assessing consciousness, the Turing test fails on the example of giant lookup tables, which could pretend to be a human perfectly without doing any meaningful computation besides retrieving stored answers to the questions. Intuitively, what makes a giant lookup table not a good "model" for a human brain is that it's vastly overfit. The model has enormous complexity due to having all the answers built in, whereas good models usually have simple structure and let computation produce the output.

Plausible scientific models tend to have low "message length," i.e., there's a small number of bits required to describe the model itself plus the errors between the model's predictions and the actual data. A giant lookup table has immense message length because it encodes all possible answers for all possible questions, without a predictive computational algorithm. As a scientific description of how the brain works, a giant lookup table is a terrible hypothesis, even though it can fit the observed data very closely.
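As a rough numerical sketch of this comparison (all of the bit counts below are made-up placeholders; the only point is that the lookup table's description length scales with the amount of data, while the compact model's does not):

using System;

class MessageLengthSketch
{
    static void Main()
    {
        int numQuestions = 1000;   // hypothetical number of question-answer pairs to explain
        int bitsPerAnswer = 32;    // hypothetical storage cost of one memorized answer

        // "Giant lookup table" hypothesis: every answer is stored explicitly,
        // so its description length grows linearly with the number of questions.
        long lookupTableLength = (long)numQuestions * bitsPerAnswer;

        // Compact "algorithmic" hypothesis: a short generative program plus a
        // small correction term for the cases it gets slightly wrong.
        long modelBits = 2000;     // assumed size of the model itself
        long errorBits = 500;      // assumed bits needed to patch its prediction errors
        long compactModelLength = modelBits + errorBits;

        Console.WriteLine($"Lookup table:  {lookupTableLength} bits");
        Console.WriteLine($"Compact model: {compactModelLength} bits");
        // Both hypotheses fit the observed answers, but the compact model is
        // the far better scientific description.
    }
}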

A similar idea can apply to our feelings about consciousness. Intuitively, the digital brain most likely to be conscious is the one that constitutes the best scientific hypothesis about how animal brains actually work. In the Turing test itself, these assessments are made only via conversational interactions. An extended Turing test could potentially also consider anatomical similarities, but it's arguable that the behavioral and personality features of a mind are the most important parts morally, whereas the anatomy of the brain may be mostly irrelevant. In that case, a Turing test for consciousness that asks for low message length with respect to an actual brain, using (only) psychological rather than neurological data, could be a decent test. Of course, this only works for existing animals (especially existing humans with language), but it gives us a place to start. Once we have a more abstract understanding of the brain's operations derived through this kind of scientific approach, we can generalize and decide how much we want to call other things conscious when they possess some but not all of these characteristics.

I'm not proposing the modified Turing test as an absolute standard, but it can be a good heuristic to shape our intuitions: A brain that is a better scientific model of actual brains is more likely conscious. Clearly, this is a sufficiency test only: If a computer passes this test, then it's likely conscious, but it's not the case that if it's conscious, then it has to pass this test.

Campbell's law in consciousness tests

Previously I proposed that the mirror test might be a roughly sufficient condition (though probably not a necessary one) for suggesting consciousness in present-day animals. Even if this is true, the test is not a sufficient condition for consciousness in general minds, because it would be fairly easy to build a robot that passes the mirror test just by knowing the rules of the game. Simply have the robot look in the mirror, run a classifier to see whether the image matches a known model of its "face," and if so, output a robotic statement: "I see myself." Or perhaps even more simply, the robot could execute particular movements and check whether it saw those same movements in the mirror. I don't think such a robot is conscious.
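To make this concrete, here's a deliberately minimal sketch of such a mirror-test-gaming routine; everything in it, from the class name to the stand-in classifier delegate, is my own hypothetical illustration:

using System;

// A robot controller that "passes" the mirror test merely by knowing the
// rules of the game. The classifier delegate stands in for whatever
// image-matching routine the robot happens to use.
static class MirrorTestGamer
{
    static string ReactToMirror(byte[] mirrorImage, Func<byte[], bool> looksLikeMyFace)
    {
        // If the mirror image matches a stored model of the robot's own "face,"
        // emit the canned self-recognition statement.
        return looksLikeMyFace(mirrorImage) ? "I see myself." : "I see something else.";
    }

    static void Main()
    {
        // Pretend classifier that always "recognizes" the robot's face.
        Func<byte[], bool> alwaysMe = image => true;
        Console.WriteLine(ReactToMirror(new byte[0], alwaysMe));
    }
}

The movement-based strategy mentioned above would be just as easy to hard-code: execute a motion, check whether the mirror shows the same motion, and output the canned statement if so.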

What's going on here is something like Campbell's law: As soon as we have a simple, concrete test for consciousness, that test can be trivially passed by a non-conscious computer designed to game the system.

As another example, some authors point to social awareness of one's place in a group as an important indicator and possible originator of consciousness, but again, it would be easy to implement naive versions of this—not only in robotic form but even in a simple Python program. A really trivial version could involve a series of objects checking equality between themselves and other objects. If they used a comparison function, they could be seen as implementing "an understanding of social hierarchy." Of course, this is a far cry from what we actually mean by social consciousness, but it underscores the infectiousness of Campbell's law in our assessments of what is and isn't conscious by some criterion.
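Here's one hedged illustration of how trivially such a criterion can be satisfied, written in C# to match this essay's other snippets; the Agent class and its Rank field are my own invention:

using System;
using System.Collections.Generic;
using System.Linq;

// Deliberately trivial "social awareness": each agent merely compares a
// numeric rank against the others'. Nothing here resembles what we actually
// mean by social consciousness.
class Agent : IComparable<Agent>
{
    public string Name;
    public int Rank;   // a single number standing in for "social status"

    public int CompareTo(Agent other) => Rank.CompareTo(other.Rank);
}

class SocialHierarchySketch
{
    static void Main()
    {
        var group = new List<Agent>
        {
            new Agent { Name = "A", Rank = 2 },
            new Agent { Name = "B", Rank = 5 },
            new Agent { Name = "C", Rank = 1 },
        };

        // Each agent "knows its place in the group": just a sorted comparison.
        group.Sort();
        Console.WriteLine("Hierarchy: " + string.Join(" < ", group.Select(a => a.Name)));
    }
}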

I think the modified Turing test proposed above is pretty robust even against Campbell's law, but this is partly due to its complexity; it's not an easy thing to verify, because it requires building a scientific model of the brain over a large number of speaking interactions. And even this test may have some (less obvious) blind spots for use as a metric. Maybe it overlooks a few crucial components of what we want to consider to be a conscious mind.

The usual solution to Campbell's law is to employ multiple metrics that attack the problem from different angles. We can do this for consciousness by examining lots of components of an organism's brain and behavior: Social awareness, mirror test, reactions to injury, display of emotion-like responses to situations, presence of a "personality" and other folk-psychology traits, etc. On pp. 13-14 of "Aspects of the biology and welfare of animals used for experimental and other scientific purposes," Conclusions 1 and 2 list some of the many measures that scientists use to assess sentience in animals. Maybe this list contains some unnecessary features and omits other important ones, but these kinds of characteristics can serve as rudimentary metrics for purposes of overcoming Campbell's law. If an artificial mind scores high on all of these traits, then I am more inclined to call it conscious and sentient.

I used to dismiss the suggestion that "If we just add enough parts to our AIs, eventually they'll become conscious." I thought this was a form of superficial pattern matching, like the idea that if you tape feathers and a beak onto a box, it'll turn into a bird. But now I see a sophisticated sort of wisdom in the suggestion. Obviously, there are many algorithms that have nothing to do with consciousness. There are many algorithms that may look conscious from the outside but aren't conscious on the inside. That said, as we've seen above with Campbell's law and based on what I can tell from neuroscience, there may not be a single, simple operation that instantiates consciousness, while everything else does not. Rather, consciousness may result from a collection of many components working together. Consciousness may be like a symphony orchestra rather than a single instrument. There are many things that are not orchestras, but it's also true that there's a lot of flexibility in how you construct an orchestra before it stops being one. You can mix and match many instruments in many ways.

So there's probably not an "easy" way to decide whether a digital brain is conscious. We need to do the hard work of constructing measures that seem important to us and compiling enough of them that we don't feel as though we've left out too many important components. This also suggests that whether we regard something as conscious needn't be binary: If something has many but not all of the traits, would we consider it partially conscious? On the other hand, the distinction we see in ourselves between being conscious (e.g., you right now) and not (e.g., you during general anaesthesia) seems pretty binary. I'm hesitant to say that someone under general anaesthesia is still "partly conscious" just because she still exhibits many important components of consciousness in that state. But is this apparent binariness an illusion arising from the fact that the parts of pain processing that get blocked off are just the highest levels, the ones that could verbalize and remember the experience, while the lower levels remain in intense agony?

Unfortunately, when it comes to other animals and especially artificial minds, we probably can't use whatever narrow brain measures discriminate consciousness in humans. We can see an example of this in the case of vision: "fish, lobsters and octopi all have vision, Elwood said, despite lacking a visual cortex, which allows humans to see" (from "Lobsters and crabs feel pain, study shows" by Jennifer Viegas). Similarly, D. M. Broom writes in "Cognitive ability and sentience: Which aquatic animals should be protected?" (p. 100):

Although some mammals have high level analysis functions in the cerebral cortex, a comparable high level of analysis occurs in areas of the striatum in birds and in a variety of brain regions in fish and cephalopods.

For arbitrary minds, we need to generalize and rely on multiple metrics, and this does seem to open the possibility of shades of gray in consciousness.

A word of caution with this line of reasoning: It can be tempting to identify lots of measures that correlate strongly with consciousness in animals that we know today. Optimizing for these things may be relatively safe within existing biology. However, unless these measures represent things that matter intrinsically to us, then the waters become treacherous when we apply these measures to arbitrary minds. While there is probably not a single, simple operation that entirely encapsulates conscious emotion, it's probably also true that many cognitive operations are not fundamental to consciousness.

An example of a measure that correlates strongly with consciousness is cognitive intelligence. Certainly some intelligence is required for conscious emotion, in order to have a basic awareness of how things feel to you, but you don't necessarily need to be able to solve puzzles, remember things for long periods, or make rational decisions. In animals, adaptive/flexible behavior correlates with consciousness because enabling complicated decision-making is one of the purposes of consciousness, but this needn't mean the relation still holds for arbitrary artificial minds. At the same time that we need to avoid the illusion that consciousness can be defined by a particular narrow algorithm, we should also eschew the fallacy that any cognitive process that reminds us of something done by humans is an important part of consciousness.

Graded sentience and value pluralism

We've seen that consciousness, even in the non-confused sense of "computations I care about," is not binary or precise. There's not a single point where "the lights turn on," while before all was darkness. There are somewhat concrete boundaries delineating, say, whether an organism can verbalize its emotional states, but these boundaries cannot be a fully general definition of what we mean by consciousness because of joke instances of physical processes that meet these criteria. (In the case of an organism verbalizing its emotional states, for instance, consider a tape recorder that says, "I'm happy" when it plays cheerful music and "I'm sad" when it plays plaintive music.)

So we need multiple metrics to capture what we value, and in fact, each of these aspects may be its own source of value. If you take a human brain and remove a single capacity—say that of language—you don't thereby remove all the moral value from the mind. Yet we could keep stripping off one seemingly unimportant capacity after another, until eventually we arrived at something almost totally unimportant, like a rock. Assuming there isn't a critical step at which the "lights go out" morally (and there doesn't seem to be based on what I can tell), then the high value we place on the human brain must be a combination of many smaller components of value. They may interact together; we don't need to value each piece independently of the others. But it seems we do need many loci for the value of consciousness, each contributing to a graded nature of the moral importance of a given computation.

We can think of these as, in Carl Shulman's words, "bells and whistles" that are added to the barebones algorithms (reinforcement learning, network broadcasting of signals, feedback loops, etc.) that make these algorithms more vividly compelling as being another mind that we feel deserves compassion. As we strip away bells and whistles from a human, the value declines, but it doesn't drop to zero. Thus, questions like whether insects are conscious may ultimately not be "yes" or "no" but something like, "they have some of the traits that we consider morally important, though maybe in reduced degree, so we might value them somewhat less than humans." In some sense, this is what speciesists have alleged all along: That animals are less important because they have slightly fewer bells and whistles than humans. I would respond by noting that

  1. The drop-off in my own feelings of moral value for minds is a lot less steep than speciesists claim—e.g., just making up numbers, maybe a dog is within a factor of 2 of a human, a chicken within a factor of 4, and an insect within a factor of 150.
  2. The discrimination is not arbitrarily based on species, because it's grounded in the organism's brain, not the organism's species per se. So, for example, the actual intrinsic value of one human may differ from that of another human depending on depth of emotional experience, etc. That said, we (probably rightly) refrain from making these pronouncements because (a) the differences are likely very small (maybe less than 5-10%?), (b) we can't easily assess them anyway, and (c) making these sorts of comparisons would be culturally degrading, especially for those deemed "less valuable." Even with respect to animals, we might wish to pretend that their worth is equal to that of humans, if only to counterbalance the tendency for most of society to give (non-companion) animals basically no moral worth whatsoever.

Daniel Dennett explained this idea of graded sentience well:

It is a big mistake to think of consciousness as all-or-nothing. And I don't think that our everyday understanding of consciousness bears that kind of weight. [...] The question that people like Nagel and Searle are obliged to take seriously I don't. They want to know where in the great chain of being does consciousness start? Are clams conscious? Are fish conscious? Are vertebrates conscious? Are octopuses conscious? And I think that those are just ill-formed questions. Let's talk about what they can do in each case and what their motivational systems are, what emotional possibilities they have. And as we sort that all out, what doesn't happen, I don't think, is that we see emerging from the gradual fog a sort of sharp line at any point.

And from his Consciousness Explained (p. 447):

we have worked hard to escape[...] the assumption that consciousness is a special all-or-nothing property that sunders the universe into two vastly different categories: the things that have it [...] and the things that lack it. Even in our own case, we cannot draw the line separating our conscious mental states from our unconscious mental states. The theory of consciousness we have sketched allows for many variations of functional architecture [...].

Far from placing humans at the pinnacle of value, this "graded sentience" approach may actually humble us, depending on how far we decide to extend the moral value we see in non-human algorithms. If we take a highly parochial view, we can say that a brain needs to have all or most of the standard human brain algorithms operating together in a fashion very similar to what we see in the human brain, and otherwise it doesn't count as valuable. A more cosmopolitan view might see similarities to human-type algorithms in other places—most notably animal brains and human-like computer emulations, but maybe even, to a diminished degree, in more abstract algorithms (e.g., for reinforcement learning, network communication, or other processes that resemble parts of what makes human brains "experience conscious emotion"). The parochial and cosmopolitan views are not binary but represent directions on a continuum of how much weight is placed on things that are more and more abstracted from human brains.

We should remember that more cosmopolitan is not always better. I don't want to start worrying about the currents in a stream, unless I'm shown specific ways in which the water-molecule movements involved resemble something I know I care about. However, I might feel at least a little concern about, say, vast numbers of reinforcement-learning algorithms that resemble suffering, even if they're run very abstractly. This degree of concern is low but nonzero. And I might later change my mind and decide that reinforcement learning without other accompanying bells and whistles doesn't matter to me at all. Remember, we only have a fixed caring budget, so caring more about abstract algorithms means caring less about, say, real animals being eaten alive. We have to take seriously the tradeoffs among our concerns for various consciousness operations and not just go for a feel-good attitude like, "Let's care about everything!"

It's also worth remembering that, with our current state of knowledge, we don't always know what kinds of algorithms/abilities are even present in various kinds of organisms, nor to what degree apparent bells and whistles are actually key to the architecture of the system. So our assessments of sentience are not purely subjective but depend upon empirical knowledge.

Consciousness disagreements are differences of emphasis

People differ widely in their assessments of which minds are conscious. It seems everyone has his or her own pet theory of consciousness and what this implies about which entities have moral significance. People then debate which theory of consciousness is the right one.

This framing of the debate is misguided. There are many attributes and abilities of a mind that one can consider important, and arguments about whether a given mind is conscious reflect different priorities among those in the discussion about which kinds of mental functions matter most. "Consciousness" is not one single thing; it's a word used in many ways by many people, and what's actually at issue is the question of which traits matter more than which other traits. As Marvin Minsky said:

the idea that there's a central "I" who has the experience I think is a typical case of taking a common-sense concept and not realizing that it has no good technical counterpart, but it has 20 or 30 different meanings, and you keep switching from one to the other without knowing it, so it all seems like one thing.

The situation resembles debates over morality, where different sides claim that they have the proper definition of "morality", when in fact, each side is just encouraging greater emphasis on the moral values that it considers important.

Aaron Sloman and Ron Chrisley echo this point:

‘consciousness’ is a cluster concept, in that it refers to a collection of loosely-related and ill-defined phenomena. [...] A cluster concept is one that is used in an indeterminate way to refer to an ill-defined subset of a collection of features or abilities. Some combinations of features are treated as definitely justifying application of the concept, and others definitely fail to justify it, whereas for some intermediate combinations the question whether something is or is not an instance is indeterminate. People may disagree, and the same person may find conflicting reasons for and against applying the concept in those cases. Thus, most people will agree that humans have emotions and that viruses do not, but may disagree as to whether insects, or fish, or unborn human infants do. This is not an empirical disagreement, for they cannot agree on what sorts of new evidence would settle the question.

Following is a very incomplete list of features that different people may consider crucial for consciousness. I've heard most of these defended by someone or other.

People like Eliezer Yudkowsky, who insist that, say, frogs are not conscious because they (allegedly) lack high-level reflective self-models, are correct that frog experiences are not like ours, because frogs lack some abilities that we have. But it's a mistake to conclude that minds with more restricted mental abilities relevant to processing pain don't matter at all.

I expect that the view of sentience as binary will be challenged as artificial intelligence progresses and we're able to see more and more different ways that minds can be built, with different abilities, tendencies, strengths, and weaknesses. (Note that advancing artificial intelligence and especially artificial sentience faster is likely to increase suffering, so this prediction should not be interpreted as a prescription.) It will become more apparent that there's a large multidimensional space of mind characteristics that are not sharply divided between conscious and unconscious but are merely different from one another. As an analogy, someone raised entirely in a Christian community might think of religion along a single dimension: Either you're a Christian or you're an atheist. But in fact there's a far wider array of religions that share some traits with Christianity and don't share others.

Why bells and whistles matter

Bells and whistles are fancy additions to an algorithmic process that make it more complex but not fundamentally different from a simpler outline of the same idea. Why should these be morally relevant? Might there not be some fundamental process involved with consciousness that matters regardless of its complexity? Ultimately our answer to this question is a matter of intuition, but let me give two simple examples to pump the intuition that complexity is plausibly relevant.

"Self-reflective" decision maker

Maybe the crucial feature of consciousness and agent moral worth is self-reflection. After all, what I introspectively seem to care about is my experience of observing my own emotions. If I don't notice my emotions, they don't seem to matter to me. What if we designated self-reflective awareness of one's emotional state as the key deciding criterion for moral value?

The problem is that, as with any other simple definition of moral worth, we can fairly easily come up with trivial versions of processes that display this property. For example, suppose we have an object in a programming language. During execution of our code, we reach the following statement, where "self" refers to the current instance of the object:

// Trivial "self-reflection": the object branches on its own mood field.
if (self.mood == "happy") { self.actionToTake = "smile"; }
else if (self.mood == "sad") { self.actionToTake = "cry"; }

Our object is engaged in "self reflection" on its own feelings as a way to decide how it should behave. Yet, it seems clear that this operation doesn't have the same moral importance as a human who feels happy or sad after reflecting on internal state variables. The difference lies in all the other additional things going on in the human that, together, create a unified whole that we feel is more ethically significant.

Of course, it's just my intuition that a human's self-reflection is more valuable than what this code snippet would do; others are welcome to disagree. If you do insist on a very simple criterion for moral worth, make sure to look out for the many, many instances in physics where operations of this type can be seen happening. The self.mood naming is suggestive to our human linguistic centers, evoking far more intuitive structure than is actually demonstrated by this process. All that's really happening here is a decision between two outcomes based on some property of the thing, and we can see that all the time even in inanimate contexts.
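For instance, here's a minimal sketch of my own; the thermostat and its field names are hypothetical:

// An "inanimate" analogue of the mood-based branch above: a thermostat rule
// that picks between two outcomes based on a property of the thing itself.
class Thermostat
{
    public double Temperature;
    public string FurnaceState;

    public void Update()
    {
        if (Temperature < 18.0) { FurnaceState = "heating"; }
        else                    { FurnaceState = "idle"; }
    }
}

Few of us would say this branch matters morally on its own, even though its structure mirrors the mood-based snippet above.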

"Self-modeling" agent

Maybe self-reflection is too simple a property to delineate consciousness. Maybe you have to not just observe a property of yourself but actually simulate possible outcomes for actions you might take. Perhaps this is a more sophisticated property that captures why conscious beings matter morally?

Consider a Q-learning agent that has trained its Q(s,a) values as a function of states s and possible actions a. To pick the next action, suppose it's really simple and just chooses the a that maximizes Q(s,a) given its current state (say, state number 3):

// Greedy action selection: choose the action with the highest learned
// Q-value for the current state.
currState = 3;
double bestQ = double.NegativeInfinity;  // so even negative Q-values can win
int bestAction = -1;
for (int action = 0; action < possibleActions.Length; action++)
{
    double curQ = Q[currState, action];
    if (curQ > bestQ)
    {
        bestQ = curQ;
        bestAction = action;
    }
}
takeAction(bestAction);

So far, we might call this a "dumb" agent that's just "unconsciously" choosing an action based on learned value estimates. Instead, conscious agents run self-simulations to predict what would happen if they made a choice. This is what consciousness is, right? Well, consider this small modification of the above:

// The same greedy selection, except the Q-lookup now happens inside an
// "imagination" routine.
currState = 3;
double bestQ = double.NegativeInfinity;  // so even negative Q-values can win
int bestAction = -1;
for (int action = 0; action < possibleActions.Length; action++)
{
    double curQ = ImagineTakingAction(currState, action, Q);
    if (curQ > bestQ)
    {
        bestQ = curQ;
        bestAction = action;
    }
}
takeAction(bestAction);

// ... skip some lines ...
public double ImagineTakingAction(int state, int action, double[,] Qvalues)
{
    int world = state;
    int myAction = action;
    double predictedOutcome = Qvalues[world, myAction];
    return predictedOutcome;
}

Of course, the ImagineTakingAction function is totally trivial. It just uses linguistic disguises to dress up the operation of returning the Q(s,a) value for the action being evaluated. Is there really a difference between the first and the second code snippet? Probably not a big one. What's meant by "imagining what it would be like to do something" is itself a fuzzy class of operations, and as with sentience in general, we'd probably be more inclined to consider the process morally relevant the more complex the agent doing it is.

"Small network argument"

The 2007 paper "Consciousness & the small network argument" demonstrates that many scientific theories of consciousness allow for building a tiny conscious brain—typically with fewer than 10 neurons. The authors take this as absurd and suggest that consciousness must involve more than simple principles. The paper speaks of consciousness as a binary and objective property, which betrays implicit dualism, but we can just as easily reframe its point as suggesting that current theories of consciousness allow for building tiny brains that don't have a high degree of what we think of as consciousness, much as I showed with my two previous examples.

Psychological account

As a descriptive account of why humans care about bells and whistles, one reason may be that we can best empathize with minds that are as complex as ours. When I just look at the above code snippets as a programmer would, they seem completely dull and ordinary. If I imagine myself as the object carrying out those operations, I can evoke some actual sympathy for the agents. But this is almost entirely illegitimate sympathy, due to smuggling in all of my psychology through anthropomorphism. It includes picturing myself as the agent in some computational environment, contemplating these various actions with an internal monologue about my hopes and dreams for how things might turn out. Almost all of this is extra complexity that my mind can't help imposing on the situation. This in some sense reinforces the point that humans care about complexity.

Of course, it remains up for grabs what shape our caring-about function takes with respect to complexity. Maybe it drops off at a certain point? Maybe once an agent is sufficiently complex, we start treating it equally with other agents that might be slightly more complex? These are tricky issues for our hearts to sort out. In any event, this discussion demonstrates that even for those of us who only care about conscious emotions, what we value is delicate to specify and probably can't be captured by one or two basic properties.

What motivates concern for abstract computations?

Most human motivations are self-directed—concerned with personal maintenance, security, and wellbeing. But there are also motivations for other-directed concern, such as kin altruism, reciprocal altruism with trading partners, and empathy driven by mirror neurons.

Consistent with multiple motivations for other-directed concern is the finding that humans have at least two different empathy systems: one emotional and one cognitive. The emotional system may be more related to the misfiring of mirror neurons.

Regardless of its origins, it seems that altruism for the powerless is a spandrel. Caring only about your family, your tribe, and your trading partners seems close to optimal from a survival standpoint; if you also care about diseased minnows and people dying of malaria on the other side of the planet, that takes away resources you could be using for self-advancement. It's not implausible that the spandrel of pure empathy will vanish with time, perhaps when artificial minds emerge that can remove their spandrel empathy while preserving and strengthening their ability to make binding commitments to help trading partners.

It's interesting to reflect on what motivates the kind of altruism discussed in this essay. Why would a primate think about the similarities between what happens in its own brain and what could be programmed into computer software and decide that it also feels concern for the computer software? Which of the empathy mechanisms are at play here? It could be a combination of several of the above. For instance, the process is partly about evaluating a "like me" relation: How similar is this other mind to myself? This might evoke concern along the lines of kin altruism or reciprocal altruism with trading partners. There's also an element of "imagining" myself as the other organism, which results in my feeling shadows of that organism's own emotions, along the lines of misfiring mirror neurons. And then there's some systematizing and elegance-seeking going on as well. It just doesn't seem right for an ethical stance to gerrymander certain mental systems for substantially different treatment without a corresponding large difference in the traits of those mental systems.

Naively, many people assume that if brains were just computers and emotions were just information processing, then we would no longer care about them; after all, who cares about a computer? But I don't know anyone who stopped enjoying sunshine or exercise due to learning cognitive science. When we begin to internalize how computations in the brain work and draw the analogies to artificial algorithms, we not only continue to care about ourselves but extend that concern to a broader class of physical operations, about whom we can say, "that's kind of like me."

"Consciousness" detectors

Our brains seem to have a number of neural networks that transform input stimuli to output "scores"—e.g., How much does that object look like a face?, How tasty is that food?, How attractive is that potential mate?, How much does that thing in the grass look like a snake? For many concepts, we make "likeness" judgments; for instance, how much does this rock that people place their food on count as being a table? Our judgments about whether an entity is conscious and deserving of moral consideration are another instance of the same kind of process.

First we perceive raw input features of an entity: its color, shape, texture, arrangement, and so on. These feed into various layers of visual processing. We also hear its cries and giggles and maybe feel its skin. The sensory inputs are aggregated into higher and higher layers of our neural networks, transforming them from raw features to more advanced concepts. The features of the current entity merge with our pre-existing knowledge about what these entities are like and our cognitive understanding of what kinds of algorithms they run, what society's attitudes are toward them, and so on. At higher layers of this deep network, perhaps (purely speculatively) we have neural signals representing things like "has face," "is laughing," "wants to pick up its toy," "is a member of my tribe," and so on. The final level of the caring-about network could be, in this stylized depiction, a linear model on top of these high-level input nodes. For instance:

consciousness_score = 2 * is_human + 0.7 * has_eyes + 0.5 * can_scream + 1.5 * has_desires + 1 * is_intelligent + 0.3 * does_reinforcement_learning + ...

(I'm not defending this particular choice of weights—just portraying it as a possible set of weights that a human might have.) What we've been discussing in this essay is which inputs to this model are relevant and what their weights should be.
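As a minimal code sketch of that final linear layer, with the feature names echoing the formula above and the weights and activations being placeholders I made up:

using System;
using System.Collections.Generic;
using System.Linq;

class CaringModelSketch
{
    static void Main()
    {
        // Hypothetical weights on high-level features, echoing the formula above.
        var weights = new Dictionary<string, double>
        {
            ["is_human"] = 2.0,
            ["has_eyes"] = 0.7,
            ["can_scream"] = 0.5,
            ["has_desires"] = 1.5,
            ["is_intelligent"] = 1.0,
            ["does_reinforcement_learning"] = 0.3,
        };

        // Made-up feature activations in [0, 1] for some entity, imagined as the
        // outputs of earlier layers of the perceptual network.
        var features = new Dictionary<string, double>
        {
            ["is_human"] = 0.0,
            ["has_eyes"] = 1.0,
            ["can_scream"] = 0.2,
            ["has_desires"] = 0.6,
            ["is_intelligent"] = 0.4,
            ["does_reinforcement_learning"] = 1.0,
        };

        double consciousnessScore = weights.Sum(kv => kv.Value * features[kv.Key]);
        Console.WriteLine($"consciousness_score = {consciousnessScore:F2}");
    }
}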

Remembering why it matters

These discussions of mind algorithms can feel abstract and distant. We might be tempted to make our assessments of moral importance purely on aesthetic grounds, or even not to feel much motivation at all concerning the issue of what others experience. It's important not to let this happen, or at least to complement these periods of abstract thought with more viscerally compelling reminders of how bad suffering can be.

The reason to explore these abstractions is not (only) because it's fun to muse about them. Rather, it's because there might be minds suffering in terrible ways that we haven't previously realized were suffering. Systematizing our compassion matters for those we would otherwise have overlooked. We just have to make sure we don't become cold in the process.

A tolerance for ambiguity

Back in 2005-2006, I saw ethics as relatively straightforward. Happiness is good, suffering is bad, and you maximize the sum total. (This was before I put significantly higher priority on reducing suffering compared with creating happiness.) Sure, people would object that you can't measure happiness and suffering, but I figured this was mainly a limitation of our instruments and that we could get pretty close. When people would raise other qualitative ideas about value in ethics, I would think to myself, "No, you're making this too complicated. It's just the happiness and suffering that matter, and you only assume these other things matter because they make you happy." I was relatively dogmatic about my approach to ethics.

Perhaps more than anything else, dissolving my confusion about consciousness and grappling with questions of what computations I cared about changed this simplistic attitude. I now realized that even if you only cared about happiness and suffering, there would remain a host of qualitative, fuzzy, and poetic questions that would need to be answered, like which parts of brains count how much, the relevance of brain size, the degree of importance of consciousness and what should even count as consciousness, how abstract vs. concrete algorithmic implementations can be, and much more. These questions sound as open-ended, philosophical, and up for interpretation as many of the debates in other areas of ethics that I had previously dismissed as easily solved by my particular moral framework. Ethics was once again complex and touchy-feely—once again like a humanities course rather than an economics course.

I hope this piece conveys the preceding sentiment and encourages greater tolerance of each other's ethical opinions, because these questions are not black and white.

Acknowledgements

I owe a great debt of gratitude to Carl Shulman for refining my understanding of consciousness and helping me explore many of the issues discussed above. My inspiration for the Criterion of Subjective Indistinguishability came from a discussion with Jonathan Lee, who also provided feedback on a draft of this piece. A comment by Max Maxwell Brian Carpendale about computations that exhibit folk psychology inspired my modified Turing test. Many other friends have contributed to my thoughts on these issues as well. Some of what I discuss in this piece may have been lifted from other authors, but I'm not aware of the exact sources except when cited via hyperlinks; a lot of these concepts are simply "lore" within the sci-fi and transhumanist communities.

Footnotes

  1. Ok, probably glial cells and other non-neuronal physical factors also matter. Let's assume these are also replaced by appropriate artificial substitutes.  (back)
  2. In fact, the uncertainty principle implies that this is impossible. To get around this complication, we can either make the objects being described sufficiently macroscopic that quantum effects become unimportant, or if that obscures important features of brain processing, then instead consider a full quantum-level description of the wave functions of everything involved, again stored in some software data structure as a list of numbers. Computing the updates would be more computationally intensive, but that doesn't affect the philosophical principle of the thought experiment.  (back)