Summary
I espouse suffering-focused ethics (SFE), meaning that I think preventing suffering, especially extreme suffering, has special moral urgency compared with creating positive goods like happiness. My version of SFE is motivated by several different intuitions, such as the wrongness of forcing one person to suffer for some increased happiness on the part of other people.
My strongest motivation for SFE is the feeling that extreme suffering seems just horrible in a way that positive goods can't compare to. A frequent objection to this intuition is that it's biased by the fact that in humans, the worst sufferings can be significantly more intense than the best pleasures, but this needn't be true for arbitrary artificial minds. So, the allegation goes, this suffering-focused intuition is just based on the fact that I can imagine much stronger suffering than happiness.
Advocates of SFE reply to this argument in diverse ways (Knutsson 2017; DiGiovanni 2021; Vinding 2022). In my particular case, my reply is that I don't feel much motivation to force myself to care more about super-intense happiness (which I'll refer to as "superhappiness" hereinafter), especially since valuing superhappiness more would mean caring less about preventing extreme suffering. One might say that "You would care more about superhappiness if you experienced it", but this may not be very surprising. If experiencing superhappiness necessarily involves caring a lot about that experience at some level, then it's unsurprising that experiencing superhappiness would change my motivations, even after the experience ended. But so what? If I experienced what it was like to feel intense motivation to create paperclips, I might retain some of that motivation toward creating paperclips afterward. Moral values are extremely arbitrary and fragile, so guarding against accidental value drift seems to me at least as important as trying to alter my brain in various ways to see how my values would change.
SFE intuitions not based on intensity
There are many arguments and intuitions in favor of SFE (Vinding 2020).
Many people, including myself, have the intuition that it's wrong to cause harm to one person in order to give extra happiness to another person, at least in cases where the harms and benefits seem comparable in scale. For example, it seems uncool to give Bob a mild headache for 1 minute so that Alice can enjoy an additional ice-cream cone that she doesn't currently crave. Presumably this intuition is partly based on deontological impulses against sacrificial harms in general. However, I don't think it's wrong to give Bob a mild headache for 1 minute in order to prevent Alice from having a mild headache for 2 minutes. So I think the intuition is also importantly about a distinction in kind between suffering and happiness. There's a sense that happiness is somehow more morally trivial than suffering, except in cases where happiness itself prevents suffering, such as by alleviating cravings.
DiGiovanni (2021) lists some further pairs of happiness and suffering of comparable intensity, or even where the happiness has seemingly greater intensity (to the extent this comparison is meaningful at all). He concludes: "While the choice isn't always intuitively obvious, ultimately when I imagine subjecting someone to both experiences in sequence, or worse yet a different person for each of the two, it doesn't look worth it."
I share this intuition for the case of one suffering experience against one happy experience. However, if we imagine, say, 1000 moderately happy experiences compared against one moderately bad experience, I begin to feel like the moderately bad experience can be outweighed. For example, one person stubbing his toe and feeling some pain for 15 seconds but without any lasting injuries seems worth it for two other people to experience an extra week of a joyful honeymoon, even if that honeymoon doesn't contain any instantaneous experience of higher hedonic intensity than toe stubbing.
(This honeymoon example might be misleading because in the real world, an extra week of a honeymoon would probably reduce suffering and not just create happiness. For example, during that extra week, the couple would avoid returning to their possibly boring jobs, wouldn't be stressed about chores, and so on. The fond memories of the experience might be a source of future comfort. So to make this example fair, it would be better to talk about an experience machine that creates a de novo happy couple on their honeymoon for a week, after which they would poof out of existence again. I still think it's worth a toe stubbing to create this experience-machine honeymoon, though I don't feel very strongly either way.)
My intuition that moderate happiness doesn't outweigh moderate suffering 1-for-1 but can outweigh moderate suffering in high enough quantities aligns with what's called "weak negative utilitarianism" (weak NU), which is the view that happiness and suffering both count, but suffering is more morally weighty, even if the suffering and happy experiences have the same intensity. (It's ambiguous how to define raw intensities of hedonic experiences as separated from our assessments of their moral goodness or badness, but there are various options. We could ask people to rate how intense a given experience seems. We could examine how much brain activity of various types occurs in response to a stimulus. We could study how organisms make tradeoffs between different stimuli.) Strong NU, in contrast, is the view that only suffering matters, and happiness can never outweigh it.
If the trade ratio between suffering and happiness is large enough, such as 10-to-1 or 1000-to-1, then weak NU already leads to importantly different conclusions than so-called "symmetric" utilitarianism. For example, while a symmetric utilitarian may be uncertain whether human life on average contains more good or bad, a weak NU with even a 10-to-1 exchange rate would conclude that human life contains more disvalue than value in the world as a whole. (Contestabile (2022) argues that even an exchange rate of 1.5-to-1 or 3-to-1 yields the same conclusion.) As for the far future, I find it plausible that the ratio of expected happiness to expected suffering resulting from Earth-originating space colonization is on the order of 10-to-1, or maybe 100-to-1. So a strong enough weak NU also suggests that the far future is net bad in expectation.
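To spell out the arithmetic behind this claim, here's a minimal formalization (the notation is mine, not taken from the cited authors). Let H and S be the totals of happiness and suffering, measured in matched intensity units, and let k ≥ 1 be the moral weight placed on suffering relative to happiness. Weak NU evaluates net value as

V = H − k·S

Symmetric utilitarianism is the special case k = 1, and strong NU is the limit as k goes to infinity. The sign of V flips from positive to negative exactly when k > H/S. So if the far future's expected happiness-to-suffering ratio H/S is around 10, or even 100, then any weak NU whose k exceeds that ratio judges the future net bad in expectation.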
Thus, maybe weak NU is all that's needed to act roughly in the same way on practical issues that someone who holds strong NU would act. Therefore, the argument that I'll discuss in the remainder of this piece wouldn't topple SFE even if it worked, which is a conclusion that DiGiovanni (2021) also defends. However, for me personally, the "sheer intensity" argument for suffering focus is my strongest intuition in favor of SFE, so I think it is also worth addressing.
The "sheer intensity" intuition for SFE
When I contemplate (or worse, watch) an experience like burning alive, my brain is overloaded with a feeling that "this is horrible and must stop". Most people share that reaction, of course. However, my reaction seems stronger than the reaction many other people have. The feeling I come away with is that "burning alive is so horrible that positive experiences can't compare to it". I'm uncertain whether I would say that no amount of happiness can outweigh burning alive or merely that it would take ~billions of years of happiness to outweigh one minute of burning alive. But in either case, my degree of suffering focus is much stronger than in the case of mild pains and pleasures, where I might say that ~100 people experiencing mild pleasure can outweigh 1 person experiencing mild pain.
This feeling of being overwhelmed by the sheer awfulness of something like torture is the strongest reason for my SFE. My other pro-SFE intuitions are weaker, and I can imagine being persuaded out of them. For example, my main non-intensity-based SFE intuition might be the wrongness of imposing suffering on some people for the happiness of others, but it feels less wrong to choose to impose suffering on oneself for the sake of happiness. If I adopted a viewpoint like "open individualism" (the view that everyone is part of the same, universe-wide mind), then there would be no separateness of persons, and any suffering imposed for the sake of happiness would just be suffering imposed on "oneself".
The "sheer intensity" intuition for SFE is my moral bedrock. The view that torture is astronomically more morally serious than positive goods is something I'm more confident in than almost any other moral claim, and I'd sooner abandon almost any other moral intuition than abandon this one.
The bias objection
In part because I endorse the "sheer intensity" intuition for SFE, some in the effective-altruism community have argued against it. I think the following objection originates from Carl Shulman, and it's now widely echoed by others when arguing against SFE.
Shulman (2012) explains that while the most intense pains for evolved animals are plausibly stronger than the most intense pleasures, this needn't be true of artificial minds:
our intuitions and experience may mislead us about the intensity of pain and pleasure which are possible. In humans, the pleasure of orgasm may be less than the pain of deadly injury, since death is a much larger loss of reproductive success than a single sex act is a gain. But there is nothing problematic about the idea of much more intense pleasures[...].
Omnizoid (2021) echoes:
Given the way the world works right now, there is no way to experience as much happiness as one experiences suffering when they get horrifically tortured. [...] transhumanism opens the possibility for extreme amounts of happiness, as great as the suffering from brutal torture.
I probably agree with these points if we're talking about hedonic intensities as measured by objective properties of minds or tradeoffs that agents would make, rather than moral valuation. So these points do at least argue against a sort of empirical (rather than moral) suffering-focused view, according to which even in posthuman futures suffering will outweigh happiness merely in virtue of suffering having a higher intensity than happiness. Rather, it's plausible that posthumans could (if they wanted to, which they probably won't) create superhappiness that's at least as intense as torture in terms of its objective brain properties.
However, there's the further question of how to morally value extreme happiness against extreme suffering. I commented on Shulman (2012) to say that I still care more about extreme suffering than extreme happiness even if the objective intensities are equal. Shulman replied:
I suspect that intuitions are skewed by our relative lack of experience with [extreme] pleasures; as we have discussed elsewhere, this is reason to think that your fully informed self that had experienced both extreme pains and extreme pleasures would be less negative-skewed[.]
Omnizoid (2021) similarly wrote in a comment discussion with me: "We literally can't conceive of how good transhuman bliss might be". When I said I still don't find superhappiness very motivating, Omnizoid replied: "That's true of your current intuitions but I care about what we would care about if we were fully rational and informed."
In other words, these authors argue
- that I'm biased to think that suffering has so much more moral significance than happiness because of the contingent fact that my brain is more able to feel and imagine extreme suffering than extreme happiness, and
- that given more information and reflection, I would likely come to care more about extreme happiness than I do now.
I'll discuss each of these points in turn.
I am my contingent biases
My "sheer intensity" intuition for SFE plausibly does come from the contingent fact that my emotional reaction to extreme suffering is so strong while my emotional reaction to hypothetical extreme happiness is relatively weak. The horror I feel when contemplating torture has been "burned in" to my motivational system at a deep level, while notions of superhappiness feel abstract. Even if I imagine specific pleasurable experiences I've had, they don't hold a candle to the awfulness of torture, and it's difficult to conceive of pleasures vastly more intense than those I've experienced. If my brain were wired differently such that it could experience extreme pleasures, I might feel a "sheer intensity" intuition about the immense importance of creating superhappiness, not just preventing extreme suffering.
So if my brain had been wired differently, I would have different moral values. But so what? If my brain had been wired to morally care about creating paperclips, I would want to create paperclips. If my brain had been wired to morally value causing suffering, I would want to cause suffering. And so on. The contingent circumstances that led to my current values are what made me me rather than someone else.
That humans value happiness and suffering is also a contingent fact based on evolution. If evolution had created organisms whose behavior was driven in other ways, I probably wouldn't care much about happiness or suffering; they would both seem like abstract, unfamiliar concepts.
Many people feel that novelty and diversity are important when creating good futures. For example, most transhumanists aren't satisfied by the idea of tiling the universe with a single, uniform, maximally blissful experience. But the value these people place on diversity of good things may be the result of contingent evolutionary forces. It's instrumentally useful for animals to seek novelty. For example, boredom motivates exploration, sensory-specific satiety encourages animals to consume different kinds of foods with different nutrients, and the Coolidge effect encourages males to impregnate many different females rather than just one. As a result, evolution gave us an intuition in favor of novelty and diversity in general, which may incline us to think that novelty and diversity are intrinsically valuable. Maybe if we had evolved in a world where seeking out diverse experiences wasn't useful, people wouldn't care much about creating a diversity of different good things. Yet even if this preference for diversity can be explained by contingent facts of evolutionary biology, most people don't regard the intuition as "debunked". Rather, we continue to cherish the intuition as part of the complexity of our idiosyncratic human values.
The (just-so) story I offered to explain the widespread intuition that novelty has intrinsic value seems fairly analogous to the story behind my "sheer intensity" intuition for SFE:
- In the case of novelty, humans notice that they're emotionally drawn toward diversity and feel bored by the thought of uniform good things filling the universe. Even though these reactions are based on a contingent fact about how human nervous systems currently work—because boredom happens to be useful for us, though this needn't apply to non-evolved artificial minds—humans convert these emotions they feel into the belief that diversity has intrinsic moral value. We feel that too much uniformity is not just undesirable for human minds but is undesirable per se. Rather than just seeking diversity in our own lives, we want the whole universe to be diverse.
- In the case of my strongest SFE intuition, I notice that I have a vastly stronger emotional reaction when contemplating extreme suffering than when contemplating any positive goods. Even though this reaction is based on a contingent fact about how human nervous systems currently work—because experiencing stronger worst-case pains than best-case pleasures happens to be useful for us, though this needn't apply to non-evolved artificial minds—I convert this emotion I feel into the belief that extreme suffering has intrinsically more moral significance than any positive goods. I feel that extreme suffering is not just paramount for human minds but is paramount per se. Rather than just focusing on preventing extreme suffering in my own life, I want the whole universe to focus on preventing extreme suffering.
Presumably there are also disanalogies between these two cases. But it at least doesn't seem obvious to me why we would discard the "sheer intensity" SFE intuition while keeping the diversity intuition.
Imagine a symmetric hedonistic utilitarian named Shu. (The letters of her name are an acronym for her moral creed. The name "Shu" also means "warm-hearted" in Chinese.) Shu might react to the analogy that I just described by saying that it seems to debunk both diversity and "sheer intensity" SFE. Therefore, Shu might say, we should convert our future light cone into a homogeneous blob of maximal pleasure after all. To this I would reply that Shu's own moral values can also be characterized with the same template as I used for the two intuitions that she rejects. In particular:
- In the case of symmetric hedonistic utilitarianism, Shu notices that she's emotionally drawn toward happiness and feels bad when contemplating suffering. Even though these reactions are based on a contingent fact about how human nervous systems currently work—because caring about happiness and suffering happens to be useful for us, though this needn't apply to non-evolved artificial minds—Shu converts these emotions she feels into the belief that happiness and absence of suffering have intrinsic moral value. Shu feels that happiness is not just desirable for human minds but is desirable per se. Rather than just seeking happiness in her own life, Shu wants the whole universe to be filled with happiness.
Ultimately, our moral views are biases that we choose to keep. If we weren't biased in some direction or other, we wouldn't care about anything or would care about completely random things.
All of that said, it's true that we sometimes do choose to jettison biases. For example, we might initially assume that humans are vastly more morally important than non-human animals, but after reflecting on arguments against speciesism, we may choose to drop that bias. So there remains the question of whether I would choose to drop the bias of focusing on extreme suffering given more information and reflection.
Would reflection change my opinions?
Effective altruists sometimes refer to "what I would care about upon moral reflection" as though there were a well-defined answer to this question. In fact, there's an almost infinite number of ways we could undertake moral reflection, which, I suspect, would lead to an almost infinite number of output moral views (Tomasik "Dealing ...").
There are some types of reflection that are fairly unambiguously welcome. For example, if you care about reducing wild-animal suffering, then having more accurate data on how many wild animals of specific types exist is an improvement to your views. It's hard to defend putting your ostrich head in the sand regarding purely factual information like that.
At the other extreme, some types of "reflection" are unambiguously unwelcome. To use an example that I believe originates from Eliezer Yudkowsky, if we had a pill that would make people want to become murderers, then a currently pacifist Gandhi would refuse to take the pill, because it would cause value drift of a kind that he currently finds abhorrent. Forced brainwashing, brain damage, and unwelcome brain editing presumably also count as "bad" ways to change your current values.
In between these extremes is a wide gray area, where it's not always clear how much the procedure in question counts as welcome moral reflection versus merely unwelcome value drift. Philosophical arguments for particular moral views probably fall mostly on the side of welcome updates, while optimized propaganda might fall more on the side of being unwelcome. Having new experiences, meeting new people, and growing older also have some mixture of welcome and unwelcome moral change. These processes teach us new information and give us a richer repertoire of ideas to draw from, though they also partly modify our values "by brute force" via tweaking our emotions and motivations more directly. Of course, we might feel that some amount of brute-force change to our motivations, such as from the cognitive shifts that happen when growing older, is also desirable. Ultimately, it's up to us whether a given process for altering our moral views is welcome.
With that preamble, let's return to the question of whether my "sheer intensity" intuitions for SFE would be mollified on reflection. It's unclear what "on reflection" means, but I'll consider two broad cases.
Case 1: Purely intellectual arguments
One way of doing reflection on this topic is simply to hear more philosophical arguments about it, such as the bias argument that I've been discussing. Of course, I don't know how I would react to hearing future arguments that I'm not already aware of, but I can comment on how I feel about the existing anti-bias arguments.
When I became persuaded that superhappiness stronger in objective intensity than torture was probably possible, I found this somewhat compelling from a very abstract perspective. Like, if I'm in a mindset of trying to silence my emotional reactions and focus instead on intellectual elegance, then it would seem reasonable that extreme happiness could (at least in sufficient quantity) outweigh extreme suffering. But if I imagine actually acting on the principle that extreme happiness can counterbalance torture, it would feel unmotivating and almost disgusting. I just can't get myself to care that much about creating new happiness, even extreme happiness, while people and animals are enduring unbearable torment. My emotional reactions are the drivers of my morality, and intellectual elegance has to take a back seat. If my current emotional reactions have been burned into my brain based on more familiarity with extreme suffering than extreme happiness, then so be it. Those are the emotions I have, and they're the ones I want to keep.
I'm just describing my own reactions. I know that for some people, especially effective altruists with a systematizing mindset, intellectual elegance can trump emotional intuitions, and I think a number of people do find these intellectual observations about superhappiness compelling. That said, if one does find the evolutionary debunking argument against the "sheer intensity" SFE intuition compelling, then why not the evolutionary debunking argument against placing intrinsic value on novelty? Or a debunking argument against caring about hedonic wellbeing or consciousness at all? And so on. It's not at all clear where to draw the line between emotions to keep versus emotions to reject.
Part of what I want to say in this piece is just that you don't have to find the bias objection (or any moral argument) compelling if you don't want to. I can imagine that some effective altruists don't really care about creating superhappiness instead of preventing torture but feel "bullied" into the idea that they have to care about it, because people smarter than they are have arguments in favor of it. I would tell such people that your values can be whatever you want them to be. You shouldn't be bullied into holding moral views that don't sit well with you.
Case 2: Actually experiencing superhappiness
Imagine that instead of merely hearing intellectual arguments for caring more about superhappiness, I actually experienced it. By virtue of the "bad is stronger than good" fact of human brains, experiencing superhappiness might require some nontrivial edits to my nervous system. We could alternatively imagine a weaker approximation in which I experienced the strongest happiness that it's possible for my current brain to generate.
I suspect this process would have a nontrivial effect in making me care more about superhappiness at an emotional level, maybe to the point where I would trade it off against torture at comparable ratios as I do for mild pleasures and pains. I see a weak shadow of this phenomenon in my current life: I'm somewhat more inclined to think that enough happiness can outweigh moderate suffering when I'm in a good mood rather than a miserable mood, although the effect isn't huge. In other words, experiencing more happiness tends to make me value happiness a bit more. So experiencing extreme happiness might make me value happiness a lot more. But this seems like a pretty unsurprising fact. Experiencing happiness means my brain feels motivation to continue having brain states like the one I'm in, and it's plausible that this motivation regarding my own brain state "bleeds over" into my altruistic motivations too, making me more inclined to bring other brains into similar mental states.
I think the defining feature of suffering is motivation to end one's current mental state. The sensory experience of pain per se isn't suffering unless it's accompanied by a desire for the experience to stop. Pain asymbolia refers to a condition in which pain is felt without it being bothersome, and this doesn't count as suffering. Meanwhile, any mental state that one wishes to get out of is suffering, even if there's no attendant physical pain.
It's plausible that happiness would be characterized symmetrically, as a mental state that one is motivated to continue being in, or at least that one cares about having. Imagine a case of "pleasure asymbolia" in which a patient feels the physical sensations associated with pleasure but doesn't care about them. Is that a form of happiness? It seems to me that you need to care about your pleasure for it to count as happiness, just like you need to care about your pain for it to count as suffering. That said, I'm less sure about this claim in the case of happiness than in the case of suffering.
If we do assume that "experiencing happiness" is centrally about "being motivated to continue the given mental state" or "caring about having the mental state", then experiencing superhappiness would mean being extremely motivated to continue the given mental state or caring a lot about having it. In other words, having the experience of superhappiness is not merely or even primarily about "learning new information". It's instead to a significant extent about changing (even if only transiently) one's motivations, similar to the murder pill that Gandhi refuses to take. During the moments of ecstasy, I might feel like: "This is so amazing it's worth some torture for." If the happiness were intense enough, maybe I could even have thoughts like: "This is worth experiencing infinite suffering for." Such sentiments would reflect a significant change to my motivational system and count as major value drift relative to my current views. Hopefully my values would mostly revert back to normal afterward, but probably some amount of the motivational change would remain "burned into" my brain, plausibly even if I experienced superhappiness just once.
The claim that "If you experienced stronger pleasures, you would care more about pleasure" is probably true but almost trivial. It says that if I experienced stronger motivation to continue pleasurable brain states, I would probably retain some of that motivation even afterward. But so what? Why would I want to change my motivations in that way?
We could make analogous statements about all sorts of possible values. For example, consider the value of patriotism, which I'll take to mean defending your country's honor, even at the cost of extreme suffering, such as when dying in battle or being tortured by the enemy. Suppose you currently think hedonic experience is important but don't care intrinsically about patriotism. If you don't currently care about patriotism, then you could benefit from learning what it feels like to be patriotic, right? That could be an important part of moral reflection. So we could rewire your brain a bit so that it feels superpatriotism, meaning an extremely intense desire to defend one's country, stronger than any current human levels of patriotism. After that experience was over, it's plausible your brain would retain some traces (or even large amounts) of patriotic motivation, to whatever extent the strong motivation you felt led to lasting changes in your neural connectivity. Analogous to the case of superhappiness, we could claim: "If you experienced stronger patriotism, you would probably care more about patriotism". But again, so what?
We could do a similar procedure for almost any kind of motivation. Maybe you currently lack knowledge of what sadism feels like, so it would benefit your moral reflection to experience extreme motivation to cause harm to others? And maybe you should learn what it feels like to be extremely motivated to create paperclips, to such a degree that you would give up centuries of happiness and endure centuries of torment in order to create just one more of these metallic office products. And so on.
Suffering and happiness are just two of the numerous types of motivation that are possible. Suffering and happiness are motivations about the contents of one's own mental states—namely, to end or continue those mental states. But agents can also have outward-directed motivations about things in the external world, such as protecting one's children, defending the honor of one's country, creating paperclips, destroying staples, solving the Riemann hypothesis, etc.
While I've been drawing an analogy between happiness and other forms of motivation like patriotism and wanting to create paperclips, I acknowledge that there may be some differences between them as well. Phenomenologically, happiness feels different from patriotism, which suggests that its neural implementation probably differs somewhat too. For example, maybe a lot of the motivation in happiness occurs in "lower-level" brain structures, while patriotism is more about a person's higher-level, explicit beliefs? (I have no idea if that's true; it's just an example of the kinds of distinctions that might exist between these things.) Maybe it would be possible to edit my brain so that the lower-level parts feel superhappiness motivation, without editing the higher-level explicit thoughts to also care about superhappiness? Maybe this could be one attempt at a compromise between trying to give me some of the information value of what superhappiness feels like without also hijacking my higher-level motivations too much.
Alternatively, maybe it would be possible to have memories of superhappiness without retaining any of the motivation toward superhappiness, though I'm not sure if that can be done. Remembering a past emotion often includes feeling a little bit of that emotion again in the present. For example, recollecting a time when you were sad may make you feel a little bit sad again. So recollecting superhappiness might "sneak in" some of the same experience, including the motivation to experience it, even though we were trying to not allow the motivational aspects of the experience to stick around. And if it were possible to remember past superhappiness in a more dispassionate way that wouldn't reproduce any of its motivational aspects, then maybe the memories of superhappiness wouldn't increase my moral valuation of it after all?
There's a wide spectrum of possible ways we could edit my brain to have various experiences. We could try to look for ways to give me information value about new things without also corrupting my motivational neural networks, though fundamentally I doubt there are clean distinctions between "just providing information" versus "changing motivations" in a messy, interconnected brain where different subsystems are not crisply partitioned.
As an analogy for the blurry line between information and behavioral hijacking, consider that opening a malicious data file (e.g., a specially crafted MP4 video) can give an attacker control over a computer, such as via a buffer overflow. In some sense, the MP4 file is "just information" (a blob of 0s and 1s), but it can also seize control of the computer's behavior. In the context of artificial-intelligence (AI) safety, some researchers worry that a superintelligent AI might be able to hijack people's values merely by telling them certain things, without even needing to directly edit their brains. That said, one could argue that these cases are somewhat different from experiencing superhappiness because superhappiness hasn't been specially crafted to hijack people's values. (Or has it? Changing people's motivations is the evolutionary purpose of happiness.)
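To make the buffer-overflow part of the analogy concrete, here's a deliberately simplified C sketch (a hypothetical toy, not how real MP4 parsers or exploits actually look) of how trusting a length field inside "just data" can let that data overwrite a program's control information:

```c
#include <stdio.h>
#include <string.h>

/* Toy "parser" that trusts a length field supplied by the file itself.
   If the file claims a payload longer than 64 bytes, memcpy writes past
   the buffer into adjacent stack memory, which can include the function's
   return address, so carefully chosen bytes in the file end up steering
   what code runs next. */
void parse_chunk(const unsigned char *payload, size_t claimed_len) {
    unsigned char buffer[64];
    memcpy(buffer, payload, claimed_len);  /* bug: no bounds check */
    printf("copied %zu bytes\n", claimed_len);
}
```

The file never "does" anything on its own; the takeover happens entirely through how the reader processes it, which mirrors the worry that persuasive inputs can hijack a mind's existing machinery.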
When we appreciate how truly arbitrary our motivations are, and how fragile they are relative to a vast space of possible motivations we could have, I find that I have little inclination to engage in the kind of hard-core moral reflection that includes undergoing lots of changes to my brain to see what pops out at the end. My current values are like a glass vase, and it seems very easy for them to get broken during a long journey to their destination. If values are so arbitrary anyway, why not stick with the ones I already know and love?
In the first sentence of this piece, I said that I espouse SFE. I chose the word "espouse" deliberately, because it originally comes from the meaning "to take as a spouse". I feel loyal to those enduring extreme suffering who wish for it to stop. I want to stay faithful to them rather than abandoning them for some other project that might seduce my motivational system, such as creating superhappiness. (As a lazy human, I'm very often preoccupied with interests other than reducing extreme suffering. But here I'm talking about where I direct the part of my life that is spent on consequentialist altruism.)
Muehlhauser (2017), in footnote 242, proposes a version of moral reflection in which he could hypothetically create thousands of uploaded copies of himself who would consider various moral arguments, have various experiences, and talk with others. These copies could then discuss and negotiate among themselves, before advising his original self on what they suggest his values should be. This reflection procedure seems potentially safer than merely making updates to the single brain that I actually have, because if a few of the copies get seriously corrupted by some experience, hopefully there would be enough other copies who didn't get corrupted to counterbalance them. Of course, what counts as "getting corrupted" is unclear. Suppose that most of the uploaded copies experienced superhappiness and then reported how morally important it was. Copies who didn't do that might regard the majority as having been corrupted, in a similar way as some religious people regard most of the world as having strayed from God. Meanwhile, if only a few of the copies experience superhappiness, then maybe in the end the collective judgment of the thousands of copies wouldn't care much about superhappiness after all. Thus, even if we have thousands of copies to play with, I expect the output moral judgments would still depend a lot on the parameters of how the reflection was set up, such as what fraction of the copies would experience superhappiness (or superpatriotism, supersadism, etc).
Someone might ask me: "Aren't you curious to try feeling superhappiness?" I am somewhat curious, because it would be both intellectually interesting and fun. However, my degree of curiosity about it isn't that high, and I would worry about the experience causing some amount of "brute force" value drift in my brain. If I were going to experience superhappiness, it would be only fair for me to also try experiencing supersuffering, so that my motivations wouldn't become biased too much in the pro-happiness direction. But if someone offered to let me experience a package of an hour each of superhappiness and supersuffering, I would say: "Hell no!" If someone offered me a million years of superhappiness for an hour of supersuffering, I would still say: "No way." This example itself illustrates my SFE intuitions.
Maybe I could experience happiness merely as intense (in terms of objective brain properties) as the most intense suffering I've ever felt, so that I wouldn't also have to experience any additional suffering in order to keep the comparison fair. I wouldn't necessarily say no to this in theory, though I still would worry about some amount of goal drift. And in the real world, I would worry about becoming somewhat addicted to the happiness—even if not physically addicted, then at least addicted in the sense that I would want to experience it again and might spend a fair amount of effort on that goal.
I would also be curious to know what it's like to feel that paperclips are the most important things in the universe. It would even be interesting to know what supersadism feels like. But having these experiences, especially the latter, sounds like a bad idea.
I've been asked whether I'm anhedonic. The answer is no; I presume I feel pleasure roughly to an average degree. I also have negative emotions fairly regularly, but this too I expect is roughly at an average level. That said, there presumably is something about my psychology that's somewhat different from average. My best guess would be that I have an abnormally high level of anxiety when imagining extreme pain (though not mild pain; I don't mind the low-level pains of ordinary life that much). My experience of significant suffering from esophagitis as a teenager, something most young people don't go through, plausibly contributed to this.
When I shared this essay that you're reading now with friends, I joked that it would not only be unconvincing to most effective altruists but perhaps negatively convincing, i.e., it would make some people more inclined to dismiss SFE, because I acknowledge that a lot of my SFE motivation is based on my idiosyncratic psychology rather than abstract arguments. One thing to remember is that there are in fact numerous abstract arguments for SFE, including the various pro-SFE intuitions other than the "sheer intensity" one. So even if you find my emotion-driven approach to ethics to be a non-starter, that shouldn't discourage you from SFE overall. Most of my pro-SFE colleagues are much more driven by argumentation than I am.
I think some feminist ethicists have pointed out that "masculine" morality tends to be focused on abstract principles of justice, while "feminine" morality may be more about emotional caring. (I haven't read these feminist authors, and they have plenty of critics as well, including fellow feminists.) This distinction somewhat mirrors how I see my ethics compared with that of many of my colleagues. On the other hand, I feel like a lot of moral argumentation ultimately comes down to emotion as well. Two people can read the same arguments for SFE, yet one person can be fully convinced, while the other person remains unmoved. Maybe this can be explained by differences in other philosophical arguments that these two people have been exposed to earlier in life. But I suspect a lot of the difference ultimately comes down to psychological factors, such as differences in emotion and cognitive style.
I feel like my description of my ethics aims to be maximally honest rather than maximally convincing, like: "Here are the actual reasons why I hold the moral view I do, as best I can introspect it." In contrast, arguing based on abstract reasons feels more like a kind of (gentle) warfare, in which you fire argumentation missiles in an attempt to destroy alternative viewpoints and defend your own. This attempt to convince others of your viewpoint may be why moral reasoning arose to begin with? It's more convincing to tell someone that "there's a moral reason to do X" than "I emotionally want you to do X". In any case, I value reasoning a lot as well and am willing to reach some fairly strange conclusions using it. But I only reach conclusions that my emotions are comfortable with. And my emotions rebel against the idea of allowing more torture for the sake of superhappiness that no one needs to experience.
Is my approach to ethics selfish?
One reply to my moral view could be that it focuses too much on my own emotions and not enough on the interests of those whom I'm trying to help. For example, these critics could say that the question isn't whether my brain feels a strong emotional reaction when contemplating extreme suffering; what matters is that the sufferer has an extreme emotional reaction. And likewise, these critics could argue, it doesn't matter whether my brain feels strongly motivated to create extreme happiness; what matters is that the extremely happy agent himself would be immensely grateful for being brought into existence. If altruism is about treating others the way they want to be treated, then how can I justify prioritizing reducing extreme suffering over creating extreme happiness? (I've heard this kind of argument from at least three different friends over the years.)
One possible reply is that if we fail to create de novo superhappy posthumans, the agents we could have created won't exist, so we aren't violating their preferences by not creating them. Note that this framing in terms of "avoiding frustrated preferences" already assumes a disvalue-focused moral view, such as perhaps negative preference utilitarianism. I find it intuitive that reducing preference frustration is (vastly) more important than creating new preferences only to satisfy them, which is another argument for SFE. However, I can imagine some people feeling that creating satisfied preferences and not just avoiding dissatisfied ones is part of what altruism should be about.
A second reply I would make is that even if we do give some weight to the gratitude of newly created superhappy beings to exist, we have to decide how to weigh this against the preferences of beings in extreme agony for their mental states to stop. When different agents have different utility functions, there's no "right" way to compare the magnitudes of those utility functions. For example, there's no unique way to compare the strength of a tortured animal's preference to stop being in pain against the strength of a blissful person's preference in favor of the mental state she currently occupies. Given that there's inherent arbitrariness to these comparisons to begin with, I choose to put more focus on those preferring to avoid extreme suffering (Tomasik "Does Negative ...").
Thirdly, suppose we ignore the above two replies, as well as any other principled arguments that might be adduced to defend against the charge that my ethics is selfish. Even if my ethics were sort of "selfish", I'd be ok with that. If you think that sounds horrible, consider the following thought experiment.
99.99% paperclippers
Suppose that space colonization and the long-term future are irrelevant and that altruistic impact is about what happens on Earth in the next few years. (For example, this would be true if we're in a simulation that will end soon. This stipulation ensures that the numbers of creatures of various types that exist can't be changed very much.) Imagine that only 0.01% of conscious creatures on Earth are capable of feeling happiness and suffering, and that these creatures mainly want to avoid extreme suffering. Meanwhile, the other 99.99% of conscious creatures on Earth are non-hedonic "paperclippers", i.e., agents who don't feel happiness or suffering but still have extremely strong desires to create more paperclips. When the preferences of these agents to create paperclips are thwarted, they don't feel bad about it; they just seek out ways to avoid having their preferences thwarted again. Finally, suppose we have a switch that will either reduce extreme suffering (when pulled down) or create paperclips (when pushed up).
If altruism is about helping other agents the way they want to be helped, it would seem we should push the switch up and create more paperclips. That's more consistent with the preferences of almost all conscious beings in existence. I do feel some sympathy for the paperclippers and their intense desires to have more paperclips in the world. If I could push a switch that would create paperclips without any opportunity cost, I would. However, if creating paperclips means allowing more extreme suffering to continue, I would have to pull the switch down to prevent the extreme suffering. I might say: "I'm sorry, paperclippers ...but not that sorry. I know that you care a lot about those shiny objects that keep papers together, but I personally identify more strongly with the urgency of reducing extreme suffering. I'm not going to be a slave to majority opinion and do whatever the majority happens to prefer. I'll give some weight to the majority's views, but ultimately I'm going to focus on doing what tugs at my heartstrings the most. If you want to label that as 'selfish', so be it."
The warm glow that your brain attaches to the concept of "altruism" is based on the fact that altruism in our own world tends to mean helping other humans in ways that emotionally resonate with you. If you were transported into the 99.99% paperclipper world, your reaction to the idea of "altruism" might be like: "Ugh, I have to create more of these pointless paperclips. This sucks." Maybe you would begin to feel more favorably toward paperclip production if you had pleasant social interactions with paperclippers who displayed signs of gratitude (though they, by assumption, wouldn't actually feel any happiness about your work). However, let's imagine that the paperclippers, despite having intensely strong preferences, are for some reason unable to ever show those preferences to humans, maybe because the paperclippers live in a cave deep underground and can't communicate with the surface world. In that case, the only reason to create paperclips would be actually caring about being altruistic toward them, rather than seeking social approval.
The analogy of all this to superhappiness is that, relative to my current emotions, creating superhappiness feels to me more like creating paperclips than it does like reducing extreme suffering. I care a little more about superhappiness than about paperclips because I'm a human who appreciates happiness to some degree. But when compared against extreme suffering, superhappiness seems almost frivolous. I am a bit sorry about those beings who won't get to experience enormous gratitude for their blissful existences ...but not sorry enough to allow additional torture.
Also, note that if superhappy posthumans never exist, they won't have frustrated preferences. In contrast, the paperclippers in my example do exist and do have frustrated preferences, although they don't experience hedonic suffering as a result (assuming that frustrated preferences don't inherently count as hedonic suffering to some degree). I think that preference frustration is somewhat bad and that enough of it can plausibly outweigh hedonic suffering, so I might even say that a large enough number of non-hedonic paperclippers having their preferences thwarted would be worse than some moderately intense hedonic suffering. But I certainly wouldn't say that one paperclipper with an extremely intense non-hedonic preference for more paperclips deserves as much moral weight as one human with an extremely intense preference against agony. It matters how much I emotionally resonate with the given preference.
Acknowledgments
This piece arose out of discussions with Winston Oswald-Drummond, Anthony DiGiovanni, Emery Cooper, Magnus Vinding, Tobias Baumann, and others.