by Brian Tomasik
First written: 6 April 2013; last update: 17 Oct. 2013
I now disagree with this essay insofar as it encourages defection on multi-agent prisoner's dilemmas, which I believe is not the right path to take, even from a purely selfish perspective.
Paul Christiano wrote a blog post that responds to the thesis of this piece. At first I resisted his arguments, but I'm now coming to see that the strategic value of compromise is more persuasive than I had realized.
That said, values spreading merely in the sense of helping people discover what they really care about, by providing logical arguments and thought experiments, is not an antagonistic enterprise but is in fact something people should welcome. Purely self-promotional values spreading, in contrast, wastes resources that could go toward mutually beneficial projects.
I personally have mixed feelings about efforts to reduce extinction risk because I think space colonization would potentially give rise to astronomical amounts of suffering. However, even if I thought reducing extinction risk was a good idea, I would still think that promoting desire to create utilitronium would have much higher expected value because relatively speaking it's a much less exploited area to work in, and values seem more malleable and arbitrary than trajectories like whether humans go extinct. That said, it's important to remember that support for utilitronium is still somewhat and maybe even mostly determined by factors outside our control. Moreover, utilitronium supporters may wish to achieve gains from trade through compromise with agents of other value systems rather than just pushing for what they want in an untempered fashion.
Bob is a consequentialist who wants to maximize Z. He thinks that in the glorious future, people will be able to simulate vast amounts of Z by colonizing space and converting solar systems to Matrioshka brains. Therefore, he reasons, "the most important thing I can do is make sure humans survive and colonize space."
This is similar to what Eliezer Yudkowsky calls the "giant cheesecake fallacy":
When technology advances far enough, we'll be able to build minds far surpassing human intelligence. Now it's clear that if you're baking a cheesecake, how large a cheesecake you can bake depends on your intelligence. A superintelligence could build enormous cheesecakes - cheesecakes the size of cities. And Moore's Law keeps dropping the cost of computing power. By golly, the future will be full of giant cheesecakes!
The problem is that the thing Z that you value may, depending on how parochial your intuitions are, be more narrow than the thousands of intuitions that people across the globe have.
Consider positive-leaning utilitarians (i.e., those who care more about pleasure than negative-leaning utilitarians do). Their dream is to fill the universe with utilitronium. Unfortunately, most people don't support this dream and may even see it as a dystopic outcome. Most of the world thinks utilitronium would be boring, pointless wireheading that has no value at all.
Verbal intuition for the claim
Most people want to prevent extinction. They don't want to die, and they don't want their kids, neighbors, projects, writings, and society to die. Their incentives to prevent extinction are not calibrated to the full extent of astronomical waste that positive-leaning utilitarians fear, but they do care, and governments will invest billions and billions into safeguards against extinction risk.
So almost everyone wants to prevent extinction, but doing so is pretty hard. In contrast, you may have particular things that you value that aren't widely shared. These things might be easy to create, and the intuition that they matter is probably not too hard to spread. Thus, it seems likely that you would have higher leverage in spreading your own values than in working on safety measures against extinction.
Mathematical intuition for the claim
Say there are N people competing to decide how the future looks. There are only U positive-leaning utilitarians pushing for utilitronium, where U << N. (I would guess that U/N even among rational post-humans would be <10%.) Say the probability that humans colonize space is P. I would guess that P > 10%, maybe > 40%.
Utilitronium would be orders of magnitude more efficient than the other kinds of simulations we might expect typical post-humans to run. Relative to the value of a future with utilitronium, a future without utilitronium would be basically negligible. To fix a scale, say the value of a utilitronium-rich universe is 1 and the value of the other things that most people would want is 0.
If Bob the positive-leaning utilitarian works on combating extinction risk, say he reduces it by X%. The absolute reduction in risk is X% * P. But conditional on survival, there's only a ~U/N chance that humans create utilitronium, assuming that each of the people competing to shape the future has an equal shot at deciding the outcome. Alternatively, if computing resources are divided up proportionally to the number of people trying to shape the future, then U/N of the resources of the future will go toward utilitronium. Bob's expected contribution is then X% * P * U/N.
Suppose that instead Bob tries to make more people support a utilitronium-filled future, by spreading the meme that utilitronium would be wonderful. Say he increases the number of supporters U by Y%. Then his expected impact is (Y% * U)/N * P = Y% * P * U/N, which parallels the expression in the previous paragraph.[a]
But here's the thing: If U/N is small, it's waaay easier to have a big Y than a big X. An entire multi-million-dollar extinction-risk organization would be lucky to reduce the probability of extinction by X = 0.1%. In contrast, it seems quite plausible that a multi-million-dollar pro-utilitronium organization could counterfactually increase the number of utilitronium supporters by Y = 1%, 5%, 10%, or something in that ballpark. To a large extent, people's support for ideas is constrained by evolutionary and cultural pressures outside of our control, but there is also wiggle room, and it seems likely there's ample opportunity still to push on utilitronium.[b]
If U starts to become big, this is less true. For example, say there were already U = 50 million utilitronium supporters out of N = 500 million people seriously competing to shape the future. A multi-million-dollar utilitronium organization might be able to create, say, 50,000 new supporters, but this would only be a fractional increase of Y = 0.1%.
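The back-of-the-envelope model above can be sketched in a few lines of Python. The function names are mine, and all numbers are the illustrative guesses from the text, not independent estimates:

```python
def ev_extinction_work(X, P, U, N):
    """Expected value of cutting extinction risk by a fraction X.

    P = probability of space colonization; U/N = fraction of future-shapers
    who support utilitronium. A utilitronium-filled future is worth 1,
    everything else 0 (the scale fixed in the text).
    """
    return X * P * U / N

def ev_values_spreading(Y, P, U, N):
    """Expected value of increasing the number of supporters U by a fraction Y."""
    return Y * P * U / N

# Illustrative guesses from the text: a multi-million-dollar organization
# might achieve X = 0.1% against extinction but Y = 5% for utilitronium.
P, U, N = 0.2, 5_000, 500_000_000
ev_x = ev_extinction_work(0.001, P, U, N)
ev_y = ev_values_spreading(0.05, P, U, N)
# Both expressions share the factor P * U/N, so the comparison reduces
# to X vs. Y: with these numbers, values spreading comes out 50x ahead.
```

Because the common factor P * U/N cancels, the only question that matters under this model is whether Y can be made larger than X per dollar spent.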
What if non-utilitronium futures aren't worthless?
If U/N is extremely small, then it's possible that other futures could begin to matter in the calculation. A utilitronium future is orders of magnitude more valuable than other futures, but maybe only by a few orders of magnitude. And maybe other values that people care about would be less amenable to hyper-optimization than happiness is.
Above I defined the value of a utilitronium universe to the positive-leaning utilitarian as 1. Say the value of a non-utilitronium universe is E, where 0 < E < 1. Probably E is very small, like < 10^-3.
Then the value of working on extinction risk is X% * P * [U/N + (N-U)/N * E], which for U << N approximately equals X% * P * [U/N + E]. The value of working on utilitronium remains Y% * P * U/N. The ratio of the first to the second equals
(X/Y) * (U/N + E)/(U/N) = (X/Y) * [1 + E / (U/N)].
If U is really small, then E / (U/N) will be big, but on the other hand, Y% will also be really big, because it's easy to change the percentage of people who care about utilitronium when the total number is small. For example, say U/N = 10^-5 and E = 10^-4. Then the ratio would be X/Y * (1 + 10) = 11X/Y. If N = 500 million, U = 10^-5 * N = 5,000. With such a small number of supporters, it's easy to imagine that X/Y is much less than 1/11 -- maybe even like 1/1000 or lower. In this case, spreading support for utilitronium is still a far better option. Of course, you'll need to plug in the appropriate numbers for your case.
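To make the arithmetic concrete, here is the same calculation in Python (the function name is mine; the numbers are the hypothetical ones from this paragraph):

```python
def ev_ratio(X, Y, U_over_N, E):
    """Ratio of EV(extinction-risk work) to EV(values spreading):
    (X/Y) * (1 + E / (U/N)), valid in the approximation U << N."""
    return (X / Y) * (1 + E / U_over_N)

# Example from the text: U/N = 1e-5 and E = 1e-4 give a factor of 11 on X/Y.
factor = ev_ratio(X=1.0, Y=1.0, U_over_N=1e-5, E=1e-4)

# With X/Y = 1/1000, the full ratio is 11/1000 < 1, i.e. values spreading
# still comes out roughly 90x more effective in this scenario.
ratio = ev_ratio(X=0.001, Y=1.0, U_over_N=1e-5, E=1e-4)
```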
1. As intelligence and rationality progress, people will discover what the right things to value are. We don't need to worry that our descendants will choose the wrong values.
There are hard survival constraints requiring that organisms become more intelligent and strategic. There is a single real world that agents must adapt to, learn about, and operate in. In contrast, there's not an obvious similar mechanism pushing organisms toward the things that I care about. Values are fleeting like the wind: Just look at how human tastes can change in the realms of fashion, art, theology, music, entertainment, and of course, ethics itself over the course of history. If we don't work to promote our ethical values, they will probably remain a tiny minority and might vanish entirely.
2. If values are fleeting, then won't spreading values be pointless, since entropy will erode them over time?
Very likely our values will be lost to entropy or Darwinian forces beyond our control. However, there's some chance that we'll create a singleton in the next few centuries that includes goal-preservation mechanisms allowing our values to be "locked in" indefinitely. And as long as the vastness of space allows distinct regions to execute on their own values without takeover by other powers, we don't even need a singleton; we just need goal-preservation mechanisms.
If you want, you can include failure to lock in human values as a form of extinction risk in the P parameter above. Goal preservation should in theory be like extinction risk -- something that almost everyone would want because failure to achieve it would mean losing what they care about. So it doesn't seem like it should be different in incentive structure from other existential risks in the long run.
(Note: Relative to my negative-leaning utilitarian values, I don't know if I want goal preservation or not. It's not clear to me whether a human-driven future would create more or less suffering than a more alien future. However, positive-leaning utilitarians, like most other people, would favor goal preservation because otherwise utilitronium would be unlikely.)
3. If everyone reasoned as you suggest, we'd have a "free rider" / "tragedy of the commons" situation where no one would worry about extinction because they'd all be selfishly pushing their own values.
If you favor space colonization (and I don't), then this may be an important consideration. Still, it remains unclear how seriously to take it. Even if all the effective altruists on Earth dropped extinction-risk work overnight and began spreading their values, the dent in the total probability of extinction would be close to nil. Much of extinction risk is out of our hands; we're at the mercy of Darwinian, economic, and geopolitical systems that may not be amenable to change. Moreover, most people will still care about extinction for the reasons they do now: They care about themselves, their grandkids, their society, etc. They aren't hyper-optimizers for whom this argument will be compelling. So no, your decision to work on meme-spreading instead of extinction risk has extremely little impact on the overall probability of extinction even if your decision is mirrored by all of your friends.
That said, if 9 of your friends who don't care about utilitronium follow your lead in working on their pet values instead of extinction risk, then the total reduction in extinction-risk work is 10 times your own. But in most cases, I think it's implausible that your individual decision would have this much of an impact on what people with different values do. In any event, the effectiveness of values spreading may be 10, 100, or more times that of extinction-risk work.
If you're an extremely influential thinker in this field (e.g., Nick Bostrom), then maybe your decision to stop working on extinction risk would cause 100 other people to change their minds too. But if you're that influential, then you'll also be able to persuade dozens of new people to support the utilitronium meme.
4. As an empirical matter, extinction risk isn't being funded as much as you suggest it should be if almost everyone has some incentives to invest in the issue.
There's a lot of "extinction risk" work that's not necessarily labeled as such: biosecurity, nuclear non-proliferation, general efforts to prevent hostility between nation states, general efforts to reduce violence in society and alleviate mental illness, etc. We don't necessarily see huge investments in AI safety yet, but this will probably change in time, as we begin to see more AIs that get out of control and cause problems on a local scale. 99+% of catastrophic risks are not extinction risks, so as the catastrophes begin happening and affecting more people, governments will invest more in safeguards than they do now. The same can be said for nanotech.
In any event, even if budgets for extinction-risk reduction are pretty low, you also have to look at how much money can buy. Reducing risks is inherently difficult, because so much is out of our hands. It seems relatively easier to win over hearts and minds to utilitronium (especially at the margin right now, by collecting the low-hanging fruit of people who could be persuaded but aren't yet). And because so few people are pushing for utilitronium, it seems far easier to achieve a 1% increase in support for utilitronium than a 1% decrease in the likelihood of extinction.
Applications for other value systems
I used utilitronium as a motivating example throughout this piece, but the argument is much more general. Below are some alternate value systems where similar reasoning applies. There are many, many more.
- Egoism. If you care about yourself more than others, one hyper-optimization would be to maximize the number of happy copies of yourself in the future, which positive-leaning egoists would consider orders of magnitude more important than anything else. Thus, they would favor personal life extension, becoming rich and famous, etc. over working on extinction risks that affect the whole planet. That said, it's dubious whether most egoists want to maximize the number of happy copies of themselves; maybe just one is sufficient. Even so, they would prefer to focus on making sure they don't die before they can be uploaded, and they might also want extra copies for redundancy.
- Complexity. It seems plausible that some computational futures could contain structures with significantly higher complexity than average according to some measure.
- Knowledge. Some post-human civilizations might spend vastly more research on knowledge than others. If you care especially about particular types of knowledge (e.g., proofs in number theory, understanding microbiology, or whatever), then hyper-optimization would be even more pronounced.
- Life. Depending on how you count life, we might be able to create orders of magnitude more life than a typical post-human civilization would. This is especially true if you don't weight by organism size, since you could create oodles of really tiny critters.
- Religious converts. If you want to win as many souls to Christ as possible, you could simulate vast numbers of tiny minds that regard Jesus as their savior.
- Cheesecake. Probably the future won't have lots of cheesecake, so your efforts might be able to make a significant difference in this regard.
Values spreaders may also want to work on compromise mechanisms
Above I suggested that we probably can't overcome the tragedy of the commons just by helping other value systems on our own, because our correlation with others is probably too small to timelessly achieve an outcome where everyone cooperates. However, we could aim to rectify the situation by changing the game dynamics, i.e., by creating institutions and mechanisms that would allow for more robust cooperation arrangements. This could be good for all value systems, utilitronium included, via gains from trade through compromise.
[a] In general, if you want to maximize a product of two factors, and if you can only change one or the other factor, then you should choose whichever action produces the larger percent change in its factor. To see this, suppose we're aiming to maximize the product x1x2 by our actions. We have a choice between an action a1 that only affects x1 and an action a2 that only affects x2. We want to choose i to maximize
d(x1x2)/dai = (dx1/dai) x2 + (dx2/dai) x1.
Suppose i = 1. We only affect x1, so dx2/dai = 0, and the expression becomes (dx1/da1) x2. The situation would be reversed if i = 2.
Now, the i that maximizes d(x1x2)/dai will also maximize [d(x1x2)/dai]/(x1x2). For i = 1, this quantity equals [dx1/da1]/x1, which is the percent change in x1 that action a1 effects. If we took i = 2, we would see that the quantity equaled the percent change in x2 that a2 effects. This completes the proof. (back)
[b] Despite the hand-wavy numbers in the body of this essay, U/N is actually the fraction of utilitronium supporters in the far future, not in the near term. This fraction may increase on its own due to external factors, or alternatively, even if you increase it now, it may go back down. That said, values seem to remain malleable to some degree, so it doesn't seem implausible that U/N could be changed in a counterfactual way by at least some single-digit percentage in expectation. (back)
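The product-maximization claim in the first footnote is easy to spot-check numerically (the numbers below are arbitrary):

```python
# Claim: to maximize a product x1 * x2 when each action changes only one
# factor, pick the action with the larger *percent* change in its factor.
x1, x2 = 10.0, 1000.0
after_a1 = (x1 * 1.05) * x2   # action a1: +5% to the small factor x1 -> ~10,500
after_a2 = x1 * (x2 * 1.02)   # action a2: +2% to the large factor x2 -> ~10,200
assert after_a1 > after_a2    # 5% > 2%, so a1 wins even though x2 is 100x larger
```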