Why Maximize Expected Value?

By Brian Tomasik

First written: fall 2007. Last nontrivial update: 7 Apr 2016.

Summary

Standard Bayesian decision theory tells us to maximize the expected value of our actions.a For instance, suppose we see a number of kittens stuck in trees, and we decide that saving some number n of kittens is n times as good as saving one kitten. Then, if we are faced with the choice of either saving a single kitten with certainty or having a 50-50 shot at saving three kittens (where, if we fail, we save no kittens), then we ought to try to save the three kittens, because doing so has expected value 1.5 (= 3*0.5 + 0*0.5), rather than the expected value of 1 (= 1*1) associated with saving the single kitten. But why expected value? Why not instead maximize some other function of probabilities and values? I present two intuitive arguments in this piece. First, in certain situations, maximizing the expected number of organisms helped is equivalent to maximizing the probability that any given organism is helped. Second, even in cases where that isn't true, the law of large numbers will often guarantee a better outcome over the long run.

A fictional example

An unknown disease has broken out among the 20,000 inhabitants of a small island. The disease is highly contagious: it spreads to everyone on the island before anyone detects it. Fortunately, because the island is isolated, there is no danger that the disease will spread to other parts of the world. Unfortunately for the islanders themselves, the disease is also 100% fatal, and each person now has only three days to live.

The world medical community has no drugs to treat the disease, or even to stave off its fatal side effects. Nonetheless, medical teams are dispatched to the island in order to provide palliative care. The medical teams have a limited budget of $10,000 with which to buy analgesics that, if successful, will alleviate the painfulness of death by the disease. You, the director of the medical team, are deciding which of two possible medicines to buy: SureRelieve, which costs $2.04 per dose and is certain to work, or CheapRelieve, which costs $1 per dose but works only 50% of the time.

Since you believe that pointless suffering prior to death is equally bad regardless of which of the islanders experiences it, you are of the opinion that successfully treating n people is n times as good as successfully treating one person. You reason as follows: "If we buy SureRelieve, we are guaranteed to prevent the suffering of 10,000/2.04 ≈ 4,900 people. If we choose CheapRelieve, we'll be able to buy 10,000 treatments, but it's unclear how many people we'll help. Since each treatment independently has a 50% chance of success, the expected number of people helped is 10,000*(1*0.5 + 0*0.5) = 5,000. This is higher than 4,900, so we should buy CheapRelieve."
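
To check the arithmetic, here is a minimal Python sketch using the per-dose prices above. It computes the expected number of people helped under each option and also draws one simulated outcome of buying all CheapRelieve; the numbers are the ones from this example, nothing more.

    import random

    BUDGET = 10_000
    SURE_PRICE = 2.04      # SureRelieve: every dose works
    CHEAP_PRICE = 1.00     # CheapRelieve: each dose works with probability 0.5
    CHEAP_SUCCESS = 0.5

    sure_doses = int(BUDGET / SURE_PRICE)    # 4,901; the text rounds to 4,900
    cheap_doses = int(BUDGET / CHEAP_PRICE)  # 10,000

    print("Expected helped, all SureRelieve: ", sure_doses)                    # every dose works
    print("Expected helped, all CheapRelieve:", cheap_doses * CHEAP_SUCCESS)   # 5,000

    # One possible outcome of buying all CheapRelieve: count the doses that happen to work.
    outcome = sum(random.random() < CHEAP_SUCCESS for _ in range(cheap_doses))
    print("One simulated CheapRelieve outcome:", outcome)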

But what if lots more medicines fail than expected? What if, say, only 4,800 of them work? Then we will have "gambled away" treatments that could have helped 100 people. Isn't it better to stick with the safe bet?

Point 1: Take a vote

Suppose we don't decide ahead of time which of the islanders will get the treatments we buy. Then if we have t treatments, the probability is t/20,000 that any individual will get a treatment. We then take a poll of the islanders to ask if they would prefer having the medical team buy all SureRelieve, all CheapRelieve, or some combination of both.

If the islanders vote for the option that maximizes their probability of being successfully treated, then they will all vote to buy all CheapRelieve. This follows from a simple theorem.

Theorem: Suppose there are N organisms who will experience some amount of brutal pain unless they receive help. Let T be a random variable for the number of organisms—randomly chosen from the N organisms—that will successfully avoid the painful experience by receiving help. Then the probability that any organism avoids the pain is E(T) / N, where E(T) denotes the expected value of T. In particular, the probability of avoiding the pain always increases as E(T) increases, regardless of the variance of T.

Proof:

Prob(helped) = Σ_t Prob(T=t) * Prob(helped | T=t)
= Σ_t Prob(T=t) * (t/N)
= (Σ_t Prob(T=t) * t) / N
= E(T) / N.
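
As a sanity check, the following hypothetical Monte Carlo sketch uses the island's numbers (20,000 islanders, 10,000 CheapRelieve doses at 50% success each). It hands whatever successful treatments occur to randomly chosen islanders and confirms that the fraction of runs in which one fixed islander is helped lands near E(T)/N = 5,000/20,000 = 0.25.

    import random

    N = 20_000        # islanders
    DOSES = 10_000    # CheapRelieve doses bought
    P_SUCCESS = 0.5   # chance that any one dose works
    RUNS = 1_000      # Monte Carlo repetitions

    me = 0            # follow one fixed islander
    times_helped = 0

    for _ in range(RUNS):
        # T: how many doses work this time around.
        t = sum(random.random() < P_SUCCESS for _ in range(DOSES))
        # Those t successful treatments go to t islanders chosen at random.
        recipients = set(random.sample(range(N), t))
        if me in recipients:
            times_helped += 1

    print("Simulated Prob(helped):", times_helped / RUNS)    # roughly 0.25
    print("Theorem's E(T)/N:      ", DOSES * P_SUCCESS / N)  # exactly 0.25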

We can also apply this thought to the kitten example from before. Suppose you're one of the kittens, and you're deciding whether you want your potential rescuer to save one of the three or take a 50-50 shot at saving all three. In the former case, the probability is 1/3 that you'll be saved. In the latter case, the probability is 1 that you'll be saved if the rescuer is successful and 0 if not. Since each of these is equally likely, your overall probability of being saved is (1/2)*1 + (1/2)*0 = 1/2, which is bigger than 1/3.

I should note that in practice people in situations like that of the islanders may not actually choose the option that maximizes their probability of being helped, perhaps on account of ambiguity aversion, as illustrated in the Ellsberg paradox. Not knowing how many total successful treatments are available may be more ambiguous than knowing the actual number of treatments and merely being uncertain about who will receive them.

Point 2: The law of large numbers

The above point works well in situations where the potential benefits being distributed are equal, so that people care only about their probability of receiving the benefit. But what about situations where the potential benefits are unequal, e.g., preventing someone from getting a cold versus preventing someone from getting malaria? Clearly it's not desirable for people merely to choose the option that maximizes their probability of getting some treatment: a probability 1/2 of avoiding the common cold is not better than a probability 1/3 of avoiding malaria. We need to impose some utility function on different outcomes that specifies how much better malaria prevention is than cold prevention.

If we randomly distributed cold-prevention and malaria-prevention among a group of people who maximized their expected individual utility, then it's not hard to show that they would prefer the treatment method that maximized the expected utility of the whole group. But this begs the question, because we need to understand why people would want to maximize their expected individual utility.

The reason usually put forward is that, when decisions are made repeatedly regarding some random event, maximizing expected value makes it probable that, over long periods of time, you'll maximize the actual average value. This follows from the law of large numbers, which says that if we do enough uncorrelated random trials (e.g., rolling a die enough times), we can become as certain as we like that the actual average value we observe in our trials (e.g., the average of the die rolls that we make) will be as close as we like to the expected value (which, in this case, is 3.5 = 1*(1/6) + 2*(1/6) + ... + 6*(1/6)).b
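
To see this convergence concretely, here is a small sketch that rolls a fair die repeatedly and prints the running averages, which home in on the expected value of 3.5 as the number of rolls grows:

    import random

    expected = sum(face / 6 for face in range(1, 7))  # 3.5

    for n in (10, 100, 1_000, 10_000, 100_000):
        rolls = [random.randint(1, 6) for _ in range(n)]
        print(f"average of {n:>6} rolls: {sum(rolls) / n:.3f}   (expected value {expected})")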

In the island disease example, the number of people treated by CheapRelieve is a sum of 10,000 random outcomes. This is a "large number," which means the probability that the actual number of people treated deviates significantly from 5,000 is small. In fact, the chance is only 2.3% that CheapRelieve will successfully treat fewer people than SureRelieve.c
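
The 2.3% figure comes from the normal approximation spelled out in footnote c; here is that same calculation as a short sketch, using only the standard library:

    from math import erf, sqrt

    def normal_cdf(z):
        """P(Z <= z) for a standard normal random variable Z."""
        return 0.5 * (1 + erf(z / sqrt(2)))

    n, p = 10_000, 0.5
    mu = n * p                     # 5,000 expected successes
    sigma = sqrt(n * p * (1 - p))  # 50

    z = (4_900 - mu) / sigma       # -2
    print(normal_cdf(z))           # about 0.023: chance CheapRelieve helps fewer than 4,900 people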

What about mixed strategies?

For instance, why not spend $5,000 on SureRelieve and $5,000 on CheapRelieve? With this strategy, you can buy 2,450 SureRelieve treatments and 5,000 CheapRelieve treatments. The expected number of people helped is 2,450 + 0.5*5,000 = 4,950. Here, we've bought a little bit of "insurance" against extremely low numbers of people helped, but at the cost of the chance to actually help more people. Even here, the chance is only 21% that our mixed strategy will help more people than the riskier strategy.d
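
Footnote d derives that 21% figure with a normal approximation. The hypothetical Monte Carlo sketch below checks it directly by simulating both strategies; a thousand runs gives a rough but serviceable estimate.

    import random

    def working_doses(n, p=0.5):
        """How many of n CheapRelieve doses happen to work."""
        return sum(random.random() < p for _ in range(n))

    RUNS = 1_000
    mixed_wins = 0

    for _ in range(RUNS):
        all_cheap = working_doses(10_000)      # riskier strategy: 10,000 CheapRelieve doses
        mixed = 2_450 + working_doses(5_000)   # 2,450 guaranteed + 5,000 CheapRelieve doses
        if mixed > all_cheap:
            mixed_wins += 1

    print("P(mixed strategy helps more) ~", mixed_wins / RUNS)  # roughly 0.21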

If we had spent less than 50% of our budget on SureRelieve, this gap in expected values would have narrowed, but our insurance would have declined along with it. I see no reason to prefer a mixed strategy: If buying some CheapRelieve will help more than buying no CheapRelieve, then buying all CheapRelieve will be even better. If the improvement of buying all CheapRelieve over mostly CheapRelieve is hard to see with only 10,000 people getting treatments, then consider 10 trillion or 10 googol. In those cases, it's practically guaranteed that you'll help more people by buying all CheapRelieve.

Implication

Now consider the following. You are again the medical-project director, and you discover that you've gotten an extra donation of $51 with which to buy more medicines. If you buy the SureRelieve, you'll be guaranteed to help 51/2.04 = 25 people. If you buy CheapRelieve, the expected number of people you'll help is 25.5. But now, there's a 44% chance that CheapRelieve will help fewer people, perhaps several fewer. Do you decide that, unlike before, this case is too risky, so it's best to play it safe?

Hopefully not. The extra $51 is not isolated; it's part of the overall budget. If you had started out with a budget of $10,051, the no-mixed-strategies argument above says that you should have used all of it to buy CheapRelieve, because that would have almost guaranteed a better outcome, possibly much better.

Infinite outcomes

As William Feller notes on p. 251 of An Introduction to Probability Theory and Its Applications, the weak law of large numbers fails for random variables with infinite expectation, so the long-run-average argument falls through. Similarly, the von Neumann-Morgenstern expected-utility theorem, which is also sometimes invoked, relies on a continuity axiom that fails to hold when we allow infinitely large utility values (without also allowing infinitesimal probabilities).

What about isolated actions?

The long-run-average idea applies to cases in which our donations or actions will be one part of a larger ensemble of actions. But what if that isn't the case? What if we encounter a one-time all-or-nothing situation in which we can't rest assured that the law of large numbers will make things work out okay overall?

Scenario. You are the only sentient organism in the universe, but you learn that, at 5 p.m. tomorrow, 2 million people will come into existence for an hour, be brutally tortured, and then vanish again. No other sentient organisms will exist afterwards.

You discover a certain box that has two buttons, one red and one blue. The Red Button, if pressed, has a one-in-a-million chance of preventing all two million of the people from being tortured; instead, they'll come into existence for an hour and read the newspaper before vanishing. If the Blue Button is pressed, it will, with certainty, allow exactly one of the two million people to avoid torment and instead read the newspaper. You can press only one button, because once either button is pressed, the box vanishes forever. In expected-value terms, then, the Red Button prevents (1/1,000,000)*2,000,000 = 2 instances of torture, versus exactly 1 for the Blue Button.

Here, the argument about long-run averages seems not to apply because there are no repetitions of the event. The "take a vote" argument would apply if we could poll in advance the 2 million people who would be coming into existence. However, it's possible to devise more complicated thought experiments in which this point, too, would break down. At this point, I would be willing simply to accept the expected-value criterion as an axiomatic intuition: The potential good accomplished by the Red Button is just so great that a chance for it shouldn't be forgone. However, below I survey two additional arguments.

Argument 1: Quantum MWI

The many-worlds interpretation (MWI) of quantum mechanics enjoys relatively large support among certain groups of physicists and presents what I consider a more coherent view than the Copenhagen interpretation. According to MWI, apparently random quantum events do not select a particular measurement outcome; rather, all of the possibilities are realized in different, parallel worlds. For instance, if we put a cat into a box hooked up to a Geiger-counter-triggered poisonous-gas machine, it's not that there's a 50% chance that the cat will be killed; rather, there are two different world branches, and in one of them, the cat actually is killed. Thus, an expected value (using a probability distribution that matches the fractions of the various worlds realized) does not just reflect what might happen: It actually counts what does happen. So if the efficacy of the Red Button in the previous example is determined by a quantum outcome, the fact that this is a "one-shot" action doesn't matter: In a small fraction of worlds, you actually do prevent all 2 million people from being tortured!

Two qualifications are in order. First, the naïve picture about counting "numbers of worlds" is not quite right—see, e.g., "Understanding Deutsch's Probability in a Deterministic Multiverse" by Hilary Greaves (2004), sec. 5.3. What really counts are measures given by the Born rule. But this raises the question of what exactly measure is and how to justify Born probabilities rather than some other measure (like one based on having an odd number of socks—see sec. 3.2). Indeed, Greaves (2004) concludes that using Born probabilities in decision theory may simply need to be taken "as something of a primitive" (p. 34), which brings us right back to square one (Why expected value?) except perhaps to the extent that other MWI-based intuitions can be adduced.

Second, even if we agree that we should use Born-rule probabilities, this only applies to physical uncertainties, such as whether an electron will be measured spin-up or spin-down, or whether neurons in my brain will fire in a way that causes me to drive off the side of the road. Ideally, we want to maximize "expected values" calculated according to the true Born-rule measures over various worlds. But our probability distributions are not perfect: Much of our uncertainty about the future is due not to quantum splitting but merely to our own ignorance, and our subjective distribution might not be anything close to the true distribution of measure over outcomes. Moreover, we may assign meta-level probabilities that don't refer to specific outcomes at all (e.g., What's the probability that MWI is false? How likely is this or that law of physics to be true?). The MWI justification for maximizing expected value only holds to the extent that our subjective probability distributions match true quantum measures.

Argument 2: Rule utilitarianism

As a general rule, if everyone followed the advice to choose the action of maximum expected value, then the law of large numbers would imply that this would have the best consequences, even if a given individual action didn't come through with the desired outcome. We should be the change we wish to see in the world and set an example by following this rule ourselves.

Applied to the Red-Button example from before, we can say that even if this is the only time you'll ever have the opportunity to press a button and thereby potentially prevent torture, you would like it to be the case that others, in similar situations, behave the way you did, because aggregated over all such situations, that will prevent more total people from being tortured.

Likewise, we should praise people for doing what seemed at the time to be the action of highest expected value, even if the person got unlucky with the actual outcome.

Logical uncertainty

The arguments above don't cover every case of uncertainty. For instance, when you're unsure about a logical truth like whether P = NP, the answer will be the same in every circumstance, for every person, in every possible world. Large numbers, quantum uncertainty, and rule utilitarianism can't help here.

Of course, remember that there's no such thing as objective probability: The "real" probability is 1 for however the multiverse is and 0 for everything else. Probabilities are tools that we use to express our own ignorance, and it's convenient to think of them as though they represent "actual randomness" over different outcomes (even though there is no such thing as "actual randomness"). So even if you make a wager based on the possibility that P = NP, and this turns out to be false, it may be compensated by someone else in another world taking another wager based on the possibility that the Riemann hypothesis is false when it in fact turns out to be true. (These are just examples. Neither of these questions has yet been solved.) Whether this kind of trading of logical errors satisfies you depends partly on how much is at stake based on the logical wager and how correlated those wagers are across worlds.

For myself, I find it just intuitive that the magnitude of importance of something should scale linearly with its probability. From this standpoint, expected-value maximization needs no further justification; the expected value just is how much I think the possible outcome matters.

Also, the "Take a vote" argument from the beginning of this piece does still apply to cases like the P = NP wager, at least if one's "probability of being helped" is assessed using the helper's subjective probability that P = NP. For example, suppose the probability that P = NP is 5%. Action A would help 100 people each by some constant amount if P = NP and would help no one if P != NP. Action B would help 2 people each by that same constant amount if P != NP and would help no one if P = NP. Out of a large number N of people needing help, a person's probability of being helped by action A is 5% * (100/N) = 5/N. A person's probability of being helped by action B is only 95% * (2/N) = 1.9/N.

Footnotes

  a. In mathematical language, this means that we consider a sample space of possible worlds (e.g., one possible world might include a kitten being saved from a tree, while another possible world might involve the same kitten not being saved). We then decide upon an objective function that maps from our sample space to the real numbers (or perhaps the hyperreal numbers or another ordered field). We then consider some set of possible actions (assumed finite for simplicity) we might take. For each action, we assign a subjective probability distribution to our sample space which recognizes the various possible results of taking that action (e.g., if our action is to call the firefighter, this probability distribution would say how likely it is that the kitten will be saved). So, for each action, our objective function becomes a random variable. Standard decision theory says the following: If, for each action, the objective function has finite expectation, then choose an action whose expectation is maximal.

    If we are hedonistic utilitarians, then our objective function maps from possible worlds to cardinal utility assignments.

  b. This is technically the weak law of large numbers, which holds in more cases than does the strong law.
  c. This number is easily computed by the normal approximation to the binomial distribution. With CheapRelieve, mu = 0.5*10,000 = 5,000, sigma = [10,000*0.5*(1-0.5)]^(1/2) = 50, and z = (4,900 - 5,000)/50 = -2. The chance is 2.3% that a standard normal random variable will be less than -2.
  d. Consider the difference of two random variables: one binomial(10,000, 0.5) and the other binomial(5,000, 0.5). The probability that the mixed strategy does better is the probability that this difference is less than 2,450. Approximate both as independent normally distributed variables. The difference of the two has mean mu = 2,500 and variance equal to the sum of the individual variances, 10,000*0.5*(1-0.5) + 5,000*0.5*(1-0.5) = 3,750, which implies sigma = 61.2. Our probability is the probability that a standard normal random variable will be below -0.816, which is about 21%.