Expected Value of Shared Information for Competing Agents

By Brian Tomasik

First written: 4 Jun 2013. Last nontrivial update: 17 Oct 2013.

Summary

Most of the time, learning more allows you to more effectively accomplish your goals. The expected value of information for you is proportional to the probability you'll find a better strategy times the expected amount by which it's a better strategy than your old one conditional on you switching to it. However, what happens when you conduct research that is shared publicly for use by other agents that may have different values? Under which conditions is this helpful versus harmful to your goals? A few generalizations we can make are that acquiring information that will be shared with you and other agents is beneficial by your values when (1) you have more resources to act than the other agents, (2) the other agents currently disagree with you a lot on their policy stances, and (3) you expect yourself and other agents to adopt closer policy stances to each other upon learning more, perhaps because your values are similar.

It's not obvious whether negative-leaning utilitarians should welcome general research that will be useful to both themselves and to positive-leaning utilitarians, though the publicity, credibility, and feedback benefits of sharing your research are substantial and suggest that you should share by default unless you have other strong reasons not to.

In addition, as a matter of theory, it should by the Coase theorem usually be possible to make arrangements such that the sharing of information is net positive when appropriate compensation is given for losses that might result to the values of the originator. It would be worth exploring more how mechanisms to compensate for information externalities could work in practice.

Expected value of information for one agent

An agent has a given value system. It assigns probabilities P to possible worlds, and given world w, the value of taking action a is V(a|w). The expected value of an action is

EV(a) = sum_w V(a|w) P(w).

The agent chooses action argmax_a EV(a), which has expected value max_a EV(a).
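
To make the notation concrete, here's a minimal Python sketch of the calculation. The worlds, actions, probabilities, and values are made-up placeholders rather than anything from a real decision problem.

    # Toy expected-value calculation: EV(a) = sum_w V(a|w) P(w).
    worlds = ["w1", "w2"]
    actions = ["a1", "a2"]
    P = {"w1": 0.7, "w2": 0.3}                   # P(w), made-up numbers
    V = {("a1", "w1"): 10, ("a1", "w2"): 0,      # V(a|w), made-up numbers
         ("a2", "w1"): 4,  ("a2", "w2"): 8}

    def EV(a, prob):
        return sum(V[(a, w)] * prob[w] for w in worlds)

    best = max(actions, key=lambda a: EV(a, P))  # argmax_a EV(a)
    print(best, EV(best, P))                     # "a1 7.0" with these numbers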

Suppose the agent learns a new piece of information i. This updates its probability distribution to become P(w|i) for each w. The new expected value of action a is

EV(a|i) = sum_w V(a|w) P(w|i).

The new best action is argmax_a EV(a|i) with expected value max_a EV(a|i). Relative to our new information, the improvement in expected value of our action is the value of information i:

VOI_i = max_a EV(a|i) - EV(argmax_a EV(a) | i).

Now, a priori, the agent doesn't know what information it will learn. It needs to assign a probability distribution P(i) over possible discoveries. The expected value of information is then

EVOI = sum_i P(i) [max_a EV(a|i) - EV(argmax_a EV(a) | i)].
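
Continuing the made-up numbers from the sketch above, here's how the EVOI sum can be computed. The two information sets and their likelihoods are also invented, chosen only so that they're consistent with the prior P(w).

    # EVOI = sum_i P(i) [max_a EV(a|i) - EV(a_old|i)], with made-up numbers.
    worlds = ["w1", "w2"]
    actions = ["a1", "a2"]
    P = {"w1": 0.7, "w2": 0.3}
    V = {("a1", "w1"): 10, ("a1", "w2"): 0,
         ("a2", "w1"): 4,  ("a2", "w2"): 8}
    P_i = {"i1": 0.5, "i2": 0.5}                    # P(i)
    P_w_given_i = {"i1": {"w1": 0.9, "w2": 0.1},    # P(w|i); note that
                   "i2": {"w1": 0.5, "w2": 0.5}}    # sum_i P(i) P(w|i) = P(w)

    def EV(a, prob):
        return sum(V[(a, w)] * prob[w] for w in worlds)

    a_old = max(actions, key=lambda a: EV(a, P))    # argmax_a EV(a) = "a1"
    EVOI = sum(P_i[i] * (max(EV(a, P_w_given_i[i]) for a in actions)
                         - EV(a_old, P_w_given_i[i]))
               for i in P_i)
    print(EVOI)                                     # 0.5 with these numbers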

The information i only matters insofar as it may change the agent's action. Therefore, rather than partitioning the possible outcomes based on the exact information i, we can group it according to what action the agent chose as a result of learning i. Define the agent's original action as a_o and P(choose a_k) as the probability that the information comes out such that the agent thinks action a_k is the best upon learning it. We then have

EVOI = sum_k P(choose a_k) [EV(a_k | choose a_k) - EV(a_o | choose a_k)].[1]

Consider the simple case of just two action choices a_1 and a_2. Without loss of generality, assume the original action was a_1. Then

EVOI = P(choose a_1) [EV(a_1 | choose a_1) - EV(a_1 | choose a_1)] + P(choose a_2) [EV(a_2 | choose a_2) - EV(a_1 | choose a_2)]
= P(switch to a_2) × (how much better you expect a_2 to be than a_1, given that you switched).
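
As a sanity check with the same made-up numbers as above, only learning i_2 makes the agent switch (to a_2), so the regrouped formula gives the same answer:

    # Regrouped by the action chosen, with the same toy numbers as above.
    p_switch = 0.5                        # P(choose a_2) = P(i_2)
    gain_given_switch = 6 - 5             # EV(a_2|i_2) - EV(a_1|i_2)
    print(p_switch * gain_given_switch)   # 0.5, same EVOI as before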

Impact of information with two agents

What happens if we have two agents with different value systems that both learn the new information? Let's adopt the stance of the first agent throughout the remainder of this piece. Relative to our values, the new information will (in general, with some exceptions not discussed here) have positive expected value, because either we just do the same thing as before, or else we find something better and do that instead.

However, the other agent has different values, so when it learns more, it could take an action that is worse than what it was doing before, relative to our values. Thus, the expected value of information learned by the other agent could be negative from our point of view. In light of this, we can no longer represent the expected value of information to the other agent's action using a max operation, because the other agent may not choose the action that's maximally valuable by our lights. However, we can still use the formula that operates based on what the agent chooses, assuming we remember that the other agent may choose a new action worse than the old one.

The nonzero terms in the formula for expected value of information come from the potential changes of action by one or the other agent. In particular, for the simple case of two possible actions:

EVOI = P(you switch and other agent doesn't) × (difference in expected values for your action given that you switched and the other agent didn't)
+ P(you don't switch and other agent does) × (difference in expected values for the other agent's action given that he switched and you didn't)
+ P(you switch and other agent does also) × [(difference in expected values for your action given that you both switched) + (difference in expected values for the other agent's action given that you both switched)].
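
Here's a minimal sketch of this three-term sum. The probabilities and value differences are placeholders I've invented purely to show how a negative term from the other agent's switch enters:

    # Two-agent EVOI from your (agent 1's) point of view, made-up placeholders.
    p_you_only   = 0.09    # P(you switch, other agent doesn't)
    p_other_only = 0.09    # P(other agent switches, you don't)
    p_both       = 0.01    # P(you both switch)

    d_yours_you_only    = 5.0    # gain from your own switch (positive)
    d_others_other_only = 3.0    # change, by your values, when only the other agent switches
    d_yours_both        = 5.0
    d_others_both       = -4.0   # may be negative: the other agent can move the wrong way

    EVOI = (p_you_only * d_yours_you_only
            + p_other_only * d_others_other_only
            + p_both * (d_yours_both + d_others_both))
    print(EVOI)    # 0.45 + 0.27 + 0.01 ≈ 0.73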

Implications for when shared information is beneficial

From the above formula, we can derive a few intuitive principles about when it's advantageous for you to produce shared information versus when it's harmful to do so.

EVOI is higher when you have more resources compared with the other agent

The terms in the formula that represent differences in expected values for your own actions upon learning more are (with some unusual exceptions) always positive, because more information can (usually) only make your decisions at least as good as they were before. If you have more resources (e.g., money, supporters, influence), then the magnitude of impact you can have is bigger, so these positive terms for the effect of the information on your decisions are large compared with the terms for its effect on the other agent's decisions.
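
One rough way to see this (a toy formalization of my own, not anything rigorous) is to scale each agent's value-difference terms by the resources that agent can bring to bear. The negative placeholder values illustrate a case where the other agent's switches hurt you:

    # Resource-weighted version of the three-term sum, made-up numbers throughout.
    def evoi(r_you, r_other,
             p_you_only=0.09, p_other_only=0.09, p_both=0.01,
             d_yours=5.0, d_others_alone=-3.0, d_others_both=-4.0):
        return (p_you_only * r_you * d_yours
                + p_other_only * r_other * d_others_alone
                + p_both * (r_you * d_yours + r_other * d_others_both))

    print(evoi(r_you=10, r_other=1))    # ≈ 4.69: your positive terms dominate
    print(evoi(r_you=1, r_other=10))    # ≈ -2.6: the other agent's terms dominate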

EVOI is higher when, before gathering new information, the other agent adopts the opposite policy from you

The intuition here is that if the other agent is currently opposed to you, then you have nothing to lose but possibly something to gain by shuffling each of your policy stances. In particular, the "difference in expected values for the other agent's action given that he switched and you didn't" is positive, because this means the new information caused the other guy to come around to your policy stance after all. As usual, "difference in expected values for your action given that you switched and the other agent didn't" is also positive because you're the one deciding to switch. Unfortunately, "difference in expected values for the other agent's action given that you both switched" is negative, because when you both switch, you're once again each holding opposite policy stances. However, most of the time P(you don't switch and other agent does) > P(you switch and other agent does also), the reason being that an agent is less likely to switch than to keep its current policy stance in ex ante expectation, because if that weren't true, it should have switched already. Therefore, the positive term generally has higher probabilistic weight than the negative term. (Exceptions can occur if the other agent has values very much antithetical to yours.)

In particular, suppose that whether you switch is independent of whether the other agent switches. Say there's a 10% chance that you each switch. Then the formula is

EVOI = (0.1)(0.9) × (difference in expected values for your action given that you switched and the other agent didn't)
+ (0.9)(0.1) × (difference in expected values for the other agent's action given that he switched and you didn't)
+ (0.1)(0.1) × [(difference in expected values for your action given that you both switched) + (difference in expected values for the other agent's action given that you both switched)]
= 0.09 × (some positive number) + 0.09 × (some positive number) + 0.01 × [(some positive number) + (some negative number)].

From this we can also see that it's more certain the total sign of EVOI is positive when switching is unlikely. However, the raw magnitude of EVOI will tend to be higher when switching is more likely.

If the other agent currently agrees with your policy stance, then the above conclusions are mostly reversed, because now the "difference in expected values for the other agent's action given that he switched and you didn't" term is negative. In other words, if you're currently allied with another agent who doesn't share your values but does share your policy conclusions, generating information that the other agent will also learn from may be harmful.

EVOI is higher when information tends more often to cause your policy stance to match that of the other agent

It's easy to imagine that this should be true because, for example, if the other agent were a copy of you, then it would be as good for it to learn more as for you to learn more, assuming parity in resources between you and your copy. If the other agent had very similar values, then it would be almost as good for it to learn more as for you to learn more. And if the other agent had the exact opposite values (e.g., you want to create staples and it wants to destroy staples), then allowing it to learn information will almost always be harmful, because then it can better do exactly the wrong thing by your values.

We can see how alignment of your policy stances with the other agent's plays out through the probability terms in the EVOI formula. In particular, say the other agent currently disagrees with you but has similar values, such that we expect that more information will tend to make you both agree more with each other on policy. In that case, P(you switch and other agent doesn't) > P(you switch) P(other agent doesn't) because if you switch to the other policy stance (the one currently held by the other agent), it's less likely the other agent will simultaneously switch to your old policy stance because you tend to converge in policy stances upon learning more. This probability is multiplied by a positive term in the formula, because the expected change due to your switching policy given that you choose to switch is (almost) always positive. Secondly, P(you don't switch and other agent does) > P(you don't switch) P(other agent does), again because you tend to converge on learning more. This probability is also multiplied by a positive term in the formula, because if the other agent switches and you didn't, that probably means he's coming around to seeing that your policy is right. Finally, P(you switch and other agent does also) < P(you switch) P(other agent switches). This probability is multiplied by a possibly negative term in the formula, because if the other agent and you both switch, it means the other agent seems to be adopting the wrong new stance. Thus, the positive terms are multiplied by bigger probabilities and the possibly negative terms are multiplied by smaller probabilities when you and the other agent tend to converge on policy stances. A similar analysis can show the same idea for an agent who starts out with the same policy stance as you.
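
A small sketch of this effect: hold the marginal switching probabilities fixed at 10% each, as in the earlier example, and compare an independent joint distribution against a "convergent" one that shifts probability mass away from the both-switch outcome. The value differences are the same placeholders used earlier:

    # Effect of convergence on EVOI: same marginals, different joint distribution.
    def evoi(p_you_only, p_other_only, p_both,
             d_yours=5.0, d_others_alone=3.0, d_others_both=-4.0):
        return (p_you_only * d_yours
                + p_other_only * d_others_alone
                + p_both * (d_yours + d_others_both))

    # Independent switching: P(both switch) = 0.1 * 0.1.
    print(evoi(0.09, 0.09, 0.01))       # ≈ 0.73
    # Convergent switching: same 10% marginals, but both-switch is rarer.
    print(evoi(0.098, 0.098, 0.002))    # ≈ 0.79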

Cooperative research?

If the expected value of acquiring more shared information seems significantly positive, then competing agents may wish to work together on research, each contributing some funds and staff to the task. If two agents contribute equally, each pays only half of what it would have cost to gather the information privately. Of course, in some cases, when the expected value of shared information is very low or negative, private research will yield higher EVOI per dollar.

Should NLUs do research with PLUs?

What does the above analysis suggest about whether negative-leaning utilitarians (NLUs) should collaborate on fundamental research with positive-leaning utilitarians (PLUs)? Let's adopt the NLU stance for analysis.

The first point was that "EVOI is higher when you have more resources compared with the other agent." NLUs currently have many fewer resources than PLUs (utilitarians who give relatively more weight to pleasure). This means that the benefits of the information to the NLUs themselves may be small compared with its usefulness to the PLUs, although exactly how much this is true depends on the generality of the questions being studied. This doesn't automatically mean the EVOI is likely to be negative; it just means we can't rely as much on the value to the NLUs themselves to keep it positive.

The second point, "EVOI is higher when, before gathering new information, the other agent adopts the opposite policy from you," is sometimes true and sometimes false for NLUs vs. PLUs. For example, both camps tend to agree on something like phasing out factory farming, but the sides may differ on something like reducing certain types of extinction risk. To the extent NLUs and PLUs do differ on big issues, the NLUs may find it advantageous for each side to learn more. If either the NLUs see they were wrong or the PLUs see they were wrong, that's a win for the NLUs (and for the PLUs too).

The third point, "EVOI is higher when information tends more often to cause your policy stance to match that of the other agent," has unclear implications. On the one hand, PLUs do care about reducing suffering like the NLUs do. However, in practice, the most overwhelming ways to promote happiness aren't to reduce suffering directly but just to create new pleasure through space colonization and such. The NLUs generally oppose this because more computational resources mean more possibility for suffering. This is a pretty fundamental split, and insofar as we can "pretend" that the PLUs value creating computational resources while the NLUs value eliminating them, we have a situation of two fundamentally opposed value systems, such that the NLUs and PLUs will, other things being equal, act in symmetrically opposite ways. Insofar as that's the case, shared information hurts the NLUs by helping the PLUs.

Other considerations

Information has broad, long-term effects

What you learn now not only influences the policy question that you're considering in the present but informs all future policy questions as well. Unfortunately, EVOI calculations become harder when considering potential future policies too. That said, if you expect the information to eventually be discovered anyway, such that discovering it now just brings it forward in time, then you can ignore the effects on policy questions in the far future after the information would have been discovered anyway.

Type of research matters

Until this point I've been discussing research whose principal effect is to modify one's policy positions. Of course, some research can primarily target other areas, like productivity, strategic thinking, etc. In these cases, shared information is mostly good insofar as it reaches you and those who agree with you on policies/values and mostly bad insofar as it reaches those who disagree with you on policies/values. In the short run, what matters is whether the people disagree with you on policies, because the information will tend to amplify whatever actions they're taking. In the long run, it's more important that it helps those who share your values if you expect to eventually converge on policy questions with them.

Collaboration may promote good will

Even if you're uncertain whether shared research is a good idea from a narrow self-interested perspective, sharing may be a wise idea because it promotes rapport with the other agents. Common-sense heuristics suggest that sharing information is good, which provides some weak prior in that direction, although this heuristic may not have been developed with minority value systems in mind. See the next section for more on this.

Sharing makes you better known and demonstrates the quality of your work

Having more public content generates more traffic and more discussion. Producing more public work allows donors, supporters, and onlookers to see what you've accomplished. These are substantial reasons in favor of making something public even if there's no other motivation to do so.

Sharing allows for more comments and insights from more people

This is one of the reasons I like public conversations. If you have an idea, airing it out to a broader audience allows more feedback. This can multiply the impact of your research in either a positive or negative direction, but especially if the research topic is narrow and mostly centered on your values (e.g., an open problem in the field of reducing wild-animal suffering), then this effect will be mostly to your benefit.

Upshot of these factors

I think the last two considerations here are significant and suggest that sharing should be the default unless there's strong reason to believe it's harmful. If you're going to put in the effort to produce research, you may as well reap the reciprocity and popularity benefits that sharing would provide.

Can we make information always Pareto-improving?

In general, there's a strong intuition that more information should almost always be better, because (ignoring information overload, bounded rationality, the paradox of choice, and other limitations on finite humans) more information helps you accomplish your goals more effectively. There's a sort of surplus that's taken from the environment and given to the agent that's making smarter moves.

What if there are two agents with correlated or orthogonal goals/subgoals? In this case, the information helps both of them get more of what they want, and they should both be willing to pay some resources to acquire that additional information. So far so good.

The main problem comes when the goals of the agents are somewhat anticorrelated, such that when agent A is more effective, agent B's values get hurt in the process. In this case agent B may want to prevent the information from being discovered, even though B would benefit somewhat from the information in isolation.

To take an example, suppose agent B wants to study a problem that, if figured out, would yield $1000 in value by improving its actions. Unfortunately, the information would also inform agent A, which has somewhat anticorrelated values. From agent A's perspective, the change in action is worth $10,000, but to agent B, the change in agent A's actions costs $3000 of value by its lights. This is a case of a positive "information externality": Privately, agent B prefers not to uncover the information on balance (because $1000 - $3000 < $0), but from a social perspective, the sum of agent A's and agent B's values from the information is positive.

What to do? Well, the word "externality" reminds us of the Coase theorem, which suggests we should be able to achieve an efficient outcome in the absence of transaction costs. So, for example, agent B could charge agent A, say, $6000 for the information. Agent A should be willing to pay this, because it still nets $10,000 - $6000 = $4000 of value, and agent B now gets $6000 + $1000 - $3000 = $4000 as well.
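
The arithmetic of that deal, spelled out; the $6000 transfer is just one point in the range of mutually beneficial prices:

    # Coasean side payment for the information, numbers from the example above.
    benefit_to_A = 10000   # value of the information to agent A, by A's values
    benefit_to_B = 1000    # direct value of the information to agent B
    cost_to_B    = 3000    # harm to B from A's changed actions, by B's values
    transfer     = 6000    # what B charges A for the information

    A_net = benefit_to_A - transfer               # 4000
    B_net = benefit_to_B - cost_to_B + transfer   # 4000
    print(A_net, B_net)
    # Any transfer between $2000 and $10,000 leaves both agents better off
    # than if the information hadn't been produced at all.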

Of course, making this work in practice is a lot more difficult. Often it's not feasible or sensible to charge in this way for research findings. In most situations, I would tend to treat the sharing of information as incrementing a mental ledger that the other side keeps of how much I've helped them and hence how much they want to help me in return. It's worth exploring how to make these socially net-positive transactions work more often.

Note that

  1. I think(?) the Coase theorem relies on the assumption of transferable utility, which may not always be satisfied in practice.
  2. There may be cases where it's still best not to share information. For example, suppose A benefited in the amount of only $3000 from B's information, while B would have lost $10,000 in value due to A's changed actions in light of the information. In this case, A is not willing to compensate B for the full cost of the information by B's values, and indeed the total social value of the information (sum of A's and B's values) is negative. My intuition is that these cases are more rare than those in which A benefits more than B loses.

Footnotes

  1. To get a sense of how this formula derives from the previous one, imagine that the only two information sets that would lead to action a_k being optimal are i_1 and i_2. The part of the previous EVOI formula for i_1 and i_2 was

    P(i_1) [EV(a_k | i_1) - EV(a_o | i_1)] + P(i_2) [EV(a_k | i_2) - EV(a_o | i_2)].

    Let's focus on the EV(a_k | ...) terms.

    EV(a_k | i_1) = sum_w V(a_k | w) P(w | i_1)
    = sum_w V(a_k | w) P(w and i_1) / P(i_1).

    A similar expression applies for EV(a_k | i_2). Thus,

    [ EV(a_k | i_1) P(i_1) + EV(a_k | i_2) P(i_2) ] / [ P(i_1) + P(i_2) ]
    = [ sum_w V(a_k | w) P(w and i_1) + sum_w V(a_k | w) P(w and i_2) ] / [ P(i_1) + P(i_2) ]
    = { sum_w V(a_k | w) [ P(w and i_1) + P(w and i_2) ] } / [ P(i_1) + P(i_2) ].

    Since i_1 and i_2 are disjoint, maximally specific information sets, P(i_1) + P(i_2) = P(i_1 or i_2) = P(choose a_k), and hence the previous equation becomes

    [ sum_w V(a_k | w) P( w and (i_1 or i_2) ) ] / P(choose a_k)
    = [ sum_w V(a_k | w) P(w and choose a_k) ] / P(choose a_k)
    = sum_w V(a_k | w) P(w | choose a_k)
    = EV(a_k | choose a_k).

    Multiplying this by [ P(i_1) + P(i_2) ] = P(choose a_k) gives us the first half of the equation to be proved. Roughly the same argument works for the second half of the equation, the EV(a_o | choose a_k) part.