Stylized Model for an Agent's Risk Aversion with Respect to Lifespan

By Brian Tomasik

First written: 30 Jun 2013. Last nontrivial update: 30 Jun 2013.

Summary

In a stylized model where an agent risks only death in order to achieve variable rewards at each time step, the agent will be more cautious at younger ages, because it has more potential future fitness to lose by dying early.

Summary
Setup
Expected fitness calculations
Summary so far
Graphs

Setup

Consider an agent organism trying to maximize its expected future fitness. Fitness ultimately comes from bearing and possibly raising successful offspring, but let's also assign intermediate "reward" values when the organism accomplishes something that helps it achieve its final goal, e.g., gathering food, having sex with a good partner, saving its baby from a predator, etc. Denote the reward that an organism might accomplish at time t as r_t. At each timestep, the organism is presented with a possible reward drawn from some probability distribution. It doesn't know what the potential reward for a given timestep will be until it gets there, although it does know the distribution from which rewards are drawn. In order to gain this reward, the organism may need to bear a risk. Suppose that the only risk is that the organism will die and lose its potential future rewards, with probability 1-p_t. Thus, the organism successfully achieves its reward with probability p_t. For simplicity, assume that p_t has a constant value p at all times. At any given timestep, the organism can also choose not to seek the reward and thereby avoid the risk of death at that time.

If it doesn't die prematurely due to risk-taking, the organism has a maximum possible lifespan of L timesteps. At a given time t, the organism can imagine what it thinks its expected remaining fitness will be at a present or future time T. Denote this expected fitness by F_T|t. In order to do this calculation, the organism will need to guess what its future rewards will be as well, since it only learns their actual values when the time comes. Denote the expected value of a reward at time T viewed from the standpoint of time t as r_T|t. When T=t, the organism knows the actual value r_t of the reward.

At each time t = 1, ..., L, the organism chooses to take the risk or not depending on which option gives a higher expected value of future fitness at that time, i.e.:

F_t|t = max[F_t+1|t, p(r_t|t + F_t+1|t)]. (1)

This says the organism will either wait it out until t+1 and just get an expected F_t+1|t from there onwards, or else it will take a risk now, and with probability p, it will gain a reward now, r_t|t, and will also live to have expected future fitness F_t+1|t. With probability 1-p, the organism dies and has 0 future fitness.
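As a rough illustration, the choice in equation (1) could be sketched in Python as follows. The function names and the example numbers are hypothetical and chosen only for this sketch.

```python
def take_risk(reward: float, future_fitness: float, p: float) -> bool:
    """Return True if risking death for `reward` beats waiting, per equation (1).

    `future_fitness` stands for F_t+1|t and `p` for the survival probability.
    """
    return p * (reward + future_fitness) > future_fitness


def fitness_now(reward: float, future_fitness: float, p: float) -> float:
    """F_t|t from equation (1): the better of waiting and taking the risk."""
    return max(future_fitness, p * (reward + future_fitness))


# Hypothetical numbers: p = 0.9 and expected future fitness 0.45.
print(take_risk(0.02, 0.45, 0.9))  # small reward: compare 0.9 * 0.47 with 0.45
print(take_risk(0.50, 0.45, 0.9))  # larger reward: compare 0.9 * 0.95 with 0.45
```

Rearranging the inequality in `take_risk` gives reward > (1-p)/p times the future fitness, so the cutoff rises with the amount of future fitness at stake.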

Expected fitness calculations

F_L+1|L = 0, because once you're dead, you have no more fitness. Then

F_L|L = max[F_L+1|L, p(r_L|L + F_L+1|L)] = max[0, p(r_L|L + 0)] = pr_L|L.

In other words, in your last day of life, you have nothing to lose by dying, so you may as well take a risk for the reward.

What if you're at t=L-1 looking forward to t=L? It's certain that you'll take the risk when you get to t=L, so the max operation will always pick out the second argument:

F_L|L-1 = p(r_L|L-1 + F_L+1|L-1) = pr_L|L-1.

The only difference from what we had before is that we need to use an expected value for the reward, r_L|L-1, because we don't yet know the actual value, r_L|L.

At t=L-1, the decision of whether to take a risk for the reward r_L-1|L-1 depends on whether

p(r_L-1|L-1 + pr_L|L-1) > pr_L|L-1, or
r_L-1|L-1 > (1-p)r_L|L-1.

I so far haven't specified the known distribution from which rewards are sampled, but I now need to in order to make things tractable from this point forward. Assume that rewards are uniform on the unit interval [0,1]. The expected value is thus 1/2, which means that r_a|b = 1/2 for any a > b, because until the time of the reward comes, you can only guess what it will be.

Now the condition for taking a risk becomes

r_L-1|L-1 > (1-p)/2.

Next consider F_L-1|L-2. We don't know what r_L-1|L-1 will turn out to be, so we can't say whether we'll take that risk at t=L-1. However, we do know that we'll take the risk iff r_L-1|L-1 > (1-p)/2, and since the reward at t=L-1 is uniform on [0,1], this has probability 1-(1-p)/2 = (1+p)/2. Then

F_L-1|L-2 = (chance choose to wait at t=L-1)(expected value if wait) + (chance choose to take a risk at t=L-1)(expected value if take the risk)
= [(1-p)/2]pr_L|L-2 + [(1+p)/2]p[(expected value of r_L-1|L-1 given that you took the risk) + pr_L|L-2].

Here we can substitute r_L|L-2=1/2. The expected value of r_L-1|L-1 given that you took the risk, is the expected value of r_L-1|L-1 given that r_L-1|L-1 > (1-p)/2. Since r_L-1|L-1 is uniform, this expected value is [1+(1-p)/2]/2 = (3-p)/4. Substituting in:

F_L-1|L-2 = [(1-p)/2]p(1/2) + [(1+p)/2]p[(3-p)/4 + p(1/2)]
= (p-p²)/4 + (3p+4p²+p³)/8
= (5p+2p²+p³)/8.

Continuing on to t=L-2, where waiting has expected value F_L-1|L-2 = (5p+2p²+p³)/8, we take the risk iff

p(r_L-2|L-2 + (5p+2p²+p³)/8) > (5p+2p²+p³)/8, or
r_L-2|L-2 > (5+2p+p²)/8 - (5p+2p²+p³)/8, or
r_L-2|L-2 > (5-3p-p²-p³)/8.

Then

F_L-2|L-3 = (chance choose to wait at t=L-2)(expected value if wait) + (chance choose to take a risk at t=L-2)(expected value if take the risk)
= [(5-3p-p²-p³)/8][(5p+2p²+p³)/8] + [1-(5-3p-p²-p³)/8]p[(1+(5-3p-p²-p³)/8)/2 + (5p+2p²+p³)/8]
= 89p/128 + 25p²/64 + 31p³/128 + 3p⁴/32 + 7p⁵/128 + p⁶/64 + p⁷/128.
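The hand calculations above follow one repeated pattern, which can be checked mechanically. The sketch below assumes uniform rewards on [0,1] and 0 < p <= 1. It uses exact fractions to confirm the closed forms for one, two and three remaining timesteps. The function name `expected_fitness` is hypothetical.

```python
from fractions import Fraction


def expected_fitness(steps_left: int, p: Fraction) -> Fraction:
    """Expected fitness with `steps_left` timesteps remaining, seen one step earlier.

    steps_left=1 corresponds to F_L|L-1, steps_left=2 to F_L-1|L-2, and so on.
    Assumes rewards uniform on [0, 1] and 0 < p <= 1.
    """
    fitness = Fraction(0)  # F_L+1|L = 0
    for _ in range(steps_left):
        # Take the risk iff r > cutoff, which rearranges p(r + F) > F.
        cutoff = min(Fraction(1), fitness * (1 - p) / p)
        wait = cutoff * fitness
        # The mean reward given r > cutoff is (1 + cutoff) / 2.
        risk = (1 - cutoff) * p * ((1 + cutoff) / 2 + fitness)
        fitness = wait + risk
    return fitness


for p in (Fraction(1), Fraction(19, 20), Fraction(3, 5)):
    two_steps = (5 * p + 2 * p**2 + p**3) / 8
    three_steps = (89 * p + 50 * p**2 + 31 * p**3 + 12 * p**4
                   + 7 * p**5 + 2 * p**6 + p**7) / 128
    print(p,
          expected_fitness(1, p) == p / 2,
          expected_fitness(2, p) == two_steps,
          expected_fitness(3, p) == three_steps)
```

Exact rational arithmetic is used so that the comparison with the polynomials is an equality test rather than a floating-point approximation.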

Summary so far


Variable	Formula	Value if p=1	Value if p=0.95
F_L|L-1	p/2	1/2	0.475
F_L-1|L-2	(5p+2p²+p³)/8	1	0.927
F_L-2|L-3	89p/128 + 25p²/64 + 31p³/128 + 3p⁴/32 + 7p⁵/128 + p⁶/64 + p⁷/128	3/2	1.356

We can see that, for p<1, expected fitness grows less than linearly as the number of remaining timesteps increases. This is because an organism with more potential life ahead of it needs to be more cautious about taking risks, doing so only when the potential reward is big enough to justify the chance of dying.
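To see the departure from linearity in numbers, one could evaluate the three formulas from the derivation and look at the gain from each extra timestep. This is only a sketch. The helper names `f1`, `f2` and `f3` are hypothetical labels for F_L|L-1, F_L-1|L-2 and F_L-2|L-3.

```python
def f1(p: float) -> float:
    """F_L|L-1: one timestep remaining."""
    return p / 2


def f2(p: float) -> float:
    """F_L-1|L-2: two timesteps remaining."""
    return (5 * p + 2 * p**2 + p**3) / 8


def f3(p: float) -> float:
    """F_L-2|L-3: three timesteps remaining."""
    return (89 * p + 50 * p**2 + 31 * p**3 + 12 * p**4
            + 7 * p**5 + 2 * p**6 + p**7) / 128


for p in (1.0, 0.95):
    values = [f1(p), f2(p), f3(p)]
    # Gain in expected fitness from each additional timestep of remaining life.
    gains = [values[0]] + [b - a for a, b in zip(values, values[1:])]
    print(p, [round(v, 3) for v in values], [round(g, 3) for g in gains])
```

With p=1 every extra timestep adds the same amount, while with p<1 each extra timestep adds a little less than the one before.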

Graphs

Our analytical formulas quickly became intractable, but the problem is actually easy to represent computationally. In an Excel workbook, I created a column of rewards drawn uniformly at random from [0,1] and then, working backwards from the end of the organism's life, applied equation (1) at each timestep.
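The same procedure can be sketched in Python instead of a spreadsheet. This is not the original workbook: the function name `simulate_run`, the lifespan of 100 timesteps and the random seed are assumptions made for illustration.

```python
import random


def simulate_run(lifespan: int, p: float, seed: int = 0) -> list:
    """Apply equation (1) backwards over one column of random rewards.

    Returns the fitness value at each timestep t = 1, ..., lifespan
    (stored at indexes 0, ..., lifespan - 1).
    """
    rng = random.Random(seed)
    rewards = [rng.random() for _ in range(lifespan)]  # uniform on [0, 1)
    fitness = [0.0] * (lifespan + 1)  # the last entry plays the role of F_L+1 = 0
    for t in range(lifespan - 1, -1, -1):
        wait = fitness[t + 1]
        risk = p * (rewards[t] + fitness[t + 1])
        fitness[t] = max(wait, risk)
    return fitness[:lifespan]


run = simulate_run(lifespan=100, p=0.9)
print(run[0], run[-1])  # fitness at the first and the last timestep
```

Because each entry is the maximum of the next entry and something else, the resulting curve never rises as t increases, and it stays level wherever waiting wins.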

Each of the following graphs shows a single run, based on one column of random numbers.
[Graph: p=0.9]
[Graph: p=0.99]
[Graph: p=0.6]
The flat regions are where the organism chooses not to take a risk because the reward at that point is too low.

These figures show only actual rewards for a particular run rather than expected rewards over all possible runs. The following graph shows an average of the future rewards over 10 different random columns.
[Graph: p=0.9, runs=10]
The graph is smoother because averaging over runs approximates expected values, but it's also noticeably concave because risk aversion grows as the organism has more potential future time ahead of it. In fact, we can establish an upper bound for how high the curve can get: p/(1-p). This is because even if you had an indefinite potential lifespan L, the best you could do would be to wait until a reward of (nearly) 1 came and then take the risk at that point. The number of rewards you'd win before dying follows a geometric distribution with mean p/(1-p).
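For the expected-value recursion from the "Expected fitness calculations" section (as opposed to the averaged spreadsheet runs), the concavity and the bound can also be seen algebraically. The following is a sketch under the uniform-reward assumption. The symbols F, F' and c are notation introduced here, not in the derivation above.

```latex
% F  = expected fitness from waiting (F_{t+1|t}); F' = expected fitness one timestep earlier.
% c  = reward cutoff above which the risk is taken. Rewards are uniform on [0,1].
\begin{align*}
c &= \frac{(1-p)\,F}{p} \\
F' &= cF + (1-c)\,p\left[\frac{1+c}{2} + F\right]
    = \frac{p}{2} + pF + \frac{(1-p)^2 F^2}{2p} \\
F' - F &= \frac{\bigl(p - (1-p)F\bigr)^2}{2p}
\end{align*}
```

The gain from one more timestep is never negative, shrinks as F grows, and would vanish only at F = p/(1-p). This is consistent with a concave curve that approaches the bound without exceeding it. Setting F = p/2 recovers (5p+2p²+p³)/8.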
