Include uncertainty in a financial model

On-line resources for analysts are often either too general to be of practical use or too specialised to be accessible. The aim of ragscripts.com is to remedy this by providing start to finish directions for complex analytical tasks. The site is under construction at the moment but any feedback is most welcome!

The problem

You’ve been asked to calculate some figure or other (e.g. end of year revenue, average customer lifetime value) based on numbers supplied from various parts of the business. You know how to make the calculation but what bothers you is that some of the figures going in are clearly approximations or at worst complete guesses. Therefore the final calculation will inherit this uncertainty. A decision will be made based on your final figure and more likely than not the decision makers will treat this number as a certainty. They won’t have any way of knowing how uncertain it is and may well draw the wrong conclusions. How do you get an understanding of the uncertainty around the final calculation and how do you explain it to the decision makers?

The standard approach in finance is to produce a set of best and worst case scenarios. These are tedious to produce and are often disregarded as unlikely outcomes. I think our approach is better as it gives a full picture of the uncertainty around an estimate.

What you’ll need

• Basic R programming skills (including writing functions)
• An understanding of probability distributions
• About 1-3 hours (depending on the complexity of the model)

Example

Let’s say you are asked to project the revenue of a business in one year’s time based on the following numbers:

• Customer base at the beginning of the year: 50,000
• Average revenue per customer: £7
• Monthly acquisition rate: 5%
• Monthly churn rate: 2%

You might have thought that these figures could be known with some certainty but believe me it’s actually very rare especially in large businesses where integrating systems to get an overall picture is a tough job. Add to this the fact that we are guessing at how well any current figures are going to hold out over the year and we should agree than there’s a fair bit of uncertainty in the inputs.

With this is mind you ask the suppliers of the data for a lower and upper bound for the figures, i.e. the lowest and highest numbers that they would expect to see without being very surprised. Where they are unable or unwilling to do this you use your common sense. This gives you the following table:

 Statistic Estimate Lower Bound Upper Bound ARPC £7 £4 £10 Acquisition Rate 5% 3% 7% Churn Rate 2% 1% 3%

Do it

Step 1: Setting up the distributions

There’s lots of tools you could do this with but R seems most appropriate so start up a session. We are going to model the our uncertainty about the input values in each case as a normal distribution. We will say that the mean of the distribution is the value the business has estimated and that the 5% and 95% quantiles are the upper bounds and lower bounds respectively.

Yes this is subjective (we could have chosen other values for the bounds) but the point of this exercise is not to make incredibly accurate statistical inferences rather it is to show the consequences that different levels of uncertainty in the inputs have on the output. You can play with these parameters if you wish and say things like if we were this accurate with our inputs then we would be this accurate with our output

Let’s start by calculating the parameters (mean, standard deviation) for the three normal distributions we’ll need.

Let $q_{95}$ be the 95th quantile. Then

$q_{95} = \mu + 1.645 \sigma$. Therefore $\sigma = \frac{q_{95}-\mu}{1.645}$

#Work out the standard deviation for arpu
arpu.sd<-3/1.645
#Plot the resulting distribution
x<-seq(0, 15,by=0.5)
d<-dnorm(x, 7, arpu.sd)
plot(x, d, type='l')


This should give a plot that meets our expectations and it does:

We’ll complete this step by doing the same for acquisition and churn.

#Do the same for acquisition and churn
acq.sd<-0.02/1.645
x<-seq(0, 0.2,by=0.001)
d<-dnorm(x, 0.05, acq.sd)
plot(x, d, type='l')
ch.sd<-0.01/1.645
x<-seq(0, 0.2,by=0.001)
d<-dnorm(x, 0.02, ch.sd)
plot(x, d, type='l')

Step 2: A function for calculating revenue

The next piece you need is a function for calculating the output (the statistic of interest) from the inputs. This is what would usually be captured in the formulae of an excel spreadsheet.

For our example it looks like this:

revenue<-function(arpu, acq, ch){
num.cust<-50000
for (m in 1:12){
num.cust<-num.cust+acq*num.cust-ch*num.cust
}
return(num.cust*arpu)
}

Step 3: Making random draws from our belief distributions

We will now generate 10 thousand values from our each of normally distributed input variables to create 10 thousand simulations. Each of these simulations represents a possible set of inputs to our model. For example one set might be

• Average revenue per customer: £7.50
• Monthly acquisition rate: 3%
• Monthly churn rate: 1%

However the way we’ve set things up unlikely values for each input are unlikely to be drawn from our distributions. In fact since 10 thousand is a pretty large number the frequency at which each of the possible values is drawn should fairly well reflect our initial beliefs.

Here is our code for simulating the input values:

#Now let's simulate 10k values from each of our distributions
sim.arpu<-rnorm(10000, 7, arpu.sd)
sim.acq<-rnorm(10000, 0.05, acq.sd)
sim.ch<-rnorm(10000, 0.02, ch.sd)


Note a couple of things we haven’t accounted for: we’ve assumed that the our beliefs about our input variables are independent of one another but that might not be the case. We might think for some reason that if churn is higher then acquisition will be lower. If so we want to draw our values from a multivariate distribution incorporating correlation between the variables. However assuming independence is an easy and useful place to start. Also we’ve assumed that our model itself is correct – i.e. that constant rates of churn and acquisition apply and that there are no other effects (for example seasonal or economic.)

Step 4: Running them through the function

Now everything is place and it’s a simple matter to apply our function to each of the 10 thousand simulated sets of input variables. We use R’s mapply function.

sim.rev<-mapply(revenue, sim.arpu, sim.acq, sim.ch)

Step 5: Examining the results

Since our simulated input variables were distributed in a way that reflects our beliefs about their likely values, our final revenue calculations, based on these simulated inputs, will also have a distribution that reflect these uncertainties. The next few lines of code show us this distribution.

summary(sim.rev)
hist(sim.rev)
plot(density(sim.rev))


The histogram of the possible values for annual revenue makes the point of doing all of this quite clear. A straight calculation based on the original input values gives a revenue value of 499k. However a quick glance at the histogram shows that revenues as low as 400k are not at all improbable. Any planning should account for this!

See it

Here’s a quick JSfiddle to see how inputs affect outputs in the example case.

Explain it

So how do you explain what you have done and why you have done it?

Chances are you may run into the analyst’s catch 22 (see skit). You’ve done a much better job because you’ve incorporated uncertainty into your forecast allowing the decision maker to make a more informed decision. However people like certainty and the very fact you have done this may be perceived as incompetence What do you mean you don’t know what the revenue figure is going to be – go back and do it again!

So either you compromise your own integrity by glossing over the uncertainty or you’re branded an idiot or at least a boffin who overcomplicates things!

But there are some ways around this. Try to find out what decisions are being made using the data. Some of the frustration will come from being handed something that doesn’t solve the problem and the decision maker may have a point: if the range of accuracy is not enough to make a decision the model may need to be improved in some way.

Also talking about the decision and the problem from the outset is a good way of driving home the value of what you are doing. The very purpose of showing the uncertainty in the estimate is to prevent misinformed decisions being made.

The decision in our example might be whether or not to we can invest in a new product now given that we still need to be in profit by the end of the year. Our model as it is shows that if all other outgoings come to 150k then we’d be taking a risk spending over 50k on the new product, (since there’s a reasonable chance that revenue could be as low as 200k). If 50k was what were planning then we’ve done the job. If however a new product can’t be done for less than 100k then it’s worth spending some time improving our model.

Fork it

Take my code and adapt it for your own models. You can find the gist here

Visualising Shrinkage

A useful property of mixed effects and Bayesian hierarchical models is that lower level estimates are shrunk towards the more stable estimates further up the hierarchy.

To use a time honoured example you might be modelling the effect of a new teaching method on performance at the classroom level. Classes of 30 or so students are probably too small a sample to get useful results. In a hierarchical model the data are pooled so that all classes in a school are modelled together as a hierarchy and even all schools in a district.

At each level in this hierarchy an estimate for the efficiency of the teaching method is obtained. You will get an estimate for the school as a whole and for the district. You will even get estimates for the individual classes. These estimates will be weighted averages of the estimates for the class and the estimate for the school (which in turn is a weighted average of the estimate for the school and the district.) The clever part is that this weighting is itself determined by the data. Where a class is an outlier, and therefore the overall school average is less relevant, the estimate will be weighted towards the class. Where it is typical it will be weighted towards the school. This property is known as shrinkage.

I’m often interested in how much shrinkage is affecting my estimates and I want to see it. I’ve created this plot which I find useful. It’s done in R using ggplot and is very simple to code.

The idea is that the non shrunk estimates bi (i.e. the estimates that would be obtained by modelling classes individually) are plotted on along the line x=y at the points (bi, bi). The estimates they are being shrunk towards ai are plotted at the points (bi, ai). Finally we plot the shrunk estimates si at (bi, si) and connect the points with an arrow to illustrate the direction of the shrinkage.

Here is an example. You can see the extent of the shrinkage by the the distance covered by the arrow towards the higher level estimate.

Note the arrows do sometimes point away from the higher level estimate. This is because this data is for a single coefficient in a hierarchical regression model with multiple coefficients. Where other coefficients have been stabilized by shrinkage this causes this particular coefficient to be revised.

The R code is as follows:

# *--------------------------------------------------------------------
# | FUNCTION: shrinkPlot
# | Function for visualising shrinkage in hierarchical models
# *--------------------------------------------------------------------
# | Version |Date      |Programmer  |Details of Change
# |     01  |31/08/2013|Simon Raper |first version.
# *--------------------------------------------------------------------
# | INPUTS:  orig      Estimates obtained from individual level
# |                    modelling
# |          shrink    Estimates obtained from hierarchical modelling
# |          prior     Priors in Bayesian model or fixed effects in
# |                    mixed effects model (i.e. what it is shrinking
# |                    towards.
# |          window    Limits for the plot (as a vector)
# |
# *--------------------------------------------------------------------
# | OUTPUTS: A ggplot object
# *--------------------------------------------------------------------
# | DEPENDS: grid, ggplot2
# |
# *--------------------------------------------------------------------
library(ggplot)
library(grid)
shrinkPlot&lt;-function(orig, shrink, prior, window=NULL){
group&lt;-factor(signif(prior,3))
data&lt;-data.frame(orig, shrink, prior, group)
g&lt;-ggplot(data=data, aes(x=orig, xend=orig, y=orig, yend=shrink, col=group))
g2&lt;-g+geom_segment(arrow = arrow(length = unit(0.3, &quot;cm&quot;))) +geom_point(data=comp.in, aes(x=coef, y=mean))
g3&lt;-g2+xlab(&quot;Estimate&quot;)+ylab(&quot;Shrinkage&quot;)+ ggtitle(&quot;Shrinkage Plot&quot;)
if (is.null(window)==FALSE){
g3&lt;-g3+ylim(window)+xlim(window)
}
print(g3)
}


What are degrees of freedom?

I remember getting frustrated as an undergraduate trying to find straight answer to this question.

The standard text book answer is something like this:

"In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary"


That’s from Wikipedia but it’s fairly typical.

I could just about make sense of this for something like a chi-squared statistic but why, could someone explain to me, are the degrees of freedom for linear regression n-k-1?

I realise this doesn’t keep many people awake but it did me so I was pleased to find the following quote:

"The person who is unfamiliar with N-dimensional geometry or who knows the contributions to modern sampling theory
only from secondhand sources such as textbooks, this concept often seems almost mystical, with no practical meaning."
Walker, 1940


Many years later I’m nasty enough to use it as an interview question. In a kinder frame of mind I thought I’d post my slightly XKCD inspired notes for explaining, as simply as I can, the concept in terms of N-dimensional geometry.

I hope you can read my writing and apologies to mathematicians if the language is a bit loose!