July 29, 2015

A post by Duncan Stoddard. It first appeared among lots of other great posts at www.sparepiece.com

Bayesian analysis is perfect in situations when data are scarce and experience and intuition are plentiful. We can combine available data with the collective intelligence of a team in the form of parameter priors in a Bayesian model to test hypotheses and inform strategic decision making.

Let’s imagine we want to build a model that explains the drivers of a binary dependent variable, e.g. ‘prospect buys or doesn’t buy product x’, in order to test hypotheses about what makes a good prospective client and to score prospective client lists. We’ve got some descriptive data on past acquisitions, e.g. prospect geography, age, industry and some behavioural data, e.g. ‘exposed to ad A’, ‘responded to email campaign B’ etc. We’ve also got the collective knowledge of a client services team. We’re going to combine these in a Bayesian logistic regression model using the MCMCpack package in R.

1. Generating coefficient priors

First we need to get each member of the team to (independently) place weights on a list of candidate explanatory variables. To keep this simple we’ll allow integer values on [-10,10] and then convert these to usable priors.

The coefficient in a logistic regression can be interpreted as the log of an odds ratio: the factor by which the odds of y=1 change relative to a base value, which we take to be the odds at the overall response rate. We can map our weights to a comparable value with the following logic:

Weight ω to probability p:

$p = \frac{\omega+10}{20}$

The odds:

$\frac{p}{1-p} = \frac{\omega+10}{10-\omega}$

The base probability:

$p_{0} = \frac{1}{n}\sum_{i=1}^{n}y_{i}$

Weight to prior coefficient:

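$\beta = \ln \left ( \frac{\frac{p}{1-p}}{\frac{p_{0}}{1-p_{0}}} \right ) = \ln\left (\frac{(\omega+10)(1-p_{0})}{p_{0}(10-\omega)} \right)$

Note that ω = 10 and ω = -10 give an infinite coefficient, so the endpoints have to be excluded or capped just inside the interval before the formula is applied.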
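The mapping above can be sketched in a few lines. The snippet is in Python purely to illustrate the arithmetic (the modelling itself is done in R); the function name `weight_to_beta` and the example base rate of 0.2 are assumptions made for illustration.

```python
import math


def weight_to_beta(weight, base_rate):
    """Convert a team weight on (-10, 10) into a prior coefficient.

    base_rate is the overall response rate p0, strictly between 0 and 1.
    """
    # The endpoints -10 and 10 would give an infinite coefficient.
    if not -10 < weight < 10:
        raise ValueError("weight must lie strictly between -10 and 10")
    if not 0 < base_rate < 1:
        raise ValueError("base_rate must lie strictly between 0 and 1")
    p = (weight + 10) / 20                    # weight -> probability
    odds = p / (1 - p)                        # probability -> odds
    base_odds = base_rate / (1 - base_rate)   # odds at the base rate
    return math.log(odds / base_odds)         # log of the odds ratio


# Hypothetical example: base response rate of 20%.
for w in (-6, 0, 6):
    print(w, weight_to_beta(w, 0.2))
```

With a base rate of 0.2, a weight of 6 maps to ln(16) ≈ 2.77, and a weight of 0 maps to ln(4) ≈ 1.39 rather than to zero, because the mapping is measured against the base odds.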

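Once we’ve applied this formula to each team member’s weights, we can calculate a prior mean and a prior precision (the inverse of the variance) for each coefficient across the team. These will then be read into R and used as the b0 and B0 arguments of the MCMClogit function. The b0 argument is the vector of prior means, of length k+1, where k is the number of variables in the model and the extra element is the intercept. B0 is the prior precision matrix, a (k+1)×(k+1) square matrix that is diagonal if we assume independence between coefficient distributions.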
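One way to turn the converted weights into prior means and precisions is sketched below in Python. It is an illustration under stated assumptions, not a prescribed method: the precision is taken as the inverse of the sample variance across team members, the intercept gets a weak prior (mean 0, precision 0.01), and all names and numbers are hypothetical.

```python
import statistics


def prior_mean_and_precision(betas):
    """Summarise one variable's converted weights as (mean, precision).

    Needs at least two team members, since it uses the sample variance.
    """
    mean = statistics.mean(betas)
    variance = statistics.variance(betas)  # sample variance across the team
    if variance == 0:
        raise ValueError("identical values would give an infinite precision")
    return mean, 1 / variance


def diagonal_matrix(values):
    """Build a square matrix with `values` on the diagonal, zeros elsewhere."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


# Hypothetical converted weights from three team members for two variables.
team_betas = {
    "exposed_to_ad_A": [1.0, 2.0, 3.0],
    "responded_email_B": [0.5, 1.0, 1.5],
}
summaries = [prior_mean_and_precision(b) for b in team_betas.values()]

# Assumption: weak prior on the intercept (mean 0, precision 0.01).
b0 = [0.0] + [mean for mean, _ in summaries]
B0 = diagonal_matrix([0.01] + [precision for _, precision in summaries])
```

The more the team disagrees about a variable, the larger the variance and the smaller the precision, so the data get more say in that coefficient's posterior.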

2. Normalising the data

The weighting process implicitly assumes that all variables are on the same scale, so we need to normalise the data. Min-max scaling maps each variable onto [0,1]:

${x_i}'=\frac{x_i - min(x)}{max(x) - min(x)}$
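As a minimal Python sketch of this rescaling (the function name and the example ages are hypothetical), applied to one column at a time:

```python
def min_max_normalise(values):
    """Rescale a list of numbers onto [0, 1] using its own min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information and cannot be rescaled.
        raise ValueError("cannot normalise a column with a single value")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical prospect ages.
ages = [20, 35, 50]
print(min_max_normalise(ages))
```

Binary variables such as ‘exposed to ad A’ are already on [0,1] and pass through unchanged.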

3. Running the model

The MCMClogit function, called with a formula, the data and the named arguments b0 and B0, returns posterior samples for each beta, assuming a multivariate normal prior. If we store the result as posterior, we can use summary(posterior) to get the posterior mean and standard deviation values, and plot(posterior) to get trace and density plots.
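To make the role of the prior mean and precision concrete, the Python sketch below computes the unnormalised log posterior that a sampler draws from under these assumptions: a Bernoulli likelihood with a logit link plus an independent normal prior on each coefficient. It is not MCMCpack's code; the function name is hypothetical, `prior_precision` holds only the diagonal of B0, and each data row is assumed to start with a 1 for the intercept.

```python
import math


def log_posterior(beta, rows, outcomes, prior_mean, prior_precision):
    """Unnormalised log posterior for logistic regression.

    rows: feature vectors, each starting with 1 for the intercept.
    outcomes: 0/1 responses, one per row.
    prior_precision: diagonal entries of the prior precision matrix.
    """
    log_likelihood = 0.0
    for x, y in zip(rows, outcomes):
        eta = sum(b * xi for b, xi in zip(beta, x))
        # log(1 + exp(eta)), written to avoid overflow for large |eta|
        softplus = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        log_likelihood += y * eta - softplus
    # Independent normal priors: a quadratic penalty around the prior mean,
    # scaled by the precision.
    log_prior = -0.5 * sum(
        prec * (b - m) ** 2
        for b, m, prec in zip(beta, prior_mean, prior_precision)
    )
    return log_likelihood + log_prior


# Hypothetical toy data: two prospects, one binary variable.
rows = [[1, 0], [1, 1]]
outcomes = [1, 0]
print(log_posterior([0.0, 0.0], rows, outcomes, [0.0, 0.0], [1.0, 1.0]))
```

A larger precision makes the penalty for straying from the prior mean steeper, which is why confident, consistent team weights pull the posterior harder than scattered ones.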

The data update our prior knowledge: the more evidence the data carry about a coefficient, the further its posterior moves away from the team’s prior and towards the value the data support.

Our posterior means will serve two purposes:

a) parameterise a scoring equation for predicting response rates conditional on prospect characteristics.

b) provide insights on the characteristics of good prospective clients and test decision makers’ hypotheses.

4. Building a scoring equation

Summing the products of our posterior means and the normalised data, and adding the intercept, gives the log odds of a response. We can convert this into a response probability like this:

$\ln\frac{s}{1-s} = \hat{\beta}_0 + \sum_{i=1}^{k}\hat{\beta}_i x_i$

$s = \frac{e^{\hat{\beta}_0 + \sum_{i=1}^{k}\hat{\beta}_i x_i}}{1+e^{\hat{\beta}_0 + \sum_{i=1}^{k}\hat{\beta}_i x_i}}$

s represents the expected response probability given a vector of characteristics x. We can apply this to our prospect lists and rank prospects by s to find the best targets for marketing.
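A minimal Python sketch of the scoring and ranking step follows. It assumes the first posterior mean is the intercept and the rest line up with the normalised features; the coefficient values and prospect names are made up for illustration.

```python
import math


def response_probability(posterior_means, features):
    """Expected response probability for one prospect.

    posterior_means[0] is the intercept; the remaining entries pair up
    with the prospect's normalised feature values.
    """
    log_odds = posterior_means[0] + sum(
        b * x for b, x in zip(posterior_means[1:], features)
    )
    return 1 / (1 + math.exp(-log_odds))  # inverse of the logit


# Hypothetical posterior means: intercept, then two variables.
posterior_means = [-1.0, 2.0, 0.5]
prospects = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.5, 0.5]}

scores = {name: response_probability(posterior_means, x)
          for name, x in prospects.items()}
ranked = sorted(scores, key=scores.get, reverse=True)  # best targets first
print(ranked)
```

Because the logistic function is monotonic, ranking by probability gives the same order as ranking by log odds; the conversion matters when the scores are reported as expected response rates.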

5. Converting posterior means back to scores

Lastly, we want to convert our posterior mean estimates back to integer scores on [-10,10] so we can report results to the team in the form they’re already familiar with. Inverting the weight-to-coefficient formula from step 1 and rounding to the nearest integer gives:

$\omega = \frac{10\left ( p_{0}e^{\hat{\beta }} - (1-p_{0}) \right )}{p_{0}e^{\hat{\beta }} + (1-p_{0})}$
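The back-conversion can be sketched in Python by inverting the weight-to-coefficient mapping from step 1, so that a coefficient produced from a weight ω at base rate p0 maps back to ω. The function name and the base rate of 0.2 are assumptions for illustration.

```python
import math


def beta_to_weight(beta, base_rate):
    """Map a posterior mean coefficient back to an integer score.

    Inverts beta = ln(odds / base_odds) with odds = (w + 10) / (10 - w).
    """
    if not 0 < base_rate < 1:
        raise ValueError("base_rate must lie strictly between 0 and 1")
    base_odds = base_rate / (1 - base_rate)
    odds = math.exp(beta) * base_odds      # undo the log and the base odds
    weight = 10 * (odds - 1) / (odds + 1)  # solve odds = (w + 10) / (10 - w)
    return round(weight)


# Hypothetical posterior means at a base response rate of 20%.
for beta in (math.log(16), math.log(4), 0.0):
    print(beta_to_weight(beta, 0.2))
```

The result always falls inside [-10,10], since the expression tends to 10 as the coefficient grows and to -10 as it becomes very negative.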