A valuable tool in choice modelling is the Dirichlet-multinomial distribution. It’s a compound of the multinomial and Dirichlet distributions and it works like this:

- A choice between N options is modelled as a multinomial distribution with parameters θ_{1}, θ_{2}, θ_{3}, … θ_{N}, where the thetas represent the probabilities of each option being chosen. For example, we might model votes cast in an election as draws from a multinomial distribution with parameters θ_{1}=0.7, θ_{2}=0.2, θ_{3}=0.1.
- However, the multinomial distribution by itself is likely to be a poor model of the choices made within a population, as it assumes all individuals select options with the same probabilities. It would be more realistic to say that the thetas themselves vary over the population. This gives us what is known as an over-dispersed distribution: the parameters for one distribution are modelled by another distribution. In this case we use a Dirichlet distribution, which is the multivariate version of the Beta distribution, to model the distribution of the thetas.

As we’ll be using it a lot, here’s the probability density function for the Dirichlet distribution:

$$f(\theta_1, \ldots, \theta_N; \alpha_1, \ldots, \alpha_N) = \frac{1}{B(\boldsymbol{\alpha})} \prod_{j=1}^{N} \theta_j^{\alpha_j - 1}$$

Where the normalising constant is:

$$B(\boldsymbol{\alpha}) = \frac{\prod_{j=1}^{N} \Gamma(\alpha_j)}{\Gamma\left(\sum_{j=1}^{N} \alpha_j\right)}$$

One of the powerful things about the Dirichlet distribution as a modelling tool is that it allows us to capture not only the proportions of the population that opt for each choice but also the amount of switching between the choices from one draw to the next. To take our election example again, an election result of 70%, 20%, 10% for three parties could be modelled by a Dirichlet distribution with alphas of 0.1, 0.029 and 0.014, or with alphas of 20, 5.71 and 2.86. In fact there are infinitely many possible settings of the alpha parameters that will produce this result. The difference between them is stability. If two successive elections produce the same result then this could be because the same people are voting for the same parties or, less likely but equally consistent with the data, because people are switching their votes in such a way that the net result is the same. Different settings of the alpha parameters produce different levels of switching.
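To make this concrete, here’s a quick simulation sketch (assuming the `rdirichlet` function from the `gtools` package): both parametrisations above give the same expected vote shares, but the individual-level probabilities are spread very differently across the population.

```r
library(gtools)  # provides rdirichlet

set.seed(1)
# Two parametrisations consistent with a 70/20/10 result
extreme <- rdirichlet(100000, c(0.1, 0.029, 0.014))  # small alphas
central <- rdirichlet(100000, c(20, 5.71, 2.86))     # large alphas

# Both have (approximately) the same expected proportions: 0.70 0.20 0.10
round(colMeans(extreme), 2)
round(colMeans(central), 2)

# ...but the small-alpha version pushes individuals towards near-certain
# preferences, while the large-alpha version clusters everyone near the mean
round(sd(extreme[, 1]), 2)
round(sd(central[, 1]), 2)
```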

A natural question is then: given a particular parametrisation of the Dirichlet distribution, what is the expected percentage of individuals that will switch category from one draw of the multinomial distribution to another?

I’m sure this has been worked out before somewhere but after a quick and fruitless trawl through the online literature I decided to do it myself, helped a lot by a great post from **Leo Alekseyev** who demonstrates a clever way of integrating over Dirichlet distributions. All I’ve done is adapt his technique.

(By the way, to convert the LaTeX in this post into a form easily used in WordPress, I used an excellent Python package from Luca Trevisan.)

So let’s say we have *N* choices. For individual *i* the probability of picking choice *j* is θ_{ij}. What then is the probability that a randomly selected individual will make the same choice in two successive draws from a multinomial distribution? The individual could either select the first option twice, or the second option twice, or the third option twice, etc. In other words the probability we are interested in is

$$L = \sum_{j=1}^{N} \theta_j^2$$

We will call L the loyalty and work out expected switching as 1 − E[L].
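As a quick sanity check on this definition: a hypothetical individual with fixed probabilities (0.7, 0.2, 0.1) would have loyalty 0.49 + 0.04 + 0.01 = 0.54. In R (the helper name `loyalty` is just for illustration):

```r
# Loyalty: probability of making the same choice in two successive draws
loyalty <- function(theta) sum(theta^2)

loyalty(c(0.7, 0.2, 0.1))  # 0.54
```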

Leo Alekseyev has created an excellent video on YouTube talking through his technique. I would recommend watching it if you would like to follow the arguments below. If you’re just interested in the end result then scroll to the end of the post.

The quantity we are interested in is E[L], where the θ_{j} come from a Dirichlet distribution with parameters α_{1}, α_{2}, … α_{N}. To get the expected value of L we can use a generalised version of the Law of the Unconscious Statistician to adapt the proof given by Leo Alekseyev. As a reminder, the Law of the Unconscious Statistician is:

$$E[g(X)] = \int g(x) \, f_X(x) \, dx$$

where f_X(x) is the probability density function of the random variable X.

This will give us the following integral:

$$E[L] = \frac{1}{B(\boldsymbol{\alpha})} \int_D \left( \sum_{j=1}^{N} \theta_j^2 \right) \prod_{k=1}^{N} \theta_k^{\alpha_k - 1} \, d\theta_1 \cdots d\theta_N$$

So how do we evaluate this integral? Its domain D is not straightforward as it is constrained by θ_{1} + θ_{2} + … + θ_{N} = 1 (i.e. the probabilities must sum to one).

Leo Alekseyev shows us a trick using the Dirac delta function:

$$\delta(x) = \lim_{\sigma \to 0} \frac{1}{\sigma\sqrt{2\pi}} \, e^{-x^2 / (2\sigma^2)}$$

This (generalised) function, the limit of increasingly concentrated distributions, has an area of one beneath the curve (if you can call it that) at x=0 and an area of zero everywhere else – a slightly odd concept, sometimes thought of as an infinitely tall spike above the origin. The helpful thing for us is that if we set x = 1 − (θ_{1} + θ_{2} + … + θ_{N}) and multiply the contents of the integral by the delta function, then integrating each θ_{j} over [0, ∞) is equivalent to evaluating the integral over D.

Since…

$$\int_D f(\boldsymbol{\theta}) \, d\boldsymbol{\theta} = \int_0^\infty \!\cdots\! \int_0^\infty f(\boldsymbol{\theta}) \, \delta\!\left(1 - \sum_{j=1}^{N} \theta_j\right) d\theta_1 \cdots d\theta_N$$

… the integral can be rewritten as:

$$E[L] = \frac{1}{B(\boldsymbol{\alpha})} \int_0^\infty \!\cdots\! \int_0^\infty \left( \sum_{j=1}^{N} \theta_j^2 \right) \prod_{k=1}^{N} \theta_k^{\alpha_k - 1} \, \delta\!\left(1 - \sum_{m=1}^{N} \theta_m\right) d\theta_1 \cdots d\theta_N$$

If we group together like terms we reach:

$$E[L] = \frac{1}{B(\boldsymbol{\alpha})} \sum_{j=1}^{N} \int_0^\infty \!\cdots\! \int_0^\infty \theta_j^{(\alpha_j + 2) - 1} \prod_{k \neq j} \theta_k^{\alpha_k - 1} \, \delta\!\left(1 - \sum_{m=1}^{N} \theta_m\right) d\theta_1 \cdots d\theta_N$$

Note the pattern in the exponents! Each term looks like a Dirichlet integral in which α_{j} has been increased by two.

Continuing along the lines described by Leo Alekseyev, we substitute a variable t for the constant 1 inside the delta function, defining

$$I(t) = \sum_{j=1}^{N} \int_0^\infty \!\cdots\! \int_0^\infty \theta_j^{(\alpha_j + 2) - 1} \prod_{k \neq j} \theta_k^{\alpha_k - 1} \, \delta\!\left(t - \sum_{m=1}^{N} \theta_m\right) d\theta_1 \cdots d\theta_N$$

to set us up for the Laplace transform, and will evaluate the result at t = 1 after applying the inverse Laplace transform.

As a reminder, the Laplace transform is:

$$\mathcal{L}\{f(t)\}(s) = \int_0^\infty f(t) \, e^{-st} \, dt$$

Applying it to I(t), the delta function makes the integral over t trivial, and because e^{−s(θ_{1}+…+θ_{N})} factorises, the multidimensional integral breaks into a product of one-dimensional integrals:

$$\mathcal{L}\{I(t)\}(s) = \sum_{j=1}^{N} \left( \int_0^\infty \theta_j^{(\alpha_j + 2) - 1} e^{-s\theta_j} \, d\theta_j \right) \prod_{k \neq j} \int_0^\infty \theta_k^{\alpha_k - 1} e^{-s\theta_k} \, d\theta_k$$

Each factor is the Laplace transform of a power, which evaluates as:

$$\int_0^\infty \theta^{a - 1} e^{-s\theta} \, d\theta = \frac{\Gamma(a)}{s^a}$$

Which we can substitute back into our expression:

$$\mathcal{L}\{I(t)\}(s) = \sum_{j=1}^{N} \frac{\Gamma(\alpha_j + 2)}{s^{\alpha_j + 2}} \prod_{k \neq j} \frac{\Gamma(\alpha_k)}{s^{\alpha_k}}$$

Since the powers of s in each term add up to α_{0} + 2, where α_{0} = α_{1} + α_{2} + … + α_{N}, we get:

$$\mathcal{L}\{I(t)\}(s) = \frac{1}{s^{\alpha_0 + 2}} \sum_{j=1}^{N} \Gamma(\alpha_j + 2) \prod_{k \neq j} \Gamma(\alpha_k)$$
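As an aside, the identity used here, ∫_0^∞ θ^{a−1} e^{−sθ} dθ = Γ(a)/s^a, is easy to sanity-check numerically in R with the built-in `integrate` function (the values of a and s below are arbitrary examples):

```r
# Check  integral of theta^(a-1) * exp(-s*theta)  over (0, Inf)  against Gamma(a)/s^a
a <- 2.5
s <- 1.7
numeric.val <- integrate(function(theta) theta^(a - 1) * exp(-s * theta), 0, Inf)$value
closed.form <- gamma(a) / s^a
c(numeric.val, closed.form)  # the two values should agree
```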

Some rearranging, using Γ(α_{j} + 2) = α_{j}(α_{j} + 1) Γ(α_{j}), gives us:

$$\mathcal{L}\{I(t)\}(s) = \frac{1}{s^{\alpha_0 + 2}} \left( \prod_{k=1}^{N} \Gamma(\alpha_k) \right) \sum_{j=1}^{N} \alpha_j (\alpha_j + 1)$$

Now we use the inverse Laplace transform to evaluate s^{−(α_{0}+2)} as t^{α_{0}+1} / Γ(α_{0} + 2), giving us:

$$I(t) = \frac{t^{\alpha_0 + 1}}{\Gamma(\alpha_0 + 2)} \left( \prod_{k=1}^{N} \Gamma(\alpha_k) \right) \sum_{j=1}^{N} \alpha_j (\alpha_j + 1)$$

Bringing the normalisation constant back in, setting t = 1 and cancelling out, we get

$$E[L] = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_0 + 2)} \sum_{j=1}^{N} \alpha_j (\alpha_j + 1) = \frac{\sum_{j=1}^{N} \alpha_j (\alpha_j + 1)}{\alpha_0 (\alpha_0 + 1)}$$

which is our expected loyalty E[L]; the expected switching is 1 − E[L].
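To see what the formula says about the election example from earlier (the function name `expected.switching` is mine, just for illustration): the two parametrisations give the same 70/20/10 result but very different levels of switching.

```r
# Expected switching 1 - E[L] from the derivation above
expected.switching <- function(alphas) {
  a0 <- sum(alphas)
  1 - sum(alphas * (alphas + 1)) / (a0 * (a0 + 1))
}

expected.switching(c(0.1, 0.029, 0.014))  # about 0.06 - a very stable electorate
expected.switching(c(20, 5.71, 2.86))     # about 0.44 - heavy switching
```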

All we need to do now is check it’s right by comparing it with a simulated result in R:

    library(gtools)  # for rdirichlet

    comp <- NULL
    for (i in 1:100) {
      alphas <- runif(3, 0.01, 10)

      # Simulate: draw individual probabilities, then two choices per individual
      sample.d1 <- rdirichlet(10000, alphas)
      purchases <- t(mapply(function(x, y, z) rmultinom(1, size = 2, prob = c(x, y, z)),
                            sample.d1[, 1], sample.d1[, 2], sample.d1[, 3]))

      # An individual is loyal if both draws land in the same category
      loyal <- sum(purchases[, 1] == 2) + sum(purchases[, 2] == 2) + sum(purchases[, 3] == 2)
      switching.sim <- (10000 - loyal) / 10000

      # Derived: 1 - E[L] from the formula above
      switching.dir <- 1 - (sum(alphas * (alphas + 1)) / (sum(alphas) * (1 + sum(alphas))))

      comp <- rbind(comp, c(switching.sim, switching.dir))
    }
    colnames(comp) <- c("Simulated", "Derived")
    plot(comp)
    x <- seq(0, 1, 0.1)
    lines(x, x)

A plot of the simulated against the derived results shows that, as we would hope, they are approximately equal (the line is x=y).