June 16, 2014

# Distribution for the difference between two binomially distributed random variables

I was doing some simulation and needed the distribution of the difference between two binomial counts (equivalently, between two proportions). It’s not quite as straightforward as the difference between two normally distributed variables, and since there wasn’t much online on the subject I thought it might be useful to share.

$X \sim Bin(n_1, p_1)$

$Y \sim Bin(n_2, p_2)$

We are looking for the probability mass function of $Z=X-Y$, assuming $X$ and $Y$ are independent.

First note that the support of $Z$ is the set of integers from $-n_2$ to $n_1$, since the most extreme cases are ($X=0$ and $Y=n_2$) and ($X=n_1$ and $Y=0$).

Then we need a modification of the binomial pmf so that it can cope with values outside of its support.

$m(k, n, p) = \binom {n} {k} p^k (1-p)^{n-k}$ when $0 \leq k \leq n$ and 0 otherwise.
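
As a minimal sketch of this zero-padded pmf, here is one way it could be written in Python using only the standard library. The function name `m` simply mirrors the notation above, and the parameter values in the two calls are arbitrary illustrations.

```python
from math import comb


def m(k, n, p):
    """Binomial pmf that returns 0 for any k outside the support 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


print(m(2, 4, 0.5))  # inside the support: 0.375
print(m(5, 4, 0.5))  # outside the support: 0.0
```

The guard is what lets the sums over $i$ run past the end of the support without any special handling.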

Then we need to define two cases:

1. $Z \geq 0$
2. $Z < 0$

In the first case $p(z) = \sum_{i=0}^{n_1} m(i+z, n_1, p_1) m(i, n_2, p_2)$, since this covers all the ways in which $X-Y$ could equal $z$. For example, when $z=1$ this is reached when $X=1$ and $Y=0$, $X=2$ and $Y=1$, $X=3$ and $Y=2$, and so on. It also deals with cases that could not happen because of the values of $n_1$ and $n_2$. For example, if $n_2 = 4$ then we cannot get $Z=1$ as a combination of $X=6$ and $Y=5$. In this case, thanks to our modified binomial pmf, the probability is zero.

For the second case we just reverse the roles. For example, if $z=-1$ then this is reached when $X=0$ and $Y=1$, $X=1$ and $Y=2$, etc.

$p(z) = \sum_{i=0}^{n_2} m(i, n_1, p_1) m(i-z, n_2, p_2)$

Put them together and that's your pmf.
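
The two cases can be sketched in Python and checked against a brute-force enumeration. This is an illustration rather than the definitive implementation: the names `diff_pmf` and `brute_force_pmf` and the parameter values are my own, and $X$ and $Y$ are assumed independent. For negative $z$ the sketch pairs $X=i$ with $Y=i-z$, which matches the example of $z=-1$ being reached by $X=0$ and $Y=1$.

```python
from itertools import product
from math import comb


def m(k, n, p):
    """Binomial pmf that returns 0 for any k outside the support 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def diff_pmf(z, n1, p1, n2, p2):
    """P(X - Y = z) for independent X ~ Bin(n1, p1) and Y ~ Bin(n2, p2)."""
    if z >= 0:
        # Pair X = i + z with Y = i.
        return sum(m(i + z, n1, p1) * m(i, n2, p2) for i in range(n1 + 1))
    # Pair X = i with Y = i - z (here -z is positive).
    return sum(m(i, n1, p1) * m(i - z, n2, p2) for i in range(n2 + 1))


def brute_force_pmf(z, n1, p1, n2, p2):
    """Same probability, found by enumerating every (x, y) with x - y = z."""
    return sum(
        m(x, n1, p1) * m(y, n2, p2)
        for x, y in product(range(n1 + 1), range(n2 + 1))
        if x - y == z
    )


# Arbitrary illustrative parameters (assumed, not from the post).
n1, p1, n2, p2 = 6, 0.3, 4, 0.7
support = range(-n2, n1 + 1)

total = sum(diff_pmf(z, n1, p1, n2, p2) for z in support)
max_gap = max(
    abs(diff_pmf(z, n1, p1, n2, p2) - brute_force_pmf(z, n1, p1, n2, p2))
    for z in support
)
print(f"total probability over the support: {total:.12f}")
print(f"largest gap versus brute force: {max_gap:.2e}")
```

If the pmf is right, the probabilities over $-n_2, \dots, n_1$ should sum to 1 up to floating-point error, and the gap against the enumeration should be negligible.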

Here’s the function in R and a simulation to check it’s right (and it does work.)