In this chart (detail above, full version below), I’ve tried to capture the decision process I go through to select the most promising statistical or machine learning technique for a given problem and data set.
It’s a heuristic in the sense given in Wikipedia:
A heuristic technique, often called simply a heuristic, is any approach to problem solving, learning, or discovery that employs a practical methodology not guaranteed to be optimal or perfect, but sufficient for the immediate goals. Where finding an optimal solution is impossible or impractical, heuristic methods can be used to speed up the process of finding a satisfactory solution. (Wikipedia)
It certainly isn’t perfect, but it is practical! In particular, it’s worth bearing in mind that:
- It does not cover the tests you’d need to run to establish whether a technique is being applied correctly. Also, where a technique is sophisticated, I’d probably start with something simpler and then work towards the more complex technique.
- There are, of course, many other techniques available, but these are the ones I use a lot.
- Some personal preferences are also built in. For example, I tend to go for a Bayesian model whenever the problem does not call for a model using a linear combination of explanatory variables, as I find it easier to think about the more unusual cases in this way.
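To make the "start with something simpler" point from the first bullet concrete, here is a minimal sketch in Python. The data and all function names are made up for illustration and are not taken from the chart. It fits the simplest possible model (always predict the mean) and then a one-variable least-squares line, so the more complex model has a baseline to beat.

```python
from statistics import mean

def fit_mean_baseline(ys):
    """Simplest model: always predict the mean of the observed targets."""
    m = mean(ys)
    return lambda x: m

def fit_simple_linear(xs, ys):
    """Ordinary least squares with one explanatory variable (closed form)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and targets."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, invented purely for this illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

baseline = fit_mean_baseline(ys)
linear = fit_simple_linear(xs, ys)

# Errors are measured on the same data the models were fitted to, which keeps
# the sketch short; a real comparison would use held-out data.
baseline_mse = mean_squared_error(baseline, xs, ys)
linear_mse = mean_squared_error(linear, xs, ys)
print(f"baseline MSE: {baseline_mse:.3f}, linear MSE: {linear_mse:.3f}")
```

If the more complex model does not clearly beat the baseline, that is usually a sign to stop and look at the data again before reaching for anything more sophisticated.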
This diagram was made with the fantastic draw.io. Click into it for the full version.