What is a data science peer review?
Peer review is a mainstay of academic science. Before findings are published, they must pass a panel of peers who check the work for errors and faulty reasoning. This makes perfect sense: we are all fallible, and research methods are only getting more complicated, especially where statistics is involved.
Yet in business, many millions are spent on data science and statistical modelling projects without even a second opinion, let alone a third or fourth. This can’t be good for business. Statistical reasoning is notoriously tricky, providing ample opportunity for practitioners to be misled by false positives: results that appear to be interesting but are really no more than random patterns in the data. The repercussions can be costly: a wrong business decision, the launch of a poorly performing algorithm or the continuation of an expensive project that is unlikely to deliver.
We’d like to change this. For a fraction of the cost of a data science project, Coppelia now offers a set of services that are designed to work alongside larger endeavours to provide an additional layer of rigour. We do so in a way that is friendly, non-judgmental, and collaborative. Where models or processes fall short we work with your teams to explore alternative approaches.
What does it involve?
We can peer review whole projects, particular aspects of a project, or a pitch. In particular we offer:
A comprehensive review of the methodology and results obtained for data science and statistical modelling projects
As part of this review, we check:
- The soundness of claims made on the basis of the data, looking especially for common pitfalls in statistical thinking.
- The appropriateness of the techniques used – often models are unnecessarily complex: a simpler approach can yield comparable results with greater efficiency.
- The strength of alternative models and explanations (your model may make a strong case for strategy A but perhaps there is an even stronger case for strategy B!)
- The sensitivity of results to starting assumptions (if the conclusions of a project vary wildly when small changes are made to your assumptions, then you should be concerned.)
- That machine learning algorithms have been appropriately trained and that predictions about their performance are sound.
- That the limitations of project findings, in particular levels of uncertainty, have been successfully communicated.
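The sensitivity check above can be made concrete with a simple parameter sweep. The sketch below is purely illustrative: the churn rates, customer counts and cost figures are invented, and `projected_profit` is a stand-in for whatever model actually drives the business case. The point is the shape of the exercise, not the numbers.

```python
# Hypothetical example: the projected annual profit of a project depends
# on an assumed customer churn rate. All figures are illustrative.
def projected_profit(churn_rate, customers=10_000, margin_per_customer=50.0,
                     fixed_costs=400_000.0):
    retained = customers * (1 - churn_rate)
    return retained * margin_per_customer - fixed_costs

# Sweep the assumption over a plausible range and watch how the
# conclusion ("is the project profitable?") responds.
for rate in [0.10, 0.15, 0.20, 0.25, 0.30]:
    print(f"churn {rate:.0%}: projected profit {projected_profit(rate):>10,.0f}")
```

Here the verdict flips from profit to loss as the assumed churn rate crosses 20% — exactly the kind of fragility a review should surface before a decision is made.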
A lot of time and money can be saved by running simulations and constructing simple what-if models ahead of the work, to gauge the likelihood that the project will succeed.
The de-jargoning of business presentations and pitches
We ask questions that will pin down what is actually being implemented. *If the solution that is being presented to you as cutting edge AI has in fact been around since the 60s we feel you have a right to know!**
Replication of data science projects
Reproducing the work using alternative methods and on different systems is an excellent way to double check the findings of a project.
The highlighting of ethical implications of a project
These might not be obvious. For example: does an algorithm contain any unintended biases causing to discriminate against particular groups.
We apply these services across a wide range of techniques and approaches including machine learning (predictive models and clustering solutions), statistical modelling in general, optimisation models, econometric modelling and the analysis of survey data.
If you feel that your project might benefit from the data science peer review service, we’d love to hear from you!