February 13, 2017 Simon Raper

Concept map for Spark and Hadoop


Here is a concept map I use in some of our Spark workshops. I find these diagrams very useful when a topic is crowded with different tools, techniques and ideas. A concept map gives you a zoomed-out view that you can refer back to when you start to get lost.

To read the diagram, pick a concept, read off the description underneath, and then continue the sentence by following one of the arrows. For example: “EMR is a web-based service that allows you to efficiently process large data sets by … running on a cluster of computers built with … EC2”.

Click the image to get a zoomable version, otherwise you won’t be able to read the text!


About the Author

Simon Raper

I am an RSS-accredited statistician with over 15 years’ experience working in data mining and analytics, and many more in coding and software development. My specialities include machine learning, time series forecasting, Bayesian modelling, market simulation and data visualisation. I am the founder of Coppelia, an analytics startup that uses agile methods to bring machine learning and other cutting-edge statistical techniques to businesses that are looking to extract value from their data. My current interests are in scalable machine learning (Mahout, Spark, Hadoop), interactive visualisations (D3 and similar) and applying the methods of agile software development to analytics. I have worked for Channel 4, Mindshare, News International, Credit Suisse and AOL. I am co-author, with Mark Bulling, of Drunks and Lampposts, a blog on computational statistics, machine learning, data visualisation, R, Python and cloud computing. It has had over 310K visits and has appeared in the online editions of The New York Times and The New Yorker. I am a regular speaker at conferences and events.

Machine Learning and Analytics based in London, UK