February 5, 2013 Simon Raper

Two Quick Recipes: Ubuntu and Hadoop


There are so many flavours of everything, and things are changing so quickly, that every task I research online ends up as a set of instructions stitched together from several blogs and forums. Here are a couple of recent ones.

Ubuntu on AWS (50 mins)

I was going to buy a new laptop, but it made more sense to set up a Linux instance on AWS and remote in (a quarter of the price and more interesting). Here’s my recipe:

    1. As in Mark’s earlier post, set yourself up with an AWS account and a key pair by following this tutorial.
    2. Launch an Ubuntu instance from the EC2 management console, choosing an instance type with the memory and processing power to suit.
    3. Start up the instance, then connect to it using MindTerm (a very useful alternative to SSHing in with PuTTY). To do this, select the instance in the console, then choose Actions and then Connect. (You’ll need to provide the path to your saved key.)
    4. Now you probably want to remote into your machine. Do this by setting up NoMachine NX, following steps 2 to 4 in this post.
    5. However, when you execute the last line of step 2, you’ll find that nxsetup is not found. To fix this, switch to this post and follow steps 6 and 7 (life’s so complicated):

    6. Set PasswordAuthentication to yes in /etc/ssh/sshd_config.
    7. Add the GNOME fallback session:

       sudo apt-get install gnome-session-fallback

    8. Restart the instance and log in.
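The sshd_config change is easy to get wrong by hand, for example by leaving a commented-out default in place or adding a duplicate line. The sketch below is an addition, not part of the original recipe: it assumes Bash and GNU sed on the instance, and `enable_password_auth` is a hypothetical helper name. It rewrites any existing PasswordAuthentication line (commented out or not) and appends the directive only if none is present.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original recipe): set
# "PasswordAuthentication yes" in an sshd_config-style file.
# An existing line, commented out or not, is rewritten in place;
# if no such line exists, the directive is appended.
enable_password_auth() {
  local cfg="$1"
  if grep -Eq '^[#[:space:]]*PasswordAuthentication[[:space:]]' "$cfg"; then
    # GNU sed: extended regex, edit the file in place
    sed -i -E 's/^[#[:space:]]*PasswordAuthentication[[:space:]].*/PasswordAuthentication yes/' "$cfg"
  else
    printf 'PasswordAuthentication yes\n' >> "$cfg"
  fi
}
```

For example, in a root shell (`sudo -i`) you could define the function, run `enable_password_auth /etc/ssh/sshd_config`, and then `service ssh restart` so sshd picks up the change; restarting the instance in step 8 has the same effect.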

Just remember to keep an eye on the charges!

Single-Node Hadoop on Ubuntu (20 mins)

Of course, you can run Hadoop directly on Amazon’s EMR platform, but if you want more of a feel for how it works in a familiar environment, you can set it up on a single instance.

  1. Follow the instructions in this post, substituting the latest stable Hadoop release.
  2. Install the latest JDK:

       sudo apt-get install openjdk-7-jdk

  3. Set the JAVA_HOME environment variable, substituting the path to your JDK installation:

       export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64

  4. From the Hadoop quick start guide, follow the instructions in the “Prepare to Start the Hadoop Cluster” and “Standalone Operation” sections. If this all works, you should be ready to go.
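For the JAVA_HOME step, the JDK path need not be guessed. The sketch below is an addition, not from the original post: it assumes GNU readlink (standard on Ubuntu) and a `javac` on the PATH, and `jdk_home_from_javac` is a hypothetical helper name. It follows the symlink chain (for example through /etc/alternatives) to the real `javac` binary and strips the trailing `bin/javac`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): print the JDK home
# directory containing the given javac (default: the javac on PATH).
jdk_home_from_javac() {
  local javac_path="${1:-$(command -v javac)}"
  if [ -z "$javac_path" ]; then
    echo "javac not found; is a JDK installed?" >&2
    return 1
  fi
  # Resolve symlinks such as /usr/bin/javac -> /etc/alternatives/javac -> ...
  javac_path=$(readlink -f "$javac_path") || return 1
  # javac lives in <JDK home>/bin, so drop the last two path components
  dirname "$(dirname "$javac_path")"
}
```

With the function defined, `export JAVA_HOME="$(jdk_home_from_javac)"` sets the variable for the current shell.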



About the Author

Simon Raper

I am an RSS-accredited statistician with over 15 years’ experience working in data mining and analytics and many more in coding and software development. My specialities include machine learning, time series forecasting, Bayesian modelling, market simulation and data visualisation. I am the founder of Coppelia, an analytics startup that uses agile methods to bring machine learning and other cutting-edge statistical techniques to businesses that are looking to extract value from their data. My current interests are in scalable machine learning (Mahout, Spark, Hadoop), interactive visualisations (D3 and similar) and applying the methods of agile software development to analytics. I have worked for Channel 4, Mindshare, News International, Credit Suisse and AOL. I am co-author, with Mark Bulling, of Drunks and Lampposts, a blog on computational statistics, machine learning, data visualisation, R, Python and cloud computing. It has had over 310K visits and has appeared in the online editions of The New York Times and The New Yorker. I am a regular speaker at conferences and events.


Machine Learning and Analytics based in London, UK