Scoring a Neural Net using R on AWS

[Plot: nnet scoring times]

One of the drawbacks of R has been its limitations with big datasets: it stores everything in RAM, so once you have more than about 100K records your PC really starts to slow down. However, since AWS lets you use any size of machine, you can now consider using R for scoring out your models on larger datasets. Just fire up a meaty EC2 instance with the RStudio Amazon Machine Image (AMI) and off you go.
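As a sketch of what "firing up a meaty EC2" might look like from the command line — the aws CLI call, AMI ID, instance type and key name below are all illustrative assumptions, not values from the post — the command is printed rather than executed so no AWS credentials are needed:

```shell
# Hypothetical sketch: launching a memory-optimised EC2 instance carrying
# an RStudio AMI via the aws CLI. AMI ID and key name are placeholders.
AMI_ID="ami-xxxxxxxx"       # substitute the RStudio AMI for your region
INSTANCE_TYPE="r3.2xlarge"  # a memory-optimised type suits R's in-RAM datasets
CMD="aws ec2 run-instances --image-id $AMI_ID --instance-type $INSTANCE_TYPE --key-name my-key-pair"
echo "$CMD"                 # printed rather than run
```

The instance type is the knob that matters here: since R holds the whole dataset in RAM, you scale the machine's memory to the data rather than rewriting the code.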

With this in mind, I wondered how long it would take to score up a neural net depending on how many variables were involved and how many records needed scoring. There was only one way to find out.


Two Quick Recipes: Ubuntu and Hadoop

There are so many flavours of everything, and things change so quickly, that every task I research online ends up as a set of instructions stitched together from several blogs and forums. Here are a couple of recent ones.

Ubuntu on AWS (50 mins)

I was going to buy a new laptop, but it made more sense to set up a Linux instance on AWS and remote in (a quarter of the price, and more interesting). Here's my recipe:

    1. As in Mark’s earlier post, set yourself up with an AWS account and a key pair by following this tutorial.
    2. Launch an Ubuntu instance using the EC2 management console, selecting memory and processing power to suit.
    3. Start the instance, then connect to it using MindTerm (a very useful alternative to SSHing in with PuTTY). To do this, select the instance in the console, then choose Actions and Connect. (You’ll need to provide the path to your saved key.)
    4. Now you probably want to remote into your machine. Do this by setting up NoMachine NX, following steps 2 to 4 in the following post.
    5. However, when you execute the last line of step 2 you’ll find that nxsetup is not found. To fix this, switch to this post and follow steps 6-7 (life’s so complicated).
    6. Change PasswordAuthentication to yes in /etc/ssh/sshd_config.
    7. Add the GNOME fallback session:

sudo apt-get install gnome-session-fallback

    8. Restart the instance and log in.
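The sshd_config change in step 6 can be sketched as follows. To keep the snippet runnable without root, it edits a local stand-in file rather than /etc/ssh/sshd_config itself; the stand-in filename is mine, the real path and the apt-get line come from the steps above.

```shell
# Step 6 sketch: flip PasswordAuthentication from no to yes.
# A local stand-in file is used so this runs without root access;
# on the instance you would edit /etc/ssh/sshd_config directly.
CONF=sshd_config.copy
printf 'PasswordAuthentication no\n' > "$CONF"
sed -i 's/^PasswordAuthentication no$/PasswordAuthentication yes/' "$CONF"
cat "$CONF"                  # now reads: PasswordAuthentication yes
# Step 7, on the instance itself:
#   sudo apt-get install gnome-session-fallback
#   sudo service ssh restart   # pick up the sshd_config change
```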

Just remember to keep an eye on the charges!

Single-Node Hadoop on Ubuntu (20 mins)

Of course you can run Hadoop directly on Amazon’s EMR platform, but if you want to get more of a feel for how it works in a familiar environment, you can set it up on a single instance.

  1. Follow the instructions in this post, substituting in the latest Hadoop stable release.
  2. Install the latest JDK: sudo apt-get install openjdk-7-jdk
  3. Set the JAVA_HOME environment variable: export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64, substituting in the path to your JDK binaries.
  4. From the Hadoop quick start guide, follow the instructions in the “Prepare to Start the Hadoop Cluster” and “Standalone Operation” sections. If this all works you should be ready to go.
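A condensed sketch of steps 2-3: the quick start guide has you point JAVA_HOME at the JDK inside Hadoop's etc/hadoop/hadoop-env.sh. A local stand-in file is used here so the snippet runs without a Hadoop download, and the standalone check shown in the comments is the grep example from the guide.

```shell
# Sketch of steps 2-3, assuming the OpenJDK 7 package path from the post.
# JAVA_HOME is written to a local stand-in for etc/hadoop/hadoop-env.sh
# so the snippet runs without a Hadoop download.
ENV_FILE=hadoop-env.sh.local
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64' > "$ENV_FILE"
grep JAVA_HOME "$ENV_FILE"
# The "Standalone Operation" smoke test from the quick start then looks like:
#   mkdir input && cp etc/hadoop/*.xml input
#   bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
#       grep input output 'dfs[a-z.]+'
#   cat output/*
```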


Machine Learning and Analytics based in London, UK