November 7, 2014 James Thomson

Scoring a Neural Net using R on AWS



One of the drawbacks of R has been its limitation with big datasets. It stores everything in RAM, so once you have more than about 100K records your PC really starts to slow down. However, since AWS lets you rent a machine of almost any size, you can now consider using R to score your models on larger datasets. Just fire up a meaty EC2 instance with the RStudio Amazon Machine Image (AMI) and off you go.

With this in mind, I wondered how long it would take to score a neural net, depending on how many variables are involved and how many records need to be scored. There was only one way to find out.

If you’ve not done it before, here are some simple instructions for getting an EC2 instance with R installed up and running, and then accessing it.

  1. Register with AWS.
  2. Launch an EC2 instance with the RStudio AMI. On “Step 6: Configure Security Group”, make sure you set the ‘type’ to ‘HTTP’ and the ‘port’ to 80. I’d also suggest setting the ‘source’ to ‘your IP’ so that only your IP can access it.
  3. Once the EC2 instance is running, access it by pasting its public IP into a web browser, then enter rstudio as both the username and the password.

For my job I fired up a c3.8xlarge EC2 instance. It has 60GB of RAM and cost me around $3 an hour.

The R script I ran generates a simple neural net model with 10 hidden nodes. The model is built on x random normal variables and is then used to score y records. I ran this for a range of values of x and y and timed how long the scoring step took each time: 5 to 100 variables and 100 to 10,000,000 records.
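The original script isn't reproduced in this text, so here is a minimal sketch of what a benchmark like this could look like. It is not the script I ran. It assumes the nnet package; the helper name time_scoring, the training size of 1,000 rows, the random binary target and the small demo grid are all illustrative assumptions.

```r
library(nnet)

# Hypothetical helper (not from the original script): fit a net with 10 hidden
# nodes on n_vars random normal variables, then time predict() on n_score
# freshly generated records.
time_scoring <- function(n_vars, n_score, n_train = 1000, seed = 1) {
  set.seed(seed)

  # Training data: random normal predictors and an arbitrary binary target
  # (an assumption; the post does not say what the target was).
  train_x <- matrix(rnorm(n_train * n_vars), ncol = n_vars)
  train_y <- rbinom(n_train, 1, 0.5)

  # 100 inputs x 10 hidden nodes needs 1,021 weights, which is above nnet's
  # default MaxNWts of 1000, so the limit is raised here.
  model <- nnet(train_x, train_y, size = 10, entropy = TRUE,
                maxit = 50, MaxNWts = 5000, trace = FALSE)

  # Records to score, with the same number of columns as the training data.
  score_x <- matrix(rnorm(n_score * n_vars), ncol = n_vars)

  # Time only the scoring step, as in the post.
  elapsed <- system.time(predict(model, score_x))[["elapsed"]]

  data.frame(variables = n_vars, records = n_score, seconds = elapsed)
}

# Small demo grid; the runs in the post went up to 100 variables and
# 10,000,000 records, which needs far more RAM and time.
grid <- expand.grid(n_vars = c(5, 100), n_score = c(100, 10000))
results <- do.call(rbind, Map(time_scoring, grid$n_vars, grid$n_score))
print(results)
```

Scaling the grid up to the ranges above is just a matter of changing the two vectors in expand.grid, memory permitting. Timings will vary with the machine, so treat any numbers you get as indicative only.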

Here’s the output, plotted using ggplot:

[Figure: nnet scoring plot]

As you can see, you can score 10M records with a 100-variable neural net in 6-7 minutes. Not too shabby.


About the Author

James Thomson: I have over 10 years' experience working in analytics and statistics. I worked as a statistician in the pharma industry before branching out into analytics and data mining. I'm currently enjoying learning about data visualisation and machine learning. Over the years I've worked with GSK, Nectar, BP, Ford, Whitbread, Wunderman, Jaguar, RNIB, Virgin Media and Channel 4. Please visit my own blog on analytics and music.
