Dendrograms in R2D3


Hi, I’m Andrew and this is my first post for Coppelia! If you like the look of this feel free to visit my blog dinner with data (and see what happens when a data scientist hits the kitchen!)

I was excited by James’s last post on the new package R2D3, and I thought I would try to help develop it further. R2D3 is a great new package, built by James Thomson (in collaboration with Simon Raper and me at Coppelia), that uses D3 visualisations inside R. You can quickly create very striking visualisations with just a few lines of code. The package was introduced in a recent post, but since then a couple of updates have been made to increase its functionality.

In particular, the function D3Dendro, which creates dendrograms from an hclust object in R, has been updated. I had been working on a number of alternatives to the static dendrogram found in the package so far, so I thought I would add these in and describe them below.

I have added two new features:

  • Collapsible nodes
  • Radial output (rather than the more traditional ‘linear’ dendrogram)

You can clone the package from James’s GitHub repository or run the following in R:


install.packages("devtools")
library(devtools)
install_github("jamesthomson/R2D3")
library(R2D3)

I will include the example in the original post, so you can easily compare the differences.

Original dendrogram:


hc <- hclust(dist(USArrests), "ave")
JSON <- jsonHC(hc)
D3Dendro(JSON, file_out="USArrests_Dendo.html")


Introducing R2D3


R2D3 is a new package for R I’ve been working on. As the name suggests this package uses R to produce D3 visualisations. It builds on some work I previously blogged about here.

There are some similar packages on CRAN already, notably rjson and d3Network. However, I found that these packages covered parts of the process (creating a JSON or creating a D3 visualisation) but not the whole process, and did not ensure the JSON was in the right format for the D3. So that was the thinking behind this package: I was aiming to create an end-to-end process for converting R objects into D3 visualisations. When I mentioned it to [email protected] he was keen to contribute, so we’ve been collaborating on it over the last few weeks. It’s by no means finished, but I think it contains enough that it’s worth sharing.


A new home for pifreak


pifreak is my twitterbot. It started tweeting the digits of pi in April 2012 and has tweeted the next 140 digits at 3:14 pm GMT every day since. It’s not especially useful or popular (only 48 followers) but I’ve grown fond of her/him/it.
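For anyone curious how a bot like this might pick each day’s digits, here is a minimal sketch. It is not the actual pifreak code: it assumes the digits are generated with Gibbons’ spigot algorithm and that day 0 covers the first 140 digits after the decimal point. The names pi_digits and digits_for_day are made up for this illustration.

```python
from itertools import islice


def pi_digits():
    """Yield the decimal digits of pi one at a time: 3, 1, 4, 1, 5, ...

    Gibbons' unbounded spigot algorithm, using only integer arithmetic.
    """
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit is settled, so emit it and rescale the state.
            yield n
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            # Not enough information yet, so consume another term of the series.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)


def digits_for_day(day, chunk=140):
    """Return the chunk of digits for a given day (hypothetical scheme).

    Assumes day 0 is the first `chunk` digits after the decimal point.
    """
    start = 1 + day * chunk  # the extra 1 skips the leading 3
    return "".join(str(d) for d in islice(pi_digits(), start, start + chunk))


print(digits_for_day(0)[:10])
```

This version recomputes everything from the start on each call, which is fine for a sketch but gets slow after a few years of daily tweets; a real bot would more likely store its position or read from a precomputed file of digits.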


I was housing her on an AWS EC2 micro instance, but my one year of free hire ran out and it has become a little too expensive to keep that box running.

So I’ve been looking at alternatives. I’ve settled on the Google App Engine, which I’m hoping is going to come out as pretty close to free hosting.

So here are a few notes for anyone else who might be thinking of using the Google App Engine for automated posting on Twitter.

It was reasonably simple to set up:

  1. Download the GAE python SDK. This provides a GUI for both testing your code locally and then deploying it to the cloud when you are happy with it.
  2. Create a new folder for your app and within that place your python modules together with an app.yaml file and a cron.yaml file which will configure the application and schedule your task respectively. It’s all very well documented here and for the cron scheduling here.
  3. Open the App Engine Launcher (which is effectively the SDK), add your folder, then either hit run to test locally or deploy to push to the cloud (you’ll be taken to some forms to register your app if you’ve not already done so).
  4. Finally if you click on dashboard from the launcher you’ll get lots of useful information about your web deployed app including error logs and the schedule for your tasks.
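To make step 2 a little more concrete, here is roughly what the two configuration files might look like for the Python 2.7 runtime. This is a sketch rather than my actual files: the application name pifreak-example and the module name pifreak are placeholders, and it assumes the module exposes a WSGI object called app.

```yaml
# app.yaml (sketch; names are placeholders)
application: pifreak-example   # must match the name registered with Google
version: 1
runtime: python27
api_version: 1
threadsafe: true

handlers:
- url: /
  script: pifreak.app          # module name is case sensitive
```

```yaml
# cron.yaml (sketch)
cron:
- description: daily pi tweet
  url: /                       # same url as the handler in app.yaml
  schedule: every day 15:14    # times are UTC unless a timezone is given
```

The url in cron.yaml is simply the path that the scheduler requests, so it has to match a handler in app.yaml.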

The things that caught me out were:

  1. Make sure that the application name in your app.yaml file is the same as the one you register with Google (when it takes you through to the form the first time you deploy.)
  2. There wasn’t a lot in the documentation about the use of the url field in both the cron and app yaml files. I ended up just putting a forward slash in both since in my very simple app the python module is in the root.
  3. Don’t forget module names are case sensitive so when you add your python module in the script section of the app file you’ll need to get this right.
  4. Yaml files follow an indentation protocol that is similar to python. You’ll need to ensure it’s all lined up correctly.
  5. Any third party libraries you need that are not included in this list will need to be included in your app folder. For example, I had to include tweepy and some of its dependencies.
  6. Where the third party library that you need is included in the GAE runtime environment, you need to add it to the app file using the following syntax:

    libraries:
    - name: ssl
      version: "latest"

And here finally is a link to the code.

Machine Learning and Analytics based in London, UK