Dendrograms in R2D3


Hi, I’m Andrew, and this is my first post for Coppelia! If you like the look of this, feel free to visit my blog, dinner with data (and see what happens when a data scientist hits the kitchen!).

I was excited by James’s last post on the new package R2D3, and I thought I would try to help develop it further. R2D3 is a great new package, built by James Thomson (in collaboration with myself and Simon Raper at Coppelia), that uses D3 visualisations inside R. You can quickly create very striking visualisations with just a few lines of code. The package was introduced in a recent post, but since then a couple of updates have been made to extend its functionality.

In particular, the function D3Dendro, which creates dendrograms from an hclust object in R, has been extended. I had been working on a number of alternatives to the static dendrogram the package offered so far, so I thought I would add these in and describe them below.

I have added two new, distinct features:

  • Collapsible nodes
  • Radial output (rather than the more traditional ‘linear’ dendrogram)

You can clone the package from James’s GitHub repository or run the following in R:


install.packages("devtools")
library(devtools)
install_github("jamesthomson/R2D3")
library(R2D3)

I’ll start with the example from the original post, so you can easily compare the differences.

Original dendrogram:


hc <- hclust(dist(USArrests), "ave")
JSON <- jsonHC(hc)
D3Dendro(JSON, file_out="USArrests_Dendo.html")


Converting an R HClust object into a D3.js Dendrogram


Hi all, I’m James. This is my first blog post for Coppelia. Thanks to Simon for encouraging me to do this.

I’ve been doing a lot of hierarchical clustering in R and have started to find the standard dendrogram plot fairly unreadable once you have more than a couple of hundred records. I’ve recently been introduced to the D3.js gallery, and I wondered if I could hack something better together. I found a dendrogram I liked and started to play. I soon realised that in order to get my data into it I needed a nested JSON object.
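The nesting follows from how R records a clustering. An hclust object’s `merge` matrix has one row per agglomeration step: a negative entry -j means observation j, and a positive entry k means the cluster formed at the earlier step k. Below is a minimal Python sketch of turning that into nested name/children dicts. It is not the R2D3 `jsonHC` code; the function name `merge_to_tree` and the three example labels are hypothetical, and internal nodes are simply left unnamed.

```python
import json

def merge_to_tree(merge, labels):
    """Build a nested {"name", "children"} dict from an hclust-style merge matrix.

    merge:  list of [a, b] pairs, one per agglomeration step (1-based, as in R).
            A negative entry -j means observation j; a positive entry k means
            the cluster created at step k.
    labels: observation names, where labels[j - 1] names observation j.
    """
    nodes = []  # nodes[k - 1] holds the subtree created at step k

    def resolve(entry):
        if entry < 0:
            return {"name": labels[-entry - 1]}  # a single observation (leaf)
        return nodes[entry - 1]                  # a cluster from an earlier step

    for a, b in merge:
        # Internal nodes get an empty name; only leaves carry labels here
        nodes.append({"name": "", "children": [resolve(a), resolve(b)]})
    return nodes[-1]  # the final merge is the root of the dendrogram

# Step 1 joins observations 1 and 2; step 2 joins observation 3 with step 1.
tree = merge_to_tree([[-1, -2], [-3, 1]], ["Alabama", "Alaska", "Arizona"])
print(json.dumps(tree))
```

In R, `hc$merge` and `hc$labels` hold these two inputs, so they would need exporting first (for example with `write.csv`) before a function like this could use them.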

Lazy D3 on some astronomical data


I can’t claim to be anywhere near an expert on D3 (a JavaScript library for data visualisation), but being both greedy and lazy I wondered if I could get some nice results with minimum effort. In any case, the hardest thing about D3 for a novice to the world of web design seems to be getting started at all, so perhaps this post will be useful for getting people up and running.

A Reingold–Tilford Tree of the IVOA Astronomical Object Ontology using D3


The images above and below are visualisations using D3 of a classification hierarchy for astronomical objects provided by the IVOA (International Virtual Observatory Alliance). I take no credit for the layout. The designs are taken straight from the D3 examples gallery but I will show you how I got the environment set up and my data into the graphs. The process should be replicable for any hierarchical dataset stored in a similar fashion.

A dendrogram of the same data


Even better than the static images are the interactive versions, such as the rotating Reingold–Tilford Tree, the collapsible dendrogram and the collapsible indented tree. These were all created fairly easily by substituting the astronomical object data for the data in the original examples. (I say fairly easily because you need to get the hierarchy into the right format, but more on that later.)

First, the environment.

If you are an R user with little experience of building web pages, you’ll probably find yourself squinting at the D3 documentation wondering how you get from the code to the output. With just a browser to run the JavaScript and an editor (Notepad if you like, but preferably a specialist HTML editor) you can get through most of the tutorials referred to in Mark’s previous post. Doing this locally on your own PC works OK because the data visualised in the tutorials is mostly hard-coded into the JavaScript. However, once you want to refer to data contained in external files, your browser’s (sensible) security restrictions on which files it can access will block your attempts. The options are to turn these off (not advised) or to switch to working on a web server.
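An option not covered here, useful for testing before you move to a hosted server, is to serve the folder over HTTP from your own machine, so the browser fetches data files from `http://127.0.0.1` rather than straight from disk. A minimal sketch using Python’s standard-library `http.server`; the function name `serve_directory` and the background-thread design are my own choices, not part of the setup described in this post. (From a command prompt, `python -m http.server` run inside the folder does much the same.)

```python
import functools
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler

def serve_directory(directory, port=0):
    """Serve the files in `directory` over HTTP from a background thread.

    port=0 lets the OS pick a free port; the chosen one is server.server_address[1].
    Call server.shutdown() and server.server_close() when finished.
    """
    # The `directory` argument of SimpleHTTPRequestHandler needs Python 3.7+
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = HTTPServer(("127.0.0.1", port), handler)
    # A daemon thread will not keep the interpreter alive on exit
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, `serve_directory("my_d3_pages", port=8000)` (a hypothetical folder name) would make an `index.html` in that folder available at `http://127.0.0.1:8000/` for as long as the Python session runs.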

You can either opt for one of the many free hosting services or, if you are feeling more adventurous, follow Mark’s posts on setting up a Linux instance on Amazon Web Services and then Libby Hemphill’s instructions for starting up Apache and opening the port. Going with the latter, I use FileZilla to transfer any data I need over to my Linux instance. See this post for getting it to work with your authentication key.

This should leave you in the following situation:

  • You have a working web server and an IP address that will take you to an HTML index page from which you can provide links to your D3 documents.
  • You have a way of transferring work that you do locally over to your web server.
  • You have a whole gallery of examples to play around with.

Rather than creating my own D3 scripts, I’m just going to substitute my own data into the examples. The catch is that the hierarchical examples take a nested JSON object as input. This means something like:

{
 "name": "Astronomical Object",
 "children": [
  {
   "name": "Planet",
   "children": [
    {
     "name": "Type",
     "children": [
      {"name": "Terrestrial"},
      {"name": "Gas Giant"}
     ]
    }
   ]
  }
 ]
}
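To check that a hand-built file has this shape, a small recursive walk is enough. A sketch, assuming the only requirements are a string `name` on every node and, where present, a list of child nodes under `children` (the D3 examples may read other fields too, such as `size`). The helper name `count_leaves` is hypothetical.

```python
import json

def count_leaves(node):
    """Validate a {"name", "children"} node recursively and return its leaf count."""
    if not isinstance(node.get("name"), str):
        raise ValueError(f"node without a string 'name': {node!r}")
    children = node.get("children")
    if children is None:
        return 1  # no children: this node is a leaf
    if not isinstance(children, list):
        raise ValueError(f"'children' must be a list in node {node['name']!r}")
    return sum(count_leaves(child) for child in children)

sample = json.loads("""
{"name": "Astronomical Object",
 "children": [{"name": "Planet",
               "children": [{"name": "Type",
                             "children": [{"name": "Terrestrial"},
                                          {"name": "Gas Giant"}]}]}]}
""")
print(count_leaves(sample))  # 2
```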

Our data, however, looks like this:

1. Planet
1.1. [Type]
1.1.1. Terrestrial
1.1.2. Gas Giant
1.2.  [Feature]
1.2.1. Surface
1.2.1.1. Mountain
1.2.1.2. Canyon
1.2.1.3. Volcanic
1.2.1.4. Impact
1.2.1.5. Erosion
1.2.1.6. Liquid
1.2.1.7. Ice
1.2.2. Atmosphere
1.2.2.1. Cloud
1.2.2.2. Storm
1.2.2.3. Belt
1.2.2.4. Aurora
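Each line encodes its depth as the number of dot-terminated numbers in front of the name. As an alternative to the csv-based reading in the script that follows, a regular expression can pull out (depth, name) directly, which also copes with names that themselves contain dots. A sketch; `parse_dotted` is a hypothetical name.

```python
import re

# One or more "<digits>." groups, optional whitespace, then the name.
LINE = re.compile(r"^((?:\d+\.)+)\s*(.+?)\s*$")

def parse_dotted(line):
    """Return (depth, name) for an outline line such as '1.2.1.1. Mountain'."""
    match = LINE.match(line)
    if match is None:
        raise ValueError(f"not an outline line: {line!r}")
    numbers, name = match.groups()
    return numbers.count("."), name  # depth = number of numeric segments

print(parse_dotted("1.2.1.1. Mountain"))  # (4, 'Mountain')
print(parse_dotted("1.2.  [Feature]"))    # (2, '[Feature]')
```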

To put this into the right format, I’ve used Python to read in the file as CSV (with a dot as the delimiter) and construct the nested JSON object. Here’s the full Python code:

import csv, re

# Depth of a row = index of the last column containing a letter.
# "1.2.1.1. Mountain" splits on '.' into ['1', '2', '1', '1', ' Mountain'] -> depth 4
def readLev(row):
    level = 0
    pattern = re.compile("[a-zA-Z]")
    for i, col in enumerate(row):
        if pattern.search(col) is not None:
            level = i
    return level

# Open the file and read it as dot delimited (Python 3; adjust the path for your machine)
with open('B:PythonFilesPythonInOutAstroObjectAstro.csv', newline='') as csvfile:
    astroReader = csv.reader(csvfile, delimiter='.')
    astroJson = '{"name": "Astronomical Object"'
    prevLev = 0
    for row in astroReader:
        if not row:
            continue  # skip blank lines
        # Identify the depth and the trimmed name
        currentLev = readLev(row)
        name = row[currentLev].strip()
        if currentLev > prevLev:
            # One level deeper: open a children list under the previous node
            astroJson = astroJson + ', "children": [{"name": "' + name + '"'
        elif currentLev < prevLev:
            # Back up: close the leaf, then one children list per level jumped
            jump = prevLev - currentLev
            astroJson = astroJson + ', "size": 1}' + ']}' * jump + ', {"name": "' + name + '"'
        else:
            # Same level: close the previous sibling and start a new node
            astroJson = astroJson + ', "size": 1}, {"name": "' + name + '"'
        prevLev = currentLev
    # Close the last leaf, then every open children list back up to the root
    astroJson = astroJson + ', "size": 1}' + ']}' * prevLev
    print(astroJson)
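Building the JSON by string concatenation works, but every closing bracket has to be tracked by hand, and a quote character in a name would break the output. An alternative, sketched here rather than taken from this post, is to nest ordinary Python dicts using a stack and let the `json` module do the serialising. It takes (depth, name) pairs, the same depth that `readLev` computes, and, like the script above, gives each leaf a `size` of 1. The function name `build_tree` is hypothetical.

```python
import json

def build_tree(rows, root_name="Astronomical Object"):
    """Nest (depth, name) pairs, given in outline order, under a root at depth 0."""
    root = {"name": root_name}
    stack = [root]  # stack[d] is the most recent node seen at depth d
    for depth, name in rows:
        if depth < 1 or depth > len(stack):
            raise ValueError(f"{name!r} at depth {depth} skips a level")
        del stack[depth:]  # pop back to this node's parent
        node = {"name": name}
        stack[-1].setdefault("children", []).append(node)
        stack.append(node)

    def mark_leaves(node):
        # Leaves get "size": 1, matching the string-building script
        if "children" in node:
            for child in node["children"]:
                mark_leaves(child)
        else:
            node["size"] = 1

    mark_leaves(root)
    return root

rows = [(1, "Planet"), (2, "[Type]"), (3, "Terrestrial"), (3, "Gas Giant")]
print(json.dumps(build_tree(rows)))
```

Because the whole tree exists as Python objects before it is written, the brackets always balance, and `json.dumps` escapes any awkward characters in the names.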

Since only the level of indentation matters to this process, the same approach works on any data of the form

* Planet
** [Type]
*** Terrestrial
*** Gas Giant
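For this asterisk form, the depth is simply the number of leading asterisks. A minimal sketch that yields (depth, name) pairs; the helper name `parse_stars` is hypothetical.

```python
def parse_stars(lines):
    """Yield (depth, name) for lines such as '*** Terrestrial', skipping blank lines."""
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        name = stripped.lstrip("*")
        depth = len(stripped) - len(name)  # number of leading asterisks
        if depth == 0:
            raise ValueError(f"line has no leading '*': {line!r}")
        yield depth, name.strip()

outline = ["* Planet", "** [Type]", "*** Terrestrial", "*** Gas Giant"]
print(list(parse_stars(outline)))
```

The post’s own script should also handle this form if the csv delimiter is changed from '.' to '*', since `readLev` only looks at which column holds the name.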

So that’s it. You’ll see that the source code for the hierarchical examples contains a reference to flare.json. Substitute a reference to your own file containing the output JSON object, and be sure to include that file in the directory on your web server.
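Since the JSON from the script is assembled by hand, it is worth checking that it parses before uploading it in place of flare.json. A sketch; the function name `save_checked_json` is hypothetical, and so is any file name you pass it (use whatever name your D3 page references).

```python
import json

def save_checked_json(json_text, path):
    """Parse json_text to confirm it is valid JSON, then write it to path."""
    data = json.loads(json_text)  # raises json.JSONDecodeError if malformed
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=1)
    return data
```

For example, calling `save_checked_json(astroJson, "astro.json")` at the end of the script would stop with an error pointing at the problem position, rather than leaving D3 to fail silently in the browser.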

Of course, it’s a poor substitute for learning the language itself, which lets you construct your own innovative visualisations, but it gets you started.

Machine Learning and Analytics based in London, UK