July 14, 2014 James Thomson

Converting an R HClust object into a D3.js Dendrogram


Hi all, I’m James. This is my first blog post for Coppelia. Thanks to Simon for encouraging me to do this.

I’ve been doing a lot of hierarchical clustering in R and have started to find the standard dendrogram plot fairly unreadable once you have more than a couple of hundred records. I was recently introduced to the D3.js gallery and wondered if I could hack something better together. I found this dendrogram I liked and started to play. I soon realised that to get my data into it I needed nested JSON.

A quick Google search found the very useful rjson package. Having installed that, I copied the nested structure of the example flare dataset and put together the code below.
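The post's own functions live in a separate gist. As a small illustration of the nested shape only, here is a hand-built tree in the flare style, serialised with rjson's `toJSON`. The node names (`root`, `A`, `B`, `B1`, `B2`) are made up for the example.

```r
library(rjson)

# A tiny hand-built tree in the same shape as the flare dataset:
# every node has a `name`; internal nodes also have `children`.
# All names here are illustrative, not taken from the post.
tree <- list(
  name = "root",
  children = list(
    list(name = "A"),
    list(
      name = "B",
      children = list(list(name = "B1"), list(name = "B2"))
    )
  )
)

# Named lists become JSON objects, unnamed lists become JSON arrays.
json <- toJSON(tree)
cat(json, "\n")
```

The key point is that `children` must be an unnamed list, so that it serialises as a JSON array rather than an object.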

The R code is made up of two functions: 1) HCtoJSON, which converts the hclust output into JSON, and 2) D3Dendo, which takes the JSON and creates an HTML file of the dendrogram. It’s nice and simple and creates the following dendrogram.
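For readers without the gist to hand, the conversion step can be sketched roughly as follows. This is not the post's HCtoJSON; `merge_to_list`, `hclust_to_nested` and `count_leaves` are hypothetical helpers, and the `node<i>` naming is an assumption. It relies only on the documented layout of `hc$merge`, where a negative entry is a single observation and a positive entry refers to an earlier merge step.

```r
library(rjson)

# Hypothetical helper (not the post's HCtoJSON): build the nested list
# for merge step `i` by recursing into both sides of that merge.
merge_to_list <- function(hc, i, labels) {
  children <- lapply(hc$merge[i, ], function(m) {
    if (m < 0) {
      list(name = labels[-m])          # negative entry: a single observation
    } else {
      merge_to_list(hc, m, labels)     # positive entry: an earlier merge step
    }
  })
  list(name = paste0("node", i), children = unname(children))
}

hclust_to_nested <- function(hc) {
  # Fall back to observation numbers if the hclust object has no labels.
  labels <- if (is.null(hc$labels)) as.character(seq_along(hc$order)) else hc$labels
  merge_to_list(hc, nrow(hc$merge), labels)  # the last merge is the root
}

# Sanity check: count the leaves in the nested list.
count_leaves <- function(node) {
  if (is.null(node$children)) return(1L)
  sum(vapply(node$children, count_leaves, integer(1)))
}

hc   <- hclust(dist(USArrests), "ave")
tree <- hclust_to_nested(hc)
count_leaves(tree)   # 50 for USArrests, one leaf per state
json <- toJSON(tree)
```

Counting the leaves before handing the JSON to D3 is a cheap way to confirm that both branches of every merge were followed.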

Obviously, if you’re better with JavaScript than I am, you can add to the dendrogram or insert the nested JSON into your own D3.

I’m also thinking of other places to use D3 and might put together an R package in a similar style.


About the Author

James Thomson. I have over 10 years' experience working in analytics and statistics. I worked as a statistician in the pharma industry before branching out into analytics and data mining. I'm currently enjoying learning about data visualisation and machine learning. Over the years I've worked with: GSK, Nectar, BP, Ford, Whitbread, Wunderman, Jaguar, RNIB, Virgin Media & Channel4. Please visit my own blog on analytics and music.

Comments (5)

  1. eb1

    Thank you very much for advertising this function; it seems very helpful.
    I see a problem, which can be seen in your example too.
    If you run your example, i.e.:
    hc <- hclust(dist(USArrests), "ave")

    and then look at the plot in R, you can see 50 leaves, but the JSON string produced by the function seems to follow only the right-hand branch of the first junction from the top (in your example you can see 34 leaves instead of the 50 there should be). How can I overcome this problem?

    thank you very much


  2. EB1

    Thank you very much for this function; it seems very helpful. I used the HCtoJSON function
    on my data and it seems to produce the JSON only for the right-hand side of the first junction from the top.
    I also ran your example, i.e.:
    hc <- hclust(dist(USArrests), "ave")

    In this example the tree should have 50 leaves if you run it in R, but the picture you produced here shows only the right-hand side of the tree, with 34 leaves. How can I overcome this problem?

    thank you very much

    • You are right, there was a slight error in my function. I have corrected the gist now, so it should be correct.

      Please try again. Hopefully it should work properly now.

  3. Arun Krishnan

    Hello James,

    Thanks for this post. I was looking into getting my hclust object into a JSON format that d3 can take and this certainly helped.

    I have one more request. Do you know how I could form the JSON object based on a particular cut-level or on a certain number of clusters? For example if I were to cut at the level of node 25 in the above dendrogram, then all the nodes in that subtree would be directly connected to node 25. Do you know how I can do that?

    Thanks in Advance.


    • Hi Arun

      I’m not sure I totally understand what you are asking. A couple of thoughts:

      – You could try changing the HCtoJSON function to pick out the nodes you are interested in above or below a certain level.
      – You could try using a collapsible dendrogram, which I’ve now made available in an R package: R2D3 http://www.coppelia.io/introducing-r2d3/
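For the "certain number of clusters" half of the question, one rough option is to group the leaves with base R's `cutree` and build a two-level tree. This is only a sketch and not part of HCtoJSON or R2D3: it drops the hierarchy above the cut rather than keeping it, and `k = 4`, the `root` name and the `cluster<id>` names are assumptions chosen for illustration.

```r
library(rjson)

hc <- hclust(dist(USArrests), "ave")

# cutree returns a named integer vector: observation label -> cluster id.
# k = 4 is an arbitrary choice for this example.
groups     <- cutree(hc, k = 4)
by_cluster <- split(names(groups), groups)

# Two-level tree: root -> one node per cluster -> that cluster's leaves.
flat_tree <- list(
  name = "root",
  children = unname(lapply(names(by_cluster), function(id) {
    list(
      name     = paste0("cluster", id),
      children = lapply(by_cluster[[id]], function(s) list(name = s))
    )
  }))
)

json <- toJSON(flat_tree)
```

Cutting at a height instead of a cluster count would use `cutree(hc, h = ...)` in the same way.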

