Lazy D3 on some astronomical data

I can’t claim to be anything near an expert on D3 (a JavaScript library for data visualisation) but being both greedy and lazy I wondered if I could get some nice results with minimum effort. In any case the hardest thing about D3 for a novice to the world of web design seems to be getting started at all so perhaps this post will be useful for getting people up and running.

A Reingold–Tilford Tree of the IVOA Astronomical Object Ontology using D3

The images above and below are visualisations using D3 of a classification hierarchy for astronomical objects provided by the IVOA (International Virtual Observatory Alliance). I take no credit for the layout. The designs are taken straight from the D3 examples gallery but I will show you how I got the environment set up and my data into the graphs. The process should be replicable for any hierarchical dataset stored in a similar fashion.

A dendrogram of the same data

Even better than the static images are the various interactive versions, such as the rotating Reingold–Tilford Tree, the collapsible dendrogram and the collapsible indented tree. These were all created fairly easily by substituting the astronomical object data for the data in the original examples. (I say fairly easily as you need to get the hierarchy into the right format, but more on that later.)

First the environment.

If you are an R user with little experience of building web pages then you’ll probably find yourself squinting at the D3 documentation wondering how you get from the code to the output. With just a browser to read the JavaScript and an editor (Notepad if you like, but preferably a specialist HTML editor) you can get through most of the tutorials referred to in Mark’s previous post. Doing this locally on your own PC works OK because the data visualised in the tutorials is mostly hard-coded into the JavaScript. However, once you want to start referring to data contained in external files, some sensible security restrictions on what files your browser can access will block your attempts. The options are to turn these off (not advised) or switch to working on a web server.
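For a quick experiment before committing to remote hosting, one possibility (not the route taken in this post) is to serve the working directory from a small local web server. The sketch below uses Python's standard http.server module; the function name make_server and the port number are hypothetical choices.

```python
import functools
import http.server


def make_server(directory=".", port=8000):
    """Build a local web server that serves files from `directory`.

    make_server is a hypothetical helper name; port 8000 is an arbitrary choice.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to the loopback address only, so nothing is exposed to the network.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


# Usage (blocks until interrupted with Ctrl+C):
#   server = make_server("path/to/your/d3/files")
#   server.serve_forever()
```

With the server running, the pages are reached through an http://127.0.0.1:8000/ address rather than by opening the files directly, so requests for external data files are no longer blocked.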

You can either opt for one of the many free hosting services or if you are feeling more adventurous you can follow Mark’s posts on setting up a Linux instance on Amazon Web Services and then follow Libby Hemphill’s instructions for starting up Apache and opening the port. Finally, going with the latter, I use FileZilla to transfer over any data I need to my Linux instance. See this post for getting it to work with your authentication key.

This should leave you in the following situation:

  • You have a working web server and an IP address that will take you to an HTML index page from which you can provide links to your D3 documents.
  • You have a way of transferring work that you do locally over to your web server.
  • You have a whole gallery of examples to play around with.

Rather than creating my own D3 scripts I’m just going to substitute my own data into the examples. The issue here is that the hierarchical examples take a nested JSON object as input. This means something like:

{
 "name": "Astronomical Object",
 "children": [
  {
   "name": "Planet",
   "children": [
    {
     "name": "Type",
     "children": [
      {"name": "Terrestrial"},
      {"name": "Gas Giant"}
     ]
    }
   ]
  }
 ]
}

Our data, however, looks like this:

1. Planet
1.1. [Type]
1.1.1. Terrestrial
1.1.2. Gas Giant
1.2.  [Feature]
1.2.1. Surface Mountain Canyon Volcanic Impact Erosion Liquid Ice
1.2.2. Atmosphere Cloud Storm Belt Aurora

To put this into the right format I’ve used Python to read in the file as CSV (with a dot as the delimiter) and construct the nested JSON object. Here’s the full Python code:

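import csv, re

def readLev(row):
    #Return the index of the first column containing a letter, i.e. the depth
    pattern = re.compile("[a-zA-Z]")
    for level, col in enumerate(row):
        if pattern.search(col) != None:
            return level
    return None

#Open the file and read as dot delimited
with open('B:PythonFilesPythonInOutAstroObjectAstro.csv', newline='') as csvfile:
    astroReader = csv.reader(csvfile, delimiter='.')
    astroJson='{"name": "Astronomical Object"'
    prevLev = 0
    for row in astroReader:
        #Identify the depth
        currentLev = readLev(row)
        if currentLev is None:
            continue  #skip blank lines
        name = row[currentLev].strip()
        jump = prevLev - currentLev
        if currentLev>prevLev:
            #One level deeper: open a new list of children
            astroJson=astroJson+', "children": [{"name": "' + name+'"'
        elif currentLev<prevLev:
            #Back up: close the leaf and one list of children per level jumped
            astroJson=astroJson+', "size": 1}'+']}'*jump+', {"name": "' + name+'"'
        elif currentLev==prevLev:
            #Same level: close the leaf and start a sibling
            astroJson=astroJson+', "size": 1},{"name": "' + name+'"'
        prevLev = currentLev
    #Close the last entry and every level still open above it
    astroJson=astroJson+ '}' + ']}'*prevLev
    print(astroJson)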
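An alternative sketch, not the script used for the figures: building the hierarchy as nested dictionaries and letting the json module do the serialising avoids tracking closing brackets by hand, and it also escapes any awkward characters in the names. The names parse_numbered_line, build_tree and add_leaf_sizes are hypothetical. The sketch assumes every line starts with a numbering prefix such as 1.2.1. and that the depth never increases by more than one level at a time.

```python
import json
import re

# Numbering prefix such as "1.2.1." followed by the entry name.
PREFIX = re.compile(r"^\s*((?:\d+\.)+)\s*(.*)$")


def parse_numbered_line(line):
    """Return (depth, name) for a line like '1.2.1. Atmosphere', else None."""
    match = PREFIX.match(line)
    if match is None:
        return None
    depth = match.group(1).count(".")  # one dot per level of nesting
    return depth, match.group(2).strip()


def add_leaf_sizes(node):
    """Give every childless node "size": 1, as the script above does."""
    children = node.get("children")
    if children:
        for child in children:
            add_leaf_sizes(child)
    else:
        node.pop("children", None)
        node["size"] = 1


def build_tree(lines, root_name="Astronomical Object"):
    root = {"name": root_name, "children": []}
    stack = [root]  # stack[d] is the most recent node seen at depth d
    for line in lines:
        parsed = parse_numbered_line(line)
        if parsed is None:
            continue  # skip blank or unnumbered lines
        depth, name = parsed
        node = {"name": name}
        del stack[depth:]  # forget nodes at this depth or deeper
        stack[-1].setdefault("children", []).append(node)
        stack.append(node)
    add_leaf_sizes(root)
    return root


sample = ["1. Planet", "1.1. [Type]", "1.1.1. Terrestrial",
          "1.1.2. Gas Giant", "1.2.  [Feature]"]
print(json.dumps(build_tree(sample), indent=1))
```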

Since only the level of indentation matters to this process, it could be repeated on any data that has the form

* Planet
** [Type]
*** Terrestrial
*** Gas Giant
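For data in this asterisk form the dot delimiter no longer applies, so the depth has to come from counting the leading asterisks instead. A minimal sketch of that step, with the hypothetical helper parse_star_line, might look like this; the resulting depth and name are the two things the script above needs for each row.

```python
import re

# One or more leading asterisks, then the entry name.
STAR_PREFIX = re.compile(r"^(\*+)\s*(.*)$")


def parse_star_line(line):
    """Return (depth, name) for a line like '** [Type]', else None."""
    match = STAR_PREFIX.match(line.strip())
    if match is None:
        return None
    return len(match.group(1)), match.group(2).strip()


sample = ["* Planet", "** [Type]", "*** Terrestrial", "*** Gas Giant"]
pairs = [parse_star_line(line) for line in sample]
print(pairs)
```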

So that’s it. You’ll see that in the source code for the hierarchical examples there is a reference to flare.json. Substitute in a reference to your own file containing the JSON output and be sure to include that file in the directory on your web server.
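Because the JSON is assembled by hand, it may be worth checking it before uploading. The following is a rough sketch with the hypothetical helpers check_node and check_json_text: it confirms that the text parses and that every node has a "name" and, where present, a list of "children". It assumes nothing about D3 beyond the nested format shown earlier.

```python
import json


def check_node(node, path="root"):
    """Return a list of problems found in a nested name/children node."""
    if not isinstance(node, dict):
        return ["%s: expected an object" % path]
    problems = []
    if not isinstance(node.get("name"), str):
        problems.append("%s: missing or non-string 'name'" % path)
    children = node.get("children", [])
    if not isinstance(children, list):
        problems.append("%s: 'children' is not a list" % path)
        return problems
    for i, child in enumerate(children):
        problems.extend(check_node(child, "%s.children[%d]" % (path, i)))
    return problems


def check_json_text(text):
    """Parse the text and check its structure; an empty list means no problems."""
    try:
        tree = json.loads(text)
    except ValueError as err:  # json.JSONDecodeError is a subclass of ValueError
        return ["not valid JSON: %s" % err]
    return check_node(tree)
```

For example, calling check_json_text on the contents of your output file (say a hypothetical astro.json) should return an empty list before the file is copied to the server.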

Of course it’s a poor substitute for learning the language itself as that enables you to construct your own innovative visualisations but it gets you started.

Machine Learning and Analytics based in London, UK