January 20, 2016 James Thomson

R2D3 Updates

Since I launched R2D3 around a year ago, I've slowly been adding functionality to the package. I thought I'd post a quick update to highlight some of the latest additions: Venn diagrams, word clouds, and a cross tab heat map.

First, the Venn diagram. I based the D3 format on the work done by Ben Frederickson.

Producing the diagram follows the standard process I created for R2D3. First, create the json with the data. In this case, call the jsonOverlaps() function on the browsers data that comes with the package. This data lists people (the item variable) and browsers (the group each item belongs to). Items need to belong to more than one group for the Venn to show an overlap. You can specify the levels of overlap you want to consider with the overlaps argument. Then call the D3 function, D3Venn(), specifying the json data object and the output location for the file containing the D3 visualisation.

JSON<-jsonOverlaps(browsers, overlaps = 4)
D3Venn(JSON, file_out="browsers_venn.html")
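
To make the item/group shape concrete, here is a rough base R sketch with a small made-up data frame. The column names item and group and the people in it are my own assumptions for illustration, not the layout of the packaged browsers data, and this is not how jsonOverlaps() works internally. It only shows why an item has to appear in more than one group before any overlap exists.

```r
# Hypothetical data: one row per (item, group) membership.
# Column names and values are assumptions for illustration only.
people <- data.frame(
  item  = c("anna", "anna", "ben", "ben", "ben", "cara"),
  group = c("Chrome", "Firefox", "Chrome", "Firefox", "Safari", "Chrome"),
  stringsAsFactors = FALSE
)

# Collect the items belonging to each group
sets <- split(people$item, people$group)

# Count the items shared by each pair of groups
pairs <- combn(names(sets), 2, simplify = FALSE)
overlap_sizes <- vapply(
  pairs,
  function(p) length(intersect(sets[[p[1]]], sets[[p[2]]])),
  integer(1)
)
names(overlap_sizes) <- vapply(pairs, paste, character(1), collapse = "&")

print(overlap_sizes)
```

Here cara uses only one browser, so she adds to the size of the Chrome circle but not to any overlap.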

Here are the results for the made-up data, showing the overlap in web browser use.

See the Pen Venn Diagram by james thomson (@jamesthomson) on CodePen.

Next, the cross tab heat map. In my role I'm regularly asked to compare two categorical groups, often two different segmentations. The usual starting point is a cross tab of the two segmentations, with row percentages and column percentages. I've turned this into a standard D3 output which shows the total frequencies, the row percentages and the column percentages in a heat map style. I based it on an existing D3 heat map example.

The input data here is a data frame containing the two categorical groups. The jsonXtab() function calculates the frequencies and the row and column percentages and formats them as json. Then D3XtabHeat() converts the json to the output.

data<-data.frame(airquality$Month, airquality$Temp)
# build the json from the data frame before plotting (step described above)
json<-jsonXtab(data)
D3XtabHeat(json, file_out="heat_map.html")
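
For anyone who wants to see the three tables behind the heat map, here is a minimal base R sketch of the same calculation. It is not the code inside jsonXtab(). Binning Temp into three bands, the band labels and the column names are my own choices to keep the table small.

```r
# Two categorical variables from the built-in airquality data.
# The temperature bands are an assumption made for readability.
data <- data.frame(
  month     = airquality$Month,
  temp_band = cut(airquality$Temp,
                  breaks = c(50, 70, 80, 100),
                  labels = c("cool", "mild", "hot"))
)

# Total frequencies
freq <- table(data$month, data$temp_band)

# Row percentages: each month sums to 100 across the bands
row_pct <- 100 * prop.table(freq, margin = 1)

# Column percentages: each band sums to 100 across the months
col_pct <- 100 * prop.table(freq, margin = 2)

print(freq)
print(round(row_pct, 1))
print(round(col_pct, 1))
```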

Here are the results. The drop down menu allows you to switch between frequencies, column percentages, and row percentages.

See the Pen Cross Tab Heat Map by james thomson (@jamesthomson) on CodePen.

Lastly, the word cloud. A word cloud visually represents the frequencies of words in a document. I used an existing d3 word cloud implementation.

The data here is two vectors, one containing the unique words and one containing their frequencies. It is also possible to provide just a list of words without the frequencies, in which case the frequencies are calculated from the word list. The function jsonwordcloud() forms the json object and then D3WordCloud() creates the D3 output.

words=c("big", "data", "machine", "learning", "wordcloud", "R", "d3js", "algorithm", "analytics", "science", "API")
freq=c(50, 50, 30, 30, 100, 10, 10, 10, 5, 5, 5 )
json<-jsonwordcloud(words, freq)
D3WordCloud(json, file_out="word_cloud.html")
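
If you start from a raw list of words rather than unique words and counts, the frequency step can be sketched in base R like this. It is only an illustration of the idea with a made-up word list, not the package's own implementation.

```r
# Hypothetical raw word list with repeats
raw_words <- c("data", "science", "data", "R", "data", "R")

# Count each word and order from most to least frequent
counts <- sort(table(raw_words), decreasing = TRUE)

# Split into the two vectors described above
words <- names(counts)
freq  <- as.vector(counts)

print(data.frame(words, freq))
```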

Here are the results. Not the most exciting word cloud, but you get the idea.

See the Pen Word Cloud by james thomson (@jamesthomson) on CodePen.

You can download and install the R2D3 package from my GitHub using the devtools package and its install_github() function.


About the Author

James Thomson: I have over 10 years' experience working in analytics and statistics. I worked as a Statistician in the Pharma industry before branching out into analytics and data mining. I'm currently enjoying learning about data visualisation and machine learning. Over the years I've worked with: GSK, Nectar, BP, Ford, Whitbread, Wunderman, Jaguar, RNIB, Virgin Media & Channel4. Please visit my own blog on analytics and music.