January 29, 2012 Mark Bulling

Some new functions I’ve discovered in R


I’ve been writing a fair amount of R recently and have been going through a good learning period. Here are some functions I’ve discovered (mainly from plyr and reshape) that I thought I would share:

merge_all is a good way to merge several data frames in one call, rather than chaining multiple merge commands. The key thing is to put the data frames to merge in a list, e.g. merge_all(list(df1, df2, df3), by = "key").
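For anyone without reshape installed, the same idea can be sketched in base R by folding merge over a list with Reduce. This is a stand-in for merge_all rather than its implementation, and the data frames df1, df2, df3 and their columns are made up for illustration.

```r
# Three small, made-up data frames that share a "key" column
df1 <- data.frame(key = 1:3, a = c(10, 20, 30))
df2 <- data.frame(key = 1:3, b = c("x", "y", "z"))
df3 <- data.frame(key = 1:3, flag = c(TRUE, FALSE, TRUE))

# Fold merge() over the list: (df1 merged with df2), then merged with df3
merged <- Reduce(
  function(left, right) merge(left, right, by = "key"),
  list(df1, df2, df3)
)

print(merged)
```

The result has one row per key and the columns key, a, b and flag.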

mutate is a good data manipulation function, similar to transform (both make for much cleaner code when creating a number of variables within a data frame). The key difference is the iterative nature of mutate: variables created earlier in the call can be used to define later ones.

So, whilst transform(data.frame, variablex = 5, variabley = variablex + 1) won’t work (unless variablex already exists), mutate(data.frame, variablex = 5, variabley = variablex + 1) will.
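A minimal sketch of the difference, assuming plyr is installed and that no object called variablex exists elsewhere in scope. The data frame df and its id column are hypothetical.

```r
library(plyr)  # assumes the plyr package is installed

# A small, made-up data frame
df <- data.frame(id = 1:3)

# mutate evaluates its arguments one at a time, so variabley can see variablex
mutated <- mutate(df, variablex = 5, variabley = variablex + 1)
print(mutated)

# transform evaluates every argument against the original data frame, where
# variablex does not exist yet; capture the error message instead of stopping
transform_result <- tryCatch(
  transform(df, variablex = 5, variabley = variablex + 1),
  error = function(e) conditionMessage(e)
)
print(transform_result)
```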

colwise is a good function for data aggregation when working with wide files. For example, colwise(mean)(data.frame) will return the average of each column in data.frame (there are other ways of doing this, but this makes for quite nice syntax). This example only works if all columns in the data frame are numeric. To get around this, there are two options: use either numcolwise(mean)(data.frame) or colwise(mean, is.numeric)(data.frame). Both accomplish the same thing, restricting the data frame to its numeric columns before applying the function.
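As a rough illustration, assuming plyr is installed, the two options can be compared on a small made-up data frame with one non-numeric column (the column names are hypothetical):

```r
library(plyr)  # assumes the plyr package is installed

# A made-up "wide" data frame with one character column
df <- data.frame(
  group  = c("a", "b", "c"),
  height = c(1.5, 1.6, 1.7),
  weight = c(60, 70, 80),
  stringsAsFactors = FALSE
)

# Option 1: numcolwise only looks at numeric columns
means_a <- numcolwise(mean)(df)

# Option 2: colwise with a column test passed as the second argument
means_b <- colwise(mean, is.numeric)(df)

print(means_a)
print(means_b)
```

Both calls return a one-row data frame containing only the height and weight columns.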

I’m still getting my head around Higher Order Functions in R (John Myles White has a very good intro to these here) and how to use them, but they seem to be a nice way of writing easy-to-understand and elegant code:

# Keep only the even numbers from 1 to 10
small.even.numbers <- Filter(function (x) {x %% 2 == 0}, 1:10)
# Sum a vector by folding `+` over its elements
my.sum <- function (x) {Reduce(`+`, x)}
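To see what these produce, here is a short base R sketch. It repeats the two definitions so it runs on its own, and adds Map, which is not used in the post but belongs to the same family of functions.

```r
# Definitions repeated from the post so this block runs on its own
small.even.numbers <- Filter(function(x) x %% 2 == 0, 1:10)
my.sum <- function(x) Reduce(`+`, x)

print(small.even.numbers)  # the even numbers between 1 and 10
print(my.sum(1:10))        # 55

# Map (an addition, not from the post) applies a function element-wise
# across several vectors and returns a list
products <- Map(function(x, y) x * y, 1:3, 4:6)
print(unlist(products))
```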
