R: The Dummies Package

R-2.9.2 was released in August. While R can be considered stable and battle-ready, it is also far from stagnation. It is humbling to see such an intelligent and vibrant community helping CRAN grow faster than ever. Every day I see a new package or read a new comment on R-Help gives me pause to think.

As much as I like R, on occasion I will find myself lost in some dark corner. Sometimes, I find light. Sometimes I am gnashing teeth and wringing hands. Frustrated. In a recent foray, I found myself trying to do something that I thought exceedingly trivial: expanding character and factor vectors to dummy variables. There must be some function, but what? Trying ?dummy didn’t turn up anything. Surely some else must have encountered this and provided a package. I went to the Internet and sure enough the R-wiki was here to save me. And looking even harder, I found some who had treaded before me on the R-Help archives. It turns out, it’s simple. Expanding a variable as a dummy variable can be done like so:


x <- c(2, 2, 5, 3, 6, 5, NA)
xf <- factor(x, levels = 2:6)
model.matrix( ~ xf - 1)

Two problems. The first problem is that without an external source (Google), I would have never stumbled upon what I wanted. ( Thanks Google!) I understand it now, but for what I wanted to do, I would never have thought, “oh, model.matrix.”

The second problem is the arcane syntax, wtf <- ~ xf - 1. I get it now, but it took me some time to figure out what was going on. I get it, but why not just dummy(var)? This is what I want to do.

The solution on the wiki wasn’t quite what I was looking for. For instance, you can’t say:

model.matrix( ~ xf1 + xf2 + xf3- 1)

It turns out, you can only expand one variable at a time. Well, this is not good. I know that you could solve this with some sapply’s and some tests, but next time I might forgot about how to do it. So with a couple of spare hours, I decided that the next guy, wouldn’t have to think about it. He could just use my dummies package.

Like the R-wiki solution, the dummies package provides a nice interface for encoding a single variable. You can pass a variable -or- a variable name with a data frame. These are equivalent:


dummy( df$var )
dummy( "var", df )

Moreover, you can choose the style of the dummy names, whether to include unused factor level, to have verbose output, etc.

But more than the R-wiki solution, dummy.data.frame offers to something similar to data.frames. You can specify which columns to expand by name or class and whether to return non-expanded columns.

The package dummies-1.04 is available in CRAN. Comments and questions are always appreciated.

3 Responses to R: The Dummies Package

  1. erikr says:

    Hello,

    The word “dummy” does appear to show up in the ?model.matrix help system, but not in any of the searchable fields! I think with the fast computers we have now, help.search (or the search engine in the HTML version of help.start() ) should have an option to do afull-text search, in case you do not have the internet available. It only searches certain fields of the help file as is, which is why help.search(“dummy”) does not return information on model.matrix.

    I didn’t believe you that you couldn’t get a full, overparameterized model matrix using ?model.matrix with multiple factors…but I could not find a way!

    Nice work!

    • Christopher says:

      Thanks Eric,

      I did not find a way to get a full, overparametized model matrix, either. It is that use case that prompted me to write dummies.

      I did not know that R did not do a full text search with help.search. I am surprised and agree with you that this should be implemented. ‘ Wonder why it is not.

      Chris

Leave a reply to Christopher Cancel reply