The hash package: hashes come to R

July 26, 2009

Perl has hashes. Python has dictionaries. Why doesn’t R have an equivalent? Hash tables and associative arrays are indispensable tools for the programmer. One of the most common and basic tasks of a programmer is to “look up” or “map” a key to a value. In fact, there are projects whose sole raison d’ĂȘtre is making the hash as fast and as efficient as possible.

R actually has two equivalents, both lacking. The first is R’s named vectors and lists. Elements of vectors and lists can be accessed by name, through the standard R methods:


obj$name
obj['name']
obj[['name']]

Vectors are not stored using internal hash tables and as they grow large, performance can suffer. The performance impact is tangible even on small lists. For programs doing many look-ups or look-ups on many objects, this can create a bottleneck.

R’s environments are much closer to Perl hashes and Python’s dictionary. The structure of the environment is a hash table internally and look-ups do not appreciably degrade with object size. To use a R environment, you need to create it and assign key-value pairs to it.


hash = new.env(hash=TRUE, parent=emptyenv(), size=100L)
assign(key, value, hash)
get(key, hash)

We can even get the keys from the hash with the ls function:


ls( env=hash )

This works well and perfomance is good. So what’s the problem?

Usability. In designing, the S language, John Chambers put much thought into how the analyst and statistician interact with data. All varaibles are designed to be vectors and a standard set of accessors( $, [, [[ ) were defined to retrieve and set slices, subsets or elements of the data. The problem is that R environments don’t follow this pattern. And this is where the hash package comes in.

The hash package is designed to provide an R-syntax to R’s environments and give programmers a hash. The package provides one constructor function, hash that will take a variety of arguments, always doing the right thing. All of the following work:


hash()
hash( keys=c('foo','bar','baz'), values=1:3 )
hash( foo=1, bar=2, baz=3 )
hash( c( foo=1, bar=2, baz=3 ) )
hash( list( foo=1, bar=2, baz=3 ) )
hash( c('foo','bar','baz'), 1:3 )

It pretty much does what you mean.

The standard accessors: [, [[, $ are also available.


h <- hash( c('foo','bar','baz'), 1:3 )
h[ c('foo','bar') ]
h[[ 'foo' ]]
h$foo

As does their corresponding replacement methods.


h <- hash( c('foo','bar','baz'), 1:3 )
h[ c('foo','bar') ] <- c( 'fred', 'wilma' )
h[[ 'foo' ]] <- 'dino'
h$foo <- 'bam bam'

There you have it, hashes for R.

I (CB) am the maintainer of the package, so if you have any suggestions for the package, please let me know.