The Power of Predictive Analytics: Creating New Markets (Part 1)

November 12, 2009

Predictive models have the power to create new markets, a power that is not all that common in technology. This is the first of several posts that contain case studies describing how companies have used predictive analytics to create new markets.

Motorcycle Insurance. For many years, it was difficult to get insurance if you rode a motorcycle. Motorcyclists have more accidents than other drivers, and the simplest course of action for an insurer was not to cover them. For most drivers, knowing a few facts, such as gender, age, and the number of miles driven to work, gave an insurance company enough information to set the premium for a policy.

This segment of the automobile insurance market is called the “standard segment” and includes 80%-90% of the market. The other segment, called the “nonstandard segment”, includes drivers with accidents, drivers of motorcycles and high performance cars, older drivers, and younger drivers. Most insurance companies focused on the standard segment from the 1950s through the 1980s, and that market was quite competitive during this period.

Progressive Insurance took a different tack in the 1970s. It developed an analytic model that could quantify the risk of someone who rode a motorcycle and then priced the policy accordingly. Motorcycle insurance was part of the nonstandard segment, where there was much less competition. This segment also had a higher barrier to entry, since pricing premiums (well) in this market required (simple) analytic models.

By developing an appropriate risk model, Progressive was able to create a new market, which became an important driver of its growth in the 1970s. From 1975 to 1978, premium income grew from $38 million to $112 million, as Progressive solidified its leadership in the nonstandard market.

Source: The Progressive Corporation,

Online Text Ads. Google introduced its online text ads (to the right of search results) in January 2000. The ads were sold by a sales representative on a cost-per-thousand-impressions (CPM) basis. This was the way most ads were sold at that time, although banner ads (not text ads) were the dominant form of online advertising. The text ads didn’t generate a lot of money at the beginning.

In the Spring of 2000, the online banner ad market crashed. In response, Google changed its business model to a self-serve model. With a self-serve model, ads were not sold by a sales representative but instead through an online, self-serve web page. It got this idea from (which later became Overture, which later was bought by Yahoo!).

In October 2000, Google introduced AdWords with the slogan: “Have a credit card and 5 minutes? Get your ad on Google today.” Ads were still priced by CPM, but there was no longer a sales representative. (By the way, Amazon used the same model in August 2006 when it introduced EC2. With a credit card and 5 minutes you could use an online computer and pay by the hour.)

In 2001, Google’s AdWords revenue approached $85 million, much less than the $288 million that Overture earned. In contrast to Google’s use of a CPM model, Overture used an auction model: the higher you bid for an ad, the more likely your ad would appear.

A problem with the auction model as employed by Overture was that a high bid could force an ad to the top, but no one would necessarily click on it unless it was relevant. Relevance, using information extracted from text, was something Google understood well.

Google built a predictive model (a response model) to predict whether a given user would click on a given ad. Google then integrated this response model with the rankings provided by the auction. Ads with higher expected responses would be moved up higher in the rankings, and those with lower expected responses would be moved lower in the rankings. Ads with the highest rankings would then be displayed. This new model that integrated an online auction with a response model was introduced into AdWords in February 2002.
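The effect of combining a bid with a predicted response can be sketched in a few lines of R. This is only an illustration: the ads, bids and click probabilities are made up, and the scoring rule (bid multiplied by predicted click probability) is an assumption, since the text does not say exactly how the two signals were combined.

```r
# Hypothetical ads: bid in dollars per click and a model-predicted click probability.
ads <- data.frame(
  ad      = c("A", "B", "C"),
  bid     = c(2.00, 1.00, 0.50),
  p_click = c(0.01, 0.05, 0.06),
  stringsAsFactors = FALSE
)

# Pure auction ranking: the highest bid comes first.
by_bid <- ads$ad[order(-ads$bid)]

# Assumed integrated ranking: expected value per impression = bid * predicted click probability.
ads$score <- ads$bid * ads$p_click
by_score <- ads$ad[order(-ads$score)]

print(by_bid)
print(by_score)
```

In this toy data the highest bidder, ad A, drops to last place once its low predicted response is taken into account.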

From a modeling perspective, Google has introduced two disruptive technologies: 1) PageRank; and 2) integrating relevance through a response model into pay-per-click auctions for online ads.

With this integrated model, Google created a new market (online text ads) that it has dominated since 2002.

Source: John Battelle, The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture.

Health and Status Monitoring. Over the past several years, Open Data has developed predictive analytic models to monitor the operations of complex systems, such as data centers, network operations centers, and worldwide payment systems. I’ll describe these types of models in a future post.


mapReduce Reduced (& Ported to R)

September 10, 2009

Saying MapReduce and Sector’s implementation of User Defined Functions (UDF) over a storage cloud are innovative is only partly correct. The programming models they implement are quite old. Any programmer versed in functional languages recognizes this.

But mapReduce does come with two important innovations. The first is a framework that is specifically designed for large clusters of low-priced, commodity servers. What mapReduce has done is take concurrent programming models and apply them to the economic realities of the day. Large and formerly expensive computations can thus be accomplished cheaply when distributed to inexpensive machines. The complexity of managing individual machines and tasks is masked from the coder, who does not need to worry about which tasks get run on which machine.

The second innovation is the recognition that a large class of practical problems (but not all) can be solved using the mapReduce framework. Because the first innovation allowed solutions to problems that were intractable with conventional techniques, technologists began framing problems to run with mapReduce. They had a hammer; everything began looking like a nail. Fortunately, there were a lot of nails.

As mentioned above, the algorithmic pattern itself is not new. It is actually decades old and is a throwback to the early days of functional programming (think Lisp!) and big mainframes. The method was rediscovered, applied over a distributed virtual filesystem, applied to Google’s toughest problems, and renamed mapReduce; the rest is history.

The mapReduce algorithm provides a framework for dividing a problem and working on it in parallel. There are two steps: a map step and a reduce step. Although the two steps must proceed serially (map must precede reduce), each step can be accomplished in parallel. In the map step, data is mapped to key-value pairs. In the reduce step, the values that share the same key are transformed (‘reduced’) by some algorithm. More complexity can be added; other functions can be used; arbitrary UDFs can be supported, as in Sector. But, in essence, the algorithm is a series of function calls.
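The two steps can be sketched in base R with a toy word count. This is an illustration of the pattern only, not any particular framework's API; the input vector and all variable names are hypothetical.

```r
# Toy input: one record per word.
words <- c("a", "b", "a", "c", "b", "a")

# Map step: turn every record into a key-value pair (the word, and a count of 1).
pairs <- lapply(words, function(w) list(key = w, value = 1))

# Group the values that share a key.
keys    <- vapply(pairs, function(p) p$key, character(1))
values  <- vapply(pairs, function(p) p$value, numeric(1))
grouped <- split(values, keys)

# Reduce step: collapse the values for each key, here by summing them.
counts <- vapply(grouped, sum, numeric(1))
print(counts)
```

Each call to the mapper touches one record and each call to the reducer touches one key, which is why both steps can be spread across workers.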

The pattern is fairly common, and most programmers have used the mapReduce pattern without knowing it, thinking about it, or calling it mapReduce. In fact, much of SAS is set up in a mapReduce style. SAS programs are composed of DATA steps and PROCEDURE steps. In certain problems, the DATA step can be a mapper and either a DATA or PROCEDURE step can function as a reducer. If you disregard the distribution of the problem across servers, I’d venture to say that every SAS programmer has followed this paradigm, often numerous times in the course of a single program. This simplicity allowed the pattern to be applied to a wide range of problems, the second innovation.

The same can be said for our favorite statistical programming language, R. In fact, because R is a vectorized, functional language, mapReduce boils down to a single line of code:

apply( map(data), reduce )

Here, map and reduce are the user-defined functions for the mapper and reducer respectively, and apply distributes the problem in parallel. Any R programmer who was taking advantage of R’s vectorization was probably writing mapReduce programs from day one. Most often, the jobs were vectorized on an individual, single-core machine.
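To make the one-liner concrete, here is a minimal base R sketch. The data frame, the mapper (which groups values by key) and the reducer (which sums each group) are hypothetical choices for illustration, and sapply stands in for the generic apply.

```r
# Toy data: a grouping column g and a value column v.
data <- data.frame(g = c("x", "y", "x", "y"), v = c(1, 2, 3, 4))

# Hypothetical mapper: key each value by its group.
map <- function(d) split(d$v, d$g)

# Hypothetical reducer: collapse one group's values into a single number.
reduce <- function(values) sum(values)

# The one-line pattern, with sapply playing the role of apply.
result <- sapply(map(data), reduce)
print(result)
```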

Coupled with R packages such as Rmpi, rpvm and nws, the apply-map-reduce pattern can be distributed to several machines. More recently, the multicore package has made this pattern easy to run across the cores of a single machine.
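As a rough sketch of swapping the apply step for a parallel one, the example below uses mclapply from the parallel package that ships with current versions of R (it took over the functionality of multicore). The choice of two cores and the iris-based toy computation are assumptions for illustration. Forking is not available on Windows, so the sketch falls back to a single core there.

```r
library(parallel)

# Assumed core count; mclapply only supports one core on Windows.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Map: key petal lengths by species. Reduce: take the mean of each group.
groups <- split(iris$Petal.Length, iris$Species)

serial <- lapply(groups, mean)                       # ordinary apply step
forked <- mclapply(groups, mean, mc.cores = cores)   # same step, run in parallel

print(unlist(forked))
```

Only the function doing the applying changes; the map and reduce logic is untouched.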

We recognized this several years ago, wrote some simple code and have been distributing work across available servers for some time. More recently, we have released our work as an open source package on CRAN for implementing this pattern. Our implementation closely follows Google’s mapReduce paper, is written in pure R and is agnostic to the parallelization backend, whether rpvm, Rmpi, nws, multicore, or others. (Revolution Computing recognized this as a good idea and adopted the same approach with their ParallelR package.)

Using mapReduce is exceedingly simple. The package provides a single function, mapReduce. The function is defined as:

mapReduce( map, ..., data, apply = sapply)


map     An expression evaluated on data, yielding a vector that is subsequently
        used to split the data into parts that can be operated on independently.
...     The reduce step(s). One or more expressions that are evaluated for each
        of the partitions made.
data    An R data structure such as a matrix, list or data.frame.
apply   The function used for parallelization.
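For readers curious how a function with this signature could work, here is a minimal sketch built on R's non-standard evaluation. It is not the package's source code: the name map_reduce_sketch and its internals are hypothetical, and it assumes the apply function returns either a list or an sapply-style matrix.

```r
map_reduce_sketch <- function(map, ..., data, apply = sapply) {
  caller <- parent.frame()
  # Evaluate the map expression inside data to get one key per row.
  keys <- eval(substitute(map), data, caller)
  # Capture the named reduce expressions without evaluating them.
  reducers <- as.list(substitute(list(...)))[-1]
  # Split the rows by key, then evaluate every reducer within each part.
  parts <- split(data, keys)
  reduce_part <- function(part) {
    sapply(reducers, function(expr) eval(expr, part, caller))
  }
  out <- apply(parts, reduce_part)
  # One row per key: bind list results, or transpose an sapply-style matrix.
  # (With sapply and a single reducer, the result is a one-row matrix instead.)
  if (is.list(out)) do.call(rbind, out) else t(out)
}

res <- map_reduce_sketch(
  map = Species,
  mean.petal.length = mean(Petal.Length),
  max.petal.length = max(Petal.Length),
  data = iris
)
print(res)
```

Passing lapply (or a parallel equivalent) as the apply argument leaves the rest of the function unchanged, which is the sense in which such a design can stay agnostic to the backend.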

Take the iris dataset, data(iris). Using mapReduce, we can quickly compute the mean and max petal lengths like so:

mapReduce(
    map = Species,
    mean.petal.length = mean(Petal.Length),
    max.petal.length = max(Petal.Length),
    data = iris
)

           mean.petal.length max.petal.length
setosa                 1.462              1.9
versicolor             4.260              5.1
virginica              5.552              6.9

The mean and max petal lengths are computed for each Species and returned as a matrix with two columns, mean.petal.length and max.petal.length, and one row for each Species.

Because we have used expressions in our implementation, you can use almost any R function for the map and reduce steps. (Most work; there are a few edge-case exceptions.) For example, suppose we wanted to do the above calculation but wanted versicolor and virginica lumped together.


  mean.petal.length max.petal.length
s             1.462              1.9
v             4.906              6.9
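One way to reproduce this grouping in base R is sketched here. It assumes the map key is the first letter of Species, an assumption inferred from the row labels s and v; tapply plays the role of the reduce step.

```r
# Assumed map key: the first letter of the species name ("s" or "v").
key <- substr(as.character(iris$Species), 1, 1)

# Reduce each group of petal lengths twice: once with mean, once with max.
lumped <- cbind(
  mean.petal.length = tapply(iris$Petal.Length, key, mean),
  max.petal.length  = tapply(iris$Petal.Length, key, max)
)
print(lumped)
```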

There you have it, simple yet powerful mapReduce in R. mapReduce can be downloaded from any CRAN mirror. If you get a chance to use it, please let me know what you think.