The Creative Use of Data: Models within Models
By Sam Koslowsky, V.P., Strategic Analytics, Harte-Hanks Marketing Planning & Analysis
This article originally appeared in "Research: Direct Marketing Association Research Council Newsletter", June 1997
There is much debate within the database marketing community concerning the superiority of modeling approaches. Some marketers remain committed to tried-and-true RFM. Others are using one of the more intuitive "tree" analysis programs such as CHAID or CART. Logistic regression has its strong supporters. Neural networks, with an implied relationship to biological thinking, are attractive to many. Genetic algorithms, which recently have come on the scene, are gaining support. Not surprisingly, vendors that market the various software strongly support their own products.
My focus here has little to do with technique. All of the aforementioned methods are good. I contend, however, that data – or, more precisely, the creative use of data – is just as critical as the approach. In addition to providing incremental predictive value, the intelligent use of data frequently offers an intuitive understanding of – and corresponding comfort about – the dynamics behind the segmentation algorithm.
What I would like to focus on in this article is the use of:
- Model scores as predictors among other predictors.
- The rate of change of model scores as predictors.
Behavioral, demographic and lifestyle data are most frequently used as predictors in a model. The appropriately weighted combination of the final predictors results in a numerical value, known among database marketers as a "score." The score represents how likely an individual is to demonstrate a certain behavior.
The result of a model often looks something like this:
Likelihood = (c1 x P1) + (c2 x P2) + (c3 x P3) + ….
The P's are the predictors, and the c's are the weights that are derived via the modeling process.
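As a minimal sketch of the weighted sum above, the following Python computes a score from a list of weights and a list of predictor values. The function name `score` and all numbers are hypothetical and chosen only for illustration; in practice the weights would come out of the modeling process.

```python
def score(weights, predictors):
    """Weighted sum of predictor values: (c1 x P1) + (c2 x P2) + ..."""
    if len(weights) != len(predictors):
        raise ValueError("each predictor needs exactly one weight")
    return sum(c * p for c, p in zip(weights, predictors))


# Hypothetical weights (the c's) and predictor values (the P's).
weights = [0.5, 2.0, -1.0]
predictors = [4, 1, 3]

likelihood = score(weights, predictors)
print(likelihood)
```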
Using a Model Score Within a Model
A mutual fund manager established an objective of mailing to existing customers with a propensity to open an aggressive growth fund. He commissioned his statistician to develop a model that would assist him in accomplishing his goal.
The resulting model performed well. One of the predictors was ownership of a municipal bond fund. In our previous formulation, this factor would be one of the P's.
Two methods were available to describe who owned such a fund. One was more direct – an ownership-or-lack-of-ownership binary variable; that is, a value of "1" indicated ownership and "0" non-ownership. This variable proved to be significant and appeared in the final model.
Now, for the second method of capturing the effect of municipal fund ownership:
The source database also contained a similar field – a score developed from a previous model that summarized an individual's "resemblance" to a municipal bond fundholder. Rather than a binary variable, this score took the form of a value from 0 to 100. The closer to 100, the more an individual "resembled" a municipal bond fundholder. This variable also proved to be significant and appeared in the final model.
This second method has several advantages. Although this predictor (the score) is based on a model, its continuous nature offers better predictive power than the prior "0/1" treatment.
Additionally, this continuous predictor provides an understanding of – and corresponding comfort about – the dynamics of the final model equation. By its very nature the predictor allows us to answer the question: how does the chance of owning an aggressive fund change as the likelihood of owning a municipal bond fund changes?
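One way to picture that question is to hold everything else fixed and vary only the municipal bond score. The sketch below assumes a logistic form for the final model, which the article does not specify, and uses an invented intercept and weight; neither the sign nor the size of the weight is taken from the actual model.

```python
import math

# Hypothetical values; a real model would estimate these from data.
INTERCEPT = -3.0
WEIGHT_PER_SCORE_POINT = 0.02


def aggressive_fund_probability(muni_score):
    """Logistic transform of a linear score with one predictor."""
    linear = INTERCEPT + WEIGHT_PER_SCORE_POINT * muni_score
    return 1.0 / (1.0 + math.exp(-linear))


# Trace the predicted probability across the 0-100 score range.
for muni_score in (0, 25, 50, 75, 100):
    p = aggressive_fund_probability(muni_score)
    print(f"muni score {muni_score:3d} -> probability {p:.3f}")
```

A binary ownership flag would yield only two points on this curve, whereas the continuous score shows how the prediction moves across the whole range.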
The Change in Model Scores
Marketers know that a change in lifestyle provides significant opportunities. A move from one location to another, a job promotion, a birth or any such event presents a time in people's lives at which they are likely to be the most receptive to certain offers.
Sometimes, a sudden change in behavior with respect to spending, income or insurance level, for example, can signify such a life event. Incorporating these behavioral dimensions into a model, when available, can provide real value.
As we know, scores are based on a set of predictors, the P's that emerge from a model. While the predictors stay the same throughout the useful life of the model, the values of the predictors very often change for specific individuals.
For example, let's look at the following:
Predictors: | Predictor Value at Time #1: | Predictor Value at Time #2: |
Marital status | Single | Married |
Total Investments | 3 | 9 |
This consumer was single at time #1. Additionally, he made only 3 investments prior to the time that the model score was calculated. When the model was reapplied a year later, at time #2, he was married and had increased his total investments to 9. These differences would lead to a change in this customer's score at time #1 vs. time #2.
A frequently overlooked way of inferring lifestyle events involves examining the change in model scores, and then using that change as a potential predictor; that is, as one of the P's. I mentioned previously that using a model within a model can provide predictive and intuitive value. Here, I'm taking this notion one step further by measuring the velocity of change in scores. Let's turn to our mutual fund example, and more closely investigate three consumers.
Customer: | Score at Time #1: | Score at Time #2: | Absolute Change in Scores | Relative Change in Scores |
100 | 342 | 512 | 170 | 49.7% |
200 | 744 | 936 | 192 | 25.8% |
300 | 401 | 395 | -6 | -1.5% |
The Score at time #1 column is the municipal bond score of a customer at a given point in time. The score at time #2 is the reapplication of the same model – say, one year later at time #2. We already have outlined how this change in score can take place.
There are two more variables that we can calculate. Shown in the two right-most columns in the above table, the absolute change in scores is defined as score at time #2 less score at time #1. The last column, the relative difference, is equal to the absolute difference divided by score at time #1. Customer #100's relative difference was almost 50 percent, indicating a possible major shift in lifestyle. On the other hand, #300's score was fairly steady, perhaps pointing to a more stable lifestyle during this one-year period between model scoring.
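The two derived variables can be computed directly from the pair of scores. The sketch below applies the definitions just given to the three customers in the table; the function names are illustrative, and the guard against a zero score at time #1 is an added assumption, since the article does not say how that case was handled.

```python
def absolute_change(score_t1, score_t2):
    """Score at time #2 less score at time #1."""
    return score_t2 - score_t1


def relative_change(score_t1, score_t2):
    """Absolute change divided by the score at time #1."""
    if score_t1 == 0:
        raise ValueError("relative change is undefined when the time #1 score is 0")
    return absolute_change(score_t1, score_t2) / score_t1


# Customer number -> (score at time #1, score at time #2), from the table.
scores = {100: (342, 512), 200: (744, 936), 300: (401, 395)}

for customer, (t1, t2) in scores.items():
    print(customer, absolute_change(t1, t2), f"{relative_change(t1, t2):.1%}")
```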
Both of these new potential predictors were evaluated to determine their value for inclusion in the aggressive fund model. The relative change predictor was significant, and provided enough lift to be included in the model (which currently remains in production).
It can be argued that it is sufficient to look at the score at time #2. I contend, however, that there can be a major difference in mindsets between – say – the following two individuals.
Customer #: | Score at Time #1: | Score at Time #2: | Relative Change in Scores |
100 | 432 | 585 | 35.4% |
101 | 583 | 585 | 0.3% |
Although both customers have the same score at time #2, their scores at time #1 are significantly different. The relative change in score can provide real opportunity!
There very likely are other ways to capture lifestyle changes besides the change in model scores that I have described in this article. However, they are lacking the elegance that is innate to the parsimonious use of predictors, as well as the intuitive understanding of what changes in model scores often signify. A large relative positive change in municipal bond scores, for example, was related to a lower probability of becoming an aggressive fundholder.
The analysis game is all about the creative use of data. Practitioners clearly need a technique to mine the data. The winners, however, will be those who can creatively transform what is available into insightful information.
Reprinted by permission.