Uplift – Two Model Approach

Data Preparation

Loading the required packages.

library(Information)        # Package that contains the dataset.
library(splitstackshape)    # Package for stratified splitting.
library(caret)              # For some standard predictive analytics techniques.
library(uplift)             # For the evaluation measures.

The data is located in the package “Information”. We first load it and remove the unwanted columns.

# Getting the data
data("train", package = "Information")
df <- train

# Removing unwanted columns
unwantedColumns <- c("UNIQUE_ID")
df <- df[ , -which(names(df) %in% unwantedColumns)]

We set up some extra variables to make the next steps more readable.

# Extra variables:
treatmentVariable <- "TREATMENT"
targetVariable <- "PURCHASE"
targetNegativeClass <- "NO"     # Needed later on for factoring the target variable (only needed for certain techniques)
targetPositiveClass <- "YES"    # Needed later on for factoring the target variable (only needed for certain techniques)
predictors <- names(df)[(names(df) != targetVariable) & (names(df) != treatmentVariable)]

We check the distribution of both treatment/control and purchase/non-purchase.

prop.table(table(df[,targetVariable], df[,treatmentVariable]))
         0      1
  0 0.4045 0.3959
  1 0.0983 0.1013

We split the dataset into a training and testing set. We have to do a stratified split on both the treatment and target variables because we want to keep the same ratios of treatment/control and purchase/non-purchase.

# Splitting into train & test set
splitted <- stratified(df, c(treatmentVariable, targetVariable), 0.66, bothSets=TRUE)
df.train <- as.data.frame(splitted[[1]])
df.test <- as.data.frame(splitted[[2]])

Checking whether the ratios are indeed more or less the same.

# Check to see if the ratios of treatment/control and responder/non-responder remain correct. 
prop.table(table(df.train[,targetVariable], df.train[,treatmentVariable]))
prop.table(table(df.test[,targetVariable], df.test[,treatmentVariable]))
   
             0          1
  0 0.40448417 0.39584911
  1 0.09831844 0.10134828
   
            0         1
  0 0.4045307 0.3959988
  1 0.0982642 0.1012062

Modeling Techniques

In total there will be three example techniques. These are:

  • The Two Model Approach (also known as the Naive Approach)
  • The Dummy Treatment Approach
  • Generalized Lai’s Approach

On this page we present the Two Model Approach.

Two Model Approach

The idea of this technique is to build two separate models, one for the treatment group and one for the control group. The uplift is then calculated as the difference between the predicted probabilities of the two models.
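The idea can be sketched in a few lines of R. This is a minimal, self-contained illustration on simulated data (the variable names and the simulation are ours, not part of the analysis below, which uses the ‘caret’ package):

```r
# Minimal sketch of the Two Model Approach on simulated data.
set.seed(1)
n <- 1000
sim <- data.frame(x = rnorm(n), treated = rbinom(n, 1, 0.5))
sim$y <- rbinom(n, 1, plogis(-1 + 0.5 * sim$x + 0.4 * sim$treated))

# Fit one logistic regression per group.
fit.t <- glm(y ~ x, data = sim[sim$treated == 1, ], family = binomial())
fit.c <- glm(y ~ x, data = sim[sim$treated == 0, ], family = binomial())

# Uplift = P(y = 1 | treated) - P(y = 1 | control), scored on the same people.
uplift <- predict(fit.t, newdata = sim, type = "response") -
          predict(fit.c, newdata = sim, type = "response")
```

Both models see the same predictors; only the subset of observations they are trained on differs.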

References where this approach was used:

  • Behram Hansotia and Brad Rukstales. “Incremental value modeling”. In: Journal of Interactive Marketing 16.3 (2002), pp. 35–46.
  • James M Robins. “Correcting for non-compliance in randomized trials using structural nested mean models”. In: Communications in Statistics-Theory and methods 23.8 (1994), pp. 2379–2412.
  • S. Vansteelandt and E. Goetghebeur. “Causal inference with generalized structural mean models”. In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65.4 (2003), pp. 817–835.

Training the model

We factor the target variable and assign it its levels (‘YES’ and ‘NO’).

df.train[,targetVariable] <- factor(df.train[,targetVariable])
levels(df.train[,targetVariable]) <- c(targetNegativeClass, targetPositiveClass)

For this approach we need to split the dataset into separate datasets, one for the treatment group and one for the control group (as two separate models will be built).

## Splitting on the treatment variable to create the treatment and control groups. ##
out <- split(df.train, f = df.train[, treatmentVariable])
df.train.control <- out[[1]]
df.train.treatment <- out[[2]]

Since we have two separate sets, we do not need the treatment variable anymore.

## Removing the treatmentVariable ##
unwantedColumns <- c(treatmentVariable)
df.train.treatment <- df.train.treatment[ , -which(names(df.train.treatment) %in% unwantedColumns)]
df.train.control <- df.train.control[ , -which(names(df.train.control) %in% unwantedColumns)]

We build two models, one for the treatment group and one for the control group. Both are logistic regressions, implemented using the ‘caret’ package in R.


## setting up training schema ##
  ctrl <- trainControl(method = "cv",
                       number = 5,
                       classProbs = TRUE,
                       summaryFunction=twoClassSummary)
  
  ## Training the model on treatment group
  model.TMA.treatment <- train(df.train.treatment[,predictors], 
                           df.train.treatment[,targetVariable],
                           method="glm",
                           family=binomial(),
                           metric="ROC",
                           trControl=ctrl)
  
  ## Training the model on control group
  model.TMA.control <- train(df.train.control[,predictors], 
                         df.train.control[,targetVariable],
                         method="glm",
                         family=binomial(),
                         metric="ROC",
                         trControl=ctrl)


Testing the model

First we factor the targetVariable of the test set and set its levels.

## Factoring Target variable ##

df.test$targetVariableFactor <- factor(df.test[,targetVariable])
levels(df.test[,"targetVariableFactor"]) <- c(targetNegativeClass, targetPositiveClass)

We then get the probabilities for both the models (one trained on the treatment set and one on the control set).

# Get predictions of treatment model #
  modelProbs.TMA.Treatment <- extractProb(list(model.TMA.treatment), 
                                      testX = df.test[,predictors], 
                                      testY = df.test[,"targetVariableFactor"])
  modelProbs.TMA.Treatment.Results <- modelProbs.TMA.Treatment[modelProbs.TMA.Treatment$dataType == "Test",]
  
  # Get predictions of control model #
  modelProbs.TMA.Control <- extractProb(list(model.TMA.control), 
                                    testX = df.test[,predictors], 
                                    testY = df.test[,"targetVariableFactor"])
  modelProbs.TMA.Control.Results <- modelProbs.TMA.Control[modelProbs.TMA.Control$dataType == "Test",]

We then group together the predictions in a dataframe. The final uplift prediction is the probability of a person purchasing when receiving a treatment minus the probability of a person purchasing when not receiving a treatment.

predictions.TMA = data.frame(treatment=numeric(nrow(modelProbs.TMA.Treatment.Results)),control=numeric(nrow(modelProbs.TMA.Control.Results)))
predictions.TMA$treatment <- modelProbs.TMA.Treatment.Results[,targetPositiveClass]
predictions.TMA$control <- modelProbs.TMA.Control.Results[,targetPositiveClass]
predictions.TMA$uplift <- modelProbs.TMA.Treatment.Results[,targetPositiveClass] - modelProbs.TMA.Control.Results[,targetPositiveClass]

We can take a sneak peek at the predictions.

head(predictions.TMA, n = 10)
 treatment    control       uplift
0.01070962 0.03212048 -0.021410864
0.11464789 0.03576858  0.078879307
0.04858335 0.04031976  0.008263594
0.04228613 0.04973386 -0.007447731
0.01234653 0.03512125 -0.022774726
0.31516125 0.19100784  0.124153404
0.05625385 0.04678898  0.009464876
0.18446206 0.11972759  0.064734467
0.15607269 0.06930374  0.086768953
0.26517572 0.51828055 -0.253104825

Evaluating the model

Evaluation in uplift modeling is difficult because, for the same person, we cannot observe both how he would react when receiving the campaign or treatment and how he would react when not receiving it. Therefore we have to look at performance on a more aggregated basis.

NOTE: From here on the code is adapted from the ‘uplift’ package. Because we made a modification to the code, we could not use the package directly. The package can be found here: https://cran.r-project.org/web/packages/uplift/index.html

We rank the uplift scores from high to low and add this information to a dataframe.

set.seed(123) # As randomness is involved, we set a seed to be able to reproduce results while testing.

mm <- cbind(uplift = predictions.TMA[,3],
            target = df.test[,targetVariable], 
            treatment = df.test[,treatmentVariable],
            uplift_rank = rank(-predictions.TMA[,3], ties.method = "random"))

Afterwards we divide the observations into 10 equal groups. The first group contains the highest uplift scores, the second group the second-highest scores, and so on.

Observations may share the same uplift score, and there is a chance that tied observations will not end up in the same group. In that case, the ties are broken randomly.
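As a toy illustration of this tie-breaking (the scores here are made up), `ties.method = "random"` assigns tied scores distinct ranks, in random order:

```r
set.seed(42)
scores <- c(0.2, 0.5, 0.5, 0.5, 0.9)   # three tied uplift scores
ranks <- rank(-scores, ties.method = "random")
# The highest score (0.9) always gets rank 1, the lowest (0.2) rank 5;
# the three tied 0.5's receive ranks 2, 3 and 4 in random order.
ranks
```

Because every observation gets a unique rank, the subsequent decile cut always produces equally sized groups.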

groups <- 10
bk <- unique(quantile(mm[, 4], probs = seq(0, 1, 1 / groups)))
if ((length(bk)-1) != groups){
    warning("uplift: due to many ties in uplift predictions, the ties will be dealt with randomly")
}
mm <- cbind(mm, decile = cut(mm[, 4], breaks = bk, labels = NULL, include.lowest = TRUE))

# Previewing the dataframe
head(mm)

uplift target treatment uplift_rank decile
-0.021410864 0 0 2500 8
0.078879307 0 0 753 3
0.008263594 0 0 1664 5
-0.007447731 0 0 1970 6
-0.022774726 0 0 2543 8
0.124153404 0 0 414 2

We have now ranked all the observations in the test set according to uplift score and assigned each to a group (according to its ranking). The next step is to look at the actual values per group:

  • How many belonged to the treatment group?
  • How many to the control group?
  • How many of those purchased?

n.y1_ct0 <- tapply(mm[mm[, 3] == 0, ][, 2], mm[mm[, 3] == 0, ][, 5], sum)  # Number of responders not having received the treatment
n.y1_ct1 <- tapply(mm[mm[, 3] == 1, ][, 2], mm[mm[, 3] == 1, ][, 5], sum)  # Number of responders having received the treatment
r.y1_ct0 <- tapply(mm[mm[, 3] == 0, ][, 2], mm[mm[, 3] == 0, ][, 5], mean) # Ratio of responders among those not having received the treatment
r.y1_ct1 <- tapply(mm[mm[, 3] == 1, ][, 2], mm[mm[, 3] == 1, ][, 5], mean) # Ratio of responders among those having received the treatment
n.ct0 <- tapply(mm[mm[, 3] == 0, ][, 2], mm[mm[, 3] == 0, ][, 5], length)  # Number of people not having received the treatment
n.ct1 <- tapply(mm[mm[, 3] == 1, ][, 2], mm[mm[, 3] == 1, ][, 5], length)  # Number of people having received the treatment

# In rare situations the ratio of a group can be missing (NA) because there are no people in the treatment or control group.
# We set these to 0.
r.y1_ct0 <- ifelse(is.na(r.y1_ct0), 0, r.y1_ct0)
r.y1_ct1 <- ifelse(is.na(r.y1_ct1), 0, r.y1_ct1)


We group these statistics into a new dataframe and give it the class ‘performance’.

df <- merge(cbind(n.y1_ct0, r.y1_ct0, n.ct0), cbind(n.y1_ct1, r.y1_ct1, n.ct1), by= "row.names", all = TRUE)             
    
df$Row.names <- as.numeric(df$Row.names)
df[, c(2, 4, 5, 7)][is.na(df[, c(2, 4, 5, 7)])] <- 0 # missing implies 0 counts

df$uplift = df$r.y1_ct1 - df$r.y1_ct0

df <- df[order(df$Row.names), ] # Ordering according to row-names.

perf <- cbind(group   = df$Row.names,
             n.ct1    = df$n.ct1,
             n.ct0    = df$n.ct0, 
             n.y1_ct1 = df$n.y1_ct1,
             n.y1_ct0 = df$n.y1_ct0,
             r.y1_ct1 = df$r.y1_ct1, 
             r.y1_ct0 = df$r.y1_ct0,
             uplift   = df$uplift)

class(perf) <- "performance"

perf
      group n.ct1 n.ct0 n.y1_ct1 n.y1_ct0   r.y1_ct1   r.y1_ct0        uplift
 [1,]     1   170   170       59       59 0.34705882 0.34705882  0.0000000000
 [2,]     2   171   169       51       28 0.29824561 0.16568047  0.1325651407
 [3,]     3   158   182       30       36 0.18987342 0.19780220 -0.0079287801
 [4,]     4   162   178       27       24 0.16666667 0.13483146  0.0318352060
 [5,]     5   161   179       22       19 0.13664596 0.10614525  0.0305007113
 [6,]     6   177   162       14       10 0.07909605 0.06172840  0.0173676501
 [7,]     7   172   168        9        8 0.05232558 0.04761905  0.0047065338
 [8,]     8   160   180       23       26 0.14375000 0.14444444 -0.0006944444
 [9,]     9   182   158       58       40 0.31868132 0.25316456  0.0655167617
[10,]    10   177   163       51       84 0.28813559 0.51533742 -0.2272018301
attr(,"class")
[1] "performance"

Now that we have the new performance-class we can use it to produce some graphs.

Response Rate per Decile

The Response Rate per Decile plot is a direct visualisation of the performance block. Ideally, the first deciles show high response rates for the treatment group and low response rates for the control group, and vice versa in the last deciles.

temp.df.treatment <- data.frame(Decile = seq(1:10), responseRate = perf[,6], Group = "treatment")
temp.df.control <- data.frame(Decile = seq(1:10), responseRate = perf[,7], Group = "control")
temp.df <- rbind(temp.df.treatment, temp.df.control)

[Figure: Response rate per decile for the treatment and control groups]

Uplift per Decile

By subtracting the response rates of the control groups from those of the treatment groups we obtain the uplift per decile, as seen in the next plot.

temp.df.uplift <- data.frame(Deciles = seq(1:10), Uplift = perf[,6] - perf[,7])
require(ggplot2)
require(scales)
ggplot(temp.df.uplift, aes(x=Deciles)) +
geom_bar(stat="identity", aes(y =Uplift, fill="red")) + 
scale_y_continuous(labels=percent, limits=c(-0.3,0.3), name="Uplift (Treatment - Control)") +
scale_x_discrete(name ="Deciles", limits=rep(1:10)) +
theme(panel.grid.major = element_blank(),
      panel.grid.minor = element_blank(),
      panel.background = element_blank()) +
theme(axis.line.x = element_blank(),
      axis.ticks.x = element_blank(), axis.text.x = element_text(size=20), axis.text.y = element_text(size=20)) +
ggtitle("Uplift Per Decile - Two Model Approach") + theme(plot.title = element_text(face="bold", size=20)) +
guides(fill=FALSE)


[Figure: Uplift per decile – Two Model Approach]

Qini Plot

One way of representing the performance of an uplift technique as a single number is through the Qini coefficient and the accompanying Qini curve.

The idea is to calculate the incremental gains:

  • First, the cumulative sums of responders in the treated and control groups are calculated, relative to the cumulative population of each group up to the specified decile.
  • Afterwards we calculate the percentage of the total number of people (both treatment and control) present in each decile.

r.cumul.y1_ct1 <- cumsum(perf[,"n.y1_ct1"]) / cumsum(perf[,"n.ct1"])
r.cumul.y1_ct0 <- cumsum(perf[,"n.y1_ct0"]) / cumsum(perf[,"n.ct0"])
deciles <- seq(1 / groups, 1, 1 / groups)

r.cumul.y1_ct1[is.na(r.cumul.y1_ct1)] <- 0
r.cumul.y1_ct0[is.na(r.cumul.y1_ct0)] <- 0

Per decile we can calculate the incremental gains for the model performance.

### Model Incremental gains 
inc.gains <- c(0.0, (r.cumul.y1_ct1 - r.cumul.y1_ct0) * deciles)

The overall incremental gain is basically the overall uplift. The random incremental gains are then obtained by spreading this overall gain evenly across the groups.

### Overall incremental gains
overall.inc.gains <- sum(perf[, "n.y1_ct1"]) / sum(perf[, "n.ct1"]) - sum(perf[, "n.y1_ct0"]) / sum(perf[, "n.ct0"])

### Random incremental gains
random.inc.gains <- c(0, cumsum(rep(overall.inc.gains / groups, groups)))

Next we compute the area under the model's incremental gains curve.

### Compute area under the model incremental gains (uplift) curve 
x <- c(0.0, seq(1 / groups, 1, 1 / groups))
y <- inc.gains

auuc <- 0
auuc.rand <- 0

for (i in 2:length(x)) {
  auuc <- auuc + 0.5 * (x[i] - x[i-1]) * (y[i] + y[i-1])
}
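The loop above applies the trapezoidal rule. As a quick sanity check (with our own toy numbers, named `xs`/`ys` to avoid clashing with the variables above), the area under a straight line from (0, 0) to (1, 2) should come out as exactly 1:

```r
# Trapezoidal rule on a straight line: exact area is 1.
xs <- c(0, 0.5, 1)
ys <- c(0, 1, 2)
area <- 0
for (i in 2:length(xs)) {
  area <- area + 0.5 * (xs[i] - xs[i-1]) * (ys[i] + ys[i-1])
}
area  # 1
```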

We do the same for the area underneath the random incremental curve.

### Compute area under the random incremental gains curve
y.rand <- random.inc.gains

for (i in 2:length(x)) {
  auuc.rand <- auuc.rand + 0.5 * (x[i] - x[i-1]) * (y.rand[i] + y.rand[i-1])
}

We then compute the difference between those two areas.

### Compute the difference between the areas (Qini coefficient)
Qini <- auuc - auuc.rand
miny <- 100 * min(c(random.inc.gains, inc.gains))
maxy <- 100 * max(c(random.inc.gains, inc.gains))

The last step is to plot the Qini-curve.

plot(inc.gains * 100 ~ c(0, seq(100 / groups, 100, 100 / groups)), type = "b",
     col = "blue", lty = 2, xlab = "Proportion of population targeted (%)",
     ylab = "Cumulative incremental gains (pc pt)", ylim = c(miny, maxy))
lines(random.inc.gains * 100 ~ c(0, seq(100 / groups, 100, 100 / groups)), type = "l", col = "red", lty = 1)
legend("topright", c("Model", "Random"),
       col = c("blue", "red"), lty = c(2, 1))

[Figure: Qini curve – cumulative incremental gains for the model (blue, dashed) versus random targeting (red, solid)]

Disclaimer

Sources used:


Prepared by Floris Devriendt.