# Support Vector Machines
```{r setup, echo=FALSE}
knitr::opts_chunk$set(fig.path="ch12-figures/")
```
The goal of this chapter is to create an interactive data
visualization that explains the
[Support Vector Machine](https://en.wikipedia.org/wiki/Support_vector_machine),
a machine learning model for binary classification.
Chapter summary:
* We begin by simulating some data for binary classification in two
dimensions, and making some static plots.
* In the second section, we make an interactive data visualization to
show how the linear Support Vector Machine decision boundary changes
as a function of the cost hyper-parameter.
* In the last section, we make an interactive data visualization to
show how the decision boundary of the polynomial kernel Support
Vector Machine changes as a function of the two hyper-parameters
(cost and degree).
## Generate and plot some data {#generate}
We begin by generating two input features, `x1` and `x2`.
```{r}
library(data.table)
N <- 50
set.seed(1)
## getInput returns 2*N values, so data.dt below has 100 rows: the
## first N rows will be used later as the subtrain set, and the rest
## as the validation set.
getInput <- function(){
  c(## rnorm(N, sd=0.3), # alternative input distribution (not used).
    runif(N, -1, 1),
    runif(N, -1, 1))
}
data.dt <- data.table(
  x1=getInput(),
  x2=getInput())
library(animint2)
ggplot()+
geom_point(aes(
x1, x2),
data=data.dt)
```
The plot below shows the same data, after computing two additional
input features (the squares of the original two inputs).
```{r}
data.dt[, let(
x1.sq = x1^2,
x2.sq = x2^2)][]
ggplot()+
geom_point(aes(
x1.sq, x2.sq),
data=data.dt)
```
In our simulation, we assume that the output score `f` is a linear
function of `x1.sq`, and does not depend on `x2.sq`. The plot below visualizes
the output scores using the point fill aesthetic.
```{r}
data.dt[, f := x1.sq]
true.decision.boundary <- 0.2
ggplot()+
theme_bw()+
scale_fill_gradient2(midpoint=true.decision.boundary)+
geom_point(aes(
x1.sq, x2.sq, fill=f),
shape=21,
color="grey",
data=data.dt)
```
In particular, we assume that the label `y` is negative (-1) if
`x1.sq + noise < threshold`, and positive (1) otherwise. The plot below
visualizes the scores and labels, as a function of the input feature
`x1`. It also shows the true score function in black. Of course, we
would not be able to make this visualization with real data (only the
labels are known in real data, not the scores).
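In other words, the simulated label of each data point $i$ is

$$
y_i = \begin{cases}
  -1 & \text{if } x_{i1}^2 + \epsilon_i < 0.2, \\
  +1 & \text{otherwise,}
\end{cases}
\qquad \epsilon_i \sim N(0, 0.2^2),
$$

where 0.2 is both the true decision boundary and the standard
deviation of the simulated noise (these two constants need not be
equal; they just happen to both be 0.2 in the code below).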
```{r}
data.dt[
, f.noise := f+rnorm(.N, 0, 0.2) # .N noise values, one per row.
][
, y.num := ifelse(f.noise<true.decision.boundary, -1, 1)
][
, y := factor(y.num)
]
table(data.dt$y)
scores <- data.table(x1=seq(-1, 1, l=101))[
, x1.sq := x1^2
][
, f := x1.sq ]
x1.boundaries <- data.table(
boundary=c(1, -1)*sqrt(true.decision.boundary))
ggplot()+
scale_y_continuous(breaks=seq(0, 1, by=0.2))+
geom_vline(aes(
xintercept=boundary),
color="grey50",
data=x1.boundaries)+
geom_line(aes(
x1, f),
data=scores)+
geom_point(aes(
x1, f.noise, color=y),
shape=21,
fill=NA,
data=data.dt)
```
The plot below shows the scores and labels, as a function of the
squared feature `x1.sq`. It is clear that the score function that we
want to learn is linear in `x1.sq`.
```{r}
x1sq.boundary <- data.table(boundary=true.decision.boundary)
ggplot()+
scale_y_continuous(breaks=seq(0, 1, by=0.2))+
scale_x_continuous(breaks=seq(0, 1, by=0.2))+
geom_vline(aes(
xintercept=boundary),
color="grey50",
data=x1sq.boundary)+
geom_line(aes(x1.sq, f), data=scores)+
geom_point(aes(
x1.sq, f.noise, color=y),
shape=21,
fill=NA,
data=data.dt)
```
Next, we visualize the labels in the two-dimensional squared feature
space. It is clear that the decision boundary is linear in this space.
```{r}
ggplot()+
scale_y_continuous(breaks=seq(0, 1, by=0.2))+
scale_x_continuous(breaks=seq(0, 1, by=0.2))+
geom_vline(aes(
xintercept=boundary),
color="grey50",
data=x1sq.boundary)+
geom_point(aes(
x1.sq, x2.sq, color=y),
shape=21,
fill=NA,
data=data.dt)
```
The plot below shows the input feature space (`x1` and `x2`). It is
clear that the decision boundary is non-linear in `x1`.
```{r}
ggplot()+
scale_y_continuous(breaks=seq(-1, 1, by=0.2))+
scale_x_continuous(breaks=seq(-1, 1, by=0.2))+
geom_vline(aes(
xintercept=boundary),
color="grey50",
data=x1.boundaries)+
geom_point(aes(
x1, x2, color=y),
shape=21,
fill=NA,
data=data.dt)
```
The animint below uses `clickSelects` to show which points in the
input and squared feature spaces correspond. We just need to create a
`data.i` variable that has a unique ID for each data point.
```{r, ch12-viz-simple}
data.dt[, data.i := 1:.N]
## YVAR adds a y.var factor column, which the facet_grid below uses
## to arrange panels.
YVAR <- function(dt, y.var){
  dt$y.var <- factor(y.var, c("x2", "x2.sq", "f"))
  dt
}
animint(
input=ggplot()+
ggtitle("input feature space")+
theme_bw()+
theme(panel.margin=grid::unit(0, "lines"))+
facet_grid(y.var ~ ., scales="free")+
scale_x_continuous(breaks=seq(-1, 1, by=0.2))+
ylab("")+
guides(color="none")+
geom_vline(aes(
xintercept=boundary),
color="grey50",
data=x1.boundaries)+
geom_point(aes(
x1, x2, color=y),
clickSelects="data.i",
size=4,
alpha=0.7,
data=YVAR(data.dt, "x2"))+
geom_line(aes(
x1, f),
data=YVAR(scores, "f"))+
geom_point(aes(
x1, f.noise, color=y),
clickSelects="data.i",
size=4,
alpha=0.7,
data=YVAR(data.dt, "f")),
square=ggplot()+
ggtitle("squared feature space")+
theme_bw()+
theme(panel.margin=grid::unit(0, "lines"))+
facet_grid(y.var ~ ., scales="free")+
ylab("")+
scale_x_continuous(breaks=seq(0, 1, by=0.2))+
geom_vline(aes(
xintercept=boundary),
color="grey50",
data=x1sq.boundary)+
geom_point(aes(
x1.sq, x2.sq, color=y),
clickSelects="data.i",
size=4,
alpha=0.7,
data=YVAR(data.dt, "x2.sq"))+
geom_line(aes(
x1.sq, f),
data=YVAR(scores, "f"))+
geom_point(aes(
x1.sq, f.noise, color=y),
clickSelects="data.i",
size=4,
alpha=0.7,
data=YVAR(data.dt, "f")))
```
Note how we used two multi-panel plots with the
[addColumn then facet](/ch99#addColumn-then-facet) idiom,
rather than creating four separate plots. This emphasizes the fact
that some plots/facets share a common `x1` or `x1.sq` axis. We also
hid the color legend in the first plot, since one color legend is
sufficient.
## Linear SVM {#linear-svm}
We begin this section by splitting the data into subtrain and
validation sets (the first `N` rows are the subtrain set), and
plotting both sets in the input feature space.
```{r}
train.i <- 1:N
data.dt[
, set := "validation"
][
train.i, set := "subtrain"
]
table(data.dt$set)
subtrain.dt <- data.dt[set=="subtrain",]
ggplot()+
theme_bw()+
theme(panel.margin=grid::unit(0, "lines"))+
facet_grid(set ~ .)+
geom_vline(aes(
xintercept=boundary),
color="grey50",
data=x1.boundaries)+
scale_y_continuous(breaks=seq(-1, 1, by=0.2))+
scale_x_continuous(breaks=seq(-1, 1, by=0.2))+
geom_point(aes(
x1, x2, color=y),
data=data.dt)
```
Next, we fit a linear SVM to the subtrain data in the squared
feature space, and visualize the true labels `y` along with the
predicted labels `pred.y`.
```{r}
library(kernlab)
squared.mat <- subtrain.dt[, cbind(x1.sq, x2.sq)]
y.vec <- subtrain.dt$y
fit <- ksvm(squared.mat, y.vec, kernel="vanilladot")
subtrain.dt$pred.y <- predict(fit)
ggplot()+
geom_point(aes(
x1.sq, x2.sq, color=pred.y, fill=y),
shape=21,
size=4,
stroke=2,
data=subtrain.dt)
```
It is clear from the plot above that there are several misclassified
subtrain data points. In the plot below we visualize the decision
boundary and margin.
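The `predF` function defined in the next code block computes the
values of the SVM decision function. In general (this is the standard
SVM prediction formula, written with `kernlab`'s sign convention),
the learned decision function is

$$
f(x) = \sum_{i \in \mathcal{S}} \alpha_i y_i\, k(x_i, x) - b,
$$

where $\mathcal{S}$ is the set of support vectors, $\alpha_i$ are the
learned coefficients, $k$ is the kernel function, and $b$ is the
learned bias. In `kernlab` these correspond to `xmatrix(fit)`,
`coef(fit)` (which stores the products $\alpha_i y_i$), `kernelf(fit)`,
and `b(fit)`, and `kernelMult` computes the weighted sum over support
vectors; since `ksvm` may scale the inputs before training, `predF`
first applies the same scaling to `X`.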
```{r}
## predF computes the decision values f(x) for the rows of X, by
## applying the same scaling as the fit, then using the learned
## support vectors, coefficients, and bias.
predF <- function(fit, X){
  fit.sc <- scaling(fit)$x.scale
  if(is.null(fit.sc)){
    fit.sc <- list(
      "scaled:center"=c(0,0),
      "scaled:scale"=c(1,1))
  }
  mu <- fit.sc[["scaled:center"]]
  sigma <- fit.sc[["scaled:scale"]]
  X.sc <- scale(X, mu, sigma)
  kernelMult(
    kernelf(fit),
    X.sc,
    xmatrix(fit)[[1]],
    coef(fit)[[1]])-b(fit)
}
xsq.vec <- seq(0, 1, l=41)
grid.sq.dt <- data.table(expand.grid(
x1.sq=xsq.vec,
x2.sq=xsq.vec
))[
, pred.f := predF(fit, cbind(x1.sq, x2.sq))]
subtrain.dt[, train.error := ifelse(y==pred.y, "correct", "error")]
ggplot()+
theme_bw()+
scale_color_manual(values=c(error="black", correct=NA))+
geom_point(aes(
x1.sq, x2.sq, fill=y, color=train.error),
shape=21,
stroke=1,
size=4,
data=subtrain.dt)+
geom_vline(aes(
xintercept=boundary), color="grey50",
data=x1sq.boundary)+
geom_contour(aes(
x1.sq, x2.sq, z=pred.f),
breaks=0,
color="black",
data=grid.sq.dt)+
geom_contour(aes(
x1.sq, x2.sq, z=pred.f),
breaks=c(-1, 1),
color="black",
linetype="dashed",
data=grid.sq.dt)
```
The plot above shows the true decision boundary using a grey
`vline`. It also uses `geom_contour` to display the decision boundary
(solid black line, predicted score 0) and the margin (dashed black
lines, predicted scores -1 and 1). Since the decision boundary and
margin are linear in this space, we can also use `geom_abline` to
display them. To do that we need to do some math, and work out the
equation for the slope and intercept of each line, as a function of
the learned bias `b(fit)` and weights `weight.vec`, as well as the
scaling parameters `mu` and `sigma`.
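Here is that math. Let $m$ denote the centering vector `mu`, $s$ the
scaling vector `sigma`, $w$ the learned weights `weight.vec`, and $b$
the learned bias `b(fit)`, and let $x_1, x_2$ stand for the plotted
features `x1.sq` and `x2.sq`. With a linear kernel, the decision
function computed by `predF` reduces to

$$
f(x) = w_1 \frac{x_1 - m_1}{s_1} + w_2 \frac{x_2 - m_2}{s_2} - b.
$$

Setting $f(x) = c$, with $c=0$ for the decision boundary and
$c=\pm 1$ for the margins, and solving for $x_2$, gives the line

$$
x_2 =
\underbrace{m_2 + \frac{s_2}{w_2}\left(c + b + \frac{w_1 m_1}{s_1}\right)}_{\text{intercept}}
\underbrace{- \frac{w_1 s_2}{w_2 s_1}}_{\text{slope}}\, x_1,
$$

which is what the code below computes for `intercept` and `slope`.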
```{r}
## The equation of the margin lines is x2 = m2 + s2/w2[c+b+w1*m1/s1]
## -s2*w1/(w2*s1)*x1 for c=1 and -1. x is input feature, m is mean, s
## is scale, w is learned weight.
fit.sc <- scaling(fit)$x.scale
if(is.null(fit.sc)){
  fit.sc <- list(
    "scaled:center"=c(0,0),
    "scaled:scale"=c(1,1))
}
mu <- fit.sc[["scaled:center"]]
sigma <- fit.sc[["scaled:scale"]]
weight.vec <- colSums(xmatrix(fit)[[1]]*coef(fit)[[1]])
## predF.linear is equivalent to predF for the linear kernel: the
## decision function is a scaled linear function of the inputs.
predF.linear <- function(fit, X){
  X.sc <- scale(X, mu, sigma)
  X.sc %*% weight.vec - b(fit)
}
abline.dt <- data.table(
y=factor(c(-1,0,1)),
boundary=c("margin", "decision", "margin"),
intercept=mu[2]+sigma[2]/weight.vec[2]*(
c(-1, 0, 1)+b(fit)+weight.vec[1]*mu[1]/sigma[1]),
slope=-weight.vec[1]*sigma[2]/(weight.vec[2]*sigma[1]))
ggplot()+
theme_bw()+
scale_linetype_manual(values=c(margin="dashed", decision="solid"))+
geom_abline(aes(
slope=slope, intercept=intercept, linetype=boundary),
color="green",
size=1,
data=abline.dt)+
geom_point(aes(
x1.sq, x2.sq, color=y),
shape=21,
fill=NA,
size=4,
data=subtrain.dt)+
geom_contour(aes(
x1.sq, x2.sq, z=pred.f),
breaks=0,
color="black",
data=grid.sq.dt)+
geom_contour(aes(
x1.sq, x2.sq, z=pred.f),
breaks=c(-1, 1),
color="black",
linetype="dashed",
data=grid.sq.dt)
```
The plot above confirms that our computation of the slope and
intercepts (green lines) agrees with the contours (black lines). In
the plot below, we show the learned `alpha` coefficients, and add a
`geom_segment` to visualize the slack.
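Two standard SVM facts (not specific to this code) are used below.
First, each `alpha` coefficient indicates where the corresponding
point lies relative to the margin: $\alpha_i = 0$ means the point is
on or outside the margin on the correct side (no slack),
$0 < \alpha_i < C$ means it is exactly on the margin, and
$\alpha_i = C$ means it is on the margin or on the wrong side of it,
with slack $\xi_i = 1 - y_i f(x_i) \ge 0$ (the default cost in `ksvm`
is $C=1$). Second, each slack segment joins a point $(p_1, p_2)$ with
$\alpha_i = C$ to its margin line $x_2 = a + m x_1$, along the
perpendicular direction with slope $-1/m$ (stored as `slack.slope`
below); the two lines intersect at

$$
x_1^{\text{margin}} = \frac{p_2 - (-1/m)\, p_1 - a}{m - (-1/m)},
\qquad
x_2^{\text{margin}} = a + m\, x_1^{\text{margin}},
$$

which is the formula used to compute `x1.sq.margin` and
`x2.sq.margin`.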
```{r}
subtrain.dt[, alpha := 0]
train.row.vec <- as.integer(rownames(xmatrix(fit)[[1]]))
subtrain.dt[train.row.vec, alpha := kernlab::alpha(fit)[[1]] ]
## The default cost in ksvm is C=1, so alpha==1 means alpha is at its
## upper bound C.
subtrain.dt[, status := ifelse(
  alpha==0, "alpha=0",
  ifelse(alpha==1, "alpha=C", "0<alpha<C"))]
## slack.slope is perpendicular to the margin lines; each slack
## segment joins a bound support vector to its margin line.
slack.slope <- weight.vec[2]*sigma[1]/(weight.vec[1]*sigma[2])
slack.dt <- subtrain.dt[alpha==1,]
slack.join <- abline.dt[slack.dt, on=list(y)]
slack.join[, x1.sq.margin := (
  x2.sq-slack.slope*x1.sq-intercept)/(slope-slack.slope)]
slack.join[, x2.sq.margin := slope*x1.sq.margin + intercept]
sv.colors <- c(
"alpha=0"="white",
"0<alpha<C"="black",
"alpha=C"="grey")
ggplot()+
theme_bw()+
scale_linetype_manual(values=c(margin="dashed", decision="solid"))+
geom_vline(aes(
xintercept=boundary), color="violet",
data=x1sq.boundary)+
geom_abline(aes(
slope=slope, intercept=intercept, linetype=boundary),
size=1,
data=abline.dt)+
geom_segment(aes(
x1.sq, x2.sq,
xend=x1.sq.margin, yend=x2.sq.margin),
color="grey",
data=slack.join)+
scale_fill_manual(values=sv.colors, breaks=names(sv.colors))+
geom_point(aes(
x1.sq, x2.sq, color=y, fill=status),
shape=21,
size=4,
data=subtrain.dt)
```
The plot above shows the slack in grey segments, and the decision and
margin lines in black. The Bayes decision boundary is shown in the
background as a vertical violet line. The support vectors are the
points with non-zero alpha coefficients: black-filled support vectors
are on the margin, and grey-filled support vectors are on the wrong
side of the margin (and have non-zero slack). The plot below shows
the model that was learned in the original feature space.
```{r}
n.grid <- 41
x.vec <- seq(-1, 1, l=n.grid)
grid.dt <- data.table(expand.grid(
x1=x.vec,
x2=x.vec))
## getBoundaryDF converts a vector of predicted scores on the grid
## into a data table of contour paths at the given levels
## (0 for the decision boundary, -1 and 1 for the margins).
getBoundaryDF <- function(score.vec, level.vec=c(-1, 0, 1)){
  stopifnot(length(score.vec) == n.grid * n.grid)
  several.paths <- contourLines(
    x.vec, x.vec,
    matrix(score.vec, n.grid, n.grid),
    levels=level.vec)
  contour.list <- list()
  for(path.i in seq_along(several.paths)){
    contour.list[[path.i]] <- with(several.paths[[path.i]], data.table(
      path.i,
      level.num=as.numeric(level),
      level.fac=factor(level, level.vec),
      boundary=ifelse(level==0, "decision", "margin"),
      x1=x, x2=y))
  }
  do.call(rbind, contour.list)
}
grid.dt[, pred.f := predF(fit, cbind(x1^2, x2^2))]
boundaries <- grid.dt[, getBoundaryDF(pred.f)]
ggplot()+
scale_linetype_manual(values=c(margin="dashed", decision="solid"))+
geom_vline(aes(
xintercept=boundary),
color="violet",
data=x1.boundaries)+
geom_path(aes(
x1, x2, group=path.i, linetype=boundary),
size=1,
data=boundaries)+
scale_fill_manual(values=sv.colors, breaks=names(sv.colors))+
scale_size_manual(values=c(correct=2, error=4))+
geom_point(aes(
x1, x2, color=y,
size=train.error,
fill=status),
shape=21,
data=subtrain.dt)
```
Our goal below is to make an animint that shows how the decision
boundary, margin, and slack change as a function of the cost
hyper-parameter.
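For reference, the cost parameter appears in the standard soft-margin
SVM optimization problem (this formulation is not computed by the
code; it is shown here to explain the quantities stored in
`modelInfo` below),

$$
\min_{w,\, b,\, \xi}\;\; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^n \xi_i
\quad \text{subject to} \quad
y_i f(x_i) \ge 1 - \xi_i,\;\; \xi_i \ge 0,
$$

where $f(x_i) = w \cdot x_i - b$ as in `predF` above. A small cost
$C$ favors a small $\|w\|^2$, i.e. a large margin $2/\|w\|$, at the
price of more slack; a large cost penalizes slack heavily and
typically results in a smaller margin. The loop below stores
$\|w\|^2$ (`norm`), the margin width $2/\|w\|$ (`margin`), and the
total slack over the points with $\alpha_i = C$ (`slack`), for each
value of the cost.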
```{r ch12-viz-linear}
modelInfo.list <- list()
predictions.list <- list()
slackSegs.list <- list()
modelLines.list <- list()
inputBoundaries.list <- list()
setErrors.list <- list()
cost.by <- 0.2
for(cost.param in round(10^seq(-1, 1, by=cost.by),1)){
  fit <- ksvm(
    squared.mat, y.vec, kernel="vanilladot", scaled=FALSE, C=cost.param)
  fit.sc <- scaling(fit)$x.scale
  if(is.null(fit.sc)){
    fit.sc <- list(
      "scaled:center"=c(0,0),
      "scaled:scale"=c(1,1))
  }
  mu <- fit.sc[["scaled:center"]]
  sigma <- fit.sc[["scaled:scale"]]
  weight.vec <- colSums(xmatrix(fit)[[1]]*coef(fit)[[1]])
  grid.sq.dt[, pred.f := predF(fit, cbind(x1.sq, x2.sq))]
  data.dt[, pred.y := predict(fit, cbind(x1.sq, x2.sq))]
  one.error <- data.dt[, list(errors=sum(y!=pred.y)), by=set]
  setErrors.list[[paste(cost.param)]] <- data.table(
    cost.param, one.error)
  subtrain.dt[, pred.f := predF(fit, cbind(x1^2, x2^2))]
  grid.dt[, pred.f := predF(fit, cbind(x1^2, x2^2))]
  boundaries <- getBoundaryDF(grid.dt$pred.f)
  inputBoundaries.list[[paste(cost.param)]] <- data.table(
    cost.param, boundaries)
  subtrain.dt$alpha <- 0
  train.row.vec <- as.integer(rownames(xmatrix(fit)[[1]]))
  subtrain.dt[train.row.vec, alpha := kernlab::alpha(fit)[[1]] ]
  subtrain.dt[, status := ifelse(
    alpha==0, "alpha=0",
    ifelse(alpha==cost.param, "alpha=C", "0<alpha<C"))]
  ## The equation of the margin lines is x2 = m2 + s2/w2[c+b+w1*m1/s1]
  ## -s2*w1/(w2*s1)*x1 for c=1 and -1. x is input feature, m is mean, s
  ## is scale, w is learned weight.
  slack.slope <- weight.vec[2]*sigma[1]/(weight.vec[1]*sigma[2])
  abline.dt <- data.table(
    y=factor(c(-1,0,1)),
    boundary=c("margin", "decision", "margin"),
    intercept=mu[2]+sigma[2]/weight.vec[2]*(
      c(-1, 0, 1)+b(fit)+weight.vec[1]*mu[1]/sigma[1]),
    slope=-weight.vec[1]*sigma[2]/(weight.vec[2]*sigma[1]))
  slack.dt <- subtrain.dt[alpha==cost.param]
  slack.join <- abline.dt[slack.dt, on=list(y)]
  slack.join[, x1.sq.margin := (
    x2.sq-slack.slope*x1.sq-intercept)/(slope-slack.slope)]
  slack.join[, x2.sq.margin := slope*x1.sq.margin + intercept]
  norm.weights <- as.numeric(weight.vec %*% weight.vec)
  modelInfo.list[[paste(cost.param)]] <- data.table(
    cost.param,
    slack=slack.join[, sum(1-pred.f*y.num)],
    norm=norm.weights,
    margin=2/sqrt(norm.weights))
  predictions.list[[paste(cost.param)]] <- data.table(
    cost.param, subtrain.dt)
  slackSegs.list[[paste(cost.param)]] <- data.table(
    cost.param, slack.join)
  modelLines.list[[paste(cost.param)]] <- data.table(
    cost.param, abline.dt)
}
inputBoundaries <- do.call(rbind, inputBoundaries.list)
predictions <- do.call(rbind, predictions.list)
slackSegs <- do.call(rbind, slackSegs.list)
modelLines <- do.call(rbind, modelLines.list)
modelInfo <- do.call(rbind, modelInfo.list)
setErrors <- do.call(rbind, setErrors.list)
modelInfo.tall <- melt(modelInfo, id.vars="cost.param")
grid.sq.dt$boundary <- "true"
setErrors$variable <- "errors"
inputBoundaries[, boundary := ifelse(
level.num==0, "decision", "margin")]
slackSegs$boundary <- "margin"
set.label.select <- data.table(
cost.param=range(setErrors$cost.param),
set=c("validation", "subtrain"),
hjust=c(1, 0))
set.labels <- setErrors[set.label.select, on=list(cost.param, set)]
viz.linear.svm <- animint(
selectModel=ggplot()+
ggtitle("Select regularization parameter")+
scale_x_continuous(limits=c(-1.5, 1.5))+
geom_tallrect(aes(
xmin=log10(cost.param)-cost.by/2,
xmax=log10(cost.param)+cost.by/2),
clickSelects="cost.param",
alpha=0.5,
data=modelInfo)+
theme_bw()+
facet_grid(variable ~ ., scales="free")+
geom_line(aes(
log10(cost.param), errors,
group=set, color=set),
data=setErrors)+
geom_text(aes(
log10(cost.param), errors-1, label=set,
hjust=hjust,
color=set),
data=set.labels)+
guides(color="none")+
geom_line(aes(
log10(cost.param), log10(value)),
data=modelInfo.tall),
inputSpace=ggplot()+
ggtitle("Input space features")+
scale_fill_manual(values=sv.colors, breaks=names(sv.colors))+
geom_vline(aes(
xintercept=boundary),
color="violet",
data=x1.boundaries)+
guides(color="none", fill="none", linetype="none")+
scale_linetype_manual(values=c(
"-1"="dashed",
"0"="solid",
"1"="dashed"))+
geom_path(aes(
x1, x2,
group=path.i,
linetype=level.fac),
showSelected=c("boundary", "cost.param"),
color="black",
data=inputBoundaries)+
geom_point(aes(
x1, x2, fill=status),
showSelected=c("status", "y", "data.i", "cost.param"),
size=5,
color="grey",
data=predictions)+
geom_point(aes(
x1, x2, color=y, fill=status),
showSelected=c("cost.param", "status", "y"),
clickSelects="data.i",
size=3,
data=predictions),
kernelSpace=ggplot()+
ggtitle("Kernel space features")+
geom_vline(aes(
xintercept=boundary), color="violet",
data=x1sq.boundary)+
##coord_cartesian(xlim=c(0, 1), ylim=c(0, 1))+
geom_abline(aes(
slope=slope, intercept=intercept, linetype=boundary),
showSelected="cost.param",
color="black",
data=modelLines)+
scale_linetype_manual(values=c(
decision="solid",
margin="dashed",
true="solid"))+
geom_point(aes(
x1.sq, x2.sq, fill=status),
showSelected=c("data.i", "cost.param"),
size=5,
color="grey",
data=predictions)+
geom_point(aes(
x1.sq, x2.sq, color=y, fill=status),
clickSelects="data.i",
showSelected="cost.param",
size=3,
data=predictions)+
scale_fill_manual(values=sv.colors, breaks=names(sv.colors))+
geom_segment(aes(
x1.sq, x2.sq,
xend=x1.sq.margin, yend=x2.sq.margin),
showSelected=c("cost.param", "boundary"),
color="grey",
data=slackSegs))
viz.linear.svm
```
## Non-linear polynomial kernel SVM {#nonlinear-polynomial-kernel-svm}
In the previous section we fit a linear kernel in the squared feature
space, which resulted in a learned function that is non-linear in the
original feature space. In this section we directly fit a non-linear
polynomial kernel SVM in the original feature space.
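Before fitting, it may help to see what the polynomial kernel
computes. The small check below (not part of the analysis; the
objects `k2`, `u`, `v` are just for illustration) shows that with
`scale=1` and `offset=0`, `polydot` computes
$k(u, v) = (u \cdot v)^{\text{degree}}$, and that for degree 2 this
equals an ordinary dot product in an implicit feature space
containing the squared features plus a cross term. So a degree-2
polynomial SVM in the original space is closely related to the linear
SVM in the squared feature space from the previous section.
```{r}
## Degree-2 polynomial kernel with offset=0: (u.v)^2, which expands to
## u1^2*v1^2 + u2^2*v2^2 + 2*u1*v1*u2*v2 = dot product in the implicit
## feature space (x1^2, x2^2, sqrt(2)*x1*x2).
k2 <- polydot(degree=2, scale=1, offset=0)
u <- c(0.5, -0.2)
v <- c(0.3, 0.8)
c(kernel=k2(u, v),
  expanded=sum(u*v)^2,
  feature.space=sum(c(u^2, sqrt(2)*u[1]*u[2]) * c(v^2, sqrt(2)*v[1]*v[2])))
```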
```{r ch12-viz-poly}
predictions.list <- list()
inputBoundaries.list <- list()
setErrors.list <- list()
cost.by <- 0.2
orig.mat <- subtrain.dt[, cbind(x1, x2)]
for(cost.param in 10^seq(-1, 3, by=cost.by)){
  for(degree.num in seq(1, 6, by=1)){
    k <- polydot(degree.num, offset=0)
    fit <- ksvm(
      orig.mat, y.vec, kernel=k, scaled=FALSE, C=cost.param)
    grid.dt[, pred.f := predF(fit, cbind(x1, x2))]
    grid.dt[, pred.y := predict(fit, cbind(x1, x2))]
    grid.dt[, stopifnot(sign(pred.f) == pred.y)]
    data.dt[, pred.y := predict(fit, cbind(x1, x2))]
    one.error <- data.dt[, list(errors=sum(y != pred.y)), by=set]
    setErrors.list[[paste(cost.param, degree.num)]] <- data.table(
      cost.param, degree.num, one.error)
    boundaries <- getBoundaryDF(grid.dt$pred.f)
    if(is.data.frame(boundaries) && nrow(boundaries)){
      cost.deg <- paste(cost.param, degree.num)
      inputBoundaries.list[[cost.deg]] <- data.table(
        cost.param, degree.num, boundaries)
    }
    subtrain.dt[, alpha := 0]
    train.row.vec <- as.integer(rownames(xmatrix(fit)[[1]]))
    subtrain.dt[train.row.vec, alpha := kernlab::alpha(fit)[[1]] ]
    subtrain.dt[, status := ifelse(
      alpha==0, "alpha=0",
      ifelse(alpha==cost.param, "alpha=C", "0<alpha<C"))]
    predictions.list[[paste(cost.param, degree.num)]] <- data.table(
      cost.param, degree.num, subtrain.dt)
  }
}
inputBoundaries <- do.call(rbind, inputBoundaries.list)
predictions <- do.call(rbind, predictions.list)
setErrors <- do.call(rbind, setErrors.list)
validationErrors <- setErrors[set=="validation"]
validationErrors$select <- "degree"
setErrors$select <- "cost"
animint(
selectModel=ggplot()+
ggtitle("Select hyper parameters")+
geom_tallrect(aes(
xmin=log10(cost.param)-cost.by/2,
xmax=log10(cost.param)+cost.by/2),
clickSelects="cost.param",
alpha=0.5,
data=setErrors[degree.num==1 & set=="subtrain",])+
theme_bw()+
theme(panel.margin=grid::unit(0, "lines"))+
theme_animint(width=350, rowspan=1)+
facet_grid(select ~ ., scales="free")+
ylab("")+
geom_line(aes(
log10(cost.param), errors,
key=set,
group=set,
color=set),
showSelected="degree.num",
data=setErrors)+
scale_fill_gradient("validErr", low="white", high="red")+
geom_tile(aes(
log10(cost.param), degree.num, fill=errors),
clickSelects="degree.num",
data=validationErrors),
inputSpace=ggplot()+
theme_bw()+
ggtitle("Input space features")+
scale_fill_manual(values=sv.colors, breaks=names(sv.colors))+
geom_vline(aes(
xintercept=boundary),
color="violet",
data=x1.boundaries)+
scale_linetype_manual(values=c(
margin="dashed",
decision="solid"))+
geom_path(aes(
x1, x2,
group=path.i,
linetype=boundary),
showSelected=c("degree.num", "cost.param"),
color="black",
data=inputBoundaries)+
geom_point(aes(
x1, x2, color=y, fill=status),
showSelected=c("cost.param", "degree.num"),
size=3,
data=predictions))
```
## Chapter summary and exercises {#ch12-exercises}
We used static ggplots to visualize the Support Vector Machine model
for binary classification, and interactive animints to show how the
SVM decision boundary changes as a function of the model
hyper-parameters.
Exercises:
* Use [HTML table layout](/ch06#html-table-layout) for [`viz.linear.svm`](#linear-svm), so that the two feature space plots appear beside each other, and the "Select regularization parameter" plot appears above or below.
* Use `rbfdot` as the kernel function. Compute subtrain and validation error,
then add a new panel to the "select hyper parameters" plot.
* Default scales use the same two colors for the **y** and
**set** legends, which could be confusing. Change the colors in one
of the two legends so that they are different.
* Use `color` and `color_off` parameters to change the appearance of
the `geom_tile` when selected or not, as explained in [Chapter 6,
section Specifying how selection state is
displayed](/ch06#display-selection-state).
Next, [Chapter 13](/ch13) explains how to
visualize the Poisson regression model.