To conclude our analysis of the Presidential election data, I will briefly compare a Support Vector Machine (SVM), a modern classification method based on research by Vapnik et al., with the probit model. SVMs are implemented in several R packages; here we use the implementation from the kernlab package. The code fragments below show how to fit the probit and SVM models to our data.
pmodel <- glm(PRES ~ HEIGHT + CHANGE, family = binomial(link = "probit"), data = presdata)
presdata$PROBIT <- predict(pmodel, type = "response")
library(kernlab)
kmodel <- ksvm(PRES ~ HEIGHT + CHANGE, type = "C-svc", data = presdata)
presdata$SVM <- predict(kmodel)
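The table below shows the in-sample output of both models: PROBIT is the fitted probability from the probit model, SVM is the class assigned by the SVM, and PRES is the observed outcome.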
| YEAR | PROBIT | SVM | PRES |
| --- | --- | --- | --- |
| 1896 | 0.15 | 0 | 1 |
| 1900 | 0.60 | 1 | 1 |
| 1904 | 0.24 | 0 | 1 |
| 1908 | 0.69 | 1 | 1 |
| 1912 | 0.34 | 0 | 0 |
| 1916 | 0.45 | 0 | 0 |
| 1920 | 1.00 | 1 | 1 |
| 1924 | 0.71 | 1 | 1 |
| 1928 | 0.74 | 1 | 1 |
| 1932 | 0.15 | 0 | 0 |
| 1936 | 0.03 | 0 | 0 |
| 1940 | 0.80 | 1 | 0 |
| 1944 | 0.25 | 0 | 0 |
| 1948 | 0.80 | 1 | 0 |
| 1952 | 0.90 | 1 | 1 |
| 1956 | 0.90 | 1 | 1 |
| 1960 | 0.39 | 0 | 0 |
| 1964 | 0.15 | 0 | 0 |
| 1968 | 0.90 | 1 | 1 |
| 1972 | 0.84 | 1 | 1 |
| 1976 | 0.34 | 0 | 0 |
| 1980 | 0.83 | 1 | 1 |
| 1984 | 0.96 | 1 | 1 |
| 1988 | 0.96 | 1 | 1 |
| 1992 | 0.45 | 0 | 0 |
| 1996 | 0.24 | 0 | 0 |
| 2000 | 0.80 | 1 | 1 |
| 2004 | 0.48 | 0 | 1 |
| 2008 | 0.02 | 0 | 0 |
The table shows that the SVM categorizes the data quite accurately, matching the observed outcome in 24 of the 29 elections, but it tells us nothing about the likelihood of a particular outcome, because that is not what it was asked to do. We also built our SVM using the predictors indicated by our probit analysis (HEIGHT and CHANGE), because the probit framework gave us the tools to investigate those choices. However, the SVM did accomplish something that we only implicitly asked of the probit analysis, which is to classify the data. Although the probit probabilities are useful out-of-sample in a Bayesian sense, in-sample they do not by themselves quantify the success of the method: we have to overlay an ad hoc classification scheme, such as a cutoff on the fitted probability, to permit that interpretation.
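To make the idea of an overlaid classification scheme concrete, here is a minimal sketch that works only from the values printed in the table above. The 0.5 cutoff, the `results` data frame, and the `PROBIT_CLASS` column are assumptions introduced for illustration; they are not part of the original analysis, and a different cutoff could reasonably be chosen.

```r
# In-sample values transcribed from the table above (1896 to 2008).
results <- data.frame(
  YEAR = seq(1896, 2008, by = 4),
  PROBIT = c(0.15, 0.60, 0.24, 0.69, 0.34, 0.45, 1.00, 0.71, 0.74, 0.15,
             0.03, 0.80, 0.25, 0.80, 0.90, 0.90, 0.39, 0.15, 0.90, 0.84,
             0.34, 0.83, 0.96, 0.96, 0.45, 0.24, 0.80, 0.48, 0.02),
  SVM = c(0, 1, 0, 1, 0, 0, 1, 1, 1, 0,
          0, 1, 0, 1, 1, 1, 0, 0, 1, 1,
          0, 1, 1, 1, 0, 0, 1, 0, 0),
  PRES = c(1, 1, 1, 1, 0, 0, 1, 1, 1, 0,
           0, 0, 0, 0, 1, 1, 0, 0, 1, 1,
           0, 1, 1, 1, 0, 0, 1, 1, 0)
)

# Assumed cutoff: call the outcome 1 when the fitted probability exceeds 0.5.
cutoff <- 0.5
results$PROBIT_CLASS <- as.integer(results$PROBIT > cutoff)

# Share of elections each method classifies correctly, in-sample.
probit_accuracy <- mean(results$PROBIT_CLASS == results$PRES)
svm_accuracy <- mean(results$SVM == results$PRES)
print(c(probit = probit_accuracy, svm = svm_accuracy))

# Cross-tabulate the SVM labels against the observed outcomes.
print(table(predicted = results$SVM, actual = results$PRES))

# Years in which the SVM label disagrees with the observed outcome.
svm_misses <- results$YEAR[results$SVM != results$PRES]
print(svm_misses)
```

On the values in the table, this cutoff assigns every election the same label as the SVM, so both methods are correct in 24 of 29 cases and miss the same five years (1896, 1904, 1940, 1948 and 2004). That agreement is a property of this particular sample and cutoff, not something to expect in general.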