Further Comparison of OLS and Support Vector Machine Models Out-of-Sample

by Graham Giller August 26, 2010 11:23

In the prior post we noted the outperformance of Support Vector Regression over OLS models out-of-sample. This is referred to in the Machine Learning community as their superior ability to generalize. I think that the enhanced statistical reliability, coupled with the fact that the univariate response model found by the SVM departs markedly from our prior prejudices regarding smooth and low order responses, is quite a striking result.

In this post we seek to replicate the response function of the SVM with a high-order polynomial model. This is to investigate whether the superior forecasting skill out-of-sample arises from the lower-order “wriggles” in the SVM response function or from the higher-order “kinks.” This is interesting because we can certainly replicate the lower-order features via classical linear methods, but it is unlikely that we can do such a thing for the higher-order features of the response. Thus we define our linear polynomial model as

LaTeX provided by MathTeX at forkosh.com.
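The equation itself appears to have been rendered as an image. Going by the symbols used in the surrounding text (the βn coefficients, the Legendre polynomials Pn(x) and the maximum order N), a model of this kind would presumably read as follows. The lower summation limit n = 0 and the additive noise term ε are assumptions, not taken from the original.

```latex
y = \sum_{n=0}^{N} \beta_n P_n(x) + \varepsilon, \qquad x \in [-1, 1]
```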

Here the Pn(x) are Legendre Polynomials of order n. These are orthogonal on [-1,1], so they are a useful basis in which to express our response function. Because the functions are orthogonal, the estimators should be independent in expectation. In addition, following Vapnik, we reject the Occam's Razor driven methodology of standard classical statistical analysis, which seeks a parsimonious model (what my ex-boss, Peter Muller, used to refer to as “Keep It Simple Stupid”), and instead find the N large enough to match the testing set R² of the SVM.
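A minimal sketch of how such a fit might be set up in base R, not the code actually used for this post. The helper legendre_basis, the synthetic data and the choice N = 5 are all illustrative assumptions; the basis columns are built with Bonnet's recurrence and passed to lm.

```r
# Legendre polynomials P_0..P_N evaluated at x, via Bonnet's recurrence:
#   (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
# legendre_basis is a hypothetical helper, not part of the original analysis.
legendre_basis <- function(x, N) {
  P <- matrix(1, nrow = length(x), ncol = N + 1)  # column n + 1 holds P_n
  if (N >= 1) P[, 2] <- x
  if (N >= 2) {
    for (n in 1:(N - 1)) {
      P[, n + 2] <- ((2 * n + 1) * x * P[, n + 1] - n * P[, n]) / (n + 1)
    }
  }
  colnames(P) <- paste0("P", 0:N)
  P
}

# Synthetic stand-in for the predictor/response pair (assumption: the real
# predictor already lies in [-1, 1], as the daily range does by construction)
set.seed(1)
x <- runif(500, -1, 1)
y <- 0.1 * x + rnorm(500, sd = 0.5)

N <- 5                         # illustrative order; the post uses N = 30
B <- legendre_basis(x, N)
legendre_fit <- lm(y ~ B - 1)  # P_0 is constant, so it acts as the intercept
print(round(coef(summary(legendre_fit)), 4))
```

Raising N until the testing-set R² matches that of the SVM would then be a loop over this fit, scoring each candidate on the held-out sample.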

Comparison of SVM and 30th Order Legendre Polynomial Models for Daily Range

The above chart illustrates a 30th order Legendre polynomial model replicating the response of the Support Vector Machine and exhibiting an equivalent out-of-sample forecasting skill. From the point-of-view of classical inference, there is no way an analyst would ever suggest using such a high order model on this data, and the t-statistics for the βn coefficients are all small, yet this is the type of functional response picked out by the Support Vector Machine!

 

Comparison of SVR and OLS Models for Daily Range

by Graham Giller August 25, 2010 14:20

Continuing the recent theme on the application of Machine Learning and Interior Analysis, here we investigate the utility of Support Vector Regression methods versus Ordinary Least Squares. The job is to predict the daily range of the top ranked stock in the Compact Model Portfolio from the prior value of that metric. By Daily Range we mean the ratio of the difference between the Closing Price and Opening Price to the difference between the Highest Price and Lowest Price. I chose this particular metric because it is an interior metric but it doesn't use any microstructure information. It is also frequently discussed in non-academic literature.

The data used is the daily range computed for the top ranked stock in the Compact Model Portfolio, with the period 01/03/2001 – 12/31/2009 used as the training data and the period 01/03/2010 – 08/24/2010 used as the testing data. This division is a simple binary sample cross-validation technique. The analysis was performed in R, and the code is appended to this post.

Before discussing the chart above, which exhibits many interesting features, let's talk about the methods. On the training set, the ksvm procedure was used to execute an ε-insensitive regression and allowed to use its default methods and tolerances. The OLS procedure lm was similarly run without user tuning. I then used both models to predict responses in the testing set and used an OLS regression of the response onto the forecasts as a simple methodology for evaluating the quality of the systems. Notable differences were found. The out-of-sample β was established to be 2.01417 ± 1.30812 for the OLS model and 1.04366 ± 0.43193 for the SVR model. The R²'s were 0.01526 and 0.03676, respectively. Thus the SVR model is a much more accurate performer out-of-sample, as it is advertised to be.
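The evaluation step described above, regressing the realized response onto the forecast and reading off β, its standard error and R², can be sketched in isolation as follows. The function name evaluate_forecast and the synthetic numbers are assumptions for illustration only; they are not the post's data.

```r
# Regress the realized response on the forecast and report slope, standard
# error and R^2. evaluate_forecast is a hypothetical helper name.
evaluate_forecast <- function(response, forecast) {
  s <- summary(lm(response ~ forecast))
  list(beta = s$coefficients["forecast", "Estimate"],
       se   = s$coefficients["forecast", "Std. Error"],
       r2   = s$r.squared)
}

# Synthetic example: a weak but unbiased forecast (true slope of 1)
set.seed(42)
forecast <- rnorm(160, sd = 0.1)
response <- forecast + rnorm(160, sd = 0.5)

ev <- evaluate_forecast(response, forecast)
cat(sprintf("beta = %.3f +/- %.3f, R^2 = %.4f\n", ev$beta, ev$se, ev$r2))
```

A well calibrated forecast should give a β consistent with 1, which is why the SVR estimate of 1.04 ± 0.43 reads better than the OLS estimate of 2.01 ± 1.31.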

The chart exhibits the out-of-sample predictor and response data as well as the OLS regression line and the SVR model. We see that the SVR model contains numerous wriggles and kinks, yet my instinct is to reject the information content of these features, on the assumption that they indicate a need to tune the kernel used by the system. However, intuition is not necessarily truth, so we are in need of a procedure to establish where the superior predictive power of this model comes from. Does it come from some simple non-linearity in response that the algorithm has picked up, or is it actually due to the more funky nature of the model? One way to establish this would be to see if we can create some kind of piecewise linear model that does as well as the SVR.
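One way such a piecewise linear model might be set up, offered only as a sketch: add hinge terms max(x - k, 0) at a few knots to the OLS regression. The knot locations, the helper hinge_basis and the synthetic data below are all hypothetical and were not part of the original analysis.

```r
# Hinge basis for a continuous piecewise linear fit: one column per knot,
# each equal to max(x - knot, 0). hinge_basis is a hypothetical helper.
hinge_basis <- function(x, knots) {
  H <- sapply(knots, function(k) pmax(x - k, 0))
  H <- matrix(H, nrow = length(x))
  colnames(H) <- paste0("hinge_", seq_along(knots))
  H
}

# Synthetic data with a kink at 0.3 (illustrative only)
set.seed(3)
x <- runif(400, -1, 1)
y <- -0.1 * x + 0.2 * pmax(x - 0.3, 0) + rnorm(400, sd = 0.3)

knots <- c(-0.5, 0, 0.5)  # hypothetical knot locations
pw_fit <- lm(y ~ x + hinge_basis(x, knots))
print(round(coef(pw_fit), 4))
```

The fitted model could then be scored out-of-sample in the same way as the OLS and SVR forecasts.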


# kernlab provides ksvm; lm is part of base R
library(kernlab)
training<-read.table("CMP_Training.txt",header=TRUE)
testing<-read.table("CMP_Testing.txt",header=TRUE)
# Daily range = (close - open)/(high - low), for the current and the prior day.
# The "/" is kept at the end of the line so that R reads one complete expression.
training$DailyRange<-(training$ClosingPrice-training$OpeningPrice)/(training$HighestPrice-training$LowestPrice)
training$PriorDailyRange<-(training$PriorClosingPrice-training$PriorOpeningPrice)/
    (training$PriorHighestPrice-training$PriorLowestPrice)
testing$DailyRange<-(testing$ClosingPrice-testing$OpeningPrice)/(testing$HighestPrice-testing$LowestPrice)
testing$PriorDailyRange<-(testing$PriorClosingPrice-testing$PriorOpeningPrice)/
    (testing$PriorHighestPrice-testing$PriorLowestPrice)
names(training)
# Flag rows where both the response and the predictor are available
training$sample<-!(is.na(training$DailyRange)|is.na(training$PriorDailyRange))
testing$sample<-!(is.na(testing$DailyRange)|is.na(testing$PriorDailyRange))
# The original listing called an undefined helper rest(); it is assumed here to
# have selected the complete rows, so those rows are extracted explicitly.
complete<-testing[testing$sample,]
# OLS: fit in-sample, then regress the out-of-sample response on the forecast
summary(linmod<-lm(DailyRange~PriorDailyRange,data=training,subset=training$sample))
summary(lm(complete$DailyRange~predict(linmod,newdata=complete)))
# SVR: epsilon-insensitive regression with kernlab's default settings
print(svrmod<-ksvm(DailyRange~PriorDailyRange,type='eps-svr',data=training))
svrfit<-as.vector(predict(svrmod,newdata=complete))
summary(lm(complete$DailyRange~svrfit))
# Sort by the predictor so the SVR response is drawn as a single curve
isort<-order(complete$PriorDailyRange)
plot(complete$PriorDailyRange,complete$DailyRange,axes=TRUE,
    xlab='Prior Daily Range',ylab='Daily Range',main='Comparison of SVR and OLS Models for Daily Range',
    sub=paste('Data: Compact Model Portfolio, Top Ranked Stock; Resolution: Daily; Training:',
    training$MarkDate[1],'--',training$MarkDate[length(training$MarkDate)],'Testing:',testing$MarkDate[1],
    '--',testing$MarkDate[length(testing$MarkDate)]))
# Gray axes through the origin, OLS line in red, SVR response in blue
lines(c(-1.1,1.1),c(0,0),col='gray')
lines(c(0,0),c(-1.1,1.1),col='gray')
abline(linmod,col='red')
lines(complete$PriorDailyRange[isort],svrfit[isort],col='blue')
 

Almost 100 Years of Asymmetric Response in the Stock Market

by Graham Giller August 18, 2010 09:58

(I actually did this work before the post on the lack of asymmetric response in interest rates.) In the same manner as in our analysis of interest rates, we can create a time series of independent annual upside and downside response estimators for the S&P 500 Index. The chart below illustrates the covariance of those estimators.

Almost 100 Years of Asymmetric Response

From the chart we see data that illustrates support for a significant positive downside response coefficient and no support for a non-zero upside response coefficient. Furthermore, there is no evidence of covariance between the estimators (remember, our null hypothesis is Ey = Dy).

 


Some Observations on the Interface Between Machine Learning Methods and Classical Statistical Inference

by Graham Giller August 16, 2010 12:34

The current focus of development work for high-frequency trading is centred firmly within the regime of machine learning methods. Since these methods have been so successful, it is important to examine the question of the origin of their success relative to the methods of classical inference. Here we will assume that their success is based on utility rather than habit, i.e. these methods have been adopted by high frequency traders because they work well, not because they are the methods that high frequency traders were familiar with from earlier in their careers.

Within the regime of machine learning methods, the support vector methods developed by Vapnik et al. have become very successful. Starting with the goal of producing effective binary classification systems, these methods have been developed into more wide-reaching analytical strategies by the augmentation of the basic concept with several interesting ideas.

The introduction of a non-linear feature space, and associated dimensional regularization methods, has broken the paradigm of simple linear models often constructed in classically motivated analysis. In addition, ε-insensitive regression, created to bridge the gap between the well motivated classification methods and empirically useful regression paradigms, involves some interesting ideas quite distinct from those normally used in classical statistical inference.

In ε-insensitive regression we seek to place little weight on residuals less than a critical threshold and linearly increasing weight on more deviant draws. However, in classical analysis (both in the least-squares family and full maximum likelihood methods) we place the most weight on the core of the distribution of residuals, by simple virtue of their much higher frequencies, and, in the case of least squares and associated methods, quadratically increasing weight on more deviant draws.
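The contrast between the two weightings can be made concrete with a small sketch. The standard ε-insensitive loss is max(|r| - ε, 0), against r² for least squares. The threshold eps = 0.1 and the residual values below are illustrative assumptions.

```r
# Epsilon-insensitive loss: zero inside the tube |r| <= eps, linear outside it
eps_insensitive_loss <- function(r, eps = 0.1) pmax(abs(r) - eps, 0)

# Least-squares loss: quadratic in the residual everywhere
squared_loss <- function(r) r^2

r <- c(-1, -0.05, 0, 0.1, 0.5, 2)  # illustrative residuals
loss_table <- data.frame(residual        = r,
                         eps_insensitive = eps_insensitive_loss(r),
                         squared         = squared_loss(r))
print(loss_table)
```

Residuals inside the tube contribute nothing to the ε-insensitive objective, while the large residual at 2 costs 1.9 rather than 4, so the tails pull on the fit linearly instead of quadratically.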

In fact, a central concept in the initial development of the support vector family of methods is the representation of prediction formulæ by a function of a pruned subset of the training set data vectors (the so-called support vectors) whereas, in classical analysis, the prediction formulæ are a function of the entire training set of data vectors.

As we increase our data's temporal resolution, we encounter phenomenology that causes classical methods to become less useful. Yet, if we seek to maintain the explanatory power and structural insights of classic methods, we need to adopt and adapt the aspects of machine learning methods that have proved so successful at very high frequencies. We can do this by identifying which particular aspects of machine learning approaches are driving their success and constructing analogues for use in classical inference.

 

Asymmetric Response is an Attribute of Stock Markets

by Graham Giller August 12, 2010 09:28

To learn more about the asymmetric relationship between empirical volatility and market returns, I thought we should study more data and more asset classes. To augment this analysis, however, I thought it worthwhile to introduce a more suitable parameterization than that available with the standard GJR AGARCH model. Let's call this Piecewise-Quadratic GARCH, or PQGARCH.

LaTeX provided by MathTeX at forkosh.com

I fitted this model to daily interest rate changes of U.S. Three Month Treasury Bills. The data I used is available from the Federal Reserve Bank of St. Louis. The model was fitted independently to each year's daily data, from 1954 to date. This yields an annual time series of parameter estimates Dy and Ey. Our null hypothesis is for regular (i.e. symmetric) GARCH, which implies that the D's and E's should agree within sampling errors. The chart below illustrates the empirical distribution functions for both data series and the results of applying the Kolmogorov-Smirnov test to that data, which implies that we cannot reject the null hypothesis with a confidence greater than 82%, which is insufficient. (Years in which the regression did not converge without “tuning” are omitted.)
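The comparison of the two annual series can be sketched with the two-sample Kolmogorov-Smirnov test in base R. The vectors D and E below are synthetic placeholders drawn from one common distribution, not the fitted PQGARCH estimates, and the sample size is an assumption.

```r
# Synthetic stand-ins for the annual estimates D_y and E_y (assumption: both
# drawn from the same distribution, i.e. the symmetric-GARCH null holds)
set.seed(7)
D <- rnorm(50, mean = 0.1, sd = 0.05)
E <- rnorm(50, mean = 0.1, sd = 0.05)

# Two-sample Kolmogorov-Smirnov test of the null that D and E share a distribution
ks <- ks.test(D, E)
print(ks)

# Confidence with which the null could be rejected
confidence <- 1 - ks$p.value

# Empirical distribution functions, as in the chart
plot(ecdf(D), main = 'Empirical Distribution Functions of D and E', col = 'red')
lines(ecdf(E), col = 'blue')
```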

Kolmogorov-Smirnov Test for D vs. E

This demonstrates that asymmetric response does not appear to be a characteristic of U.S. interest rate changes over more than half a century of data. That is, it is an attribute of stock markets, not of all markets.

 

Asymmetric Response "Straightens" the Relationship between the VIX and the Empirical Volatility

by Graham Giller August 11, 2010 11:06

In earlier posts we studied the relationship between the VIX Index and GARCH Models of the daily volatility of the S&P 500 Index. We found a quadratic relationship between the log of the VIX and the log of the GARCH model.

Relationship Between CAGARCH(1,1) Model and VIX Index

However, we also found considerable support for completely asymmetric response between empirical volatility and market returns. When we properly incorporate this structure, we find the curvature in the relationship between the VIX Index and the GARCH model is removed, as is illustrated in the above chart.

 



About the Author

GRAHAM GILLER
Dr. Giller holds a doctorate from Oxford University in experimental elementary particle physics. His field of research was statistical astronomy using high energy cosmic rays. After leaving Oxford, he worked in the Process Driven Trading Group at Morgan Stanley, as a strategy researcher and portfolio manager. He then ran a CTA/CPO firm which concentrated on trading eurodollar futures using statistical models. From 2004, he has managed a private family investment office. In 2009, he joined a California based hedge fund startup, concentrating on high frequency alpha and volatility forecasting. My updated resume is on LinkedIn.


Disclaimer

Nothing on this site should be construed as a recommendation to buy or sell any specific security nor as a solicitation of an order to buy or sell any specific security. Before making any trade for any reason you should consult your own financial advisor. The author may hold long or short positions in any of the securities discussed either before or after publication of an article mentioning such a security.

Copyright Notice

All posts on this blog are © Copyright property of Giller Investments (New Jersey), LLC. All comments are the property of their respective authors, and neither the author of this blog nor any entity associated with him is responsible for or accepts any responsibility for their content. Offensive comments and spam may be removed at the author's discretion.

Data provided on this blog or through links to this blog are either property of Giller Investments (New Jersey), LLC or publicly available or derived from data that is publicly available. Any data that is proprietary to Giller Investments (New Jersey), LLC is published here for the public interest and may be reproduced for private research or in public forums provided that suitable attribution and acknowledgement of ownership is made.
