Some Observations on the Interface Between Machine Learning Methods and Classical Statistical Inference

by Graham Giller August 16, 2010 12:34

The current focus of development work for high-frequency trading is centred firmly within the regime of machine learning methods. Since these methods have been so successful, it is important to examine the origin of their success relative to the methods of classical inference. Here we will assume that their success is based on utility rather than habit, i.e., these methods have been adopted by high-frequency traders because they work well, not because they are the methods that high-frequency traders were familiar with from earlier in their careers.

Within the regime of machine learning methods, the support vector methods developed by Vapnik et al. have become very successful. Starting with the goal of producing effective binary classification systems, these methods have been developed into more wide-reaching analytical strategies by the augmentation of the basic concept with several interesting ideas.

The introduction of a non-linear feature space, and associated dimensional regularization methods, has broken the paradigm of simple linear models often constructed in classically motivated analysis. In addition, ε-insensitive regression, created to bridge the gap between the well-motivated classification methods and empirically useful regression paradigms, involves some interesting ideas quite distinct from those normally used in classical statistical inference.

In ε-insensitive regression we place no weight on residuals smaller in magnitude than a critical threshold ε, and linearly increasing weight on more deviant draws. In classical analysis, by contrast (both in the least-squares family and in full maximum likelihood methods), we place the most weight on the core of the distribution of residuals, by simple virtue of their much higher frequencies, and, in the case of least squares and associated methods, quadratically increasing weight on more deviant draws.
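The contrast between the two weighting schemes can be made concrete with a small sketch. The function names, the threshold value, and the sample residuals below are illustrative assumptions, not taken from any particular library or from the analysis above.

```python
def eps_insensitive_loss(residual, eps):
    """Epsilon-insensitive loss: zero inside the tube |r| <= eps, linear outside it."""
    return max(0.0, abs(residual) - eps)


def squared_loss(residual):
    """Least-squares loss: grows quadratically with the size of the residual."""
    return residual ** 2


if __name__ == "__main__":
    eps = 0.5                              # hypothetical threshold
    residuals = [0.1, 0.4, -1.0, 3.0]      # hypothetical residuals
    for r in residuals:
        print(
            f"residual={r:5.2f}  "
            f"eps-insensitive={eps_insensitive_loss(r, eps):5.2f}  "
            f"squared={squared_loss(r):5.2f}"
        )
```

Residuals inside the tube contribute nothing to the ε-insensitive objective, however many of them there are, while the squared loss charges every residual and penalises the largest ones disproportionately.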

In fact, a central concept in the initial development of the support vector family of methods is the representation of prediction formulæ by a function of a pruned subset of the training set data vectors (the so-called support vectors) whereas, in classical analysis, the prediction formulæ are a function of the entire training set of data vectors.
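In standard textbook notation (the symbols below are assumptions introduced for illustration and do not appear in the discussion above), the distinction can be written as follows. Here $K$ is the kernel, $\alpha_i$ and $\alpha_i^*$ are the dual coefficients of the support vector regression, $b$ is the offset, $S$ is the index set of support vectors, and $X$ and $y$ are the design matrix and response vector of an ordinary least-squares fit, taken here as a representative classical method.

```latex
% Support vector regression: only the support vectors enter the prediction.
\hat{f}_{\mathrm{SVR}}(x) = \sum_{i \in S} (\alpha_i - \alpha_i^{*})\, K(x_i, x) + b,
\qquad S = \{\, i : \alpha_i - \alpha_i^{*} \neq 0 \,\}

% Ordinary least squares: every training observation enters the prediction.
\hat{f}_{\mathrm{OLS}}(x) = x^{\top} (X^{\top} X)^{-1} X^{\top} y
  = \sum_{i=1}^{n} w_i(x)\, y_i,
\qquad w_i(x) = x^{\top} (X^{\top} X)^{-1} x_i
```

Training points lying strictly inside the ε-tube have $\alpha_i = \alpha_i^{*} = 0$ and so drop out of the first sum, whereas the weights $w_i(x)$ in the second are generally non-zero for every observation.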

As we increase our data's temporal resolution, we encounter phenomenology that causes classical methods to become less useful. Yet, if we seek to maintain the explanatory power and structural insights of classical methods, we need to adopt and adapt the aspects of machine learning methods that have proved so successful at very high frequencies. We can do this by identifying which particular aspects of machine learning approaches are driving their success and constructing analogues for use in classical inference.


About the Author

Graham Giller
Dr. Giller holds a doctorate from Oxford University in experimental elementary particle physics. His field of research was statistical astronomy using high energy cosmic rays. After leaving Oxford, he worked in the Process Driven Trading Group at Morgan Stanley, as a strategy researcher and portfolio manager. He then ran a CTA/CPO firm which concentrated on trading eurodollar futures using statistical models. Since 2004, he has managed a private family investment office. In 2009, he joined a California-based hedge fund startup, concentrating on high-frequency alpha and volatility forecasting. His updated resume is on LinkedIn.
