The current focus of development work in high-frequency trading lies firmly within the regime of machine learning methods. Because these methods have been so successful, it is important to ask where their success comes from relative to the methods of classical inference. Here we assume that their success rests on utility rather than habit: high-frequency traders have adopted these methods because they work well, not because they were the methods those traders were already familiar with from earlier in their careers.
Within the regime of machine learning methods, the support vector methods developed by Vapnik et al. have been particularly successful. Starting from the goal of producing effective binary classification systems, these methods have grown into more wide-reaching analytical strategies by augmenting the basic concept with several further ideas.
The introduction of a non-linear feature space, together with the associated regularization methods needed to control its high dimensionality, has broken with the paradigm of the simple linear models often constructed in classically motivated analysis. In addition, ε-insensitive regression, created to bridge the gap between the well-motivated classification methods and empirically useful regression paradigms, involves some interesting ideas quite distinct from those normally used in classical statistical inference.
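The idea behind a non-linear feature space can be sketched with a kernel that evaluates inner products in the mapped space without ever constructing that space explicitly. The example below assumes a homogeneous degree-2 polynomial kernel on two-dimensional inputs, chosen purely for illustration (the text does not specify a kernel); the names `phi` and `poly2_kernel` are hypothetical.

```python
import math

def phi(x):
    """Explicit degree-2 feature map for a 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(u, v):
    """Ordinary Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def poly2_kernel(x, y):
    """Homogeneous polynomial kernel k(x, y) = (x . y)^2, evaluated in the original 2-D space."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(y))  # inner product computed in the 3-D feature space
implicit = poly2_kernel(x, y)   # same quantity computed without building phi
print(explicit, implicit)
```

The two values agree up to floating-point rounding, which is why a linear model fitted in the feature space can be expressed entirely through kernel evaluations on the original inputs.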
In ε-insensitive regression we place zero weight on residuals smaller than a critical threshold, ε, and linearly increasing weight on more deviant draws. By contrast, in classical analysis (both in the least-squares family and in full maximum-likelihood methods) we place the most weight on the core of the distribution of residuals, simply by virtue of its much higher frequency, and, in the case of least squares and associated methods, quadratically increasing weight on more deviant draws.
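The contrast between the two weighting schemes can be made concrete by tabulating each penalty as a function of the residual. This is a minimal sketch assuming a threshold of ε = 0.5 and a handful of illustrative residuals; the function names are hypothetical.

```python
def squared_loss(r):
    """Least-squares penalty: grows quadratically with the residual."""
    return r * r

def eps_insensitive_loss(r, eps):
    """Epsilon-insensitive penalty: zero inside the tube |r| <= eps, linear outside it."""
    return max(0.0, abs(r) - eps)

eps = 0.5  # assumed threshold, for illustration only
residuals = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0]
for r in residuals:
    print(f"r={r:5.2f}  squared={squared_loss(r):6.2f}  "
          f"eps-insensitive={eps_insensitive_loss(r, eps):5.2f}")
```

Residuals up to ε cost nothing under the ε-insensitive penalty, while for large residuals doubling the residual quadruples the squared penalty but raises the ε-insensitive penalty only by the increase in the residual itself.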
In fact, a central concept in the initial development of the support vector family of methods is the representation of prediction formulæ as a function of a pruned subset of the training-set data vectors (the so-called support vectors), whereas in classical analysis the prediction formulæ are a function of the entire set of training data vectors.
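Why the prediction formula depends on only a subset of the data can be sketched through the first-order conditions. In ε-insensitive support vector regression, points lying strictly inside the ε-tube around the optimal fit receive zero dual weight, so only points on or outside the tube can be support vectors; under squared loss, every point with a non-zero residual pulls on the fit. The sketch below is not an SVR solver: it applies that tube test to a hypothetical fitted line y ≈ w·x + b with assumed values of `w`, `b`, `eps` and made-up data, and the function names are likewise hypothetical.

```python
def candidate_support_vectors(xs, ys, w, b, eps):
    """Indices of points on or outside the epsilon-tube around y = w*x + b.

    Points strictly inside the tube have zero (sub)gradient under the
    epsilon-insensitive loss and so carry no weight in the fit.
    """
    return [i for i, (x, y) in enumerate(zip(xs, ys))
            if abs(y - (w * x + b)) >= eps]

def least_squares_contributors(xs, ys, w, b):
    """Indices of points whose residual is non-zero, i.e. that enter the squared-loss gradient."""
    return [i for i, (x, y) in enumerate(zip(xs, ys))
            if y - (w * x + b) != 0.0]

# Made-up data and an assumed fitted line, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [-0.5, 1.9, 4.05, 6.6, 7.95, 10.2]
w, b, eps = 2.0, 0.0, 0.3

print(candidate_support_vectors(xs, ys, w, b, eps))
print(least_squares_contributors(xs, ys, w, b))
```

Here only two of the six points fall on or outside the tube, whereas all six enter the least-squares conditions.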
As we increase the temporal resolution of our data, we encounter phenomenology that makes classical methods less useful. Yet if we wish to retain the explanatory power and structural insights of classical methods, we need to adopt and adapt those aspects of machine learning methods that have proved so successful at very high frequencies. We can do this by identifying which particular aspects of machine learning approaches drive their success and constructing analogues of them for use in classical inference.