Linearly Forecastable Processes and Forecastable Processes that are Not Linearly Forecastable

by Graham Giller January 30, 2012 14:02

In order to understand how Machine Learning can be applied to the problem of discovering optimal trading strategies, one has to understand how traditional analysis is applied to this topic. The basic concepts are what I'm calling Linearly Forecastable Processes and Affine Statistics. I'll start by defining a linearly forecastable process as one that may be written:

dP = α dt + σ dX.

That is, the change in a price may be separated into a linear combination of a conditional mean, which is locally deterministic, and a stochastic part, or innovation, which is independent of the price. Now, any stochastic process may be written in this manner, since its changes must have a mean and the distribution of changes can therefore always be conditionally centered, but (and this is important) it is not always true that the locally stochastic part is independent of the conditional mean.

An easy example of this is the discrete Markov Chain, such as that which might be used to describe a price process at high frequency. The change expressed by the conditional mean likely does not coincide with the domain the chain may occupy, and for the linear decomposition to be valid the distribution of the innovation must be contorted to deliver a change in state that does coincide with the domain the process may occupy. This constraint necessarily makes the innovation not independent of the conditional mean.
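To make the Markov Chain argument concrete, here is a minimal sketch. The two states and their transition probabilities are hypothetical values chosen for illustration, not estimates from any data. It computes the conditional mean of the price change in each state and the set of values the centered innovation can take.

```python
from fractions import Fraction

# Hypothetical tick-grid chain: from each state the price moves by -1, 0 or +1 tick.
# The probabilities are illustrative assumptions only.
TRANSITIONS = {
    "after_uptick": {-1: Fraction(1, 2), 0: Fraction(1, 4), +1: Fraction(1, 4)},
    "after_downtick": {-1: Fraction(1, 4), 0: Fraction(1, 4), +1: Fraction(1, 2)},
}


def conditional_mean(probs):
    """Expected price change, in ticks, given the current state."""
    return sum(move * p for move, p in probs.items())


def innovation_distribution(probs):
    """Distribution of (change - conditional mean), i.e. the centered residual."""
    mean = conditional_mean(probs)
    return {move - mean: p for move, p in probs.items()}


for state, probs in TRANSITIONS.items():
    print(state, conditional_mean(probs), sorted(innovation_distribution(probs)))
```

In this toy chain the conditional mean is a fraction of a tick, which is not a move the chain can make, and the values the innovation can take differ between the two states. The innovation therefore cannot be independent of the conditional mean.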

 

Machine Learning and Optimal Trading

by Graham Giller January 26, 2012 00:07
I am currently writing about using Machine Learning algorithms to discover Optimal Trading Rules. This work will be divided into several parts, the first of which is about developing an appropriate training set to use for the Machine Learning algorithm. I am working around an idea based on the use of Oracles in forecasting. I will put a draft of this paper on my author page at the SSRN when it is complete.


Some Observations on the Interface Between Machine Learning Methods and Classical Statistical Inference

by Graham Giller August 16, 2010 12:34

The current focus of development work for high-frequency trading is centred firmly within the regime of machine learning methods. Since these methods have been so successful, it is important to examine the question of the origin of their success relative to the methods of classical inference. Here we will assume that their success is based on utility rather than habit, i.e. these methods have been adopted by high frequency traders because they are methods that work well, not because they are the methods that high frequency traders were familiar with from earlier in their careers.

Within the regime of machine learning methods, the support vector methods developed by Vapnik et al. have become very successful. Starting with the goal of producing effective binary classification systems, these methods have been developed into more wide-reaching analytical strategies by the augmentation of the basic concept with several interesting ideas.

The introduction of a non-linear feature space, and associated dimensional regularization methods, has broken the paradigm of simple linear models often constructed in classically motivated analysis. In addition, ε-insensitive regression, created to bridge the gap between the well motivated classification methods and empirically useful regression paradigms, involves some interesting ideas quite distinct from those normally used in classical statistical inference.

In ε-insensitive regression we seek to place little weight on residuals less than a critical threshold and linearly increasing weight on more deviant draws. However, in classical analysis (both in the least-squares family and full maximum likelihood methods) we place the most weight on the core of the distribution of residuals, by simple virtue of their much higher frequencies, and, in the case of least squares and associated methods, quadratically increasing weight on more deviant draws.
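The contrast between the two weighting schemes can be sketched in a few lines. This uses the standard form of the ε-insensitive loss, which gives exactly zero weight inside the threshold. The value of `EPS` and the sample residuals are arbitrary choices for illustration.

```python
def eps_insensitive_loss(residual, eps):
    """Zero for |residual| <= eps, then linear in the excess over eps."""
    return max(0.0, abs(residual) - eps)


def squared_loss(residual):
    """Least-squares loss: quadratic in the residual."""
    return residual ** 2


EPS = 0.5  # hypothetical critical threshold
for r in (0.1, 0.5, 1.0, 3.0):
    print(r, eps_insensitive_loss(r, EPS), squared_loss(r))
```

Small residuals cost nothing under the ε-insensitive loss, whereas under least squares they still contribute, and large residuals are penalized linearly rather than quadratically.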

In fact, a central concept in the initial development of the support vector family of methods is the representation of prediction formulæ by a function of a pruned subset of the training set data vectors (the so-called support vectors) whereas, in classical analysis, the prediction formulæ are a function of the entire training set of data vectors.

As we increase our data's temporal resolution, we encounter phenomenology that causes classical methods to become less useful. Yet, if we seek to maintain the explanatory power and structural insights of classic methods, we need to adopt and adapt the aspects of machine learning methods that have proved so successful at very high frequencies. We can do this by identifying which particular aspects of machine learning approaches are driving their success and constructing analogues for use in classical inference.

 

Methods for Interior Analysis

by Graham Giller July 29, 2010 10:07

Some of my recent work has used “interior” data to compute metrics which are then analysed by regular time-series methods. I've recently been thinking about paths to generalize this methodology for the analysis of high-frequency data.

[Figure: Schematic of interior data and analysis methods]

The above diagram, I hope, illustrates what I mean by interior data. With regular time-series analysis we typically look at price changes over a homogeneous sequence of intervals and construct linear functions of lagged price changes. This is represented by the lower half of the data.

With higher frequency data, it is easy (i.e. cheap) to obtain data that summarizes trading activity down to resolutions of around one minute via the standard structure of “price bars.” The problem with this data is that, as the data frequency is increased, the data sequence becomes sparse, i.e. there are many intervals that do not contain trading activity, and quantized, i.e. we start to see the fundamental pricing interval of $0.01 strongly. Both of these factors disrupt the utility of classic time-series analysis.

These factors push one to analyze larger time intervals in order that a decent quantity of acceptably continuous data exists. With that constraint, it is easy to then only look in the direction of the data sampled at these larger time intervals. However, there is a set of data that exists within the interior of the larger time intervals, and my thoughts are that we should be able to assemble some kind of non-linear analytical machine to process the set of interior state vectors and forecast the evolution of the exterior process from that.

 

The Win-Loss Statistic

by Graham Giller May 26, 2009 23:03

When I started my career I was asked to analyze the performance of a volatility arbitrage system. One of the tools we looked at was the Market Information Machine (XMIM) produced by Logical Information Machines. In 1994 and 1995, which is when I was using the product, this was a data mining product for traders that allowed one to simulate trading by rules such as

when gold is up more than 5% and oil is down more than 2% then sell gold
for example. The MIM people had put a lot of effort into crafting natural language queries, because back in the day traders were thought of as a group that could barely put together an Excel worksheet.

One of the outputs of the system was a statistical breakdown of winning trades and losing trades, in particular the number of winning trades versus the number of losing trades. This reminded me of all the work I had done with event rate counting when I was working in statistical astronomy with cosmic rays for my doctoral research.

Now, for a trading system the number of winning trades versus the number of losing trades is actually an irrelevant metric. What is important is the total dollars won versus the total dollars lost. I have worked with many trading systems over the past 15 years where there were always more losing trades than winning trades, but the winning trades paid out much more than the losing trades lost (this is often true of momentum strategies, for example).
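A small made-up example shows how the two measures can disagree. The P&L figures below are hypothetical and chosen only to illustrate a skewed payoff profile.

```python
# Hypothetical per-trade P&L in dollars: many small losses, a few large wins.
trade_pnl = [-100, -100, -100, -100, -100, -100, -100, 500, 600]

wins = [p for p in trade_pnl if p > 0]
losses = [p for p in trade_pnl if p < 0]

n_wins, n_losses = len(wins), len(losses)
dollars_won, dollars_lost = sum(wins), -sum(losses)

print("winning trades:", n_wins, "losing trades:", n_losses)
print("dollars won:", dollars_won, "dollars lost:", dollars_lost)
print("net P&L:", dollars_won - dollars_lost)
```

By trade count this system loses far more often than it wins, yet the dollars won exceed the dollars lost.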

However, for humans the truth is that it is emotionally easier to deal with a system which makes money more often than it loses, rather than one with skewed payoffs that loses money most of the time and occasionally makes a lot of money. In fact, I would go so far as to suggest that one reason certain dynamic trading anomalies exist in the market, even though they are fairly easy to identify, is because they are so difficult to live with from the perspective of a human risk manager.

The prior philosophical discussion notwithstanding, let's look at a statistic that allows us to assess whether there is an excess of up or down items (be it days, trades, stocks). Let's start off by reviewing the statistics of counting. Basically, if we are counting the occurrences of a random event, and that event is one which occurs at a characteristic rate, then the number of events that occur within a particular sample is drawn from the Poisson Distribution. This distribution is fairly basic, and can be derived as the consequence of a binary process (that the event either does or does not occur within an interval that is very small).

The key thing to remember about the Poisson distribution is that

if an event has an expected rate N per interval then the population standard deviation for the interval is √N

What this means from a statistical point of view is that

you need four times the data to achieve twice the accuracy in a sampled measure
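The "four times the data" rule follows directly from the √N standard deviation: the relative error of a count is √N/N = 1/√N. A minimal check, with the counts of 100 and 400 chosen arbitrarily:

```python
import math


def relative_error(expected_count):
    """Poisson standard deviation sqrt(N) divided by the mean N."""
    return math.sqrt(expected_count) / expected_count


print(relative_error(100))  # one sample size
print(relative_error(400))  # four times the data
```

Quadrupling the expected count halves the relative error.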

At the Soudan 2 Experiment, which was a proton-decay experiment, my collaborators were also interested in working on Atmospheric Neutrino Oscillation phenomena, which is the observed flavour change of the muon neutrinos created in extensive air showers, which are the result of very high energy cosmic ray impacts with the upper atmosphere.

We were looking to count the numbers of events that could be associated with electron neutrinos and muon neutrinos, and to compare those numbers to theory. Several people cast around for statistics to measure the (fairly low) event rates. We wanted a statistic that was standardized in the statistical sense. Most members used the fairly direct measure

[Equation image not recovered]

I felt at the time, and still do, that this statistic treats both of its elements asymmetrically, and that this is unfair. Instead of something modelled on the above, I like to work with an event statistic that looks like

[Equation image not recovered]

This statistic does not favour one channel over the other and is, in the limit of large numbers, statistically standardized (i.e. WL∼N(0,1), meaning that it is Normally Distributed with a zero mean and unit standard deviation).
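The rendered equation for this statistic did not survive in this copy of the post, so the following is only a sketch of one form that matches the description. It assumes the statistic is the difference of the two counts divided by the square root of their sum. This assumed form is symmetric in the two channels, and for independent Poisson counts the variance of the difference is estimated by the sum, which standardizes it. The function name is hypothetical.

```python
import math


def win_loss_statistic(n_wins, n_losses):
    """Assumed form: (wins - losses) / sqrt(wins + losses)."""
    total = n_wins + n_losses
    if total == 0:
        raise ValueError("need at least one winning or losing trade")
    return (n_wins - n_losses) / math.sqrt(total)


print(win_loss_statistic(60, 40))
print(win_loss_statistic(40, 60))
```

Under this assumed form, swapping the two counts only flips the sign, so neither channel is favoured.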

Some might argue that in a binary state system (i.e. either you win or you lose) this is not right, as N(wins) and N(losses) are the results of a binary choice that occurs a fixed number of times, and so we should use the Binomial Distribution to describe our samples. However, in a system where we do not trade on every signal, and which has the possibility of neither making nor losing money, this is not correct. For Barrier Trading Systems, the number of trades follows Poisson Statistics and the lack of a winning trade does not guarantee a losing trade, so my measure is valid. It is what I refer to as the Win-Loss Statistic in the charts and analyses on this site.

 


Rounding --- An Implicit Buy High, Sell Low Strategy

by Graham Giller March 13, 2009 14:10
Last year, before the crash of the emerging markets – pro articulum in general – Prof. Jeremy Siegel was featured in an advert played regularly on CNBC for Wisdom Tree, talking about the inherent "buy high, sell low" strategy embedded in cap. weighted indices.

The basic problem is that when the price of a subset of the index increases then their weight relative to the rest of the index also increases. The index tracking investor is then required to buy more of those components, at their new higher price. If their prices should subsequently decline, then the index tracking investor will be required to sell a little of the investment, for the same reasoning as before, at the new lower price.

Unfortunately, stocks do regularly go up and down relative to each other and so the logic embedded in the previous paragraph represents an embedded buy high – sell low strategy which is overlaid over the basic strategy represented by the index. This is one of the defects of cap. weighted indices and will lead a fund manager that attempts to track such an index to underperform through no fault of their own.

The Markowitz Portfolio is constructed to be Mean-Variance efficient and weights components so that the expected risk-adjusted profit from each position is equal. However, cap. weighting doesn't follow any utility driven formalism and it explicitly contradicts known facts about the market (it overweights large cap. stocks whereas academic research by Fama and French indicates that small cap. stocks consistently outperform).

The adverts caught my attention because I had just tackled a similar buy high – sell low defect in the basket I own to track the Compact Model Portfolio. The portfolio that tracks the CMP Index is equally weighted, meaning that we allocate the same fraction of the overall equity to each individual investment.

Now equal weighting also has an embedded strategy, but in this case it is reversion rather than momentum. With an equal weighted basket, every time returns occur we need to reduce the position in the stocks that outperformed and increase the position in the stocks that underperformed, in order that we maintain the equal weighting. This is an embedded sell high – buy low strategy.
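The embedded reversion trade is easy to see with a two-stock example. The dollar values are hypothetical and the helper name is my own.

```python
def rebalance_trades(values):
    """Dollar trade per holding needed to restore equal weights."""
    target = sum(values) / len(values)
    return [target - v for v in values]


# Hypothetical basket: two holdings that started at $5,000 each.
# After one day the first is worth $5,500 and the second $4,500.
after_returns = [5500.0, 4500.0]
print(rebalance_trades(after_returns))
```

A negative number is a sale and a positive number is a purchase. The holding that rose is sold and the holding that fell is bought.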

I was aware of this, but as I watched my basket I realized that I kept repeating the opposite. On the daily rebalance, the strategy would buy some more of a stock that went up at the end of the day and then, the next day, if it lost money, it would sell at a loss. This was repeated again and again.

I finally realized that this was because I was rounding my position into round lots, of a given size. The conventional algorithm for rounding positive numbers is to add one half and then truncate to an integer. The number of lots to hold in a given company is the fraction of the capital allocated to that company divided by the product of the price and the lot size. Following conventional ½ rounding we tend to round up after we've made money and round down after we've lost money. This is an embedded buy high – sell low strategy.

I solved this by rounding against it. I round up on a losing day and round down on a winning day. i.e.

shares=lotsize×⌊capital/(lotsize×price)−½sign δprice⌋.

This seems to work.

n.b. The notation ⌊x⌋ means floor(x) which means the largest integer less than or equal to x. 
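The rounding rule can be transcribed directly into code. This is a sketch of the formula exactly as written above, next to conventional ½ rounding for comparison. The function names and the example numbers are hypothetical, and `capital` means the capital allocated to the one company.

```python
import math


def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def shares_conventional(capital, price, lot_size):
    """Conventional rounding: add one half, then truncate (floor)."""
    return lot_size * math.floor(capital / (lot_size * price) + 0.5)


def shares_against(capital, price, lot_size, price_change):
    """The formula as written: shift by -1/2 * sign(price change) before flooring."""
    lots = math.floor(capital / (lot_size * price) - 0.5 * sign(price_change))
    return lot_size * lots


# Hypothetical position: $10,000 allocated, price $23, round lots of 100 shares.
print(shares_conventional(10000, 23.0, 100))
print(shares_against(10000, 23.0, 100, +0.5))  # price rose
print(shares_against(10000, 23.0, 100, -0.5))  # price fell
```

On an up day the formula subtracts one half before flooring, so the position comes out no larger than under conventional rounding. On a down day it adds one half, which is the same as conventional rounding. The sketch does not guard against very small allocations, where the formula as written could produce a negative lot count on an up day.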

About the Author

GRAHAM GILLER
Dr. Giller holds a doctorate from Oxford University in experimental elementary particle physics. His field of research was statistical astronomy using high energy cosmic rays. After leaving Oxford, he worked in the Process Driven Trading Group at Morgan Stanley, as a strategy researcher and portfolio manager. He then ran a CTA/CPO firm which concentrated on trading eurodollar futures using statistical models. From 2004, he has managed a private family investment office. In 2009, he joined a California based hedge fund startup, concentrating on high frequency alpha and volatility forecasting. My updated resume is on LinkedIn.

Disclaimer

Nothing on this site should be construed as a recommendation to buy or sell any specific security nor as a solicitation of an order to buy or sell any specific security. Before making any trade for any reason you should consult your own financial advisor. The author may hold long or short positions in any of the securities discussed either before or after publication of an article mentioning such a security.

Copyright Notice

All posts on this blog are © Copyright property of Giller Investments (New Jersey), LLC. All comments are the property of their respective authors and neither the author of this blog nor any entity associated with him is responsible for or accepts any responsibility for their content. Offensive comments and spam may be removed at the author's discretion.

Data provided on this blog or through links to this blog are either property of Giller Investments (New Jersey), LLC or publicly available or derived from data that is publicly available. Any data that is proprietary to Giller Investments (New Jersey), LLC is published here for the public interest and may be reproduced for private research or in public forums provided that suitable attribution and acknowledgement of ownership is made.
