To understand how Machine Learning can be applied to the problem of discovering optimal trading strategies, one first has to understand how traditional analysis is applied to it. The basic concepts are what I'm calling Linearly Forecastable Processes and Affine Statistics. I'll start by defining a linearly forecastable process as one that may be written:
dP = α dt + σ dX.
That is, the change in a price may be separated into a linear combination of a conditional mean, which is locally deterministic, and a stochastic part, or innovation, which is independent of the price. Now, any stochastic process whose changes have a finite mean may be written in this manner, since the distribution of changes can then always be conditionally centered. But, and this is important, it is not always true that the locally stochastic part is independent of the conditional mean.
An easy example of this is a discrete Markov chain, such as one that might be used to describe a price process at high frequency. The change implied by the conditional mean will generally not land on a state the chain may occupy, so for the linear decomposition to be valid the distribution of the innovation must be contorted to deliver a change in state that does coincide with the domain the process may occupy. This constraint necessarily makes the innovation not independent of the conditional mean.
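To make the Markov chain argument concrete, here is a minimal sketch in Python. It assumes a toy chain of my own choosing, not a calibrated model: the price sits on an integer tick grid and moves exactly one tick up or down at each step, with a hypothetical mean-reverting rule `p_up(state)` for the up-tick probability. The function names and the 0.1 reversion strength are illustrative assumptions.

```python
from fractions import Fraction


def p_up(state):
    """Hypothetical rule: the further the price is above 0, the less likely an up-tick.

    Only intended for states in the range -4..4, where the result is a valid probability.
    """
    return Fraction(1, 2) - Fraction(1, 10) * state


def decompose(state):
    """Split the one-step change into a conditional mean and a centered innovation.

    Returns (alpha, innovation), where innovation maps each possible
    innovation value to its probability.
    """
    p = p_up(state)
    moves = {+1: p, -1: 1 - p}  # the chain can only move by one whole tick
    alpha = sum(move * prob for move, prob in moves.items())
    # The innovation is whatever is left after removing the conditional mean.
    innovation = {move - alpha: prob for move, prob in moves.items()}
    return alpha, innovation


def innovation_variance(state):
    """Variance of the innovation, conditional on the current state."""
    _, innovation = decompose(state)
    return sum(eps * eps * prob for eps, prob in innovation.items())


if __name__ == "__main__":
    for state in (-2, 0, 2):
        alpha, innovation = decompose(state)
        print(state, alpha, innovation, innovation_variance(state))
```

In this toy chain the conditional mean is α = 2p − 1, which is generally a fraction of a tick, while the actual move is always a whole tick. The innovation can therefore only take the values 1 − α and −1 − α, and its variance works out to 1 − α². Both the support and the variance of the innovation are functions of the conditional mean, so the two cannot be independent.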