In previous posts, I have presented histograms of the sample distribution of the monthly returns estimated for
the dynamic trading risk factor. With the most recent data set,
we noted an apparent cluster of months with returns in the region of 2%. When performing an ad hoc analysis of data,
it is my normal practice to make a histogram of the data and to fit a parametric model for the underlying probability
density function (usually the generalized error distribution).
Histogramming is a basic and popular statistical technique. Among its benefits are:
- the definition of a histogram is elementary;
- histograms can be rapidly composed by hand (i.e. in the field, without the aid of a computer);
- their interpretation is straightforward;
- their sampling properties are well understood; and,
- fitting a model to the data is a rapid and well understood procedure.
Of course, there are also disadvantages, including:
- the analyst has to make an arbitrary choice of binning parameters;
- the resulting estimate is a step function, which likely does not represent the true underlying continuous p.d.f. and cannot be used to estimate the gradient of that function.
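To make the binning point concrete, here is a minimal sketch of a fixed-width histogram used as a density estimate. The function name `histogram_density`, the two bin widths, and the synthetic sample are all my own choices for illustration; this is not the returns series discussed here. The same sample produces quite different step functions depending on the width chosen.

```python
import math
import random


def histogram_density(data, bin_width, origin=0.0):
    """Fixed-width histogram density: {left bin edge: density}.

    Each bin's density is count / (n * bin_width), so the resulting
    step function integrates to one.
    """
    n = len(data)
    counts = {}
    for x in data:
        k = math.floor((x - origin) / bin_width)  # index of the bin holding x
        counts[k] = counts.get(k, 0) + 1
    return {origin + k * bin_width: c / (n * bin_width)
            for k, c in sorted(counts.items())}


# Synthetic stand-in for 120 monthly returns (an assumption, not real data).
rng = random.Random(0)
sample = [rng.gauss(0.01, 0.03) for _ in range(120)]

coarse = histogram_density(sample, bin_width=0.02)   # few, wide steps
fine = histogram_density(sample, bin_width=0.005)    # many, narrow steps

for edge, density in coarse.items():
    print(f"[{edge:+.3f}, {edge + 0.02:+.3f})  {density:6.2f}")
```

Both `coarse` and `fine` are valid estimates of the same density, which is exactly the arbitrariness noted in the list: nothing in the data itself picks one over the other.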
A popular alternative is kernel
density estimation, which estimates the population p.d.f. from the data via a kernel smoothing algorithm applied
to the raw, unbinned data. This procedure is easy to implement by computer, but not really suitable for use by hand. It has the
advantage that the estimator is smooth and fairly robust with respect to the choice of smoothing kernel. The choice of
smoothing bandwidth, analogous to the bin width in a histogram, is arbitrary, but there are many good choices
suggested in the literature. The chart above shows the application of this procedure to our data set: the
monthly returns of the dynamic trading risk factor. This analysis illustrates a feature that, I feel, is not particularly noticeable in the
histograms previously made: the data has a pronounced negative skewness. It doesn't appear to back the hypothesis that there is
an anomalous cluster at exactly 2%. On the subject of the origin of this skew, of course, this data is silent. A suspicious mind might suggest
that there is a tendency to adopt cookie jar accounting in a non-public company with obscure accounts, but salacious speculation is all we can really propose on the basis
of this data alone.
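For readers who want to try the procedure, the following is a rough sketch of a kernel density estimate in plain Python. It assumes a Gaussian kernel and Silverman's rule-of-thumb bandwidth, which is one of the common choices in the literature and not necessarily what was used for the chart. The helper names and the deliberately left-skewed synthetic sample are hypothetical stand-ins for the actual returns.

```python
import math
import random
import statistics


def silverman_bandwidth(data):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 0.9 * min(sd, (q3 - q1) / 1.34) * n ** (-0.2)


def gaussian_kde(data, bandwidth):
    """Return a function x -> estimated density, using a Gaussian kernel."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
                          for xi in data)

    return density


def sample_skewness(data):
    """Moment-based skewness: m3 / m2^(3/2)."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


# Synthetic, left-skewed stand-in for monthly returns (an assumption).
rng = random.Random(1)
sample = [0.03 - rng.expovariate(50.0) + rng.gauss(0.0, 0.005)
          for _ in range(120)]

h = silverman_bandwidth(sample)
pdf = gaussian_kde(sample, h)

print(f"bandwidth = {h:.4f}, skewness = {sample_skewness(sample):.2f}")
print(f"estimated density at 2%: {pdf(0.02):.2f}")
```

Because `pdf` is a smooth function of `x`, it can be evaluated on as fine a grid as desired, and a narrow spike at a particular return would show up as a local bump rather than being absorbed into a bin.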