Relationship between Body Mass Index and Rate of Soda Consumption

by Graham Giller December 28, 2013 17:54

Over the last year, before I started working for Bloomberg LP (we're hiring for my group), I got quite familiar with the CDC's Behavioral Risk Factor Surveillance System (BRFSS) surveys. Over the holiday break, I figured I'd load up the 2012 edition and see what I could find. The survey includes the following question:

During the past 30 days, how often did you drink regular soda or pop that contains sugar?
Do not include diet soda.

I thought it'd be interesting to ask first if there is any gross correlation between soda consumption and obesity within this sample of approximately half-a-million responses. There is a mild covariance, but it is not statistically significant.
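
For readers who want to try this kind of check themselves, here is a minimal sketch of a gross correlation test, assuming the responses have already been reduced to paired numbers (sodas per day and BMI). The function name and the toy numbers are made up for illustration; they are not BRFSS data and this is not the analysis behind the result above.

```python
import math

def pearson_with_t(x, y):
    """Return (r, t): the Pearson correlation of paired samples and its t-statistic."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    # Under the null hypothesis of zero correlation, t follows
    # Student's t distribution with n - 2 degrees of freedom.
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return r, t

# Hypothetical toy sample: sodas per day and BMI for six respondents.
sodas = [0, 1, 2, 3, 4, 5]
bmi = [24, 27, 23, 28, 25, 27]
r, t = pearson_with_t(sodas, bmi)

# Two-sided 5% critical value of Student's t with 4 degrees of freedom.
T_CRITICAL_DF4 = 2.776
significant = abs(t) > T_CRITICAL_DF4
```

In this toy sample the correlation is positive but the t-statistic falls short of the critical value, which is the same qualitative pattern as "mild but not significant."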


Of course, the BRFSS is not a longitudinal survey, so we cannot answer the real question:

Do changes in soda consumption lead to future changes in BMI?


Sidetrack Continued: The Nth Ranked Item in an Aggregation

by Graham Giller August 12, 2012 22:42

Building on the prior post, it's natural to ask for the LASTRANKED() item in an aggregation, which is trivial to code up, but also to ask for the Nth ranked item in an aggregation. This is a little more complicated, as we have to maintain an ordered list of items, be able to merge and prune the sublists prepared by separate threads of an execution plan, do that with reasonable efficiency, and yet keep the memory footprint under control. It is not impossible, however, so here is my SQL Server user defined aggregate:


For compactness I decided to code it so that if the ranking order N is positive it returns the item based on the ordering from least to greatest, and if it is negative, based on the ordering from greatest to least; i.e. order +1 is the first item and −1 is the last item. This is somewhat similar to the function NTH() provided by Google's BigQuery system, but I believe it is more general in specification.

A note: this code does not make any attempt to keep its internal data structures within a single database page in size, so if you ask for the millionth ranked item you might generate an exception in the runtime. However, this approach allows such limits to be imposed from the outside rather than as a design constraint, so I prefer it that way!
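
The attached code is not reproduced here, so the following is only a sketch of the idea in Python rather than the actual SQL Server aggregate. It assumes the usual accumulate / merge / terminate shape of a user defined aggregate; the class name, method names, and sample rows are hypothetical.

```python
import heapq

class NthRanked:
    """Nth item by rank: n > 0 counts from the least rank, n < 0 from the greatest."""

    def __init__(self, n):
        if n == 0:
            raise ValueError("n must be non-zero")
        self.n = n
        self.items = []  # (rank, value) pairs

    def _prune(self):
        # Keep only the |n| entries that could still contain the answer.
        k = abs(self.n)
        if len(self.items) > k:
            pick = heapq.nsmallest if self.n > 0 else heapq.nlargest
            self.items = pick(k, self.items, key=lambda pair: pair[0])

    def accumulate(self, value, rank):
        self.items.append((rank, value))
        # Prune lazily so the list never grows beyond roughly 2 * |n| entries.
        if len(self.items) > 2 * abs(self.n):
            self._prune()

    def merge(self, other):
        # Combine the partial list built by another thread, then prune.
        self.items.extend(other.items)
        self._prune()

    def terminate(self):
        self._prune()
        if len(self.items) < abs(self.n):
            return None  # fewer than |n| items were seen
        ordered = sorted(self.items, key=lambda pair: pair[0], reverse=self.n < 0)
        return ordered[abs(self.n) - 1][1]

# Hypothetical rows of (value, rank), split across two "threads".
rows = [("a", 5), ("b", 1), ("c", 3), ("d", 2), ("e", 4)]

def run(n):
    left, right = NthRanked(n), NthRanked(n)
    for value, rank in rows[:3]:
        left.accumulate(value, rank)
    for value, rank in rows[3:]:
        right.accumulate(value, rank)
    left.merge(right)
    return left.terminate()
```

The memory held by each partial result is bounded by a small multiple of |N|, which is why asking for the millionth item is expensive in any design of this kind.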

Here's a zipfile with all of the functions: (5.50 kb)


Sidetrack: SQL Server User Defined Aggregates for the First and Last Items in an Aggregation

by Graham Giller August 03, 2012 22:33

I've had these functions kicking around for a while, so I thought I'd share. For some context: I use relational databases as flexible data stores. The developer edition of Microsoft SQL Server is identical to the enterprise edition, but costs a negligible amount of money (~$50) as it is licensed for a single user only. This system is capable of breaking queries into multiple independently scheduled threads and recombining the results: in effect, a MapReduce engine on a single server.

Anyway, one thing that comes up a lot in time-series analysis is the need to find the "first" or "last" item in a list. SQL has many aggregation functions but "first" and "last" are not among them. The reason is important: you cannot control the order in which the database engine processes the data, particularly a multi-threaded database engine that may split your data into multiple packets, and so the definitions of "first" and "last" that seem natural to the analyst simply cannot be taught to the query processor.

After a little thought I figured out that the key to solving this problem is to define the ordering, or ranking, of the data yourself. Couple that with the capability of SQL Server 2008 (and later) to (a) build user defined aggregates in any .NET language and (b) allow those aggregates to have multiple arguments, and the problem is solved. Attached to this post is a sample user defined aggregate function which performs this task.

Basically, we replace the desired aggregate function FIRST(Price), which possesses an implicit temporal ordering, with a user defined aggregate FIRSTRANKED(Price,Time), in which we specify the ordering explicitly. That's all, problem solved. (I also have these functions written in C for MySQL if there's interest.)
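
To make the idea concrete without the attachment, here is a minimal Python sketch of the same logic. It is not the contents of FirstRanked.vb; the class and method names are assumptions that simply mirror the accumulate / merge / terminate pattern a user defined aggregate follows, and the prices are made up.

```python
class FirstRanked:
    """Keeps the value whose explicit rank is smallest, in any processing order."""

    def __init__(self):
        self.best_rank = None
        self.best_value = None

    def accumulate(self, value, rank):
        # Ignore NULL-like inputs; otherwise keep the value with the smallest rank.
        if value is None or rank is None:
            return
        if self.best_rank is None or rank < self.best_rank:
            self.best_rank = rank
            self.best_value = value

    def merge(self, other):
        # Fold in the partial result computed by another thread.
        self.accumulate(other.best_value, other.best_rank)

    def terminate(self):
        return self.best_value

# Hypothetical (price, time) rows, deliberately out of time order
# and split across two partitions.
ticks = [(101.5, 3), (100.0, 1), (102.25, 4), (100.75, 2)]
left, right = FirstRanked(), FirstRanked()
for price, time in ticks[:2]:
    left.accumulate(price, time)
for price, time in ticks[2:]:
    right.accumulate(price, time)
left.merge(right)
first_price = left.terminate()
```

Because the result depends only on the rank passed in, it does not matter how the engine splits or orders the rows. In this sketch, ties on rank go to whichever row was seen first, so the rank should be unique if the answer needs to be deterministic.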

FirstRanked.vb (3.69 kb)


What Might Statistical Evidence of Collusion in LIBOR Fixes Look Like?

by Graham Giller August 01, 2012 22:28

In the previous post I discussed a little history and a little theory about the consequences of a cartel of banks fixing the LIBOR fixes. Fortunately the data for LIBOR is readily available from the Federal Reserve Bank of St. Louis. Similarly, data for the discount rates of three month treasury bills is also available. Our hypothesis is that in a free market the transmission of risk free policy rates to risky market rates should essentially be linear, described by the equation

This says that each bank's individual cost of funds is linearly related to the risk-free-rate and as a consequence the market average of the banks' cost of funds is also linearly related to the risk free rate.
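
In symbols, one form consistent with this description would be the following. The notation is assumed here and is not taken from the original post: r_i is bank i's cost of funds, r_f the risk-free rate, L the market average over N banks, and the barred quantities are averages over the banks.

```latex
r_i(t) = \alpha_i + \beta_i \, r_f(t) + \varepsilon_i(t),
\qquad
L(t) = \frac{1}{N} \sum_{i=1}^{N} r_i(t)
     = \bar{\alpha} + \bar{\beta} \, r_f(t) + \bar{\varepsilon}(t)
```
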
Thus, in an efficient market the stochastic process that is LIBOR should have the same autocovariance structure as that of US Treasury Bills. I'm not saying that bill rates have to be i.i.d. random, I'm just saying that LIBOR should have the same dynamic. So to find out if the LIBOR fixing process is efficient we should test to see if the processes are statistically similar or not.
I'm going to start with a simple model. Let's say that the changes in rate are a Markov chain with three states: up, unchanged, and down. (The actual change in rate is then the product of a direction variable and a size variable.) This state sequence possesses the advantages of being simple to compute and easy to understand. We can then ask whether the state sequences show evidence of the Markov property, that they are predictable from their prior instance, or whether they are independent of their own history. There is a nice test for this, called Whittle's Test, which is essentially about building a contingency table and applying a χ² analysis to the observed counts. It is simple and it is about counting, so there is no introduction of hypotheses of normal distributions or leptokurtosis or GARCH or any time-series sophistication.
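
As an illustration of the counting involved, here is a minimal Python sketch of a χ² test of independence on the 3×3 table of (previous state, current state) counts. It follows the description above rather than any particular reference implementation of Whittle's Test, and the function names and sample sequences are made up.

```python
def direction_states(rates):
    """Map successive rate changes to +1 (up), 0 (unchanged), -1 (down)."""
    return [(b > a) - (b < a) for a, b in zip(rates, rates[1:])]

def transition_chi_square(states):
    """Chi-square statistic for independence of each state from its predecessor."""
    labels = (-1, 0, 1)
    counts = {(p, c): 0 for p in labels for c in labels}
    for prev, curr in zip(states, states[1:]):
        counts[(prev, curr)] += 1
    total = sum(counts.values())
    row = {p: sum(counts[(p, c)] for c in labels) for p in labels}
    col = {c: sum(counts[(p, c)] for p in labels) for c in labels}
    chi2 = 0.0
    for p in labels:
        for c in labels:
            # Expected count if the current state ignores the previous one.
            expected = row[p] * col[c] / total
            if expected > 0:
                chi2 += (counts[(p, c)] - expected) ** 2 / expected
    return chi2

# Chi-square critical value at the 99% level with (3-1)*(3-1) = 4 degrees
# of freedom, which applies when all three states actually occur.
CRITICAL_99_DF4 = 13.277

# A "sticky" sequence with long runs, and a balanced sequence in which
# every one of the nine transitions occurs equally often.
sticky = [1] * 20 + [0] * 20 + [-1] * 20
balanced = [1, 1, 0, 0, -1, -1, 1, -1, 0] * 5 + [1]

sticky_rejects = transition_chi_square(sticky) > CRITICAL_99_DF4
balanced_rejects = transition_chi_square(balanced) > CRITICAL_99_DF4
```

The sticky sequence rejects independence and the balanced one does not.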

This chart shows the result of applying this test independently to each full year for which we can get LIBOR quotes and to the Bill Rates for the same years. The shading indicates when the test rejects the null hypothesis of independent state sequences with a confidence of better than 99%. You can see that the LIBOR sequences are not independent from about 1996 onwards, whereas the Treasury Bill sequences mostly are.

Furthermore, without any statistical tests, I think it is plain to the eye that the LIBOR series is unnaturally smooth in the latter half, particularly when compared to the Treasury Bill series. Thus the hypothesis of neutral transmission of policy rates to market rates is rejected from 1996 onwards. I would suggest that something was going on with LIBOR since at least that date!


Back to an Old Friend — LIBOR

by Graham Giller July 28, 2012 15:37

When I started working for the PDT group in 1996, Peter Muller assigned me the task of building a model to forecast and trade Eurodollar futures. This was from inside what became a star statistical arbitrage group and had nothing to do with that. Our model was to include structural as well as autoregressive features, and so one task that needed to be done was to compute forward curves for LIBOR for all possible delivery maturities, not just those represented by today's tradable Eurodollar futures. To build that curve, we wanted a spot rate: 3m LIBOR.

But I never used LIBOR to build my curves. I found that it had statistical anomalies compared to the projected behaviour one would expect from the deep and liquid Eurodollar futures market. It was too sticky — quotes didn't change as much as they should — and this injected errors into the model for predicting the motions of the Eurodollar futures market itself. I concluded that LIBOR, a tool of brokers seeking to attract business rather than a record of actual transactions, was just biased and incorrect. As somebody building a trading model I just moved on.

However, when I heard that the members of the BBA had been caught in the act of manipulating the LIBOR quotes, let's just say I wasn't very surprised. But I do not believe the story that this all started in 2008 to “save the world.”

Suppose there was a cartel that assembled quotes from members and subsequently punished those that provided outlier quotes relative to the assembled mean. Add to this the economic factor that these quotes can generate business for members, so quoting too far from their idiosyncratic “true” value would also cause economic distress. I believe that this dynamic would create a stochastic process of mean reversion to a consensus level, an Ornstein-Uhlenbeck process, coupled to stochastic but serially correlated jumps in that level. The dynamic is that the punishment for outliers causes cartel members to seek a common consensus, but the pain of being “wrong” is sometimes too much for an individual member and they are finally forced to change their quote, causing a jump in the level. That jump permits other members to also change their next quote to catch up with the establishing consensus, and the level change exhibits serial correlation as members converge on that true level.

In practice that would lead to “sticky” quotes on the aggregate coupled with jumps to new levels with the jumps being autocorrelated. And that distorted process, and its interference in the transfer of policy interest rates to market interest rates, should be exhibited in the data. We should be able to find the sticky fingerprints in the LIBOR series itself.
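
The mechanism can be played with in a toy simulation. The sketch below is only one way to caricature it, not a fitted model. It assumes a single common "true" rate instead of idiosyncratic ones, a discrete mean-reversion step toward the consensus in the spirit of an Ornstein-Uhlenbeck process, and a randomised pain threshold at which a member capitulates. Every parameter name and value is hypothetical.

```python
import random

def simulate_fix(steps, members=8, pull=0.2, quote_noise=0.002,
                 true_vol=0.02, pain_limit=0.10, seed=0):
    """Simulate a daily average of member quotes under the cartel story."""
    rng = random.Random(seed)
    true_level = 5.0  # assumed common "true" funding rate, in percent
    quotes = [true_level] * members
    fixes = []
    for _ in range(steps):
        # The true rate drifts as a random walk.
        true_level += rng.gauss(0.0, true_vol)
        consensus = sum(quotes) / members
        for i in range(members):
            tolerance = pain_limit * (0.5 + rng.random())
            if abs(quotes[i] - true_level) > tolerance:
                # The pain of being wrong is too much: jump to the true rate.
                quotes[i] = true_level
            else:
                # Otherwise stay close to the consensus (mean-reversion step).
                quotes[i] += pull * (consensus - quotes[i]) + rng.gauss(0.0, quote_noise)
        fixes.append(sum(quotes) / members)
    return fixes

fixes = simulate_fix(250)
frozen = simulate_fix(50, quote_noise=0.0, true_vol=0.0)
```

With both noise terms switched off the fix never moves, which is the limiting case of a perfectly sticky quote. With them on, the series can be fed through the same state-sequence test used on the real LIBOR data.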


Net Estimated Sentiment from Twitter vs Actual S&P 500 Activity

by Graham Giller June 24, 2012 20:20

Been too busy to do this for a while, and also currently too sun-burned to get into a proper analysis, but here is a side-by-side comparison of the accumulated net sentiment inferred from S&P 400 Tweets (by hand classification) and the actual moves of the S&P 500. Right analysis is a cross-correlation function — will have to wait for that…



About the Author

GRAHAM GILLER
Dr. Giller holds a doctorate from Oxford University in experimental elementary particle physics. His field of research was statistical astronomy using high energy cosmic rays. After leaving Oxford, he worked in the Process Driven Trading Group at Morgan Stanley, as a strategy researcher and portfolio manager. He then ran a CTA/CPO firm which concentrated on trading eurodollar futures using statistical models. From 2004, he has managed a private family investment office. In 2009, he joined a California based hedge fund startup, concentrating on high frequency alpha and volatility forecasting. My updated resume is on LinkedIn.



Nothing on this site should be construed as a recommendation to buy or sell any specific security nor as a solicitation of an order to buy or sell any specific security. Before making any trade for any reason you should consult your own financial advisor. The author may hold long or short positions in any of the securities discussed either before or after publication of an article mentioning such a security.

Copyright Notice

All posts on this blog are © Copyright property of Giller Investments (New Jersey), LLC. All comments are the property of their respective authors and neither the author of this blog nor any entity associated with him is responsible for or accepts any responsibility for their content. Offensive comments and spam may be removed at the author's discretion.

Data provided on this blog or through links to this blog are either property of Giller Investments (New Jersey), LLC or publicly available or derived from data that is publicly available. Any data that is proprietary to Giller Investments (New Jersey), LLC is published here for the public interest and may be reproduced for private research or in public forums provided that suitable attribution and acknowledgement of ownership is made.
