Net Estimated Sentiment from Twitter vs Actual S&P 500 Activity

by Graham Giller June 24, 2012 20:20

Been too busy to do this for a while, and also currently too sunburned to get into a proper analysis, but here is a side-by-side comparison of the accumulated net sentiment inferred from S&P 400 tweets (by hand classification) and the actual moves of the S&P 500. The right analysis is a cross-correlation function; that will have to wait…
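For when that analysis does happen, here is a minimal sketch of a sample cross-correlation function in plain Python. The function name, the lag convention (a positive lag means the first series leads the second), and the toy series are all assumptions made for illustration, not data from this blog.

```python
from math import sqrt

def cross_correlation(x, y, max_lag):
    """Sample cross-correlation of x[t] with y[t + lag], lag = -max_lag..max_lag.

    Assumes both series are equally spaced, the same length, and not constant.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("series must have the same length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    norm = sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    ccf = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            pairs = zip(dx[:n - lag], dy[lag:])   # x[t] against y[t + lag]
        else:
            pairs = zip(dx[-lag:], dy[:n + lag])  # x[t] against y[t - |lag|]
        ccf[lag] = sum(a * b for a, b in pairs) / norm
    return ccf

# Toy example (made-up numbers): "returns" simply repeat "sentiment" one step later.
sentiment = [1, -1, 1, 1, -1, -1, 1, -1]
returns = [0] + sentiment[:-1]
ccf = cross_correlation(sentiment, returns, max_lag=2)
best_lag = max(ccf, key=ccf.get)  # lag with the largest correlation
```

In this toy case the largest value sits at lag +1, which is what "sentiment leads returns by one step" would look like.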


Average Sentiment vs Time of Day

by Graham Giller May 22, 2012 22:42

This is a simple non-parametric regression of the sentiment estimated from hand-classified S&P 400 tweets versus the time of day. I wonder whether the features (bullish pre-open, sagging mid-afternoon) fade with more evenly balanced market moves.
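The post does not say which smoother was used. As one possible illustration, here is a Nadaraya-Watson kernel regression with a Gaussian kernel; the estimator choice, the bandwidth, the function name, and the toy data are assumptions.

```python
from math import exp

def kernel_regression(times, values, grid, bandwidth):
    """Nadaraya-Watson estimate of E[value | time] at each point of grid.

    times:  time of day of each tweet, in hours (e.g. 9.5 for 09:30)
    values: sentiment score of each tweet (+1 bullish, -1 bearish)
    """
    fitted = []
    for g in grid:
        weights = [exp(-0.5 * ((t - g) / bandwidth) ** 2) for t in times]
        total = sum(weights)
        if total == 0.0:
            fitted.append(float("nan"))  # no data near this grid point
        else:
            fitted.append(sum(w * v for w, v in zip(weights, values)) / total)
    return fitted

# Toy data (made up): bullish tweets before the open, bearish mid-afternoon.
times = [8.0, 8.5, 9.0, 14.0, 14.5, 15.0]
values = [1, 1, 1, -1, -1, -1]
fitted = kernel_regression(times, values, grid=[8.5, 14.5], bandwidth=0.5)
```

A wider bandwidth gives a smoother curve at the cost of blurring features like the pre-open bump.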


Crude Exposition of the Properties of Sentiment Inferred from Twitter

by Graham Giller May 17, 2012 22:31

On the left is the autocorrelation of hand-classified S&P 400 tweets that have non-neutral sentiment. On the right is a histogram of the calendar time intervals between those tweets. From this I'm concluding, though I'm not surprised by it, that sentiment trends. The real test will be the cross-correlation function between net sentiment and market returns. That's going to require more data before we can examine it properly.
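For readers who want to reproduce this kind of plot, a minimal sketch of the sample autocorrelation of a sentiment sequence follows. It assumes non-neutral tweets are scored +1 (bullish) and -1 (bearish) in tweet order; the scoring, the function name, and the toy sequence are assumptions.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation of series at lags 0..max_lag (in tweet order)."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    var = sum(d * d for d in dev)
    if var == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t + lag] for t in range(n - lag)) / var
        for lag in range(max_lag + 1)
    ]

# Toy sequence (made up): runs of like-signed tweets, i.e. trending sentiment.
labels = [1, 1, 1, 1, -1, -1, -1, -1, 1, 1, 1, 1]
acf = autocorrelation(labels, max_lag=3)
```

Positive values at the short lags are what "sentiment trends" means here: a bullish tweet is more likely than not to be followed by another bullish one.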


First Classification of S&P 400 Tweets

by Graham Giller May 15, 2012 21:31

Hand classified into the set {Bullish, Bearish, Unknown}. This is to build a training set for a classifier (so I don't have to hand classify the millions that are streaming in). Here's an interesting chart of accumulated net sentiment:
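For reference, a minimal sketch of how an accumulated net sentiment series can be computed from the hand labels. The +1/-1/0 scoring of the three classes is an assumption, as are the names and the toy labels.

```python
from itertools import accumulate

# Assumed scoring: bullish counts +1, bearish -1, unknown contributes nothing.
SCORES = {"Bullish": 1, "Bearish": -1, "Unknown": 0}

def accumulated_net_sentiment(labels):
    """Running total of sentiment scores, in tweet order."""
    return list(accumulate(SCORES[label] for label in labels))

# Toy labels (made up), in the order the tweets arrived.
labels = ["Bullish", "Bullish", "Unknown", "Bearish", "Bullish"]
net = accumulated_net_sentiment(labels)  # [1, 2, 2, 1, 2]
```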


Building a Training Set

by Graham Giller May 14, 2012 20:52

After some playing around with sentiment classifiers on the internet, I decided I was going to have to build my own training set to classify tweets as bullish, bearish, or neutral. (One such example classifies "$AAPL going to the moon!" as "neutral", leaving us to wonder what counts as positive sentiment. Jupiter?) To ease this task I built a quick application in Python that links up pyodbc and tkinter to present the user with a simple dialog to classify tweets. The script is small (2.24 kb), and this one has comments!

Elementary Statistics on Twitter Users

by Graham Giller May 11, 2012 10:41

I have a few days' capture of the tweet stream for users who mention S&P 400 stocks (the API will only accept 400 individual keywords) using the $ symbol to indicate a ticker, e.g. $AAPL, not AAPL. This is important to exclude ticker symbols that also appear as, or inside, regular words. Let's start by looking at the users:
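As an aside on the $ convention, here is a minimal sketch of how cashtag matching differs from matching the bare symbol. The regular expression, the function name, and the tiny ticker set are assumptions for illustration. They are not the filter used for the capture, and symbols containing a dot are not handled.

```python
import re

# "$" not preceded by a word character, then 1-5 letters ending at a word boundary.
CASHTAG = re.compile(r"(?<!\w)\$([A-Za-z]{1,5})\b")

def mentioned_tickers(text, tickers):
    """Return the subset of tickers that appear in text as $-prefixed cashtags."""
    found = {symbol.upper() for symbol in CASHTAG.findall(text)}
    return found & tickers

# Hypothetical ticker list; note that "A" and "CAT" are also ordinary words.
tickers = {"AAPL", "CAT", "A"}

with_cashtag = mentioned_tickers("$AAPL going to the moon!", tickers)
without_cashtag = mentioned_tickers("A new CATEGORY of AAPL fans", tickers)
```

The second call returns an empty set even though "A", "CAT", and "AAPL" all occur in the text, which is the point of insisting on the $ prefix.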

This graph is a histogram of the age of the tweeter's account (the time from the account creation to the tweet timestamp), shown in years with a zoomed view in days. We see three prominent features:

  1. A sharp peak around 1 day;
  2. A broad peak around 3 years; and,
  3. An apparently triangular underlying distribution.
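The account age plotted in this histogram can be computed per tweet along the following lines. This is a sketch that assumes both timestamps arrive as strings in the form "Thu May 10 10:00:00 +0000 2012"; the format string, the function name, and the example timestamps are assumptions.

```python
from datetime import datetime

# Assumed timestamp layout: weekday, month, day, time, UTC offset, year.
TWITTER_TIME = "%a %b %d %H:%M:%S %z %Y"

def account_age_days(account_created_at, tweet_created_at):
    """Days elapsed between account creation and the tweet timestamp."""
    created = datetime.strptime(account_created_at, TWITTER_TIME)
    tweeted = datetime.strptime(tweet_created_at, TWITTER_TIME)
    return (tweeted - created).total_seconds() / 86400.0

# Made-up example: an account created a day and a half before it tweets.
age = account_age_days(
    "Thu May 10 10:00:00 +0000 2012",
    "Fri May 11 22:00:00 +0000 2012",
)
```

Dividing the result by 365.25 gives the age in years for the wider view.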

The first and last of these features are easy to explain.

For the first: Twitter is a system that is plagued by spam. In a way you could think of it as a whitelist for spam since, when I follow a user, I am telling them to send me information that they think is interesting to me. Clearly, a simple, naive spambot model is to create a new account and immediately send a lot of spam tweets from it. Presumably, Twitter Inc.'s response is to identify the spambot account via well-known techniques such as Naive Bayes classifiers and close the account down. This would lead to an excess of accounts with very short ages, the maximum age being set by the interval between sweeps of Twitter's anti-spambot surveillance. Formally, we should do more analysis to investigate this hypothesis, but it seems fairly clear.

The third feature can be explained by the following stochastic model:

  1. New users create accounts at an approximately constant rate λ;
  2. Each user creates tweets at another apparently constant rate μ.

It is straightforward to see that, with this dynamic, the distribution of tweeter account ages that appear in this sample will naturally be of a right-triangular form, just as we see.
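One way to make this concrete, offered as a sketch rather than as the derivation behind the post: assume accounts are created at rate λ from time 0, each account tweets at rate μ from its creation onward, and the sample pools tweets over the whole interval [0, T]. The pooling assumption and the symbols T, a, n(a), and p(a) are introduced here for illustration. A tweet sent at time t from an account created at time c has age a = t − c, so counting tweets of age a means integrating over every time t at which such a tweet can occur:

```latex
% Assumption: tweets are pooled over the whole interval [0, T].
\begin{align}
n(a) &= \int_{a}^{T} \lambda \mu \, dt = \lambda \mu \, (T - a), \qquad 0 \le a \le T, \\
p(a) &= \frac{n(a)}{\int_{0}^{T} n(a') \, da'} = \frac{2\,(T - a)}{T^{2}}.
\end{align}
```

Under these assumptions p(a) is a right triangle: largest at a = 0 and falling linearly to zero at a = T.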

This leaves an apparent strong peak of users who tweet about the S&P 400 and have done so for around three years. So what happened three years ago?


About the Author

Graham Giller
Dr. Giller holds a doctorate from Oxford University in experimental elementary particle physics. His field of research was statistical astronomy using high energy cosmic rays. After leaving Oxford, he worked in the Process Driven Trading Group at Morgan Stanley, as a strategy researcher and portfolio manager. He then ran a CTA/CPO firm which concentrated on trading eurodollar futures using statistical models. From 2004, he has managed a private family investment office. In 2009, he joined a California based hedge fund startup, concentrating on high frequency alpha and volatility forecasting. My updated resume is on LinkedIn.



Nothing on this site should be construed as a recommendation to buy or sell any specific security nor as a solicitation of an order to buy or sell any specific security. Before making any trade for any reason you should consult your own financial advisor. The author may hold long or short positions in any of the securities discussed either before or after publication of an article mentioning such a security.

Copyright Notice

All posts on this blog are © Copyright property of Giller Investments (New Jersey), LLC. All comments are the property of their respective authors, and neither the author of this blog nor any entity associated with him is responsible for or accepts any responsibility for their content. Offensive comments and spam may be removed at the author's discretion.

Data provided on this blog or through links to this blog are either property of Giller Investments (New Jersey), LLC or publicly available or derived from data that is publicly available. Any data that is proprietary to Giller Investments (New Jersey), LLC is published here for the public interest and may be reproduced for private research or in public forums provided that suitable attribution and acknowledgement of ownership is made.
