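Correlation is a powerful statistical tool in finance and equity markets. A single value, the correlation coefficient, summarizes how consistently variable A moves with variable B, on a scale from -1 to +1. With a perfectly positive correlation (+1), variable A always rises when variable B rises, following an exact straight-line relationship; with a perfectly negative correlation (-1), variable A always falls when variable B rises. The coefficient describes the direction and consistency of the relationship, not the size of the move: if variable B increases by, say, 50%, a correlation of +1 tells you that variable A went up, but A rises by that same 50% only if the two variables are equally volatile.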
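To make the point about direction versus size concrete, here is a minimal sketch in Python. The numbers are made up for illustration, and `pearson` is a small helper written for this post, not something from a particular library.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical weekly percentage changes for variable B.
b_changes = [1.0, -2.0, 3.0, 0.5, -1.5]

# A moves exactly twice as much as B; the inverse series moves exactly opposite.
a_double = [2 * b for b in b_changes]
a_inverse = [-b for b in b_changes]

r_double = pearson(a_double, b_changes)
r_inverse = pearson(a_inverse, b_changes)
print(round(r_double, 6), round(r_inverse, 6))
```

The doubled series is perfectly positively correlated with B even though every one of its moves is twice as large, and the inverted series is perfectly negatively correlated.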

Google has a platform called Google Correlate that lets you type in a keyword and returns the search terms whose search volume is most positively correlated with it. So it occurred to me: if I could translate S&P 500 closing data into a form that Google Correlate would treat as normalized search data, I would have a way to identify keywords whose search volume moves with the S&P 500. Below are the findings:
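As a rough sketch of what "translating" the closing data could look like, the snippet below rescales weekly closes to zero mean and unit standard deviation and writes one date,value row per week. Both the z-score normalization and the row format are my assumptions about what a tool like Google Correlate would accept, and the closes are made-up placeholders, not real S&P 500 data.

```python
import math

def zscore(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

# Made-up weekly closes, for illustration only (not real S&P 500 data).
weekly_closes = [
    ("2010-01-03", 100.0),
    ("2010-01-10", 102.0),
    ("2010-01-17", 101.0),
    ("2010-01-24", 105.0),
    ("2010-01-31", 104.0),
]

dates = [d for d, _ in weekly_closes]
normalized = zscore([c for _, c in weekly_closes])

# One "date,value" row per week; this upload format is an assumption.
csv_rows = [f"{d},{z:.4f}" for d, z in zip(dates, normalized)]
print("\n".join(csv_rows))
```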

SPY_HowToInvestMoney_Correlation

Google Correlate identified the keyword phrase “How to Invest Money” as having the third-highest positive correlation with movements in the S&P 500. (I’ll explain below why I chose the third-highest correlation and not the first.) A correlation coefficient of 0.74 means that weekly search volume for “How to Invest Money” and the S&P 500 have tended to rise and fall together fairly consistently. It does not mean that every 1% increase in searches goes with a 0.74% increase in the S&P 500 over that same week; the size of the expected move is given by the regression slope, which also depends on how volatile each series is.
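The difference between a correlation coefficient and an expected percentage move can be sketched in a few lines of Python. The two series below are made up for illustration (they are not the Google Correlate data), and the slope is the ordinary least-squares slope of index changes on search changes.

```python
import math

# Hypothetical weekly % changes: search volume (x) and the index (y).
search_changes = [1.0, 2.0, 3.0, 4.0, 5.0]
index_changes = [0.1, 0.3, 0.2, 0.5, 0.4]

n = len(search_changes)
mean_x = sum(search_changes) / n
mean_y = sum(index_changes) / n
cov = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(search_changes, index_changes))
var_x = sum((x - mean_x) ** 2 for x in search_changes)
var_y = sum((y - mean_y) ** 2 for y in index_changes)

# Correlation: how consistently the two series move together.
r = cov / math.sqrt(var_x * var_y)
# Regression slope: expected change in y per one-unit change in x.
slope = cov / var_x

print(round(r, 3), round(slope, 3))
```

In this toy example the correlation is 0.8, yet a one-point increase in searches goes with only a 0.08-point increase in the index, because the search series is far more volatile than the index series.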

I would stop short of saying that the correlation can be trusted wholeheartedly, since there is a data-mining element at play (the data also goes back only about 10 years). I’d imagine that as more and more search data accumulates, this platform will only become more robust.
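To get a feel for the data-mining concern, here is a small simulation, a sketch under my own assumptions rather than anything Google Correlate actually does. It generates one random walk as a stand-in for the index and 1,000 unrelated random walks as stand-ins for search terms, then reports the best correlation found purely by chance.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def random_walk(rng, steps):
    """Cumulative sum of independent standard normal steps."""
    level, path = 0.0, []
    for _ in range(steps):
        level += rng.gauss(0, 1)
        path.append(level)
    return path

rng = random.Random(42)  # fixed seed so the run is repeatable
weeks = 100

target = random_walk(rng, weeks)  # stand-in for the index
candidates = [random_walk(rng, weeks) for _ in range(1000)]  # unrelated "keywords"

# Best correlation among series that have nothing to do with the target.
best_r = max(pearson(c, target) for c in candidates)
print(round(best_r, 2))
```

None of the candidate series has any real connection to the target, yet the best match among a thousand of them will typically look impressive. Searching across every query ever made is the same exercise on a much larger scale.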

I would like to address why I used the keyword with the third-highest correlation instead of the most highly correlated keyword. Some of it has to do with subjective judgment, but more of it has to do with recognizing that a data-mining element is at play. Google Correlate is very efficient at picking out, from all searches conducted since 2004, the few that had patterns and degrees of change similar to movements in the S&P 500. Is this just coincidence? It is very possible, and we should understand that correlation coefficients are not constant: over time they can change, either increasing or decreasing. Let’s say that the correlation coefficient between searches for “Apple” and “iPod 1” is 1. This would mean that whenever searches for “Apple” increased, searches for “iPod 1” increased in step. But what happens when the iPod 1 becomes obsolete? Searches for the term will decline, and the correlation between the two terms will begin to break down. This is why having datasets that cover long periods of time is important.
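One simple way to watch a correlation break down over time is to compute it over a rolling window instead of the whole history. The sketch below uses made-up numbers in which the second series tracks the first for six periods and then moves against it; the window length of four is an arbitrary choice for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rolling_correlation(xs, ys, window):
    """Correlation of xs and ys over each consecutive window of the given length."""
    return [
        pearson(xs[i:i + window], ys[i:i + window])
        for i in range(len(xs) - window + 1)
    ]

# Made-up data: series_b tracks series_a for six periods, then moves against it.
series_a = [1, 3, 2, 5, 4, 6, 5, 8, 7, 9, 8, 11]
series_b = [2, 6, 4, 10, 8, 12, -5, -8, -7, -9, -8, -11]

rolling = rolling_correlation(series_a, series_b, window=4)
print([round(r, 2) for r in rolling])
```

The earliest windows show a correlation of +1 and the latest windows show -1, even though a single coefficient computed over the full history would hide that shift.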

I hope that this finding proves useful for some of you! I’d appreciate feedback if you have any and please feel free to subscribe to Trendvesting!

Happy Trading!
