My name is Campbell Taylor. I am a rising senior and a Statistics major at The Ohio State University. Through my first few weeks as a Quantitative Research Intern for Social Market Analytics, I’ve been exposed to alternative data and its applications in the financial market. In this research, I created a day trading strategy built around changes in sentiment on Twitter.
Social Market Analytics (SMA) captures unstructured data through alternative sources such as Twitter. Using unique Natural Language Processing sentiment analysis, SMA rates tweets in real time and creates metrics that enhance insights into equities’ market movements.
Sentiment factors used for this analysis are distributed through SMA’s S-Factor feed. The factors are:
A large S-Score (> 2) is associated with extreme positive sentiment on Twitter, while a small S-Score (< -2) is associated with extreme negative sentiment. The same applies to SV-Score. S-Buzz ranges from 0 to 4.5, with 1 being the statistical mean. The goal of this research is to use these sentiment factors overlaid with pricing momentum to develop a profitable daily trading strategy.
The momentum used for this research is defined as the following:
This isolates the pricing momentum to strictly overnight movement. Similarly, I used the differences in sentiment to capture the overnight sentiment changes. The two sentiment timestamps are taken at 9:10 AM EST of the current trading day and 3:40 PM EST of the previous trading day, 20 minutes prior to market open and market close respectively. Subtracting the previous day’s closing sentiment from the current day’s opening sentiment isolates the overnight sentiment change.
The target return can be defined as:
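As a minimal sketch of the three quantities described here, the functions below assume that overnight momentum is the percentage change from the previous close to the current open, and that the target is the open-to-close return of the current day. Those exact definitions, the function names, and the sample numbers are assumptions for illustration, not SMA's specification.

```python
def overnight_momentum(prev_close, today_open):
    """Assumed definition: percentage price change from the previous close to today's open."""
    return (today_open - prev_close) / prev_close


def overnight_sentiment_change(prev_close_score, today_open_score):
    """Sentiment at 9:10 AM today minus sentiment at 3:40 PM on the previous trading day."""
    return today_open_score - prev_close_score


def open_to_close_return(today_open, today_close):
    """Assumed target: return earned by entering at the open and exiting at the close."""
    return (today_close - today_open) / today_open


# Made-up numbers: the stock gaps down overnight while sentiment rises sharply.
momentum = overnight_momentum(prev_close=100.0, today_open=98.0)
s_score_diff = overnight_sentiment_change(prev_close_score=-0.5, today_open_score=1.9)
target = open_to_close_return(today_open=98.0, today_close=99.47)
print(momentum, s_score_diff, target)
```

The same difference calculation would apply to SV-Score and S-Buzz.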
A popular trading strategy is buying securities with rising momentum and selling them when the momentum appears to be exhausted. My original hypothesis was that positive overnight sentiment movements would enhance the overnight pricing momentum. More specifically: a positive S-Score difference (increased positivity), SV-Score difference (increased volume of tweets), or S-Buzz difference (more unusual activity) would lead to the positive momentum continuing until the close of the trading day, and vice versa for negativity.
To find which sentiment movement was most significant in predicting returns, I built a logistic regression model, which models the probability of a discrete outcome given the input variables. In this case, the outcome is a positive open-to-close return, and the inputs are the overnight sentiment changes and the overnight pricing momentum. The idea is that parameters that increase the probability of positive returns can form a trading strategy that is more profitable than the market over time.
Before selecting the model, I checked the distributions of the variables to ensure there were no abnormalities. The histograms below show the following distributions (left to right, top to bottom): Overnight Difference in S-Score, Overnight Difference in SV-Score, Overnight Difference in S-Buzz, Overnight Pricing Momentum, and Open to Close Returns. All the variables appear to be roughly normally distributed, which is beneficial for statistical modeling and for trading on the tails of the distributions.
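To make the modeling step concrete, here is a small sketch of a logistic regression fitted by plain gradient descent. It is not the model used in this research: the fitting routine, the two features (S-Score difference and a 0/1 flag for positive overnight momentum), and the eight rows of data are all made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit an intercept plus one weight per feature by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_proba(w, x):
    """Estimated probability of a positive open-to-close return."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Hypothetical rows: [overnight S-Score difference, 1 if overnight momentum > 0 else 0]
X = [[2.5, 0], [1.8, 0], [-2.2, 1], [-0.4, 1], [0.3, 0],
     [-1.9, 1], [2.1, 1], [-2.6, 0], [1.0, 1], [-1.0, 0]]
# Hypothetical labels: 1 if the open-to-close return was positive
y = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]

weights = fit_logistic(X, y)
print(predict_proba(weights, [2.5, 0]), predict_proba(weights, [-2.5, 1]))
```

In practice a statistics package would also report standard errors and p-values, which is what the significance statements in this post rely on.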
Using the four remaining variables, I used a stepwise information criterion method to aid the selection of the best parameters for trading. An information criterion measures the model’s fit while penalizing the number of parameters used. To my surprise, the model showed that positive (negative) overnight momentum decreased (increased) the probability of positive open-to-close returns for the trading day that followed. Additionally, overnight momentum was more significant as a factor variable than as a continuous variable, meaning the sign of the momentum is more important than its magnitude. Each of the sentiment changes was significant in predicting the return. A positive difference in S-Score and S-Buzz increased the probability of positive returns, while a positive difference in SV-Score lowered it.
While each variable is significant, it is important to consider the number of stocks that will pass the conditions of all parameters when trading. Very few stocks will satisfy all 3 specific sentiment parameters on a given day, which will lead to a large variance in results. Thus, it makes sense to narrow the model to one S-Factor variable. Selecting the S-Factor variable that has the most occurrences of extreme changes will give the most robust results. The difference in S-Score had more than double the number of stocks with extreme changes compared with SV-Score and S-Buzz. Therefore, the final trading strategy is built around the difference in S-Score overlaid with overnight pricing momentum.
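The comparison behind a stepwise search can be sketched with the Akaike information criterion, AIC = 2k - 2 ln(L), where k is the number of parameters and L is the maximized likelihood. Which criterion was actually used is not stated here, so AIC is an assumption, and the log-likelihood values below are invented purely to show the mechanics.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(candidates):
    """candidates maps a model name to (log_likelihood, n_params); return the lowest-AIC name."""
    return min(candidates, key=lambda name: aic(*candidates[name]))


# Hypothetical fits: momentum entered as a sign factor versus as a continuous value.
candidates = {
    "momentum_as_sign": (-690.0, 2),
    "momentum_as_value": (-692.5, 2),
}
print(best_by_aic(candidates))
```

A stepwise procedure repeats this comparison, adding or dropping one variable at a time until no change lowers the criterion.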
Since the difference in S-Score is a continuous variable that follows a normal distribution, I only wanted to trade on stocks with extreme overnight changes. I defined extreme changes as above 2 and below -2, like the S-Score variable itself. A difference in S-Score over 2 indicates there is an extreme increase in sentiment surrounding that stock on Twitter. Similarly, a difference in S-Score below -2 indicates an extreme decrease in sentiment.
I calculated the cumulative returns of 4 different trading strategies and the SPDR S&P 500 ETF Trust (SPY) as the benchmark for the general market. Each of the strategies enters at market open and exits at market close with an equal weight placed on each stock. Two of the strategies are long positions and two of them are theoretical short positions. The long positions have parameters that increase the probability of positive returns, while the short positions have parameters that lower it.
The Long positions:
The theoretical Short positions:
Trading with these 4 different strategies quantifies the effect that sentiment movement has on the overnight momentum. I expected the two long positions to give positive cumulative returns and the short positions to have negative cumulative returns. Based on the model, the long position with sentiment should give the highest returns while the short position with sentiment should give the lowest returns. Before calculating returns, I looked at the number of trades per day in the strategies with sentiment to ensure the trades won’t be too heavily weighted on one stock (top-down).
The x-axis of the histograms shows the number of trades made in a day, while the y-axis shows the number of days with that number. Both distributions suggest there will be some volatility in the number of trades per day. However, the mean and median number of trades for both strategies are high enough to ensure diversity for many of the days. There will be days with fewer than 10 trades, but those will be less than 15% of the trading days in a 10-year span. Therefore, the low-volume days will be spread out and will not affect the strength of the results. The average also isn’t so high that it becomes impossible to execute the trades at the market’s open. Knowing the number of trades was solid, I used these strategies to trade from December 1st 2011 to June 3rd 2022.
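The mechanics of the backtest can be sketched as follows. The four strategy rules are my reading of the results discussed in this post (negative momentum for the long side, positive momentum for the theoretical short side, each with and without an extreme S-Score change in the opposite direction); the strategy names and helper functions are hypothetical. Returns for the short baskets are left as raw open-to-close returns rather than sign-flipped.

```python
THRESHOLD = 2.0  # extreme overnight S-Score change, as defined in the text


def assign_strategies(momentum, s_score_diff, threshold=THRESHOLD):
    """Return the names of the strategies a stock qualifies for on a given day."""
    strategies = []
    if momentum < 0:
        strategies.append("long_momentum_only")
        if s_score_diff > threshold:
            strategies.append("long_with_sentiment")
    elif momentum > 0:
        strategies.append("short_momentum_only")
        if s_score_diff < -threshold:
            strategies.append("short_with_sentiment")
    return strategies


def equal_weight_return(stock_returns):
    """Daily basket return with equal weight on each stock; 0.0 on days with no trades."""
    return sum(stock_returns) / len(stock_returns) if stock_returns else 0.0


def cumulative_return(daily_returns):
    """Compound a sequence of daily returns into one cumulative return."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0


print(assign_strategies(momentum=-0.01, s_score_diff=2.5))
print(cumulative_return([0.10, -0.10]))
```

Compounding matters over a 10-year window: a 10% gain followed by a 10% loss leaves the portfolio slightly below where it started.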
The time series graph shows the cumulative return of the strategies over time. Between April and June of 2020 there is a sharp increase in returns for the negative momentum with sentiment increase strategy. The abnormality can be attributed to the market conditions following the beginning of lockdowns for the COVID-19 Pandemic. While markets were turbulent during this time, the long position with sentiment performed very well. Overnight sentiment movement had a significant impact on the pricing momentum. The long positions both gave positive cumulative returns, and the theoretical short positions gave negative cumulative returns. As the model suggested, trading stocks that had negative momentum with an extreme increase in sentiment gave the best returns. This strategy produced a cumulative return over 1400% in the 10-year time frame. The Sharpe and Sortino ratios suggest that the above-average returns are worth the potential volatility of this strategy. A Sharpe above 1 and a Sortino above 2 are considered good for a portfolio. For the long positions, adding sentiment movement increased the Annualized Return by nearly 12%. While the effects were not as strong, adding sentiment decreased the Annualized Return of the short position by close to 7%. I then looked at how this strategy has performed since the start of 2020.
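For readers unfamiliar with the two ratios, here is one common way to compute them from daily returns. The exact conventions used for the figures in this post are not stated, so the choices below (252 trading days per year, a zero risk-free rate and zero target return, sample standard deviation) are assumptions, and the sample returns are made up.

```python
import math

TRADING_DAYS = 252  # assumed annualization factor


def sharpe_ratio(daily_returns, risk_free_daily=0.0):
    """Annualized mean excess return divided by the sample standard deviation."""
    excess = [r - risk_free_daily for r in daily_returns]
    mean = sum(excess) / len(excess)
    variance = sum((r - mean) ** 2 for r in excess) / (len(excess) - 1)
    return mean / math.sqrt(variance) * math.sqrt(TRADING_DAYS)


def sortino_ratio(daily_returns, target_daily=0.0):
    """Like Sharpe, but only returns below the target count toward the risk measure."""
    excess = [r - target_daily for r in daily_returns]
    mean = sum(excess) / len(excess)
    downside_variance = sum(min(r, 0.0) ** 2 for r in excess) / len(excess)
    # Note: raises ZeroDivisionError if no return falls below the target.
    return mean / math.sqrt(downside_variance) * math.sqrt(TRADING_DAYS)


sample = [0.01, -0.005, 0.02, -0.01, 0.015]  # hypothetical daily strategy returns
print(sharpe_ratio(sample), sortino_ratio(sample))
```

Because the Sortino ratio ignores upside swings, a strategy with large positive days and small negative days will score higher on Sortino than on Sharpe.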
The jump at the beginning is also during the lockdowns of the COVID-19 Pandemic. Each of the strategies jumped further in the direction the model predicted during this time. This time series graph follows the same behavior as the 10-year trend. The impact of the negative sentiment change on the positive momentum is more evident on this plot. Recently, the long position with sentiment strategy has performed even better than over the 10-year period. While maintaining strong Sharpe and Sortino ratios, the annualized return climbed to nearly 41%. Trading with this strategy would have given a 140% cumulative return since the first trading day of 2020. The short position with sentiment strategy also performed better in this time period. The negative overnight sentiment lowered the annualized returns by nearly 8%. Trading on the long/short positions with sentiment has been an effective trading strategy over time and shows no signs of slowing down.
The limitation of this strategy is that the market open is used both as a part of the overnight momentum calculation and as the entry point for the trade. Therefore, there will be a delay in executing the trade. In practice this results in adding 5 cents to the opening price for the long positions and subtracting 5 cents from the opening price for the short positions. The returns will be a bit smaller than the ones calculated but will be very close.
Stocks have generally been shown to revert to their mean following overnight movement. Adding sentiment changes appears to enhance the probability and magnitude of reversion. That is why trading on stocks where the overnight sentiment contradicts the overnight pricing momentum is a very profitable strategy. Following this strategy also avoids holding stocks overnight, when there is risk of news and events breaking after the market close. This research also exemplifies the predictive power of the S-Factors from Social Market Analytics. The overnight S-Score movement proved to have a significant impact on the open-to-close returns. Capturing the sentiment movement allows traders to identify securities where the price has not yet followed the direction of public opinion.
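The size of that adjustment can be checked with a quick sketch. The function name and the example prices are hypothetical; the 5-cent figure comes from the paragraph above. The return is measured as the raw price move from the adjusted entry to the close, so for the theoretical short side a less negative number means a smaller short profit.

```python
SLIPPAGE = 0.05  # dollars per share, from the text


def adjusted_return(open_price, close_price, side, slippage=SLIPPAGE):
    """Entry-to-close price return after shifting the entry price by the slippage."""
    if side == "long":
        entry = open_price + slippage   # buy slightly above the open
    elif side == "short":
        entry = open_price - slippage   # sell slightly below the open
    else:
        raise ValueError("side must be 'long' or 'short'")
    return (close_price - entry) / entry


# Hypothetical $50 stock that closes at $51 (long) or $49 (short).
print(adjusted_return(50.0, 51.0, "long"))
print(adjusted_return(50.0, 49.0, "short"))
```

The drag is proportionally larger for low-priced stocks, since 5 cents is a bigger fraction of the entry price.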