Alpha in SMA Machine Readable Filings

September 7, 2022
SMA Team

This post is a follow-up to our research paper “Machine Readable Filings (MRF) Word Count Alpha,” which extends the Harvard “Lazy Prices” study. It focuses on word count and sentiment factors in regulatory filings, and on changes in those factors from one filing to the next.

SMA partnered with S&P Global Market Intelligence to provide the text of U.S. SEC filings organized by section heading (e.g., Parts and Items), with each section's text grouped beneath its heading. The text is parsed to create historical baselines for 10-Ks, 10-Qs, and other regulatory filings. The MRF product covers 20 filing types; this post analyzes 10-Ks and 10-Qs to focus on quarterly changes.

Subscribers to the MRF dataset can create derivative metrics from the product's seven factors. For instance, one metric explored in this post is Sentiment per Word, calculated by dividing Sentiment Sum by Word Count. This and similar derivative factors normalize sentiment by document length.
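As a minimal sketch, Sentiment per Word is a simple ratio of two MRF fields. The function name is hypothetical and not part of the MRF product, and the guard against a zero Word Count is our own assumption about how empty sections might be handled.

```python
def sentiment_per_word(sentiment_sum: float, word_count: int) -> float:
    """Normalize a filing's Sentiment Sum by its Word Count.

    Hypothetical helper: rejecting a zero or negative word count is an
    assumption, not documented MRF behavior.
    """
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return sentiment_sum / word_count


# Two filings with the same tone but different lengths: the raw Sentiment Sum
# differs by a factor of 4, while Sentiment per Word is the same.
long_filing = sentiment_per_word(120.0, 40_000)
short_filing = sentiment_per_word(30.0, 10_000)
print(long_filing, short_filing)
```

Without this normalization, the longer filing would look four times as positive as the shorter one purely because of its length.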


This analysis looks at quarter-over-quarter changes in regulatory filings. Each 10-K or 10-Q is compared to the same company's most recent filing of the same type. The Percent Change in Word Count is the difference between the word counts of the two filings divided by the word count of the previous filing.
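The calculation can be sketched in Python as follows. This is an illustrative example, not SMA's implementation: the `Filing` record, its field names, and the function name are hypothetical. As described in this post, each filing is matched to the same company's most recent filing of the same form type (10-K to 10-K, 10-Q to 10-Q), and the change is (current - previous) / previous.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Filing:
    """Hypothetical record for one filing; field names are illustrative."""
    ticker: str
    form_type: str  # "10-K" or "10-Q"
    filed: date
    word_count: int


def pct_change_word_count(filings: List[Filing]) -> Dict[Filing, Optional[float]]:
    """Map each filing to (current - previous) / previous Word Count, where
    'previous' is the same company's most recent filing of the same form type.
    The first filing of each (ticker, form_type) pair has no baseline (None)."""
    changes: Dict[Filing, Optional[float]] = {}
    last_count: Dict[Tuple[str, str], int] = {}
    for f in sorted(filings, key=lambda f: f.filed):
        key = (f.ticker, f.form_type)
        prev = last_count.get(key)
        changes[f] = None if prev is None else (f.word_count - prev) / prev
        last_count[key] = f.word_count
    return changes


history = [
    Filing("ACME", "10-Q", date(2022, 5, 2), 50_000),
    Filing("ACME", "10-K", date(2022, 2, 20), 80_000),
    Filing("ACME", "10-Q", date(2022, 8, 1), 45_000),
]
for filing, change in pct_change_word_count(history).items():
    print(filing.form_type, filing.filed, change)
```

Here the August 10-Q is compared only with the May 10-Q (a 10% decrease); the February 10-K is never used as its baseline.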

The universe for this analysis includes all securities priced above $5. The benchmark, labeled ‘Universe’, is the average return of all stocks held in any quintile portfolio at that point in time. The analysis runs from 2007 through July 2022.
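A minimal sketch of the price filter and the ‘Universe’ benchmark follows. It assumes the $5 threshold is applied to the price at portfolio formation and that the benchmark is an equal-weighted average over every stock held in any quintile; the function names and sample data are hypothetical.

```python
from typing import Dict, List, Set


def eligible_stocks(prices: Dict[str, float], min_price: float = 5.0) -> Set[str]:
    """Keep securities priced above the threshold (assumed: price at formation)."""
    return {ticker for ticker, price in prices.items() if price > min_price}


def universe_benchmark(holdings: Dict[int, List[str]], returns: Dict[str, float]) -> float:
    """Equal-weighted average return of all stocks held in any quintile portfolio."""
    members = sorted({t for tickers in holdings.values() for t in tickers})
    return sum(returns[t] for t in members) / len(members)


prices = {"AAA": 12.50, "BBB": 4.80, "CCC": 31.00, "DDD": 5.00}
print(sorted(eligible_stocks(prices)))  # ['AAA', 'CCC']: BBB and DDD are not above $5

holdings = {1: ["AAA"], 5: ["CCC", "EEE"]}
monthly_returns = {"AAA": 0.02, "CCC": -0.01, "EEE": 0.05}
print(universe_benchmark(holdings, monthly_returns))
```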

When computing calendar-time portfolio returns, stocks are sorted into quintile buckets based on the factor value or its change. Stocks enter the portfolio on the last market day of the month in which the report was released. Portfolios are rebalanced monthly to add filings submitted during the most recent month.
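The entry rule can be sketched as below. Exchange holidays are not modeled, so the last market day is approximated here as the last weekday of the month; this simplification and the helper names are assumptions for illustration only, and a production version would use an actual trading calendar.

```python
import calendar
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, List, Tuple


def entry_date(filed: date) -> date:
    """Approximate the last market day of the filing's month as its last weekday.
    Assumption: ignores exchange holidays (e.g., a month ending on a holiday)."""
    last_day = calendar.monthrange(filed.year, filed.month)[1]
    d = date(filed.year, filed.month, last_day)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d -= timedelta(days=1)
    return d


def rebalance_schedule(filings: List[Tuple[str, date]]) -> Dict[date, List[str]]:
    """Group filings by the monthly rebalance date on which they enter."""
    schedule: Dict[date, List[str]] = defaultdict(list)
    for ticker, filed in filings:
        schedule[entry_date(filed)].append(ticker)
    return dict(sorted(schedule.items()))


filings = [("AAA", date(2022, 7, 12)), ("BBB", date(2022, 7, 28)), ("CCC", date(2022, 6, 3))]
print(rebalance_schedule(filings))
```

Both July filings enter together at the July rebalance (Friday, July 29, 2022, since July 31 fell on a Sunday), while the June filing entered a month earlier.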


The graphs and metrics below are calendar-time portfolio returns. Quintile 1 contains stocks with the lowest factor value while Quintile 5 encompasses stocks with the highest factor value.
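The quintile sort can be sketched as follows: rank stocks by factor value, assign the lowest fifth to Quintile 1 and the highest fifth to Quintile 5, then average the returns within each bucket. The breakpoint rule (equal-count buckets, ties broken by input order), equal weighting, and the sample values are our assumptions; the post does not specify them.

```python
from collections import defaultdict
from typing import Dict, List


def assign_quintiles(factor: Dict[str, float]) -> Dict[str, int]:
    """Quintile 1 = lowest factor values, Quintile 5 = highest (equal-count buckets)."""
    ranked = sorted(factor, key=lambda t: factor[t])  # stable sort: ties keep input order
    n = len(ranked)
    return {ticker: i * 5 // n + 1 for i, ticker in enumerate(ranked)}


def quintile_returns(quintiles: Dict[str, int], returns: Dict[str, float]) -> Dict[int, float]:
    """Equal-weighted average return of the stocks in each quintile."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ticker, q in quintiles.items():
        buckets[q].append(returns[ticker])
    return {q: sum(rs) / len(rs) for q, rs in sorted(buckets.items())}


# Hypothetical Percent Change in Word Count values for ten stocks.
word_count_change = {
    "A": -0.20, "B": -0.12, "C": -0.05, "D": -0.01, "E": 0.00,
    "F": 0.02, "G": 0.06, "H": 0.10, "I": 0.25, "J": 0.35,
}
quintiles = assign_quintiles(word_count_change)
print(quintiles)
```

With ten stocks, the two largest decreases (A and B) land in Quintile 1 and the two largest increases (I and J) land in Quintile 5.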

The graph and table above show how Percent Change in Word Count can enhance stock selection. Percent Change in Word Count compares the current document’s Word Count to the Word Count of the same company’s most recent document of the same type (10-Ks are compared to 10-Ks, and 10-Qs to 10-Qs).

The green line represents Quintile 5, the securities with the largest increase in Word Count (an average increase of 29.05%). The red line denotes Quintile 1, the securities with the largest decrease in Word Count (an average decrease of 16.8%). Quintile 1 outperforms all other quintiles, while Quintile 5 underperforms all of them.

When a filing grows longer than the company’s previous filing, subsequent returns tend to lag the Universe. Regulatory filings warn investors about the company’s future proceedings and the associated risks, and companies typically omit information that is not required. A longer document therefore suggests more potential liabilities, or a company over-explaining some facet of its business.

Conversely, as filings become more concise, subsequent stock returns outperform the Universe. Filings tend to shrink when outstanding issues or risks have been resolved, as companies remove information that is no longer relevant to the reporting period.

The difference in monthly returns between the two lines (Q1 – Q5) produces a hypothetical Long/Short portfolio that is long Quintile 1 and short Quintile 5. This portfolio has a T-Statistic of 3.85, which is statistically significant at the 95% confidence level. The slow, steady rise of the Long/Short indicates limited risk, and its Sharpe Ratio is substantially higher than that of any other portfolio.
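The Long/Short statistics can be sketched as below. The post does not state how its T-Statistic or Sharpe Ratio were computed, so this assumes a simple one-sample t-test of the mean monthly Long/Short return against zero and an annualized Sharpe Ratio with a zero risk-free rate. The function names and the four months of returns are made up for illustration.

```python
import math
import statistics
from typing import List


def long_short(q1: List[float], q5: List[float]) -> List[float]:
    """Monthly Long/Short return: long Quintile 1, short Quintile 5."""
    if len(q1) != len(q5):
        raise ValueError("quintile return series must cover the same months")
    return [a - b for a, b in zip(q1, q5)]


def t_statistic(returns: List[float]) -> float:
    """One-sample t-statistic of the mean return against zero (assumed test)."""
    return statistics.mean(returns) / (statistics.stdev(returns) / math.sqrt(len(returns)))


def annualized_sharpe(returns: List[float], periods_per_year: int = 12) -> float:
    """Annualized Sharpe Ratio of monthly returns, assuming a zero risk-free rate."""
    return statistics.mean(returns) / statistics.stdev(returns) * math.sqrt(periods_per_year)


q1 = [0.03, 0.01, 0.02, 0.00]  # hypothetical monthly Quintile 1 returns
q5 = [0.01, 0.01, 0.00, 0.01]  # hypothetical monthly Quintile 5 returns
ls = long_short(q1, q5)
print(t_statistic(ls), annualized_sharpe(ls))
```

With only four months the t-statistic here is about 1.0; a value such as the 3.85 reported for this Long/Short requires a consistently positive spread over many months, which is what the 2007 to July 2022 sample provides.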

The next metric explored is Sentiment per Word. Sentiment is calculated using Social Market Analytics’ patented sentiment dictionary. Dividing Sentiment Sum by Word Count normalizes sentiment by document length: a longer document tends to have a more extreme Sentiment Sum than a shorter one simply because it contains more words that can carry sentiment.

In this analysis, Quintile 1, which contains documents with the most negative Sentiment per Word, underperforms the Universe. Quintile 5, which contains stocks with the most positive Sentiment per Word, outperforms the Universe. As expected, companies that report good news and discuss topics in a positive manner tend to have better price returns than companies whose documents have a negative tone.

SMA’s Machine Readable Filings (MRF) product offers unique insight into the structure and sentiment of SEC regulatory filings. The dataset provides metrics drilled down by Item and can be used in a variety of long-term strategies.

If you are interested in learning more about how SMA’s MRF product can help your trading strategies, please email us at or schedule a demo using this link.

©2022 Context Analytics | All rights reserved