Narrative AP
May 3, 2023
Abstract
We estimate a narrative factor pricing model from news text of The Wall Street Journal. Our
empirical method integrates topic modeling (LDA), latent factor analysis (IPCA), and variable
selection (group lasso). Narrative factors achieve higher out-of-sample Sharpe ratios and smaller
pricing errors than standard characteristic-based factor models and predict future investment op-
portunities in a manner consistent with the ICAPM. We derive an interpretation of the estimated
risk factors from narratives in the underlying article text. (JEL C38, C52, G11, G12)
*
We are grateful for comments and suggestions from Tarun Ramadorai (the editor) and two anonymous referees;
several discussants, including Hui Chen, Diego García, Ryan Israelsen, Ben Matthies, Maximilian Rohrer, Qian Yang,
and Dexin Zhou; and audience participants at EFA, Future of Financial Information, GSU-RFS, Holden Memorial
Conference, JHU Carey, Kepos Capital, MFA, News and Finance Conference, Northeastern Finance Conference, UConn
Finance Conference, and Yale SOM. AQR Capital Management is a global investment management firm, which may
or may not apply similar investment techniques or methods of analysis as described herein. The views expressed here
are those of the authors and not necessarily those of AQR. Send correspondence to Bryan Kelly, [email protected].
A central premise of asset pricing is that differences in expected returns stem from differences
in risk exposures, but what are the fundamental risks that investors care about? According to
Merton’s (1973) Intertemporal Capital Asset Pricing Model (ICAPM), risk is tied to news about
“state variables” that track investors’ wealth and forecast changes in future investment opportuni-
ties. Because the state variables determine optimal current consumption, their shocks constitute
fundamental risks for the investor/consumer, and an asset’s covariances with these shocks dictate
The identity of potential ICAPM state variables has remained largely conceptual because of
the limited success of empirical efforts to isolate interpretable risk factors. Some attempts propose
macroeconomic variables as proxies for ICAPM state variables.1 The main competing modeling
framework uses statistical factor models based on characteristic-sorted stock portfolios. Statistical
factor models tend to perform better than empirical ICAPM models in explaining covariances and
risk premiums of “anomaly” portfolios and other assets, but have the drawback of being detached
In the ICAPM theory, state variables summarize an investor’s information set and vary as she
acquires unexpected new information. The state variables consist of the marketable (e.g., the market
portfolio) and nonmarketable (e.g., human capital and real estate) portions of wealth, as well as
is common in existing work to infer investor expectations via predictive vector autoregression (VAR)
Our paper has two primary contributions. First, we attempt to narrow the gap between ICAPM
theory and empirics by introducing additional data from news text. We are motivated by two poten-
tial advantages of using business news data in place of more standard macroeconomic data. News
is released more continuously and is likely more timely than low frequency numerical macroeco-
nomic data.3 This presents an opportunity to measure covariances between assets and macroeco-
1
For example, Chen, Roll, and Ross (1986), Cochrane (1996), Bali and Engle (2010), and Rossi and Timmermann
(2015) use macroeconomic indicators, such as industrial production, investment, and inflation, to proxy for the state
variables.
2
Examples include Fama and French (1996), Fama and French (2016), Hou, Xue, and Zhang (2015), Kelly, Pruitt,
and Su (2019), and Lettau and Pelger (2020).
3
For example, Kelly, Manela, and Moreira (2021) show that WSJ text successfully forecasts and nowcasts official
macroeconomic data releases.
nomic shocks more accurately (by using daily rather than monthly or quarterly data) and more
synchronously (news text arrival is more likely to be concurrent with updates to the market’s infor-
mation set). Second, information in news text enjoys the richness of narrative; that is, it is derived
from the sophisticated process of human understanding of complex contexts. News articles may
contain information that is more accessible to investors because the hard cognitive work—inferring
causes of business events and predicting their subsequent effects—is partially done by the journal-
ists. Presumably, such information is in high demand by investors, thus business news outlets have
hypothesize that narratives (a) can proxy for shocks to marketable and nonmarketable wealth that
might be poorly measured by market returns (i.e., the critique of Roll, 1977) and (b) have a mean-
ingful forward-looking component that helps forecast future investment opportunities and can thus
These potential benefits of using news text are accompanied by a number of empirical challenges as
well. News outlets face incentives to produce articles that are sensationalized or biased, or irrelevant
for asset pricing (Mullainathan and Shleifer, 2005; Gentzkow and Shapiro, 2010). This can obscure
the information content of news that is useful for modeling asset prices. Beyond biased and irrelevant
news, the inherent intricacies of natural language present a challenge to extracting and quantifying
Our second contribution is proposing a method for incorporating news text into an ICAPM factor
pricing model that addresses the challenges of working with text data. Our empirical approach has
three main components. First, we winnow news down to a set of articles that have a comparatively
high likelihood of relevance for asset pricing. While many news sources are available for potential
analysis, we choose to focus on The Wall Street Journal (WSJ) given its specialization in business
news. Of course, the WSJ also produces nonbusiness news; we therefore follow Bybee et al. (2021,
henceforth BKMX) and further filter out articles appearing in sections other than the three core
business sections (“Section One,” “Marketplace,” and “Money and Investing”) as well as articles
with subject tags corresponding to predominantly nonbusiness content, such as sports, leisure, and
arts.
The crux of the empirical problem is distilling a parsimonious set of risk factors (and eventu-
ally a pricing kernel) from the vast amount of textual data. Our empirical approach tackles this
problem using the textual dimension reduction technique of latent Dirichlet allocation (LDA) to au-
tomatically group terms (unigrams and bigrams) into interpretable narrative themes based on their
co-occurrences in news articles (following the LDA analysis in BKMX).4 The LDA model consists of
180 topics and is chosen as the statistically optimal specification according to a Bayes factor criterion.
Many of these topics correspond to important investor issues, such as “Recession,” “U.S. Senate,”
“Economic growth,” and “Federal Reserve.” LDA estimates the attention that WSJ allocates to
narratives for each topic on a given day, and we use the time series of allocations as candidate state
variables in an ICAPM. LDA also estimates the term composition for each topic, thereby providing
The third component of our empirical design estimates a mapping from the 180 narrative atten-
tion series into a small number of common asset pricing risk factors using instrumented principal
component analysis (IPCA; Kelly, Pruitt, and Su, 2017). IPCA is estimated from an economically
motivated criterion. It searches for tradable mimicking portfolios of the state variables that best fit
realized individual stock returns, much like the two-step regression approach of Fama and MacBeth
(1973). However, Fama-MacBeth estimates mimicking portfolios of observable state variables. IPCA
is a dimension reduction method that upgrades the Fama-MacBeth logic to infer a small number of
latent state variables from a large set of candidates (such as the 180 narratives). We introduce a
new penalized version of IPCA called Sparse IPCA that selectively excludes irrelevant or especially
noisy narratives before performing IPCA’s usual dimension-reduced risk factor estimation.
Once the narrative factor model is estimated, we demonstrate its performance as a factor pricing
model. It achieves lower out-of-sample pricing errors for 78 anomaly portfolios compared to the
five Fama-French factors plus momentum (which we refer to as the “FFC6” model). The narrative
factor model delivers an out-of-sample mean-variance efficient (MVE) portfolio with a Sharpe ratio
of 1.3, compared to a Sharpe ratio of 0.8 for the FFC6 MVE portfolio. This result is remarkable
in that narrative factors are formed based on stock covariances with narratives and no other firm
characteristic data.
We then demonstrate that estimated narrative risk factors are consistent with the ICAPM
framework. In particular, the narrative MVE portfolio predicts future market returns, consump-
4
For example, terms like “economic downturn,” “steep decline,” “hardest hit,” and “steep drop,” show up to-
gether in a narrative. The narrative label is manually assigned by summarizing the common theme displayed in the
automatically grouped topic terms. This example is labeled as the “Recession” topic.
tion growth, and a host of other macroeconomic indexes. The signs of the predictive relationships
align with ICAPM restrictions; that is, the MVE has a positive risk premium, indicating its asso-
ciation with “good news,” and indeed the MVE positively predicts procyclical indexes (like market
returns and GDP growth) and negatively predicts countercyclical indicators (like credit spreads and
unemployment).
Lastly, we derive an interpretation of the risks extracted from news narratives using the estimated
model. The MVE state variable is theoretically important: it is an estimate of the univariate
pricing kernel that determines asset risk premiums. The estimated MVE is a linear combination
of around a dozen narratives selected from the 180 candidate topics. Our interpretation approach
links these weights to narratives to quantify how changes in topic attention affect the model-implied
MVE portfolio. The “Recession” topic has the largest negative impact on the model-implied MVE,
while “Record high” and “Optimism” are the leading positive narratives. Through the model we
can trace narrative impacts on the MVE not only to topics but also to specific articles (headlines
like “Consumer Confidence Slides on Fears of Layoffs” and “Home Building Continues Recovery as
Demand Rises” induce large changes in the MVE) or individual terms (see the term clouds in Figure
8). We also use this approach to identify the articles associated with the largest market swings
over the past three decades. These articles relate to concrete issues like the chances of a double-dip
We contribute to the new and promising area of research using text to understand asset markets
(see the surveys of Loughran and McDonald, 2016; Gentzkow, Kelly, and Taddy, 2019). Early papers
Closer to our paper, Liu and Matthies (2022) show that news text predicts consumption growth
over low frequencies and construct a news-based index to proxy for the pricing kernel in a long-run
risks framework. In contrast, our narratives are estimated with little explicit human input (other
than selecting the news data source), and covariances between stocks and narratives leverage higher
frequency daily data. Engle et al. (2020) identify climate change risk by tracking the fluctuations
in WSJ attention to climate change news and propose a dynamic trading strategy to hedge climate
portfolio construction.
Ke, Kelly, and Xiu (2020), Manela and Moreira (2017), and Kelly, Manela, and Moreira (2021)
develop textual machine learning methods to predict stock returns, volatility, and macroeconomic
activity. Jeon, McCurdy, and Zhao (2021) identify news as a source of jumps in stock returns.
BKMX estimate a topic model for WSJ text and analyze its role in a macroeconomic VAR, and
Lopez-Lira (2020) conducts a topic analysis of 10-K text to measure firms’ risk exposures.
In terms of statistical methods, Sparse IPCA combines the selection and shrinkage functions of
(group) lasso (Tibshirani, 1996; Yuan and Lin, 2006) with latent factor analysis via IPCA (Kelly,
Pruitt, and Su, 2019). It is similar to sparse principal components analysis (SPCA; Zou,
Hastie, and Tibshirani, 2012), which imposes lasso-type regularization on factor loadings. Pelger
and Xiong (2021) impose hard-thresholding on factor loadings and emphasize the improved
interpretability from sparse estimates. Sparse IPCA selects instruments according to their effects
on factor loadings, in contrast to these two methods, which select the factor loadings themselves.
As an extension of Fama-MacBeth with regularization, our work is also related to Bryzgalova (2015).
1 Model
Figure 1 summarizes the data-generating process for innovations in narrative attention (z), stock
excess returns (r), as well as the term frequencies of individual news articles (w). The figure has
three parts: The main novelty is connecting state variables to news narratives (x to z). The empirical
goal is to estimate this relationship in order to understand the fundamental risks from the perspective
of news narratives. The return generating process (f to r) is the canonical latent factor model with
time-varying factor loadings. We estimate the narrative-based f for factor pricing tests. The text
data-generating process (θ to w, in the gray box) follows LDA and is treated as a stand-alone data
preprocessing step to prepare narrative innovations (z). Throughout the paper, τ indexes days, and
t indexes months. Since z, x, and r are innovations, they can be accumulated from daily to monthly
frequencies.5
5
The discrete-time state variable is a linear approximation of the continuous-time ICAPM. Write the instantaneous
state variable as $X_s$; the discrete-time state variable that we use is the change in $X$: $x_\tau = \int_{\text{during day } \tau} dX_s$, and
$x_t = \int_{\text{during month } t} dX_s$.
Figure 1: Illustration of the data-generating process
[Diagram. Topic model (LDA): article attention θm generates article word frequencies wm ∼ Mult(Φθm , Nm ) (V -dim). Articles are aggregated by day and innovations are extracted to give the narrative attention innovations zτ . Sparse connection: zτ = Axτ + ητ links zτ to the state variables xτ (K-dim). Mimicking portfolios ft and returns ri,t .]
τ indexes days, t indexes months, i indexes stocks, and m indexes articles. L is the number of narratives, and V
is the size of the vocabulary.
Let xτ (K × 1 vector) be the ICAPM state variables. It contains the growth in the wealth portfolio
(both the marketable and nonmarketable parts) and revisions to expectations about future invest-
ment opportunities. We do not specify an individual entry of xτ to be the market factor as a proxy
for wealth, but instead allow all news narratives to contribute to it in a unified fashion (estimated
from data). This helps circumvent Roll’s critique if, for example, the human capital component of
wealth is not well proxied by the market factor but reflected in news narratives.6
Let zτ be the innovation of each narrative’s attention on day τ arranged in an L × 1 vector with
L being the number of narrative topics. We assume zτ is related to the K ICAPM state variables
xτ via an L × K matrix A:
zτ = Axτ + ητ , (1)
where ητ represents the part of narrative innovations that is irrelevant to the state variables and the
Equation (1) relates news narratives to asset pricing state variables and is the key to our model.
We do not stipulate the causal mechanism for how the two are linked. Our interpretation is that
agents read the news to form assessments of the current state, which in turn guides their asset pricing
Numerical representations of text data are typically high-dimensional. The model is designed
to accommodate this feature by allowing the number of narratives L to be much larger than the
number of asset pricing state variables K. The A matrix summarizes which narratives matter for
asset prices and by how much, and thus it plays a central role in deriving narrative interpretations of
risk factors from our estimated model (discussed in detail in Section 6). Our estimator also supports
row-wise sparsity in A, which allows the estimator to zero out the effect of narratives that are entirely
We construct narrative attention shocks (zτ ) from LDA estimates of the WSJ from BKMX. We
treat the LDA estimation as a stand-alone data processing step to deliver a numerical representation
of the raw news text. Each narrative attention series is accompanied by an estimate for the term
composition for that narrative. Through these estimates, we can eventually trace our risk factors back
to influential individual articles or individual terms. This plays a valuable role in the interpretation
analysis of Section 6.
The LDA model is becoming a standard tool in textual analysis for finance and economics. We
briefly describe its structure to precisely define narrative attention and refer interested readers to
Blei, Ng, and Jordan (2003) for further details. It begins from a “bag-of-words” representation of
the raw text. The bag-of-words dimensionality is large because the WSJ vocabulary size (denoted
as V ) is over 18,000. LDA searches for a tractable thematic summary of the text with much lower
dimensionality than V . To do so, it imposes a factor structure on term counts and nests this in a
count distribution. In particular, LDA assumes the V -dimensional vector of term counts for a given
wm ∼ Mult(Φθm , Nm ), (2)
where Nm is the sum of all term counts for article m (which governs the scale of the multinomial
distribution). According to (2), expected term counts are summarized by a comparatively low
dimension set of parameters, θm and Φ = [ϕ1 , ..., ϕL ]′ . The lth “topic” is a V -dimensional parameter
vector ϕl , where $\phi_{l,v} \ge 0$ for all $v$ and $\sum_v \phi_{l,v} = 1$. That is, a news topic is a probability distribution
that defines the term composition of the topic. Terms with especially high probabilities in ϕl convey
the topic’s thematic content. The model’s dimension reduction is achieved by setting the total
Finally, narrative attention corresponds to article-specific parameter vector θm = (θm,1 , ..., θm,L )′ .
It is also a probability vector and describes article m’s allocation of attention across topics. We
aggregate article attention to a daily attention measure by averaging attention for all articles on day
\[ \theta_\tau = \frac{1}{\sum_{m \in \tau} N_m} \sum_{m \in \tau} N_m \theta_m . \]
The vector θτ is our main narrative attention time series. We calculate attention shocks, zτ , as
Next, we describe the model component for the cross-section of returns. Let fτ be the projection of
the state variables, xτ , onto the space of excess returns. We refer to the state-mimicking portfolios fτ
as systematic risk factors. The projection residual, ντ := xτ − fτ , is orthogonal to all excess returns.
zτ = Afτ + gτ , (3)
where gτ := Aντ + ητ is the composite residual of zτ . We have assumed both ντ and ητ are uncorrelated
ICAPM theory implies that the intercept is zero (i.e., there is no α) and that ϵi,t is mean zero and
orthogonal to ft .
Risk exposures, βi,t , are allowed to vary over time as the firm evolves and economic conditions
change. We assume βi,t is a function of instruments that provide guidance about the asset’s risk
exposures and also assume Covt (ft+1 ) = Σff . These serve as identifying assumptions that enable our
eventual IPCA estimator to recover estimates of time-varying exposures (see Kelly, Pruitt, and Su,
2019). Based on the hypothesis that observed narrative shocks are related to factor risks, we assume
the instruments include the 1 × L vector of covariances between the asset’s return and narrative
attention shocks, covi,t := Covt (ri,t+1 , zt+1 ). Together with the rest of the model structure, this
implies that
Inverting the above expression, we can express βi,t in terms of narrative covariances:
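\[ \beta_{i,t} = \mathrm{cov}_{i,t}\, A \left( A^\top A \right)^{-1} \Sigma_{ff}^{-1} := \mathrm{cov}_{i,t}\, \tilde{\Gamma}, \tag{5} \]
where $\tilde{\Gamma}$ is the instrumental mapping from covi,t to βi,t . Equations (4) and (5) embed the return model in the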
2 Estimation Method
Next, we turn to the procedure for estimating the model described in Equations (1)–(5). The
1. For each stock i and month t, calculate covariances between ri,τ and zτ from daily data:
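\[ \widehat{\mathrm{cov}}_{i,t} := \sum_\tau \kappa(\tau; t)\, r_{i,\tau} z_\tau^\top - \left( \sum_\tau \kappa(\tau; t)\, r_{i,\tau} \right) \left( \sum_\tau \kappa(\tau; t)\, z_\tau^\top \right), \tag{6} \]
where $\widehat{\mathrm{cov}}_{i,t}$ is a 1 × L row vector and κ(τ ; t) is an exponentially decaying weighting function
(kernel) that ends before the last day of month t (see details in Internet Appendix B.1).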
2. Append a constant to the covariances to form a set of (L + 1) instruments $c_{i,t} := [1, \widehat{\mathrm{cov}}_{i,t}]$, which
is supplied to the IPCA model
state-mimicking portfolios {ft } and Γ with Sparse IPCA, defined by the optimization:
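\[ \min_{\Gamma, \{f_t\}} \; \frac{1}{2} \sum_{i,t \in S} \left( r_{i,t} - c_{i,t-1} \Gamma f_t \right)^2 + \lambda N_S \sum_{l=0}^{L} \sigma_l^c \lVert \Gamma_l \rVert_2 + \sum_{t \in S} \lVert f_t \rVert_2^2 , \tag{8} \]
where S denotes a training set, $\lVert \Gamma_l \rVert_2 = \sqrt{\Gamma_{l,1}^2 + \cdots + \Gamma_{l,K}^2}$ is the Euclidean (L2 ) norm for group
lasso regularization, and λ is the regularization hyperparameter (whose tuning we discuss below).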
Then estimate Σff as the sample covariance of ft estimates. Internet Appendix B.2 describes the
3. Wrap-up step. Given estimates for $\tilde{\Gamma} := [\Gamma_1; \ldots; \Gamma_L]$ and Σff , calculate the plug-in estimate for A
The first two steps are a generalization of the Fama-MacBeth two-step regression approach to
constructing mimicking portfolios. In a simple case in which each narrative represents a stand-alone
state variable, Fama-MacBeth estimates the mimicking portfolios ft of the observed state variables
(Giglio and Xiu, 2021). Our generalization extends to cases in which the state variables that we
wish to mimic must be reduced and selected from a large set of noisy proxies. The estimated ft are
portfolios of individual stocks formed with narrative covariances as firm-level signals (see detailed
expressions in Internet Appendix B.2). The third step recovers the map from estimated asset pricing
factors back to text narratives. The map plays a central role in our model interpretation analysis.
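The first and third steps can be illustrated with a minimal Python sketch; it is not the paper's implementation. `weighted_cov` evaluates Equation (6) for a single stock. `kernel_weights` uses a hypothetical half-life and, for simplicity, does not implement the kernel's cutoff before month end (the actual kernel is specified in Internet Appendix B.1). `recover_A` shows one algebraic inversion that is consistent with Equation (5) when A has full column rank; it is an assumption of this example and may differ from the plug-in expression used in the paper.

```python
import numpy as np

def kernel_weights(n_days, halflife=60.0):
    """Exponentially decaying weights that sum to one (most recent day heaviest).
    The half-life is a hypothetical choice, not a value taken from the paper."""
    age = np.arange(n_days - 1, -1, -1)          # age of each day, oldest first
    w = 0.5 ** (age / halflife)
    return w / w.sum()

def weighted_cov(r, z, w):
    """Equation (6): kernel-weighted covariance between one stock's daily
    returns r (n_days,) and narrative shocks z (n_days, L). Returns (L,)."""
    return (w * r) @ z - (w @ r) * (w @ z)

def recover_A(gamma_tilde, sigma_ff):
    """Invert Gamma~ = A (A'A)^{-1} Sigma_ff^{-1} for A (assumes full column rank).
    With B = Gamma~ Sigma_ff = A (A'A)^{-1}, we have B'B = (A'A)^{-1},
    so A = B (B'B)^{-1}."""
    B = gamma_tilde @ sigma_ff
    return B @ np.linalg.inv(B.T @ B)

# Round trip on simulated inputs: rows of A set to zero stay zero in Gamma~.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2))
A[[1, 4]] = 0.0                                   # two irrelevant narratives
sigma_ff = np.array([[2.0, 0.3], [0.3, 1.0]])
gamma_tilde = A @ np.linalg.inv(A.T @ A) @ np.linalg.inv(sigma_ff)
A_hat = recover_A(gamma_tilde, sigma_ff)
```

In the simulated round trip, the rows of `gamma_tilde` that correspond to the zeroed rows of `A` are themselves zero, which is the row-wise sparsity that the estimator exploits.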
As discussed in the introduction, some narratives are irrelevant for asset pricing, leaving the
corresponding rows of A with zero entries. Γ inherits the row-wise sparsity structure of A. If narrative l
is irrelevant, the lth row of both A and Γ will be zero. The Sparse IPCA estimator selects relevant
narratives by inducing row-wise sparsity in Γ estimation, and is achieved through the group lasso
The third term in (8) is included for a technical reason. Regularizing ft is necessary to properly
induce sparsity in the Γ estimate by preventing the second term from making Γ infinitesimally small
and ft arbitrarily large. See Internet Appendix B.1 for further detail.
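The role of the third term can be checked numerically. The Python sketch below evaluates the three terms of (8) separately and applies a row-wise (group) soft-thresholding operator, the standard proximal map for a group lasso penalty. It is an illustration only: the paper's estimation algorithm is described in Internet Appendix B.2, and treating N_S as the number of stock-month observations in S and σ_l^c as a per-instrument scale are assumptions of this example. The function and argument names are hypothetical.

```python
import numpy as np

def sparse_ipca_terms(r, c, month, gamma, f, lam, sigma_c):
    """Return the three terms of objective (8) as (fit, group_penalty, ridge).
    r: (N,) returns; c: (N, L+1) lagged instruments; month: (N,) integer index
    of each observation's month; gamma: (L+1, K); f: (T, K) factors.
    ASSUMPTION: N_S is taken to be the number of observations, r.size."""
    beta = c @ gamma                                   # (N, K) loadings
    fitted = np.sum(beta * f[month], axis=1)           # c Gamma f_t per row
    fit = 0.5 * np.sum((r - fitted) ** 2)
    group = lam * r.size * np.sum(sigma_c * np.linalg.norm(gamma, axis=1))
    ridge = np.sum(f ** 2)
    return fit, group, ridge

def group_soft_threshold(gamma, thresh):
    """Shrink each row of gamma toward zero by thresh[l] in Euclidean norm;
    rows whose norm is below the threshold are set exactly to zero."""
    norms = np.linalg.norm(gamma, axis=1, keepdims=True)
    safe = np.maximum(norms, np.finfo(float).tiny)     # avoid division by zero
    scale = np.maximum(0.0, 1.0 - thresh[:, None] / safe)
    return gamma * scale

# Rescaling (Gamma / a, a * f) leaves the fit unchanged and shrinks the group
# penalty, so only the ridge term on f_t rules out this degenerate direction.
rng = np.random.default_rng(0)
r = rng.normal(size=8)
c = rng.normal(size=(8, 3))
month = np.array([0, 0, 1, 1, 2, 2, 3, 3])
gamma = rng.normal(size=(3, 2))
f = rng.normal(size=(4, 2))
base = sparse_ipca_terms(r, c, month, gamma, f, 0.01, np.ones(3))
rescaled = sparse_ipca_terms(r, c, month, gamma / 10, f * 10, 0.01, np.ones(3))
```

Comparing `base` with `rescaled` shows an identical fit term, a smaller group penalty, and a larger ridge term, which is the trade-off the third term introduces.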
successful factor pricing model should span the global MVE frontier and attain the highest Sharpe
ratio among all excess returns. For a sample panel S, each λ value corresponds to a different estimate
of (Γ, {ft }). For each value of λ, we calculate the training sample Sharpe ratio of the model-implied
MVE portfolio as $\mathrm{SR}(\lambda; S) := \sqrt{\hat{\mu}_f^\top \hat{\Sigma}_{ff}^{-1} \hat{\mu}_f}$, where $\hat{\mu}_f$ and $\hat{\Sigma}_{ff}$ are the mean and variance of ft in S.
We set λ to its Sharpe ratio maximizing value, λ∗S := arg maxλ SR(λ; S), and in turn parameters Γ, A
estimated under λ∗S are the tuned estimates, which we report and analyze in the next section.8
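The tuning rule amounts to a one-dimensional search over λ. A minimal Python sketch follows, assuming the factor estimates for each candidate λ have already been computed and stored in a dictionary (the name `factors_by_lambda` is hypothetical). The Sharpe ratio is at the frequency of the factor series and is not annualized here.

```python
import numpy as np

def mve_sharpe(f):
    """Sharpe ratio of the model-implied MVE portfolio, sqrt(mu' Sigma^{-1} mu),
    from a (T, K) array of factor realizations."""
    f = np.asarray(f, dtype=float)
    mu = f.mean(axis=0)
    sigma = np.atleast_2d(np.cov(f, rowvar=False))     # sample covariance
    return float(np.sqrt(mu @ np.linalg.solve(sigma, mu)))

def select_lambda(factors_by_lambda):
    """Pick the lambda whose training-sample factors give the highest MVE
    Sharpe ratio. factors_by_lambda maps lambda -> (T, K) factor estimates."""
    sharpe = {lam: mve_sharpe(f) for lam, f in factors_by_lambda.items()}
    best = max(sharpe, key=sharpe.get)
    return best, sharpe

# Toy example with a single factor and two candidate penalties.
f_a = np.array([[0.0], [2.0], [4.0]])
f_b = f_a + 2.0                                        # same variance, higher mean
best_lam, sharpe_by_lam = select_lambda({0.0: f_a, 0.002: f_b})
```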
3 Empirical Results
3.1 Data
Daily LDA-based narrative attention estimates for WSJ come from BKMX. This is based on full arti-
cle text from 1984:01–2017:06 after applying preliminary filters to remove articles about nonbusiness
topics (for details, see BKMX and Internet Appendix A.1). The model uses 180 topics and is selected
based on a Bayes factor criterion (details in Appendix A.3). BKMX manually assign topic labels
based on automatically generated keyword lists for each topic. Appendix A.4 visualizes the LDA
model estimates including keywords (ϕl ) and daily attention levels (θτ ) for a subset of narratives
that we will later show to be influential in our factor pricing model. Our timing convention defines
θτ as topic attention for the WSJ edition published on the morning of calendar day τ + 1, based on
the view that this reflects market information accrued on day τ and thus synchronizes zτ with asset
returns ri,τ .9 We construct shocks to narrative attention (zτ ) as the deviation of attention level (θτ )
pricing.
8
Note that we select λ based on the training sample rather than a separate tuning/validation sample, because
the tuning criterion (MVE Sharpe ratio) is distinct from the estimation criterion (factor model mean-squared error).
This has the benefit of efficiency by including more data in the training set, but may positively bias in-sample model
performance and negatively bias out-of-sample performance if it introduces in-sample overfit. In Internet Appendix
C.3, we report empirical results based on standard “leave-one-out” cross-validation (LOOCV), which are essentially
unchanged compared to our main analysis.
9
To avoid the potential leak of ex post information after the start of month t into the ex ante portfolio weights of
ft , we calculate $\widehat{\mathrm{cov}}_{i,t-1}$ in the window up to the second-to-last day before the start of month t. See Internet Appendix
B.1.
from its 5-day moving average: $z_\tau := \theta_\tau - \frac{1}{5} \sum_{j=1}^{5} \theta_{\tau-j}$.10
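As a minimal sketch of this shock definition, the Python function below subtracts the average attention over the previous five days from each day's attention. It assumes the daily attention series is stored as a (T, L) array with one row per day in chronological order; the function name is hypothetical, and the first five days are left undefined (NaN) because their trailing window is incomplete.

```python
import numpy as np

def attention_shocks(theta, window=5):
    """z_tau = theta_tau - mean(theta_{tau-1}, ..., theta_{tau-window}).
    theta: (T, L) daily attention levels. Returns a (T, L) array of shocks."""
    theta = np.asarray(theta, dtype=float)
    z = np.full_like(theta, np.nan)
    for tau in range(window, theta.shape[0]):
        # The trailing window excludes the current day tau.
        z[tau] = theta[tau] - theta[tau - window:tau].mean(axis=0)
    return z

# Toy example: one narrative whose attention rises by one unit per day.
theta = np.arange(7, dtype=float).reshape(-1, 1)
z = attention_shocks(theta)
```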
Stock return data are from CRSP for firms listed on NYSE, AMEX, and NASDAQ.11 To match
the span of our news data, we use daily stock returns from 1984:01 to 2016:12 to construct monthly
covariances $\widehat{\mathrm{cov}}_{i,t}$. After a 1-year burn-in period to prepare the earliest $\widehat{\mathrm{cov}}_{i,t}$, the full sample spans
32 years from 1985:01 to 2016:12 and contains 1,850,401 stock-month observations for 15,831 unique
stocks.
Our first empirical results report the overall model fit in the full sample and explore how estimates
change as we vary the degree of regularization. Figure 2 reports the following four full-sample
proportion of realized return variation explained by the factors. It restates the first term in the
4. L2 Norm of each row of Γ, defined as ∥Γl ∥2 , measures each instrument’s marginal contribution
The horizontal axes in Figure 2 begin on the left with unregularized IPCA (λ = 0). As λ increases,
the estimation becomes more heavily penalized, which reduces the model R2 (panel 1), shrinks the
estimates of Γ toward zero (panel 4), and reduces the number of selected narratives (panel 3). Panel
2 shows that the model-implied MVE Sharpe ratio is nonmonotonic in λ. When λ is near zero,
the model overfits the in-sample estimation objective, mechanically inflating the R2 . The overfit
in R2 belies poor model performance and mis-estimation of the systematic risk factors. Overfit is
revealed in the suboptimal MVE Sharpe ratio (which is not mechanically inflated by the estimation
10
In Internet Appendix C.5, we verify that our empirical results are robust to using other shock definitions.
11
We restrict the sample to firms with nonmissing Fama-French characteristics (Fama and French, 2016) to match
the sample underlying the benchmark factors.
Figure 2: Full-sample estimates with varying λ values
[Four panels plotted against the regularization constant λ (horizontal axes labeled “Reg-Constant (λ)”, ranging from 0.000 to 0.010).]
objective). A small amount of regularization begins to combat the effects of overfit and the MVE
Sharpe ratio initially rises. This increase is evidence that regularization is effective in improving
systematic risk estimation. Eventually the MVE Sharpe ratio peaks, corresponding to a balance
point at which the model achieves a good in-sample fit without being overfit. Raising λ beyond
this point leads to a deterioration in both R2 and Sharpe ratio, indicating the region in which the
model is too heavily regularized and thus underfit. Next, comparing across specification choices
for the number of latent states K, we find large improvements in MVE Sharpe up to K = 5. At
the Sharpe ratio maximizing penalty, we also see large improvements in R2 up to K = 3. Further
increasing K above 3 still pushes up the full-sample MVE Sharpe ratio, although the improvement
specification going forward (see plots of the estimated factors in Internet Appendix C.1). Panel 4
reports the sensitivity of Γ estimates along the λ path in the benchmark case of K = 3. The optimal
hyperparameter is represented by the vertical dashed line, which is at the peak of the Sharpe ratio
curve (the green “K = 3” curve in panel 2). As λ increases, each instrument’s contribution to βi,t
(measured by ∥Γl ∥2 ) first shrinks toward zero, and eventually drops to zero for sufficiently high λ.
This demonstrates both the shrinkage and selection effects of group lasso and Sparse IPCA. We color
and label the ten most influential narratives in terms of ∥Γl ∥2 , with plots for the remaining narratives
in gray. The top narratives in the legend are closely related to business and economics in general (see
their keyword lists in Figure A.2). To further demonstrate that the selection method can effectively
distinguish relevant and irrelevant instruments, we conduct an experiment with simulated placebo
narratives. In addition to the 180 observed narratives (zτ ), we introduce random placebo narratives
to “confuse” the estimator. The results in Internet Appendix C.2 show that Sparse IPCA effectively
filters out the placebo narratives that we know for certain are irrelevant. Moreover, the estimates of
the remaining selected narratives are largely unaffected by interference from the irrelevant ones.
This section evaluates the asset pricing performance of narrative-based systematic risk factors. Factor
estimates that successfully mimic the true unobservable systematic risks should deliver small pricing
errors (α) for test assets and their MVE combination should achieve a high Sharpe ratio. Our
benchmark factor models include the CAPM, the Fama-French three-factor model (“FF3”), the
Fama-French five-factor model (“FF5”), and the Fama-French-Carhart six-factor model (“FFC6”).
We report the performance of the narrative factor model with up to six factors (“NF1” to “NF6”),
4.1 Cross-sectional pricing performance
Panel A: 78 anomaly portfolios          Panel B: 25 size and book-to-market portfolios
         (1)    (2)    (3)    (4)                (1)    (2)    (3)    (4)
Mkt     1.11   2.64   0.54   8.56       Mkt     0.42   3.03   0.84   11.61
FF3     0.97   2.65   0.54   8.50       FF3     0.32   3.89   0.84   15.12
FF5     1.18   3.20   0.68   7.48       FF5     0.27   3.31   0.72   13.06
FFC6    1.27   3.43   0.74   7.41       FFC6    0.28   3.40   0.76   12.68
NF1     1.29   3.03   0.73   8.78       NF1     0.35   2.33   0.64    6.56
NF2     0.97   2.72   0.54   7.98       NF2     0.16   1.14   0.16    5.29
NF3     0.84   2.62   0.54   7.84       NF3     0.25   1.95   0.56    5.72
NF4     0.92   2.81   0.55   7.53       NF4     0.23   1.75   0.44    5.58
NF5     0.91   2.78   0.60   7.43       NF5     0.23   1.78   0.48    5.48
NF6     0.96   2.89   0.63   7.38       NF6     0.25   1.91   0.48    5.56
Statistics: (1) avg |α̂a |, the cross-sectional average absolute value of the intercepts (a indexes anomaly
test assets); (2) avg |t(α̂a )|, the cross-sectional average absolute value of the intercept t-statistics; (3)
#{|t(α̂a )| > 1.96} / #{test assets}, the proportion of intercepts statistically significantly different from zero; and (4) GRS,
the GRS statistic for the joint test that all intercepts are zero.
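Statistics (1) to (3) can be computed from time series regressions of each test asset on the factors. The Python sketch below is illustrative rather than the paper's code: it assumes classical (homoskedastic) OLS standard errors, which may differ from the standard errors behind the reported t-statistics, and it omits the GRS statistic. The function name and dictionary keys are hypothetical.

```python
import numpy as np

def alpha_stats(test_returns, factors):
    """Regress each test asset (columns of the (T, N) array test_returns) on an
    intercept and the (T, K) factors; summarize the intercepts (alphas).
    ASSUMPTION: classical OLS standard errors."""
    T = factors.shape[0]
    X = np.column_stack([np.ones(T), factors])          # (T, K+1)
    coef, *_ = np.linalg.lstsq(X, test_returns, rcond=None)
    resid = test_returns - X @ coef
    s2 = np.sum(resid ** 2, axis=0) / (T - X.shape[1])  # residual variances
    xtx_inv = np.linalg.inv(X.T @ X)
    alpha = coef[0]                                     # intercept row
    t_alpha = alpha / np.sqrt(s2 * xtx_inv[0, 0])
    return {
        "avg_abs_alpha": float(np.mean(np.abs(alpha))),
        "avg_abs_t": float(np.mean(np.abs(t_alpha))),
        "share_significant": float(np.mean(np.abs(t_alpha) > 1.96)),
    }

# Simulated check: assets priced exactly by the factors versus assets that
# carry a constant pricing error of 1.0.
rng = np.random.default_rng(1)
F = rng.normal(size=(500, 3))
loadings = rng.normal(size=(3, 10))
R = F @ loadings + 0.1 * rng.normal(size=(500, 10))
priced = alpha_stats(R, F)
mispriced = alpha_stats(R + 1.0, F)
```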
Table 1 reports the cross-sectional asset pricing results with respect to two sets of test assets.
We follow the standard empirical procedure in the factor pricing literature, calculating alphas and
their t-statistics from time series regressions of test asset returns on factors over the full sample. We
also calculate the GRS test statistic for the joint significance of alphas among all test assets as a
model comparison metric. The test assets in panel A are 78 anomaly portfolios constructed as long-
short portfolios of 78 characteristics used in Gu, Kelly, and Xiu (2020), including standard anomaly
characteristics, such as idiosyncratic volatility, accruals, short-term reversal, and so forth.12 Test
assets in panel B are 25 double-sorted portfolios based on size and book-to-market from Kenneth
French’s data library. In general, the narrative factor models yield smaller and less significant pricing
errors than the benchmarks. For example, anomaly portfolio pricing errors from the NF6 model
are on average 24% smaller than those from the FFC6 model, with smaller average alpha t-statistic
magnitudes (2.9 for NF6 vs. 3.4 for FFC6) and fewer significant pricing errors (63% for NF6 vs.
74% for FFC6). We see similar patterns among size and value portfolios, where NF6 pricing errors
are 11% smaller than those from FFC6, average t-statistics are 1.9 for NF6 versus 3.4 for FFC6, and
12 In detail, the test assets are managed portfolios defined as $r_{a,t} := \sum_i \mathrm{char}_{a,i,t-1}\, r_{i,t}$, where $\mathrm{char}_{a,i,t-1}$ is the
rank-standardized characteristic a of stock i at time t.
only 48% of alphas are significant for NF6 versus 76% for FFC6 (despite the fact that Fama-French
factors are essentially designed to price size and value portfolios). Internet Appendix Figure C.3
displays each anomaly’s pricing errors under FFC6 and NF3 for comparison. For example, NF3 is
effective in pricing anomalies based on realized risk measures, such as betting-against beta (BAB)
and idiosyncratic volatility. Short-term reversal remains the strongest anomaly for both models.
This is understandable as we do not expect the ICAPM mechanism to explain fleeting mispricings
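The pricing-error statistics reported in Table 1 can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' code: the function name `pricing_error_stats` is hypothetical, the intercept t-statistics use plain homoskedastic OLS standard errors, and the GRS statistic is computed in its textbook form with maximum-likelihood covariance estimates.

```python
import numpy as np

def pricing_error_stats(R, F):
    """Alpha statistics from time-series regressions of test assets on factors.

    R: T x N array of test-asset excess returns.
    F: T x K array of factor returns.
    """
    R = np.asarray(R, dtype=float)
    F = np.asarray(F, dtype=float).reshape(len(R), -1)
    T, N = R.shape
    K = F.shape[1]

    X = np.column_stack([np.ones(T), F])        # intercept plus factors
    XtX_inv = np.linalg.inv(X.T @ X)
    B = XtX_inv @ X.T @ R                       # (K+1) x N OLS coefficients
    resid = R - X @ B
    alpha = B[0]

    # Assumption: homoskedastic OLS standard errors for the intercepts.
    s2 = (resid ** 2).sum(axis=0) / (T - K - 1)
    t_alpha = alpha / np.sqrt(s2 * XtX_inv[0, 0])

    # GRS statistic with maximum-likelihood (divide-by-T) covariances.
    Sigma = resid.T @ resid / T
    mu = F.mean(axis=0)
    Omega = np.atleast_2d(np.cov(F, rowvar=False, bias=True))
    quad = alpha @ np.linalg.solve(Sigma, alpha)
    grs = (T - N - K) / N * quad / (1.0 + mu @ np.linalg.solve(Omega, mu))

    return {
        "alpha": alpha,
        "avg_abs_alpha": np.abs(alpha).mean(),
        "avg_abs_t": np.abs(t_alpha).mean(),
        "share_significant": (np.abs(t_alpha) > 1.96).mean(),
        "grs": grs,
    }
```

The four summary entries correspond to statistics (1) through (4) in the table notes. The GRS statistic requires T > N + K, which holds comfortably for monthly samples with 78 or 25 test assets.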
Next, we evaluate the narrative factor models in terms of their mean-variance efficiency. We construct
out-of-sample factors and their MVE as tradable portfolios that mimic narrative state variables. This
analysis focuses on out-of-sample performance to avoid Sharpe ratio inflation for the MVE portfolios
arising from any potential in-sample overfit. The out-of-sample factor estimate for month t + 1 is
$$ f^{OOS}_{t+1} = \left[ \sum_i \hat\beta_{i,t} \hat\beta_{i,t}^\top + 2 I_K \right]^{-1} \sum_i \hat\beta_{i,t}\, r_{i,t+1} $$
on the ft expression, see Internet Appendix B.2). We combine the out-of-sample factor estimates
$$ f^{MVE,OOS}_{t+1} = \hat\mu_f^\top \hat\Sigma_{ff}^{-1} f^{OOS}_{t+1}, $$
where $\hat\mu_f$ and $\hat\Sigma_{ff}$ are the estimated factor means and covariances based on the in-sample factor
realization in the training sample. We form out-of-sample $f^{MVE}_{t+1}$ series with expanding training
samples. The first out-of-sample realization corresponds to January 2000 (the initial training sample
ends in December 1999). To reduce computation, we retrain the model once each December and
apply these estimates to construct the out-of-sample MVE portfolio for each of the next 12 months.
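The out-of-sample construction described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, and for brevity the expanding-window routine takes a single array of factor realizations, whereas in the paper the loadings and in-sample factors are re-estimated on each training sample.

```python
import numpy as np

def oos_factor(beta, r_next, ridge=2.0):
    """Ridge-type cross-sectional estimate of the month t+1 factor.

    beta: N x K loadings estimated with data through month t.
    r_next: length-N vector of returns in month t+1.
    ridge: multiplier on I_K (2.0 mirrors the 2 I_K term in the formula).
    """
    beta = np.asarray(beta, dtype=float)
    r_next = np.asarray(r_next, dtype=float)
    K = beta.shape[1]
    return np.linalg.solve(beta.T @ beta + ridge * np.eye(K), beta.T @ r_next)

def mve_weights(f_train):
    """Sigma_ff^{-1} mu_f from training-sample factor realizations (T x K)."""
    f_train = np.asarray(f_train, dtype=float)
    mu = f_train.mean(axis=0)
    Sigma = np.atleast_2d(np.cov(f_train, rowvar=False))
    return np.linalg.solve(Sigma, mu)

def expanding_mve(factors, first_oos, step=12):
    """Out-of-sample MVE returns with weights refreshed every `step` months.

    Assumption: `factors` (T x K) serves both as training data (rows before
    each refresh date) and as the out-of-sample factor estimates afterwards.
    """
    factors = np.asarray(factors, dtype=float)
    pieces = []
    for start in range(first_oos, len(factors), step):
        w = mve_weights(factors[:start])          # expanding training sample
        pieces.append(factors[start:start + step] @ w)
    return np.concatenate(pieces)

def annualized_sharpe(x, periods=12):
    """Annualized Sharpe ratio of a monthly excess-return series."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(periods) * x.mean() / x.std(ddof=1)
```

Refreshing the weights only at the start of each 12-month block mirrors the once-per-December retraining described in the text; setting `step=1` would instead retrain every month at higher computational cost.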
The result is a 16-year out-of-sample evaluation period for the MVE portfolios. We construct the
out-of-sample MVE portfolio of benchmark Fama-French factors using the same expanding training
scheme for their means and covariances. Figure 3 plots ∥Γl ∥2 across the expanding training samples
Figure 3: L2 norm of Γ in expanding training samples
[Figure omitted. Vertical axis: L2 norm of Γl (0.0 to 0.6). Labeled narratives: Recession, Record high, Trading activity, Problems, Earnings forecasts, Bear/bull market, Earnings, Optimism, Treasury bonds, Profits.]
The horizontal axis is the end of the expanding training samples. The start is always 1985.
Panel A reports out-of-sample MVE portfolio Sharpe ratios for narrative factor models with up to six
factors. The rows λ = λ∗S correspond to Sparse IPCA and the “# narratives” reports the number of
selected narratives (averaged over all training samples). The rows λ = 0 correspond to unregularized
IPCA and thus all narratives are included without penalization. In panel B, each column reports
the out-of-sample MVE portfolio Sharpe ratio using all factors up to that column (e.g., the column
labeled “RMW” reports the MVE combination of Mkt, SMB, HML, and RMW). The out-of-sample
period is from January 2001 to December 2016.
to illustrate how each topic l’s contribution changes as we vary the training sample. Note that the
identity and magnitude of estimates for each topic are relatively stable over time. The stability of
Γ estimates provides the foundation for constructing robust out-of-sample factor portfolios. Panel
A of Table 2 reports out-of-sample Sharpe ratios for narrative factor model MVE portfolios. We
Figure 4: Out-of-sample MVE Sharpe ratio versus λ
Out-of-sample MVE Sharpe ratios for different regularizing constant (λ) values.
present models estimated under the tuned regularization constant (λ = λ∗S) and the unregularized case
(λ = 0) for comparison. The “# narratives” row reports the number of selected narratives averaged
across the expanding training samples. In panel B, each column reports the out-of-sample MVE
portfolio Sharpe ratio using all factors up to that column (e.g., the column labeled “RMW” reports
the MVE combination of Mkt, SMB, HML, and RMW). The MVE Sharpe ratio of narrative factors
dominates that of the benchmark factors for each value of K. At K = 5, both the narrative model
and the benchmark model achieve their highest out-of-sample MVE Sharpe ratios. In this case, the
narrative Sharpe ratio of 1.3 achieves an improvement of roughly 50% over the benchmark. Panel
A also shows that as K increases, the number of narratives selected also increases, indicating that a
larger factor dimension gives the model more capacity to incorporate narratives and better match
asset price behavior out-of-sample. The results also show the large gains in out-of-sample model
performance due to regularization. Without regularization (the row labeled λ = 0), the narrative model
MVE Sharpe ratio is much closer to the benchmark. Figure 4 is another illustration of the benefits of Sparse IPCA
regularization. It shows out-of-sample MVE Sharpe ratios for the full range of λ values. The steep
increase in Sharpe ratio on the left side of the plot directly quantifies the benefit of regularization.
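To make the role of λ concrete, the sketch below shows the standard proximal (row-wise soft-thresholding) step associated with a group lasso penalty of the form λ Σ_l ‖Γ_l‖₂. It is not the Sparse IPCA estimator itself, whose algorithm is not given in this section; it only illustrates why larger λ removes whole narratives. Treating each narrative l as one row Γ_l of Γ is an assumption consistent with Figure 3, and the function names are hypothetical.

```python
import numpy as np

def group_soft_threshold(Gamma, lam):
    """Shrink each row of Gamma toward zero by lam in L2 norm.

    Rows whose L2 norm is at most lam are set exactly to zero, which is
    how a group lasso penalty drops an entire narrative at once.
    """
    Gamma = np.asarray(Gamma, dtype=float)
    norms = np.linalg.norm(Gamma, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return Gamma * scale

def selected_narratives(Gamma, tol=1e-12):
    """Indices of rows (narratives) with a nonzero L2 norm."""
    return np.flatnonzero(np.linalg.norm(Gamma, axis=1) > tol)

# Toy example with three narratives and two factors.
Gamma = np.array([[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]])
shrunk = group_soft_threshold(Gamma, lam=1.0)
kept = selected_narratives(shrunk)
```

In this toy example only the first row survives the threshold, so the analog of the "# narratives" count is one. With `lam=0.0` the function returns `Gamma` unchanged, which corresponds to the unregularized case.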
Furthermore, this plot is very similar to its in-sample analog Figure 2, panel 2, demonstrating the
efficacy of our in-sample tuning scheme for identifying λ values that optimize out-of-sample model
performance. Finally, Figure 5 plots MVE portfolio cumulative returns over the evaluation sample.
Figure 5: Out-of-sample MVE cumulative returns
Cumulative returns of the out-of-sample $f^{MVE}$ in different specifications. The left panel fixes K = 3 and
plots the benchmark specification in green, the unregularized specification in blue, and the market factor in red
for comparison. The right panel always uses regularization but varies K from 1 to 6. In both panels,
the return series are standardized to have the same standard deviation as the market return.
The first panel compares the MVE return with and without regularization holding K = 3 fixed
(and compared to the equal-weighted excess market return). The second panel compares the MVE
for different specification choices of K. We standardize all return series to have the same volatility
over the evaluation sample. The figures show that the investment performance of the narrative
MVE portfolio is not concentrated in a particular period or driven by a particular event. While
statistical horse races and anomaly pricing tests favor narrative factors, their interpretability and
ICAPM properties (discussed in detail below) are further economic reasons to favor the narrative-
based model over the benchmarks. As emphasized by Cochrane (2009), “It is probably not a good
idea to evaluate economically interesting models with statistical horse races against models that use
portfolio returns as factors. . . . one should always ask of a factor model, ‘what is the compelling
economic story that restricts the range of factors used?’ ... If the purpose of the model is not just to
predict asset prices but also to explain them, this puts an additional burden on economic motivation
Table 3: Sharpe ratios of MVE portfolios combining FFC6 and NF
                 K (number of NF added)
Specification    0      1      2      3      4      5      6
NF                      0.48   1.00   1.10   1.26   1.32   1.31
NF + Mkt         0.25   0.48   1.00   1.08   1.26   1.32   1.31
NF + FF3         0.36   0.41   0.60   0.68   1.02   1.17   0.98
NF + FF5         0.90   0.89   0.94   0.92   1.01   1.01   1.08
NF + FFC6        0.67   0.65   0.76   0.80   0.87   0.92   1.19
The first row (only NF, no FFC6) repeats the numbers from Table 2, panel A, Line “λ = λ∗S .” The
first column (only FFC6, no NF) repeats the numbers from Table 2, panel B. The rest of the entries
are new results combining K NF’s and a subset of factors from FFC6.
4.3 Robustness
We conduct additional sensitivity analyses for narrative factor model performance. First, we examine
the effect of replacing news-based topics with a set of 129 numerical macroeconomic data series from
FRED-MD as candidates for the latent state variables. The results are reported in panel B of Internet
Appendix Table C.3 (which can be compared to our main results restated in panel A of Table C.3 for
ease of reference). Model performance deteriorates when we replace narrative data with FRED-MD
data. The out-of-sample MVE Sharpe ratio drops from 1.3 to 0.7. Second, we conduct robustness
checks for our construction of narrative attention innovations. Instead of the 5-day trailing moving
average ($z_\tau := \theta_\tau - \frac{1}{5}\sum_{\iota=1}^{5}\theta_{\tau-\iota}$), the alternatives include 1-day, 3-day, and 20-day moving averages
(Table C.3, panel C). In these cases, the out-of-sample MVE Sharpe ratios remain well above one
and the cross-sectional pricing errors are as good as in our main analysis. In the third exercise, we
examine the characteristics of stocks with high and low exposures to the narrative factors. We regress
narrative β’s on the traditional firm characteristics across the stock-month panel. The correspondence
is strikingly high (the R² is around 40%), although the two are from drastically different information
sources, namely, textual news versus CRSP/Compustat. Stocks with high β’s on the narrative-based
pricing kernel, $f_t^{MVE}$, are associated with anomalous characteristics known to be related to high
average returns. For example, they tend to have smaller sizes, higher momentum, lower market
beta, and higher dividend-to-price ratios. See Internet Appendix C.6 for the regression results and
more detailed analysis. Lastly, instead of running horse races between separate MVE portfolios
based on either narrative factors or characteristics-sorted portfolios, we analyze MVE portfolios that
combine the two sets of factors in a joint model. To do so, we reestimate each narrative factor model
(with K = 1 to 6) controlling for various combinations of the benchmark factors (more details in
Internet Appendix C.7). We find that adding narrative factors to the benchmark factors produces
a large economic improvement in the out-of-sample MVE Sharpe ratio (see, e.g., the last row of
Table 3). The reverse is not true. If we begin from the NF6 specification and gradually introduce
the benchmark FFC6 factors into the model, the out-of-sample MVE Sharpe ratio either does not
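The attention-innovation construction varied in the second robustness check above can be sketched as follows. This is a minimal sketch: the function name `attention_innovations` and the choice to leave the first `window` rows undefined (NaN) are assumptions made for illustration.

```python
import numpy as np

def attention_innovations(theta, window=5):
    """z_tau = theta_tau minus the mean of the previous `window` days.

    theta: T x L array of daily narrative attention levels.
    Returns a T x L array; the first `window` rows are NaN because the
    trailing average is not yet available there.
    """
    theta = np.asarray(theta, dtype=float)
    z = np.full_like(theta, np.nan)
    for tau in range(window, theta.shape[0]):
        trailing_mean = theta[tau - window:tau].mean(axis=0)
        z[tau] = theta[tau] - trailing_mean
    return z
```

Calling the function with `window=1`, `window=3`, or `window=20` gives the alternative definitions considered in the robustness check.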
We introduce news narratives in a factor pricing model in hopes of bringing empirical asset pricing
closer to its theoretical underpinnings of macroeconomic risk. We adopt the ICAPM framework
by hypothesizing that narratives have a meaningful forward-looking component that helps forecast
future investment opportunities and track investor wealth. Given the narrative-based state variables
estimated above, we loop back to test this hypothesis by examining the forecasting power of our
estimated state variables, particularly the MVE combination of the narrative attention measures
(which is an estimate of the model’s pricing kernel). We run long-term predictive regressions of different
macroeconomic variables onto the estimated MVE state variable ($x_t^{MVE}$). The macroeconomic
variables include market return, inflation, interest rates, credit spreads, and growth in consumption,
GDP, employment, payroll, and housing from FRED-MD (McCracken and Ng, 2016). For each
forecast target, we predict the cumulative changes over different horizons (h):
$$ \sum_{s=1}^{h} \psi_{t+s} = b_h\, x_t^{MVE} / \mathrm{std}(x_t^{MVE}) + \mathrm{error}_t^{(h)}. \qquad (9) $$
On the left-hand side, ψt denotes the one-month change in a macroeconomic variable, such as ψt =
GDP growth_t, and the summation takes the cumulative change over the future horizon of h months. We
standardize $x_t^{MVE}$ so that the coefficient $b_h$ can be interpreted as the effect per one-standard-deviation
change in the state variable. The ICAPM not only implies the existence of predictive relationships
between state variables and future outcomes but also imposes sign restrictions. A pricing kernel
(xMVE ) should be procyclical, meaning it should rise with “good news,” positively correlate with
contemporaneous consumption growth, and positively predict future investment opportunities as well
Figure 6: News-based pricing kernel ($x^{MVE}$) forecasts future macroeconomic variables
[Figure omitted. Recoverable panel titles: Nonfarm payroll growth; Unemployment rate change (%); Housing starts growth; Baa-Fed funds spread (%); 10-year Treasury yield change (%); CPI change.]
Each panel reports the estimation results of predictive regression (9) for a macroeconomic variable.
The red line plots coefficient ($b_h$) estimates, and the shaded band represents the 90% confidence
interval. We use Newey-West standard errors computed with h lags to account for autocorrelation.
as future cash flow of investor wealth. Our working hypothesis is thus that $x_t^{MVE}$ positively predicts
market returns, inflation, interest rates, and growth in consumption, GDP, payroll, and housing, and
that it negatively predicts credit spreads and changes in unemployment.13 Figure 6 reports predictive
regression results. Each panel corresponds to a different prediction target, the horizontal axis is the
prediction horizon h from 1 to 24 months, the vertical axis is the estimated coefficient bh , and the
shaded area marks the 90% confidence interval. The sign of the predictive association between $x_t^{MVE}$
and future economic outcomes is consistent with the ICAPM in every regression, though the result is
13 Our analysis is similar to that of Maio and Santa-Clara (2012) and Liu and Matthies (2022), who analyze state
variables constructed by other means.
not statistically significant in all cases. The first panel implies that a one-standard-deviation increase
in $x_t^{MVE}$ is associated with a 1.2-percentage-point increase in market value over the subsequent 24
months (the estimate is borderline insignificant at most horizons). The most statistically significant
effects correspond to longer-term predictions of unemployment, credit spreads, and 10-year interest
rates, and to a lesser extent of GDP growth, payroll growth, and inflation. The positive association
with future consumption growth (although insignificant, as shown in the second panel) supports the
view that the news-based pricing kernel is compatible with long-run consumption risk as in Bansal and
Yaron (2004) and Liu and Matthies (2022).14 In the fourth and fifth panels, the positive prediction
for nonfarm payroll and negative prediction for unemployment rate suggest that our state variables
track (in part) the human capital component of investor wealth from anticipated labor income.15
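Predictive regression (9) and the Newey-West standard errors used in Figure 6 can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function name is hypothetical, the state variable is standardized with its full-sample standard deviation, and an intercept is included by default even though equation (9) as printed shows none.

```python
import numpy as np

def predictive_regression(psi, x, h, add_const=True):
    """Regress sum_{s=1..h} psi_{t+s} on x_t / std(x); return (b_h, NW s.e.).

    psi: length-T one-month changes in a macroeconomic variable.
    x:   length-T state variable (same dates as psi).
    h:   forecast horizon in months, also used as the Newey-West lag count.
    """
    psi = np.asarray(psi, dtype=float)
    x = np.asarray(x, dtype=float)
    T = len(x)

    # Cumulative future change: y_t = psi_{t+1} + ... + psi_{t+h}.
    csum = np.concatenate([[0.0], np.cumsum(psi)])
    y = csum[h + 1:T + 1] - csum[1:T - h + 1]

    xs = x[:T - h] / x.std(ddof=1)
    X = np.column_stack([np.ones(T - h), xs]) if add_const else xs[:, None]

    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ coef

    # Newey-West long-run covariance with Bartlett weights and h lags.
    g = X * u[:, None]
    S = g.T @ g
    for j in range(1, h + 1):
        w = 1.0 - j / (h + 1.0)
        G = g[j:].T @ g[:-j]
        S += w * (G + G.T)

    XtX_inv = np.linalg.inv(X.T @ X)
    V = XtX_inv @ S @ XtX_inv
    return coef[-1], np.sqrt(V[-1, -1])
```

A 90% band of the kind drawn in Figure 6 would then be roughly `b_h ± 1.645 * se` at each horizon, under the usual asymptotic normal approximation.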
That $x_t^{MVE}$ positively predicts housing development and negatively predicts credit spreads suggests
that narrative states track other nonequity components of investor wealth as well. Additionally, we
employ the VAR framework to inspect the prediction properties of narrative state variables above
and beyond other standard macroeconomic variables. We estimate VAR(3) systems of five variables:
the narrative attention levels ($\theta_t^{MVE}$), the S&P 500 index, the Federal Funds rate, employment,
and industrial production. We experiment with two VAR specifications by switching the order
of $\theta_t^{MVE}$ and the S&P 500 index. They correspond to different specifications of the unidentified
structural shocks. The impulse response functions show that a $\theta_t^{MVE}$ shock has a positive and significant
impact on market values over the next year. The effect persists beyond 2 years, though it becomes
statistically insignificant at this horizon. This is true even with the conservative specification where
the contemporaneous effect of $\theta_t^{MVE}$ on the S&P 500 index is forced to zero. Internet Appendix C.8
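The VAR exercise can be illustrated with a small sketch. The text says only that the two variable orderings correspond to different specifications of the structural shocks; the recursive (Cholesky) identification used below is an assumption, as are the function names `fit_var` and `orth_irf`.

```python
import numpy as np

def fit_var(Y, p):
    """OLS estimate of a VAR(p) with intercept.

    Y: T x n array of observations (columns ordered as in the VAR).
    Returns (A, Sigma): A is a list of p n x n lag matrices such that
    y_t = c + A[0] y_{t-1} + ... + A[p-1] y_{t-p} + u_t, and Sigma is
    the residual covariance matrix.
    """
    Y = np.asarray(Y, dtype=float)
    T, n = Y.shape
    lags = [Y[p - l:T - l] for l in range(1, p + 1)]
    X = np.column_stack([np.ones(T - p)] + lags)
    B = np.linalg.lstsq(X, Y[p:], rcond=None)[0]     # (1 + n*p) x n
    resid = Y[p:] - X @ B
    Sigma = resid.T @ resid / (T - p - (1 + n * p))
    A = [B[1 + (l - 1) * n:1 + l * n].T for l in range(1, p + 1)]
    return A, Sigma

def orth_irf(A, Sigma, horizons):
    """Orthogonalized impulse responses under a Cholesky ordering.

    Returns an array irf[h, i, j]: response of variable i at horizon h
    to a one-standard-deviation structural shock j.
    """
    n = Sigma.shape[0]
    p = len(A)
    P = np.linalg.cholesky(Sigma)                    # lower triangular
    Psi = [np.eye(n)]
    for hzn in range(1, horizons + 1):
        Psi.append(sum(A[l] @ Psi[hzn - 1 - l] for l in range(min(p, hzn))))
    return np.array([M @ P for M in Psi])
```

Under this identification, a variable ordered earlier does not respond on impact to shocks of variables ordered later. Placing the S&P 500 index before the narrative attention series therefore sets the contemporaneous response of the index to an attention shock to zero, which is one way to implement the conservative specification mentioned above.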
Our estimated factor pricing model explicitly links news narratives to state variables, allowing us to
trace factor behavior back to the narrative attention data. Furthermore, the estimated LDA model
provides a link between narrative attention and the underlying primitive text data. Taken together,
14 We report contemporaneous correlations of narrative state variables with macroeconomic variables in Table C.1.
It shows that $x^{MVE}$ is positively correlated with contemporaneous consumption growth, supporting the general rationale of
consumption CAPM models.
15 The literature that focuses on labor income risks in asset pricing includes Jagannathan and Wang (1996), Julliard
(2007), and Liu (2021).
the estimated factor model and LDA model provide a vehicle for interpreting asset pricing risk factors
Our interpretation method quantifies the impact of news on state variables. Imagine a hypothetical
state of the world s, where a new piece of text is reflected as a narrative attention shock, z(s). The
news shock can be at the level of daily narrative attention, a single news article, or even an individual
term count. Equation (1) maps narrative attention shocks to state variables according to
where the L × K matrix $I_{z\to x} := A(A^\top A)^{-1}$ summarizes how L narratives affect the K states. The
latent factor structure of our model (because of the concomitant rotational indeterminacy) makes
the identity of x(s) ambiguous. However, the mapping from z(s) to the MVE combination of
state variables
$$ x^{MVE} := b^{MVE} x, \quad \text{with } b^{MVE} := \mu_f^\top \Sigma_{ff}^{-1}, $$
is invariant and thus unambiguous.16 The MVE state variable is special in that it represents the
univariate pricing kernel, which according to the theory is the sole source of cross-sectional variation
where the L × 1 “impact vector” Iz→MVE summarizes the impact of each narrative on the pricing
kernel. We also trace the MVE impulse one step further from the narrative level to the term
level. Imagine now that in hypothetical state s, a new piece of text is reflected as a change in term
frequencies, ∆w(s) (a V × 1 vector with V the vocabulary size). The LDA model maps V-dimensional
term frequencies into L-dimensional narrative attention according to $z(s) = (\Phi^\top \Phi)^{-1} \Phi^\top \Delta w(s)$. The
induced narrative-level innovations then impact the MVE according to (11). Combined, the mapping
16 $x^{MVE}$ is defined such that its mimicking portfolio is the tradable MVE combination of systematic factors ($f^{MVE} := b^{MVE} f$).
The rotational indeterminacy refers to the fact that the model is invariant if β, f, x are changed to $\beta R^{-1}$, Rf, Rx, for
some K × K invertible matrix R. However, $f^{MVE}$ and $x^{MVE}$ are invariant to the rotation matrix R.
from term shocks to state variable response is
$$ x^{MVE}(s) = \underbrace{b^{MVE} (A^\top A)^{-1} A^\top (\Phi^\top \Phi)^{-1} \Phi^\top}_{:=\, I_{w\to MVE}^\top}\, \Delta w(s). \qquad (12) $$
Notice that the term-level impact vector Iw→MVE is a linear combination of the L columns of Φ. Hence
the term-level effects (Iw→MVE ) describe a composite narrative that combines the basis narratives
(ϕl ’s) according to their impacts on xMVE . Unlike Φ, which consists of only nonnegative entries,
Iw→MVE has both positive and negative entries because the pricing kernel xMVE can be affected in
both directions. Besides the MVE combination of latent factors, we can also interpret the observed
market factor, Mktt , given that a combination of the latent factors almost perfectly spans Mktt : the
R2 of projection Mktt = bMkt ft + errorst is 97.6%. Replacing the combination weights bMVE by bMkt
in (11) and (12), respectively, we similarly define impact vectors, Iz→Mkt and Iw→Mkt , that trace the
impact on the market factor originating from narrative-level and term-level shocks.17
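The impact vectors defined above can be computed with a few lines of linear algebra. The sketch below is illustrative only: the function name `impact_vectors` is hypothetical, and the inputs A (L × K), Φ (V × L), and the combination weights b (length K, either $b^{MVE}$ or $b^{Mkt}$) are taken as given.

```python
import numpy as np

def impact_vectors(A, Phi, b):
    """Narrative-level and term-level impact vectors for weights b.

    A:   L x K matrix linking narratives to states.
    Phi: V x L matrix whose columns are the narratives' term distributions.
    b:   length-K combination weights (b_MVE or b_Mkt).
    Returns (I_zx, I_z, I_w) with shapes L x K, L, and V.
    """
    A = np.asarray(A, dtype=float)
    Phi = np.asarray(Phi, dtype=float)
    b = np.asarray(b, dtype=float)
    I_zx = A @ np.linalg.inv(A.T @ A)               # A (A'A)^{-1}
    I_z = I_zx @ b                                  # impact of each narrative
    I_w = Phi @ np.linalg.solve(Phi.T @ Phi, I_z)   # Phi (Phi'Phi)^{-1} I_z
    return I_zx, I_z, I_w
```

Because `I_w` is `Phi` times an L-vector, it is a linear combination of the L columns of Φ, as noted in the text. One can also check numerically that `I_z` is unchanged when A is replaced by A R⁻¹ and b by b R⁻¹ for an invertible R; this rotation rule for A is an assumption that presumes Equation (1) is linear in the states.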
We report impact vectors from full-sample model estimates for the K = 3 specification. Figure 7,
panel 1, plots the impact of each selected narrative on the pricing kernel (Iz→MVE ). The “Recession”
narrative (whose keywords include “economic downturn,” “weak economy,” and “economic slump”)
has the most negative impact on the pricing kernel. Thus, the model infers that an increase in
“Recession” narrative attention associates with an increase in the marginal utility of consumption.
It also implies that assets whose returns positively correlate with “Recession” attention provide
hedging benefits and hence carry high valuations (low risk premiums). The narratives “Record High”
(whose keywords include “highest level,” “pent-up demand,” and “remain strong”) and “Optimism”
(whose keywords include “remain optimistic,” “express confidence,” and “positive sign”) have the
largest positive impact on xMVE . The next three panels in Figure 7 report the impacts on the three
individual risk factors comprising xt (the three columns of Iz→x ). While the three impacts are not
rotationally identified, they show that some narratives like “Trading Activity” (keywords including
“market action” and “volume total”) and “Problems” (keywords including “major problem” and
17 We run a full-sample projection of Mkt_t onto the estimated f_t to construct $b^{Mkt}$ and use full-sample estimates $\hat\mu_f$, $\hat\Sigma_{ff}$
for $b^{MVE}$. Besides Mkt_t, one can follow the projection method to build impact vectors for other series as well.
Figure 7: Risk factor interpretation at the narrative level
[Figure omitted. Horizontal bar charts in five panels, one bar per selected narrative: Recession, Record high, Trading activity, Problems, Earnings forecasts, Bear/bull market, Earnings, Optimism, Treasury bonds, Profits, International exchanges, Credit cards, Venture capital, Trade agreements, Fast food. Legend: Pos., Neg.]
This figure plots the narrative-level impact vectors. The length of the bars represents the absolute
values of corresponding entries, with red for negative impact and blue for positive.
“debacle”) can have large yet opposing impacts on individual factors that net out to a small impact
on the MVE. In the last panel in Figure 7, we see the market factor’s impact vector has signs similar
to those of the MVE for “Recession,” “Record high,” and “Optimism.” The differences
are in the additional loadings on the narratives that describe financial market activeness, such as
“Trading activity,” “Bear/bull market,” and “Problems.” As these narratives are orthogonal to
xMVE , they add risk but contribute little expected return, dragging Mktt away from multivariate
mean-variance efficiency. Moving from narratives to individual terms, Figure 8 reports the model’s
term-level impact vector, Iw→MVE . The dimension of this vector is over 18,000, so we display the
terms’ impact on xMVE in term clouds, where the size of a term in the cloud is proportional
to the magnitude of its impact. Terms with negative and positive impacts are reported in
separate clouds. The term clouds mostly reflect keywords from the “Recession” and the “Record high”
narratives. Although the construction is without human input on the terms’ semantic meanings,
the term clouds show two distinct extremes in the language used to describe the current market
condition and the investment outlook. It is understandable that these terms could be related to
Figure 8: MVE state variable ($x^{MVE}$) interpretation at the term level
[Figure omitted. Two term clouds: Negative and Positive.]
The figure illustrates each term’s impact on the MVE state variable (Iw→MVE ). A term’s font size
corresponds to the absolute magnitude of its impact. The construction method is detailed in Section
6.1.
concurrent adjustments to consumption and hence the pricing kernel. Table 4 describes the detailed
contents of a few example narratives. It reports headlines of articles with the highest attention to
“Recession,” “Record High,” and “Trading Activity” narratives. LDA identifies these articles by their
articles cover various aspects of the economic outlook, ranging from durable goods consumption to
labor market activities. And, while the keyword list for “Record High” in the Internet Appendix
might appear to indicate a focus on asset price movements, we see that its articles are in fact about
fluctuations in the real economy. Both of these examples are consistent with the ICAPM mechanism
linking the pricing kernel to real investment opportunities, as the “Recession” and “Record High”
narratives have the largest negative and positive impacts on xMVE , respectively. As a contrasting
example, articles closest to the “Trading Activity” narrative mostly provide an ex post account of
the prior day’s asset returns, and interestingly this narrative has essentially zero net impact on the
pricing kernel.
Table 4: Example news articles of the three most prominent narratives
Recession
1993-05-07 Auto Registrations Continued to Slump In Europe Last Month
2001-04-25 Consumer Confidence Slides on Fears of Layoffs
2009-02-19 U.S. News: Housing Starts Hit Lowest Level In Half-Century
2011-08-02 World News: Manufacturing Slowdown Adds to Gloom on Economy
2016-07-08 World News: U.K. Consumer Sentiment Takes Dive
Record high
1989-07-05 Japan Vehicle Sales Rise
1994-07-01 Purchasing Managers In U.K. Survey Report Rise for June Orders
1995-02-27 Hiring Outlook For Second Quarter Appears Vigorous
2006-01-12 Wall Street Bonuses Hit a Record in 2005
2016-07-20 U.S. News: Home Building Continues Recovery as Demand Rises
Trading activity
1993-12-30 Industrials Rise A Bit to Record; Bonds Decline
1994-10-20 Profit News Helps Boost Stock Prices — Indexes Gain Ground Despite Weakness
Of Bonds and Dollar
1996-06-21 Nasdaq Sinks Amid Sell-Off Of Tech Stocks
1997-12-09 Blue Chips Fall As Dollar’s Rise Causes Concern
1998-04-21 Drug Stocks Resume Gains; Blue Chips Fall
This table reports example articles with the highest attention to the three narratives, respectively
(titles only; see text bodies in Internet Appendix D.1). The shade of the yellow highlighting on each
term reflects ϕv,l , the term’s probability for the corresponding narrative.
We can use the impact vectors from the model to interpret daily fluctuations in the market factor.
The model-implied impact on the market factor of an article m with narrative attention innovation
z(m) is $I_{z\to Mkt}^\top z(m)$.18 Figure 9 plots the observed market return series adjusted by the conditional
volatility. We focus on the days with extreme values in order to understand the types of news
associated with large swings in the market factor. For each of those days, we retrieve the article
whose model-implied impact on the market, $I_{z\to Mkt}^\top z(m)$, is the greatest.19 The retrieved articles
in Figure 9 provide a human-readable account of influential events that triggered investor concerns
and moved the market factor. For instance, the articles capture the Black Monday in October 1987,
the onset of the global financial crisis in 2007, and the worries that the looming Brexit would slow
18 We calculate z(m) similarly to $z_\tau$: $z(m) := \theta_m - \frac{1}{5}\sum_{\iota=1}^{5}\theta_{\tau_m-\iota}$, where θm ∈ ∆V is article m’s topic attention levels
calculated from LDA; the summation is over the attention levels in the 5 days before the date of article m.
19 The method can pick more than one influential article to provide a more thorough narrative account of the day.
In principle, this approach to narrative retrieval could be applied in real time to “translate” from textual
news to quantitative price changes.
Figure 9: Event retrieval for market returns
[Figure omitted. Vertical axis: Mkt return standardized by conditional volatility. Recoverable event tags:
Sep 12, 1986: Free Fall: Interest-Rate Worries And Program Trading Send Stocks Plunging --- Automated Selling Generates Biggest One-Day Decline As Volume Sets a Record --- A Fluke or a Possible Omen?
Nov 18, 1991: The Outlook: Double-Dip Recession Possible Not Likely
Feb 17, 1993: Stocks Slump as Clinton's Plan Sparks Fears
Oct 28, 1997: Drug Makers High-Tech Stocks Head Roster of Day's Big Losers
Jun 27, 2016: EU Tumult Ripples Through Markets --- Europe's battered lenders face new risks from investors and an uncertain economy]
Blue curve: Daily market returns adjusted by conditional volatility, $f_\tau^{Mkt} / \sigma_\tau^{Mkt}$, where $\sigma_\tau^{Mkt}$ is EWMA
volatility. We tag large spikes with retrieved news events. The article titles are reported in this
figure, and the text bodies are in Internet Appendix D.2.
Europe’s recovery from the debt crisis in 2016. The news articles connect market fluctuations with
investors’ forward-looking concerns about economic fundamentals. For example, the event in 1986
is linked to worries about interest rates. Fiscal and labor market policy concerns during the Clinton
administration are retrieved in 1993 and 1994. To show more detail, Internet Appendix D.2 reports
excerpts of the retrieved articles and displays how the machine “reads” quantitative contents from
the textual articles. We color highlight each term according to its impact on the market return
(Iw→Mkt ), with red for negative and blue for positive impacts. Terms “weak outlook,” “global
economic slowdown,” “government debt worries,” “EU Tumult Ripples,” etc., are picked up by the
model and assigned negative impacts on the market factor. These terms belong to the “Recession”
and “Problems” narratives, which are the leading negative drivers of the market as reported in the
last panel of Figure 7. Aggregating these terms in an article gives rise to the article’s overall impact
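The event-retrieval step can be sketched as follows. This is a hedged illustration, not the authors' procedure: the function names are hypothetical, the EWMA decay of 0.94 and the initialization at the sample variance are assumed values (the text does not state them), and "greatest" impact is read here as the largest impact in the direction of the day's market return.

```python
import numpy as np

def ewma_volatility(r, decay=0.94):
    """One-step-ahead EWMA volatility of a daily return series.

    Assumptions: decay=0.94 and initialization at the sample variance.
    """
    r = np.asarray(r, dtype=float)
    var = np.empty_like(r)
    var[0] = r.var()
    for t in range(1, len(r)):
        var[t] = decay * var[t - 1] + (1.0 - decay) * r[t - 1] ** 2
    return np.sqrt(var)

def most_influential_article(Z_day, impact, day_return):
    """Pick the article with the largest model-implied market impact.

    Z_day:      M x L array of attention innovations z(m) for the day's articles.
    impact:     length-L narrative-level impact vector on the market factor.
    day_return: the day's market return, used only for its sign.
    Returns (index of the article, its model-implied impact).
    """
    scores = np.asarray(Z_day, dtype=float) @ np.asarray(impact, dtype=float)
    # Assumption: align the ranking with the direction of the market move.
    aligned = scores * np.sign(day_return)
    m = int(np.argmax(aligned))
    return m, float(scores[m])
```

Dividing the daily return series by `ewma_volatility(r)` gives a standardized series of the kind plotted in Figure 9, and days with extreme standardized values would then be passed to `most_influential_article`.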
7 Conclusion
Investors’ macroeconomic risk assessments are central to theoretical asset pricing but are empiri-
cally difficult to measure. We propose a novel method for estimating ICAPM state variables and
asset pricing factors from business news narratives in The Wall Street Journal. Our method fuses
natural language processing models with empirical asset pricing models. First, we use an LDA
topic model to distill a comparatively low-dimensional set of 180 narratives from the much higher-
dimensional raw term counts. Next, we use a form of IPCA that generalizes the Fama-MacBeth
two-step procedure to deal with the situation in which state variables are not directly observed and
must be reduced and selected from many potential proxies (in the form of narrative attention in
our application). We devise a Sparse IPCA implementation that uses a group lasso penalty to filter
out narratives that are irrelevant for describing the behavior of asset prices, and we find that this
regularization approach significantly improves the model’s out-of-sample asset pricing performance.
Quantitatively, the narrative factors explain returns on anomaly portfolios with smaller pricing errors
than benchmark characteristic-based factor models. The narrative factors are portfolios formed with
only textual news data, a source of conditioning information that is completely different from the
vast literature that uses firm characteristics. The MVE combination of narrative factors achieves
higher out-of-sample Sharpe ratios than Fama-French and momentum factors. The narrative factors
are empirically consistent with the ICAPM mechanism that state variables are priced sources of risk
because they forecast future investment opportunities and components of nonmarketable wealth.
Integrating news text in an asset pricing model affords concrete interpretations of the estimated risk
factors. Among the narratives studied, we find that attention to the “Recession” narrative has the
largest negative impact on the pricing kernel. We show that the underlying raw text associated with
the “Recession” narrative covers a variety of events related to real macroeconomic activity. We view
news text as a promising source of information for quantitative models of asset prices. Concepts like
public information and investor expectations are central to both rational and behavioral models but
are difficult to proxy for with traditional data. This paper demonstrates a set of tools for extracting
signals from news text that can quantitatively match asset pricing phenomena, while aligning with
References
Baker, S. R., N. Bloom, and S. J. Davis. 2016. Measuring economic policy uncertainty. Quarterly Journal
of Economics 131:1593–636.
Bali, T. G., and R. Engle. 2010. The intertemporal capital asset pricing model with dynamic
conditional correlations. Journal of Monetary Economics 57:377–90.
Bansal, R., and A. Yaron. 2004. Risks for the long run: A potential resolution of asset pricing
puzzles. Journal of Finance 59:1481–509.
Blei, D. M., A. Y. Ng, and M. I. Jordan. 2003. Latent dirichlet allocation. Journal of Machine
Learning Research 3:993–1022.
Bryzgalova, S. 2015. Spurious factors in linear asset pricing models. Working Paper, Stanford.
Bybee, L., B. T. Kelly, A. Manela, and D. Xiu. 2021. Business news and business cycles. Working
Paper, Yale University.
Chen, N.-F., R. Roll, and S. Ross. 1986. Economic forces and the stock market. Journal of Business
59:383–403.
———. 2009. Asset pricing, revised edition. Princeton, NJ: Princeton University Press.
Cong, L. W., T. Liang, and X. Zhang. 2019. Textual factors: A scalable, interpretable, and
data-driven approach to analyzing unstructured information. Working Paper, Cornell University.
Engle, R. F., S. Giglio, B. Kelly, H. Lee, and J. Stroebel. 2020. Hedging climate change news. Review
of Financial Studies 33:1184–216.
Fama, E. F., and K. R. French. 1996. Multifactor explanations of asset pricing anomalies. Journal
of Finance 51:55–84.
———. 2016. Dissecting anomalies with a five-factor model. Review of Financial Studies 29:69–103.
Fama, E. F., and J. D. MacBeth. 1973. Risk, return, and equilibrium: Empirical tests. Journal of
Political Economy 81:607–36.
Gentzkow, M., B. Kelly, and M. Taddy. 2019. Text as data. Journal of Economic Literature 57:535–
74.
Gentzkow, M., and J. M. Shapiro. 2010. What drives media slant? Evidence from US daily newspa-
pers. Econometrica 78:35–71.
Giglio, S., and D. Xiu. 2021. Asset pricing with omitted factors. Journal of Political Economy
129:1947–90.
Gu, S., B. Kelly, and D. Xiu. 2020. Empirical asset pricing via machine learning. Review of Financial
Studies 33:2223–73.
Hou, K., C. Xue, and L. Zhang. 2015. Digesting anomalies: An investment approach. Review of
Financial Studies 28:650–705.
Jagannathan, R., and Z. Wang. 1996. The conditional CAPM and the cross-section of expected
returns. Journal of Finance 51:3–53.
Jeon, Y., T. H. McCurdy, and X. Zhao. 2021. News as sources of jumps in stock returns: Evidence
from 21 million news articles for 9000 companies. Journal of Financial Economics 145:1–17.
Julliard, C. 2007. Labor income risk and asset returns. Working Paper, London School of Economics.
Ke, Z. T., B. T. Kelly, and D. Xiu. 2020. Predicting returns with text data. Working Paper, Harvard
University.
Kelly, B., A. Manela, and A. Moreira. 2021. Text selection. Journal of Business & Economic
Statistics 39:859–79.
Kelly, B. T., S. Pruitt, and Y. Su. 2017. Instrumented principal component analysis. Working Paper,
Yale SOM.
———. 2019. Characteristics are covariances: A unified model of risk and return. Journal of
Financial Economics 134:501–24.
Lettau, M., and M. Pelger. 2020. Factors that fit the time series and cross-section of stock returns.
Review of Financial Studies 33:2274–325.
Liu, Y., and B. Matthies. 2022. Long run risk: Is it there? Journal of Finance 77:1587–633.
Lopez-Lira, A. 2020. Risk factors that matter: Textual analysis of risk disclosures for the cross-section
of returns. Working Paper, University of Florida.
Loughran, T., and B. McDonald. 2016. Textual analysis in accounting and finance: A survey. Journal
of Accounting Research 54:1187–230.
Maio, P., and P. Santa-Clara. 2012. Multifactor models and their consistency with the ICAPM.
Journal of Financial Economics 106:586–613.
Manela, A., and A. Moreira. 2017. News implied volatility and disaster concerns. Journal of Financial
Economics 123:137–62.
McCracken, M. W., and S. Ng. 2016. FRED-MD: A monthly database for macroeconomic research.
Journal of Business & Economic Statistics 34:574–89.
Mullainathan, S., and A. Shleifer. 2005. The market for news. American Economic Review 95:1031–
53.
Pelger, M., and R. Xiong. 2021. Interpretable sparse proximate factors for large dimensions. Journal
of Business & Economic Statistics 40:1642–64.
Roll, R. 1977. A critique of the asset pricing theory’s tests Part I: On past and potential testability
of the theory. Journal of Financial Economics 4:129–76.
Rossi, A. G., and A. Timmermann. 2015. Modeling covariance risk in Merton’s ICAPM. Review of
Financial Studies 28:1428–61.
Steyvers, M., and T. Griffiths. 2007. Probabilistic topic models. Handbook of Latent Semantic
Analysis 427:424–40.
Stock, J. H., and M. W. Watson. 2001. Vector autoregressions. Journal of Economic Perspectives
15:101–15.
Tetlock, P. C. 2007. Giving content to investor sentiment: The role of media in the stock market.
Journal of Finance 62:1139–68.
———. 2011. All the news that’s fit to reprint: Do investors react to stale information? Review of
Financial Studies 24:1481–512.
Tibshirani, R. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical
Society: Series B (Methodological) 58:267–88.
Yang, Y., and H. Zou. 2015. A fast unified algorithm for solving group-lasso penalize learning
problems. Statistics and Computing 25:1129–41.
Yuan, M., and Y. Lin. 2006. Model selection and estimation in regression with grouped variables.
Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68:49–67.
Zou, H., T. Hastie, and R. Tibshirani. 2012. Sparse principal component analysis. Journal of
Computational and Graphical Statistics 15:265–86.
Internet Appendix
(a) Replace trailing “sses” with “ss”
(b) Replace trailing “ies” with “y”
(c) Remove trailing “s”
(d) Remove trailing “ly”
(e) Remove trailing “ed.” Replace remaining trailing “ed” with “e”
(f) Replace trailing “ing” with “e.” For remaining trailing “ing” that follow a pair of identical
consonants, remove “ing” and one consonant. Remove remaining trailing “ing.”
(g) Remove words with fewer than three letters.
12. From the resultant unigrams, generate the set of bigrams as all pairs of (ordered) adjacent
unigrams.
13. Exclude terms (both unigrams and bigrams) appearing in less than 0.1% of articles. The unique
set of terms is the corpus vocabulary. Each column of the DTM corresponds to an element of
the vocabulary.
14. Convert an article’s word list into a vector of counts for each term in the vocabulary. This
vector is the row of the DTM corresponding to the article.
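Steps 12 to 14 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure's actual code: the function names `terms_of` and `build_dtm` are hypothetical, each article is assumed to arrive as an already-cleaned list of unigrams, and a bigram is written as two unigrams joined by a space.

```python
from collections import Counter


def terms_of(unigrams):
    """Return the unigrams followed by all ordered adjacent pairs (bigrams)."""
    bigrams = [f"{a} {b}" for a, b in zip(unigrams, unigrams[1:])]
    return list(unigrams) + bigrams


def build_dtm(articles, min_share=0.001):
    """Build a vocabulary and document-term matrix (hypothetical helper).

    articles: list of cleaned unigram lists, one per article.
    min_share: terms appearing in a smaller share of articles are excluded
               (0.001 corresponds to the 0.1% threshold in step 13).
    """
    term_lists = [terms_of(a) for a in articles]
    # Document frequency: number of articles in which each term appears.
    doc_freq = Counter(t for terms in term_lists for t in set(terms))
    n_articles = len(articles)
    vocab = sorted(t for t, df in doc_freq.items() if df / n_articles >= min_share)
    column = {t: j for j, t in enumerate(vocab)}
    dtm = []
    for terms in term_lists:
        row = [0] * len(vocab)          # one count per vocabulary term
        for t in terms:
            j = column.get(t)
            if j is not None:
                row[j] += 1
        dtm.append(row)
    return vocab, dtm


# Toy corpus with an artificially high threshold so that filtering is visible.
toy = [["fed", "raise", "rate"], ["fed", "cut", "rate"], ["stock", "rally"]]
vocab, dtm = build_dtm(toy, min_share=0.5)
```

In the toy corpus only "fed" and "rate" appear in at least half of the articles, so the vocabulary has two columns and the third article maps to a row of zeros.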
We estimate θ and ϕ via the Gibbs sampling procedure proposed by Steyvers and Griffiths (2007).
This procedure uses an equivalent form to the DGP given in Equation (2), while introducing an
intermediate parameter ym,i corresponding to the topic assignment for each word.
where ωm,i is the observed word assignment of the i’th word in article m. The Gibbs sampler
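$$
\theta_{m,l} = \frac{\sum_{i=1}^{\mathrm{Len}_m} I(y_{m,i} = l)}{\mathrm{Len}_m}, \qquad
\phi_{v,l} = \frac{\sum_{m=1}^{M}\sum_{i=1}^{\mathrm{Len}_m} I(\omega_{m,i} = v)\, I(y_{m,i} = l)}{\sum_{v=1}^{V}\sum_{m=1}^{M}\sum_{i=1}^{\mathrm{Len}_m} I(\omega_{m,i} = v)\, I(y_{m,i} = l)}. \tag{14}
$$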
While the topic model is estimated using article-level data, we aggregate news attention at the
$$
\theta_{\tau,l} = \frac{\sum_{m \in M_\tau}\sum_{i=1}^{\mathrm{Len}_m} I(y_{m,i} = l)}{\sum_{m \in M_\tau} \mathrm{Len}_m} \tag{15}
$$
where Mτ is the set of articles published on the next morning of calendar day τ .
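The counting formulas in Equations (14) and (15) can be illustrated with a short NumPy sketch. It assumes the Gibbs sampler has already produced one topic assignment per word. The function names `topic_shares` and `daily_attention`, the integer encoding of terms and topics, and the toy inputs are all hypothetical.

```python
import numpy as np


def topic_shares(words, topics, V, L):
    """Equation (14): article-level theta and term-level phi from assignments.

    words[m], topics[m]: integer arrays of equal length for article m, holding
    term ids in 0..V-1 and sampled topic assignments in 0..L-1.
    Assumes every topic has at least one assigned word (otherwise phi divides by 0).
    """
    M = len(words)
    theta = np.zeros((M, L))
    counts = np.zeros((V, L))                       # term-by-topic counts
    for m in range(M):
        theta[m] = np.bincount(topics[m], minlength=L) / len(topics[m])
        np.add.at(counts, (words[m], topics[m]), 1)
    phi = counts / counts.sum(axis=0, keepdims=True)
    return theta, phi


def daily_attention(topics, day_of_article, day, L):
    """Equation (15): pool the word assignments of all articles in M_tau.

    Assumes at least one article is mapped to `day`.
    """
    pooled = np.concatenate([y for y, d in zip(topics, day_of_article) if d == day])
    return np.bincount(pooled, minlength=L) / len(pooled)


# Toy example: two articles, three terms, two topics, both mapped to day 0.
words = [np.array([0, 1, 1]), np.array([2, 0])]
topics = [np.array([0, 0, 1]), np.array([1, 1])]
theta, phi = topic_shares(words, topics, V=3, L=2)
theta_day = daily_attention(topics, day_of_article=[0, 0], day=0, L=2)
```

Pooling words across articles, rather than averaging article-level shares, means that longer articles carry more weight in the daily attention measure, as the denominator of Equation (15) implies.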
LDA requires specifying the number of topics (L) when estimating the model. To make this decision,
we estimate a series of models with various L’s from 50 to 250 in increments of 10. We then choose
the L that maximizes the Bayes factor, the ratio of posterior probabilities for the specified model to
the null model. As an additional robustness check, we perform a ten-fold cross-validation, where we
iteratively refit the model on a subset of training data and then evaluate it on a held-out validation
sample. Figure A.1 reports the results for both methods and suggests that 170–180 is an optimal
range.
[Figure A.1. Series: Cross-Validation, Bayes-Factor. Horizontal axis: Number of Topics (L), 100 to 250.]
Figure A.2 plots θτ,l time series for the few relevant narratives. The words below each plot are those
with large ϕv,l . They help display the readable content of the narrative. The red curves visualize
the low-frequency variation using the HP filter. Visualization of all other narratives can be found on
structureofnews.com.
Figure A.2: Narrative Attention and Keywords
Note: Blue curve: narrative attention level (θl,t ) at the monthly frequency for selected narratives.
Red curve: Trend component of the blue curve extracted by the monthly HP-filter. Word list below:
terms (unigrams and bigrams) with high ϕv,l .
B Additional Details about the Estimation Method
In step 1, we use exponentially decaying weights to estimate the realized narrative covariances. The
kernel function is $\kappa(\tau; t) := \xi^{(t - t_\tau)} / \big(\sum_{\tau < \tau_t} \xi^{(t - t_\tau)}\big)$ for $\tau < \tau_t$ and 0 otherwise, where $\xi := 0.99$, $\tau_t$ is
the last day of month $t$, and $t_\tau$ is the month of day $\tau$. Next, we provide additional details for a few
terms in the target function (8), which is repeated here for ease of reference:
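$$
\min_{\Gamma, \{f_t\}} \; \frac{1}{2} \sum_{i,t \in S} \left(r_{i,t} - c_{i,t-1} \Gamma f_t\right)^2 + \lambda N_S \sum_{l=0}^{L} \sigma_l^c \, \|\Gamma_l\|_2 + \sum_{t \in S} \|f_t\|_2^2 ,
$$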
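To make the three terms of the target function concrete, the sketch below evaluates each of them on simulated data. It is only an illustration: the function name `objective_terms`, the list-of-months data layout, and the random inputs are assumptions. The demonstration at the end rescales Γ and the factors in opposite directions to show which terms respond.

```python
import numpy as np


def objective_terms(r, C, Gamma, F, lam, sigma):
    """Return (model fit, group lasso penalty, factor penalty).

    r[t]:  (N_t,) returns in month t
    C[t]:  (N_t, L+1) lagged instruments (first column is the constant)
    Gamma: (L+1, K) instrument loadings; row l is Gamma_l
    F:     (T, K) factor realizations; row t is f_t
    sigma: (L+1,) scaling constants sigma_l^c
    """
    n_obs = sum(len(r_t) for r_t in r)                       # N_S
    fit = 0.5 * sum(float(np.sum((r_t - C_t @ Gamma @ f_t) ** 2))
                    for r_t, C_t, f_t in zip(r, C, F))
    # Row-wise Euclidean norms give one penalty per instrument.
    group = lam * n_obs * float(np.sum(sigma * np.linalg.norm(Gamma, axis=1)))
    ridge = float(np.sum(np.asarray(F) ** 2))
    return fit, group, ridge


rng = np.random.default_rng(0)
T, N, L, K = 6, 20, 4, 2
C = [rng.standard_normal((N, L + 1)) for _ in range(T)]
r = [rng.standard_normal(N) for _ in range(T)]
Gamma = rng.standard_normal((L + 1, K))
F = rng.standard_normal((T, K))
sigma = np.ones(L + 1)

base = objective_terms(r, C, Gamma, F, lam=0.1, sigma=sigma)
# Shrink Gamma by 10 and expand every f_t by 10.
rescaled = objective_terms(r, C, Gamma / 10, F * 10, lam=0.1, sigma=sigma)
```

The fit term is unchanged by the rescaling, the group lasso term falls by a factor of 10, and the factor penalty rises by a factor of 100.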
We extend the linear instrumental mapping to allow for a constant term. To wrap it in the matrix
both {f } and Γ, the wrap-up step ⟨3⟩ cuts out Γ e from the estimated Γ, from which matrix A is backed
design Sparse IPCA with the row-wise group lasso regularization, $\|\Gamma_l\|_2$ (where $\|\cdot\|_2$ is the Euclidean
norm, a.k.a. L2 norm, such that $\|\Gamma_l\|_2 = \sqrt{\Gamma_{l,1}^2 + \Gamma_{l,2}^2 + \cdots + \Gamma_{l,K}^2}$), instead of the simpler
element-wise lasso that additively penalizes the absolute value of each element, $\sum_k |\Gamma_{l,k}|$. It is not meaningful
to distinguish the particular factor an instrument matters to, since the K individual factors are
rotationally unidentified anyway. Instead, we regularize the norm of Γl without distinguishing the
direction in which it deviates from $0_{1 \times K}$. Scaling constant $\sigma_l^c$ is the standard deviation of $\widehat{\mathrm{cov}}_{l,i,t-1}$
across $S$ (with $\sigma_0^c$ assigned as 1). The purpose of $\sigma_l^c$ is to place the strength of regularization of
each narrative on the same scale. We are effectively regularizing the panel-wise standard deviation
of ∥cl,i,t−1 Γl ∥2 , which is the part of βi,t variation contributed by instrument l. This adjustment with
σlc indeed follows the conventional procedure in lasso regressions, where regressors are standardized
in order to bring the coefficients to the same level such that the coefficients are subject to the
same strength of regularization. We do not want to simply standardize the regressors (in our case
cl,i,t−1 ), as it will change the magnitude of the coefficients (in our case Γ). Instead, the strength
of the penalty is adjusted by $\sigma_l^c$ directly, such that the magnitude (and the interpretation) of $\Gamma$ is
preserved. Constant $N_S$ is the number of $\{i, t\}$ observations in the estimation sample panel ($S$); it
rescales $\lambda$ such that the magnitude of the regularization term keeps up with the model fit term as
the sample size changes. The rescaling is necessary with the expanding-window OOS construction
(appearing in Section 4.2). Otherwise, the effective strength of the same $\lambda$ value would vary
across samples with different sizes. The third term, $\sum_{t \in S} \|f_t\|_2^2$, is added for a technical reason.
Notice the model fit term (the first term) is invariant if we shrink Γ and expand ft by the same
multiple. Therefore, without the third term’s restriction on ft , the minimization will return an
infinitesimal Γ that bypasses its penalties. The third term is to balance the shrinkage effect on Γ,
Minimization problem (8) is solved numerically with an Alternating Regularized Least Squares
(ARLS) algorithm. It alternates between minimizing over Γ, while holding {ft } fixed and mini-
mizing over {ft }, while holding Γ fixed. The ARLS algorithm can be seen as a special case of the
Block Coordinate Descent algorithm, with the two parameters as the two “blocks.” The process
is terminated when the joint target function’s descent is small (or when the first-order condition
is satisfied) within a numerical tolerance. This algorithm is similar to the unregularized IPCA’s
alternating least squares (ALS) method, except that the two subproblems become regularized least
squares. In particular, the Γ subproblem is a group lasso regression on the {i, t} panel:
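$$
\min_{\Gamma} \; \frac{1}{2} \sum_{\{i,t\} \in S} \left(r_{i,t} - \left(c_{i,t-1} \otimes f_t^\top\right) \mathrm{vect}(\Gamma)\right)^2 + \lambda N_S \sum_{l=0}^{L} \sigma_l^c \, \|\Gamma_l\|_2 ,
$$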
where $c_{i,t-1} \otimes f_t^\top$ constitutes the $(L+1)K$-variate regressor, and the regression coefficients are $\mathrm{vect}(\Gamma)$
(an $(L+1)K \times 1$ vector), the vectorization of $\Gamma$ that transposes and stacks up the rows of $\Gamma$. We solve the
group lasso regression numerically using Yang and Zou’s (2015) algorithm. The {ft } subproblem is
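$$
\min_{f_t} \; \frac{1}{2} \|r_t - C_{t-1} \Gamma f_t\|_2^2 + \|f_t\|_2^2 ,
$$
with the analytical solution
$$
f_t = \left(\Gamma^\top C_{t-1}^\top C_{t-1} \Gamma + 2 I_K\right)^{-1} \Gamma^\top C_{t-1}^\top r_t , \tag{16}
$$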
where rt (Nt × 1 vector) is the cross-section of ri,t at time t. Similarly, Ct−1 (Nt × (L + 1) matrix)
is ci,t−1 ’s stacked up. Notice, according to this formula, the sample estimates of ft are tradeable
portfolio returns constructed from individual assets. As long as Ct−1 and Γ are estimated with data
observable before the start of month t, the portfolio weights are ex ante available. Both
subproblems easily adapt to unbalanced panels, hence so does Sparse IPCA overall. As a result,
Sparse IPCA is applicable to data sets with a large panel and missing entries like stock returns.
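The closed-form factor update in Equation (16) can be checked numerically. The following is a minimal sketch on simulated inputs, not the estimation code itself: the function name `factor_step` and the random data are assumptions. It returns both the factor estimate and the implied portfolio weights, and then verifies the first-order condition of the {ft} subproblem.

```python
import numpy as np


def factor_step(r_t, C_lag, Gamma):
    """Solve min_f 0.5 * ||r_t - C Gamma f||^2 + ||f||^2 in closed form.

    r_t:   (N_t,) cross-section of returns in month t
    C_lag: (N_t, L+1) instruments observed before month t
    Gamma: (L+1, K) instrument loadings
    Returns the (K,) factor estimate and the (K, N_t) portfolio weight matrix.
    """
    B = C_lag @ Gamma                                   # (N_t, K) conditional betas
    K = Gamma.shape[1]
    # Weights depend only on C_lag and Gamma, so they are known before month t.
    W = np.linalg.solve(B.T @ B + 2.0 * np.eye(K), B.T)
    return W @ r_t, W


rng = np.random.default_rng(1)
N_t, L, K = 50, 5, 3
C_lag = rng.standard_normal((N_t, L + 1))
Gamma = rng.standard_normal((L + 1, K))
r_t = rng.standard_normal(N_t)

f_t, W = factor_step(r_t, C_lag, Gamma)

# First-order condition: -B'(r - B f) + 2 f should vanish at the solution.
B = C_lag @ Gamma
gradient = -B.T @ (r_t - B @ f_t) + 2.0 * f_t
```

Because each month is solved separately on whatever assets are present, an unbalanced panel can be handled by dropping the rows with missing returns before calling the function.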
C Additional Empirical Results
Figure C.1 visualizes the full-sample estimates of $f_t$ and $f_t^{\mathrm{MVE}}$ at K = 3. Table C.1 reports the
factors’ correlations with macroeconomic variables and canonical financial time series at the monthly
frequency.
[Figure C.1 panels: Factor 1, Factor 2, Factor 3, MVE]
Table C.1: Narrative Factor’s Correlations with Macroeconomic Variables and Financial Time Series
To further justify that Sparse IPCA is effective in distinguishing relevant and irrelevant instruments,
we conduct an experiment with simulated placebo instruments. In addition to the 180 observed
narratives (zτ ), we randomly generate an equal number of placebos to “confuse” the estimation.
In detail, we generate each placebo zl,τ as an i.i.d. normal sequence that matches the time-series
variance of a corresponding real zl,τ sequence. We examine whether the narratives that we know for
sure are irrelevant can be successfully filtered out. Figure C.2 reports the results of this experiment
at the benchmark of K = 3. The y-axis plots each instrument’s maxλl (the maximum λ at which
narrative l is still selected) in log scale. The black dots mark the real instruments’ maxλl in the
original estimation with only the real narratives (i.e., L = 180). The blue and red bars are for real
and placebo instruments, respectively, estimated under the specification with placebos (i.e., L = 360).
They are sorted according to maxλl from left to right. The gray horizontal line represents the tuned
λ∗S under the specification with placebos (L = 360), so that only the bars that are higher than the
gray line are eventually selected. The figure shows all such bars are blue (real). That means none
of the 180 placebos is selected under λ∗S . The vertical differences between the blue bars and black
dots represent the difference in selection before and after introducing placebos. We see vertical
differences are narrow, at least for the more relevant ones (the ones on the left with high maxλl ).
This implies that Sparse IPCA can effectively filter out irrelevant instruments, and the estimates are
We provide the results with the λ tuning based on the leave-one-out cross-validation (LOOCV)
Figure C.2: Placebo Test
Note: Vertical axis: maxλl , the maximum λ at which a narrative is still selected. Bars: maxλl of
180 real (in green) + 180 placebo (in red) narratives, sorted by maxλl . Black dots: maxλl of 180 real
narratives in the original specification without placebos. Gray horizontal line: the tuned λ∗S in the
specification with placebos (L = 360).
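The placebo construction described above can be sketched as follows. This is an illustrative reading rather than the exact procedure: the function name `make_placebos` is hypothetical, placebos are assumed to have mean zero, and "matching the variance" is interpreted as drawing from a normal distribution whose standard deviation equals the sample standard deviation of the corresponding real series.

```python
import numpy as np


def make_placebos(z, seed=0):
    """Generate one i.i.d. normal placebo series per real narrative series.

    z: (T, L) array of real narrative innovations, one column per narrative.
    Column l of the output is drawn with the sample standard deviation of z[:, l].
    """
    rng = np.random.default_rng(seed)
    scale = z.std(axis=0, ddof=1)                 # per-narrative sample std
    return rng.standard_normal(z.shape) * scale   # broadcast over columns


# Simulated stand-in for the real innovations: three series with different scales.
rng = np.random.default_rng(42)
z_real = rng.standard_normal((5000, 3)) * np.array([1.0, 2.0, 3.0])
z_placebo = make_placebos(z_real, seed=7)

# Stacking real and placebo columns doubles the number of instruments,
# which mirrors the move from L = 180 to L = 360 in the experiment.
z_all = np.hstack([z_real, z_placebo])
```

Because the placebos are independent of returns by construction, any placebo that survives selection signals a false positive of the estimator.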
Method: The key difference is three stages of sample partitioning rather than two as in the baseline
specification. Within each expanding in-sample S, we further partition it into training samples and
validation samples following the LOOCV method. In detail, for each month t ∈ S, form a “leave-one-out”
training sample S \ {t}. Estimate parameters (Γ, µ, Σ) in the training sample with different
λ values. Form ftMVE in month t that has been left out. Stitch these ftMVE together over all the t’s
in S to form the testing sequence of investment returns and evaluate the sequence’s Sharpe ratio as
Once $\lambda_S^{*\mathrm{LOOCV}}$ is tuned from S, the rest of the OOS formation is the same as before: estimate
OOS MVE portfolio returns. The reported Sharpe ratios are still OOS, the only difference is it is
on the LOOCV method. The process is largely the same as above except for also tuning K at the
place where λ is tuned. That is, for each in-sample S, we find the pair (λ, K) that yields the highest
cross-validated ftMVE Sharpe ratio. The ftMVE sequence is formed in the same fashion as above by
Results: Table C.2 adds the results with the LOOCV tuning to the results with standard tuning
resulting in greater numbers of selected instruments. However, Sharpe ratio performance does not
show any consistent improvement. The Sharpe ratio results are very close. We suggest the reason is
that the simpler main configuration (λ∗S ) is already good enough, as it does not suffer from in-sample overfitting.
We also find the joint λ, K tuning is able to achieve the ex post highest level of Sharpe ratio of 1.30,
suggesting that the tuning method is also able to make the discrete choice of K.
                                                        K
λ selection method               Statistics      1      2      3      4      5      6
$\lambda = \lambda_S^{*}$                   Sharpe ratio    0.48   1.00   1.10   1.26   1.32   1.31
(copy Table 2)                   # narratives    2.9    4.9    12.1   39.1   43.4   61.8
$\lambda = \lambda_S^{*\mathrm{LOOCV}}$     Sharpe ratio    0.48   0.95   1.11   1.21   1.20   1.30
                                 # narratives    2.9    7.6    19.3   44.1   49.1   76.8
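The leave-one-out evaluation of an MVE portfolio can be sketched as below. This is a deliberately simplified illustration: it takes a fixed panel of factor returns as given and re-estimates only the mean and covariance in each training sample, whereas the procedure described above also re-estimates Γ for each candidate λ. The function name `loocv_mve_sharpe` and the simulated returns are assumptions.

```python
import numpy as np


def loocv_mve_sharpe(F, periods_per_year=12):
    """Annualized Sharpe ratio of stitched leave-one-out MVE returns.

    F: (T, K) factor returns in the in-sample period S. For each month t,
    the mean and covariance are estimated on S minus {t}, MVE weights are
    formed as inv(Sigma) @ mu, and the weights are applied to month t.
    """
    T = F.shape[0]
    held_out = np.empty(T)
    for t in range(T):
        train = np.delete(F, t, axis=0)                 # drop month t
        mu = train.mean(axis=0)
        Sigma = np.atleast_2d(np.cov(train, rowvar=False))
        weights = np.linalg.solve(Sigma, mu)            # unnormalized MVE weights
        held_out[t] = F[t] @ weights
    return np.sqrt(periods_per_year) * held_out.mean() / held_out.std(ddof=1)


# Simulated monthly factor returns with a positive mean (illustrative only).
rng = np.random.default_rng(2)
F_sim = 0.03 + 0.05 * rng.standard_normal((240, 2))
sharpe = loocv_mve_sharpe(F_sim)
```

Under this sketch, tuning would amount to computing the statistic for the factor returns implied by each candidate λ (or each (λ, K) pair) and keeping the candidate with the highest value.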
Figure C.3 shows the pricing errors (α) of the 78 characteristic-sorted portfolios under the two factor
models, NF3 and FFC6. The anomaly portfolios are listed from top to bottom in the order of |α|
under the NF model. The figure shows that the signs and the magnitudes of the α’s across the test
assets tend to be correlated between FFC6 and NF3. The 1-month momentum portfolio remains the
strongest anomaly not explained by either model. It produces large negative α’s under both models.
This anomaly (also known as the short-term reversal strategy with the sign switched) is known for
being an illiquidity-driven phenomenon unrelated to risk exposure and supported by limits to arbi-
trage. This makes sense as we do not expect the ICAPM mechanism to explain fleeting mispricings
that are driven by short-term market frictions. A few other anomalies based on accounting ratios
are also not well explained by NF, for example, Depreciation/PP&E, Asset growth, and Cash holdings.
Betting-against-beta (BAB, denoted as “Beta” in Figure C.3) is among the more interesting
portfolios whose expected return is well explained by NF. Both the BAB factor and NF sort stocks
on realized risk exposures. While BAB measures exposure to the market factor, our model is more
circumspect in how it quantifies systematic risk exposure. We observe that other anomalies that
have a risk exposure flavor (beta squared, return volatility, and idiosyncratic return volatility) are
Table C.3 repeats the asset pricing tests reported in Tables 1 and 2 for factor models estimated with
different specifications. The specifications are different from the benchmark reported in the main
text only in terms of the instruments supplied to the estimation procedure. Panel A repeats the
benchmark specification. Panel B abandons the news narrative-based approach. It uses a set of 129
macroeconomic series from the FRED monthly data series as zt inputs in calculating ci,t . The FRED
series are processed following the transformations recommended by the data set documentation
(McCracken and Ng, 2016). Panel C changes the benchmark specification for narrative attention
innovation calculated against the 5-day moving average, $z_\tau := \theta_\tau - \frac{1}{5} \sum_{\iota=1}^{5} \theta_{\tau-\iota}$, to three different
Figure C.3: Model Implied α Comparison Between FFC6 and NF3
Table C.3: Asset Pricing Performance under Different Specifications
Specification Statistic 1 2 3 4 5 6
See the table explanation in Appendix C.5 and the interpretation of the results in Section 4.3.
We examine the characteristics of the stocks with high and low β’s under the narrative factor model.
across the stock-month panel. Table C.4 reports the regression results. The narrative β’s have a
strikingly high correspondence with traditional characteristics (the R² is around 40%), although
the two are from drastically different information sources, namely, textual news covariances rather
than “anomaly” characteristics computed from CRSP/Compustat. The signs of the MVE beta’s
coefficients conform to the existing knowledge on characteristics-based expected return patterns (aka
anomalies). In particular, we find a negative coefficient for Size (consistent with the size anomaly),
a negative coefficient for Beta (consistent with BAB), and a positive coefficient for Momentum
(consistent with momentum). The usual measure for value, Book-to-market, is insignificant,
while Dividend-to-price is positive and significant, consistent with the value premium. To understand
the economic magnitude of these coefficients, let us take the coefficient of MVE β on Size (−0.29)
as an example. The characteristics are cross-sectionally rank-standardized from −0.5 to 0.5 at each
t. That means, moving from the smallest to the biggest firm, the “Size” characteristic increases by 1,
so the exposure (β) to MVE should drop by 0.29. The implied cross-sectional difference of expected
return between the smallest and biggest stocks is 0.29 × 22.35%/12 = 0.54% per month (where
22.35% is the annual factor premium of $f_t^{\mathrm{MVE}}$). It matches the magnitude of the small size premium.
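The magnitude calculation can be reproduced in a few lines. The sketch below is illustrative: the function name `rank_standardize` and the toy market capitalizations are hypothetical, and the exact rank mapping (evenly spaced ranks rescaled to the interval from −0.5 to 0.5) is an assumption about how the standardization is implemented. The slope and the factor premium are the values quoted in the text.

```python
import numpy as np


def rank_standardize(x):
    """Map a cross-section to evenly spaced ranks on [-0.5, 0.5]."""
    ranks = np.argsort(np.argsort(x))        # 0 .. N-1, ties broken by position
    return ranks / (len(x) - 1) - 0.5


# Toy cross-section of market capitalizations (illustrative values only).
size = rank_standardize(np.array([3.0, 150.0, 12.0, 40.0]))

beta_slope = -0.29        # coefficient of MVE beta on Size quoted above
annual_premium = 22.35    # annual premium of the MVE factor, in percent

# Smallest-minus-biggest difference in expected return, percent per month.
characteristic_range = size.max() - size.min()
monthly_spread = -beta_slope * characteristic_range * annual_premium / 12
print(round(monthly_spread, 2))   # 0.54
```

Since the standardized characteristic spans exactly one unit, the regression slope can be read directly as the difference in exposure between the two extremes of the cross-section.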
Cash flow to debt -0.05 ** 0.10 *** -0.22 *** -0.04 ***
Cash productivity -0.03 -0.02 * 0.08 *** 0.03 ***
Cash flow to price ratio -0.08 *** -0.11 *** 0.30 *** 0.08 ***
Industry-adjusted cash flow to price ratio 0.07 *** 0.02 ** -0.09 *** -0.03 ***
Industry-adjusted change in asset turnover 0.00 0.03 *** -0.08 *** -0.02 ***
Change in shares outstanding -0.01 0.00 -0.00 -0.00
Industry-adjusted change in employees -0.05 *** -0.01 0.02 0.01
Change in inventory -0.04 *** 0.02 *** -0.05 *** -0.01 **
Change in 6-month momentum -0.02 ** -0.01 * 0.03 ** 0.01 **
Industry-adjusted change in profit margin 0.01 -0.01 * 0.02 0.00
Convertible debt indicator 0.05 ** 0.05 *** -0.12 *** -0.03 ***
Current ratio -0.07 * 0.07 *** -0.16 *** -0.03 **
Depreciation / PP&E 0.08 *** 0.08 *** -0.22 *** -0.06 ***
Dividend initiation 0.20 *** 0.10 *** -0.23 *** -0.06 ***
Dividend omission -0.00 -0.07 *** 0.17 *** 0.04 ***
Dollar trading volume 0.18 *** 0.03 -0.12 -0.05 **
Dividend to price -0.20 *** -0.16 *** 0.41 *** 0.10 ***
Earnings announcement return 0.01 0.01 *** -0.02 *** -0.00 **
Growth in common shareholder equity -0.01 -0.03 *** 0.07 *** 0.02 ***
Earnings to price 0.09 *** 0.01 -0.05 * -0.02 ***
Gross profitability -0.07 *** 0.00 -0.00 0.00
Growth in capital expenditures 0.04 *** -0.00 0.00 -0.00
Industry sales concentration -0.01 0.02 -0.02 0.00
Employee growth rate 0.05 *** -0.00 0.00 -0.00
Idiosyncratic return volatility 0.31 *** 0.08 *** -0.22 *** -0.06 ***
Illiquidity 0.04 -0.02 0.11 * 0.04 **
Industry momentum 0.00 -0.01 ** 0.03 ** 0.01 *
Capital expenditures and inventory 0.10 *** -0.02 ** 0.06 ** 0.01
Leverage -0.13 *** 0.02 -0.02 0.01
Growth in long-term debt -0.05 *** -0.01 0.02 0.01
Maximum daily return 0.04 *** -0.01 0.01 -0.00
12-month momentum 0.03 *** -0.10 *** 0.25 *** 0.06 ***
1-month momentum -0.00 -0.03 *** 0.08 *** 0.02 ***
36-month momentum 0.06 *** -0.05 *** 0.10 *** 0.02 ***
6-month momentum 0.01 -0.01 0.03 0.01
Financial statement score -0.12 *** 0.01 -0.01 0.00
Industry-adjusted size -0.05 ** -0.02 0.03 0.01
Size -0.33 *** 0.61 *** -1.41 *** -0.29 ***
Number of earnings increases 0.02 ** 0.02 *** -0.05 *** -0.01 ***
Operating profitability -0.01 0.03 * -0.07 * -0.02
Industry adjusted % change in capital expenditures -0.03 *** -0.01 *** 0.04 *** 0.01 ***
% change in current ratio -0.01 0.01 -0.03 -0.01
% change in depreciation 0.00 -0.03 *** 0.07 *** 0.02 ***
% change in gross margin - % change in sales 0.05 *** 0.01 *** -0.05 *** -0.02 ***
% change in quick ratio 0.03 -0.03 ** 0.08 ** 0.02 **
% change in sales - % change in A/R 0.02 *** -0.01 0.01 0.00
Percent accruals 0.04 ** -0.05 *** 0.10 *** 0.02 **
Price delay -0.06 *** 0.03 *** -0.07 *** -0.01 ***
Financial statements score -0.05 *** -0.03 *** 0.08 *** 0.03 ***
Quick ratio 0.04 -0.02 0.06 0.01
R&D increase -0.01 0.03 *** -0.07 ** -0.02 **
Return volatility -0.07 *** 0.05 *** -0.11 *** -0.02 ***
Return on assets 0.01 -0.05 *** 0.13 *** 0.03 ***
Return on equity -0.06 *** 0.01 -0.02 -0.00
Return on invested capital -0.03 -0.01 0.02 0.00
Sales to cash 0.01 -0.07 *** 0.18 *** 0.04 ***
Sales to receivables -0.00 -0.01 0.01 -0.00
Secured debt indicator 0.07 *** 0.03 *** -0.07 *** -0.02 ***
Sales growth 0.02 * -0.03 *** 0.07 *** 0.01 **
Sin stocks -0.03 -0.07 0.18 0.04
Sales to price -0.08 ** 0.07 *** -0.17 *** -0.04 ***
Volatility of liquidity (dollar trading volume) 0.11 *** -0.15 *** 0.30 *** 0.05 ***
Volatility of liquidity (share turnover) -0.05 *** 0.05 *** -0.08 *** -0.01
Debt capacity/firm tangibility 0.01 0.03 ** -0.07 ** -0.02 **
Tax income to book income -0.01 -0.02 *** 0.05 *** 0.01 **
Share turnover -0.02 0.06 *** -0.12 *** -0.02 **
R2 20.85 44.61 43.69 38.96
We form MVE portfolios that combine narrative factors and FFC6 portfolios together and show
that adding narrative factors improves the Sharpe ratio performance of the characteristics-sorted
portfolios. To connect FFC portfolios with our narrative selection and tuning method, the
detailed procedure is as follows. Each entry in Table 3 represents the combination of K
NFs and a particular subset of the FFC6 portfolios. At each in-sample S, for each λ value, the
same procedure estimates narrative factors {ft } as well as parameter Γ. Calculate the (annualized)
Sharpe ratio of the MVE portfolio of the combination of narrative factors and FFC portfolios as
SRcombo (λ; S). In-sample tuning picks λ∗combo := arg maxλ SRcombo (λ; S). The rest of the OOS
portfolio formation is once again the same as before using the in-sample estimated MVE weights
under λ∗combo , except that the weights are with respect to the combination of K NFs plus a subset
We analyze the prediction properties of narrative state variables above and beyond other standard
time-series tool of VAR, we can control for expectations of future outcomes summarized in macroe-
conomic covariates. This provides a conservative test for the predictive effects of narrative states
above and beyond standard macroeconomic variables. In particular, we estimate two five-variate
VAR(3) systems with the same set of five variables but in different orders. In the first, the variables
are, in order, the MVE combination of the narrative attention levels (θtMVE ), the log level of the S&P
500 index, the Federal Funds rate, log employment, and log industrial production. In the second, we
switch the order of the first two variables by having log S&P 500 first and θtMVE second, keeping the rest
unchanged. The structural VAR shocks are orthogonalized with a Cholesky factorization such that
the shock’s coefficient matrix is lower triangular (cf. Stock and Watson, 2001), hence the ordering
of the variables matters for the result. This identification condition restricts the contemporaneous
impulse response of any variable ψ to the shock of another variable ϕ as zero if ϕ is ordered after ψ
in the VAR. The analysis follows Baker, Bloom, and Davis (2016), and we update their data to fit
Figure C.4 plots the impulse response function (IRF) of future equity market values to a
one-standard-deviation shock in $\theta_t^{\mathrm{MVE}}$ (in red). For comparison, we show the IRF of future equity
market values to a shock in current market value (in yellow). The two panels correspond to the
results attained switching the order of $\theta^{\mathrm{MVE}}$ and log S&P. The figures show the $\theta_t^{\mathrm{MVE}}$ shock has a
large positive and highly significant impact on market values over the next year. The effect persists
beyond 2 years, though it becomes statistically insignificant at this horizon. In the right panel, by
listing the market index in front of $\theta^{\mathrm{MVE}}$, we force a zero contemporaneous response of the market
to the $\theta^{\mathrm{MVE}}$ impulse. Despite this additional conservatism in assessing the impact of $\theta^{\mathrm{MVE}}$, the
intertemporal effect from $\theta^{\mathrm{MVE}}$ on future market value remains positive and significant in the first
year but reverts over longer horizons.
[Figure C.4 panels: $\theta^{\mathrm{MVE}}$ first, log S&P second (left); log S&P first, $\theta^{\mathrm{MVE}}$ second (right)]
Note: Impulse response functions (IRF) of five-variate three-order VARs. Variable ordering in the
left panel: $\theta^{\mathrm{MVE}}$, log S&P 500, Fed funds rate, log employment, and log industrial production. The
right panel switches the ordering of the first two variables. Red curve: IRF of the logarithm of S&P
500 to standardized shocks of $\theta^{\mathrm{MVE}}$. Yellow curve: IRF of the logarithm of S&P 500 to standardized
shocks of the logarithm of S&P 500 itself. Shaded band: 90% confidence intervals.
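The role of the variable ordering in a Cholesky-identified VAR can be illustrated with a small NumPy sketch. It takes the lag matrices and the residual covariance as given rather than estimating them, and uses a two-variable VAR(1) with made-up coefficients in place of the five-variable VAR(3). The function name `orthogonalized_irf` and all numerical values are assumptions.

```python
import numpy as np


def orthogonalized_irf(A, Sigma, horizon):
    """Cholesky-orthogonalized impulse responses of a VAR(p).

    A:     list of p (n, n) lag coefficient matrices A_1, ..., A_p
    Sigma: (n, n) covariance matrix of the reduced-form residuals
    Returns an array of shape (horizon + 1, n, n); entry [h, i, j] is the
    response of variable i at horizon h to a one-standard-deviation
    structural shock to variable j.
    """
    n, p = Sigma.shape[0], len(A)
    P = np.linalg.cholesky(Sigma)                 # lower-triangular impact matrix
    Psi = [np.eye(n)]                             # reduced-form MA coefficients
    for h in range(1, horizon + 1):
        # Psi_h = sum over k = 1..min(p, h) of A_k @ Psi_{h-k}
        Psi.append(sum(A[k] @ Psi[h - 1 - k] for k in range(min(p, h))))
    return np.stack([Psi_h @ P for Psi_h in Psi])


# Illustrative two-variable VAR(1); variable 0 is ordered first.
A = [np.array([[0.5, 0.2],
               [0.1, 0.4]])]
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
irf = orthogonalized_irf(A, Sigma, horizon=12)

# Reordering the variables changes the identified shocks.
perm = [1, 0]
A_swapped = [A[0][np.ix_(perm, perm)]]
Sigma_swapped = Sigma[np.ix_(perm, perm)]
irf_swapped = orthogonalized_irf(A_swapped, Sigma_swapped, horizon=12)
```

In both orderings the impact matrix at horizon 0 is lower triangular, so the variable listed first never responds contemporaneously to the shock of the variable listed second. This is the restriction imposed on the market index when it is placed ahead of the narrative attention variable.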
D Raw News Text Content
This appendix section provides the text bodies of the news article titles that appear in Table 4 and
Figure 9.
The yellow highlighting follows the same rule as that in Table 4. The shades of the yellow highlighter
on each term reflect ϕv,l , the term’s probability for the corresponding narrative. For each article, we
LONDON The U.K. manufacturing sector posted an unexpected contraction in July falling to its lowest level in
more than two years while activity at eurozone factories slowed to a nearstandstill. The July data released Monday
suggested a poor start to the third quarter and damped hopes for a rebound. The U.K. manufacturing purchasing
managers index fell to 49.1 in July from 51.4 in June Markit Economics and the Chartered Institute of Purchasing
and Supply said. Markit Economics final eurozone manufacturing purchasing managers index fell to 50.4 in July
from 52 in June. A reading below 50 indicates activity is contracting. The last time the sector contracted in the
U.K. was June 2009 when Britain was still in recession. Eurozone new orders a forwardlooking indicator of activity
fell to a reading of 47.6 the lowest since June 2009.
Record high
1989-07-05: Japan Vehicle Sales Rise
TOKYO Sales of cars trucks and buses in Japan climbed 15.5 in June from a year earlier to 508319 units the Japan
Automobile Dealers Association said. The total was a record for the month surpassing the previous high of 439966
units set in June last year. The brisk June sales were the latest sign of the strength of the domestic auto market
which has seen demand surging in recent months. In May and April for instance sales renewed the record for these
months. In March they set an alltime high totaling 683299 units. This is totally unexpected one association official
said of the June sales. Everybody here is surprised. We didnt think sales would remain so strong for so long.
1994-07-01: Purchasing Managers In U.K. Survey Report Rise for June Orders
LONDON Britains purchasing managers index rose to a record in June the latest monthly survey from the Char-
tered Institute of Purchasing Supply shows. The index rose from 59.2 in May to 61.4 in June its highest level ever
and the fifth month in a row that purchasing managers have reported an upsurge in manufacturing activity. There
was significant growth in manufacturing activity during the month overtaking previous record levels and prices were
forced up as suppliers failed to meet the increase in demand. The institute said the June index was boosted by
record rises in new orders and employment and a strong surge in output. Order books improved across all U.K.
industries and regions in June. Increased demand in the domestic market was led by sales promotions and seasonal
factors and was supported by a recovery in exports.
WASHINGTON Home building in the U.S. rebounded in June a sign demand for housing continues to firm heading
into the second half of the year. Housing starts rose 4.8 from a month earlier to a seasonally adjusted annual rate
of 1.189 million in June the Commerce Department said on Tuesday. Home building continues to gradually recover
from the housing bust that accompanied the great recession said PNC chief economist Stuart Hoffman. Demand
for new singlefamily homes is slowly but steadily improving. That rising demand has led to concerns about the low
inventory of new and existing homes on the market which is pushing up prices and could weigh on further expansion.
But Tuesdays report showed an estimated 1.015 million homes under construction in June the highest level since
February 2008. Junes uptick was driven by a jump in starts in the West and the Northeast two of the pricier regions
in the country.
Trading activity
1993-12-30: Industrials Rise A Bit to Record; Bonds Decline
The Dow Jones Industrial Average crept to a thirdstraight record. Bond prices fell and the dollar rose. The in-
dustrial average added a scant 0.56 point to 3794.33. Standard Poors 500 stock index fell 0.36 to 470.58 and the
Nasdaq Composite Index rose 3.92 to 768.48. The industrial average climbed in early trading nearly cracking the
3800 level but then spent most of the day in negative territory until just before the close. Investors were greeted
early by some positive economic news the Commerce Departments index of leading indicators rose 0.5 in November
and existinghome sales jumped a betterthanexpected 2.9. The Dow Jones Transportation Average declined after
hitting a record on Tuesday. But Larry Rice chief market strategist at Josephthal Lyon Ross wasnt surprised that
the average slipped. The history of this market lately is that you get marginal new highs in the averages and then
they back off he said.
1994-10-20: Profit News Helps Boost Stock Prices — Indexes Gain Ground Despite Weakness Of Bonds and Dollar
Stock prices moved higher on the strength of robust earnings shrugging off declining bond prices and a weak dollar.
The Dow Jones Industrial Average rose 18.50 to 3936.04 marching closer to its record high of 3978.36. The bluechip
indicator which was up more than 30 points late in the session has gained an impressive 160.48 points or 4.2 in the
past nine sessions. Other indexes have failed to keep pace with the Dow industrials recent climb but yesterday they
also marched forward. The Standard Poors 500 stock index jumped 2.62 to 470.28 the New York Stock Exchange
Composite Index gained 1.07 to 258.32 and the Nasdaq Composite Index bolstered by strong technology earnings
rose 5.81 to 770.62. Analysts said another round of solid thirdquarter earnings highlighted by AMRs report yesterday
helped drive stock prices higher.
Drug and technology stocks soared financial and economically sensitive stocks swooned and the stock market fin-
ished mixed. The dollar also finished mixed and bonds declined. Pulling back from its Friday record the Dow Jones
Industrial Average lost 25.66 to close at 9141.84. But Standard Poors 500 stock index and the Nasdaq Composite
Index both bettered their Friday records. The SP 500 gained just 0.93 to 1123.65 but the technologystockheavy
Nasdaq surged 20.54 or 1.1 to close at 1887.14. After slipping last week on disappointing earnings announcements
drug stocks resumed their rise with news that Pfizers Viagra impotence pill is a huge seller and that Eli Lillys Evista
may prevent breast cancer. Tech stocks particularly Internetrelated shares have regained momentum following recent
favorable earnings news. KTel International which has announced that it will sell compact disks and other recordings
over the Internet rose 12 1516 to 41 58 it traded at 6 58 earlier this month.
We highlight each word according to its impact on the market return (I_{z→Mkt}), with red for negative
and blue for positive impacts. The shade of the highlighting reflects the absolute magnitude of the impact.
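The coloring rule described above can be sketched as follows. This is only an illustrative sketch, not the procedure used to produce the figure: the function names `impact_to_rgba` and `highlight`, the normalization by the largest absolute impact, and the sample impact values are all assumptions made for the example.

```python
def impact_to_rgba(impact, max_abs_impact):
    """Map a signed word impact to an (r, g, b, alpha) tuple.

    Assumption: opacity is the absolute impact scaled by the largest
    absolute impact in the article, so it lies between 0 and 1.
    """
    if max_abs_impact <= 0 or impact == 0:
        return (1.0, 1.0, 1.0, 0.0)  # no highlight
    alpha = min(abs(impact) / max_abs_impact, 1.0)
    if impact < 0:
        return (1.0, 0.0, 0.0, alpha)  # red for negative impact
    return (0.0, 0.0, 1.0, alpha)      # blue for positive impact


def highlight(words, impacts):
    """Pair each word with its highlight color; unknown words get none."""
    max_abs = max((abs(v) for v in impacts.values()), default=0.0)
    return [
        (w, impact_to_rgba(impacts.get(w.lower(), 0.0), max_abs))
        for w in words
    ]


# Hypothetical impact values, chosen only to illustrate the mapping.
sample_impacts = {"plunge": -0.8, "record": 0.4}
colored = highlight(["Stocks", "plunge", "record"], sample_impacts)
```

Scaling by the within-article maximum keeps the shading comparable across words in one excerpt; a global scale across articles would be an equally plausible choice.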
1987-10-20 (-17.44): The Crash of ’87: Stocks Plunge 508.32 Amid Panicky Selling — A Repeat of ’29? Depression
in ’87 Is Not Expected — Banking System Safeguards And Federal Mechanisms Are Viewed as Adequate
Can it happen again On Oct. 28 1929 the stock market fell 12.8 ushering in the Great Depression. While the market
plunged 22.6 yesterday economists generally dont expect another depression. I dont think the economy looks like
it did in 1929 says George Stigler the winner of the 1982 Nobel Memorial Prize in Economics and a University of
Chicago economics professor. The most violent and urgent of factors in the great crash was the collapse of the bank-
ing system. That cant happen anymore because of the Federal Deposit Insurance Corp. and additional safeguards.
Mr. Stigler like other economists stresses that todays financial system and economic policy mechanisms provide
considerably more protection against the type of cascading economic collapse that crippled the nation during the
Depression which lasted from 1929 to 1933. During that period the value of the nations output contracted by more
than 50 and unemployment rates rose to nearly 25.
NEW YORK With recoveries like this who needs a recession Last weeks gloomy news from the drop in retail
sales to the jump in joblessinsurance claims to Fridays stockmarket plunge hardly inspires confidence that the
economy is indeed recovering and will avoid a doubledip recession. Much will depend of course on what occurs in
coming weeks in Washington. For now no one knows if the economys modest thirdquarter rise was merely an uptick
in a longrunning slump or the start of a sustained recovery. But the evidence on balance still points to the latter
eventuality. Doubledip recessions are rare but they do occur. The economy rose briefly amid the yearlong recession
of 196970. And many analysts regard the sixmonth recession of 1980 and the 16month recession of 198182 as really
a huge doubledipping slump interrupted by a year of economic growth.
1997-10-28 (-6.58): Drug Makers High-Tech Stocks Head Roster of Day’s Big Losers
NEW YORK In a day that saw the largest trading volume ever on the New York Stock Exchange the 30 stocks
in the Dow Jones Industrial Average lost a total of 129 billion in market capitalization. From the Dows peak Aug.
6 when the average closed at 8259 and the market capitalization stood at 1.94 trillion the 30 stocks in the indus-
trial average have surrendered 264 billion. Yesterdays Big Board volume totaled 685496330 breaking the previous
record of 683800820 set Jan. 23. The outstanding losers in a session chockablock with big losses came from two
groups healthcare stocks including both healthcare providers and drug makers and hightechnology stocks. Among
the healthcare stocks the biggest loser was Oxford Health Plans which plunged 42 78 to 25 78 after the company
said it would post a thirdquarter loss despite expectations that the company would show a profit for the period.
2011-08-05 (-5.04): Stocks Nose-Dive Amid Global Fears — Weak Outlook Government Debt Worries Drive Dow’s
Biggest Point Drop Since ’08
Stocks spiraled downward Thursday as investors buckled under the strain of the global economic slowdown and
the failure of policy makers to stabilize financial markets. The selling began in Europe and continued in the U.S.
where stocks plunged from the opening bell. The Dow Jones Industrial Average posted its worst point drop since
the financial crisis in December 2008 falling 512.76 points or 4.31 to 11383.68. Oil and other commodities were also
hammered. Even gold was a safe haven no more as prices fell. Asian markets slid on Friday morning with benchmark
indexes in Tokyo Australia South Korea and Hong Kong all falling more than 3 by midday. It was an absolute
bloodbath said John Richards head of strategy at RBS Global Banking Markets. There was no one single catalyst
for the downdraft traders said. Rather it reflected multiple concerns that have mounted over the past month and
came to a head this week. Worries about a U.S.
2016-06-27 (-3.70): EU Tumult Ripples Through Markets — Europe’s battered lenders face new risks from investors
and an uncertain economy
Just a few years ago Europes banks managed to stagger out of crisis brought on by the Continents debt woes.
Britains looming exit from the European Union analysts and investors fear could push them back in. A wide swath
of European financial institutions are at risk Hobbled behemoths like Deutsche Bank AG and Credit Suisse Group
AG that are limping through difficult turnarounds clusters of regional banks pressured by negative interest rates and
banks across Europes weak periphery that are reeling under piles of bad loans. Still wounded from the eurozone debt
crisis European banks need investor confidence and steady economic growth to prosper. Brexit risks both. Perhaps
most acutely Britains breakaway calls into question the durability of the European Union and the euro. All of a
sudden the prospects of Europes political framework disintegrating at its core considered farfetched just days ago
have edged up.