
The information content of mandatory risk factor disclosures in corporate filings

Review of Accounting Studies


Beginning in 2005, the Securities and Exchange Commission (SEC) mandated firms to include a “risk factor” section in their Form 10-K to discuss “the most significant factors that make the company speculative or risky.” In this study, we examine the information content of this newly created section and offer two main results. First, we find that firms facing greater risk disclose more risk factors, and that the type of risk the firm faces determines whether it devotes a greater portion of its disclosures towards describing that risk type. That is, managers provide risk factor disclosures that meaningfully reflect the risks they face. Second, we find that the information conveyed by risk factor disclosures is reflected in systematic risk, idiosyncratic risk, information asymmetry, and firm value. Overall, our evidence supports the SEC’s decision to mandate risk factor disclosures, as the disclosures appear to be firm-specific and useful to investors.


Fig. 1



  1. Some firms voluntarily provide risk disclosures in MD&A if they also provide forward-looking statements about future performance. That is not our focus. We focus on the newly created risk factor disclosure section because it is mandatory for all firms, and throughout all of our tests, we explicitly control for MD&A risk disclosures (and their risk related keywords). We acknowledge that some firms may have moved their voluntary risk disclosures from MD&A to the risk factors disclosure section after it was mandated. Thus, for these firms, the newly created section may not provide completely new information. Nevertheless, our tests are designed to incorporate investors’ expectations of risk disclosure and suggest that the newly created disclosures are informative.

  2. We acknowledge that it is possible for firms to provide risk factor disclosure that describes a negative/pessimistic event that is actually less negative/pessimistic than the market expects it to be. Thus, for our market tests, we explicitly control for investors’ expectations of risk factor disclosure. See Sects. 3.1 and 3.3.

  3. Appendices 1 and 2 describe our text analysis procedures and our methods for classifying key words into the five risk subcategories. We first classify key words into financial, tax, and legal risk subcategories. We then classify the remaining words as “other-systematic” if they relate to economy-wide risk and “other-idiosyncratic” if they relate to firm-specific risk. As shown in Table 2, 69 % of the keywords in the average risk factor disclosure fall into the “other” categories.

  4. Cornwell et al. v. Credit Suisse Group et al., No. 08 Civ. 3758, 2010 U.S. Dist.

  5. We acknowledge that short-window returns at the 10-K release date could be a function of changes in either (1) firms’ expected future cash flows, or (2) the assessment of firm risk. We interpret our results as being related to firm risk. For more assurance regarding this interpretation, in Table 8 we control for both firms’ earnings surprise and changes in analysts’ estimates of future earnings, as well as other variables in prior literature that could indicate a change in firms’ cash flows. Our tests suggest that the inferences with respect to abnormal returns reflect a change in investors’ perception of firm risk.

  6. In our main tests, we only use qualitative disclosure information from Item 1A “Risk Factors” and Item 7 “Management’s Discussion & Analysis of Financial Condition and Results of Operation” (MD&A). In Sects. 4.1 and 4.3, we also control for disclosures in “Quantitative and Qualitative Disclosures about Market Risk” (Item 7A), and our results are unaffected.

  7. Alternative methods to measure changes in textual risk disclosures include the rate of change in the frequency of specific words used within text or the frequency of word groups within a sentence (Brown and Tucker 2011; Nelson and Pritchard 2007). We choose an expectations model as our proxy for changes in risk factor disclosure due to its relative empirical simplicity and since it explains approximately 80 % of the variation in risk factor disclosures (see Table 9).

  8. We acknowledge that there is no consensus in the literature as to which key word list is most appropriate. Two other relevant studies report their lists of key risk words. Nelson and Pritchard (2007) identify 75 risk factor terms, and Kravet and Muslu (2013) identify 20 risk-related keywords based on their reading of 100 randomly selected annual reports. Our key word list includes the vast majority of these identified key words but is expanded considerably using the Latent Dirichlet Allocation method described in the paper.

  9. Specifically, the trading industry includes the following firm descriptions: “security and commodity brokers,” “closed-end management investments,” “trusts,” and “unit investment trusts.”

  10. Throughout the analysis, we evaluate the effects of multicollinearity with variance inflation factors (VIFs). In their textbook, Kutner et al. (2004) indicate that multicollinearity is not a problem when VIFs are less than 10. The results indicate that multicollinearity is not a serious concern in any of our multivariate regressions. Thus, for expositional purposes, we do not tabulate or discuss these results for each model.

  11. Since each left-hand side variable is specified as a percentage of the total key words in the risk factor section, and since the sample (and dependent variables) are the same across all regressions, the coefficients for a particular risk proxy are comparable across all of the regressions and indicate the percentage increase in total key words resulting from that risk proxy.

  12. We acknowledge that we do not explicitly control for the MD&A tone. As noted by Kothari et al. (2009), this is not easy to do as it requires software-reading technologies that are not particularly accurate. However, we follow prior literature that assesses the tone of MD&A (Tetlock 2007; Kravet and Muslu 2013) and count the number of words that relate to risk, assuming that the context of these words is negative/pessimistic (i.e., our variables MDA_DISC, MDA_SYS, MDA_IDIO, MDA_FIN, MDA_LIT, and MDA_TAX).

  13. The fog index is defined by Li (2008b) as (words per sentence + percent of complex words) * 0.4.

  14. As before, we also add financial risk to each of these measures since prior literature shows that financial risk affects both systematic and idiosyncratic risk. Similarly, we do not include legal or tax risk in either of these categories since it is difficult to determine whether these risks are firm-specific, and prior literature does not provide much guidance in this respect.

  15. For comparison purposes, Kothari et al. (2009) examine the effect of negative/pessimistic disclosure across three sources of disclosure (corporations, analysts, business press), and their results suggest that moving from the 25th percentile to the 75th percentile increases firms’ cost of capital by 2.0%. However, as previously mentioned, they find no such relation when the source of the disclosure is the corporation itself.

  16. To ensure that the removal of plain text filings does not bias our sample, in Sect. 3, we compare our sample to the overall universe of Compustat firms. We find that our sample is generalizable across industries and years. In addition, we include industry and year fixed effects in all of our multivariate analyses.

  17. We consider whether our final sample is biased as a result of this sample size reduction by comparing our sample to the Compustat universe of firms in Sect. 3 and in Table 1.
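As an illustration of the fog index formula quoted in note 13, the following is a minimal sketch rather than the procedure used in the paper or in Li (2008b). It assumes the conventional reading of "complex words" as words with three or more syllables, and it approximates syllables by counting vowel groups. The function names and the tokenization rules are hypothetical choices made for this example.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: number of vowel groups (heuristic assumption)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fog_index(text):
    """Fog = 0.4 * (words per sentence + percent of complex words)."""
    # Sentences are split on runs of ., ! or ? (a simplification).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    words_per_sentence = len(words) / len(sentences)
    # Complex words are assumed to be those with three or more syllables.
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    percent_complex = 100.0 * complex_words / len(words)
    return 0.4 * (words_per_sentence + percent_complex)
```

Under this sketch, short sentences built from one- and two-syllable words score close to zero, while text dense in long words scores much higher.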


  • Amihud, Y., & Mendelson, H. (1986). Asset pricing and the bid-ask spread. Journal of Financial Economics, 17, 223–249.


  • Ball, R., & Brown, P. (1968). An empirical evaluation of accounting income numbers. Journal of Accounting Research, 6, 159–178.

  • Ball, R., & Kothari, S. P. (1989). Nonstationary expected returns: Implications for tests of market efficiency and serial correlation in returns. Journal of Financial Economics, 25, 51–74.


  • Barry, C., & Brown, S. (1985). Differential information and security market equilibrium. Journal of Financial and Quantitative Analysis, 20, 407–422.


  • Beaver, W. (1968). The information content of annual earnings announcements. Journal of Accounting Research, 6, 67–92.


  • Bernard, V., & Stober, T. (1989). The nature and amount of information in cash flows and accruals. The Accounting Review, 64, 624–652.


  • Blei, D., Ng, A., & Jordan, M. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.


  • Botosan, C. (1997). Disclosure level and the cost of capital. The Accounting Review, 72, 323–350.


  • Botosan, C., & Plumlee, M. (2002). A re-examination of disclosure level and the expected cost of equity capital. Journal of Accounting Research, 40, 21–41.


  • Brown, S., & Tucker, J. (2011). Large sample evidence on firms’ year-over-year MD&A modifications. Journal of Accounting Research, 49, 309–346.


  • Bryan, S. (1997). Incremental information content of required disclosures contained in management discussion and analysis. The Accounting Review, 72, 285–301.


  • CFO (2010). SEC pushes companies for more risk information. CFO Magazine, August 2, 2010.

  • Chan, K. (1988). On the contrarian investment strategy. Journal of Business, 61, 147–163.


  • Clarkson, P., Kao, J., & Richardson, G. (1999). Evidence that management discussion and analysis (MD&A) is a part of a firm’s overall disclosure package. Contemporary Accounting Research, 16, 111–134.


  • Corporate Counsel (2006). MD&A Risk Factors (Nelson Rocks Preserve-Style). By Broc Romanek and Dave Lynn, November 30, 2006.

  • Demsetz, H. (1986). Corporate control, insider trading, and rates of return. American Economic Review, 76, 313–316.


  • Diamond, D., & Verrecchia, R. (1991). Disclosure, liquidity, and the cost of capital. Journal of Finance, 46, 1325–1359.

  • Drake, M., Roulstone, D., & Thornock, J. (2011). The demand for mandatory disclosure: Evidence from investors’ use of the EDGAR database. Working paper, Ohio State University.

  • Easley, D., & O’Hara, M. (2004). Information and the cost of capital. Journal of Finance, 59, 1553–1583.


  • Fama, E., & French, K. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33, 3–56.

  • Fields, T., Lys, T., & Vincent, L. (2001). Empirical research on accounting choice. Journal of Accounting and Economics, 31, 255–307.


  • Guedhami, O., & Pittman, J. (2008). The importance of IRS monitoring to debt pricing in private firms. Journal of Financial Economics, 90, 38–58.


  • Hamada, R. S. (1972). The effect of the firm’s capital structure on the systematic risk of common stocks. Journal of Finance, 27, 435–452.


  • Han, B., Jennings, R., & Noel, J. (1992). Communication of nonearnings information at the financial statement release date. Journal of Accounting and Economics, 15, 63–86.


  • Healy, P., & Palepu, K. (2001). Information asymmetry, corporate disclosure, and the capital markets: A review of the empirical disclosure literature. Journal of Accounting and Economics, 31, 405–440.


  • Jayaraman, S. (2008). Earnings volatility, cash flow volatility and informed trading. Journal of Accounting Research, 46, 809–851.

  • Jiang, G., Lee, C., & Zhang, Y. (2005). Information uncertainty and expected returns. Review of Accounting Studies, 10, 185–221.


  • Jones, C., & Weingram, S. (1996). The determinants of 10b-5 litigation risk. Stanford Law School: Working paper.


  • Ke, B., Huddart, S., & Petroni, K. (2003). What insiders know about future earnings and how they use it: Evidence from insider trades. Journal of Accounting and Economics, 35, 315–346.


  • Khan, M., & Watts, R. (2009). Estimation and empirical properties of a firm-year measure of accounting conservatism. Journal of Accounting and Economics, 48, 132–150.


  • Kim, O., & Verrecchia, R. (1994). Market liquidity and volume around earnings announcements. Journal of Accounting and Economics, 17, 41–68.


  • Klein, R., & Bawa, V. (1976). The effect of estimation risk on optimal portfolio choice. Journal of Financial Economics, 3, 215–231.


  • Kothari, S. P. (2001). Capital markets research in accounting. Journal of Accounting and Economics, 31, 105–231.


  • Kothari, S. P., Li, X., & Short, J. (2009a). The effect of disclosures by management, analysts, and business press on cost of capital, return volatility, and analyst forecasts: a study using content analysis. The Accounting Review, 84, 1639–1670.


  • Kothari, S. P., Shu, S., & Wysocki, P. (2009b). Do managers withhold bad news? Journal of Accounting Research, 47, 241–276.


  • Kravet, T., & Muslu, V. (2013). Textual risk disclosures and investors’ risk perceptions. Review of Accounting Studies. doi:10.1007/s11142-013-9228-9.

  • Kutner, M., Nachtsheim, C., & Neter, J. (2004). Applied linear statistical models (5th ed.). New York, NY: McGraw-Hill Irwin.

  • Kyle, A. (1985). Continuous auctions and insider trading. Econometrica, 53, 1315–1335.

  • Lambert, R., Leuz, C., & Verrecchia, R. (2007). Accounting information, disclosure, and the cost of capital. Journal of Accounting Research, 45, 385–420.


  • Lambert, R., Leuz, C., & Verrecchia, R. (2012). Information asymmetry, information precision, and the cost of capital. Review of Finance, 16, 1–29.


  • Lang, M., Lins, K., & Maffett, M. (2012). Transparency, liquidity, and valuation: International evidence on when transparency matters most. Journal of Accounting Research, 50, 729–774.


  • Lang, M., & Lundholm, R. (1996). Corporate disclosure policy and analyst behavior. The Accounting Review, 71, 467–492.


  • Lehavy, R., Li, F., & Merkley, K. (2011). The effect of annual report readability on analyst following and the properties of their earnings forecasts. The Accounting Review, 86, 1087–1115.


  • Lennox, C. (1999). Audit quality and auditor size: An evaluation of reputation and deep pockets hypotheses. Journal of Business Finance & Accounting, 26, 779–805.

  • Leuz, C., & Verrecchia, R. (2000). The economic consequences of increased disclosure. Journal of Accounting Research, 38, 91–124.


  • Li, F. (2008a). Do stock market investors understand the risk sentiment of corporate annual reports? Working paper, University of Michigan.

  • Li, F. (2008b). Annual report readability, current earnings, and earnings persistence. Journal of Accounting and Economics, 45, 221–247.


  • Li, F. (2010). The information content of forward-looking statements in corporate filings: A naïve Bayesian machine learning approach. Journal of Accounting Research, 48, 1049–1102.

  • Linsmeier, T., Thornton, D., Venkatachalam, M., & Welker, M. (2002). The effect of mandated market risk disclosures on trading volume sensitivity to interest rate, exchange rate, and commodity price movements. The Accounting Review, 77, 343–377.


  • Mansi, S., Maxwell, W., & Miller, D. (2004). Does auditor quality and tenure matter to investors? Evidence from the bond market. Journal of Accounting Research, 42, 755–793.


  • Modigliani, F., & Miller, M. (1958). The cost of capital, corporation finance and the theory of investment. American Economic Review, 48, 261–297.


  • Nelson, K., & Pritchard, A. C. (2007). Litigation risk and voluntary disclosure: The use of meaningful cautionary language. Working paper, Rice University.

  • Reuters (2005). “Refco risks boiler-plate disclosure.” By Scott Malone. Friday, October 21, 2005.

  • Roulstone, D. (1999). Effect of SEC financial reporting release No. 48 on derivative and market risk disclosures. Accounting Horizons, 13, 343–363.


  • Schrand, C. (1997). The association between stock-price interest rate sensitivity and disclosures about derivative instruments. The Accounting Review, 72, 87–109.


  • Scott, T. (1994). Incentives and disincentives for financial disclosure: Voluntary disclosure of defined benefit pension plan information by Canadian Firms. The Accounting Review, 69, 26–43.


  • SEC (2005). Securities and exchange commission final rule, release no. 33–8591 (FR-75).

  • SEC (2009). Remington Arms Company, Inc. Form 10-K Filing for the year ended December 31, 2009, SEC EDGAR database.

  • SEC (2010). Form 10-K instructions.

  • Skinner, D. (1994). Why firms voluntarily disclose bad news. Journal of Accounting Research, 32, 38–60.


  • Spindler, J. (2006). Is it time to wind up the Securities Act of 1933? Regulation, Winter 2006–2007.

  • Stoll, H. (2000). Friction. Journal of Finance, 55, 1479–1514.


  • Tetlock, P. (2007). Giving content to investor sentiment: The role of media in the stock market. Journal of Finance, 62, 1139–1168.


  • Tufano, P. (1998). Who manages risk? An empirical examination of risk management practices in the gold mining industry. Journal of Finance, 51, 1097–1137.


  • Vuolteenaho, T. (2002). What drives firm-level stock returns? Journal of Finance, 57, 233–264.


  • Wahlen, J. (1994). The nature of information in commercial bank loan loss disclosures. The Accounting Review, 69, 455–478.


  • Watts, R. (1977). Corporate financial statements: Product of market and political processes. Australian Journal of Management, 2, 53–75.


  • Watts, R., & Zimmerman, J. (1986). Positive accounting theory. Englewood Cliffs, NJ: Prentice Hall.


  • Wong, M. H. F. (2001). The association between SFAS No. 119 derivatives disclosures and the foreign exchange risk exposure of manufacturing firms. Journal of Accounting Research, 38, 387–417.


  • You, H., & Zhang, X. (2009). Financial reporting complexity and investor underreaction to 10-K information. Review of Accounting Studies, 14, 559–586.




We are grateful for the helpful comments and suggestions of an anonymous reviewer, Kris Allee, Terry Baker, Liz Chuk, James Cotter, Jon Duchac, Fabio Gaertner, Ronen Gal-Or, Andrea Kelton, Feng Li, Russell Lundholm, Bill Marcum, Dale Martin, Rick Mergenthaler, Karen Nelson, Deon Strickland, Jake Thornock, the doctoral students at the University of Arizona, workshop participants at Wake Forest University, and conference participants at the 2011 AAA Annual Meeting and the 22nd Annual Conference on Financial Economics and Accounting (FEA) at Indiana University. Finally, we thank both the Securities and Exchange Commission (SEC) and the Institute of Chartered Accountants in England and Wales (ICAEW) for their interest in this paper. Portions of this paper were included in the ICAEW’s October 2011 report titled “Reporting Business Risks: Meeting Expectations.” Professor Lu gratefully acknowledges assistance from the National Science Council of Taiwan (grant number NSC100-2410-H-002-025-MY3).

Author information



Corresponding author

Correspondence to John L. Campbell.


Appendix 1: Summary of textual analysis procedures to generate risk factor disclosure measures

1.1 Data collection technique

In this appendix, we discuss the textual analysis procedures used to collect risk disclosures from annual filings in the SEC’s Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database (for a thorough treatment of who uses the EDGAR database to obtain information, and the most commonly retrieved forms, see Drake et al. 2011). Figure 1 illustrates the specific sequence of steps used in our textual analysis. As presented in Fig. 1, annual filings are downloaded and processed to generate counting measures that objectively quantify firms’ risk disclosure. The rest of this appendix presents our system design in detail.

1.2 File collection

Our system starts by collecting all relevant 10-K filings and storing them in a relational database. Since subsequent analysis and processing may require several runs of prototyping and testing, we create our own repository in order to increase the efficiency of subsequent activities. EDGAR provides convenient filing download mechanisms based on the File Transfer Protocol (FTP). The downloader first retrieves all index files beginning in 2005. All 10-K filings are downloaded according to the index files. The procedure ensures that all annual filings that were uploaded to the EDGAR database between January 1, 2005 and December 31, 2009 are collected. In total, 44,998 10-K filings were downloaded and stored in our repository.
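The selection of 10-K filings from the index files can be sketched as follows. This is a minimal illustration, not the downloader described above. It assumes that each index row is pipe-delimited in the order CIK, company name, form type, filing date (YYYY-MM-DD), and file name, and that only rows whose form type is exactly "10-K" are wanted. The function name and the returned dictionary keys are hypothetical, and no network access is performed.

```python
from datetime import date


def select_10k_rows(index_lines, start=date(2005, 1, 1), end=date(2009, 12, 31)):
    """Return index rows for 10-K filings filed between start and end (inclusive)."""
    selected = []
    for line in index_lines:
        parts = line.strip().split("|")
        if len(parts) != 5:
            continue  # skip banners, separators, and malformed rows
        cik, company, form_type, date_filed, filename = parts
        if form_type != "10-K":
            continue  # assumption: exact match on the form type (also skips the header row)
        try:
            filed = date.fromisoformat(date_filed)
        except ValueError:
            continue  # skip rows whose date cannot be parsed
        if start <= filed <= end:
            selected.append(
                {"cik": cik, "company": company, "filed": filed, "filename": filename}
            )
    return selected
```

Each selected row carries the file name needed to download the filing, so the repository can be filled, and refilled during prototyping, without parsing the index again.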

1.3 File preprocessing

The goal of file preprocessing is to extract important items, including risk factors, from individual filings for subsequent text analysis. The EDGAR system requires firms to upload their Form 10-K reports in either (1) plain text format or (2) HTML format. Our software is unable to reliably scan plain text filings, so we delete these filings from our sample. These filings account for 21 % of the population (see footnote 16). On the other hand, our software is able to reliably scan HTML filings, and we use an automatic procedure to extract text from the following subsections of the Form 10-K: risk factors (Form 10-K, Item 1A), MD&A (Item 7), and market risk disclosures (Item 7A).

Our item extraction procedure is based on the assumption that HTML filings contain visual clues that allow human readers to recognize item boundaries easily. These visual clues include subsection titles, boldface fonts, extra spacing, and so on. Most filings also use standard item headings that start with “Item,” followed by an item number and a description (e.g., “ITEM 1A. RISK FACTORS”). The HTML format, however, allows the same visual display to be achieved using different tags. For example, <b> and <strong> have the same visual effect in most browsers. This flexibility in composing HTML filings creates a technical challenge when designing the item extraction procedure. Our design overcomes this challenge by first converting an HTML file into an intermediate representation that combines HTML tags that have similar effects for human readers when deciding item boundaries. A list of candidate item heading locations can then be identified using the intermediate representation. Finally, items are extracted based on the candidate heading locations. We describe the three steps in detail below.

In the first step, the input HTML filing is parsed into a tree structure, where the leaf nodes are text segments of the content and the internal nodes are HTML tags such as “b,” “title,” or “li.” The tree structure allows us to associate the characteristics of a text segment by traveling upward and inspecting the parent nodes.
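The parse tree described above can be sketched with Python's standard html.parser module. This is a minimal illustration under stated assumptions, not the parser used in the paper: the Node class, the helper names, and the list of void tags are hypothetical, and real filings contain malformed markup that a production parser would need to handle more carefully.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}  # tags without a closing tag


class Node:
    def __init__(self, tag, parent=None, text=None):
        self.tag = tag          # HTML tag name, or "#text" for a leaf
        self.parent = parent
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, parent=self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the newly opened tag

    def handle_endtag(self, tag):
        # Climb to the nearest open ancestor with this tag; ignore stray closers.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            leaf = Node("#text", parent=self.current, text=data.strip())
            self.current.children.append(leaf)


def parse_html(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def text_leaves(node):
    """Yield the text segments (leaf nodes) in document order."""
    if node.tag == "#text":
        yield node
    for child in node.children:
        yield from text_leaves(child)


def ancestor_tags(leaf):
    """Travel upward from a text segment and collect its enclosing tags."""
    tags = []
    node = leaf.parent
    while node is not None and node.tag != "root":
        tags.append(node.tag)
        node = node.parent
    return tags
```

For a heading such as `<div><b>ITEM 1A. RISK FACTORS</b></div>`, `ancestor_tags` returns `["b", "div"]`, which is the information the scoring step needs.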

To facilitate subsequent extraction, we further convert the parse tree into a flat structure by traversing the tree and calculating two scores for each text segment: (1) an emphasizing score and (2) a segmentation score. Both of these scores are important to ensure that we have fully extracted all of the text within a Form 10-K “Item” or subsection. The emphasizing score is designed to determine the prominence of the text within the disclosure section. To compute the emphasizing score, we examine the set of HTML tags surrounding the text. Examples include the “strong” tag and the “underline” style within a “div” tag. A complete list of emphasizing tags can be found in Appendix 2. Emphasizing scores are computed for each text segment according to the number of emphasizing tags in its parent nodes. A positive emphasizing score indicates that the text segment is visually more prominent and may be the heading of an item.

The segmentation score is designed to indicate whether and how the text is visually broken down into sections. HTML tags such as “<br>” and “<p>” are two common tags used to separate text. By tracking the number of “separation tags” in the parent nodes when traversing the parse tree, we can detect the locations where text segments are visually separated. The segmentation score is the number of increased or decreased “separation tags.”
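The two scores can be sketched as a single pass over the HTML that keeps a stack of open tags. This is one possible reading of the description above, not the authors' implementation. The tag sets below are assumptions made for illustration (the actual lists are given in Appendix 2), and the segmentation score is taken here to be the number of separation tags opened or closed since the previous text segment.

```python
import re
from html.parser import HTMLParser

EMPHASIZING_TAGS = {"b", "strong", "u", "em", "h1", "h2", "h3"}  # assumed list
SEPARATION_TAGS = {"p", "br", "div", "tr", "li", "hr"}            # assumed list
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}
EMPHASIZING_STYLE = re.compile(r"bold|underline", re.IGNORECASE)


class Flattener(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_tags = []      # stack of (tag, is_emphasizing)
        self.pending_breaks = 0  # separation tags seen since the last text segment
        self.segments = []       # (text, emphasizing score, segmentation score)

    def handle_starttag(self, tag, attrs):
        if tag in SEPARATION_TAGS:
            self.pending_breaks += 1
        if tag in VOID_TAGS:
            return  # void tags never enclose text, so they are not stacked
        style = dict(attrs).get("style") or ""
        emphasizing = tag in EMPHASIZING_TAGS or bool(EMPHASIZING_STYLE.search(style))
        self.open_tags.append((tag, emphasizing))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or all(t != tag for t, _ in self.open_tags):
            return  # ignore closers that match nothing on the stack
        while self.open_tags:
            closed, _ = self.open_tags.pop()
            if closed in SEPARATION_TAGS:
                self.pending_breaks += 1
            if closed == tag:
                break

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text:
            return
        emphasis = sum(1 for _, emphasizing in self.open_tags if emphasizing)
        self.segments.append((text, emphasis, self.pending_breaks))
        self.pending_breaks = 0


def flatten(html):
    parser = Flattener()
    parser.feed(html)
    parser.close()
    return parser.segments
```

A segment with a positive emphasizing score and a positive segmentation score is both visually prominent and visually set apart, which is the profile expected of an item heading.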

The second step is to construct candidate item heading locations using the flat structure. We assume that an item heading is visually more prominent (emphasizing score > 0) and sits in a text segment that “looks like” an item heading (i.e., matches the regular expression “(Item\s+\d\D?\.?),” case ignored). The output of this step is a list of locations that may be the beginning of an item.
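The candidate search can be sketched as below. The regular expression is the one quoted in the text, written without the stray space. The segment representation, a list of (text, emphasizing score) pairs, and the choice to anchor the match at the start of the segment are assumptions made for this example.

```python
import re

# Pattern quoted in the text; case is ignored.
ITEM_HEADING = re.compile(r"(Item\s+\d\D?\.?)", re.IGNORECASE)


def candidate_heading_locations(segments):
    """Return indices of segments that are emphasized and start like an item heading.

    `segments` is a list of (text, emphasizing_score) pairs (hypothetical layout).
    """
    locations = []
    for index, (text, emphasis) in enumerate(segments):
        # Assumption: a heading begins with "Item", so match at the segment start.
        if emphasis > 0 and ITEM_HEADING.match(text.lstrip()):
            locations.append(index)
    return locations
```

Requiring a positive emphasizing score keeps in-text cross references such as "see Item 7" out of the candidate list, because such mentions are rarely bold or underlined.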

The last step is to extract items of interest using the list of candidate item heading locations. To extract the risk factor subsection (Item 1A), we scan the text around the candidate item heading locations for the keyword “risk factors.” The text from a matching item heading location until the beginning of the following candidate heading location is assigned to the risk factor subsection. Other items of interest are processed in the same manner.
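The extraction step can be sketched as follows. This is a simplified illustration, not the authors' code: "the text around the heading" is interpreted here as the heading segment plus the segment that follows it, the first matching heading wins, and the function and parameter names are hypothetical. A real filing's table of contents would need separate handling.

```python
def extract_item(segments, heading_locations, keyword):
    """Return the text of the first item whose heading area mentions `keyword`.

    `segments` is a list of text segments in document order and
    `heading_locations` holds the indices of candidate item headings.
    Returns None when no heading matches.
    """
    locations = sorted(heading_locations)
    for position, start in enumerate(locations):
        # Assumption: look at the heading segment and the one right after it.
        window = " ".join(segments[start:start + 2]).lower()
        if keyword.lower() in window:
            # The item runs until the next candidate heading, or to the end.
            end = locations[position + 1] if position + 1 < len(locations) else len(segments)
            return " ".join(segments[start:end])
    return None
```

Calling the same function with a different keyword, for example the title of Item 7 or Item 7A, extracts the other items of interest.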

Extracted items go through consistency checks before being recorded back to the database. The assumption is that Item 1A (Risk Factors) should precede Item 7 and Item 7A. Item heading locations that do not match this ordering are rejected. We conduct a performance analysis to ensure the quality of the item extraction procedure. Two aspects are of particular interest. The first aspect is the proportion of HTML filings from which the selected items can be extracted; we refer to this proportion as the coverage of the selected items. The following table summarizes the coverage of our procedure:

Fiscal year | 10-K filings | HTML filings | Extracted Item 1A | Extracted Item 1A and Item 7 | Extracted Item 1A, Item 7, and Item 7A
2005–2008 | 34,491 | 28,797 | 80 % | 67 % | 65 %

Among all the filings collected, 34,491 filings are from fiscal years 2005 to 2008. Eighty-three percent (28,797) of them are HTML filings. Our procedure extracts the risk factor subsection (Item 1A) from 80 % of the HTML filings. The share of HTML filings from which both Item 1A and Item 7 are successfully extracted is 67 %, and the share from which Item 1A, Item 7, and Item 7A are all extracted is 65 %.

The second performance aspect of importance is the precision of the extracted items. We visually inspect 300 random filings from the subset of filings from which Item 1A was extracted by the procedure. We were unable to extract nine of these filings (see footnote 17). Among the remaining 291, all correctly contain only the appropriate subsection. Thus, the precision of Item 1A extraction is 100 %. Similarly, our software was unable to extract Item 7 from 47 firms in this subset. Among the remaining 253, 249 were extracted correctly. Thus, the precision for Item 7 is 98 % (249/253). Finally, our software extracted Item 7A from 281 of the 300 firms in this subset and five extracted items contain other subsections, for a precision of 98 % (281/286).

Overall, the results show that our software extracted the risk factors from 80 % of HTML filings. Moreover, for over 98 % of the extracted items, the right subsection, and only the right subsection, was extracted. Among the small number of incorrectly extracted subsections, most contain the text from the target item, with small chunks of text from other subsections.
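The percentages reported in this section can be reproduced from the stated counts with a short check. The sketch below uses only numbers that appear in the text; the helper name is hypothetical.

```python
def percent(numerator, denominator):
    """Return numerator / denominator as a percentage."""
    if denominator == 0:
        raise ValueError("denominator must be nonzero")
    return 100.0 * numerator / denominator


# Coverage: share of fiscal 2005-2008 filings that are HTML filings.
html_share = percent(28_797, 34_491)

# Precision: correctly extracted items over inspected extracted items.
item1a_precision = percent(291, 291)
item7_precision = percent(249, 253)
item7a_precision = percent(281, 286)

for label, value in [
    ("HTML share", html_share),
    ("Item 1A precision", item1a_precision),
    ("Item 7 precision", item7_precision),
    ("Item 7A precision", item7a_precision),
]:
    print(f"{label}: {value:.1f} %")
```

Rounded to whole percentages, these ratios agree with the 83 %, 100 %, 98 %, and 98 % quoted above.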

1.4 Text quantification using risk keywords

The extracted items need to be quantified in order to be included in our empirical models. We first count the total words within a subsection and label it as word count (ALL_WORDS). Then, we identify key words using the predefined dictionary from Appendix 3. We developed our list of key words in three steps. First, we began with key risk words used by prior literature (Nelson and Pritchard 2007). Second, we added key words to the list that, based on our review of risk factor disclosures, appear to be common across firms. Third, we classified the list of key words as relating to financial, litigation, tax, other-systematic, or other-idiosyncratic risk subcategories as described in Sect. 3.1 of the text. To enhance the coverage of our key words, we reviewed important terms identified by a document clustering approach known as Latent Dirichlet Allocation (Blei et al. 2003). By inspecting important terms in the same cluster, we identified and added keywords that were previously missing.

Finally, the text quantification module computes the frequency of each term and aggregates the number of key words according to the risk subcategory (KEY_WORDS). To increase the precision of the key word matching process, the text quantification module allows term matching to be case sensitive. This constraint is especially useful for acronyms such as IRS (i.e., Internal Revenue Service) and EU (i.e., European Union). If there are variations of terms that need to be matched, these variations are explicitly specified in the keyword list. For example, the criterion “(lease|leases|leasing)” is used to match the variations of “lease.” Using these techniques, we compute three measures for each extracted subsection of the Form 10-K: (1) total word count (ALL_WORDS), (2) total key word count (KEY_WORDS), and (3) key word count (KEY_WORDS) by risk subcategory.
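The three measures can be sketched with regular expressions as below. This is a minimal illustration, not the module described above. The miniature dictionary is hypothetical (the actual key words are listed in Table 9), the assignment of each term to a subcategory is made up for the example, and words are tokenized as runs of letters, digits, or underscores.

```python
import re

# Hypothetical mini-dictionary: subcategory -> list of (pattern, case_sensitive).
KEYWORD_DICTIONARY = {
    "tax": [(r"IRS", True), (r"(tax|taxes|taxation)", False)],
    "financial": [(r"(lease|leases|leasing)", False), (r"debt", False)],
    "litigation": [(r"(lawsuit|lawsuits)", False)],
}


def quantify(text):
    """Return (ALL_WORDS, KEY_WORDS, KEY_WORDS by subcategory) for one subsection."""
    all_words = len(re.findall(r"\b\w+\b", text))
    by_category = {}
    for category, terms in KEYWORD_DICTIONARY.items():
        count = 0
        for pattern, case_sensitive in terms:
            flags = 0 if case_sensitive else re.IGNORECASE
            # Word boundaries keep "IRS" from matching inside longer words.
            regex = re.compile(r"\b(?:%s)\b" % pattern, flags)
            count += sum(1 for _ in regex.finditer(text))
        by_category[category] = count
    key_words = sum(by_category.values())
    return all_words, key_words, by_category
```

Marking the acronym as case sensitive means that a lowercase "irs" is not counted, while the variations of "lease" are matched regardless of capitalization.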

Appendix 2

HTML Tags Used to Identify the Text from Applicable Subsections of Form 10-K.

This appendix provides the specific HTML tags that were used to determine the “emphasizing score” and the “segmentation score.” These tags help us fully extract only those subsections of the Form 10-K that are used in our analysis. For example, the emphasizing tags help to identify when text is bolded or underlined, and the separation tags help to identify when text is segmented by paragraph breaks or section headings. This procedure is fully explained in Appendix 1, Textual Analysis Data Collection Process.

Emphasizing tags | Separation tags
[The tag names in this table are not recoverable; the only legible entries are three rows of the form “style = (bold|underline)”.]
Appendix 3

See Table 9.

Table 9 Key words list by risk category


Campbell, J.L., Chen, H., Dhaliwal, D.S. et al. The information content of mandatory risk factor disclosures in corporate filings. Rev Account Stud 19, 396–455 (2014).
