Introduction

Regulations have drawn increasing attention from policymakers in recent years. The World Trade Organization (WTO) identifies regulatory barriers as the most salient obstacle to economic globalization in the past decade.Footnote 1 Researchers show that regulations have become a powerful tool wielded by multinational firms to protect their special interests (Büthe and Mattli 2011; Carpenter and Moss 2013; Gulotty 2020; Perlman 2019; Kennard 2020). Nonetheless, because regulatory regimes are complex, understanding the causes and effects of regulatory barriers often turns out to be very challenging. The obstacle is especially pronounced when scholars attempt to study regulatory barriers systematically. Regulations vary considerably by country. It is generally infeasible to assess how stringent regulations are by studying their texts, as understanding the text of a regulation requires in-depth knowledge of how an industry operates in a specific country. To make matters worse, the distributional effects of regulations are often so subtle that researchers cannot tell who the winners and losers are by reading the texts alone. For example, rules on automobile parts in a country apply to all auto manufacturers, domestic or foreign. Yet the impact of such regulations is far from uniform across manufacturers. To a large extent, the difficulty of identifying and measuring regulatory barriers has limited research progress in this area. In this paper, I propose a novel approach to measuring regulatory barriers at the country-year level. Leveraging the information included in the annual reports of U.S. public companies, I build an original database of observed regulatory barriers and use it to estimate the barrier level of each country with a dynamic two-level item response theory model. This new approach reveals empirical patterns that were previously unobservable to scholars.

My main source of information is the United States Securities and Exchange Commission (SEC). The SEC requires all U.S. public firms to disclose information about their operations in their annual reports (i.e., 10-K forms). The annual reports submitted to the SEC are much more detailed than the annual reports published to share information with the firms’ shareholders.Footnote 2 Specifically, federal laws require every public firm to list in its 10-K form all major factors that may adversely affect its performance. For that reason, companies that are subject to regulatory barriers are required to report the difficulties they encounter. This disclosure tells us something we would otherwise not know: whether a firm is adversely affected by specific regulations. As explained by Kennard (2020), regulations do not affect all firms in an industry evenly; instead, rules usually tilt the playing field in favor of some firms and against others. It is almost impossible for scholars to tell how the “playing field” is tilted without acquiring information from the “players” (i.e., firms). Thus, the information on the adverse impacts of regulations in firms’ 10-K forms can be valuable for researchers.

To quantify the level of regulatory barriers for each country, I first convert the information in these reports into a well-structured dataset using text processing techniques. Then, I build a dynamic item response theory model to estimate the level of regulatory barriers at the country-year level. I believe that this new approach contributes to existing knowledge in three respects:

  1. Compared with traditional survey-based measurement, my approach can easily be extended to more countries and a longer time period. More importantly, it can estimate the entry-deterrence effects of regulations under reasonable assumptions. Gulotty (2020) argues that one of the major effects of regulations is entry deterrence: large firms advocate more stringent industry standards to forestall the entry of new firms by raising the fixed cost of operation. For example, raising the quality standard of a product increases the cost of production, which can eliminate small producers and deter other small firms from entering the market. However, existing survey-based measures only survey firms that are active in the market and hence cannot capture entry-deterrence effects. In addition, because surveys are costly to run, the coverage of existing survey measures is far from ideal. For example, the NTM Business Surveys run by the International Trade Centre cover only around 25 countries, which do not include large economies such as the United States, China, Japan, Germany, South Korea, India, and many others.Footnote 3

  2. The proposed approach is more micro-founded than data collected by international organizations, such as the Specific Trade Concerns (STCs) data compiled by the WTO. WTO members and observers can raise STCs against other countries if they find those countries’ laws and regulations discriminatory. Scholars use whether a country is subject to STCs to measure whether it imposes regulatory barriers. However, it is well documented that countries’ behavior in the WTO is shaped by political concerns (Davis 2012). STC records are therefore likely to reflect both domestic regulations and international politics, which makes them problematic for political scientists studying countries’ political behavior. By contrast, the firm-level information in annual reports is less susceptible to international and domestic political factors. For that reason, my proposed measure should be better suited for studies in political science.

  3. Existing work has made promising progress in estimating non-tariff barriers to trade (e.g., Cooley (2019); Martini (2020)). The proposed approach complements these contributions by adding information about barriers to foreign direct investment and to the operations of foreign subsidiaries.

My research directly addresses recent political and economic shifts in Asia and beyond. Since the start of the U.S.-China trade war in 2018, there has been a notable surge in governmental discrimination against foreign companies. The U.S. government has consistently accused the Chinese authorities of imposing unjust regulatory measures on American firms, a claim that the Chinese government vehemently denies. However, assessing the extent of discrimination by the Chinese government against foreign firms is a complex task, especially given the diverse range of foreign entities operating in China. Relying solely on reports from a selective sample of companies may introduce bias, hindering a comprehensive understanding of China’s foreign investment landscape. This paper aims to fill this gap and offer insights into this matter. Specifically, the proposed measure quantifies how U.S. companies perceive their treatment by the Chinese government and compares this treatment across major global economies. In my analysis, I find no systematic evidence that Chinese regulatory regimes are consistently biased against U.S. firms relative to other countries. The reported levels of regulatory barriers faced by U.S. companies in China appear to be on par with those encountered in major European economies such as Germany and France. This evidence contributes to our understanding of the U.S.-China relationship and can inform the development of new frameworks in this evolving era of international relations.

Lastly, this paper does not aim to provide a theoretical argument or answer a causal question. However, I believe that my attempt to construct an informative measure can serve as a foundation for future theory-building and causal identification. As mentioned previously, a significant obstacle, if not the single most damaging one, for research in regulatory politics is the lack of high-quality data. This exercise, despite its flaws, showcases a potential pathway for future research. I do not expect this single paper to solve a long-standing academic problem; rather, I hope it can inspire more innovative work to tackle it.

Information in 10-K forms

The U.S. federal securities laws require public companies to disclose information on an ongoing basis. All U.S. public firms must submit annual reports, known as Form 10-K, which provide a comprehensive overview of the company’s business and financial condition. In the 10-K form, a company must offer a detailed description of its main products or services, major subsidiaries, relevant regulations, major competition, and any possible risks associated with its business. Therefore, 10-K forms contain information on a comprehensive set of regulatory barriers observed by international business practitioners.

To better illustrate the type of information included in 10-K forms, I will present some examples here. Many firms report that they have encountered restrictive laws or regulations in certain countries:

  1. Wholesale drug company Nu Skin Enterprises reports in its 2006 10-K form: “laws and regulations in Japan, Korea and China are particularly restrictive and difficult.”

  2. Farm machinery producer Deere & Co reports: “recent industry and regulatory changes have negatively impacted John Deere’s competitive position in the potential high growth Russian markets during the fiscal year.”

  3. Ophthalmic goods producer Cooper Companies Inc states: “we have difficulty gaining market share in countries such as Japan because of regulatory restrictions and customer preferences.”

  4. Medical equipment producer Immucor Inc states: “in addition to the U.S., Europe, Canada and Japan, there are multiple countries worldwide that also impose regulatory barrier to market entry.”

  5. Insurance company Gerova Financial Group Ltd states: “The Chinese and Vietnamese governments have imposed regulations in various industries, including the leisure and hospitality and financial services industries, that would limit foreign investors equity ownership or prohibit foreign investments altogether in companies that operate in such industries.”

  6. Software company Versant Corp reports that it faces “burdens of complying with a variety of foreign laws, including more protective employment laws affecting our sizable workforce in Germany”.

  7. Technology company Kenexa Corp reports its concerns about intellectual property: “Further, the laws of some countries, and in particular India, where we develop much of our intellectual property, do not protect proprietary rights to the same extent as the laws of the United States.”

  8. Technology company National Instruments Corp reports the difficulties of doing business in Hungary: “In response to significant and frequent changes in the corporate tax law, the unstable political environment, a restrictive labor code, the volatility of the Hungarian forint relative to the U.S. dollar and increasing labor costs, we have doubts as to the long term viability of Hungary as a location for our manufacturing and warehousing operations.”

These examples suggest that reporting regulatory barriers in 10-K forms is a common practice among firms. However, one may question whether a firm’s report of a regulation is a good indicator that the firm is adversely affected by it. In other words, the presence of a regulation does not necessarily imply that it constitutes a barrier for a firm. To address this concern, I present two more cases, involving three firms, showing that the information on regulations in annual reports is in fact related to the distributional effects of regulations.

  1. In 2005, China passed a regulation mandating that all truck manufacturers install electronic throttle controls to reduce emissions. This regulation affected all major truck producers serving the Chinese market. Cummins, a U.S.-based natural gas engine producer with joint ventures in China, reported in its 10-K form that “These (earning) increases were partially offset by decreased earnings from DCEC (one of Cummins’ joint ventures in China) of $7 million due to reduced demand in China’s truck market in response to regulatory changes.” Cummins’ operations were evidently harmed by the newly implemented regulation. In the same year, however, another U.S.-based automobile parts manufacturer, Williams Control, found the new regulation an opportunity rather than a hurdle. In its 10-K form, it reported: “Increases in off-road volumes in China primarily results from adoption of more stringent emissions standards, which mandate the inclusion of electronic throttle controls on new vehicles, thus allowing us to expand our customer base in this market.” From these two pieces of information, we can immediately tell that Cummins is a loser under the new regulation while Williams Control is a winner.

  2. Also in 2005, cosmetics and fragrance producer Inter Parfums reported that existing regulations in France had little effect on its operations: “our fragrances that are manufactured in France are subject to certain regulatory requirements of the European Union, but as of the date of this report, we have not experienced any material difficulties in complying with such requirements.”

The cases presented above demonstrate that what firms report is closely related to how they are affected by the reported regulations. If a firm finds that a law positively affects its business, it has strong incentives to report this truthfully, since its 10-K forms are available to all shareholders. Meanwhile, if a firm encounters a harmful regulatory barrier, it may or may not want to share this with the public, but federal laws make disclosure mandatory. For these reasons, I argue that firms’ annual reports are both informative and truthful, which makes them ideal sources for studies of regulatory politics.

The truthfulness of 10-K forms

In this subsection, I further justify the truthfulness of 10-K forms, as the plausibility of my proposed barrier index depends crucially on the accuracy of their information.

Since 10-K forms can significantly affect a firm’s stock market performance, one may be suspicious of the authenticity of their information. The scrutiny is especially warranted when we evaluate how firms report the regulatory barriers they encounter, given the inherent ambiguity of regulations. Below, I establish the reliability of 10-Ks by describing how public companies write these reports and how the SEC enforces the disclosure requirements, so that readers can be more confident in using 10-Ks as a data source.

Public firms have two main concerns when writing their annual reports: 1) shareholders instituting legal action against them for financial losses resulting from undisclosed issues, and 2) penalties from the SEC if they are found to have withheld information. For example, in 2013, stockholders sued Dole Food Company for failing to disclose positive information. In the end, the judge found that the company had unfairly kept its stock price down.Footnote 4 In this monitoring process, shareholders serve as a “fire alarm” that forces public firms to disclose both positive and negative information honestly (McCubbins and Schwartz 1984).

In addition to shareholders’ monitoring efforts, the SEC actively takes measures to ensure the authenticity of companies’ annual reports. After firms submit their 10-K forms, SEC staff review the submissions to monitor and enhance companies’ compliance with disclosure requirements. If the review finds the disclosed information deficient in explanation or clarity, the staff issue comments asking the company to resolve the issues. Moreover, the SEC has made the disclosure of qualitative information (such as the risk factor section) a focus of its corporate filing reviews (Campbell et al. 2014; Brown et al. 2018). From 2013 to 2017, SEC Chair Mary Jo White emphasized the effectiveness of disclosure and urged firms to include more relevant qualitative information in their filings.

In practice, the SEC also imposes significant penalties on firms that fail to disclose crucial information. Facebook, for example, agreed to pay $100 million for making misleading disclosures regarding its user privacy policy.Footnote 5 The SEC’s complaint alleges that Facebook failed to disclose its risk of a data breach even after it had discovered the misuse of its users’ information in 2015.

The above evidence suggests that the SEC is well aware of the problem of dishonest reporting and has taken a series of actions to enforce filing requirements. According to Brown et al. (2018), firms improve their filing quality after the SEC issues comments to them during the review process.

For these reasons, public firms usually adopt two strategies to minimize legal risk: 1) they promptly add issues that might adversely affect their operations to their annual reports, and 2) once added, issues are rarely removed from future versions of the reports. I argue that these two features make annual reports ideal for measuring regulatory barriers. First, misreporting in annual reports is very costly for public firms, so researchers can be confident in the quality of the information. Second, firms’ reluctance to remove negative issues from their reports makes the reported barriers easier to interpret: barrier estimates based on the reported information can be viewed as cumulative regulatory barriers rather than changes in barrier levels. As a result, 10-K forms can be a valuable source for international political economy research.

Text processing

To quantify the information in the annual reports, I need to identify instances where a firm reports being adversely affected by a specific piece of regulation. The text in annual reports is not well-structured: different firms have different reporting formats. In addition, the annual reports often contain several hundred pages of text filled with technical terms. Thus, converting the annual reports to a well-structured dataset is challenging.

In this paper, I adopt the following strategy to create a feasible data processing pipeline, which combines a dictionary-based method with supervised learning:

  1. First, I break each report into sentences using a regular expression.

  2. Second, I select sentences using a dictionary of regulation-related words: “regulation, regulator, regulatory, law, standard, quota, approval, policy, intellectual property, requirement, permit, license”. In addition, I filter out sentences that do not contain a country name.

  3. Next, with the help of two research assistants, I code a random sample of 3,846 sentences by hand. Each research assistant independently labels half of the sampled sentences, and I manually validate a random sample of the annotated instances. If a sentence clearly indicates that the firm is adversely affected by a regulation, it is coded as “1” (i.e., positive); otherwise, it is coded as “0” (i.e., negative).

  4. Finally, I train a neural network model to label the rest of the corpus. If a report contains any sentence that the model predicts to be positive, I code the entire document as the firm reporting barriers in that country.
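Steps 1 and 2 of the pipeline above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the paper: the sentence-boundary regular expression and the short `COUNTRIES` list are hypothetical stand-ins (the exact pattern and country list are not reported here), while the dictionary terms are copied from step 2. Matching simple plurals (e.g., “laws”) is also an assumption.

```python
import re

# Regulation-related dictionary from step 2.
REGULATION_TERMS = [
    "regulation", "regulator", "regulatory", "law", "standard", "quota",
    "approval", "policy", "intellectual property", "requirement",
    "permit", "license",
]

# Hypothetical, deliberately short country list; a real run would use all country names.
COUNTRIES = ["China", "Japan", "Korea", "Germany", "India", "Russia", "Hungary"]

# Hypothetical splitter: break after ".", "!" or "?" when whitespace and a capital letter follow.
# This keeps abbreviations such as "U.S. law" intact but is not a full sentence tokenizer.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text):
    return [s.strip() for s in SENTENCE_BOUNDARY.split(text) if s.strip()]


def has_regulation_term(sentence):
    # Case-insensitive whole-word match; "s?" also admits simple plurals ("laws", "permits").
    return any(
        re.search(r"\b" + re.escape(term) + r"s?\b", sentence, re.IGNORECASE)
        for term in REGULATION_TERMS
    )


def countries_in(sentence):
    # Case-sensitive, so lowercase "china" (porcelain) is not counted as a country.
    return [c for c in COUNTRIES if re.search(r"\b" + re.escape(c) + r"\b", sentence)]


def candidate_sentences(report_text):
    """Sentences that contain a regulation term AND at least one country name."""
    return [
        s for s in split_sentences(report_text)
        if has_regulation_term(s) and countries_in(s)
    ]


report = (
    "Laws and regulations in Japan, Korea and China are particularly restrictive and difficult. "
    "Our revenue grew in Japan. We comply with U.S. law."
)
print(candidate_sentences(report))  # only the first sentence survives both filters
```

The second sentence names a country but no regulation term, and the third has a regulation term but no listed country, so both are dropped before human coding or classification.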

This strategy strikes a balance between accuracy and feasibility. Annotating entire annual reports with human coders is almost impossible, as the reports are too long and highly technical. The task is greatly simplified, however, if we code only specific sentences instead of the entire report. Admittedly, filtering documents with a dictionary of regulation-related words introduces errors that cannot be measured. I leave improving the text processing pipeline to future research.

Training a prediction model that can handle complex sentence structures with a limited sample size is the next challenge. In this paper, I use a state-of-the-art neural language model, “BERT”, for this task. BERT is a neural language model pre-trained by researchers at Google (Devlin et al. 2018). It is designed to understand the context of a sentence and predict the words that fit that context. The model is pre-trained on a vast corpus that includes English Wikipedia (2,500 million words) and BooksCorpus (800 million words). As a result, BERT is a powerful tool for many text classification tasks.

I train a classifier on my sample of 3,846 sentences using BERT as the underlying workhorse model (i.e., fine-tuning). I hold out 400 sentences as the test set and use the remaining sentences for training. After training, the model achieves satisfactory prediction accuracy. Table 1 reports the results for the 400 test sentences:

Table 1 Model Performance: Confusion Matrix

Since the training set has very few positive examples (i.e., only a few sentences contain information on regulatory barriers), monitoring both the false positive and false negative rates is crucial. In such a sparse prediction task, a model that blindly returns “0” can still achieve seemingly high accuracy simply because positive examples account for a tiny proportion of the data. Fortunately, my model distinguishes positive from negative examples well, as illustrated by its low false negative rate. As shown in Table 1, both the false positive rate and the false negative rate are below 15%, demonstrating that the model does not blindly assign “0”.
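The point about sparse labels can be made concrete with a small helper that computes accuracy, the false positive rate, and the false negative rate from a confusion matrix. The counts below are hypothetical and are not the entries of Table 1; they only show how a classifier that always answers “0” can post high accuracy while missing every positive sentence.

```python
def error_rates(tp, fp, tn, fn):
    """Accuracy, false positive rate and false negative rate from a 2x2 confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    fpr = fp / (fp + tn) if fp + tn else float("nan")  # share of true "0"s labeled "1"
    fnr = fn / (fn + tp) if fn + tp else float("nan")  # share of true "1"s labeled "0"
    return accuracy, fpr, fnr


# Hypothetical test set of 400 sentences with 40 positives (NOT the counts in Table 1).
always_zero = error_rates(tp=0, fp=0, tn=360, fn=40)
fitted = error_rates(tp=35, fp=30, tn=330, fn=5)
print(always_zero)  # (0.9, 0.0, 1.0)
print(fitted)
```

The always-“0” classifier reaches 90% accuracy with a false negative rate of 1, which is why accuracy alone is uninformative here.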

Using the fine-tuned BERT model as the machine reader, I assign 0 or 1 to every sentence in the corpus. If any sentence in an annual report is predicted to be positive, I code the report as the firm reporting regulatory barriers. Then, I extract country names by matching the text against a list of all country names. The final data form an array of I firms and J countries over T years. The remaining problem is how to aggregate information from the firm-country-year triad level to the country-year level, since we need to compare regulatory barrier levels both across countries and over time. A naive approach is to take the mean reporting rate at the country-year level, but this approach requires strong assumptions that are not supported by theory. Therefore, I use a widely used latent factor model to estimate regulatory barriers, similar to the solution adopted by Hollyer et al. (2014).
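The aggregation steps described above can be sketched as follows, assuming each sentence-level prediction carries the firm, the year, and the country names matched in that sentence (whether country names are matched in the positive sentence or elsewhere in the report is not specified, so this is an assumption). The firm names are hypothetical. The second function is the naive country-year mean mentioned above; it counts only firms that operate in a country, so it ignores both non-entry and firm heterogeneity.

```python
def triad_flags(sentence_predictions):
    """Collapse sentence-level predictions to firm-country-year barrier flags.

    sentence_predictions: iterable of (firm, year, countries, label), where countries
    lists the country names matched in the sentence and label is the classifier's 0/1.
    Returns the set of (firm, country, year) triads with at least one positive sentence.
    """
    flags = set()
    for firm, year, countries, label in sentence_predictions:
        if label == 1:
            for country in countries:
                flags.add((firm, country, year))
    return flags


def naive_country_year_mean(flags, operating):
    """Share of operating firms that report a barrier, for each (country, year).

    operating: dict mapping (country, year) to the set of firms active there.
    """
    return {
        (country, year): sum((f, country, year) in flags for f in firms) / len(firms)
        for (country, year), firms in operating.items()
        if firms
    }


preds = [
    ("FirmA", 2006, ["Japan", "China"], 1),
    ("FirmA", 2006, ["Japan"], 0),
    ("FirmB", 2006, ["Japan"], 0),
]
flags = triad_flags(preds)
shares = naive_country_year_mean(
    flags, {("Japan", 2006): {"FirmA", "FirmB"}, ("China", 2006): {"FirmA"}}
)
print(shares)  # {('Japan', 2006): 0.5, ('China', 2006): 1.0}
```

Here China scores higher than Japan only because a single firm operates there, which illustrates how fragile the raw share is without modeling entry and firm-specific tolerance.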

The statistical model

Setup

The statistical model aims to assign a scalar-valued index to each country in each year while accounting for major firm and country heterogeneity. I propose an item response theory model to estimate this quantity of interest. Such models are widely used to estimate the ideological positions of legislators (Clinton et al. 2004) and have become increasingly popular in studies of international relations (e.g., Hollyer et al. (2014); Bailey et al. (2017)).

First, I convert the firm-reported barriers into a well-structured dataset (Table 2). In a given year t, we have an \(I \times J\) matrix \(\{U_{ijt}\}\) with \(i \in \{1, 2, 3, \cdots , I\}\) and \(j \in \{1, 2, 3, \cdots , J\}\), where I denotes the total number of firms in the sample and J the number of countries:

Table 2 Data Structure

The term \(U_{ijt}\) can take three possible values:

$$\begin{aligned} U_{ijt} =\left\{ \begin{array}{ll} 3 &{} \text {firm}\ i\ \text {does not enter country}\ j\ \text {in year}\ t\\ 2 &{} \text {firm}\ i\ \text {enters country}\ j\ \text {AND reports barrier in year}\ t\\ 1 &{} \text {firm}\ i\ \text {enters country}\ j\ \text {AND does not report barrier in year}\ t \end{array}\right. \end{aligned}$$
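A minimal sketch of how this three-category outcome can be coded for one year, assuming entry and reporting status are already available as sets of (firm, country) pairs; how entry is determined is not described in this section, and the firm names are hypothetical.

```python
NO_REPORT, REPORT, NO_ENTRY = 1, 2, 3  # codes for U_ijt


def encode_u(entered, reported):
    """Map one firm-country-year observation to U_ijt in {1, 2, 3}."""
    if not entered:
        return NO_ENTRY  # by definition, non-entry is coded 3 regardless of reporting
    return REPORT if reported else NO_REPORT


def u_matrix(firms, countries, entered, reported):
    """Build the I x J matrix for one year t: one row per firm, one column per country.

    entered, reported: sets of (firm, country) pairs observed in that year.
    """
    return [[encode_u((f, c) in entered, (f, c) in reported) for c in countries] for f in firms]


U_t = u_matrix(
    firms=["FirmA", "FirmB"],
    countries=["China", "Japan"],
    entered={("FirmA", "China"), ("FirmA", "Japan"), ("FirmB", "Japan")},
    reported={("FirmA", "China")},
)
print(U_t)  # [[2, 1], [3, 1]]
```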

I assume that each country j has a regulatory barrier level \(\theta _{jt}\) in year t that is observable to firms but not to researchers. Each firm-country dyad (ij) has two dyadic characteristics: an entry cutoff \(\alpha ^E_{ij}\) and a reporting cutoff \(\alpha ^R_{ij}\), which I explain in more detail below. The observed firm behavior \(U_{ijt}\) is a function of the country barrier level \(\theta _{jt}\) and the dyad-specific reporting and entry cutoffs. Note that the cutoffs are time-invariant, a strong assumption imposed to avoid model identification problems.

The intuition behind this setup is straightforward. Each firm i observes the regulatory barrier level of country j in year t (i.e., \(\theta _{jt}\)) and chooses an action. The reporting and entry cutoffs capture firms’ tolerance of a country’s barriers. The concept of tolerance abstracts from firms’ real-world calculations, which often include market size, cultural similarity, and geographical distance. Because firms’ decisions are affected by a multitude of factors, I fold any factors that shape firms’ decisions but are not components of a country’s regulatory barriers into the tolerance terms, to alleviate concerns about omitted variable bias caused by firm and country heterogeneity.

Specifically, if a country’s barrier level is higher than a firm’s entry tolerance cutoff (i.e., \(\alpha ^E_{ij}\)), the firm will not operate in that country; if, on the other hand, the country’s barrier level is below the firm’s entry cutoff but above its reporting cutoff, the firm will enter the country but report the barrier in its annual reports; lastly, if the country’s barrier level is lower than both the entry and reporting cutoffs, the firm will operate in that country and will not report any barrier in its annual reports. To fix ideas, I formalize this logic below.

For ease of exposition (and model estimation), I define a random latent firm-country barrier level \(U_{ijt}^*\):

$$\begin{aligned} U^*_{ijt} = \theta _{j,t} + \epsilon _{ijt} \end{aligned}$$

That is, each firm i may perceive the barrier level of country j in year t slightly differently. The random disturbance \(\epsilon _{ijt}\) captures any idiosyncrasies at the firm-country-year level. Readers can interpret the errors as factors that are unobservable to researchers but affect firms’ perception of countries’ regulatory barriers. Following common practice in the scaling literature, I assume that the disturbance term follows a standard normal distribution \(N(0,1)\).

Each firm i compares the latent firm-country barrier level \(U_{ijt}^*\) with its entry and reporting cutoffs and chooses an action according to the following decision rule:

  • If the latent firm-country barrier level is higher than dyad (ij)’s entry cutoff (i.e., \(U^*_{ijt} > \alpha _{ij}^E\)), firm i will not enter country j (i.e., \(U_{ijt} = 3\)).

  • If the latent barrier level is at or below the dyad’s entry cutoff but higher than its reporting cutoff (i.e., \(\alpha ^R_{ij} < U^*_{ijt} \le \alpha _{ij}^E\)), firm i will enter country j but report encountering a barrier (i.e., \(U_{ijt} = 2\)).

  • If the latent barrier level is at or below dyad (ij)’s reporting cutoff (i.e., \(U^*_{ijt} \le \alpha _{ij}^R\)), firm i will enter country j and not report any barrier (i.e., \(U_{ijt} = 1\)).

Note that the implicit assumption is that the entry cutoff is greater than the reporting cutoff (i.e., \(\alpha _{ij}^E > \alpha ^R_{ij}\)).
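The decision rule can be simulated directly, which is a useful check on the logic: draw \(U^*_{ijt} = \theta_{jt} + \epsilon_{ijt}\), compare it with the two cutoffs, and compare the simulated frequencies with the implied probabilities \(\Phi(\alpha^R_{ij} - \theta_{jt})\), \(\Phi(\alpha^E_{ij} - \theta_{jt}) - \Phi(\alpha^R_{ij} - \theta_{jt})\), and \(1 - \Phi(\alpha^E_{ij} - \theta_{jt})\), where \(\Phi\) is the standard normal CDF. The parameter values below are arbitrary illustrations.

```python
import random
from statistics import NormalDist


def simulate_u(theta, alpha_r, alpha_e, rng):
    """Draw U_ijt from the threshold model for one firm-country-year."""
    if not alpha_r < alpha_e:
        raise ValueError("the model assumes alpha_R < alpha_E")
    u_star = theta + rng.gauss(0.0, 1.0)  # U* = theta_jt + eps_ijt, eps ~ N(0, 1)
    if u_star <= alpha_r:
        return 1  # enters, reports no barrier
    if u_star <= alpha_e:
        return 2  # enters and reports a barrier
    return 3      # does not enter


def category_probs(theta, alpha_r, alpha_e):
    """Implied probabilities P(U = k) under the standard normal disturbance."""
    phi = NormalDist().cdf
    p1 = phi(alpha_r - theta)
    p3 = 1.0 - phi(alpha_e - theta)
    return {1: p1, 2: 1.0 - p1 - p3, 3: p3}


rng = random.Random(42)
draws = [simulate_u(0.5, 0.0, 1.5, rng) for _ in range(20000)]
freq = {k: draws.count(k) / len(draws) for k in (1, 2, 3)}
print(freq)
print(category_probs(0.5, 0.0, 1.5))  # about {1: 0.309, 2: 0.533, 3: 0.159}
```

Raising \(\theta_{jt}\) while holding the cutoffs fixed moves probability mass toward category 3 (no entry), which is how the model reads non-entry as a signal of high barriers.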

Since the barrier levels \(U^*_{ijt}\) and \(\theta _{jt}\) are unobservable to researchers, the proposed approach leverages the connection between regulatory barriers and firms’ entry and reporting behavior to make inferences about barrier levels. It is worth noting that, after taking firms’ tolerance levels into account, the model interprets a firm not entering a country as a signal of prohibitive regulatory barriers. This is admittedly the strongest assumption of the model, as it fails to distinguish between a country being unattractive and a country imposing a significant entry barrier. I try to address this concern by including only the most globally active U.S. firms, as explained in detail in the next subsection. Nonetheless, readers should be cautious when interpreting the results.

This model can be translated into a statistical model by noting the relationship between the theoretical model and our observed data.

$$\begin{aligned} U_{ijt} = \left\{ \begin{array}{lll} 1 \quad \text {if} \quad U^*_{ijt} \le \alpha _{ij}^R \\ 2 \quad \text {if}\quad \alpha _{ij}^R < U^*_{ijt} \le \alpha _{ij}^E \\ 3 \quad \text {if} \quad U^*_{ijt} > \alpha _{ij}^E \end{array}\right. \end{aligned}$$

Denote the set of \(\{\theta _{j,t}\}\) as \(\Theta\) and the set of \(\{\alpha _{ij}^E\}\) and \(\{\alpha _{ij}^R\}\) as \(\alpha ^E\) and \(\alpha ^R\). Let U denote the observed data and \(U^*\) the augmented data. Then, we can write the full data likelihood with data augmentation as:

$$\begin{aligned} L\left(\Theta , \alpha ^E, \alpha ^R | U, U^*\right) = \prod _{t=1}^T \prod _{j = 1}^J \prod _{i = 1}^I \left[I\left(U_{ijt} = 1, U^*_{ijt} \le \alpha _{ij}^R\right) + I\left(U_{ijt} = 2, \alpha _{ij}^R < U^*_{ijt} \le \alpha _{ij}^E\right) \right.\\ \left.+ I\left(U_{ijt} = 3, U^*_{ijt} > \alpha _{ij}^E\right)\right] \cdot \phi _{\theta _{jt}}\left(U_{ijt}^*\right) \end{aligned}$$

where \(\phi _{\theta _{jt}}(\cdot )\) denotes the probability density function of \(N(\theta _{jt}, 1)\).
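To make the augmented likelihood concrete, the sketch below evaluates its logarithm for a collection of firm-country-year cells stored in dictionaries (a hypothetical data layout). Each cell contributes the log density of \(N(\theta_{jt}, 1)\) at \(U^*_{ijt}\), and the whole likelihood is zero (log \(-\infty\)) as soon as one \(U^*_{ijt}\) falls outside the interval implied by \(U_{ijt}\) and the cutoffs.

```python
import math


def log_full_data_likelihood(U, U_star, theta, alpha_r, alpha_e):
    """Log of the augmented-data likelihood.

    U, U_star: dicts keyed by (i, j, t); theta: dict keyed by (j, t);
    alpha_r, alpha_e: dicts keyed by (i, j).
    Returns -inf if any indicator term is zero (U* inconsistent with U and the cutoffs).
    """
    log_norm_const = -0.5 * math.log(2.0 * math.pi)
    total = 0.0
    for (i, j, t), u in U.items():
        x = U_star[(i, j, t)]
        lo, hi = alpha_r[(i, j)], alpha_e[(i, j)]
        consistent = (
            (u == 1 and x <= lo) or (u == 2 and lo < x <= hi) or (u == 3 and x > hi)
        )
        if not consistent:
            return -math.inf
        # log density of N(theta_jt, 1) evaluated at U*_ijt
        total += log_norm_const - 0.5 * (x - theta[(j, t)]) ** 2
    return total


cut_r, cut_e = {(0, 0): 0.0}, {(0, 0): 1.0}
ok = log_full_data_likelihood({(0, 0, 0): 2}, {(0, 0, 0): 0.5}, {(0, 0): 0.5}, cut_r, cut_e)
bad = log_full_data_likelihood({(0, 0, 0): 2}, {(0, 0, 0): 1.5}, {(0, 0): 0.5}, cut_r, cut_e)
print(ok, bad)
```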

The high dimensionality of the model poses a considerable challenge for efficient estimation. Therefore, I use a Gibbs sampler with a Kalman filter to improve estimation efficiency. Existing studies in political science and statistics have demonstrated the advantages of such an estimation strategy over a generic Gibbs sampler or a Metropolis-Hastings sampler (Martin and Quinn 2002; West and Harrison 2006), as it efficiently uses the time-series information.

Posterior

Recall that the model aims to estimate the barrier level of each country in each year. I obtain this quantity by computing the mean of the posterior distribution of each \(\theta _{jt}\).

First, I specify the prior distributions, which are needed to compute the posterior distributions:

$$\begin{aligned} \alpha _{ij}^R|\alpha _{ij}^E{} & {} \sim N_{(-\infty , \alpha _{ij}^E)}\left(0,5^2\right) \\ \alpha _{ij}^E|\alpha _{ij}^R{} & {} \sim N_{(\alpha _{ij}^R, \infty )}\left(0,5^2\right) \\ \theta _{j,0}{} & {} \sim N\left(0,5^2\right) \end{aligned}$$

Priors are designed to capture our existing knowledge of these parameters. Since we lack information on the likely values of these parameters, I choose diffuse priors with large variances. However, it is worth noting that the likelihood function has flat regions that cannot be distinguished using only the data. Specifically, the model may equally well assign a large positive number (e.g., 100) or a large negative number (e.g., \(-100\)) to a country’s barrier level, as we have not told the model whether a positive number signifies a higher barrier level or a negative number does. To overcome this issue, I set semi-informative priors for two countries in the sample: Russia’s barrier level is positive while Singapore’s is negative. This tells the model that a positive number entails a higher barrier level than a negative one, as Russia imposes a higher level of barriers on U.S. firms than Singapore, according to anecdotal evidence.

Next, conditional on the current values of the parameters, I sample the augmented data \(U_{ijt}^*\) from its conditional distribution:

$$\begin{aligned} p(U_{ijt}^* | U, \Theta , \alpha ^E, \alpha ^R) = \left\{ \begin{array}{ll} N_{(-\infty , \alpha _{ij}^R]}(\theta _{jt}, 1) \quad &{}\text {if} \quad U_{ijt} = 1 \\ N_{(\alpha _{ij}^R, \alpha _{ij}^E]}(\theta _{jt}, 1) \quad &{}\text {if} \quad U_{ijt} = 2\\ N_{( \alpha _{ij}^E, \infty )}(\theta _{jt}, 1) \quad &{}\text {if} \quad U_{ijt} = 3 \\ \end{array}\right. \end{aligned}$$

The notation \(N_{(a,b)}\) denotes a normal distribution truncated to the support \((a, b)\). This is the standard data augmentation step for models with latent variables (Albert and Chib 2017).
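The data augmentation step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this paper: it assumes the three-category coding in the conditional distribution above and a unit-variance latent variable centered at \(\theta_{jt}\), and it draws from the truncated normal by the inverse-CDF method using only the standard library (`statistics.NormalDist`). The function names `sample_truncated_normal` and `sample_latent` are hypothetical.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist(0.0, 1.0)


def sample_truncated_normal(mean, sd, lower, upper, rng=random):
    """Draw from N(mean, sd^2) truncated to (lower, upper] by inverse-CDF sampling.

    Adequate for a sketch; truncation deep in a tail would call for a
    specialized sampler (e.g., rejection sampling).
    """
    p_lo = _STD_NORMAL.cdf((lower - mean) / sd)  # cdf(-inf) evaluates to 0.0
    p_hi = _STD_NORMAL.cdf((upper - mean) / sd)  # cdf(+inf) evaluates to 1.0
    eps = 1e-12
    u = min(max(rng.uniform(p_lo, p_hi), eps), 1.0 - eps)  # inv_cdf needs 0 < u < 1
    draw = mean + sd * _STD_NORMAL.inv_cdf(u)
    return min(max(draw, lower), upper)  # guard against floating-point spill-over


def sample_latent(u_obs, theta, alpha_r, alpha_e, rng=random):
    """Draw U*_ijt given the observed category U_ijt in {1, 2, 3}.

    Category 1: U* <= alpha_R; 2: alpha_R < U* <= alpha_E; 3: U* > alpha_E.
    The latent variable is centered at theta_jt with unit variance.
    """
    bounds = {
        1: (float("-inf"), alpha_r),
        2: (alpha_r, alpha_e),
        3: (alpha_e, float("inf")),
    }
    lower, upper = bounds[u_obs]
    return sample_truncated_normal(theta, 1.0, lower, upper, rng)


# Usage: one draw for a cell observed in category 2 (illustrative values).
rng = random.Random(42)
z = sample_latent(2, theta=0.3, alpha_r=-0.5, alpha_e=1.0, rng=rng)
```

Passing an explicit `random.Random` instance keeps the chain reproducible; in a full sampler this draw would be repeated for every firm-country-year cell in each iteration.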

After sampling the latent barrier \(U_{ijt}^*\), I sample the reporting cutoff \(\alpha _{ij}^R\) from its posterior distribution:

$$\begin{aligned} f\left(\alpha _{ij}^R | U, U^*, \Theta , \alpha ^E\right) & \propto L\left(\Theta , \alpha ^E, \alpha ^R | U, U^*\right) \cdot p\left(\alpha _{ij}^R | \alpha _{ij}^E\right) \\ & \propto \prod _{t = 1}^T \left[I\left(U_{ijt} = 1, U^*_{ijt} \le \alpha _{ij}^R\right) + I\left(U_{ijt} = 2, \alpha _{ij}^R < U^*_{ijt} \le \alpha _{ij}^E\right) + I\left(U_{ijt} = 3, U^*_{ijt} > \alpha _{ij}^E\right)\right] \cdot p\left(\alpha _{ij}^R | \alpha _{ij}^E\right) \end{aligned}$$

Note that the product of \(T\) indicator terms truncates the prior distribution \(p(\alpha _{ij}^R | \alpha _{ij}^E)\), so that the posterior has positive density only over the interval \([\max (U_{ijt}^* | U_{ijt} = 1), \min (U_{ijt}^* | U_{ijt} = 2))\). Analogously, the posterior distribution of \(\alpha _{ij}^E\) is proportional to the prior distribution \(p(\alpha _{ij}^E | \alpha _{ij}^R)\) truncated to the interval \([\max (U_{ijt}^* | U_{ijt} = 2), \min (U_{ijt}^* | U_{ijt} = 3))\).
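The two cutoff updates can be sketched as follows, assuming the \(N(0, 5^2)\) priors stated above and that `latent` and `categories` hold \(U_{ijt}^*\) and \(U_{ijt}\) over the years for one firm-country pair. This is a hypothetical illustration, not the author’s code; the helper names are invented, and the truncated-normal helper is repeated so the snippet runs on its own.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist(0.0, 1.0)
PRIOR_SD = 5.0  # prior N(0, 5^2) on each cutoff, as in the priors above


def _truncated_normal(mean, sd, lower, upper, rng):
    """Inverse-CDF draw from N(mean, sd^2) restricted to [lower, upper)."""
    p_lo = _STD_NORMAL.cdf((lower - mean) / sd)
    p_hi = _STD_NORMAL.cdf((upper - mean) / sd)
    u = min(max(rng.uniform(p_lo, p_hi), 1e-12), 1.0 - 1e-12)
    return min(max(mean + sd * _STD_NORMAL.inv_cdf(u), lower), upper)


def sample_reporting_cutoff(latent, categories, alpha_e, rng=random):
    """Draw alpha_R on [max(U* | U = 1), min(U* | U = 2)), also below alpha_E."""
    lower = max((z for z, u in zip(latent, categories) if u == 1), default=float("-inf"))
    upper = min((z for z, u in zip(latent, categories) if u == 2), default=float("inf"))
    upper = min(upper, alpha_e)  # prior support requires alpha_R < alpha_E
    return _truncated_normal(0.0, PRIOR_SD, lower, upper, rng)


def sample_entry_cutoff(latent, categories, alpha_r, rng=random):
    """Draw alpha_E on [max(U* | U = 2), min(U* | U = 3)), also above alpha_R."""
    lower = max((z for z, u in zip(latent, categories) if u == 2), default=float("-inf"))
    lower = max(lower, alpha_r)  # prior support requires alpha_E > alpha_R
    upper = min((z for z, u in zip(latent, categories) if u == 3), default=float("inf"))
    return _truncated_normal(0.0, PRIOR_SD, lower, upper, rng)
```

Because the latent draws were generated conditional on the current cutoffs, the interval passed to the sampler is never empty, so no rejection step is needed.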

To sample the barrier level \(\theta _{jt}\), we need to specify its transition equation, that is, how a country’s barrier level evolves across years. In this paper, I model the state transition as a random walk:

$$\begin{aligned} \theta _{j,t} = \theta _{j,t-1} + \delta _{jt} \end{aligned}$$

where \(\delta _{jt}\) follows a normal distribution \(N\left(0, \sigma ^2\right)\). I fix the variance \(\sigma ^2\) a priori at 1 for identification. Note, however, that \(\sigma ^2\) also governs how much the barrier level is smoothed across years: the degree to which an estimated \(\theta _{jt}\) shrinks toward its prior mean (under the random walk, the neighboring years’ levels) decreases as the variance of the disturbance term \(\delta _{jt}\) increases. For example, if a country’s barrier levels were completely independent across years, that is, if the barrier were not sticky at all, the disturbance variance would be infinite. Fixing it at 1 therefore assumes a sticky regulatory barrier.
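To make the role of \(\sigma ^2\) concrete, consider the full conditional of an interior-year barrier level (\(1 \le t < T\)). Combining the two random-walk transition densities with the latent draws \(U_{ijt}^* \sim N(\theta _{jt}, 1)\) of the \(n_{jt}\) firms observed for country \(j\) in year \(t\) gives the following (the symbol \(n_{jt}\) is introduced here only for illustration):

$$\begin{aligned} \theta _{jt} \mid \theta _{j,t-1}, \theta _{j,t+1}, U^* \sim N\left( \frac{\left(\theta _{j,t-1} + \theta _{j,t+1}\right)/\sigma ^2 + \sum _{i} U_{ijt}^*}{2/\sigma ^2 + n_{jt}}, \; \frac{1}{2/\sigma ^2 + n_{jt}} \right) \end{aligned}$$

The weight placed on the average of the neighboring years, \((2/\sigma ^2)/(2/\sigma ^2 + n_{jt})\), falls as \(\sigma ^2\) grows; as \(\sigma ^2 \to \infty\), the conditional mean reduces to the year’s own average latent draw, which is the no-stickiness case described above. For \(t = T\), only the \(\theta _{j,T-1}\) term and a single \(1/\sigma ^2\) remain.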

Finally, I sample the quantity of interest \(\theta _{jt}\) from its full conditional posterior distribution \(f\left(\Theta | \alpha ^R, \alpha ^E, U, U^*\right)\). A naive Gibbs sampler draws each \(\theta _{jt}\) conditional on its neighbors \(\theta _{j,t-1}\) and \(\theta _{j,t+1}\); because the sticky evolution process makes adjacent barrier levels highly correlated, this component-by-component approach is inefficient. The literature has also shown that sampling highly correlated parameters with a Gibbs sampler can lead to chains that are difficult to converge (Martin and Quinn 2002). To improve estimation, I adopt the Kalman filter to sample each country’s entire time series at once (i.e., the whole vector \(\theta _{j,k}\), \(k \in \{1,2,\cdots , T\}\), rather than one \(\theta _{jt}\) at a time). For ease of exposition, I omit the description of the forward-filtering and backward-sampling procedure.
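For readers who want a concrete reference point, the following is a minimal sketch of the standard forward-filtering, backward-sampling (FFBS) recursion for one country, not the author’s code. It assumes the observation equation \(U_{ijt}^* \sim N(\theta _{jt}, 1)\) from the data augmentation step, the random walk with \(\sigma ^2 = 1\), and the prior \(\theta _{j,0} \sim N(0, 5^2)\), which the filter integrates out. The function name `ffbs_local_level` and the input layout (one list of latent draws per year) are hypothetical, and the sign constraints on Russia and Singapore are not imposed here.

```python
import math
import random


def ffbs_local_level(latent_by_year, sigma2=1.0, m0=0.0, c0=25.0, rng=random):
    """Jointly draw (theta_j1, ..., theta_jT) for one country j.

    latent_by_year[t] holds the latent draws U*_ijt for all firms i in year t
    (an empty list if none). Assumed local-level model:
        U*_ijt   ~ N(theta_jt, 1)
        theta_jt = theta_j,t-1 + delta_jt,  delta_jt ~ N(0, sigma2)
        theta_j0 ~ N(m0, c0)   (integrated out by the filter)
    """
    T = len(latent_by_year)
    if T == 0:
        return []
    m = [0.0] * T  # filtered means      E[theta_t | data up to t]
    c = [0.0] * T  # filtered variances  Var[theta_t | data up to t]
    r = [0.0] * T  # one-step prior variances Var[theta_t | data up to t-1]

    # Forward filtering (Kalman filter).
    m_prev, c_prev = m0, c0
    for t, obs in enumerate(latent_by_year):
        r[t] = c_prev + sigma2  # random walk: the prior mean stays at m_prev
        if obs:
            n = len(obs)
            ybar = sum(obs) / n  # sufficient statistic with variance 1/n
            gain = r[t] / (r[t] + 1.0 / n)
            m[t] = m_prev + gain * (ybar - m_prev)
            c[t] = (1.0 - gain) * r[t]
        else:  # no data this year: carry the prediction forward
            m[t], c[t] = m_prev, r[t]
        m_prev, c_prev = m[t], c[t]

    # Backward sampling.
    theta = [0.0] * T
    theta[T - 1] = rng.gauss(m[T - 1], math.sqrt(c[T - 1]))
    for t in range(T - 2, -1, -1):
        b = c[t] / r[t + 1]  # smoothing gain
        mean = m[t] + b * (theta[t + 1] - m[t])
        var = c[t] - b * c[t]  # equals c[t] * sigma2 / r[t + 1] > 0
        theta[t] = rng.gauss(mean, math.sqrt(var))
    return theta
```

In a full sampler this function would be called once per country in every Gibbs iteration, after the latent draws and cutoffs have been updated.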

One major weakness of this approach is that it cannot provide a regulatory barrier estimate for the U.S., because the firms that file 10-K forms are U.S. firms. The U.S. is the home country for these firms, while every other country in which they operate is a host country. By construction, these firms are present in the U.S. market, and the data-generating process plausibly differs between home and host countries. For these reasons, I exclude the U.S. from my analyses.

Data and results

Data

My primary data source is the 10-K forms filed with the U.S. Securities and Exchange Commission (SEC). Because the model cannot distinguish between (1) a firm not seeking market entry and (2) a regulatory barrier too restrictive for the firm to enter, firms’ preferences confound the estimates and undermine their interpretability. Although the proposed model accounts for time-invariant unobserved dyadic heterogeneity (e.g., an energy company always prefers countries rich in natural resources), yearly fluctuations in global economic and political conditions and in firms’ financial performance may still bias my results. I partially circumvent this thorny issue by excluding (1) countries that host very few U.S. firms and (2) firms with minimal international commercial activity. In essence, I aim to construct a sample in which firms’ preferences can be viewed as roughly constant, so that variation in firm behavior can be attributed mainly to changing barrier levels.

As a result, I adopt the following exclusion criteria:

  • I exclude countries that more than 95% of firms in my sample never entered. These are mostly African countries.

  • I include only the 1,500 firms most active in the international market between 2006 and 2015. A firm is deemed more active if it consistently operates in more countries.

The period of my focus is 2006 to 2015. The 10-K database used in this paper is compiled by Loughran and McDonald (2016) and covers 10-K reports beginning in 1993. When I downloaded the data in 2019, I excluded years after 2015 to avoid potential backfilling issues. I also exclude years before 2006 to reduce computational complexity, as information from that period is relatively outdated.

After cleaning, I have a sample of 853 firms and 40 countries. Because I only keep firms that are consistently in the sample between 2006 and 2015, the number of firms drops from 1500 to 853.
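The sample construction steps can be sketched as follows. This is a hypothetical illustration rather than the author’s pipeline: the input layout (`presence`, mapping each firm to the set of countries it operates in each year), the order in which the rules are applied, and the activity score (average number of retained countries per year, with years lacking a filing counted as zero) are all assumptions, since the text does not specify them.

```python
def build_sample(presence, years=range(2006, 2016), top_k=1500, never_share=0.95):
    """Apply the exclusion rules to a firm-year-country presence table.

    presence: hypothetical {firm: {year: set_of_countries}}; a missing year
    means the firm filed no 10-K that year. Returns (firms, countries), sorted.
    """
    years = list(years)
    firms = list(presence)
    # Countries each firm entered at least once during the window.
    ever = {f: set().union(*(presence[f].get(y, set()) for y in years)) for f in firms}
    all_countries = set().union(*ever.values())

    # Rule 1: drop countries that more than `never_share` of firms never entered.
    countries = {
        c for c in all_countries
        if sum(c not in ever[f] for f in firms) / len(firms) <= never_share
    }

    # Rule 2 (assumed activity score): average number of retained countries per
    # year; years without a filing count as zero, rewarding consistent presence.
    activity = {
        f: sum(len(presence[f].get(y, set()) & countries) for y in years) / len(years)
        for f in firms
    }
    active = sorted(firms, key=lambda f: (-activity[f], f))[:top_k]

    # Rule 3: keep only firms that file in every year (balanced panel).
    kept = [f for f in active if all(y in presence[f] for y in years)]
    return sorted(kept), sorted(countries)
```

Ties in the activity score are broken alphabetically only to make the sketch deterministic.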

Results

Figure 1 presents the temporal change in the estimated barrier level for four countries (Brazil, Canada, China, and Japan); the full list of estimated barrier levels is reported in the Appendix. Since these four countries have close economic connections with the U.S., examining their estimated barrier levels provides a preliminary test of the estimates’ plausibility.

Fig. 1 Temporal Change in Regulatory Barrier (Brazil, Canada, China, Japan)

Among the four countries, Canada consistently has the lowest barrier level. Recall that the estimates draw information from U.S. firms’ entry and reporting decisions. For Canada, the estimates reflect that more U.S. firms enter Canada, and a smaller share of them report barriers, than in the other three countries. Note that the number of U.S. firms reporting barriers in Canada may still exceed that in the other three countries; Canada’s barrier level is nonetheless estimated to be lower because the number of U.S. firms operating in Canada can be much higher than in the other countries.

China also displays a relatively low barrier level among the four countries, which is counterintuitive given that many U.S. firms in my dataset report encountering barriers in China. Again, this result is likely driven by the large number of U.S. firms operating in China. However, the estimated difference in barrier level between China and Canada remains considerable, even though both countries host many U.S. firms. This suggests that the estimates do not simply track the number of hosted firms and that the model aggregates firms’ entry and reporting information in a consistent and reasonable manner.

The estimates show a sharp jump in the barrier level across all countries from 2006 to 2007, suggesting a global shock in 2007. Among the four countries, the jump is largest for Canada, followed by Brazil, China, and Japan. It is difficult to pinpoint the cause of the jump using only the information in the dataset, but I offer some suggestive evidence regarding possible causes. In 2006, there were 2,305 instances of barrier reporting; in 2007, there were 2,636, a 14.4% increase. One plausible explanation is the 2007-2008 global financial crisis, which would have affected regulatory barrier levels globally. More U.S. firms appear to have experienced financial difficulties in 2007 than in 2006: the number of statements on bankruptcy in the 10-Ks increased by 20% from 2006 to 2007. Examples from the 2007 10-Ks include:

  • “Some of our current and former international customers, particularly automobile manufacturers in Europe and Japan, were reluctant to do business with us while we underwent chapter 11 bankruptcy.”

  • “The proposed transaction is subject to approval by the United States Bankruptcy Court, receipt of required regulatory approvals, finalizing the definitive purchase agreement for Akzo Nobel’s Crystex.”

Still, it is worth emphasizing that more systematic analyses are required to understand the financial crisis’s effect on the observed jump in barrier levels.

Fig. 2 Pairwise Comparison of Countries’ Average Barrier Level

Figure 2 presents a cross-country comparison of the estimated regulatory barrier levels. Each cell in the heat map is the difference in barrier level between the pair of countries labeled by the axis ticks, where each country’s barrier level is its average across years. Specifically, a negative value in a cell signifies that the country on the Y axis (i.e., party 1) has a lower barrier level than the country on the X axis (i.e., party 2). Canada’s row is the bluest of all, showing that Canada has the lowest average barrier level among all countries.

Similarly, India, Germany, and France display considerably lower barrier levels than other countries on average. By contrast, the Philippines, Norway, and New Zealand have relatively high barriers on average, while China has a medium level of barriers compared with the rest of the world.
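The heat map’s cell values can be sketched as below, assuming the inputs are per-country yearly point estimates (for example, posterior means of \(\theta_{jt}\)); the function names and input layout are hypothetical. The second helper identifies the “bluest” row, that is, the country whose row of differences is most negative overall.

```python
def pairwise_barrier_gaps(estimates):
    """estimates: {country: [theta_2006, ..., theta_2015]} (e.g., posterior means).

    Returns {(party1, party2): avg(party1) - avg(party2)}; a negative value
    means party 1 (the row, Y axis) has the lower average barrier.
    """
    avg = {country: sum(series) / len(series) for country, series in estimates.items()}
    return {(p1, p2): avg[p1] - avg[p2] for p1 in avg for p2 in avg}


def bluest_row(gaps):
    """Country whose row has the smallest (most negative) total difference."""
    rows = {}
    for (p1, _), gap in gaps.items():
        rows[p1] = rows.get(p1, 0.0) + gap
    return min(rows, key=rows.get)


# Usage with illustrative (not estimated) numbers.
toy = {"Canada": [-1.0, -0.5], "Japan": [0.5, 1.0], "Brazil": [0.0, 0.2]}
gaps = pairwise_barrier_gaps(toy)
lowest = bluest_row(gaps)
```

Because the matrix is antisymmetric with a zero diagonal, the row with the most negative total is simply the country with the lowest average barrier.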

Figures 1 and 2 serve as preliminary validation tests of the estimated barrier. However, more rigorous tests are needed to establish the accuracy and consistency of the proposed index. I therefore present several statistical analyses in the next section, which compare and contrast my index with other popular measures in the field.

Validation

Special Trade Concerns (STCs)

Gulotty (2020) shows that the special trade concerns (STCs) data collected by the World Trade Organization (WTO), and more specifically the technical barriers to trade (TBT) data, can inform researchers about the regulatory barrier levels of the world’s major economies.

As I briefly explained in the introduction, my proposed index can be superior to the STC-TBT data in two major respects:

  • Gulotty (2020) notes that “the choice to raise a foreign regulation as an STC is as much a political process as the choice to impose the regulation in the first place.” The STC-TBT data are therefore likely to be heavily influenced by international politics. This concern is difficult to eliminate and can be fatal for researchers who study the correlation between regulatory barriers and international relations. My estimated barrier is less susceptible to this concern, because international politics is unlikely to shape an individual firm’s decision to report barriers in its annual report.

  • STC-TBT complaints are known to target larger markets, as governments need to balance the cost of filing an STC complaint against its benefit to the domestic economy (Fontagné and Orefice 2018; Gulotty 2020). The observed distribution of STC-TBT reports is therefore heavily skewed along the market-size dimension: countries with larger markets are more likely to be included in STC reports than their smaller counterparts, even if they have the same barrier level. Admittedly, my index cannot eliminate the contaminating effect of market size. Still, by accounting for time-invariant dyad-specific confounders, the proposed index should address the concern more satisfactorily.

Fig. 3 Comparison with STC count and FDI Regulatory Restrictiveness Index

I conduct two validation analyses to compare the STC-TBT data and my proposed index. First, I normalize the number of STCs a country is subject to and its estimated barrier by subtracting each measure’s mean and dividing by its standard deviation. Then, I take the difference between these two normalized measures at the country level and plot it against each country’s market size, measured by GDP. The result is presented in Fig. 3a. The horizontal axis shows each country’s log GDP, so countries with larger GDP appear further to the right. The vertical axis is the normalized difference between the estimated barrier level and the STCs count: a negative value shows that my estimated barrier level is lower than the normalized STCs count. The difference is more pronounced in countries with large markets. More specifically, among countries with large markets, the barrier level measured by my index is consistently lower than that measured by the STCs count, whereas the two measures align quite well among small countries. These findings are consistent with the observation that the STC-TBT data often inflate the barrier level of large economies, a weakness from which the proposed index suffers less.
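The normalization step can be sketched as follows. This is an illustrative sketch under assumptions: the inputs are taken to be already aggregated to one value per country, the population standard deviation is used (the text does not say which), and the function names are hypothetical. The sign convention matches the vertical axis of Fig. 3a, so a negative gap means the proposed index is lower than the STCs count.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a {country: value} mapping (population standard deviation assumed)."""
    xs = list(values.values())
    mu, sd = mean(xs), pstdev(xs)
    return {k: (v - mu) / sd for k, v in values.items()}


def normalized_gap(barrier, other):
    """{country: z(barrier) - z(other)}; negative => the proposed index is lower."""
    zb, zo = zscores(barrier), zscores(other)
    return {k: zb[k] - zo[k] for k in zb if k in zo}
```

Plotting these gaps against log GDP, with one point per country, reproduces the layout of the figure; the same function applies unchanged to the FDI regulatory restrictiveness index.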

Next, I regress my estimated barrier level on the count of STC-TBT reports filed against each country between 2006 and 2015. Parameters with a p-value less than or equal to 0.05 are considered statistically significant. Columns (1), (2), and (3) in Table 3 display the regression results. The bivariate regression of the proposed index on the STCs count returns a negative coefficient. However, the coefficient becomes positive after accounting for country and year fixed effects and for GDP and GDP per capita. Although the positive coefficient is not statistically significant, the sign reversal is consistent with my claims about the contaminating effect of market size and the weakness of the STC-TBT data.
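A regression with country and year fixed effects can be sketched with NumPy as below. This is a minimal, hypothetical illustration, not the specification behind Table 3: it uses dummy variables and ordinary least squares via `numpy.linalg.lstsq`, returns only the point estimate on the regressor of interest, and does not compute the standard errors (for example, clustered by country) that the p-values above would require.

```python
import numpy as np


def twoway_fe_coef(y, x, country, year, controls=None):
    """OLS coefficient on x with country and year fixed effects (dummy variables).

    y, x: length-n sequences; country, year: length-n labels;
    controls: optional (n,) or (n, k) array, e.g., log GDP and log GDP per capita.
    """
    y = np.asarray(y, dtype=float)
    cols = [np.ones_like(y), np.asarray(x, dtype=float)]
    if controls is not None:
        cols.extend(np.asarray(controls, dtype=float).reshape(len(y), -1).T)
    # Drop the first level of each factor to avoid collinearity with the intercept.
    for labels in (country, year):
        levels = sorted(set(labels))
        cols.extend(np.array([lab == lev for lab in labels], dtype=float) for lev in levels[1:])
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]
```

Dummy variables keep the sketch valid for unbalanced panels; with many countries, a within (demeaning) transformation would be cheaper but gives the same coefficient.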

Table 3 Regression Analyses of STCs, FDI Restrictiveness Index and the Estimated Barrier

OECD FDI regulatory restrictiveness index

Next, I compare my barrier index with the popular OECD FDI Regulatory Restrictiveness Index (Koyama et al. 2006; Kalinova et al. 2010). The index covers four types of regulatory measures: (1) foreign equity restrictions, (2) screening and prior approval requirements, (3) rules for key personnel, and (4) other restrictions on the operation of foreign enterprises. In this analysis, I use the aggregate index at the country-year level.

First, I plot log GDP against the normalized difference between the estimated barrier and the FDI regulatory restrictiveness index (Fig. 3b). There is a slight negative relationship between the average difference and log GDP, similar to what we observe in the STC-TBT case. China, India, and Canada are again among the countries with a considerable negative difference; that is, my proposed index assigns them a substantially lower barrier level than the FDI regulatory restrictiveness index does. However, the negative correlation is much less pronounced than in the STC-TBT case, which suggests that the FDI regulatory restrictiveness index captures issues different from those captured by the STC-TBT data.

I proceed to regress my estimated barrier index on the FDI regulatory restrictiveness index and report the results in columns (4), (5), and (6) of Table 3. Surprisingly, the coefficients of the FDI restrictiveness index are consistently negative, suggesting a strong negative correlation between my proposed barrier estimates and the restrictiveness index. The negative relationship persists even after accounting for market size and country and year idiosyncrasies: countries with a higher FDI regulatory restrictiveness index tend to have a lower regulatory barrier level according to my estimates. Readers should nonetheless be cautious when interpreting this seemingly contradictory result, as the two indices may simply reflect different aspects of regulatory barriers. The FDI regulatory restrictiveness index focuses on regulatory restrictions that affect only foreign firms. By contrast, my proposed barrier aims to measure regulations that apply to both domestic and foreign firms but may constitute hidden obstacles for foreign firms in practice. The results could indicate that countries using domestic regulations as hidden barriers can afford less restrictive FDI-targeting measures, as blatant restrictions on foreign ownership may carry significant political costs both domestically and internationally (Kono 2008).

Trade, FDI, and democracy

Finally, I examine the relationship between my proposed barrier estimates and U.S. trade volume, FDI flow, FDI stock, and regime type. Intuitively, a higher barrier should be correlated with lower trade flow, lower FDI flow, and lower FDI stock. I also revisit the classic debate on the relationship between regime type and regulatory barriers (Milner and Kubota 2005; Kono 2008; Pandya 2014).

I visualize the results in Fig. 4. Each panel shows the overall correlation between my proposed barrier estimate and one of four indicators: trade flow, FDI flow, FDI stock, and the Polity 2 score. A country with a higher barrier level clearly (1) trades less with the U.S., (2) receives less FDI from the U.S., and (3) holds a lower stock of U.S. FDI. However, the correlation between the Polity 2 score and the estimated barrier level is very weak. These results lend further credibility to my proposed barrier estimates.

Fig. 4 Correlation with Major Indicators (Polity, Trade, FDI Flow, FDI Stock)

Conclusions

This paper offers a novel measurement of an elusive quantity, the regulatory barrier, contributing to the empirical literature on the political economy of regulation and regulatory barriers. I leverage information in the annual reports of U.S. public firms (i.e., 10-K forms) to address two major concerns when measuring regulatory barriers: (1) the confounding effect of international politics and (2) the bias caused by market size. I show in a series of validation analyses that the proposed barrier index displays patterns consistent with our existing knowledge of regulatory barriers. Moreover, the new barrier estimates show signs of addressing these two problems better than existing measurements.

The proposed barrier index serves as an additional measurement of regulatory barriers, which may be superior to existing ones in some research contexts. I do not claim that my estimate precisely captures the concept of regulatory barrier and is hence the “best” one. Rather, my goal is to shed light on aspects of regulatory barriers that have evaded researchers’ attention because they are hard to observe. There are obvious weaknesses in both the information sources and my model. Nonetheless, my estimates offer new insights into this important political and economic phenomenon.

The paper also contributes to international political economy and international relations by proposing firms’ annual reports as a new source of information for further research. As high-quality text data have become increasingly important for empirical research in political science, the rich information contained in firms’ 10-Ks deserves more scholarly attention in the future.