The Industry-specific Regulatory Constraint Database (IRCD) is a new database that provides an estimate of the degree of federal regulation faced by each industry in the United States from 1997 to 2011. IRCD relies on a transparent and replicable methodology developed by Patrick A. McLaughlin and Omar Al-Ubaydli to quantify federal regulation, as explained in detail in this working paper.
The database offers multiple novel and objective measures of the accumulation of regulations in the economy overall and for individual industries in the United States. IRCD uses text analysis to count the number of binding constraints in the text of federal regulations, which are codified in the Code of Federal Regulations (CFR). In addition, it measures the degree to which different groups of regulations target specific industries.
File size (available only in the full IRCD 1.0 dataset) – gives the size (in bytes) of each set of files that corresponds to a CFR title.
Industry Regulation Index (also available on the Graph page) – The Industry Regulation Index is a unitless measure of the amount of regulation targeting a given industry. The index is designed to equal 1 in 1997 and tracks changes in industry-specific regulation in later years relative to 1997. Values greater than 1 indicate a growth in regulation relative to 1997, and values less than 1 indicate a diminution of regulation relative to 1997. The Industry Regulation Index is constructed by dividing Industry Regulation in year y by Industry Regulation in the base year, 1997. In other words, the Industry Regulation Index for industry i in year y is given by:
\[ \text{Industry Regulation Index}_{i,y} = \frac{\text{Industry Regulation}_{i,y}}{\text{Industry Regulation}_{i,1997}} \]
Industry Regulation (available only in the full IRCD 1.0 dataset) – Industry Regulation is the product of two of the objective measures produced by text analysis of the CFR, Restrictions and Industry Relevance, summed across all CFR titles of a given year. Industry Regulation for industry i in year y is therefore given by:
\[ \text{Industry Regulation}_{i,y} = \sum_{k} \text{Restrictions}_{k,y} \times \text{Industry Relevance}_{i,k,y} \]
where k indexes the CFR titles in year y.
Industry Relevance (available only in the full IRCD 1.0 dataset) – Industry Relevance is a measure of how relevant a specific title of the CFR is to a specific industry. It is constructed by searching for a set of phrases that may indicate that the CFR title is targeting a specific industry. The search phrases are based on the 2007 2-digit and 3-digit industry descriptions given in the North American Industry Classification System (NAICS) and are developed following the rules given in Appendix A of the IRCD working paper. The measure is normalized by the number of pages in the CFR title, so that Industry Relevance is given by the number of times an industry’s descriptive phrases were found in a CFR title divided by the number of pages in that title. See the example below.
Page Count (available only in the full IRCD 1.0 dataset) – count of the number of pages in a given CFR title in a given year.
Restrictions (also available on the Graph page) – The variable restrictions is a count of the number of occurrences of words or terms that indicate obligation or restriction. For IRCD 1.0, these include five terms: “shall,” “must,” “may not,” “prohibited,” and “required.”
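The definitions of Restrictions, Industry Regulation, and the Industry Regulation Index can be sketched in a few lines of Python. This is a minimal illustration, not the IRCD code: the function names are hypothetical, the whole-word and case-insensitive matching is an assumption (the matching rules for restriction terms are not spelled out here), and the numbers in the usage example are made up.

```python
import re

# The five IRCD 1.0 restriction terms listed in the Restrictions entry.
RESTRICTION_TERMS = ["shall", "must", "may not", "prohibited", "required"]

# Assumption: whole-word, case-insensitive matching, with any run of
# whitespace allowed inside a multi-word term such as "may not".
_TERM_PATTERNS = [
    r"\s+".join(re.escape(word) for word in term.split())
    for term in RESTRICTION_TERMS
]
_RESTRICTION_RE = re.compile(
    r"\b(?:" + "|".join(_TERM_PATTERNS) + r")\b", re.IGNORECASE
)


def count_restrictions(text):
    """Count occurrences of the restriction terms in a block of text."""
    return len(_RESTRICTION_RE.findall(text))


def industry_regulation(restrictions_by_title, relevance_by_title):
    """Sum Restrictions x Industry Relevance over CFR titles.

    Both arguments map a CFR title to a value for one industry and one year.
    A title missing from relevance_by_title is treated as relevance 0.
    """
    return sum(
        count * relevance_by_title.get(title, 0.0)
        for title, count in restrictions_by_title.items()
    )


def regulation_index(regulation_by_year, base_year=1997):
    """Divide each year's Industry Regulation by the base-year value."""
    base = regulation_by_year[base_year]
    return {year: value / base for year, value in regulation_by_year.items()}


# Made-up inputs for a single industry, for illustration only.
regulation_2011 = industry_regulation({40: 100, 12: 50}, {40: 0.5, 12: 0.25})
index = regulation_index({1997: 50.0, 2011: regulation_2011})
print(index)
```

By construction the index equals 1 in the base year, so values in later years read directly as growth or decline relative to 1997.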
This section explains in excruciating detail how we constructed the industry relevance metric. First, by decomposing typical NAICS industry descriptions, we describe the structure of industry descriptions. Second, we explain the rules we developed to turn the NAICS industry descriptions into a set of search strings. Third, we cover some shortcomings of our system and offer possible solutions to the individual user of the database. Finally, we explain how we calculated the industry relevance metric and discuss alternative ways to calculate it.
The NAICS industry description is a collection of words or phrases linked by conjunctions or commas, e.g., “Agriculture, Forestry, Fishing and Hunting,” or “Finance and Insurance” (we discuss some important exceptions below). The full description can be divided into an exhaustive collection of phrases that may have some overlap in shared words. For example, “Oil and Gas Extraction” can be divided into “Oil Extraction” and “Gas Extraction.”
Each individual phrase is a noun phrase. The noun phrase has up to three components.
Head noun: The main word in the phrase. This can be in the form of a present participle [Fishing] or not [Construction].
Pre-modifiers: Words that precede the head noun and modify its meaning. They can be adjectives [Educational in “Educational Services”], nouns [Waste Management in “Waste Management Services”] or a mixture [Electronic Product in “Electronic Product Manufacturing”]. They can also be absent [Construction].
Post-modifiers: Words that follow the head noun and modify its meaning. They can be nouns [Companies in “Management of Companies”] or a mixture of adjectives and nouns [Economic Programs in “Administration of Economic Programs”]. They can also be absent [Construction]. We ignore prepositions.
Each of the following rules applies to each of the full phrases derived from the industry description. All searches are case insensitive.
Rule 1: The full phrase.
Rule 2: The singular form of the full phrase.
Rule 3: The person who does the full phrase (singular).
Rule 4: The person who does the full phrase (plural).
Rule 5: The head noun.
Rule 6: The base form of the head noun.
Rule 7: The pre-modifiers together as a whole string.
Rule 8: The post-modifiers together as a whole string.
Rule 9: Individual words and phrases in pre-modifiers and post-modifiers.
In our database, we begin by dividing the industry description into the individual noun phrases described above; within the industry, each noun phrase is assigned a group number to distinguish its strings from those belonging to the other noun phrases. For example, in the industry “oil and gas extraction,” oil extraction is assigned group 1 and gas extraction is assigned group 2.
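The decomposition and rules above can be represented with a small data structure. The sketch below is only an illustration under stated assumptions: the class and function names are hypothetical, the strings for rules 2, 3, 4, and 6 are supplied by hand because they require linguistic judgment, and rule 9 is approximated by splitting a multi-word modifier into single words (the rule also allows sub-phrases, which this sketch does not generate). The example inputs are illustrative and do not reproduce any table from the working paper.

```python
from dataclasses import dataclass, field


@dataclass
class NounPhrase:
    """One noun phrase taken from a NAICS industry description."""
    group: int                      # group number within the industry
    full_phrase: str                # rule 1
    head_noun: str                  # rule 5
    pre_modifiers: str = ""         # rule 7, as one whole string
    post_modifiers: str = ""        # rule 8, as one whole string
    # Hand-supplied strings for rules 2, 3, 4, and 6 (rule number -> string).
    derived_forms: dict = field(default_factory=dict)


def search_strings(phrase):
    """Return sorted (group, rule, string) rows for one noun phrase.

    Strings are lower-cased because all searches are case insensitive.
    """
    rows = [(1, phrase.full_phrase), (5, phrase.head_noun)]
    rows += sorted(phrase.derived_forms.items())
    for rule, modifier in ((7, phrase.pre_modifiers), (8, phrase.post_modifiers)):
        if not modifier:
            continue
        rows.append((rule, modifier))
        words = modifier.split()
        # Assumption: a single-word modifier adds no separate rule-9 string.
        if len(words) > 1:
            rows += [(9, word) for word in words]
    return sorted((phrase.group, rule, s.lower()) for rule, s in rows if s)


# Illustrative input: the second phrase of industry 316.
allied = NounPhrase(
    group=2,
    full_phrase="Allied Product Manufacturing",
    head_noun="Manufacturing",
    pre_modifiers="Allied Product",
    derived_forms={6: "manufacture"},  # hand-supplied base form
)
for row in search_strings(allied):
    print(row)
```

Keeping the group and rule numbers on every row makes it straightforward to filter strings by rule later, for example to drop all rule-9 strings.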
The above rules are ineffective in three infrequent classes of NAICS industry descriptions. The first is when the industry description involves a parenthetical comment, typically an exception, such as “mining (except oil and gas).” Our solution is to simply ignore the parenthetical comment. The following industries suffer from this problem:
The second is the case of “other,” “support,” or “related” activities, such as “support activities for mining” or “furniture and related product manufacturing.” We apply the rules in the normal fashion; however, in some of these cases, the outcome is unlikely to fully reflect the spirit of the NAICS industry description. The following industries suffer from this problem:
The final case is that of industry names that contain the word “general” or “miscellaneous,” such as “general merchandise stores.” We apply the rules in the normal fashion. However, in some of these cases, the outcome is unlikely to fully reflect the spirit of the NAICS industry description. The following industries suffer from this problem:
We have omitted industry description 81 (Other Services [Except Public Administration]) because any search for strings based on the words “other services” would return useless results. We have also omitted 423 (Merchant Wholesalers, Durable Goods) and 424 (Merchant Wholesalers, Nondurable Goods); they are the only three-digit industries that fall under 42 (Wholesale Trade), and we cannot think of a sensible way of distinguishing them since they do not follow the phrase structure of the other NAICS industry names. Therefore, we direct the reader to the data on 42 (Wholesale Trade) only.
Each industry description is associated with a collection of strings. The strings are classified according to group and rule. For each group in each industry, each rule in the range 1–8 is associated with at most one string. Rule 9 can yield multiple strings associated with the same group and industry.
As an illustration, consider industry 316 (Leather and Allied Product Manufacturing). The industry name is composed of two phrases: leather manufacturing (group 1) and allied product manufacturing (group 2).
The resulting strings are in table A1.
In this example, based on our discretionary interpretation of the rules, we exclude manufacturing, manufacture, allied, and product. In the final database, there is a variable denoting which strings we recommend including/excluding, though we still measure the occurrence of every string to allow readers to judge for themselves. Though we judge each rule-9 string on individual merit, in the default version of the final database (which we use for the figures and tables in the main text), we exclude all rule-9 strings. In appendix C, we detail strings where we struggled to decide on inclusion or exclusion.
As table A1 shows, some of the smaller strings are contained in the larger strings from the same group. More formally, each string derived from rules 1, 2, 3, or 4 can potentially contain the head noun (string from rule 5), the pre-modifier (string from rule 7), or the post-modifier (string from rule 8) from the same group. (We ignore containment of the strings from rule 9 because we are excluding rule-9 strings.) We therefore create three additional dummy variables: contains_head_noun, contains_pre_modifier, and contains_post_modifier. These variables make it easy to use statistical software to eliminate double-counting. For example, every occurrence of the string “leather manufacturing” automatically implies an occurrence of the string “leather,” but we would only want to count such an occurrence once. We provide programming code for Stata that prevents double-counting by using these variables.
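One way the three dummy variables could be used to remove double-counting is sketched below in Python. This is not the Stata code mentioned above, and the approach is an assumption: raw counts of the shorter strings (rules 5, 7, and 8) are reduced by the counts of the longer strings (rules 1 to 4) in the same group that are flagged as containing them. It further assumes that the longer strings do not overlap one another and that rule-9 rows have already been dropped. The function name and the counts are hypothetical.

```python
LONG_RULES = {1, 2, 3, 4}

# Which dummy on a long string signals that it contains a given short string.
SHORT_RULE_FLAG = {
    5: "contains_head_noun",
    7: "contains_pre_modifier",
    8: "contains_post_modifier",
}


def deduplicated_total(rows):
    """Total occurrences for one group without double-counting.

    rows: list of dicts with keys "rule", "string", "count", and, for
    long strings, the three contains_* dummies (0 or 1).
    """
    total = 0
    for row in rows:
        flag = SHORT_RULE_FLAG.get(row["rule"])
        if flag is None:
            # Long strings (and rule 6) are counted as they are.
            total += row["count"]
            continue
        # Occurrences of the short string already counted inside long strings.
        overlap = sum(
            other["count"]
            for other in rows
            if other["rule"] in LONG_RULES and other.get(flag, 0)
        )
        total += max(row["count"] - overlap, 0)
    return total


# Made-up counts for the "leather manufacturing" group.
group_1 = [
    {"rule": 1, "string": "leather manufacturing", "count": 3,
     "contains_head_noun": 1, "contains_pre_modifier": 1},
    {"rule": 3, "string": "leather manufacturer", "count": 2,
     "contains_head_noun": 0, "contains_pre_modifier": 1},
    {"rule": 5, "string": "manufacturing", "count": 10},
    {"rule": 7, "string": "leather", "count": 8},
]
print(deduplicated_total(group_1))
```

In this made-up example the 3 occurrences of “leather manufacturing” are removed from the raw count of “manufacturing,” and the 5 occurrences of the two longer strings are removed from the raw count of “leather.”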
In some cases, a string is shared by multiple groups in the same industry, e.g., manufacturing in the example in table A1. We assign such shared strings to the first group that shares them since we are ultimately aggregating at the industry level, and so assigning them to multiple groups within the same industry will result in double-counting.
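The first-group assignment described above can be sketched as follows. The function name is hypothetical, and the rows are (group, rule, string) tuples for a single industry, a layout assumed here for illustration.

```python
def assign_to_first_group(rows):
    """Keep each string once, under the lowest-numbered group that has it.

    rows: iterable of (group, rule, string) tuples for one industry.
    """
    seen = set()
    kept = []
    # Sorting puts lower group numbers first, so they win any tie.
    for group, rule, string in sorted(rows):
        if string not in seen:
            seen.add(string)
            kept.append((group, rule, string))
    return kept


# "manufacturing" is shared by both groups of industry 316.
rows_316 = [
    (2, 5, "manufacturing"),
    (1, 5, "manufacturing"),
    (1, 7, "leather"),
    (2, 7, "allied product"),
]
print(assign_to_first_group(rows_316))
```

Because results are aggregated at the industry level, dropping the duplicate row from group 2 loses no information.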
Once we have eliminated the possibility of double-counting, for each industry-title pair, we sum the total occurrences of the included strings in that title. We then divide that sum by the number of pages in the title and multiply by 100 to obtain a measure of industry relevance per hundred pages. This measure prevents longer titles from appearing to be more relevant to an industry simply by virtue of their length. Users have the opportunity to undo this act of deflation should they so desire.
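The final calculation is simple arithmetic, sketched below with hypothetical function names and made-up counts. The per_pages parameter is an assumption added for illustration; the default of 100 corresponds to the per-hundred-pages measure described above.

```python
def industry_relevance(string_counts, page_count, per_pages=100):
    """Occurrences of the included strings per `per_pages` pages of a title.

    string_counts: mapping of included string -> de-duplicated count.
    """
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    return per_pages * sum(string_counts.values()) / page_count


def undo_deflation(relevance, page_count, per_pages=100):
    """Recover the raw total of occurrences from a deflated relevance value."""
    return relevance * page_count / per_pages


# Made-up counts for one industry in one 400-page title.
counts = {"leather manufacturing": 3, "leather manufacturer": 2, "leather": 5}
relevance = industry_relevance(counts, page_count=400)
print(relevance, undo_deflation(relevance, page_count=400))
```

Multiplying the relevance value back by the page count, as in undo_deflation, is one way a user could reverse the normalization.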