The U.S. Bureau of Labor Statistics (BLS) produces a broad range of data products that measure labor market activity. Included in these estimates are measures of employment, wages, and the number of business establishments. Data can be disaggregated by geographic location, industrial specialization, and other distinguishing characteristics. One potential identifying attribute is whether a company can be classified as a 501(c)(3) nonprofit organization. Although nonprofit organizations exist throughout the United States and in many different industries, there are no data collections specifically designed to produce estimates for these organizations.
Over the years, BLS has frequently received requests from academic groups, private industry, and other members of the public to create estimates for nonprofit organizations. Public interest in such data prompted BLS to explore ways to measure this important segment of the U.S. economy.
BLS published its first set of statistics for nonprofit organizations in 2014. Derived from the Quarterly Census of Employment and Wages (QCEW) and data from the Internal Revenue Service (IRS), these data provided estimates of employment, wages, and the number of 501(c)(3) establishments. This initial publication was augmented by two additional releases of nonprofit tabulations, in 2018 and in 2019. Estimates for nonprofit organizations for 2007 to 2017 are available in downloadable format from the BLS website.1
With each release of estimates for nonprofit organizations, BLS expanded the scope of the tabulations. This article describes the data sources and methodology used to create these statistics and briefly discusses how these tabulations developed over time.
Data sources
BLS tabulations of data on nonprofit organizations rely on two data sources: (1) the QCEW, which is a BLS program,2 and (2) the Exempt Organization Business Master File (EOBMF) extract, which is published by the Internal Revenue Service (IRS).3
Quarterly Census of Employment and Wages
The QCEW is a federal–state cooperative program. Federal and state unemployment insurance (UI) laws require that most employers pay quarterly UI taxes for workers covered by these laws, and there is a great deal of legislative similarity across states.4 Because these UI reports are required by law, there is a high degree of compliance and accuracy from employers. State workforce agencies provide BLS with quarterly summaries of the employment and total pay of workers covered by state and federal UI legislation.
Another variable collected through the UI reports describes an establishment’s main business activity. Like other federal agencies, BLS uses the North American Industry Classification System (NAICS) to categorize establishments into industries according to their primary business activity.5 This technique of identifying a business’s primary activity provides a uniform basis on which to tabulate and analyze industry statistics.
The QCEW program supplements UI data by conducting two surveys of private-sector businesses: the Multiple Worksite Report (MWR) and the Annual Refiling Survey (ARS). The MWR is a survey targeted at employers with multiple-location worksites that report their employment and wage information under a single UI account. Information from the MWR allows the QCEW program to disaggregate data according to each business establishment’s location and provides a more granular view of a company’s activity. For example, a company that operates a chain of business establishments throughout a given state would be asked in the MWR to identify the individual worksites and to specify the number of employees and the wages paid at each location. In addition, because each location can specialize in a different activity, the MWR also asks that each establishment be classified into its appropriate industry. The information collected in the MWR is used to ensure an appropriate apportionment of federal funds through grant programs that use county economic indicators as a basis for allocations. No other sources are available to obtain this information.
A company operating a chain of restaurants in a given state helps illustrate the disaggregation process. The company submits a single report for the entire state, even though it has restaurants at several different locations within that state. The MWR asks the company to identify each individual location (or worksite) and the number of employees and wages paid at each location. In addition, it asks the employer to provide a description of each worksite’s primary activity. These descriptions are then used to assign a NAICS code to each worksite, and the codes can vary from one establishment to the next. Although some locations may be considered full-service restaurants, others may be classified as cafeterias. Some large restaurant chains have establishments dedicated to organizational planning, and these worksites are categorized in the management of companies and enterprises industry.
The ARS is a periodic survey of establishments with 3 or more employees that serves to verify the accuracy of existing QCEW records. Included in the data collected are the name of the business, its location, and its contact information. The ARS also seeks to verify and, if necessary, update the NAICS code of each establishment. The goods, products, and services of any business may evolve over time. The ARS addresses this dynamism by asking companies to describe the primary activities of each establishment. On the basis of these responses, BLS may modify the NAICS classification of a specific worksite. Data from both the ARS and the MWR enhance the accuracy of the QCEW for all levels of industry and geographic detail.
BLS conducts a rigorous review process of the QCEW data to ensure that records for individual establishments and the tabulations that summarize the information from multiple establishments for a given industry and geographic area meet exacting standards. State workforce agencies, in collaboration with BLS, review unusual movements in employment and wages. These records are investigated, and errors are corrected, as necessary. For cases in which an employer does not submit its quarterly information in a timely manner, the QCEW program may impute estimates for missing data.6
The final product is a comprehensive list of businesses from which BLS produces quarterly estimates in a timely and accurate manner. The BLS Business Register provides detailed information covering more than 95 percent of civilian employment in the United States. Among the establishment-level data collected by BLS are monthly employment figures, quarterly wages, industry characteristics, and geographic location (the state and county where a business is located and, in many cases, its longitude and latitude coordinates). The database also contains each company’s employer identification number (EIN), a unique 9-digit number issued by the IRS to identify a business entity. A business may be either a single-establishment firm, or it may consist of multiple establishments organized under a single EIN. Despite the inclusion of these and many other attributes, BLS data do not contain any comprehensive metric that can identify a nonprofit entity.
Exempt Organization Business Master File extract
The IRS publishes a list of nonprofit organizations every year. Acquiring tax-exempt status compels an organization to forgo a degree of confidentiality, and it must consent to have company-specific information made available to the public through the EOBMF extract.7
The IRS requires companies seeking tax-exempt status to submit a request through Form 1023, Form 1024, or Form 1024-EZ. After tax-exempt standing is granted, an organization must periodically recertify this status using either Form 990 or Form 990-EZ. The Tax Exempt and Government Entities Division of the IRS compiles data collected through this process and disseminates this information to the public through the EOBMF extract.
The EOBMF extract is a large, cumulative dataset that is updated monthly. Among the 28 data fields in this dataset are a company’s name, EIN, address, assets, and income. Another data element is the subsection code of section 501(c), Title 26 of the U.S. Internal Revenue Code. This subsection code establishes the section under which an entity gains tax-exempt status. Specifically, Section 501(c) delineates 29 different business classifications of tax-exempt organizations. These categories span a broad spectrum of activities and functions, ranging from veterans organizations to local boards of trade to horticulture associations.
The most prevalent form of tax-exempt organizations, those covered by section 501(c)(3), account for 3 out of every 4 such records in the EOBMF extract.8 This category includes an array of businesses, such as the following:
· Charitable organizations
· Educational organizations
· Literary organizations
· Organizations to prevent cruelty to animals
· Organizations to prevent cruelty to children
· Organizations for public safety testing
· Religious organizations
· Scientific organizations
Although the EOBMF extract contains many business attributes, it does not include any information about an organization’s employment or wages.
Methodology
To produce statistics about nonprofit organizations, BLS needed to develop a sound underlying methodology. After careful evaluation, BLS developed a two-step approach. In the first step, BLS used a deterministic matching algorithm to link the BLS Business Register to the EOBMF extract. In the second step, BLS identified additional nonprofit organizations through data elements contained solely in the BLS Business Register.
The linking of existing data sources across federal agencies is an approach that is increasingly being used within the federal government. This practice can lead to new and innovative data products while avoiding additional burden on respondents. In fact, the Foundations for Evidence-Based Policymaking Act of 2018 established a Chief Data Officer Council to promote and encourage data sharing agreements between agencies.9 Under this act, federal agencies are encouraged to collaborate with one another to promote the broader use of administrative data.
When developing its methodology, BLS reviewed the approach used in an earlier study of the nonprofit sector. Researchers from Johns Hopkins University applied an EIN-matching approach to combine the BLS Business Register data with the data from the EOBMF extract.10 That is, the researchers attempted to identify the same entity across two data sources by exactly matching a variable common to both, in this case the EIN. The Johns Hopkins researchers analyzed 501(c)(3) entities from 2000 to 2010 and laid the foundation for the BLS tabulations of nonprofit organizations.
Mirroring the Johns Hopkins researchers’ approach, BLS also limited its EIN matching to the most prevalent form of tax-exempt organizations: 501(c)(3) organizations. As the first step in its two-step approach, BLS linked records from the two datasets using the EINs. Only records identified in the EOBMF extract as 501(c)(3) organizations were selected. When a match was obtained, all private-sector establishments found in the BLS Business Register associated with this linked EIN were placed into a preliminary nonprofit data file.
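The exact-match step described above can be sketched in a few lines of Python. This is a minimal illustration rather than BLS production code: the field names (`ein`, `subsection`, `ownership`, `worksite`), the `"03"` subsection value used to flag 501(c)(3) records, and the sample records are all assumptions made for the example.

```python
def match_501c3_establishments(register, eobmf):
    """Return private-sector register rows whose EIN exactly matches
    the EIN of a 501(c)(3) record in the EOBMF extract."""
    # Keep only EINs flagged as 501(c)(3); "03" is an assumed code value.
    nonprofit_eins = {row["ein"] for row in eobmf if row["subsection"] == "03"}
    # Deterministic match: an establishment is kept only on an identical EIN.
    return [
        row for row in register
        if row["ownership"] == "private" and row["ein"] in nonprofit_eins
    ]


# Hypothetical records; the real files carry many more fields.
register = [
    {"ein": "111111111", "ownership": "private", "worksite": "A"},
    {"ein": "111111111", "ownership": "private", "worksite": "B"},
    {"ein": "222222222", "ownership": "private", "worksite": "C"},
    {"ein": "333333333", "ownership": "private", "worksite": "D"},
]
eobmf = [
    {"ein": "111111111", "subsection": "03"},  # 501(c)(3), present in register
    {"ein": "222222222", "subsection": "04"},  # tax exempt, but not 501(c)(3)
    {"ein": "444444444", "subsection": "03"},  # 501(c)(3), no register record
]

preliminary = match_501c3_establishments(register, eobmf)
print([row["worksite"] for row in preliminary])  # ['A', 'B']
```

Both worksites reported under the matched EIN enter the preliminary file, which mirrors the rule that every private-sector establishment associated with a linked EIN is selected.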
To validate this approach, BLS staff performed a thorough, labor-intensive review of all matched EIN records prior to the first publication of nonprofit estimates in 2014. Variables in the BLS Business Register were compared with equivalent variables in the EOBMF extract. Analysts looked for inconsistencies in the business name and address of each organization. BLS staff also looked for differences between the EOBMF-extract activity and taxonomy codes, which specified an organization’s purpose and operations, and the NAICS codes found in the BLS Business Register. When necessary, BLS staff reviewed an organization’s website to help resolve any discrepancies. When an analyst identified a spurious match, the record was removed from the nonprofit dataset and placed into a file of false positives. These records were excluded from the nonprofit tabulations and will be excluded from any future estimates.
The advantages of the deterministic matching approach lie in its simplicity and its relatively low false-positive match rates. The algorithm to link the two data sources is simple and straightforward: an automated process seeks an identical value for the matching variable (the EIN) in each dataset. Links derived in this manner are usually very accurate. A review of the false-positive records generated in the publication of the estimates for 2007 to 2012 showed that the proportion of incorrectly identified entities amounted to less than 0.1 percent of all matched EINs.
This process, however, is not without limitations. A significant shortcoming is the use of an imperfect variable to link the two datasets: the EIN. This identifier was designed for tax-reporting purposes, rather than for generating statistical tabulations. Employers can request more than one EIN; some employers operate multiple units under a single EIN, while others break their activities into separate entities and obtain multiple EINs. Thus, employers may report different EINs to different federal agencies, depending on the reporting requirements. For example, an employer may report employment to its state workforce agency using the EIN of the entity associated with paid employment and report 501(c)(3) information using the EIN of its charitable organization. Another potential shortcoming of this deterministic matching process is that an administrative error may lead to an incorrect value being entered as an EIN. The use of this imperfect matching variable may lead to potential matches being missed by the automated matching algorithm.
To help overcome these limitations, BLS implemented a second step that was not present in the Johns Hopkins study. Under state UI laws, some establishments are not required to submit quarterly UI contributions. Instead, these organizations are allowed to reimburse the UI system when a claim is made. Most states restrict these reimbursable entities to 501(c)(3) organizations.
The BLS Business Register includes a variable that indicates the reimbursable status of an establishment. Therefore, the initial list of EINs was augmented by using the BLS Business Register to identify nonprofit organizations that were missed by the matching algorithm. Specifically, this step involved identifying unmatched records that are termed “reimbursables.” For cases in which a reimbursable could potentially refer to an establishment that might be something other than a 501(c)(3) organization, the BLS national office, with assistance from state workforce agencies, performed a manual review of these units. Any record identified as not being a 501(c)(3) organization was removed from this list of reimbursable nonprofit organizations and was added to the file containing incorrectly identified records (the false-positive file).
The reimbursable variable is limited to a supplemental, but important, role in the process. Although many EINs identified in the deterministic matching phase were also reimbursable units, there was not a complete overlap. A review of records from the most recent year of published estimates, 2017, reveals that reimbursable entities made up only 24 percent of all EINs identified in the first-stage matching process. For this reason, the reimbursable variable can be used to supplement the matched data generated in the first phase, but it cannot be relied upon to create the entire nonprofit data series.
Combining the dataset of matched EIN records with the data file of unmatched reimbursable units produced a final list of identified 501(c)(3) establishments. BLS used this two-step process for all three sets of nonprofit estimates covering the period from 2007 to 2017.
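The way the two steps and the false-positive file fit together can be expressed with set operations. The sketch below is illustrative only; the function name and the EIN values are hypothetical, and it assumes each input has already been reduced to a set of EINs.

```python
def build_nonprofit_eins(matched, reimbursable, false_positives):
    """Combine step 1 and step 2, then drop reviewed false positives."""
    # Step 2 adds only reimbursable units that step 1 did not already match.
    added = reimbursable - matched
    return (matched | added) - false_positives


matched = {"111111111", "222222222", "333333333"}       # step 1: EIN matches
reimbursable = {"222222222", "555555555", "666666666"}  # reimbursable flag set
false_positives = {"333333333", "666666666"}            # rejected in review

final = build_nonprofit_eins(matched, reimbursable, false_positives)
print(sorted(final))  # ['111111111', '222222222', '555555555']
```

An EIN that is both matched and reimbursable is counted once, which is consistent with the observation that the two groups overlap only partially.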
In this article, 501(c)(3) establishments identified through this two-step procedure are labeled as nonprofit organizations. Establishments not identified by this process are labeled as for-profit establishments. However, these for-profit establishments include a small number of tax-exempt organizations that are classified as 501(c) organizations but not as 501(c)(3) organizations. As previously mentioned, there are 29 different business classifications under Section 501(c).11 Businesses that could be considered tax exempt that are not 501(c)(3) organizations include some types of co-ops, civic leagues and social welfare organizations, and domestic fraternal societies. Table 1 lists the number of EINs identified at each step in the creation and review of the 2017 nonprofit estimates.
Data creation and review | Number of EINs |
---|---|
501(c)(3) EINs in Exempt Organization Business Master File extract | 1,297,538 |
Step 1: Match EINs | 158,234 |
Step 2: Add reimbursable units | 8,634 |
Remove false matches | 234 |
Final number of nonprofit organizations in dataset | 166,634 |
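As a quick arithmetic check, the counts in table 1 reconcile as step 1 plus step 2 minus false matches. The short sketch below reproduces that total and also derives the share of 501(c)(3) EINs in the extract that were linked in step 1; that share is computed here from the table and is not a figure stated in the article.

```python
eobmf_501c3 = 1_297_538     # 501(c)(3) EINs in the EOBMF extract
matched = 158_234           # step 1: matched EINs
reimbursable_added = 8_634  # step 2: reimbursable units added
false_matches = 234         # removed after review

final = matched + reimbursable_added - false_matches
match_share = matched / eobmf_501c3

print(final)                 # 166634
print(f"{match_share:.1%}")  # 12.2%
```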
The sizable discrepancy between the number of records in the EOBMF extract and the number of EINs that BLS matched may be due, in part, to the previously mentioned shortcomings of the EIN as a matching variable. But other factors also help explain this gap. One notable difference is that many 501(c)(3) organizations, particularly small ones, are staffed exclusively by volunteers and retain no paid employees.12 Such organizations are not included in the BLS Business Register.
State-specific reporting requirements also contribute to the underrepresentation of small nonprofits in the BLS Business Register. In some states, 501(c)(3) organizations with fewer than 4 employees are not required to provide unemployment insurance coverage and are thus not present in the Business Register.13 Comparing states that use a higher employment threshold with those that use a lower employment threshold suggests that BLS estimates of nonprofit employment are undercounted by less than 40,000, which is quite small, given that total employment in nonprofit establishments was more than 12 million in 2017. (See the appendix for more information.)
Although the EOBMF extract does not contain employment data, one measure of an organization’s size is the level of assets it holds. As noted in the description of the EOBMF extract, the IRS dataset contains information on the assets held by tax-exempt organizations. When BLS match rates for 501(c)(3) entities are broken out by the size of their assets, a clear correlation exists between the size of an organization (in terms of assets held) and the proportion of those records that are matched when using the EIN and the QCEW database. Table 2 breaks down these results. Match rates are markedly lower for organizations with few assets than for those with many assets. About a third of the EOBMF extract file consists of organizations with no assets; the match rate for these organizations is just 1.4 percent. By contrast, the match rate for organizations with assets of $250 million or more is 75.5 percent.
Assets (in dollars) | Number of EINs | Match rate (in percent) |
---|---|---|
0 | 452,165 | 1.4 |
1–49,999 | 152,185 | 10.7 |
50,000–99,999 | 50,032 | 20.1 |
100,000–499,999 | 118,647 | 28.0 |
500,000–999,999 | 47,639 | 34.1 |
1,000,000–2,499,999 | 51,346 | 39.1 |
2,500,000–4,999,999 | 28,324 | 45.0 |
5,000,000–9,999,999 | 20,381 | 49.8 |
10,000,000–49,999,999 | 23,740 | 55.3 |
50,000,000–99,999,999 | 4,179 | 64.2 |
100,000,000–249,999,999 | 2,954 | 69.5 |
250,000,000–499,999,999 | 1,263 | 75.0 |
500,000,000–999,999,999 | 657 | 76.3 |
1,000,000,000 or more | 664 | 75.8 |
Missing asset data | 343,362 | 3.9 |
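The two summary figures quoted in the text (about a third of records with no assets, and a 75.5 percent match rate at $250 million or more) can be recovered from table 2. The sketch below pools the top three brackets with an EIN-weighted average; because it starts from rates already rounded to one decimal place, it only approximates whatever unrounded calculation BLS performed. The variable and function names are illustrative.

```python
# (asset bracket, number of EINs, match rate in percent), copied from table 2
table2 = [
    ("0", 452_165, 1.4),
    ("1 to 49,999", 152_185, 10.7),
    ("50,000 to 99,999", 50_032, 20.1),
    ("100,000 to 499,999", 118_647, 28.0),
    ("500,000 to 999,999", 47_639, 34.1),
    ("1,000,000 to 2,499,999", 51_346, 39.1),
    ("2,500,000 to 4,999,999", 28_324, 45.0),
    ("5,000,000 to 9,999,999", 20_381, 49.8),
    ("10,000,000 to 49,999,999", 23_740, 55.3),
    ("50,000,000 to 99,999,999", 4_179, 64.2),
    ("100,000,000 to 249,999,999", 2_954, 69.5),
    ("250,000,000 to 499,999,999", 1_263, 75.0),
    ("500,000,000 to 999,999,999", 657, 76.3),
    ("1,000,000,000 or more", 664, 75.8),
    ("Missing asset data", 343_362, 3.9),
]


def pooled_rate(rows):
    """EIN-weighted average match rate across several brackets."""
    total = sum(n for _, n, _ in rows)
    return sum(n * rate for _, n, rate in rows) / total


total_eins = sum(n for _, n, _ in table2)
zero_asset_share = 100 * table2[0][1] / total_eins
top_rate = pooled_rate(table2[11:14])  # the three brackets of $250 million or more

print(total_eins)                  # 1297538
print(round(zero_asset_share, 1))  # 34.8
print(round(top_rate, 1))          # 75.5
```

The bracket counts sum to 1,297,538, the same number of 501(c)(3) EINs reported for the EOBMF extract in table 1.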
Another factor contributing to the low match rate between the EOBMF extract and the BLS Business Register relates to the frequency with which 501(c)(3) organizations certify their tax-exempt status. Most 501(c)(3) organizations are required to verify their nonprofit status once every 3 years. During this interval, an organization may cease business operations. Although such a company would no longer be a viable business, its information may remain in the EOBMF extract for some time after the organization has ceased operations. The QCEW program, by contrast, updates the BLS Business Register every 3 months and thus its data more promptly reflect a business closure.
In addition to undercounting nonprofit establishments, it is also likely that some part-time employees at 501(c)(3) organizations are not included in BLS tabulations. Reporting requirements for the inclusion of part-time workers at nonprofit organizations vary by state. In 2017, 31 states and the District of Columbia excluded certain part-time workers at nonprofits from their reporting requirements.14
Data collected by BLS are subject to the Confidential Information Protection and Statistical Efficiency Act of 2002.15 This law protects the confidentiality of each respondent and ensures that any data collected will only be used for official statistical purposes. These safeguards are essential in establishing and preserving the public’s trust in BLS data collection and dissemination methods. Therefore, prior to publishing these estimates, all tabulations must undergo a rigorous review process to ensure that no published cell could compromise the confidentiality of a business respondent.
Publication and reception by the public
In 2014, BLS released estimates of employment, wages, and the number of establishments for nonprofit organizations in the private sector. All data series were annual averages, and estimates were released for each year from 2007 to 2012. At the national level, BLS created estimates for nonprofit organizations at the total private, industry sector, and industry subsector levels. At the state level, BLS produced estimates at the total private and industry sector levels.
The publication of these estimates was well received by the public, and BLS had many inquiries for additional years of data. Many data users also requested more granular information about nonprofit companies. For example, some users wanted detailed information on where nonprofits were located and more information about the activities that nonprofit organizations engaged in. In 2018, with support from the Center for Civil Society Studies at Johns Hopkins University, and with grant funding from the Charles Stewart Mott Foundation, BLS published estimates for nonprofit organizations for 2016. This additional year of data included some county-level tabulations and expanded detail for selected industries.
The public enthusiastically welcomed the increased level of geographic and industry detail in the 2016 data on nonprofits. Recognizing the value of these estimates to the public, BLS built on this success and, in 2019, released additional estimates for nonprofits for 2013 to 2017. In this most recent release, BLS further expanded the scope of the project to include estimates for total private, industry sectors, industry subsectors, and select industry groups at the national, state, metropolitan statistical area, and county levels. BLS augmented the 2016 data to include the same industry and geographic detail found in the estimates for 2013 to 2015 and those for 2017.
In response to other public comments, BLS modified the format of the publication tables for 2013 to 2017. The data are now presented in a format that is much easier to manage. Additionally, new variables were added to the tabulations to facilitate a comparison of nonprofit estimates with for-profit estimates. Employment, wages, and establishment counts for existing QCEW published estimates (the combination of both nonprofit and for-profit organizations) are now included in these data tables. Data users can easily obtain information on the for-profit portion of a cell by subtracting the nonprofit estimates from these QCEW figures. To further enhance this comparison, BLS now includes the percentage of employees working in 501(c)(3) establishments relative to all private-sector establishments and the ratio of wages at nonprofit establishments relative to those at for-profit establishments.
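The subtraction and the two comparison measures can be illustrated as follows. The figures are hypothetical, the function and key names are invented for the example, and the wage ratio is assumed here to compare average wages per employee; the published tables may define it differently.

```python
def compare_cell(all_emp, all_wages, np_emp, np_wages):
    """Derive for-profit figures and comparison measures for one table cell."""
    # For-profit values are the published QCEW totals minus the nonprofit part.
    fp_emp = all_emp - np_emp
    fp_wages = all_wages - np_wages
    return {
        "for_profit_employment": fp_emp,
        "for_profit_wages": fp_wages,
        # Percentage of private-sector employees working at nonprofits.
        "nonprofit_employment_share": 100 * np_emp / all_emp,
        # Assumed definition: average nonprofit wage over average for-profit wage.
        "wage_ratio": (np_wages / np_emp) / (fp_wages / fp_emp),
    }


# Hypothetical cell: 1,000 employees in total, 100 of them at nonprofits.
cell = compare_cell(all_emp=1_000, all_wages=49_500_000,
                    np_emp=100, np_wages=4_500_000)
print(cell["for_profit_employment"])       # 900
print(cell["nonprofit_employment_share"])  # 10.0
print(round(cell["wage_ratio"], 2))        # 0.9
```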
These modifications greatly increased the amount of data made available to the public. For 2007, BLS published a total of 633 different combinations of data for industry and geography at the national and state level. In contrast, for the most recent year of data, 2017, with its more far-reaching industry and geographic detail, the number of published cells increased by a factor of more than 17. Table 3 provides a comparison of the number of published estimates for 2007 (the initial, limited-publication structure) and those for 2017 (the new, expanded format).
Geographic level | 2007 | 2017 |
---|---|---|
National | 72 | 115 |
State | 561 | 2,021 |
Metropolitan statistical area | 0 | 3,075 |
County | 0 | 5,962 |
Total | 633 | 11,173 |
Data are now available for 11 continuous years (2007 to 2017) for total private employment, wages, and the number of establishments at the national level and for all 50 states, the District of Columbia, Puerto Rico, and the U.S. Virgin Islands.16 In addition, 11 years of data are also available for certain specific industries at the national and state level, which allows data users to study changes in 501(c)(3) organizations at different points in the business cycle.
Conclusion
BLS estimates for nonprofit organizations provide a valuable addition to the data-user community. This article describes the data sources used and the methodology adopted to generate these results. Throughout this project, BLS frequently made changes and improvements to 501(c)(3) data tabulations. Many of these adjustments were the result of suggestions made by members of the public, and BLS continues to seek input from the data-user community on ways to further improve these data.
BLS continues to evaluate these data and explore new avenues for expanding and improving them. For example, the assets held by an organization constitute only 1 of the 28 variables that the IRS maintains for tax-exempt entities in the Exempt Organization Business Master File extract. Pairing these variables with the comprehensive information available in the BLS Business Register creates a rich framework for future research and data exploration. Other matching techniques, such as probabilistic matching, are being explored as a means of identifying entities not captured by EIN linkage.
BLS is currently planning to publish estimates for nonprofit organizations every 5 years. Previously released data are available on the BLS website.17 BLS welcomes comments and suggestions from the data-user community.
Appendix
The Federal Unemployment Tax Act requires state coverage of unemployment insurance for nonprofit organizations. However, reporting requirements for small nonprofit organizations vary by state and lead to the undercounting of small 501(c)(3) organizations in the BLS Business Register. In 2017, 18 states, the District of Columbia, Puerto Rico, and the U.S. Virgin Islands provided coverage for (and thus required reporting of) establishments with at least 1 paid employee. A higher reporting threshold (at least 4 paid employees) in the remaining 32 states results in the undercounting of nonprofit employment and wages in those states. (See chart A-1.)
To estimate the employment undercount resulting from the different state reporting requirements, BLS divided total private employment in nonprofit organizations into two groups. Group A used the lower reporting threshold of at least 1 employee and consisted of 18 states, the District of Columbia, Puerto Rico, and the U.S. Virgin Islands. Group B consisted of the remaining 32 states that used the higher reporting threshold of at least 4 employees. The estimates assume that there were no underlying reporting differences between groups A and B other than the reporting threshold.
Not surprisingly, the average establishment size is smaller for group A than for group B. For each group, the proportion of nonprofit employment attributed to establishments with fewer than 4 employees was calculated. In group A, organizations with fewer than 4 employees accounted for 1.5 percent of employment, while the comparable figure for group B was 1.0 percent. The difference between these two shares, calculated from unrounded values (approximately 0.54 percentage points), was then multiplied by total nonprofit employment in group B (7,189,190). The result, 38,540, represents the approximate level of undercounting of employment in group B. Similar calculations were made for the number of establishments, yielding an estimated undercount of 12,149 establishments. More than one-fourth of the undercount was in the religious, civic, professional, and similar organizations subsector. (See table A-1.)
Group | Employment: all establishments | Employment: establishments with fewer than 4 employees | Employment: percent of all establishments | Establishments: all establishments | Establishments: establishments with fewer than 4 employees | Establishments: percent of all establishments |
---|---|---|---|---|---|---|
Group A[1] | 5,346,031 | 82,120 | 1.5 | 135,844 | 47,555 | 35.0 |
Group B[2] | 7,189,190 | 71,893 | 1.0 | 147,265 | 39,404 | 26.8 |
Undercount | 38,540 | [3] | [3] | 12,149 | [3] | [3] |
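The undercount calculation described in this appendix can be reproduced from the figures in table A-1. The sketch below applies group A's small-establishment share to group B and takes the difference; the function name is illustrative, and the inputs are the published (rounded) counts.

```python
def estimate_undercount(a_total, a_small, b_total, b_small):
    """Gap between group A's and group B's small-unit shares, scaled to group B."""
    share_a = a_small / a_total  # lower threshold: at least 1 employee
    share_b = b_small / b_total  # higher threshold: at least 4 employees
    return (share_a - share_b) * b_total


employment_undercount = estimate_undercount(5_346_031, 82_120, 7_189_190, 71_893)
establishment_undercount = estimate_undercount(135_844, 47_555, 147_265, 39_404)

print(round(employment_undercount))     # 38540
print(round(establishment_undercount))  # 12149
```

The calculation relies on the assumption stated above: apart from the reporting threshold, groups A and B have the same underlying distribution of small nonprofits.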