Competitive Strategy IG: Research Resources

Research Resources: Databases

Each of the following links provides a list and description of databases of interest to the members of the Competitive Strategy IG.

Annual Survey of Entrepreneurs

The Annual Survey of Entrepreneurs (ASE) provides information on selected economic and demographic characteristics for businesses and business owners by gender, ethnicity, race, and veteran status. This database includes all nonfarm businesses with paid employees that file Internal Revenue Service tax forms as individual proprietorships, partnerships, or any type of corporation, and that have receipts of $1,000 or more. The ASE is conducted on a company or firm basis rather than an establishment basis. Starting in 2014, the database reports the survey information on a yearly basis and samples approximately 290,000 employer businesses in operation during the survey year.

More information at:


Annual Survey of Manufactures

The Annual Survey of Manufactures (ASM) provides sample estimates of statistics for all manufacturing establishments with one or more paid employees and has been conducted annually since 2005 except for years ending in 2 and 7. It provides statistics on employment, payroll, supplemental labor costs, cost of materials consumed, operating expenses, value of shipments, value added by manufacturing, detailed capital expenditures, fuels and electric energy used, and inventories. It also provides estimates of value of shipments for 1,390 classes of manufactured products. The ASM is a sample survey of approximately 50,000 establishments selected from the census universe of 346,000 manufacturing establishments.

More information at:


Business Dynamics Statistics

The Business Dynamics Statistics (BDS) provides annual measures of business dynamics from 1976 to 2015 for the economy (by standard 2-digit industrial classification sectors) and aggregated by establishment and firm characteristics. BDS tables show key economic data related to employment: job creation and destruction, job expansions and contractions, number of establishments, establishment openings and closings, and number of startups and firm shutdowns. The 2015 statistics were tabulated by economy-wide totals, state, sector, firm age, firm size, firm initial size, and metro/non-metro status. The next release is targeted for 2018.

More information at:


Statistics of U.S. Businesses (SUSB)

Statistics of U.S. Businesses (SUSB) is an annual series beginning in 1989 that provides national and subnational data on the distribution of economic data by enterprise size and industry. SUSB covers most of the country's economic activity: all U.S. business establishments with paid employees. The series excludes data on non-employer businesses, private households, railroads, agricultural production, and most government entities. The key variables included are geography, industry, and enterprise size. It is the only source of annual, complete, and consistent enterprise-level data for U.S. businesses with industry detail.

More information at:


Annual (SAS) and Quarterly (QSS) Services Surveys

The purpose of the Service Annual Survey (SAS) is to provide estimates of revenue and other measures for most traditional service industries. The survey collects data from companies whose primary business or operation is to provide services to individuals, businesses, and governments. It currently includes 72,000 selected service businesses. The Quarterly Services Survey (QSS) is a source of service industry performance indicators, providing estimates of revenue and expenses for selected service industries. Its sample includes approximately 19,500 service businesses with paid employees that operate in the covered sectors.

More information at:


Characteristics of Business Owners (CBO)

The purpose of this database is to provide data that describe and compare women, minority, and non-minority male business owners and their businesses. The Minority Business Development Agency and the Small Business Administration partially funded the survey from 1977 through 1992, after which it was discontinued. Individual-level characteristics data included age, marital status, education, work experience, and veteran status. Business acquisition and financing data included method of acquiring ownership and sources of capital. Business operations data included owner hours worked, assets and liabilities, net income, employee and customer profiles, and volume of exports.

More information at:

Occupational Employment Statistics

The Occupational Employment Statistics (OES) program produces annual employment and wage estimates for over 800 occupations from 1988 to 2015. These estimates are available for the nation as a whole, for individual states, and for metropolitan and nonmetropolitan areas. National occupational estimates for specific industries are also available. The OES survey is a semi-annual mail survey of non-farm establishments. The sampling frame is derived from the list of establishments maintained by State Workforce Agencies for unemployment insurance purposes.

More information at:


Business Employment Dynamics

Business Employment Dynamics (BED) is a set of statistics generated from the Quarterly Census of Employment and Wages program. These quarterly data series consist of gross job gains and gross job losses statistics from 1992 forward. Gross job gains and gross job losses are derived from longitudinal histories of over 6.4 million private sector employer reports out of 8.2 million total reports of employment and wages submitted by states to the Bureau of Labor Statistics (BLS) in the fourth quarter of 2002. These data help to provide a picture of the dynamic state of the labor market. The statistics are derived from reports of employment and wage data for workers covered by unemployment insurance (UI) and Unemployment Compensation for Federal Employees (UCFE).
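For members new to this type of data, the following is a minimal sketch of how gross job gains and gross job losses can be computed from establishment-level employment in two consecutive quarters. It assumes the usual definitions (gains are employment increases at expanding and opening establishments; losses are decreases at contracting and closing establishments). The establishment IDs, employment counts, and the function name gross_job_flows are hypothetical and are not taken from the BLS files.

```python
from typing import Dict, Tuple


def gross_job_flows(prev: Dict[str, int], curr: Dict[str, int]) -> Tuple[int, int]:
    """Return (gross job gains, gross job losses) between two quarters.

    prev and curr map a hypothetical establishment ID to its employment level.
    An establishment missing from a quarter is treated as having zero employment,
    so openings count toward gains and closings count toward losses.
    """
    gains = 0
    losses = 0
    for est_id in set(prev) | set(curr):
        change = curr.get(est_id, 0) - prev.get(est_id, 0)
        if change > 0:
            gains += change       # expansions and openings
        elif change < 0:
            losses += -change     # contractions and closings
    return gains, losses


# Illustrative (made-up) employment levels for two consecutive quarters.
q3 = {"A": 10, "B": 25, "C": 8}   # C closes before the next quarter
q4 = {"A": 14, "B": 20, "D": 5}   # D opens in the later quarter

gains, losses = gross_job_flows(q3, q4)
net_change = gains - losses       # equals the change in total employment
```

The net change can hide much larger gross flows in both directions, which is why the two series are reported separately.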

More information at:


Current Population Survey

The Current Population Survey (CPS) is a monthly survey of households conducted by the Bureau of the Census for the Bureau of Labor Statistics. It provides a comprehensive body of data on the labor force, employment, unemployment, persons not in the labor force, hours of work, earnings, and other demographic and labor force characteristics.

More information at:


National Longitudinal Survey of Youth

The National Longitudinal Surveys (NLS) are a set of surveys designed to gather information at multiple points in time on the labor market activities and other significant life events of several groups of men and women. The data include variables related to mature women, young women, older men, young men, and children and young adults. Some sets of variables have data from 1979.

More information at:  

BRDIS (Business R&D and Innovation Survey)

The Business R&D and Innovation Survey (BRDIS), sponsored by the National Science Foundation, covers a variety of data from 2008 on the R&D activities of companies operating in the United States. BRDIS provides the official U.S. measure of R&D in the private sector. The five main topic areas (each with several specific measures) are financial measures of R&D activity; company R&D activity funded by others; R&D employment; R&D management and strategy; and intellectual property, technology transfer, and innovation. The sample size is approximately 45,000 companies from a population of approximately 2,000,000.

More information at:


Scientists and Engineers Statistical Data System (SESTAT)

SESTAT, sponsored by the National Science Foundation, is a unique source of longitudinal information from 1993 on the education, employment, and demographic characteristics of the college-educated U.S. science and engineering workforce. These data are collected through biennial surveys: the National Survey of College Graduates, the National Survey of Recent College Graduates (discontinued after 2010), and the Survey of Doctorate Recipients. It covers those with a bachelor's degree or higher who either work in or are educated in science or engineering, although some data on individuals who are not scientists or engineers are also included.

More information at:



Industrial Research and Development Information System (IRIS)

This database holds all the statistics produced by the National Science Foundation's Survey of Industrial Research and Development (SIRD) for 1953 to 2007, the last year SIRD was conducted. SIRD has been replaced by the Business R&D and Innovation Survey (BRDIS). IRIS contains tabulated statistics for selected measures such as industry R&D funding and industry R&D personnel. These statistics are broken out by such dimensions as source, industrial sector, character of work, and company size. The sample represents all R&D-performing companies, whether publicly or privately held.

More information at:


National Science Foundation (NSF) Survey of Industrial Research and Development (SIRD)

The Survey of Industrial Research and Development is the primary source of information on R&D performed by industry within the fifty states and the District of Columbia. Prior to 2001, completion of four items on the questionnaire was mandated by law: sales, total number of employees, total R&D, and federally funded R&D. Beginning in 2001, response to the item that asks for the distribution of total R&D by state was also required. The final sample size is 32,000 companies. Key variables include R&D expenditures, NAICS code, sales, company size, total employment, source of financing, character of R&D work (company or federal), geographic location, R&D scientists and engineers FTE, and type of costs.

More information at:

Business Plan Archive

Business Plan Archive is an online repository for business plans and related planning documents available for scholars and students interested in studying high-tech entrepreneurship in the Dot Com Era and beyond. Launched in June 2002 as a public repository for records of firms founded to commercialize the internet from the mid-1990s on, the Business Plan Archive permitted open access to a selection of business plans and related planning documents until December 2007.

More information at:


Global Entrepreneurship Monitor (GEM)

The Global Entrepreneurship Monitor is the world's foremost study of entrepreneurship. In numbers, GEM is: 18 years of data, 200,000+ interviews a year, 100+ economies, 500+ specialists in entrepreneurship research, 300+ academic and research institutions, and 200+ funding institutions. In each economy, GEM looks at two elements: the entrepreneurial behaviour and attitudes of individuals, and the national context and how that impacts entrepreneurship.

More information at:


Kauffman Firm Survey (KFS)

The KFS is a panel study of 4,928 businesses founded in 2004 and tracked over their early years of operation. The survey focuses on the nature of new business formation activity; characteristics of the strategy, offerings, and employment patterns of new businesses; the nature of the financial and organizational arrangements of these businesses; and the characteristics of their founders.

More information at:


Kauffman Index of Entrepreneurial Activity

The Kauffman Index of Entrepreneurship series is a set of annual reports that measure U.S. entrepreneurship at the national, state, and top-40 metro levels. Rather than focusing on inputs, the Kauffman Index focuses primarily on entrepreneurial outputs: the actual results of entrepreneurial activity, such as new companies, business density, and growth rates. The Kauffman Index series consists of three in-depth studies: Startup Activity, Main Street Entrepreneurship, and Growth Entrepreneurship. The Kauffman Index of Startup Activity is an early indicator of the beginnings of entrepreneurship in the United States, focusing on new business creation, market opportunity, and startup density.

More information at:


Panel Study of Entrepreneurial Dynamics (PSED I & II) Data

The Panel Study of Entrepreneurial Dynamics (PSED) research program is designed to enhance the scientific understanding of how people start businesses. The projects provide valid and reliable data on the process of business formation based on nationally representative samples of nascent entrepreneurs, those active in business creation. PSED I began with screening in 1998-2000 to select a cohort of 830, followed by three follow-up interviews. A control group of those not involved in firm creation is available for comparisons. PSED II began with screening in 2005-2006, followed by six yearly interviews. The information obtained includes data on the nature of those active as nascent entrepreneurs, the activities undertaken during the start-up process, and the characteristics of start-up efforts that become new firms.

More information at:


Small Business Administration

The Dynamic Small Business Search is an international database of firms certified by the SBA under the Business Development and HUBZone programs. Businesses are qualified as "small businesses" based on employment and revenue information. For small businesses in the United States, the database provides information on location, government certifications, ownership/self-certifications, industry, construction/service bonding level, assurance standards, size, and capabilities/exporter status.

More information at:

CleanTech PatentEdge

The CleanTech PatentEdge is an online database, updated monthly, of cleantech patents from as early as the 1980s. The patents are pre-sorted into over 150 market and technology categories, such as renewable energy generation (biofuels, wind, solar), energy storage, electric vehicles, and water filtration and desalination. The online database has multiple patent variables, including, but not limited to, top assignees, inventors, and publication date. The database contains patent publications from the US Patent Office, European Patent Office, Japan, and the World Intellectual Property Organization (international PCT patent applications).

More information at:


European Patent Office’s Worldwide Patent Statistical Database (PATSTAT)

PATSTAT contains bibliographical and legal status patent data from leading industrialised and developing countries. This is extracted from the EPO’s databases and is provided as raw data or online. The PATSTAT product line consists of three individual databases. They are available in raw data format or via PATSTAT Online, a web-based interface to the databases. With PATSTAT Online, you can run queries in the databases, conduct statistical analyses, visualise the data and download it for offline use.

More information at:


National Bureau of Economic Research (NBER) Patent Citations Data File

This database comprises detailed information on almost 3 million U.S. patents granted between January 1963 and December 1999, all citations made to these patents between 1975 and 1999 (over 16 million), and a reasonably broad match of patents to Compustat (the data set of all firms traded in the U.S. stock market).

More information at:


OECD Patent Databases

This database has four components: (1) The OECD Triadic Patent Families Database, covering patents filed at the EPO and the Japan Patent Office (JPO) and granted by the USPTO that share one or more priority applications. (2) The OECD REGPAT Database, covering patent applications to the EPO and PCT filings linked to more than 5,500 regions using the inventors' or applicants' addresses (including regions from selected countries outside the OECD area). (3) The OECD Citations Database, covering citations from patents published by the EPO and the WIPO (PCT). (4) The OECD "Harmonised Applicants' Names" database, a dictionary of applicants' names that has been elaborated with business register data so that it can easily be matched by all users. The data are based on applicants for patents filed at the EPO and through the PCT. The dataset is complementary to Eurostat's method for harmonising applicants' names. The data start in July 2011.
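As a rough illustration of the triadic criterion described above, the sketch below flags priority applications that are linked to an EPO filing, a JPO filing, and a USPTO grant. It is a simplification under stated assumptions: the record layout, the identifiers, and the function name triadic_priorities are hypothetical, and each priority application is treated as its own family rather than merging families that are connected through several priorities, so it should not be read as the OECD's actual procedure.

```python
from collections import defaultdict

# Hypothetical records: (patent_id, office, status, set of priority IDs).
records = [
    ("EP1", "EPO", "filed", {"P1"}),
    ("JP1", "JPO", "filed", {"P1"}),
    ("US1", "USPTO", "granted", {"P1"}),
    ("EP2", "EPO", "filed", {"P2"}),
    ("US2", "USPTO", "filed", {"P2"}),  # not granted and no JPO filing
]


def triadic_priorities(records):
    """Return the sorted priority IDs that meet the triadic criterion."""
    offices_by_priority = defaultdict(set)
    for _patent_id, office, status, priorities in records:
        # A USPTO record counts only once granted; EPO and JPO records
        # count as soon as they are filed.
        if office == "USPTO" and status != "granted":
            continue
        for priority in priorities:
            offices_by_priority[priority].add(office)
    required = {"EPO", "JPO", "USPTO"}
    return sorted(p for p, offices in offices_by_priority.items()
                  if required <= offices)


result = triadic_priorities(records)
```

With the made-up records above, only priority P1 qualifies, because P2 lacks both a JPO filing and a USPTO grant.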

More information for all four databases at:


Patent Network Dataverse

The Dataverse Network Project is a virtual web archive that allows researchers to publish, share, reference, extract, and analyze research data. In addition to offering patent information, the database includes disambiguation and co-authorship networks of U.S. patent inventors. The database covers patents granted between 1975 and 2010.

More information at:


UGA Patent Litigation Data

This page contains data on patent litigation decisions, the litigated patents themselves, as well as a set of random patents matched to the litigated patents. There are several papers on the webpage that discuss characteristics of the data in detail.  

More information at: 


United States Patent and Trademark Office (USPTO)

The United States Patent and Trademark Office provides a wide collection of information on U.S. patent applications and approvals since 1790. In addition to its full patent text and image searches, it provides information on the initial patent applications, citation patterns, and patent assignment records.

More information at:

IPO Data by Jay Ritter

This resource comprises several datasets covering multiple aspects of IPOs internationally.

More information at:


Kenny-Patton IPO Database

This database comprises all emerging growth, or de novo, initial public offerings (IPOs) on American stock exchanges filed with the Securities and Exchange Commission (SEC) from June 1996 through December 2010. It covers 2,287 firms. Key variables related to firms include Name, Locations (street address, city, state, zip code), Exchange and ticker, Auditor, Year of founding, SEC Central Index Key firm identifier, and Firm SIC. Key variables related to the offering include Year of IPO, Share Volume, Initial Share Price, Shares Outstanding at time of IPO, and Underwriter discount.

More information at:



EurIPO

EurIPO is a dataset with information about the IPOs on all the European stock exchanges since the 1990s. The database focuses on the exchanges of the four main European economies: the London Stock Exchange, Euronext, Deutsche Börse, and Borsa Italiana. For these markets, EurIPO collects a series of data from the listing prospectus, such as listing date, company establishment date, ISIN code, country of incorporation, stock exchange, market and segment of listing, industry classification, and web and e-mail addresses. It also provides additional accounting data before and after the IPO; information on the offer; information on ownership structure, corporate governance, and intellectual capital; and up-to-date data on delistings and transfers.

More information at:


EDGAR (U.S. Securities and Exchange Commission)

EDGAR, the Electronic Data Gathering, Analysis, and Retrieval system, performs automated collection, validation, indexing, acceptance, and forwarding of submissions by companies and others who are required by law to file forms with the U.S. Securities and Exchange Commission (SEC). As of 1996, all public domestic companies were required to make their filings on EDGAR. Third-party filings with respect to these companies, such as tender offers and Schedules 13D, are also filed on EDGAR.

More information at:

Firm and Industry Evolution and Entrepreneurship (FIVE)

The FIVES Project aims to accelerate and broaden the reach of research on strategy, entrepreneurship, and firm and industry evolution. Data sets contributed to the FIVES Project are freely available, along with documentation. Current FIVES datasets include: Sorenson Workstation FIVE data, Henderson Photolithography data, Carroll-Swaminathan Brewery data, Thompson Shipbuilding data, Png LinkedIn Patent Inventor data, and Lieberman Chemical data.

More information at:


Duke’s Golden Goose Project

This project was initiated to uncover trends in the production and use of science by U.S. corporations. It provides data on a broad range of companies of different sizes and across many industries over a quarter of a century, using novel measures of the production and use of science. The data include company listings from Compustat, scientific publications from Web of Science, and patent and non-patent literature (NPL) citations from the PATSTAT database, from 1986 to 2006. To capture the complexity of large firms' innovative activities, which are typically organized across subsidiaries, the data have been aggregated to the ultimate-owner-parent-company level.

More information at:


Global Alliance Network Data

These data describe the structure of the global technology collaboration network and the effect of a technology shock on that structure. The data, collected and shared by Melissa Schilling, include all technology alliances from the SDC database from 1990 to 2005 (R&D alliances, cross-technology transfer alliances, and cross-licensing of technology) involving any type of organization in any country, provided as dl files.

More information at:


Innovative Data Sources for Economic Analysis

This website aims to inform economic researchers and policy makers about new and innovative data sources and analytic tools that have the potential to improve understanding of the dynamics of the U.S. economy, specifically as it relates to innovation and entrepreneurship.

More information at:


The Kauffman Foundation Data Resources

The Kauffman Foundation creates data overviews that provide guidance on how to use certain datasets. The data overviews serve several purposes: to serve as data reference guides for academics, other data providers, or anyone interested in new research developments in entrepreneurship; to provide short (up to two pages) overviews of data sets in one place that point to a wide range of relevant information but are highly curated and edited to ensure brevity and clarity; and to be updated continually online and used for in-person trainings, such as doctoral seminars, as a new resource in training and outreach on entrepreneurship research.

More information at:

Other Resources

Research Blogs and Wikis

Research Methods

The Fall issues of the Competitive Strategy IG newsletters include a teaching commentary. Please check our Spring 2017 newsletter with a commentary on “statistics require judgment: coefficients, p-values, and judgment.” A few other research methods resources of interest to our members are listed below.