by Matt Marx and Aaron Fuegi

Understanding the scientific heritage of innovation has been an objective of scholars for decades. Where do inventions come from? How is academic research translated into commercial products? For the past quarter century, researchers have used citations from patents to try to answer these and related questions.

Most of the time, scholars have used citations to other patents for this task. Although citations to other patents can yield some insight, most of them are added by patent attorneys or patent examiners rather than by the inventors, so they reveal little about the inventors' own thinking. But patents also contain citations to scientific articles, which are largely added by the inventors themselves.

So why haven’t researchers used patent citations to scientific articles? Citations to patents can easily be captured by patent number, but citations to scientific articles are hardly so simple. There’s no standard format or structure, so a citation might list the author first or the title first. The author’s name might be misspelled, or the title might be missing. Even if the title is included, it might be truncated or abbreviated. Page, volume, and issue numbers may or may not be included.
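To make the matching problem concrete, here is a minimal Python sketch of one simple approach. It is an illustration only, not the authors' actual pipeline: it normalizes a free-text citation into lowercase word tokens and scores each candidate article by the fraction of its title's words that appear anywhere in the citation. Because the score ignores word order, it tolerates author-first versus title-first formats and extra page, volume, or issue numbers. The `Article` type, the `best_match` function, the 0.8 threshold, and the example titles are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    """A candidate article (hypothetical stand-in for a PubMed or MAG record)."""
    article_id: str
    title: str


def tokens(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def title_coverage(citation: str, title: str) -> float:
    """Fraction of the title's tokens that appear anywhere in the citation.

    Word order is ignored, so author-first and title-first citations score alike.
    """
    title_tokens = tokens(title)
    if not title_tokens:
        return 0.0
    return len(title_tokens & tokens(citation)) / len(title_tokens)


def best_match(citation: str, candidates: list,
               threshold: float = 0.8) -> tuple:
    """Return (best article, score), or (None, score) if no score reaches threshold.

    The 0.8 threshold is an illustrative assumption, not a tuned value.
    """
    best: Optional[Article] = None
    best_score = 0.0
    for article in candidates:
        score = title_coverage(citation, article.title)
        if score > best_score:
            best, best_score = article, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score


# Hypothetical example: a messy, author-first citation string.
candidates = [
    Article("A1", "Protein folding dynamics in yeast cells"),
    Article("A2", "A survey of deep learning methods"),
]
citation = "Smith, J. et al., Protein Folding Dynamics in Yeast Cells, J. Mol. Biol. 12(3):45-67 (1999)"
match, score = best_match(citation, candidates)
print(match.article_id if match else None, score)
```

A real system would need more than this: truncated or abbreviated titles lower the score, short titles can match by accident, and comparing each citation against every candidate would be far too slow for tens of millions of articles, so some form of indexing (for example, by rare title words or publication year) would likely be needed first.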

Extracting the citations and matching them to scientific articles represents an enormous computational task, one that few researchers are willing or able to undertake. There have been previous attempts to match patent citations to scientific articles, but these generally relied on proprietary datasets such as the Web of Science or Scopus, so their results can’t necessarily be shared.

Our approach was to extract all of the patent citations to scientific articles and match them to two openly available datasets: PubMed and the Microsoft Academic Graph (MAG). For those not familiar with MAG, think of it as Google Scholar, except that you can download the data in bulk. The matches can be accessed at and have been downloaded nearly 10,000 times to date.

We hope you find it useful in your work and merely ask that you cite this SMJ paper (

About the authors:

Matt Marx ( is an associate professor at the Boston University Questrom School of Business.

Aaron Fuegi ( is a Senior Graphics Analyst/Consultant at the Research Computing Services Group at Boston University.


Published Date
22 May 2020

Article Type
Article Summary/Abstract

