Characterizing in-text citations in scientific articles: A large-scale analysis

Kevin W. Boyack*, Nees Jan van Eck, Giovanni Colavizza, Ludo Waltman

*Corresponding author for this work

Research output: Contribution to journal > Journal article > Research > peer-review

116 Citations (Scopus)

Abstract

We report characteristics of in-text citations in over five million full text articles from two large databases – the PubMed Central Open Access subset and Elsevier journals – as functions of time, textual progression, and scientific field. The purpose of this study is to understand the characteristics of in-text citations in a detailed way prior to pursuing other studies focused on answering more substantive research questions. As such, we have analyzed in-text citations in several ways and report many findings here. Perhaps most significantly, we find that there are large field-level differences that are reflected in position within the text, citation interval (or reference age), and citation counts of references. In general, the fields of Biomedical and Health Sciences, Life and Earth Sciences, and Physical Sciences and Engineering have similar reference distributions, although they vary in their specifics. The two remaining fields, Mathematics and Computer Science and Social Science and Humanities, have different reference distributions from the other three fields and between themselves. We also show that in all fields the numbers of sentences, references, and in-text mentions per article have increased over time, and that there are field-level and temporal differences in the numbers of in-text mentions per reference. A final finding is that references mentioned only once tend to be much more highly cited than those mentioned multiple times.

Original language: English
Journal: Journal of Informetrics
Volume: 12
Issue number: 1
Pages (from-to): 59-73
Number of pages: 15
ISSN: 1751-1577
Publication status: Published - 2018
Externally published: Yes

Bibliographical note

Funding Information:
Kevin Boyack and Giovanni Colavizza both thank CWTS for hosting them as visiting scholars, during which time most of this work was performed. We thank Mike Patek of SciTech Strategies, Inc. for extraction and fielding of the full text from PubMed Central, and Richard Klavans and Vincent Traag for helpful discussion on our work. Giovanni Colavizza is funded by Swiss National Fund grant number P1ELP2_168489.

Publisher Copyright:
© 2017 Elsevier Ltd

Keywords

  • Citation counts
  • Citation position analysis
  • Field-level analysis
  • In-text citations
  • Reference age
