State of Crossref metadata by member country
Crossref is a central pillar of the global research ecosystem, running open infrastructure to create a lasting and reusable scholarly record that underpins open science. While many of us know Crossref for providing Digital Object Identifiers (DOIs), they also maintain a massive repository of metadata, which is the essential data about research that makes it discoverable, linkable, and reusable.
This dataset provides a granular look at how that metadata was populated across the globe from December 2018 to April 2026 (with monthly granularity beginning in January 2025). It is split into two files:
- Member participation statistics by country: A look at the Crossref members registering DOIs, broken down by region and country, highlighting the geographical diversity of the publishing community.
- Metadata coverage statistics: A look at metadata completion at the level of individual outputs, also broken down by region and country, and also by content type. This details the connectedness of research, including adoption metrics for citations, funding, ORCID IDs, and ROR IDs.
By analyzing these files, you can explore themes of global research equity, the adoption of modern publishing standards, and the varying levels of metadata richness across different corners of the world.
By shifting the lens from global aggregates to country-level insights, this dataset provides a new look at the global landscape of scholarly communication. It serves as a critical benchmark for Crossref’s Research Nexus vision — the creation of a rich, reusable open network of relationships connecting research organizations, people, and actions — enabling the community to track progress toward a more transparent and interconnected global research record.
- Which countries or regions show the fastest growth in metadata “richness” over time?
- How does the connectedness of research vary across different work types within a single country?
- Which regions are leading the way in adopting the Research Nexus vision?
Thank you to Alexandre Bédard-Vallée from Crossref for curating this week’s dataset.
The Data
# Using R
# Option 1: tidytuesdayR R package
## install.packages("tidytuesdayR")
tuesdata <- tidytuesdayR::tt_load('2026-05-19')
## OR
tuesdata <- tidytuesdayR::tt_load(2026, week = 20)
member_participation_stats_by_country <- tuesdata$member_participation_stats_by_country
metadata_coverage_stats_by_country <- tuesdata$metadata_coverage_stats_by_country
# Option 2: Read directly from GitHub
member_participation_stats_by_country <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-05-19/member_participation_stats_by_country.csv')
metadata_coverage_stats_by_country <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-05-19/metadata_coverage_stats_by_country.csv')# Using Python
# Option 1: pydytuesday python library
## pip install pydytuesday
import pydytuesday
# Download files from the week, which you can then read in locally
pydytuesday.get_date('2026-05-19')
# Option 2: Read directly from GitHub and assign to an object
member_participation_stats_by_country = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-05-19/member_participation_stats_by_country.csv')
metadata_coverage_stats_by_country = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-05-19/metadata_coverage_stats_by_country.csv')# Using Julia
# Option 1: TidierTuesday.jl library
## Pkg.add(url="https://github.com/TidierOrg/TidierTuesday.jl")
using TidierTuesday
# Download datasets for the week, and load them as a NamedTuple of DataFrames
data = tt_load("2026-05-19")
# Option 2: Read directly from GitHub and assign to an object with TidierFiles
member_participation_stats_by_country = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-05-19/member_participation_stats_by_country.csv")
metadata_coverage_stats_by_country = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-05-19/metadata_coverage_stats_by_country.csv")
# Option 3: Read directly from Github and assign without Tidier dependencies
member_participation_stats_by_country = CSV.read("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-05-19/member_participation_stats_by_country.csv", DataFrame)
metadata_coverage_stats_by_country = CSV.read("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-05-19/metadata_coverage_stats_by_country.csv", DataFrame)How to Participate
- Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
- Create a visualization, a model, a Quarto report, a shiny app, or some other piece of data-science-related output, using R, Python, or another programming language.
- Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
- Submit your own dataset!
PydyTuesday: A Posit collaboration with TidyTuesday
- Exploring the TidyTuesday data in Python? Posit has some extra resources for you! Have you tried making a Quarto dashboard? Find videos and other resources in Posit’s PydyTuesday repo.
- Share your work with the world using the hashtags #TidyTuesday and #PydyTuesday so that Posit has the chance to highlight your work, too!
- Deploy or share your work however you want! If you’d like a super easy way to publish your work, give Connect Cloud a try.
Data Dictionary
member_participation_stats_by_country.csv
| variable | class | description |
|---|---|---|
| current_up_to | date | The latest DOI submission date that was taken into account when computing statistics. All Crossref works submitted or updated until this date are accounted for in these statistics. |
| region_id | character | Three-letter code identifying the region the member is located in. These codes are sourced from the World Bank taxonomy. |
| iso3_code | character | Three-letter code identifying the country the member is located in. These codes are sourced from the ISO 3166-1 alpha-3 taxonomy. |
| total_members | double | Number of members with at least one registered DOI. |
| deposits_ref | double | Number of members that have deposited at least one work with references. |
| deposits_abstract | double | Number of members that have deposited at least one work with an abstract. |
| deposits_license | double | Number of members that have deposited at least one work with license metadata. |
| deposits_crossmark | double | Number of members that have deposited at least one work with a Crossmark update policy. |
| deposits_updates | double | Number of members that have deposited at least one update record. |
| deposits_textmining | double | Number of members that have deposited at least one work with text-mining information. |
| deposits_status_info | double | Number of members that have deposited at least one work with preprint status information. |
| acknowledges_funding | double | Number of members that have deposited at least one work with funding metadata. |
| deposits_funder_id | double | Number of members that have deposited at least one work acknowledging a funder ID. |
| deposits_grant_id | double | Number of members that have deposited at least one work with a grant ID. |
| deposits_orcid | double | Number of members that have deposited at least one work with an ORCID ID. |
| deposits_orcid_for_authors | double | Number of members that have deposited at least one work with an ORCID ID for an author. |
| deposits_orcid_for_chairs | double | Number of members that have deposited at least one work with an ORCID ID for a chair. |
| deposits_orcid_for_editors | double | Number of members that have deposited at least one work with an ORCID ID for an editor. |
| deposits_orcid_for_translators | double | Number of members that have deposited at least one work with an ORCID ID for a translator. |
| deposits_ror_id | double | Number of members that have deposited at least one work with a ROR ID. |
| deposits_ror_id_for_affiliations | double | Number of members that have deposited at least one work with a ROR ID for an affiliation. |
| deposits_ror_id_for_funders | double | Number of members that have deposited at least one work with a ROR ID for a funder. |
| deposits_ror_id_for_institutions | double | Number of members that have deposited at least one work with a ROR ID for an institution. |
metadata_coverage_stats_by_country.csv
| variable | class | description |
|---|---|---|
| current_up_to | date | The latest DOI submission date that was taken into account when computing statistics. All Crossref works submitted or updated until this date are accounted for in these statistics. |
| region_id | character | Three-letter code identifying the region the registering members are located in. These codes are sourced from the World Bank taxonomy. |
| iso3_code | character | Three-letter code identifying the country the registering members are located in. These codes are sourced from the ISO 3166-1 alpha-3 taxonomy. |
| document_type | character | The type of work. Based off the type field of the Crossref REST API, with book types () merged under book. |
| document_subtype | character | The subtype of work for which statistics are computed. Applies to type posted_content only, null for all others. |
| n_dois | double | Total number of works of the specified type and subtype, registered by members in the country. |
| with_ref | double | Number of works with at least one reference. |
| with_doi_ref | double | Number of works with at least one reference including a DOI. |
| references | double | Number of references from these works. |
| references_with_dois | double | Number of references with DOIs from these works. |
| references_with_doi_asserted_by_publisher | double | Number of references with DOIs asserted by the publisher from these works. |
| references_with_doi_asserted_by_crossref | double | Number of references with DOIs asserted by Crossref from these works. |
| citations_received | double | Number of citations received by these works, from other Crossref-registered works only. |
| with_abstract | double | Number of works with an abstract. |
| with_license | double | Number of works with license information. |
| with_crossmark | double | Number of works with a Crossmark update policy registered. |
| with_crossmark_update | double | Number of works that have received at least one Crossmark update. |
| crossmark_updates | double | Number of update DOIs. |
| update_assertions | double | Number of update assertions made by all update DOIs. |
| update_assertions_from_publisher | double | Number of update assertions registered by a Crossref member. |
| update_assertions_from_retraction_watch | double | Number of update assertions registered through Retraction Watch. |
| with_textmining | double | Number of works with at least one registered text-mining link. |
| with_status_info | double | Number of works with preprint status metadata. |
| acknowledges_funding | double | Number of works that include any kind of funding metadata. |
| award_number_assertions | double | Number of acknowledged award numbers. |
| with_grant_id | double | Number of works that include at least one grant ID. |
| grant_id_assertions | double | Number of acknowledged grant IDs. |
| with_funder_id | double | Number of works that include at least one funder ID. |
| funder_id_assertions | double | Number of acknowledged funder IDs. |
| with_orcid | double | Number of works that include at least one ORCID ID. |
| with_orcid_for_authors | double | Number of works that include at least one ORCID ID for an author. |
| with_orcid_for_chairs | double | Number of works that include at least one ORCID ID for a chair. |
| with_orcid_for_editors | double | Number of works that include at least one ORCID ID for an editor. |
| with_orcid_for_translators | double | Number of works that include at least one ORCID ID for a translator. |
| orcid_assertions | double | Number of ORCID ID assertions. |
| orcid_for_authors_assertions | double | Number of ORCID ID assertions for authors. |
| orcid_for_chairs_assertions | double | Number of ORCID ID assertions for chairs. |
| orcid_for_editors_assertions | double | Number of ORCID ID assertions for editors. |
| orcid_for_translators_assertions | double | Number of ORCID ID assertions for translators. |
| with_ror_id | double | Number of works that include at least one ROR ID. |
| with_ror_id_for_affiliations | double | Number of works that include at least one ROR ID for affiliations. |
| with_ror_id_for_institutions | double | Number of works that include at least one ROR ID for institutions. |
| with_ror_id_for_funders | double | Number of works that include at least one ROR ID for funders. |
| ror_id_assertions | double | Number of ROR ID assertions. |
| ror_id_for_affiliations_assertions | double | Number of ROR ID assertions for affiliations. |
| ror_id_for_institutions_assertions | double | Number of ROR ID assertions for institutions. |
| ror_id_for_funders_assertions | double | Number of ROR ID assertions for funders. |
| preprint_to_article_links | double | Number of links registered between a work of type posted-content and subtype preprint and a published work (e.g., journal-article). |
| retractions | double | Number of retracted works. |
Cleaning Script
# Clean data provided by Crossref. No cleaning was necessary.
metadata_coverage_stats_by_country <- readr::read_csv("https://zenodo.org/api/records/19928426/files/metadata_coverage_stats_by_country.csv/content")
member_participation_stats_by_country <- readr::read_csv("https://zenodo.org/api/records/19928426/files/member_participation_stats_by_country.csv/content")