TidyTuesday
    • About TidyTuesday
    • Datasets
      • 2025
      • 2024
      • 2023
      • 2022
      • 2021
      • 2020
      • 2019
      • 2018
    • Useful links

    On this page

    • Academic Literature on Racial and Ethnic Disparities in Reproductive Medicine in the US
      • The Data
      • How to Participate
      • PydyTuesday: A Posit collaboration with TidyTuesday
        • Data Dictionary
    • article_dat.csv
    • model_dat.csv
      • Cleaning Script

    Academic Literature on Racial and Ethnic Disparities in Reproductive Medicine in the US

    This week we’re exploring data on studies investigating racial and ethnic disparities in reproductive medicine as published in the eight highest impact peer-reviewed Ob/Gyn journals from January 1, 2010 through June 30, 2023. The data were collected as part of a review article Racial and ethnic disparities in reproductive medicine in the United States: a narrative review of contemporary high-quality evidence published in the American Journal of Obstetrics and Gynecology in January 2025.

    “There has been increasing debate around how or if race and ethnicity should be used in medical research—including the conceptualization of race as a biological entity, a social construct, or a proxy for racism. The objectives of this narrative review are to identify and synthesize reported racial and ethnic inequalities in obstetrics and gynecology (ob/gyn) and develop informed recommendations for racial and ethnic inequity research in ob/gyn.”

    A companion interactive website was published alongside the review article, and the data has also been used by college students to create data art to raise awareness about this research.

    The data provides an opportunity to critically examine how racial and ethnic disparities in reproductive medicine are framed, measured, and discussed in the academic literature.

    • Explore how race and ethnicity are categorized in these articles. Which categories are most prominent? Which are missing? How do sample sizes vary across groups?
    • Has the sentiment of article titles, abstracts, keywords, and/or aims statements changed over time?
    • What type of health outcomes have been studied? What disparities have been identified?

    Thank you to Kat Correia from Amherst College for curating this week’s dataset.

    The Data

    # Using R
    # Option 1: tidytuesdayR R package 
    ## install.packages("tidytuesdayR")
    
    tuesdata <- tidytuesdayR::tt_load('2025-02-25')
    ## OR
    tuesdata <- tidytuesdayR::tt_load(2025, week = 8)
    
    article_dat <- tuesdata$article_dat
    model_dat <- tuesdata$model_dat
    
    # Option 2: Read directly from GitHub
    
    article_dat <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-02-25/article_dat.csv')
    model_dat <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-02-25/model_dat.csv')
    # Using Python
    # Option 1: pydytuesday python library
    ## pip install pydytuesday
    
    import pydytuesday
    
    # Download files from the week, which you can then read in locally
    pydytuesday.get_date('2025-02-25')
    
    # Option 2: Read directly from GitHub and assign to an object
    
    article_dat = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-02-25/article_dat.csv')
    model_dat = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-02-25/model_dat.csv')

    How to Participate

    • Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
    • Create a visualization, a model, a Quarto report, a shiny app, or some other piece of data-science-related output, using R, Python, or another programming language.
    • Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
    • Submit your own dataset!

    PydyTuesday: A Posit collaboration with TidyTuesday

    • Exploring the TidyTuesday data in Python? Posit has some extra resources for you! Have you tried making a Quarto dashboard? Find videos and other resources in Posit’s PydyTuesday repo.
    • Share your work with the world using the hashtags #TidyTuesday and #PydyTuesday so that Posit has the chance to highlight your work, too!
    • Deploy or share your work however you want! If you’d like a super easy way to publish your work, give Connect Cloud a try.

    Data Dictionary

    article_dat.csv

    variable class description
    pmid double The article’s PubMed identification number
    doi character The article’s Digital Object Identifier
    jabbrv character Journal abbreviation
    journal character Journal full name
    year double Year the article was published
    month character Month the article was published
    day character Day the article was published
    title character Title of the article
    abstract character Abstract for the article
    keywords character Keywords for the article
    study_aim character Aim of the study
    study_location character Location of the study
    study_year_start double The year the study started
    study_year_end double The year the study ended
    study_type character Type of study design: prospective cohort, retrospective cohort, case-control, cross-sectional, randomized controlled trial (RCT), registry
    data_source character Source of data
    race1 character Race category that is used as the ‘Referent’ group in the study or, if no Referent group, that is presented first in the first table in the article, worded exactly as in the publication including letter case (e.g., capitalization)
    race1_ss double Sample size for race1 group
    race2 character Race category that is presented second in the first table in the article, worded exactly as in the publication including letter case (e.g., capitalization)
    race2_ss double Sample size for race2 group
    race3 character Race category that is presented third in the first table in the article, worded exactly as in the publication including letter case (e.g., capitalization)
    race3_ss double Sample size for race3 group
    race4 character Race category that is presented fourth in the first table in the article, worded exactly as in the publication including letter case (e.g., capitalization)
    race4_ss double Sample size for race4 group
    race5 character Race category that is presented fifth in the first table in the article, worded exactly as in the publication including letter case (e.g., capitalization)
    race5_ss double Sample size for race5 group
    race6 character Race category that is presented sixth in the first table in the article, worded exactly as in the publication including letter case (e.g., capitalization)
    race6_ss double Sample size for race6 group
    race7 character Race category that is presented seventh in the first table in the article, worded exactly as in the publication including letter case (e.g., capitalization)
    race7_ss double Sample size for race7 group
    race8 character Race category that is presented eighth in the first table in the article, worded exactly as in the publication including letter case (e.g., capitalization)
    race8_ss double Sample size for race8 group
    eth1 character Ethnic category that is used as the ‘Referent’ group in the study or, if no Referent group, that is presented first in the first table in the article, worded exactly as in the publication including letter case (e.g., capitalization)
    eth1_ss double Sample size for the eth1 group
    eth2 character Ethnic category that is presented second in the first table in the article, worded exactly as in the publication including letter case (e.g., capitalization)
    eth2_ss double Sample size for the eth2 group
    eth3 character Ethnic category that is presented third in the first table in the article, worded exactly as in the publication including letter case (e.g., capitalization)
    eth3_ss double Sample size for the eth3 group
    eth4 character Ethnic category that is presented fourth in the first table in the article, worded exactly as in the publication including letter case (e.g., capitalization)
    eth4_ss double Sample size for the eth4 group
    eth5 character Ethnic category that is presented fifth in the first table in the article, worded exactly as in the publication including letter case (e.g., capitalization)
    eth5_ss double Sample size for the eth5 group
    eth6 character Ethnic category that is presented sixth in the first table in the article, worded exactly as in the publication including letter case (e.g., capitalization)
    eth6_ss double Sample size for the eth6 group
    eth7 logical Ethnic category that is presented seventh in the first table in the article, worded exactly as in the publication including letter case (e.g., capitalization)
    eth7_ss logical Sample size for the eth7 group
    eth8 logical Ethnic category that is presented eighth in the first table in the article, worded exactly as in the publication including letter case (e.g., capitalization)
    eth8_ss logical Sample size for the eth8 group
    access_to_care double Did the study address disparities in access to care?(1=yes, 0=no)
    treatment_received double Did the study address disparities in treatment received? (1=yes, 0=no)
    health_outcome double Did the study address disparities in health outcomes? (1=yes, 0=no)
    cancer_ovarian double Did the study address ovarian cancer? (1=yes, 0=no)
    cancer_uterine double Did the study address uterine cancer? (1=yes, 0=no)
    cancer_cervical double Did the study address cervical cancer? (1=yes, 0=no)
    cancer_vulvar double Did the study address vulvar cancer? (1=yes, 0=no)
    other_gyn_onc double Did the study address other gynecological cancer? (1=yes, 0=no)
    endo double Did the study address endometriosis? (1=yes, 0=no)
    fibroids double Did the study address fibroids? (1=yes, 0=no)
    other_gyn_surg double Did the study address other gynecologic surgery? (1=yes, 0=no)
    fert double Did the study address fertility? (1=yes, 0=no)
    matmorbmort double Did the study address maternal morbidity and mortality? (1=yes, 0=no)
    other_preg double Did the study address other pregnancy outcome? (1=yes, 0=no)
    phys_div double Did the study address physician diversity and/or training? (1=yes, 0=no)
    other double Did the study address other outcome? (1=yes, 0=no)
    covid double Did the study address COVID-19 related concerns? (1=yes, 0=no)

    model_dat.csv

    variable class description
    doi character The article’s Digital Object Identifier
    model_number double Number identifying the model
    stratified character Was the analysis stratified? (1=yes, 0=no)
    stratgrp character If the analysis was stratified, specifies the stratification group
    subanalysis character Was the analysis a subanalysis? (1=yes, 0=no)
    subgrp character If the analysis was a subanalysis, specifies the subanalysis group
    outcome character Outcome measure for the model as reported in the article
    measure character Estimated measure from the model: incidence, prevalence, odds ratio (OR), risk ratio (RR), hazards ratio (HR), absolute difference, percent, other
    measure_comments character If ‘other’ measure was selected, specifies the estimated measure from the model
    covariates character List of covariates included in the model; if the model is unadjusted, then ‘None’ is recorded
    comparison double Number identifying the comparison made within a given model number
    ref character Reference group, recorded exactly as in the publication including letter case (e.g., capitalization) and without abbreviations; if reference group is not applicable, ‘N/A’ is recorded
    compare character Comparison group, recorded exactly as in the publication including letter case (e.g., capitalization) and without abbreviations
    point double Point estimate for measure
    lower double Lower bound of 95% confidence interval for measure; if a 95% confidence interval was not provided, -99 is recorded
    upper double Upper bound of 95% confidence interval for measure; if a 95% confidence interval was not provided, -99 is recorded

    Cleaning Script

    # Clean data provided by Kat Correia. Student interns extracted this information
    # from reproductive medicine journals using duplicate data entry. 
    # After duplicate data entry discrepancies were resolved, their 
    # individual files were combined, and posted for public access 
    # as Google sheets, e.g.,:
    # https://docs.google.com/spreadsheets/d/1UuWGQxRU0wxVuOZGYnvkwKn-GdmLIj8kzmUm4KLRHQY/edit?gid=1719021104#gid=1719021104
    # No additional cleaning was necessary.
    
    article_dat <- readr::read_csv("https://kcorreia.people.amherst.edu/repro_med_disparities-article-level-data.csv")
    
    model_dat <- readr::read_csv("https://kcorreia.people.amherst.edu/repro_med_disparities-model-level-data.csv")