    US Government Grant Opportunities

    The R4DS Online Learning Community is a community of learners at all skill levels working together to improve our data-science-related skills. We offer free data-related education through book clubs and free live question-answering on our Slack, and by curating a dataset every week here at TidyTuesday.

    (NOTE: Unfortunately, since this post was written, the Open Collective Foundation has dissolved, so the following paragraph is no longer true.)

    We are now a fiscally sponsored project of Open Collective Foundation (https://opencollective.foundation), a 501(c)(3) public charity. That means donations to the R4DS Online Learning Community are now tax-deductible in the US! It also means that we are now eligible for a number of grants, including some of the grants listed on Grants.gov.

    We have exported all grants past and present from that site, and we are making them available here for you to explore and visualize. We also scraped details for all posted grants. Please let us know if you find anything interesting!

    The Data

    # Option 1: tidytuesdayR package 
    ## install.packages("tidytuesdayR")
    
    tuesdata <- tidytuesdayR::tt_load('2023-10-03')
    ## OR
    tuesdata <- tidytuesdayR::tt_load(2023, week = 40)
    
    grants <- tuesdata$grants
    grant_opportunity_details <- tuesdata$grant_opportunity_details
    
    # Option 2: Read directly from GitHub
    
    grants <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2023/2023-10-03/grants.csv')
    grant_opportunity_details <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2023/2023-10-03/grant_opportunity_details.csv')

    How to Participate

    • Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
    • Create a visualization, a model, a shiny app, or some other piece of data-science-related output, using R or another programming language.
    • Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
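
As a starting point for the exploration step, here is a minimal sketch of a grouped summary. It assumes a tibble shaped like `grants` (same column names as the data dictionary); the `toy_grants` rows are invented for illustration and are not real opportunities.

```r
library(dplyr)

# Toy stand-in for `grants`; every value here is made up.
toy_grants <- tibble::tibble(
  opportunity_id = 1:5,
  agency_name = c("Agency A", "Agency A", "Agency B", "Agency B", "Agency B"),
  opportunity_status = c("Posted", "Closed", "Posted", "Posted", "Archived"),
  estimated_funding = c(1e6, NA, 2.5e5, 5e5, NA)
)

# Count currently posted opportunities per agency and total their
# estimated funding, ignoring missing amounts.
posted_by_agency <- toy_grants |>
  filter(opportunity_status == "Posted") |>
  group_by(agency_name) |>
  summarize(
    n_opportunities = n(),
    total_estimated_funding = sum(estimated_funding, na.rm = TRUE),
    .groups = "drop"
  ) |>
  arrange(desc(n_opportunities))

posted_by_agency
```

Swapping `toy_grants` for the real `grants` tibble should give the same kind of table, which can then feed a plot or a model.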

    Data Dictionary

    grants.csv

    variable | class | description
    opportunity_id | integer | Integer ID for this opportunity, which can be used to find details at https://www.grants.gov/web/grants/view-opportunity.html?oppId={opportunity_id}
    opportunity_number | character | Funding opportunity ID number
    opportunity_title | character | Title of the opportunity
    agency_code | character | Abbreviated name for the funding agency
    agency_name | character | Full name of the funding agency
    estimated_funding | double | Estimated funding amount in dollars
    expected_number_of_awards | integer | Expected count of awards
    grantor_contact | character | Information about how to contact the grantor. Often includes email address
    agency_contact_phone | character | Phone number for the agency when available
    agency_contact_email | character | Contact email address for the agency (almost always available)
    estimated_post_date | date | When the opportunity is/was expected to be posted
    estimated_application_due_date | date | Date by which applications are/were expected to be received
    posted_date | date | When the opportunity was posted
    close_date | date | When the opportunity was closed or will close
    last_updated_date_time | datetime | Date and time when the opportunity was updated
    version | character | Version of the opportunity, stored as text (the cleaning script strips a "Synopsis " prefix to recover the integer version number)
    opportunity_status | character | Whether the opportunity is Archived, Closed, Forecasted, or Posted
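
The `opportunity_id` description gives a URL template for the details page. A minimal sketch of filling it in follows; `opportunity_url()` is a hypothetical helper name, the IDs are invented, and whether Grants.gov still serves this URL pattern is an assumption.

```r
# Hypothetical helper: build the details URL from the template in the
# data dictionary. Coercing to integer avoids scientific notation when
# a large ID arrives as a double.
opportunity_url <- function(opportunity_id) {
  paste0(
    "https://www.grants.gov/web/grants/view-opportunity.html?oppId=",
    as.integer(opportunity_id)
  )
}

# Made-up IDs, for illustration only.
urls <- opportunity_url(c(123456L, 654321L))
urls
```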

    grant_opportunity_details.csv

    variable | class | description
    opportunity_id | integer | Integer ID for this opportunity, which can be used to find these details at https://www.grants.gov/web/grants/view-opportunity.html?oppId={opportunity_id}
    funding_opportunity_number | character | Funding opportunity ID number
    funding_opportunity_title | character | Title of the opportunity
    opportunity_category | character | “Continuation”, “Discretionary”, “Earmark”, “Mandatory”, or “Other”
    opportunity_category_explanation | character | More details about why the opportunity is in that category (mostly details about Other)
    expected_number_of_awards | integer | Expected count of awards
    cost_sharing_or_matching_requirement | logical | Whether the opportunity requires a cost-sharing or cost-matching agreement
    version | integer | Integer version number of the opportunity
    posted_date | date | When the opportunity was posted
    last_updated_date | date | When the opportunity was updated
    original_closing_date_for_applications | date | When the opportunity was originally scheduled to close
    current_closing_date_for_applications | date | When the opportunity is currently scheduled to close
    archive_date | date | When the opportunity will be archived
    estimated_total_program_funding | double | Estimated funding amount in dollars
    award_ceiling | double | Maximum individual award amount in dollars
    award_floor | double | Minimum individual award amount in dollars
    agency_name | character | Full name of the granting agency
    description | character | Free text description of the opportunity. Sometimes includes tables or potentially other figures, which did not necessarily scrape accurately
    link_to_additional_information | character | The text of any links to additional information (unlikely to be useful in this format)
    grantor_contact_information | character | Information about who to contact about the grant; may have contained links, which are not included in the scraped data
    eligibility_individuals | logical | Are individuals eligible?
    eligibility_state_governments | logical | Are state governments eligible?
    eligibility_county_governments | logical | Are county governments eligible?
    eligibility_independent_school_districts | logical | Are independent school districts eligible?
    eligibility_city_or_township_governments | logical | Are city or township governments eligible?
    eligibility_special_district_governments | logical | Are special district governments eligible?
    eligibility_native_american_tribal_governments_federally_recognized | logical | Are Native American tribal governments (Federally recognized) eligible?
    eligibility_native_american_tribal_organizations_other | logical | Are Native American tribal organizations (other than Federally recognized tribal governments) eligible?
    eligibility_nonprofits_501c3 | logical | Are nonprofits having a 501(c)(3) status with the IRS, other than institutions of higher education eligible?
    eligibility_nonprofits_non_501c3 | logical | Are nonprofits that do not have a 501(c)(3) status with the IRS, other than institutions of higher education eligible?
    eligibility_for_profit | logical | Are for profit organizations other than small businesses eligible?
    eligibility_small_businesses | logical | Are small businesses eligible?
    eligibility_private_institutions_of_higher_education | logical | Are private institutions of higher education eligible?
    eligibility_public_institutions_of_higher_education | logical | Are public and State controlled institutions of higher education eligible?
    eligibility_public_indian_housing_authorities | logical | Are public housing authorities and Indian housing authorities eligible?
    eligibility_others | logical | Are other groups eligible?
    eligibility_unrestricted | logical | Is eligibility unrestricted?
    additional_information_on_eligibility | character | Additional details about eligibility
    funding_cooperative_agreement | logical | Is the opportunity funded via a cooperative agreement?
    funding_grant | logical | Is the opportunity funded via a grant?
    funding_procurement_contract | logical | Is the opportunity funded via a procurement contract?
    funding_other | logical | Is the opportunity funded via some other instrument?
    cfda_numbers | character | Catalog of Federal Domestic Assistance number(s) (see https://sam.gov/content/assistance-listings)
    category_agriculture | logical | Category: Agriculture
    category_arts | logical | Category: Arts (see “Cultural Affairs” in CFDA)
    category_business | logical | Category: Business and Commerce
    category_community_development | logical | Category: Community Development
    category_consumer_protection | logical | Category: Consumer Protection
    category_disaster | logical | Category: Disaster Prevention and Relief
    category_education | logical | Category: Education
    category_employment | logical | Category: Employment, Labor and Training
    category_energy | logical | Category: Energy
    category_environment | logical | Category: Environment
    category_food | logical | Category: Food and Nutrition
    category_health | logical | Category: Health
    category_housing | logical | Category: Housing
    category_humanities | logical | Category: Humanities (see “Cultural Affairs” in CFDA)
    category_iija | logical | Category: Infrastructure Investment and Jobs Act (IIJA)
    category_income_security | logical | Category: Income Security and Social Services
    category_info | logical | Category: Information and Statistics
    category_law | logical | Category: Law, Justice and Legal Services
    category_natural_resources | logical | Category: Natural Resources
    category_opportunity_zone | logical | Category: Opportunity Zone Benefits
    category_regional_development | logical | Category: Regional Development
    category_science | logical | Category: Science and Technology and other Research and Development
    category_transportation | logical | Category: Transportation
    category_other | logical | Category: Other (see category_explanation for clarification)
    category_explanation | character | More details about the funding category or categories
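
Because each funding category is stored as its own logical column, counting opportunities per category usually means reshaping to long form first. The sketch below shows one way to do that with `tidyr::pivot_longer()`; `toy_details` is an invented stand-in for `grant_opportunity_details` with only three of the `category_*` flags.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for `grant_opportunity_details`; values are made up.
toy_details <- tibble::tibble(
  opportunity_id = 1:3,
  category_health = c(TRUE, FALSE, TRUE),
  category_education = c(FALSE, TRUE, TRUE),
  category_energy = c(FALSE, FALSE, FALSE),
  category_explanation = c(NA_character_, NA_character_, NA_character_)
)

category_counts <- toy_details |>
  # category_explanation is free text, not a flag, so drop it first.
  select(opportunity_id, starts_with("category_"), -category_explanation) |>
  pivot_longer(
    cols = starts_with("category_"),
    names_to = "category",
    names_prefix = "category_",
    values_to = "flag"
  ) |>
  # Keep only the categories that apply to each opportunity, then count.
  filter(flag) |>
  count(category, sort = TRUE)

category_counts
```

The same pattern should carry over to the `eligibility_*` and `funding_*` flag columns by changing the prefix.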

    Cleaning Script

    library(tidyverse)
    library(janitor)
    library(here)
    library(fs)
    
    # Requires dev rvest from this draft pull request:
    # https://github.com/tidyverse/rvest/pull/362
    #
    # pak::pak("tidyverse/rvest#362")
    library(rvest)
    library(chromote)
    
    working_dir <- here::here("data", "2023", "2023-10-03")
    
    # I wanted to be able to download this CSV periodically, so I found a way to do
    # it with {chromote}.
    
    url <- "https://www.grants.gov/web/grants/search-grants.html"
    # This probably SHOULD be done in chromote directly, but I'm using this
    # rvest::read_html_live() function later and became familiar-enough with it.
    live_page <- rvest::read_html_live(url)
    # I used sleep()s to make sure the page was ready to continue.
    Sys.sleep(10)
    
    js_export_dataset <- readLines(
      fs::path(working_dir, "export_dataset.js")
    ) |> 
      paste(collapse = "\n")
    
    live_page$session$Browser$setDownloadBehavior(behavior = "allow", downloadPath = tempdir())
    
    live_page$session$Runtime$evaluate(
      js_export_dataset,
      wait_ = TRUE,
      awaitPromise = TRUE
    )
    
    # I can't figure out how to await promises with JS just yet, but I can make sure
    # the file is there and isn't continuing to save.
    grants_path <- fs::dir_info(tempdir(), glob = "*/grants-gov*.csv") |> 
      dplyr::arrange(desc(modification_time)) |> 
      head(1) |> 
      dplyr::pull(path)
    while (!length(grants_path)) {
      Sys.sleep(1)
      grants_path <- fs::dir_info(tempdir(), glob = "*/grants-gov*.csv") |> 
        dplyr::arrange(desc(modification_time)) |> 
        head(1) |> 
        dplyr::pull(path)
    }
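    # Poll until the file size stops changing between checks, which suggests the
    # download has finished. The previous size must be updated on every pass;
    # otherwise a file that grows after the first check would loop forever.
    grants_size <- fs::file_size(grants_path)
    grants_ready <- FALSE
    while (!grants_ready) {
      Sys.sleep(1)
      new_size <- fs::file_size(grants_path)
      grants_ready <- new_size == grants_size
      grants_size <- new_size
    }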
    
    live_page$session$close()
    
    if (grants_size < 20000000) {
      cli::cli_abort("Grants csv did not download properly.")
    }
    
    # Many rows have extra commas at the end, which cause confusion but otherwise
    # don't damage the data. You can probably safely ignore the warnings.
    grants <- 
      grants_path |> 
      readr::read_csv(
        col_types = cols(
          `OPPORTUNITY NUMBER` = col_character(),
          `OPPORTUNITY TITLE` = col_character(),
          `AGENCY CODE` = col_character(),
          `AGENCY NAME` = col_character(),
          `ESTIMATED FUNDING` = col_character(),
          `EXPECTED NUMBER OF AWARDS` = col_character(),
          `GRANTOR CONTACT` = col_character(),
          `AGENCY CONTACT PHONE` = col_character(),
          `AGENCY CONTACT EMAIL` = col_character(),
          `ESTIMATED POST DATE` = col_character(),
          `ESTIMATED APPLICATION DUE DATE` = col_character(),
          `POSTED DATE` = col_character(),
          `CLOSE DATE` = col_character(),
          `LAST UPDATED DATE/TIME` = col_character(),
          VERSION = col_character(),
          `OPPORTUNITY STATUS` = col_character()
        )
      ) |> 
      janitor::clean_names() |> 
      dplyr::mutate(
        estimated_funding = case_match(
          estimated_funding,
          "Not available" ~ NA, 
          .default = estimated_funding
        ) |> 
          stringr::str_remove_all(",") |> 
          as.double(),
        last_updated_date_time = lubridate::mdy_hms(last_updated_date_time),
        opportunity_status = stringr::str_remove_all(opportunity_status, ",")
      ) |> 
      dplyr::mutate(
        dplyr::across(
          dplyr::ends_with("_date"),
          lubridate::mdy
        )
      ) |> 
      tidyr::separate_wider_regex(
        opportunity_number,
        c(
          ".+oppId=",
          opportunity_id = "\\d+",
          "\",\"",
          opportunity_number = "[^\"]+",
          "\"\\)"
        )
      ) |> 
      dplyr::mutate(opportunity_id = as.integer(opportunity_id))
    
    # Create a couple helper functions to get the grant details.
    extract_synopsis_table <- function(html_document, div_id) {
      headings <- 
        html_document |> 
        rvest::html_elements(glue::glue("#{div_id} > table > tbody > tr > th")) |> 
        rvest::html_text2()
      bodies <- 
        html_document |> 
        rvest::html_elements(glue::glue("#{div_id} > table > tbody > tr > td")) |> 
        rvest::html_text2()
      bodies  <- as.list(bodies)
      names(bodies) <- headings
      tibble::as_tibble(bodies)
    }
    
    get_grant_details <- function(opportunity_id, sleep = 0.5) {
      # Let's put an escape hatch in, in case something just won't load.
      if (sleep > 10) {
        return(
          tibble::tibble(
            opportunity_id = opportunity_id,
            synopsis_failed = TRUE
          )
        )
      }
      
      url <- glue::glue(
        "https://www.grants.gov/web/grants/view-opportunity.html?oppId={opportunity_id}"
      )
      
      live_page <- rvest::read_html_live(url)
      
      # If anybody can help me figure out how to make this wait for promise
      # evaluation more correctly, please let me know!
      Sys.sleep(sleep)
      
      iframe_html <- 
        live_page$session$Runtime$evaluate(
          "document.querySelector('iframe').contentDocument.documentElement.innerHTML",
          wait_ = TRUE,
          awaitPromise = TRUE,
          returnByValue = TRUE
        )
      
      html_document <- 
        iframe_html$result$value |> 
        rvest::read_html()
      
      general_info_left <- 
        html_document |> 
        extract_synopsis_table("synopsisDetailsGeneralInfoTableLeft")
      
      general_info_right <-
        html_document |>
        extract_synopsis_table("synopsisDetailsGeneralInfoTableRight")
    
      eligibility <-
        html_document |>
        extract_synopsis_table("synopsisDetailsEligibilityTable")
    
      additional_info <-
        html_document |>
        extract_synopsis_table("synopsisDetailsAdditionalInfoTable")
    
      synopsis <-
        dplyr::bind_cols(
          general_info_left,
          general_info_right,
          eligibility,
          additional_info
        ) |>
        janitor::clean_names()
      
      live_page$session$close()
      
      if(nrow(synopsis)) {
        synopsis <- dplyr::bind_cols(
          tibble::tibble(opportunity_id = opportunity_id),
          synopsis
        )
        return(synopsis)
      }
      
      return(
        get_grant_details(opportunity_id, sleep = sleep + 0.5)
      )
    }
    
    known_details <- tibble::tibble(opportunity_id = integer(), version = integer())
    to_parse <- 
      grants |> 
      dplyr::filter(opportunity_status == "Posted") |> 
      dplyr::pull(opportunity_id)
    
    if (fs::file_exists(fs::path(working_dir, "grant_opportunity_details.csv"))) {
      known_details <- 
        readr::read_csv(
          fs::path(working_dir, "grant_opportunity_details.csv"),
          show_col_types = FALSE
        ) |> 
        dplyr::mutate(
          opportunity_id = as.integer(opportunity_id)
        )
      to_parse <- 
        grants |> 
        dplyr::filter(opportunity_status == "Posted") |> 
        dplyr::mutate(
          version = stringr::str_remove(version, "Synopsis ") |> 
            readr::parse_number()
        ) |> 
        dplyr::anti_join(known_details, by = c("opportunity_id", "version")) |> 
        dplyr::pull(opportunity_id)
    }
    
    grant_opportunity_details <- 
      to_parse |>
      purrr::map(get_grant_details) |> 
      purrr::list_rbind()
    
    if (nrow(grant_opportunity_details)) {
      grant_opportunity_details <- 
        grant_opportunity_details |> 
        dplyr::mutate(
          category_explanation = stringr::str_squish(category_explanation),
          expected_number_of_awards = readr::parse_integer(expected_number_of_awards),
          cost_sharing_or_matching_requirement = cost_sharing_or_matching_requirement == "Yes",
          version = stringr::str_remove(version, "Synopsis ") |> 
            readr::parse_integer(),
          dplyr::across(
            dplyr::contains("date"),
            \(x) {
              x |> 
                stringr::str_extract("\\w+ \\d+, \\d+") |>
                stringr::str_squish() |> 
                lubridate::mdy()
            }
          ),
          dplyr::across(
            c(
              "estimated_total_program_funding",
              "award_ceiling",
              "award_floor"
            ),
            readr::parse_number
          )
        ) |> 
        # Extract information about the various eligibility groups for easier filtering.
        dplyr::mutate(
          eligibility_individuals = stringr::str_detect(eligible_applicants, "Individuals"),
          eligibility_state_governments = stringr::str_detect(eligible_applicants, "State governments"),
          eligibility_county_governments = stringr::str_detect(eligible_applicants, "County governments"),
          eligibility_independent_school_districts = stringr::str_detect(eligible_applicants, "Independent school districts"),
          eligibility_city_or_township_governments = stringr::str_detect(eligible_applicants, "City or township governments"),
          eligibility_special_district_governments = stringr::str_detect(eligible_applicants, "Special district governments"),
          eligibility_native_american_tribal_governments_federally_recognized = stringr::str_detect(eligible_applicants, stringr::fixed("Native American tribal governments (Federally recognized)")),
          eligibility_native_american_tribal_organizations_other = stringr::str_detect(eligible_applicants, stringr::fixed("Native American tribal organizations (other than Federally recognized tribal governments)")),
          eligibility_nonprofits_501c3 = stringr::str_detect(eligible_applicants, stringr::fixed("Nonprofits having a 501(c)(3) status with the IRS, other than institutions of higher education")),
          eligibility_nonprofits_non_501c3 = stringr::str_detect(eligible_applicants, stringr::fixed("Nonprofits that do not have a 501(c)(3) status with the IRS, other than institutions of higher education")),
          eligibility_for_profit = stringr::str_detect(eligible_applicants, "For profit organizations other than small businesses"),
          eligibility_small_businesses = stringr::str_detect(eligible_applicants, "Small businesses"),
          eligibility_private_institutions_of_higher_education = stringr::str_detect(eligible_applicants, "Private institutions of higher education"),
          eligibility_public_institutions_of_higher_education = stringr::str_detect(eligible_applicants, "Public and State controlled institutions of higher education"),
          eligibility_public_indian_housing_authorities = stringr::str_detect(eligible_applicants, stringr::fixed("Public housing authorities/Indian housing authorities")),
          eligibility_others = stringr::str_detect(eligible_applicants, stringr::fixed("Others (see text field entitled \"Additional Information on Eligibility\" for clarification)")),
          eligibility_unrestricted = stringr::str_detect(eligible_applicants, stringr::fixed("Unrestricted (i.e., open to any type of entity above), subject to any clarification in text field entitled \"Additional Information on Eligibility\""))
        ) |> 
        dplyr::relocate(additional_information_on_eligibility, .after = eligibility_unrestricted) |>
        dplyr::select(-eligible_applicants) |> 
        # Extract information about the various funding_instrument_types for easier filtering.
        dplyr::mutate(
          funding_cooperative_agreement = stringr::str_detect(funding_instrument_type, "Cooperative Agreement"),
          funding_grant = stringr::str_detect(funding_instrument_type, "Grant"),
          funding_procurement_contract = stringr::str_detect(funding_instrument_type, "Procurement Contract"),
          funding_other = stringr::str_detect(funding_instrument_type, "Other")
        ) |> 
        dplyr::select(-funding_instrument_type) |> 
        # Clean up the CFDA numbers, at least somewhat.
        dplyr::mutate(
          cfda_numbers = stringr::str_extract_all(cfda_number_s, "\\d{2}\\.\\d{3} -- \\D+") |> 
            purrr::map_chr(paste, collapse = " | ")
        ) |> 
        dplyr::select(-cfda_number_s) |>
        # Clean up the category_of_funding_activity, at least somewhat.
        dplyr::mutate(
          category_agriculture = stringr::str_detect(category_of_funding_activity, "Agriculture"),
          category_arts = stringr::str_detect(category_of_funding_activity, stringr::fixed("Arts (see \"Cultural Affairs\" in CFDA)")),
          category_business = stringr::str_detect(category_of_funding_activity, "Business and Commerce"),
          category_community_development = stringr::str_detect(category_of_funding_activity, "Community Development"),
          category_consumer_protection = stringr::str_detect(category_of_funding_activity, "Consumer Protection"),
          category_disaster = stringr::str_detect(category_of_funding_activity, "Disaster Prevention and Relief"),
          category_education = stringr::str_detect(category_of_funding_activity, "Education"),
          category_employment = stringr::str_detect(category_of_funding_activity, "Employment, Labor and Training"),
          category_energy = stringr::str_detect(category_of_funding_activity, "Energy"),
          category_environment = stringr::str_detect(category_of_funding_activity, "Environment"),
          category_food = stringr::str_detect(category_of_funding_activity, "Food and Nutrition"),
          category_health = stringr::str_detect(category_of_funding_activity, "Health"),
          category_housing = stringr::str_detect(category_of_funding_activity, "Housing"),
          category_humanities = stringr::str_detect(category_of_funding_activity, stringr::fixed("Humanities (see \"Cultural Affairs\" in CFDA)")),
          category_iija = stringr::str_detect(category_of_funding_activity, stringr::fixed("Infrastructure Investment and Jobs Act (IIJA)")),
          category_income_security = stringr::str_detect(category_of_funding_activity, "Income Security and Social Services"),
          category_info = stringr::str_detect(category_of_funding_activity, "Information and Statistics"),
          category_law = stringr::str_detect(category_of_funding_activity, "Law, Justice and Legal Services"),
          category_natural_resources = stringr::str_detect(category_of_funding_activity, "Natural Resources"),
          category_opportunity_zone = stringr::str_detect(category_of_funding_activity, "Opportunity Zone Benefits"),
          category_regional_development = stringr::str_detect(category_of_funding_activity, "Regional Development"),
          category_science = stringr::str_detect(category_of_funding_activity, "Science and Technology and other Research and Development"),
          category_transportation = stringr::str_detect(category_of_funding_activity, "Transportation"),
          category_other = stringr::str_detect(category_of_funding_activity, stringr::fixed("Other (see text field entitled \"Explanation of Other Category of Funding Activity\" for clarification)"))
        ) |> 
        dplyr::relocate(category_explanation, .after = category_other) |> 
        dplyr::select(-category_of_funding_activity) |> 
        dplyr::select(-document_type)
    }
    
    if (nrow(known_details)) {
      grant_opportunity_details <- 
        known_details |> 
        dplyr::bind_rows(grant_opportunity_details) |> 
        dplyr::arrange(opportunity_id, desc(version)) |> 
        dplyr::distinct(opportunity_id, .keep_all = TRUE)
    }
    
    readr::write_csv(
      grants,
      fs::path(working_dir, "grants.csv")
    )
    readr::write_csv(
      grant_opportunity_details,
      fs::path(working_dir, "grant_opportunity_details.csv")
    )
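
The `tidyr::separate_wider_regex()` step in the script splits the raw OPPORTUNITY NUMBER cell into `opportunity_id` and `opportunity_number`. The sketch below runs the same pattern vector on a single made-up value. The `=HYPERLINK(...)` shape of the raw cell is an assumption inferred from the regex, and the ID and number are invented.

```r
library(tidyr)

# Assumed shape of one raw cell from the exported CSV (values are made up).
raw <- tibble::tibble(
  opportunity_number = '=HYPERLINK("https://www.grants.gov/web/grants/view-opportunity.html?oppId=123456","ABC-23-001")'
)

parsed <- raw |>
  separate_wider_regex(
    opportunity_number,
    c(
      ".+oppId=",                      # unnamed: matched, then discarded
      opportunity_id = "\\d+",         # named: becomes a new column
      "\",\"",                         # the "," between URL and label
      opportunity_number = "[^\"]+",   # the visible opportunity number
      "\"\\)"                          # closing quote and parenthesis
    )
  )

parsed
```

Named patterns become columns and unnamed ones are dropped, so the whole cell has to be covered by the pattern vector from start to end. Both new columns come back as character, which is why the script converts `opportunity_id` with `as.integer()` afterwards.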