    Extreme Weather Attribution Studies

    This week we’re exploring extreme weather attribution studies. The dataset comes from Carbon Brief’s article Mapped: How climate change affects extreme weather around the world. An in-depth exploration of the evolution of extreme weather attribution science can be found in this Q & A article.

    The data was last updated in November 2024. Single studies that cover multiple events or locations have been separated into individual entries where possible.

    Attribution studies calculate whether, and by how much, climate change affected the intensity, frequency or impact of extremes - from wildfires in the US and drought in South Africa through to record-breaking rainfall in Pakistan and typhoons in Taiwan.

    Some questions you might explore (a starter sketch in R follows the list):

    • How do attribution study publications evolve over time? What about rapid attribution studies?
    • Which type of extreme event is most frequently the subject of an attribution study?
    • In which regions are most studies focused?
    • Is there a trend in how climate change influences different types of extreme weather?
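
    For the first question, one minimal starting point in R (a sketch, assuming the cleaned attribution_studies.csv described under “The Data” below) is to count studies per publication year, split by whether they were rapid attribution studies:

    library(tidyverse)

    attribution_studies <- readr::read_csv(
      "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-12/attribution_studies.csv"
    )

    # Studies published per year, split into rapid vs. conventional attribution studies
    attribution_studies |>
      dplyr::count(publication_year, rapid_study) |>
      tidyr::pivot_wider(names_from = rapid_study, values_from = n, values_fill = 0)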

    Thank you to Rajo for curating this week’s dataset.

    The Data

    # Using R
    # Option 1: tidytuesdayR R package 
    ## install.packages("tidytuesdayR")
    
    tuesdata <- tidytuesdayR::tt_load('2025-08-12')
    ## OR
    tuesdata <- tidytuesdayR::tt_load(2025, week = 32)
    
    attribution_studies <- tuesdata$attribution_studies
    attribution_studies_raw <- tuesdata$attribution_studies_raw
    
    # Option 2: Read directly from GitHub
    
    attribution_studies <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-12/attribution_studies.csv')
    attribution_studies_raw <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-12/attribution_studies_raw.csv')
    # Using Python
    # Option 1: pydytuesday python library
    ## pip install pydytuesday
    
    import pydytuesday
    
    # Download files from the week, which you can then read in locally
    pydytuesday.get_date('2025-08-12')
    
    # Option 2: Read directly from GitHub and assign to an object
    import pandas

    attribution_studies = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-12/attribution_studies.csv')
    attribution_studies_raw = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-12/attribution_studies_raw.csv')
    # Using Julia
    # Option 1: TidierTuesday.jl library
    ## Pkg.add(url="https://github.com/TidierOrg/TidierTuesday.jl")
    
    using TidierTuesday
    
    # Download files from the week, which you can then read in locally
    download_dataset("2025-08-12")
    
    # Option 2: Read directly from GitHub and assign to an object with TidierFiles
    using TidierFiles

    attribution_studies = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-12/attribution_studies.csv")
    attribution_studies_raw = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-12/attribution_studies_raw.csv")

    # Option 3: Read directly from GitHub and assign without Tidier dependencies
    using CSV, DataFrames
    attribution_studies = CSV.read("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-12/attribution_studies.csv", DataFrame)
    attribution_studies_raw = CSV.read("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-12/attribution_studies_raw.csv", DataFrame)
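
    Whichever language you use, it is worth a quick structural check after loading; for example in R (a sketch, assuming one of the R options above):

    # Inspect column names and types, then the distribution of classifications
    dplyr::glimpse(attribution_studies)
    dplyr::count(attribution_studies, classification, sort = TRUE)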

    How to Participate

    • Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
    • Create a visualization, a model, a Quarto report, a Shiny app, or some other piece of data-science-related output, using R, Python, or another programming language (a minimal visualization example follows this list).
    • Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
    • Submit your own dataset!
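
    As an example of the kind of output you might share, here is a minimal ggplot2 sketch (assuming attribution_studies is loaded as shown in “The Data” above) of how studies of each event type were classified:

    library(ggplot2)

    # Number of studies per event type, filled by how climate change affected the event
    ggplot(attribution_studies, aes(y = event_type, fill = classification)) +
      geom_bar() +
      labs(x = "Number of studies", y = NULL, fill = "Classification")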

    PydyTuesday: A Posit collaboration with TidyTuesday

    • Exploring the TidyTuesday data in Python? Posit has some extra resources for you! Have you tried making a Quarto dashboard? Find videos and other resources in Posit’s PydyTuesday repo.
    • Share your work with the world using the hashtags #TidyTuesday and #PydyTuesday so that Posit has the chance to highlight your work, too!
    • Deploy or share your work however you want! If you’d like a super easy way to publish your work, give Connect Cloud a try.

    Data Dictionary

    attribution_studies.csv

    variable          | class     | description
    event_name        | character | The name or description of the extreme weather event studied.
    event_period      | character | The specific time period when the event occurred (extracted from the raw event name).
    event_year        | character | The year(s) or year range when the event occurred.
    study_focus       | character | Whether the study focused on a specific event or on long-term trends.
    iso_country_code  | character | Three-character ISO country code(s), with multiple countries separated by commas for multi-country studies (e.g. “KEN, SOM”).
    cb_region         | character | The geographic region classification used by Carbon Brief (based on the UN classification).
    event_type        | character | The type of extreme weather event or trend discussed in the study.
    classification    | character | How climate change has affected the studied event: “More severe or more likely to occur”, “No discernible human influence”, “Insufficient data/inconclusive”, or “Decrease, less severe or less likely to occur”.
    summary_statement | character | The authors’ key findings.
    publication_year  | double    | The year when the study was published.
    citation          | character | The full citation for the study.
    source            | character | The source where the study was published.
    rapid_study       | character | Whether this was a rapid attribution study (analysis completed within days of the event occurring): “yes” or “no”.
    link              | character | The URL link to the original article.
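
    Note that iso_country_code can hold several comma-separated codes for multi-country studies. A minimal sketch (assuming attribution_studies is loaded as above) that splits those entries into one row per country before counting studies by region:

    # One row per study-country combination, then studies per Carbon Brief region
    attribution_studies |>
      tidyr::separate_longer_delim(iso_country_code, delim = ", ") |>
      dplyr::count(cb_region, sort = TRUE)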

    attribution_studies_raw.csv

    variable          | class     | description
    name              | character | The name or description of the extreme weather event studied, plus the period when it occurred if available.
    event_year_trend  | character | Time period when the event occurred (e.g. “2014”, “2004-05”, “mid-1990s”), or an indication that the study focuses on long-term trends.
    iso_country_code  | character | Three-character ISO country code(s), with multiple countries separated by commas for multi-country studies (e.g. “KEN, SOM”).
    cb_region         | character | The geographic region classification used by Carbon Brief (based on the UN classification).
    event_type        | character | The type of extreme weather event or trend discussed in the study.
    classification    | character | How climate change has affected the studied event: “More severe or more likely to occur”, “No discernible human influence”, “Insufficient data/inconclusive”, or “Decrease, less severe or less likely to occur”.
    summary_statement | character | The authors’ key findings.
    publication_year  | double    | The year when the study was published.
    citation          | character | The full citation for the study.
    source            | character | The source where the study was published.
    rapid_study       | character | Whether this was a rapid attribution study (analysis completed within days of the event occurring): “yes” or “no”.
    link              | character | The URL link to the original article.
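
    Unlike the cleaned file, the raw file keeps event periods and the long-term-trend flag together in event_year_trend. A minimal sketch (assuming attribution_studies_raw is loaded as above) that separates trend studies from single-event studies, mirroring the cleaning script below:

    # Label each study as a long-term trend study or a specific-event study
    attribution_studies_raw |>
      dplyr::mutate(study_focus = dplyr::if_else(event_year_trend == "Trend", "Trend", "Event")) |>
      dplyr::count(study_focus)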

    Cleaning Script

    # Data provided by Carbon Brief
    # Data available at https://interactive.carbonbrief.org/attribution-studies/data/papers-download.csv
    
    library(tidyverse)
    library(here)
    library(janitor)
    
    # Import Carbon Brief's Climate attribution studies dataset
    attribution_studies_raw <- readr::read_csv(
      "https://interactive.carbonbrief.org/attribution-studies/data/papers-download.csv"
    ) |>
      janitor::clean_names()
    
    
    # Helper functions -------------------------------------------------------
    
    # Function to standardize year spans to consistent yyyy-yyyy format
    clean_yearspan <- function(match) {
      # Split the span using "-" as a delimiter
      parts <- stringr::str_split(match, "-")[[1]]
      start_year <- as.numeric(parts[1])
      # Extract century from start year (e.g. 20 from 2020)
      century <- start_year %/% 100
      # Combine the years
      glue::glue("{parts[1]}-{century}{parts[2]}")
    }
    
    # Function to clean and standardize date strings
    clean_date_string <- function(col) {
      col |>
        stringr::str_replace_all("–", "-") |>
        # Find yyyy-yy patterns and convert to yyyy-yyyy
        stringr::str_replace_all("(\\d{4})-(\\d{2}$)", \(match) {
          clean_yearspan(match)
        }) |>
        stringr::str_replace_all(" & ", ", ")
    }
    
    # Data Cleaning ----------------------------------------------------------
    
    attribution_studies <- attribution_studies_raw |>
      janitor::clean_names() |>
      # Separate event names from time periods
      # and split them into separate 'event_name' and 'event_period' columns
      tidyr::separate_wider_regex(
        name,
        patterns = c(
          event_name = ".*",
          ", ",
          event_period = ".*",
          "\\s\\(.*"
        ),
        too_few = "align_start"
      ) |>
      # Create standardized variables
      dplyr::mutate(
        event_year_trend = clean_date_string(event_year_trend),
        event_period = dplyr::case_when(
          is.na(event_period) & event_year_trend != "Trend" ~
            dplyr::coalesce(event_period, event_year_trend),
          TRUE ~ event_period
        ) |>
          clean_date_string(),
        event_year = dplyr::case_when(
          event_year_trend != "Trend" ~ event_year_trend,
          TRUE ~ NA_character_
        ),
        study_focus = dplyr::case_when(
          event_year_trend == "Trend" ~ "Trend",
          TRUE ~ "Event",
        ),
        .before = iso_country_code
      ) |>
      dplyr::select(!event_year_trend)
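
    For reference, the helper functions above normalise year spans and “&”-separated years; for example (illustrative calls, not part of the original script):

    clean_date_string("2004–05")      # "2004-2005"
    clean_date_string("2014 & 2015")  # "2014, 2015"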