    Extreme Weather Attribution Studies

    This week we’re exploring extreme weather attribution studies. The dataset comes from Carbon Brief’s article Mapped: How climate change affects extreme weather around the world. An in-depth exploration of the evolution of extreme weather attribution science can be found in this Q & A article.

    The data was last updated in November 2024. Single studies that cover multiple events or locations have been separated into individual entries where possible.

    Attribution studies calculate whether, and by how much, climate change affected the intensity, frequency or impact of extremes - from wildfires in the US and drought in South Africa through to record-breaking rainfall in Pakistan and typhoons in Taiwan.

    Some questions you might explore (a starter sketch in R follows the list):

    • How do attribution study publications evolve over time? What about rapid attribution studies?
    • Which type of extreme event is most frequently the subject of an attribution study?
    • In which regions are most studies focused?
    • Is there a trend in how climate change influences different types of extreme weather?
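
    For the first question, one minimal starting point in R (a sketch, assuming the cleaned attribution_studies.csv described under “The Data” below) is to count studies per publication year, split by whether they were rapid attribution studies:

    library(tidyverse)

    attribution_studies <- readr::read_csv(
      "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-12/attribution_studies.csv"
    )

    # Studies published per year, split into rapid vs. conventional attribution studies
    attribution_studies |>
      dplyr::count(publication_year, rapid_study) |>
      tidyr::pivot_wider(names_from = rapid_study, values_from = n, values_fill = 0)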

    Thank you to Rajo for curating this week’s dataset.

    The Data

    # Using R
    # Option 1: tidytuesdayR R package 
    ## install.packages("tidytuesdayR")
    
    tuesdata <- tidytuesdayR::tt_load('2025-08-12')
    ## OR
    tuesdata <- tidytuesdayR::tt_load(2025, week = 32)
    
    attribution_studies <- tuesdata$attribution_studies
    attribution_studies_raw <- tuesdata$attribution_studies_raw
    
    # Option 2: Read directly from GitHub
    
    attribution_studies <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-12/attribution_studies.csv')
    attribution_studies_raw <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-12/attribution_studies_raw.csv')
    # Using Python
    # Option 1: pydytuesday python library
    ## pip install pydytuesday
    
    import pydytuesday
    
    # Download files from the week, which you can then read in locally
    pydytuesday.get_date('2025-08-12')
    
    # Option 2: Read directly from GitHub and assign to an object
    import pandas

    attribution_studies = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-12/attribution_studies.csv')
    attribution_studies_raw = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-12/attribution_studies_raw.csv')
    # Using Julia
    # Option 1: TidierTuesday.jl library
    ## Pkg.add(url="https://github.com/TidierOrg/TidierTuesday.jl")
    
    using TidierTuesday
    
    # Download files from the week, which you can then read in locally
    download_dataset("2025-08-12")
    
    # Option 2: Read directly from GitHub and assign to an object with TidierFiles
    using TidierFiles

    attribution_studies = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-12/attribution_studies.csv")
    attribution_studies_raw = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-12/attribution_studies_raw.csv")

    # Option 3: Read directly from GitHub and assign without Tidier dependencies
    using CSV, DataFrames
    attribution_studies = CSV.read("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-12/attribution_studies.csv", DataFrame)
    attribution_studies_raw = CSV.read("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-12/attribution_studies_raw.csv", DataFrame)
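
    Whichever language you use, it is worth a quick structural check after loading; for example in R (a sketch, assuming one of the R options above):

    # Inspect column names and types, then the distribution of classifications
    dplyr::glimpse(attribution_studies)
    dplyr::count(attribution_studies, classification, sort = TRUE)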

    How to Participate

    • Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
    • Create a visualization, a model, a Quarto report, a Shiny app, or some other piece of data-science-related output, using R, Python, or another programming language (a minimal visualization example follows this list).
    • Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
    • Submit your own dataset!
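
    As an example of the kind of output you might share, here is a minimal ggplot2 sketch (assuming attribution_studies is loaded as shown in “The Data” above) of how studies of each event type were classified:

    library(ggplot2)

    # Number of studies per event type, filled by how climate change affected the event
    ggplot(attribution_studies, aes(y = event_type, fill = classification)) +
      geom_bar() +
      labs(x = "Number of studies", y = NULL, fill = "Classification")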

    PydyTuesday: A Posit collaboration with TidyTuesday

    • Exploring the TidyTuesday data in Python? Posit has some extra resources for you! Have you tried making a Quarto dashboard? Find videos and other resources in Posit’s PydyTuesday repo.
    • Share your work with the world using the hashtags #TidyTuesday and #PydyTuesday so that Posit has the chance to highlight your work, too!
    • Deploy or share your work however you want! If you’d like a super easy way to publish your work, give Connect Cloud a try.

    Data Dictionary

    attribution_studies.csv

    variable          | class     | description
    event_name        | character | The name or description of the extreme weather event studied.
    event_period      | character | The specific time period when the event occurred (extracted from the raw event name).
    event_year        | character | The year(s) or year range when the event occurred.
    study_focus       | character | Whether the study focused on a specific event or on long-term trends.
    iso_country_code  | character | Three-character ISO country code(s), with multiple countries separated by commas for multi-country studies (e.g. “KEN, SOM”).
    cb_region         | character | The geographic region classification used by Carbon Brief (based on the UN classification).
    event_type        | character | The type of extreme weather event or trend discussed in the study.
    classification    | character | How climate change has affected the studied event: “More severe or more likely to occur”, “No discernible human influence”, “Insufficient data/inconclusive”, or “Decrease, less severe or less likely to occur”.
    summary_statement | character | The authors’ key findings.
    publication_year  | double    | The year when the study was published.
    citation          | character | The full citation for the study.
    source            | character | The source where the study was published.
    rapid_study       | character | Whether this was a rapid attribution study (analysis completed within days of the event occurring): “yes” or “no”.
    link              | character | The URL link to the original article.
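
    Note that iso_country_code can hold several comma-separated codes for multi-country studies. A minimal sketch (assuming attribution_studies is loaded as above) that splits those entries into one row per country before counting studies by region:

    # One row per study-country combination, then studies per Carbon Brief region
    attribution_studies |>
      tidyr::separate_longer_delim(iso_country_code, delim = ", ") |>
      dplyr::count(cb_region, sort = TRUE)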

    attribution_studies_raw.csv

    variable          | class     | description
    name              | character | The name or description of the extreme weather event studied, plus the period when it occurred if available.
    event_year_trend  | character | Time period when the event occurred (e.g. “2014”, “2004-05”, “mid-1990s”), or an indication that the study focuses on long-term trends.
    iso_country_code  | character | Three-character ISO country code(s), with multiple countries separated by commas for multi-country studies (e.g. “KEN, SOM”).
    cb_region         | character | The geographic region classification used by Carbon Brief (based on the UN classification).
    event_type        | character | The type of extreme weather event or trend discussed in the study.
    classification    | character | How climate change has affected the studied event: “More severe or more likely to occur”, “No discernible human influence”, “Insufficient data/inconclusive”, or “Decrease, less severe or less likely to occur”.
    summary_statement | character | The authors’ key findings.
    publication_year  | double    | The year when the study was published.
    citation          | character | The full citation for the study.
    source            | character | The source where the study was published.
    rapid_study       | character | Whether this was a rapid attribution study (analysis completed within days of the event occurring): “yes” or “no”.
    link              | character | The URL link to the original article.
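
    Unlike the cleaned file, the raw file keeps event periods and the long-term-trend flag together in event_year_trend. A minimal sketch (assuming attribution_studies_raw is loaded as above) that separates trend studies from single-event studies, mirroring the cleaning script below:

    # Label each study as a long-term trend study or a specific-event study
    attribution_studies_raw |>
      dplyr::mutate(study_focus = dplyr::if_else(event_year_trend == "Trend", "Trend", "Event")) |>
      dplyr::count(study_focus)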

    Cleaning Script

    # Data provided by Carbon Brief
    # Data available at https://interactive.carbonbrief.org/attribution-studies/data/papers-download.csv
    
    library(tidyverse)
    library(here)
    library(janitor)
    
    # Import Carbon Brief's Climate attribution studies dataset
    attribution_studies_raw <- readr::read_csv(
      "https://interactive.carbonbrief.org/attribution-studies/data/papers-download.csv"
    ) |>
      janitor::clean_names()
    
    
    # Helper functions -------------------------------------------------------
    
    # Function to standardize year spans to consistent yyyy-yyyy format
    clean_yearspan <- function(match) {
      # Split the span using "-" as a delimiter
      parts <- stringr::str_split(match, "-")[[1]]
      start_year <- as.numeric(parts[1])
      # Extract century from start year (e.g. 20 from 2020)
      century <- start_year %/% 100
      # Combine the years
      glue::glue("{parts[1]}-{century}{parts[2]}")
    }
    
    # Function to clean and standardize date strings
    clean_date_string <- function(col) {
      col |>
        stringr::str_replace_all("–", "-") |>
        # Find yyyy-yy patterns and convert to yyyy-yyyy
        stringr::str_replace_all("(\\d{4})-(\\d{2}$)", \(match) {
          clean_yearspan(match)
        }) |>
        stringr::str_replace_all(" & ", ", ")
    }
    
    # Data Cleaning ----------------------------------------------------------
    
    attribution_studies <- attribution_studies_raw |>
      janitor::clean_names() |>
      # Separate event names from time periods
      # and split them into separate 'event_name' and 'event_period' columns
      tidyr::separate_wider_regex(
        name,
        patterns = c(
          event_name = ".*",
          ", ",
          event_period = ".*",
          "\\s\\(.*"
        ),
        too_few = "align_start"
      ) |>
      # Create standardized variables
      dplyr::mutate(
        event_year_trend = clean_date_string(event_year_trend),
        event_period = dplyr::case_when(
          is.na(event_period) & event_year_trend != "Trend" ~
            dplyr::coalesce(event_period, event_year_trend),
          TRUE ~ event_period
        ) |>
          clean_date_string(),
        event_year = dplyr::case_when(
          event_year_trend != "Trend" ~ event_year_trend,
          TRUE ~ NA_character_
        ),
        study_focus = dplyr::case_when(
          event_year_trend == "Trend" ~ "Trend",
          TRUE ~ "Event",
        ),
        .before = iso_country_code
      ) |>
      dplyr::select(!event_year_trend)
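
    For reference, the helper functions above normalise year spans and “&”-separated years; for example (illustrative calls, not part of the original script):

    clean_date_string("2004–05")      # "2004-2005"
    clean_date_string("2014 & 2015")  # "2014, 2015"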