    Papal Encyclicals: Industrial Revolution vs. AI Revolution

    This week we’re exploring papal encyclicals — the most authoritative form of papal teaching in the Catholic Church. The primary dataset contains the full paragraph-level text of two encyclicals that bookend 135 years of technological revolution: Pope Leo XIII’s Rerum Novarum (1891), which addressed the Industrial Revolution’s impact on workers, and Pope Leo XIV’s Magnifica Humanitas (2026), which addresses artificial intelligence’s impact on human dignity. Both were signed on May 15 of their respective years.

    The data comes from Vatican.va, the official website of the Holy See. A supplementary dataset catalogs all 213 papal encyclicals from 1878 to 2026 with metadata about each pope.

    “Humanity, created by God in all its grandeur, is today facing a pivotal choice: either to construct a new Tower of Babel or to build the city in which God and humanity dwell together.” — Pope Leo XIV, Magnifica Humanitas §1

    • How does the vocabulary of Catholic Social Teaching evolve from the Industrial Revolution to the AI Revolution?
    • Which books of the Bible does each Pope draw upon, and what does that reveal about their theological emphasis?
    • Can a machine learning model reliably distinguish which Pope wrote a paragraph? What features does it rely on — specific words, or writing style?
    • How has encyclical output changed over time? Leo XIII wrote 86 encyclicals; Francis wrote 4. What does that tell us about how papal communication has evolved?
    • Which passages of Magnifica Humanitas are most textually similar to Rerum Novarum, suggesting direct intellectual lineage?
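
    For the textual-similarity question, one possible starting point is a bag-of-words cosine similarity between two paragraphs. The sketch below is only an illustration under stated assumptions: the two sample strings are invented placeholders rather than text from either encyclical, and `tokenize` and `cosine_similarity` are hypothetical helper names, not part of any TidyTuesday tooling.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared = set(counts_a) & set(counts_b)
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Invented placeholder sentences, NOT quotations from the dataset.
sample_a = "the dignity of the worker must be protected"
sample_b = "the dignity of the person must be protected"

score = cosine_similarity(sample_a, sample_b)
print(round(score, 3))
```

    Raw word counts reward common function words, so a real analysis would probably remove stop words or weight terms (for example with TF-IDF) before comparing paragraphs.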

    Thank you to Tony Galvan, Golden Dome Data Science, for curating this week’s dataset.

    The Data

    # Using R
    # Option 1: tidytuesdayR R package 
    ## install.packages("tidytuesdayR")
    
    tuesdata <- tidytuesdayR::tt_load('2026-06-23')
    ## OR
    tuesdata <- tidytuesdayR::tt_load(2026, week = 25)
    
    encyclicals <- tuesdata$encyclicals
    papal_encyclicals <- tuesdata$papal_encyclicals
    scripture_references <- tuesdata$scripture_references
    
    # Option 2: Read directly from GitHub
    
    encyclicals <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-06-23/encyclicals.csv')
    papal_encyclicals <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-06-23/papal_encyclicals.csv')
    scripture_references <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-06-23/scripture_references.csv')

    # Using Python
    # Option 1: pydytuesday python library
    ## pip install pydytuesday
    
    import pydytuesday
    
    # Download files from the week, which you can then read in locally
    pydytuesday.get_date('2026-06-23')
    
    # Option 2: Read directly from GitHub and assign to an object
    ## pip install pandas
    
    import pandas
    
    encyclicals = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-06-23/encyclicals.csv')
    papal_encyclicals = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-06-23/papal_encyclicals.csv')
    scripture_references = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-06-23/scripture_references.csv')

    # Using Julia
    # Option 1: TidierTuesday.jl library
    ## Pkg.add(url="https://github.com/TidierOrg/TidierTuesday.jl")
    
    using TidierTuesday
    
    # Download datasets for the week, and load them as a NamedTuple of DataFrames
    data = tt_load("2026-06-23")
    
    # Option 2: Read directly from GitHub and assign to an object with TidierFiles
    
    using TidierFiles
    
    encyclicals = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-06-23/encyclicals.csv")
    papal_encyclicals = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-06-23/papal_encyclicals.csv")
    scripture_references = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-06-23/scripture_references.csv")
    
    # Option 3: Read directly from GitHub and assign without Tidier dependencies
    using CSV, DataFrames, Downloads
    
    # CSV.read expects a local path or IO, so download each file to a temporary path first
    encyclicals = CSV.read(Downloads.download("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-06-23/encyclicals.csv"), DataFrame)
    papal_encyclicals = CSV.read(Downloads.download("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-06-23/papal_encyclicals.csv"), DataFrame)
    scripture_references = CSV.read(Downloads.download("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-06-23/scripture_references.csv"), DataFrame)

    How to Participate

    • Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
    • Create a visualization, a model, a Quarto report, a shiny app, or some other piece of data-science-related output, using R, Python, or another programming language.
    • Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
    • Submit your own dataset!

    PydyTuesday: A Posit collaboration with TidyTuesday

    • Exploring the TidyTuesday data in Python? Posit has some extra resources for you! Have you tried making a Quarto dashboard? Find videos and other resources in Posit’s PydyTuesday repo.
    • Share your work with the world using the hashtags #TidyTuesday and #PydyTuesday so that Posit has the chance to highlight your work, too!
    • Deploy or share your work however you want! If you’d like a super easy way to publish your work, give Connect Cloud a try.

    Data Dictionary

    encyclicals.csv

    variable | class | description
    encyclical | character | Name of the encyclical (Rerum Novarum or Magnifica Humanitas).
    pope | character | Papal name of the author (Leo XIII or Leo XIV).
    year | integer | Year the encyclical was published.
    paragraph | integer | Paragraph number within the encyclical (1-indexed).
    text | character | Full text of the paragraph.
    word_count | integer | Number of whitespace-delimited words in the paragraph.
    sentence_count | integer | Number of sentences in the paragraph, counted as runs of terminal punctuation (., !, or ?), so abbreviations such as "cf." are also counted.
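
    To make the two count columns concrete, here is a minimal Python sketch of the same counting rules the cleaning script applies in R (non-whitespace runs for words, runs of `.`, `!`, or `?` for sentences). The sample string is invented for illustration and is not from the dataset; `word_count` and `sentence_count` are simply local function names chosen to match the columns.

```python
import re


def word_count(text):
    """Count whitespace-delimited tokens."""
    return len(re.findall(r"\S+", text))


def sentence_count(text):
    """Count runs of terminal punctuation (., !, or ?)."""
    return len(re.findall(r"[.!?]+", text))


# Invented sample paragraph, not taken from the dataset.
sample = "Work has dignity. Who denies it? See cf. the earlier text..."

print(word_count(sample))      # 11 tokens
print(sentence_count(sample))  # 4 runs, although a reader sees 3 sentences
```

    The extra count comes from the period in "cf.", so `sentence_count` is best read as an approximation, especially in paragraphs with many citations.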

    papal_encyclicals.csv

    variable | class | description
    title | character | Title of the encyclical (usually in Latin).
    pope | character | Papal name of the author.
    year | integer | Year the encyclical was published.
    papal_number | integer | Ordinal number of the pope (e.g., Leo XIII = 256, Francis = 266).
    birth_name | character | Birth name of the pope before taking the papal name.
    birth_country | character | Country of birth of the pope.
    pontificate_start | date | Date the pope’s pontificate began.
    pontificate_end | date | Date the pope’s pontificate ended (NA if currently reigning).
    pontificate_year | integer | Year of the pontificate in which the encyclical was published, computed from calendar years (1 = the calendar year in which the pontificate began).
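
    The `pontificate_year` column is derived from calendar years rather than anniversaries of the election. A minimal Python sketch of that arithmetic, using dates listed on this page (the function name is just an illustrative choice):

```python
from datetime import date


def pontificate_year(encyclical_year, pontificate_start):
    """Calendar-year offset from the start of the pontificate, 1-indexed."""
    return encyclical_year - pontificate_start.year + 1


# Leo XIII's pontificate began on 1878-02-20; Rerum Novarum appeared in 1891.
leo_xiii_start = date(1878, 2, 20)
print(pontificate_year(1891, leo_xiii_start))  # 1891 - 1878 + 1 = 14

# Leo XIV's pontificate began on 2025-05-08; Magnifica Humanitas is dated 2026.
leo_xiv_start = date(2025, 5, 8)
print(pontificate_year(2026, leo_xiv_start))   # 2026 - 2025 + 1 = 2
```

    Because only the year is used, an encyclical issued in January after a December election would already count as year 2.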

    scripture_references.csv

    variable | class | description
    encyclical | character | Name of the encyclical containing the citation.
    pope | character | Papal name of the author.
    year | integer | Year the encyclical was published.
    paragraph | integer | Paragraph number where the citation appears.
    reference | character | The scripture citation as written in the text (e.g., “Gen 1:28”, “Matt 25:40”).
    book | character | Standardized English name of the biblical book (e.g., “Genesis”, “Matthew”).
    testament | character | Old Testament or New Testament.
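
    The relationship between `reference`, `book`, and `testament` can be sketched as a lookup from the leading abbreviation of a citation. Note the assumptions: this Python version uses a dictionary lookup instead of the ordered prefix matching in the R cleaning script, the mapping is a small hand-picked subset, and `standardize` is a hypothetical helper name.

```python
import re

# Illustrative subset only; the cleaning script on this page has the full list.
BOOK_BY_ABBREVIATION = {
    "Gen": "Genesis", "Gn": "Genesis",
    "Exod": "Exodus", "Ex": "Exodus",
    "Ps": "Psalms",
    "Matt": "Matthew", "Mt": "Matthew",
    "Jn": "John",
    "1 Cor": "1 Corinthians",
    "2 Tim": "2 Timothy",
}
OLD_TESTAMENT = {"Genesis", "Exodus", "Psalms"}


def standardize(reference):
    """Return (book, testament) for a citation, or (None, None) if unknown."""
    # Capture the book token (optionally prefixed by 1 or 2) before the chapter.
    match = re.match(r"^((?:[12]\s*)?[A-Za-z]+)\s+\d", reference)
    if match is None:
        return (None, None)
    abbreviation = re.sub(r"\s+", " ", match.group(1))
    book = BOOK_BY_ABBREVIATION.get(abbreviation)
    if book is None:
        return (None, None)
    testament = "Old Testament" if book in OLD_TESTAMENT else "New Testament"
    return (book, testament)


print(standardize("Gen 1:28"))
print(standardize("2 Tim 2:12"))
print(standardize("Tobit 4:7"))
```

    Unknown abbreviations fall through to `(None, None)`, which corresponds to the `NA` values the R script assigns when no pattern matches.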

    Cleaning Script

    library(tidyverse)
    library(rvest)
    library(httr)
    library(janitor)
    
    # ============================================================
    # Dataset 1: encyclicals
    # Paragraph-level text from two papal encyclicals
    # ============================================================
    
    # Fetch Rerum Novarum (Leo XIII, 1891)
    rn_url <- "https://www.vatican.va/content/leo-xiii/en/encyclicals/documents/hf_l-xiii_enc_15051891_rerum-novarum.html"
    rn_response <- GET(rn_url, user_agent("Mozilla/5.0"))
    rn_html <- read_html(content(rn_response, as = "text", encoding = "UTF-8"))
    rn_raw <- rn_html |> html_elements("p") |> html_text2()
    
    # Fetch Magnifica Humanitas (Leo XIV, 2026)
    mh_url <- "https://www.vatican.va/content/leo-xiv/en/encyclicals/documents/20260515-magnifica-humanitas.html"
    mh_response <- GET(mh_url, user_agent("Mozilla/5.0"))
    mh_html <- read_html(content(mh_response, as = "text", encoding = "UTF-8"))
    mh_raw <- mh_html |> html_elements("p") |> html_text2()
    
    # Parse numbered paragraphs from Rerum Novarum
    rn_numbered <- grep("^[0-9]+[.]", rn_raw, value = TRUE)
    rn_first <- rn_raw[8]  # Opening paragraph (unnumbered); index 8 is hard-coded and depends on the page layout
    
    rn_df <- tibble(
      paragraph = c(1L, as.integer(str_extract(rn_numbered, "^[0-9]+"))),
      text = c(rn_first, str_replace(rn_numbered, "^[0-9]+[.] ?", ""))
    )
    
    # Parse numbered paragraphs from Magnifica Humanitas
    mh_numbered <- grep("^[0-9]+[.]", mh_raw, value = TRUE)
    
    mh_df <- tibble(
      paragraph = as.integer(str_extract(mh_numbered, "^[0-9]+")),
      text = str_replace(mh_numbered, "^[0-9]+[.] ?", "")
    )
    
    # Combine into final dataset
    encyclicals <- bind_rows(
      rn_df |> mutate(encyclical = "Rerum Novarum", pope = "Leo XIII", year = 1891L),
      mh_df |> mutate(encyclical = "Magnifica Humanitas", pope = "Leo XIV", year = 2026L)
    ) |>
      mutate(
        word_count = as.integer(str_count(text, "\\S+")),
        sentence_count = as.integer(str_count(text, "[.!?]+"))
      ) |>
      select(encyclical, pope, year, paragraph, text, word_count, sentence_count)
    
    # ============================================================
    # Dataset 2: scripture_references
    # Biblical citations extracted from both encyclicals
    # ============================================================
    
    # Magnifica Humanitas: parenthetical refs like (cf. Gen 11:1-9)
    mh_paren_pattern <- "\\(cf[.]\\s*([^)]+)\\)"
    
    mh_scripture <- mh_df |>
      mutate(paren_refs = str_extract_all(text, mh_paren_pattern)) |>
      unnest(paren_refs, keep_empty = FALSE) |>
      mutate(paren_refs = str_replace_all(paren_refs, "^\\(cf[.]\\s*|\\)$", "")) |>
      separate_longer_delim(paren_refs, delim = ";") |>
      mutate(reference = str_trim(paren_refs)) |>
      select(paragraph, reference) |>
      mutate(encyclical = "Magnifica Humanitas", pope = "Leo XIV", year = 2026L)
    
    # Magnifica Humanitas: inline refs like Jn 10:10
    mh_inline_pattern <- "(?<!\\()(?:Jn|Mt|Mc|Lc|Is|Ps|Ap|Rm|Gn|Ex|Ac|Ef|Col|Ga|Ph)\\s+[0-9]+[,:][0-9]+"
    
    mh_inline <- mh_df |>
      mutate(inline_refs = str_extract_all(text, mh_inline_pattern)) |>
      unnest(inline_refs, keep_empty = FALSE) |>
      mutate(reference = str_trim(inline_refs)) |>
      select(paragraph, reference) |>
      mutate(encyclical = "Magnifica Humanitas", pope = "Leo XIV", year = 2026L)
    
    mh_all_scripture <- bind_rows(mh_scripture, mh_inline) |>
      distinct(encyclical, pope, year, paragraph, reference)
    
    # Rerum Novarum: manually mapped from the footnotes section
    rn_scripture <- tribble(
      ~paragraph, ~reference,
      11L, "Deut 5:21",
      12L, "Gen 1:28",
      17L, "Gen 3:17",
      20L, "James 5:4",
      21L, "2 Tim 2:12",
      21L, "2 Cor 4:17",
      22L, "Matt 19:23-24",
      22L, "Luke 6:24-25",
      22L, "Luke 11:41",
      22L, "Acts 20:35",
      22L, "Matt 25:40",
      23L, "2 Cor 8:9",
      23L, "Mark 6:3",
      24L, "Matt 5:3",
      24L, "Matt 11:28",
      25L, "Rom 8:17",
      28L, "1 Tim 6:10",
      29L, "Acts 4:34",
      40L, "Gen 1:28",
      40L, "Rom 10:12",
      41L, "Exod 20:8",
      41L, "Gen 2:2",
      44L, "Gen 3:19",
      50L, "Eccl 4:9-10",
      50L, "Prov 18:19",
      57L, "Matt 16:26",
      57L, "Matt 6:32-33",
      63L, "1 Cor 13:4-7"
    ) |>
      mutate(encyclical = "Rerum Novarum", pope = "Leo XIII", year = 1891L)
    
    # Combine and standardize book names
    scripture_references <- bind_rows(
      rn_scripture |> select(encyclical, pope, year, paragraph, reference),
      mh_all_scripture
    ) |>
      mutate(
        book = case_when(
          str_detect(reference, "^Gen") ~ "Genesis",
          str_detect(reference, "^Gn") ~ "Genesis",
          str_detect(reference, "^Exod") ~ "Exodus",
          str_detect(reference, "^Ex\\b") ~ "Exodus",
          str_detect(reference, "^Deut") ~ "Deuteronomy",
          str_detect(reference, "^Neh") ~ "Nehemiah",
          str_detect(reference, "^Prov") ~ "Proverbs",
          str_detect(reference, "^Eccl") ~ "Ecclesiastes",
          str_detect(reference, "^Ps") ~ "Psalms",
          str_detect(reference, "^Is") ~ "Isaiah",
          str_detect(reference, "^Jer") ~ "Jeremiah",
          str_detect(reference, "^Ezek") ~ "Ezekiel",
          str_detect(reference, "^Dan") ~ "Daniel",
          str_detect(reference, "^Matt") ~ "Matthew",
          str_detect(reference, "^Mt\\b") ~ "Matthew",
          str_detect(reference, "^Mark") ~ "Mark",
          str_detect(reference, "^Mk\\b") ~ "Mark",
          str_detect(reference, "^Mc\\b") ~ "Mark",
          str_detect(reference, "^Luke") ~ "Luke",
          str_detect(reference, "^Lc\\b") ~ "Luke",
          str_detect(reference, "^Lk\\b") ~ "Luke",
          str_detect(reference, "^John") ~ "John",
          str_detect(reference, "^Jn\\b") ~ "John",
          str_detect(reference, "^Acts") ~ "Acts",
          str_detect(reference, "^Ac\\b") ~ "Acts",
          str_detect(reference, "^Rom") ~ "Romans",
          str_detect(reference, "^Rm\\b") ~ "Romans",
          str_detect(reference, "^1 Cor") ~ "1 Corinthians",
          str_detect(reference, "^1\\s*Co\\b") ~ "1 Corinthians",
          str_detect(reference, "^2 Cor") ~ "2 Corinthians",
          str_detect(reference, "^2\\s*Co\\b") ~ "2 Corinthians",
          str_detect(reference, "^Gal") ~ "Galatians",
          str_detect(reference, "^Ga\\b") ~ "Galatians",
          str_detect(reference, "^Eph") ~ "Ephesians",
          str_detect(reference, "^Ef\\b") ~ "Ephesians",
          str_detect(reference, "^Phil") ~ "Philippians",
          str_detect(reference, "^Ph\\b") ~ "Philippians",
          str_detect(reference, "^Col") ~ "Colossians",
          str_detect(reference, "^1 Tim") ~ "1 Timothy",
          str_detect(reference, "^2 Tim") ~ "2 Timothy",
          str_detect(reference, "^Heb") ~ "Hebrews",
          str_detect(reference, "^James") ~ "James",
          str_detect(reference, "^1 Pet") ~ "1 Peter",
          str_detect(reference, "^1\\s*P\\b") ~ "1 Peter",
          str_detect(reference, "^Rev") ~ "Revelation",
          str_detect(reference, "^Ap\\b") ~ "Revelation",
          .default = NA_character_
        ),
        testament = case_when(
          book %in% c("Genesis", "Exodus", "Leviticus", "Numbers", "Deuteronomy",
                      "Nehemiah", "Psalms", "Proverbs", "Ecclesiastes", "Isaiah",
                      "Jeremiah", "Ezekiel", "Daniel") ~ "Old Testament",
          !is.na(book) ~ "New Testament",
          .default = NA_character_
        )
      ) |>
      select(encyclical, pope, year, paragraph, reference, book, testament)
    
    # ============================================================
    # Dataset 3: papal_encyclicals
    # Catalog of all papal encyclicals from 1878-2026
    # ============================================================
    
    # Scrape the Vatican's alphabetical papal documents list
    catalog_url <- "https://www.vatican.va/offices/papal_docs_list.html"
    catalog_resp <- GET(catalog_url, user_agent("Mozilla/5.0"))
    catalog_html <- read_html(content(catalog_resp, as = "text", encoding = "UTF-8"))
    
    # Extract all 4-column HTML tables (Title, Author, Year, Type)
    tables <- catalog_html |> html_table(fill = TRUE)
    four_col_tables <- keep(tables, ~ ncol(.x) == 4)
    
    all_docs <- bind_rows(four_col_tables, .id = "table_id") |>
      rename(title = X1, author = X2, year_raw = X3, type = X4) |>
      filter(title != "Title", title != "", nchar(title) > 2, type == "Encyclical") |>
      mutate(year = as.integer(str_extract(year_raw, "[0-9]{4}"))) |>
      filter(!is.na(year)) |>
      distinct(title, author, year, .keep_all = TRUE) |>
      mutate(
        pope = case_when(
          str_detect(author, "Leo XIII") ~ "Leo XIII",
          str_detect(author, "Pius X($|[^I])") ~ "Pius X",
          str_detect(author, "Pius XI($|[^I])") ~ "Pius XI",
          str_detect(author, "Pius XII") ~ "Pius XII",
          str_detect(author, "Benedict XV($|[^I])") ~ "Benedict XV",
          str_detect(author, "Benedict XVI") ~ "Benedict XVI",
          str_detect(author, "John XXIII") ~ "John XXIII",
          str_detect(author, "John Paul II") ~ "John Paul II",
          str_detect(author, "Paul VI") ~ "Paul VI",
          .default = author
        )
      ) |>
      select(title, pope, year)
    
    # Add encyclicals missing from the Vatican's legacy list
    missing_encyclicals <- tribble(
      ~title, ~pope, ~year,
      "Lumen Fidei", "Francis", 2013L,
      "Laudato Si'", "Francis", 2015L,
      "Fratelli Tutti", "Francis", 2020L,
      "Dilexit Nos", "Francis", 2024L,
      "Magnifica Humanitas", "Leo XIV", 2026L
    )
    
    # Papal biographical metadata
    papal_metadata <- tribble(
      ~pope, ~papal_number, ~birth_name, ~birth_country, ~pontificate_start, ~pontificate_end,
      "Leo XIII", 256L, "Vincenzo Gioacchino Pecci", "Italy", "1878-02-20", "1903-07-20",
      "Pius X", 257L, "Giuseppe Melchiorre Sarto", "Italy", "1903-08-04", "1914-08-20",
      "Benedict XV", 258L, "Giacomo della Chiesa", "Italy", "1914-09-03", "1922-01-22",
      "Pius XI", 259L, "Achille Ratti", "Italy", "1922-02-06", "1939-02-10",
      "Pius XII", 260L, "Eugenio Maria Giuseppe Pacelli", "Italy", "1939-03-02", "1958-10-09",
      "John XXIII", 261L, "Angelo Giuseppe Roncalli", "Italy", "1958-10-28", "1963-06-03",
      "Paul VI", 262L, "Giovanni Battista Montini", "Italy", "1963-06-21", "1978-08-06",
      "John Paul II", 264L, "Karol Jozef Wojtyla", "Poland", "1978-10-16", "2005-04-02",
      "Benedict XVI", 265L, "Joseph Aloisius Ratzinger", "Germany", "2005-04-19", "2013-02-28",
      "Francis", 266L, "Jorge Mario Bergoglio", "Argentina", "2013-03-13", "2025-04-21",
      "Leo XIV", 267L, "Robert Francis Prevost", "United States", "2025-05-08", NA_character_
    ) |>
      mutate(
        pontificate_start = as.Date(pontificate_start),
        pontificate_end = as.Date(pontificate_end)
      )
    
    # Combine and enrich
    papal_encyclicals <- bind_rows(all_docs, missing_encyclicals) |>
      distinct(title, pope, year) |>
      left_join(papal_metadata, by = "pope") |>
      mutate(
        pontificate_year = as.integer(year - as.integer(format(pontificate_start, "%Y")) + 1L)
      ) |>
      arrange(year, title) |>
      select(title, pope, year, papal_number, birth_name, birth_country,
             pontificate_start, pontificate_end, pontificate_year)
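
    For Python users, the parenthetical-citation step of the cleaning script can be sketched as follows. This is a rough translation under stated assumptions, not the script that produced the dataset: the regular expression mirrors `mh_paren_pattern`, the sample sentence is invented, and `extract_cf_references` is a hypothetical helper name.

```python
import re

# Same idea as the script's mh_paren_pattern: capture what follows "(cf. ".
PAREN_PATTERN = re.compile(r"\(cf\.\s*([^)]+)\)")


def extract_cf_references(paragraph_text):
    """Return the individual citations found inside "(cf. ...)" groups."""
    references = []
    for group in PAREN_PATTERN.findall(paragraph_text):
        # One parenthetical can hold several citations separated by ";".
        for part in group.split(";"):
            cleaned = part.strip()
            if cleaned:  # unlike the R script, empty fragments are skipped
                references.append(cleaned)
    return references


# Invented sample sentence; only the citation format follows the script.
sample = (
    "A tower is built (cf. Gen 11:1-9) and life is promised "
    "(cf. Jn 10:10; Mt 25:40)."
)
print(extract_cf_references(sample))
```

    Parentheticals that do not begin with "cf." are ignored here, just as they are by the pattern in the R script, so a citation written as "(Jn 10:10)" would need a separate rule.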