TidyTuesday
    • About TidyTuesday
    • Datasets
      • 2025
      • 2024
      • 2023
      • 2022
      • 2021
      • 2020
      • 2019
      • 2018
    • Useful links

    On this page

    • Italian industrial production
      • The Data
      • How to Participate
        • PydyTuesday: A Posit collaboration with TidyTuesday
      • Data Dictionary
        • food_beverages.csv
        • textiles.csv
        • transport.csv
      • Cleaning Script

    Italian industrial production

    ISTAT (Istituto nazionale di statistica / Italian National Institute of Statistics) is the primary source of official statistics in Italy. They publish long-running time series data on industrial production of food and beverages, transport equipment, and textiles covering over 100 years.

    Information on the volumes of some industrial products has been also collected over time by public and private Institutions and published by Istat in the previous Sommari di statistiche storiche (Historical statistics summaries), for the period between 1861 to 1985. This information does not provide a coherent reconstruction of the development of the Italian industrial system; however, for the purposes of completing the historical overview of the statistics in the sector, this chapter does present some parts of the material collected over the years, with specific reference to the food, textiles and transport industries

    • Which industries have increased production, and which have decreased?
    • How has the average weight per ship changed over time?

    Thank you to Nicola Rennie for curating this week’s dataset.

    The Data

    # Using R
    # Option 1: tidytuesdayR R package 
    ## install.packages("tidytuesdayR")
    
    tuesdata <- tidytuesdayR::tt_load('2026-05-05')
    ## OR
    tuesdata <- tidytuesdayR::tt_load(2026, week = 18)
    
    food_beverages <- tuesdata$food_beverages
    textiles <- tuesdata$textiles
    transport <- tuesdata$transport
    
    # Option 2: Read directly from GitHub
    
    food_beverages <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-05-05/food_beverages.csv')
    textiles <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-05-05/textiles.csv')
    transport <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-05-05/transport.csv')
    # Using Python
    # Option 1: pydytuesday python library
    ## pip install pydytuesday
    
    import pydytuesday
    
    # Download files from the week, which you can then read in locally
    pydytuesday.get_date('2026-05-05')
    
    # Option 2: Read directly from GitHub and assign to an object
    
    food_beverages = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-05-05/food_beverages.csv')
    textiles = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-05-05/textiles.csv')
    transport = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-05-05/transport.csv')
    # Using Julia
    # Option 1: TidierTuesday.jl library
    ## Pkg.add(url="https://github.com/TidierOrg/TidierTuesday.jl")
    
    using TidierTuesday
    
    # Download datasets for the week, and load them as a NamedTuple of DataFrames
    data = tt_load("2026-05-05")
    
    # Option 2: Read directly from GitHub and assign to an object with TidierFiles
    
    food_beverages = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-05-05/food_beverages.csv")
    textiles = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-05-05/textiles.csv")
    transport = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-05-05/transport.csv")
    
    # Option 3: Read directly from Github and assign without Tidier dependencies
    food_beverages = CSV.read("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-05-05/food_beverages.csv", DataFrame)
    textiles = CSV.read("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-05-05/textiles.csv", DataFrame)
    transport = CSV.read("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-05-05/transport.csv", DataFrame)

    How to Participate

    • Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
    • Create a visualization, a model, a Quarto report, a shiny app, or some other piece of data-science-related output, using R, Python, or another programming language.
    • Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
    • Submit your own dataset!

    PydyTuesday: A Posit collaboration with TidyTuesday

    • Exploring the TidyTuesday data in Python? Posit has some extra resources for you! Have you tried making a Quarto dashboard? Find videos and other resources in Posit’s PydyTuesday repo.
    • Share your work with the world using the hashtags #TidyTuesday and #PydyTuesday so that Posit has the chance to highlight your work, too!
    • Deploy or share your work however you want! If you’d like a super easy way to publish your work, give Connect Cloud a try.

    Data Dictionary

    food_beverages.csv

    variable class description
    Year integer Year. The figures for the years between 1871 and 1950 refer to the fiscal year, which does not necessarily coincide with the calendar year; in particular, for the years between 1931 to 1950 the fiscal year began on 1st July. From 1951 onwards, the figures refer to the calendar year.
    Sugar integer Sugar produced (tons)
    Glucose integer Glucose, maltose, invert sugar produced (quintals)
    Coffee_substitute integer Coffee substitute produced (quintals)
    Seed_oil integer Seed oil produced (quintals)
    Ethyl_alcohol_1 integer Ethyl alcohol (1st category) produced (Ettanidro, 100 litres pure alcohol).
    Ethyl_alcohol_2 integer Ethyl alcohol (2nd category) produced (Ettanidro, 100 litres pure alcohol).
    Beer integer Beer produced (hectolitres)

    textiles.csv

    variable class description
    Year integer Year.
    Cotton_yarn integer Cotton yarn produced (tons).
    Flock_yarn integer Flock cotton yarn produced (tons).
    Other_yarn integer Other fibres and fibre blend cotton yarn produced (tons).
    Total_yarn integer Total cotton yarn produced (tons).
    Cotton_textiles integer Cotton textiles produced (tons).
    Flock_textiles integer Flock cotton textiles produced (tons).
    Other_textiles integer Other fibres and fibre blend cotton textiles produced (tons).
    Total_textiles integer Total cotton textiles produced (tons).
    Raw_silk integer Raw silk produced (tons).

    transport.csv

    variable class description
    Year integer Year.
    Ships_launched integer Number of ships launched.
    Ships_weight integer Tons of gross tonnage.
    Steam_and_electric_engine integer Number of steam and electric engine rolling stock in the property of the Italian Railway Service.
    Rail_cars_and_electric_locomotives integer Number of rail cars and electric locomotives rolling stock in the property of the Italian Railway Service.
    Carriages_and_trailers integer Number of carriages and trailers rolling stock in the property of the Italian Railway Service.
    Mail_luggage_vans_and_carriages integer Number of mail luggage vans and carriages rolling stock in the property of the Italian Railway Service.
    Passenger_cars integer Number of passenger cars.
    Other integer Number of other motor vehicles (e.g. lorries, vans and pick-ups, buses, trolleybuses, or special motor vehicles).

    Cleaning Script

    library(tidyverse)
    library(readxl)
    
    # Download files ----------------------------------------------------------
    
    download.file("https://seriestoriche.istat.it/fileadmin/documenti/Table_14.4.xls", "italian-industry-production/Table_14.4.xls")
    download.file("https://seriestoriche.istat.it/fileadmin/documenti/Table_14.5.xls", "italian-industry-production/Table_14.5.xls")
    download.file("https://seriestoriche.istat.it/fileadmin/documenti/Table_14.6.xls", "italian-industry-production/Table_14.6.xls")
    
    # Food and beverages ------------------------------------------------------
    
    raw_data1 <- read_xls("italian-industry-production/Table_14.4.xls",
                          skip = 5, sheet = 1)
    raw_data2 <- read_xls("italian-industry-production/Table_14.4.xls",
                          skip = 5, sheet = 2)
    data1 <- raw_data1 |>
      drop_na(YEARS) |>
      rename(Year = YEARS,
             Sugar = `Sugar (tons)`,
             Glucose = `Glucose, maltose, invert sugar \n(quintals)`,
             Coffee_substitute = `Coffee substitute (quintals)`,
             Seed_oil = `Seed oil (quintals)`,
             Ethyl_alcohol_1 = `Ethyl alcohol  (ettanidri) (c)`,
             Ethyl_alcohol_2 = `...11`,
             Beer = `Beer (hectolitres)`
      ) |>
      select(!starts_with("...")) |>
      mutate(
        across(everything(), as.numeric)
      ) |>
      drop_na(Year)
    
    data2 <- raw_data2 |>
      rename(Year = YEARS,
             Sugar = `Sugar (tons)`,
             Glucose = `Glucose, maltose, invert sugar\n (quintals)`,
             Coffee_substitute = `Coffee substitute (quintals)`,
             Seed_oil = `Seed oil \n(quintals)`,
             Ethyl_alcohol_1 = `Ethyl alcohol  (ettanidri) (c)`,
             Ethyl_alcohol_2 = `...11`,
             Beer = `Beer \n(hectolitres)`
      ) |>
      select(!starts_with("...")) |>
      mutate(
        across(everything(), as.numeric)
      ) |>
      drop_na(Year)
    food_beverages <- rbind(data1, data2) |>
      mutate(across(
        everything(), as.integer
      ))
    write_csv(food_beverages, "italian-industry-production/food_beverages.csv")
    
    # Textiles ----------------------------------------------------------------
    
    raw_data1 <- read_xls("italian-industry-production/Table_14.5.xls",
                          skip = 6, sheet = 1)
    raw_data2 <- read_xls("italian-industry-production/Table_14.5.xls",
                          skip = 7, sheet = 2)
    
    data1 <- raw_data1 |>
      rename(Year = "...1",
             Raw_silk = "...12",
             Cotton_yarn = Cotton...2,
             Flock_yarn = Flock...3,
             Other_yarn = `Other fibres and fibre blend...4`,
             Total_yarn = Total...5,
             Cotton_textiles = Cotton...7,
             Flock_textiles = Flock...8,
             Other_textiles = `Other fibres and fibre blend...9`,
             Total_textiles = Total...10
      ) |>
      select(!starts_with("...")) |>
      mutate(
        across(everything(), as.numeric)
      ) |>
      drop_na(Year)
    
    data2 <- raw_data2 |>
      rename(Year = "...1",
             Raw_silk = "...12",
             Cotton_yarn = Cotton...2,
             Flock_yarn = Flock...3,
             Other_yarn = `Other fibres and fibre blend...4`,
             Total_yarn = Total...5,
             Cotton_textiles = Cotton...7,
             Flock_textiles = Flock...8,
             Other_textiles = `Other fibres and fibre blend...9`,
             Total_textiles = Total...10
      ) |>
      select(!starts_with("...")) |>
      mutate(
        across(everything(), as.numeric)
      ) |>
      drop_na(Year)
    
    textiles <- rbind(data1, data2) |>
      mutate(across(
        everything(), as.integer
      ))
    write_csv(textiles, "italian-industry-production/textiles.csv")
    
    # Transport ---------------------------------------------------------------
    
    raw_data1 <- read_xls("italian-industry-production/Table_14.6.xls",
                          skip = 7, sheet = 1)
    raw_data2 <- read_xls("italian-industry-production/Table_14.6.xls",
                          skip = 7, sheet = 2)
    
    data1 <- raw_data1 |>
      rename(Year = "...1",
             Ships_launched = Number,
             Ships_weight  = `Tons of gross\n tonnage`
      ) |>
      select(!starts_with("...")) |>
      mutate(
        across(everything(), as.numeric)
      ) |>
      drop_na(Year) |>
      pivot_longer(-Year) |>
      mutate(name = str_remove(name, "\\(c\\)"),
             name = str_trim(name),
             name = str_replace_all(name, "\n", ""),
             name = str_replace_all(name, " ", "_")) |>
      pivot_wider()
    
    data2 <- raw_data2 |>
      rename(Year = "...1",
             Ships_launched = Number,
             Ships_weight  = `Tons of gross tonnage`
      ) |>
      select(!starts_with("...")) |>
      mutate(
        across(everything(), as.numeric)
      ) |>
      drop_na(Year) |>
      pivot_longer(-Year) |>
      mutate(name = str_remove(name, "\\(c\\)"),
             name = str_trim(name),
             name = str_replace_all(name, "\n", ""),
             name = str_replace_all(name, " ", "_")) |>
      pivot_wider()
    
    transport <- rbind(data1, data2) |>
      mutate(across(
        everything(), as.integer
      ))
    write_csv(transport, "italian-industry-production/transport.csv")