This week we’re getting ready for the 2026 Winter Olympics!

This week we’re exploring the event schedule for the 2026 Winter Olympics in Milan-Cortina, Italy. The dataset contains detailed information about all 1,866 Olympic events, including both competition and training sessions across various winter sport disciplines.

The dataset provides comprehensive scheduling information with start and end times in both local and UTC timezones, venue details, and metadata about each event such as whether it’s a medal event or training session. This dataset captures the full scope of Olympic events taking place from early February through the closing ceremonies, including the new ski mountaineering event.


Some questions to explore:
- Which sport disciplines have the most events scheduled?
- How are medal events distributed across the days of the Olympics?
- What is the typical duration of different types of events?
- Which venues host the most events?
- How does the schedule vary by day of the week?
- What proportion of scheduled sessions are training versus competition?
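Several of the questions above come down to simple counts. The following is a minimal sketch, not the curator's method: it uses only the Python standard library, and it runs on a tiny made-up sample rather than the real file. The column names come from the data dictionary; the assumption that logical columns are stored as TRUE/FALSE text is ours, and `as_bool` and `summarize` are hypothetical helper names.

```python
import csv
import io
from collections import Counter

# Tiny made-up sample using column names from the data dictionary.
# The values are illustrative only, not taken from the real schedule.
SAMPLE_CSV = """discipline_name,is_medal_event,is_training
Curling,FALSE,FALSE
Curling,TRUE,FALSE
Alpine Skiing,FALSE,TRUE
Alpine Skiing,TRUE,FALSE
Biathlon,FALSE,TRUE
"""


def as_bool(value):
    """Treat TRUE/True/true as True and anything else as False (assumed encoding)."""
    return value.strip().lower() == "true"


def summarize(rows):
    """Count sessions per discipline and compute the share that are training."""
    rows = list(rows)
    per_discipline = Counter(row["discipline_name"] for row in rows)
    training = sum(as_bool(row["is_training"]) for row in rows)
    share = training / len(rows) if rows else 0.0
    return per_discipline, share


rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
per_discipline, training_share = summarize(rows)

for name, count in per_discipline.most_common():
    print(f"{name}: {count}")
print(f"Training share: {training_share:.0%}")
```

To try it on the real data, replace `io.StringIO(SAMPLE_CSV)` with an open file handle for a downloaded copy of `schedule.csv`, and check how the logical columns are actually encoded first.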

For more information about how the data was collected, and for example code that creates the table in R or Python, see this repository: https://github.com/chendaniely/olympics-2026

If you want the code that generated the example table:

# Using Python
from great_tables import GT
import polars as pl

# Load the events data
dataset_url = "https://raw.githubusercontent.com/chendaniely/olympics-2026/refs/heads/main/data/final/olympics/olympics_events.csv"
df = pl.read_csv(dataset_url)

schedule = (
    df
    .group_by("date", "discipline_name")
    .agg(
        pl.len().alias("total_events"),
        pl.col("is_medal_event").sum().alias("medal_events"),
    )
    .sort("date", "discipline_name")
)

print(f"Schedule shape: {schedule.shape}")
print("\nFirst 20 rows:")
schedule.head(20)

schedule = schedule.with_columns(pl.col("date").str.to_date("%Y-%m-%d"))

print("Updated schema:")
print(schedule.schema)
print("\nFirst few rows:")
schedule.head(10)

# Pivot the schedule to have disciplines as rows and dates as columns
schedule_pivot = schedule.pivot(
    on="date",
    index="discipline_name",
    values="total_events",
).sort("discipline_name")

# Create the table
gt_table = (
    GT(schedule_pivot)
    .tab_header(
        title="Olympics 2026 Event Schedule",
        subtitle="Total events per sport by date",
    )
    .cols_label(discipline_name="Sport")
)

gt_table

# Using R
library(tidyverse)
library(gt)

# Load the events data

dataset_url <- "https://raw.githubusercontent.com/chendaniely/olympics-2026/refs/heads/main/data/final/olympics/olympics_events.csv"
df <- readr::read_csv(dataset_url)

# Create schedule summary
schedule <- df |>
  group_by(date, discipline_name) |>
  summarize(
    total_events = n(),
    medal_events = sum(is_medal_event),
    .groups = "drop"
  ) |>
  arrange(date, discipline_name)

cat("Schedule shape:", nrow(schedule), "x", ncol(schedule), "\n")
cat("\nFirst 20 rows:\n")
print(head(schedule, 20))

# Convert date column to Date type
schedule <- schedule |>
  mutate(date = as.Date(date))

cat("\nUpdated column types:\n")
str(schedule)
cat("\nFirst few rows:\n")
print(head(schedule, 10))

# Pivot the schedule to have disciplines as rows and dates as columns
schedule_pivot <- schedule |>
  select(date, discipline_name, total_events) |>
  pivot_wider(
    names_from = date,
    values_from = total_events,
    id_cols = discipline_name
  ) |>
  arrange(discipline_name)

# Create the table
gt_table <- schedule_pivot |>
  gt() |>
  tab_header(
    title = "Olympics 2026 Event Schedule",
    subtitle = "Total events per sport by date"
  ) |>
  cols_label(
    discipline_name = "Sport"
  )

gt_table

Thank you to Daniel Chen, Posit PBC, University of British Columbia for curating this week’s dataset.

The Data

# Using R
# Option 1: tidytuesdayR R package 
## install.packages("tidytuesdayR")

tuesdata <- tidytuesdayR::tt_load('2026-02-10')
## OR
tuesdata <- tidytuesdayR::tt_load(2026, week = 6)

schedule <- tuesdata$schedule

# Option 2: Read directly from GitHub

schedule <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-02-10/schedule.csv')

# Using Python
# Option 1: pydytuesday Python library
## pip install pydytuesday

import pydytuesday

# Download files from the week, which you can then read in locally
pydytuesday.get_date('2026-02-10')

# Option 2: Read directly from GitHub and assign to an object
import pandas

schedule = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-02-10/schedule.csv')

# Using Julia
# Option 1: TidierTuesday.jl library
## Pkg.add(url="https://github.com/TidierOrg/TidierTuesday.jl")

using TidierTuesday

# Download datasets for the week, and load them as a NamedTuple of DataFrames
data = tt_load("2026-02-10")

# Option 2: Read directly from GitHub and assign to an object with TidierFiles
using TidierFiles

schedule = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-02-10/schedule.csv")

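# Option 3: Read directly from GitHub and assign without Tidier dependencies
using CSV, DataFrames, Downloads

# CSV.read expects a local file, so download the CSV to a temporary path first
schedule = CSV.read(Downloads.download("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-02-10/schedule.csv"), DataFrame)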

How to Participate

Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
Create a visualization, a model, a Quarto report, a Shiny app, or some other piece of data-science-related output, using R, Python, or another programming language.
Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
Submit your own dataset!

PydyTuesday: A Posit collaboration with TidyTuesday

Exploring the TidyTuesday data in Python? Posit has some extra resources for you! Have you tried making a Quarto dashboard? Find videos and other resources in Posit’s PydyTuesday repo.
Share your work with the world using the hashtags #TidyTuesday and #PydyTuesday so that Posit has the chance to highlight your work, too!
Deploy or share your work however you want! If you’d like a super easy way to publish your work, give Connect Cloud a try.

Data Dictionary

`schedule.csv`

variable	class	description
date	date	Date of the Olympic event.
discipline_code	character	Abbreviated code for the sport discipline (e.g., ALP for Alpine Skiing, CUR for Curling).
discipline_name	character	Full name of the sport discipline.
event_code	character	Unique identifier code for the specific event.
event_description	character	Descriptive name of the event including gender, type, and round.
start_datetime_local	datetime	Event start date and time in local timezone.
end_datetime_local	datetime	Event end date and time in local timezone.
start_datetime_utc	datetime	Event start date and time in UTC timezone.
end_datetime_utc	datetime	Event end date and time in UTC timezone.
is_medal_event	logical	Whether the event awards medals (TRUE) or not (FALSE).
is_training	logical	Whether the event is a training session (TRUE) or competition (FALSE).
venue_code	character	Abbreviated code for the venue location.
venue_name	character	Full name of the venue where the event takes place.
venue_slug	character	URL-friendly identifier for the venue.
location_name	character	Specific location or area within the venue (e.g., sheet, course).
location_code	character	Abbreviated code for the specific location within the venue.
session_code	character	Unique code identifying the event session.
estimated_start	logical	Whether the start time is estimated (TRUE) or confirmed (FALSE).
day_of_week	character	Day of the week the event occurs on.
start_time	time	Event start time without date component.
end_time	time	Event end time without date component.
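
The start and end columns can be combined to estimate how long each session runs. The sketch below is illustrative only: it uses the standard library on a few made-up rows, and it assumes the UTC columns hold ISO 8601 timestamps (optionally ending in `Z`), which should be checked against the real file. `parse_utc` and `median_duration_minutes` are hypothetical helper names.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Made-up rows shaped like schedule.csv; the timestamp format is an assumption.
SAMPLE_ROWS = [
    {"discipline_name": "Curling",
     "start_datetime_utc": "2026-02-05T08:05:00Z",
     "end_datetime_utc": "2026-02-05T11:05:00Z"},
    {"discipline_name": "Curling",
     "start_datetime_utc": "2026-02-05T13:05:00Z",
     "end_datetime_utc": "2026-02-05T15:35:00Z"},
    {"discipline_name": "Biathlon",
     "start_datetime_utc": "2026-02-08T12:00:00Z",
     "end_datetime_utc": "2026-02-08T13:30:00Z"},
]


def parse_utc(text):
    """Parse an ISO 8601 timestamp, treating a trailing 'Z' as UTC."""
    return datetime.fromisoformat(text.strip().replace("Z", "+00:00"))


def median_duration_minutes(rows):
    """Return the median session length in minutes for each discipline."""
    durations = defaultdict(list)
    for row in rows:
        start = parse_utc(row["start_datetime_utc"])
        end = parse_utc(row["end_datetime_utc"])
        durations[row["discipline_name"]].append((end - start).total_seconds() / 60)
    return {name: median(values) for name, values in durations.items()}


median_minutes = median_duration_minutes(SAMPLE_ROWS)
for name, minutes in sorted(median_minutes.items()):
    print(f"{name}: {minutes:.0f} min")
```

Working from the UTC columns avoids mixing timezones. Rows where `estimated_start` is TRUE may deserve separate treatment, since their start times are not confirmed.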

Cleaning Script

# Clean data provided by @chendaniely. No cleaning was necessary.
import pandas as pd

dataset_url = "https://raw.githubusercontent.com/chendaniely/olympics-2026/refs/heads/main/data/final/olympics/olympics_events.csv"
schedule = pd.read_csv(dataset_url)