TidyTuesday
    • About TidyTuesday
    • Datasets
      • 2025
      • 2024
      • 2023
      • 2022
      • 2021
      • 2020
      • 2019
      • 2018
    • Useful links

    On this page

    • Please add alt text to your posts
    • Formula 1 Races
      • License
      • Data
      • Get the data here
      • Data Dictionary
      • circuits.csv
      • constructor_results table
      • constructor_standings table
      • constructors table
      • driver_standings table
      • drivers table
      • lap_times table
      • pit_stops table
      • qualifying table
      • races table
      • results table
      • seasons table
      • status table
      • Cleaning Script

    Please add alt text to your posts

    Please add alt text (alternative text) to all of your posted graphics for #TidyTuesday.

    Twitter provides guidelines for how to add alt text to your images.

    The DataViz Society/Nightingale by way of Amy Cesal has an article on writing good alt text for plots/graphs.

    Here’s a simple formula for writing alt text for data visualization: ### Chart type It’s helpful for people with partial sight to know what chart type it is and gives context for understanding the rest of the visual. Example: Line graph ### Type of data What data is included in the chart? The x and y axis labels may help you figure this out. Example: number of bananas sold per day in the last year ### Reason for including the chart Think about why you’re including this visual. What does it show that’s meaningful. There should be a point to every visual and you should tell people what to look for. Example: the winter months have more banana sales ### Link to data or source Don’t include this in your alt text, but it should be included somewhere in the surrounding text. People should be able to click on a link to view the source data or dig further into the visual. This provides transparency about your source and lets people explore the data. Example: Data from the USDA

    Penn State has an article on writing alt text descriptions for charts and tables.

    Charts, graphs and maps use visuals to convey complex images to users. But since they are images, these media provide serious accessibility issues to colorblind users and users of screen readers. See the examples on this page for details on how to make charts more accessible.

    The {rtweet} package includes the ability to post tweets with alt text programatically.

    Need a reminder? There are extensions that force you to remember to add Alt Text to Tweets with media.

    The start of the 2018 Austrian Grand Prix - the image is of two long lines of F1 cars racing around the corner of the Austrian Grand Prix track.

    Formula 1 Races

    The data this week comes from the Ergast API, which has a CC-BY license. H/t to Sara Stoudt for sharing the link to the data by way of Data is Plural!

    FiveThirtyEight published a nice article on “Who’s The Best Formula One Driver Of All Time?”. While the ELO data is not present in this dataset, you could calculate your own rating or using the {elo} package to create ELO scores.

    Per Wikipedia, Formula 1:

    Formula One (also known as Formula 1 or F1) is the highest class of international auto racing for single-seater formula racing cars sanctioned by the Fédération Internationale de l’Automobile (FIA). The World Drivers’ Championship, which became the FIA Formula One World Championship in 1981, has been one of the premier forms of racing around the world since its inaugural season in 1950. The word formula in the name refers to the set of rules to which all participants’ cars must conform. A Formula One season consists of a series of races, known as Grands Prix, which take place worldwide on both purpose-built circuits and closed public roads.

    The results of each race are evaluated using a points system to determine two annual World Championships: one for drivers, the other for constructors. Each driver must hold a valid Super Licence, the highest class of racing licence issued by the FIA. The races must run on tracks graded “1” (formerly “A”), the highest grade-rating issued by the FIA. Most events occur in rural locations on purpose-built tracks, but several events take place on city streets.

    Each team can be called a “constructor” and they have two drivers. For example, Lewis Hamilton is the primary (driver) for the Mercedes team (constructor).

    License

    Complete images of the Ergast database are published shortly after each race under the Attribution-NonCommercial-ShareAlike 3.0 Unported Licence.

    Data

    There is an option for raw CSVs (which is what is included in this repo), a SQL database, or querying the raw API. This is a great dataset to practice with using the httr package to query an API, SQL against the database or dbplyr against the database! You can also work with the raw CSVs and practice your dplyr::left_join() and friends. Read more about dplyr joins in the dplyr “joins” documentation.

    If you wish to query the raw API, you can check the docs for example, to get a table of all drivers who have ever finished #1 in the championship: http://ergast.com/mrd/methods/status/. There is also an option to download the SQL database itself.

    download.file(
      "http://ergast.com/downloads/f1db_ansi.sql.gz", 
      destfile = "f1db-mysql.zip"
    )

    Working with this data will require you to do several left joins, for example to get the standings for each race/driver. Each of the tables listed in the data dictionary have their keys for joining. If you don’t want to dig too deep into the data, then I would recommend starting here. This is a good dataset of results by race, driver, season!

    driver_results_df <- driver_standings %>% 
      left_join(races, by = "raceId") %>% 
      rename(driver_url = url) %>% 
      left_join(drivers, by = "driverId")
      
    glimpse(driver_results_df)
    
    #> Rows: 33,206
    #> Columns: 22
    #> $ driverStandingsId <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, #> 14, 1…
    #> $ raceId            <dbl> 18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, #> 19, …
    #> $ driverId          <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, #> 8, …
    #> $ points            <dbl> 10, 8, 6, 5, 4, 3, 2, 1, 14, 11, 6, 6, 10, 3, #> 2,…
    #> $ position          <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 1, 3, 6, 7, 4, 9, 10, #> 2,…
    #> $ positionText      <chr> "1", "2", "3", "4", "5", "6", "7", "8", "1", #> "3"…
    #> $ wins              <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, #> 1, …
    #> $ year              <dbl> 2008, 2008, 2008, 2008, 2008, 2008, 2008, #> 2008, …
    #> $ round             <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, #> 2, …
    #> $ circuitId         <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, #> 2, …
    #> $ name              <chr> "Australian Grand Prix", "Australian Grand #> Prix"…
    #> $ date              <date> 2008-03-16, 2008-03-16, 2008-03-16, #> 2008-03-16,…
    #> $ time              <chr> "04:30:00", "04:30:00", "04:30:00", #> "04:30:00", …
    #> $ driver_url        <chr> #> "http://en.wikipedia.org/wiki/2008_Australian_Gr…
    #> $ driverRef         <chr> "hamilton", "heidfeld", "rosberg", "alonso", #> "ko…
    #> $ number            <chr> "44", "\\N", "6", "14", "\\N", "\\N", "\\N", #> "7"…
    #> $ code              <chr> "HAM", "HEI", "ROS", "ALO", "KOV", "NAK", #> "BOU",…
    #> $ forename          <chr> "Lewis", "Nick", "Nico", "Fernando", #> "Heikki", "…
    #> $ surname           <chr> "Hamilton", "Heidfeld", "Rosberg", "Alonso", #> "Ko…
    #> $ dob               <date> 1985-01-07, 1977-05-10, 1985-06-27, #> 1981-07-29,…
    #> $ nationality       <chr> "British", "German", "German", "Spanish", #> "Finni…
    #> $ url               <chr> #> "http://en.wikipedia.org/wiki/Lewis_Hamilton", "…

    To query the raw API, you can use httr, just make sure to end the call/url in .json to return JSON data.

    library(tidyverse)
    library(jsonlite)
    library(httr)
    
    standing <- 1
    raw_json <- httr::GET(url = glue::glue(
      "http://ergast.com/api/f1/driverStandings/{standing}/drivers.json")) %>% 
      content(type = "text", encoding = "UTF-8") %>% 
      jsonlite::parse_json(simplifyVector = FALSE) 
      
    raw_json %>% 
      View()
      
    winner_table <- raw_json$MRData$DriverTable$Drivers %>%
      tibble(data = .) %>%
      unnest_wider(data)
      
    winner_table %>% glimpse()
    #> Rows: 30
    #> Columns: 8
    #> $ driverId        <chr> "alonso", "mario_andretti", "ascari", "j…
    #> $ permanentNumber <chr> "14", NA, NA, NA, "22", NA, NA, NA, NA, …
    #> $ code            <chr> "ALO", NA, NA, NA, "BUT", NA, NA, NA, NA…
    #> $ url             <chr> "http://en.wikipedia.org/wiki/Fernando_A…
    #> $ givenName       <chr> "Fernando", "Mario", "Alberto", "Jack", …
    #> $ familyName      <chr> "Alonso", "Andretti", "Ascari", "Brabham…
    #> $ dateOfBirth     <chr> "1981-07-29", "1940-02-28", "1918-07-13"…
    #> $ nationality     <chr> "Spanish", "American", "Italian", "Austr…

    Get the data here

    # Get the Data
    
    # Read in with tidytuesdayR package 
    # Install from CRAN via: install.packages("tidytuesdayR")
    # This loads the readme and all the datasets for the week of interest
    
    # Either ISO-8601 date or year/week works!
    
    tuesdata <- tidytuesdayR::tt_load('2021-09-07')
    tuesdata <- tidytuesdayR::tt_load(2021, week = 37)
    
    results <- tuesdata$results
    
    # Or read in the data manually
    
    circuits <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-09-07/circuits.csv')
    constructor_results <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-09-07/constructor_results.csv')
    constructor_standings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-09-07/constructor_standings.csv')
    constructors <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-09-07/constructors.csv')
    driver_standings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-09-07/driver_standings.csv')
    drivers <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-09-07/drivers.csv')
    lap_times <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-09-07/lap_times.csv')
    pit_stops <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-09-07/pit_stops.csv')
    qualifying <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-09-07/qualifying.csv')
    races <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-09-07/races.csv')
    results <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-09-07/results.csv')
    seasons <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-09-07/seasons.csv')
    status <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-09-07/status.csv')

    Data Dictionary

    List of Tables
    circuits
    constructorResults
    constructorStandings
    constructors
    driverStandings
    drivers
    lapTimes
    pitStops
    qualifying
    races
    results
    seasons
    status
    General Notes
    Dates, times and durations are in ISO 8601 format
    Dates and times are UTC
    Strings use UTF-8 encoding
    Primary keys are for internal use only
    Fields ending with “Ref” are unique identifiers for external use
    A grid position of ‘0’ is used for starting from the pitlane
    Labels used in the positionText fields:
    “D” - disqualified
    “E” - excluded
    “F” - failed to qualify
    “N” - not classified
    “R” - retired
    “W” - withdrew

    circuits.csv

    Field Type Null Key Default Extra Description
    circuitId int(11) NO PRI NULL auto_increment Primary key
    circuitRef varchar(255) NO Unique circuit identifier
    name varchar(255) NO Circuit name
    location varchar(255) YES NULL Location name
    country varchar(255) YES NULL Country name
    lat float YES NULL Latitude
    lng float YES NULL Longitude
    alt int(11) YES NULL Altitude (metres)
    url varchar(255) NO UNI Circuit Wikipedia page

    constructor_results table

    Field Type Null Key Default Extra Description
    constructorResultsId int(11) NO PRI NULL auto_increment Primary key
    raceId int(11) NO 0 Foreign key link to races table
    constructorId int(11) NO 0 Foreign key link to constructors table
    points float YES NULL Constructor points for race
    status varchar(255) YES NULL “D” for disqualified (or null)

    constructor_standings table

    Field Type Null Key Default Extra Description
    constructorStandingsId int(11) NO PRI NULL auto_increment Primary key
    raceId int(11) NO 0 Foreign key link to races table
    constructorId int(11) NO 0 Foreign key link to constructors table
    points float NO 0 Constructor points for season
    position int(11) YES NULL Constructor standings position (integer)
    positionText varchar(255) YES NULL Constructor standings position (string)
    wins int(11) NO 0 Season win count

    constructors table

    Field Type Null Key Default Extra Description
    constructorId int(11) NO PRI NULL auto_increment Primary key
    constructorRef varchar(255) NO Unique constructor identifier
    name varchar(255) NO UNI Constructor name
    nationality varchar(255) YES NULL Constructor nationality
    url varchar(255) NO Constructor Wikipedia page

    driver_standings table

    Field Type Null Key Default Extra Description
    driverStandingsId int(11) NO PRI NULL auto_increment Primary key
    raceId int(11) NO 0 Foreign key link to races table
    driverId int(11) NO 0 Foreign key link to drivers table
    points float NO 0 Driver points for season
    position int(11) YES NULL Driver standings position (integer)
    positionText varchar(255) YES NULL Driver standings position (string)
    wins int(11) NO 0 Season win count

    drivers table

    Field Type Null Key Default Extra Description
    driverId int(11) NO PRI NULL auto_increment Primary key
    driverRef varchar(255) NO Unique driver identifier
    number int(11) YES NULL Permanent driver number
    code varchar(3) YES NULL Driver code e.g. “ALO”
    forename varchar(255) NO Driver forename
    surname varchar(255) NO Driver surname
    dob date YES NULL Driver date of birth
    nationality varchar(255) YES NULL Driver nationality
    url varchar(255) NO UNI Driver Wikipedia page

    lap_times table

    Field Type Null Key Default Extra Description
    raceId int(11) NO PRI NULL Foreign key link to races table
    driverId int(11) NO PRI NULL Foreign key link to drivers table
    lap int(11) NO PRI NULL Lap number
    position int(11) YES NULL Driver race position
    time varchar(255) YES NULL Lap time e.g. “1:43.762”
    milliseconds int(11) YES NULL Lap time in milliseconds

    pit_stops table

    Field Type Null Key Default Extra Description
    raceId int(11) NO PRI NULL Foreign key link to races table
    driverId int(11) NO PRI NULL Foreign key link to drivers table
    stop int(11) NO PRI NULL Stop number
    lap int(11) NO NULL Lap number
    time time NO NULL Time of stop e.g. “13:52:25”
    duration varchar(255) YES NULL Duration of stop e.g. “21.783”
    milliseconds int(11) YES NULL Duration of stop in milliseconds

    qualifying table

    Field Type Null Key Default Extra Description
    qualifyId int(11) NO PRI NULL auto_increment Primary key
    raceId int(11) NO 0 Foreign key link to races table
    driverId int(11) NO 0 Foreign key link to drivers table
    constructorId int(11) NO 0 Foreign key link to constructors table
    number int(11) NO 0 Driver number
    position int(11) YES NULL Qualifying position
    q1 varchar(255) YES NULL Q1 lap time e.g. “1:21.374”
    q2 varchar(255) YES NULL Q2 lap time
    q3 varchar(255) YES NULL Q3 lap time

    races table

    Field Type Null Key Default Extra Description
    raceId int(11) NO PRI NULL auto_increment Primary key
    year int(11) NO 0 Foreign key link to seasons table
    round int(11) NO 0 Round number
    circuitId int(11) NO 0 Foreign key link to circuits table
    name varchar(255) NO Race name
    date date NO 0000-00-00 Race date e.g. “1950-05-13”
    time time YES NULL Race start time e.g.”13:00:00”
    url varchar(255) YES UNI NULL Race Wikipedia page

    results table

    Field Type Null Key Default Extra Description
    resultId int(11) NO PRI NULL auto_increment Primary key
    raceId int(11) NO 0 Foreign key link to races table
    driverId int(11) NO 0 Foreign key link to drivers table
    constructorId int(11) NO 0 Foreign key link to constructors table
    number int(11) YES NULL Driver number
    grid int(11) NO 0 Starting grid position
    position int(11) YES NULL Official classification, if applicable
    positionText varchar(255) NO Driver position string e.g. “1” or “R”
    positionOrder int(11) NO 0 Driver position for ordering purposes
    points float NO 0 Driver points for race
    laps int(11) NO 0 Number of completed laps
    time varchar(255) YES NULL Finishing time or gap
    milliseconds int(11) YES NULL Finishing time in milliseconds
    fastestLap int(11) YES NULL Lap number of fastest lap
    rank int(11) YES 0 Fastest lap rank, compared to other drivers
    fastestLapTime varchar(255) YES NULL Fastest lap time e.g. “1:27.453”
    fastestLapSpeed varchar(255) YES NULL Fastest lap speed (km/h) e.g. “213.874”
    statusId int(11) NO 0 Foreign key link to status table

    seasons table

    Field Type Null Key Default Extra Description
    year int(11) NO PRI 0 Primary key e.g. 1950
    url varchar(255) NO UNI Season Wikipedia page

    status table

    Field Type Null Key Default Extra Description
    statusId int(11) NO PRI NULL auto_increment Primary key
    status varchar(255) NO Finishing status e.g. “Retired”

    Cleaning Script

    Not a real cleaning script, just me exploring the data structures.

    library(tidyverse)
    library(fs)
    library(httr)
    
    # if you want SQL with tables
    # download.file(
    #   "http://ergast.com/downloads/f1db_ansi.sql.gz", 
    #   destfile = "2021/2021-09-07/f1db-mysql.zip"
    #   )
    
    download.file(
      "http://ergast.com/downloads/f1db_csv.zip", 
      destfile = "2021/2021-09-07/f1db.zip"
    )
    
    unzip("2021/2021-09-07/f1db.zip", exdir = "2021/2021-09-07/")
    
    raw_data <- map(
      fs::dir_ls("2021/2021-09-07/", glob = "*.csv"),
      read_csv
      ) %>% 
      set_names(nm = str_remove(names(.), "2021/2021-09-07/"))
    
    raw_data %>% 
      str(max.level = 1)
    
    # Example of JSON/HTTR
    raw_json <- httr::GET(url = glue::glue(
      "http://ergast.com/api/f1/driverStandings/{standing}/drivers.json")) %>% 
      content(type = "text", encoding = "UTF-8")
    
    raw_json%>% 
      View()
    
    raw_json$MRData$DriverTable$Drivers %>%
      tibble(data = .) %>%
      unnest_wider(data)
    
    driver_standings %>% 
      left_join(raw_data$races.csv, by = "raceId") %>% 
      rename(driver_url = url) %>% 
      left_join(raw_data$drivers.csv, by = "driverId")
    
    file_names <- fs::dir_ls("2021/2021-09-07/", glob = "*.csv") %>% 
      str_remove("2021/2021-09-07/") %>% 
      str_remove(".csv")
      
    file_names