TidyTuesday
    • About TidyTuesday
    • Datasets
      • 2025
      • 2024
      • 2023
      • 2022
      • 2021
      • 2020
      • 2019
      • 2018
    • Useful links

    On this page

    • API Specs
      • The Data
      • How to Participate
        • PydyTuesday: A Posit collaboration with TidyTuesday
      • Data Dictionary
        • api_categories.csv
        • api_info.csv
        • api_logos.csv
        • api_origins.csv
        • apisguru_apis.csv
      • Cleaning Script

    API Specs

    This week we’re exploring Web APIs! The lead volunteer for TidyTuesday (Jon Harmon) is writing a book about working with Web APIs with R as well as a series of R packages to make it easier to create API-wrapping R packages. On Thursday, 2025-06-19, Jon will present a talk on this package ecosystem at the Ghana R Conference 2025. While working on the packages and the talk, Jon explored a list of APIs from the website APIs.guru. That dataset is provided here.

    [APIs.guru’s] goal is to create a machine-readable Wikipedia for Web APIs in the OpenAPI Specification format.

    • What API specs are provided by APIs.guru? Are these the same as the origin specs?
    • How many different APIs (“services”) do providers provide?
    • What licenses do APIs use?
    • Are any APIs listed more than once in the dataset?

    Thank you to Jon Harmon, Data Science Learning Community for curating this week’s dataset.

    The Data

    # Using R
    # Option 1: tidytuesdayR R package 
    ## install.packages("tidytuesdayR")
    
    tuesdata <- tidytuesdayR::tt_load('2025-06-17')
    ## OR
    tuesdata <- tidytuesdayR::tt_load(2025, week = 24)
    
    api_categories <- tuesdata$api_categories
    api_info <- tuesdata$api_info
    api_logos <- tuesdata$api_logos
    api_origins <- tuesdata$api_origins
    apisguru_apis <- tuesdata$apisguru_apis
    
    # Option 2: Read directly from GitHub
    
    api_categories <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_categories.csv')
    api_info <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_info.csv')
    api_logos <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_logos.csv')
    api_origins <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_origins.csv')
    apisguru_apis <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/apisguru_apis.csv')
    # Using Python
    # Option 1: pydytuesday python library
    ## pip install pydytuesday
    
    import pydytuesday
    
    # Download files from the week, which you can then read in locally
    pydytuesday.get_date('2025-06-17')
    
    # Option 2: Read directly from GitHub and assign to an object
    
    api_categories = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_categories.csv')
    api_info = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_info.csv')
    api_logos = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_logos.csv')
    api_origins = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_origins.csv')
    apisguru_apis = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/apisguru_apis.csv')
    # Using Julia
    # Option 1: TidierTuesday.jl library
    ## Pkg.add(url="https://github.com/TidierOrg/TidierTuesday.jl")
    
    using TidierTuesday
    
    # Download files from the week, which you can then read in locally
    download_dataset('2025-06-17')
    
    # Option 2: Read directly from GitHub and assign to an object with TidierFiles
    
    api_categories = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_categories.csv")
    api_info = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_info.csv")
    api_logos = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_logos.csv")
    api_origins = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_origins.csv")
    apisguru_apis = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/apisguru_apis.csv")
    
    # Option 3: Read directly from Github and assign without Tidier dependencies
    api_categories = CSV.read("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_categories.csv", DataFrame)
    api_info = CSV.read("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_info.csv", DataFrame)
    api_logos = CSV.read("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_logos.csv", DataFrame)
    api_origins = CSV.read("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_origins.csv", DataFrame)
    apisguru_apis = CSV.read("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/apisguru_apis.csv", DataFrame)

    How to Participate

    • Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
    • Create a visualization, a model, a Quarto report, a shiny app, or some other piece of data-science-related output, using R, Python, or another programming language.
    • Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
    • Submit your own dataset!

    PydyTuesday: A Posit collaboration with TidyTuesday

    • Exploring the TidyTuesday data in Python? Posit has some extra resources for you! Have you tried making a Quarto dashboard? Find videos and other resources in Posit’s PydyTuesday repo.
    • Share your work with the world using the hashtags #TidyTuesday and #PydyTuesday so that Posit has the chance to highlight your work, too!
    • Deploy or share your work however you want! If you’d like a super easy way to publish your work, give Connect Cloud a try.

    Data Dictionary

    api_categories.csv

    variable class description
    name character apis.guru’s designation for this API.
    apisguru_category character Sorting category of this API on apis.guru. If an API is listed in multiple categories, it has more than one row in this table.

    api_info.csv

    variable class description
    name character apis.guru’s designation for this API.
    contact_name character The name of the person or entity responsible for this API.
    contact_url character The url to check for information about this API.
    description character A brief description of this API.
    title character The title of this API..
    provider_name character The provider of this API. This is more meaningful if a provider has multiple APIs in the apis.guru database.
    service_name character The service that this API covers within the provider. Pressent when a provider has multiple APIs in the apis.guru database.
    license_name character The name of the license associated with this API, if available.
    license_url character The URL of the license, if available.
    terms_of_service character The url of terms of service for this API, if available.

    api_logos.csv

    variable class description
    name character apis.guru’s designation for this API.
    background_color character Hex code (in the format #RRGGBB) intended to show behind the logo. Missing values either mean that no color is expected (the background can be transparent), or that there isn’t a valid logo available for this API.
    url character Path to the logo on apis.guru.
    alt_text character Text to provide for this logo visually impaired users.

    api_origins.csv

    variable class description
    name character apis.guru’s designation for this API.
    format character The format of the original API spec, if available. One of “apiBlueprint”, “google”, “openapi”, “postman”, “swagger”, or “wadl”.
    url character The path to the original API spec. Note: Some of these paths are no longer valid.
    version character The version of the format used by this API spec. Each format has its own list of possible values.

    apisguru_apis.csv

    variable class description
    name character apis.guru’s designation for this API.
    version character Version of the API. Data is filtered to only the “preferrred” versions.
    added datetime When the API was added to apis.guru.
    updated datetime When the API was updated on apis.guru.
    swagger_url character The path to this API spec on apis.guru.
    openapi_ver character The version of the OpenAPI (or swagger) spec of this API on apis.guru.
    link character The path to this information (plus some fields we don’t include here) for this API on apis.guru.
    external_docs_description character A short description of external documentation, if provided.
    external_docs_url character The location of the external documentation, if provided.

    Cleaning Script

    # This dataset was compiled using funcionality from my in-progress ecosystem of
    # packages described at https://beekeeper.api2r.org/. On Thursday, 2025-06-19, I
    # will present a talk on this ecosystem at the Ghana R Conference 2025
    # (https://ghana-rusers.org/ghana-r-conference-2025/).
    
    # I use information about APIs from https://apis.guru to test various aspects of
    # {beekeeper} and related packages. Here we use {nectar}
    # (https://nectar.api2r.org/) and {tibblify}
    # (https://mgirlich.github.io/tibblify/) to download and process the list of
    # APIs from apis.guru.
    
    # {nectar} is not available on CRAN. The rest of the packages can be installed
    # via install.packages.
    # install.packages("pak")
    # pak::pak("jonthegeek/nectar")
    
    library(nectar)
    library(tibblify)
    library(dplyr)
    library(tidyr)
    library(janitor)
    
    # This section replicates functionality from the in-progress package {apisguru}
    # (https://jonthegeek.github.io/apisguru/). At the time that this dataset was
    # compiled, {apisguru} was not yet updated for the current version of {nectar},
    # so I'm not using it directly. Watch its progress as the {beekeeper} ecosystem
    # solidifies!
    
    .schema_api_spec <- function() {
      tibblify::tspec_df(
        .tib_datetime("added"),
        tibblify::tib_chr("preferred"),
        tibblify::tib_df(
          "versions",
          .names_to = "version",
          .schema_api_version_spec()
        )
      )
    }
    
    .schema_api_version_spec <- function() {
      tibblify::tspec_df(
        .tib_datetime("added"),
        tibblify::tib_variant("info"),
        .tib_datetime("updated"),
        tibblify::tib_chr("swaggerUrl"),
        tibblify::tib_chr("swaggerYamlUrl"),
        tibblify::tib_chr("openapiVer"),
        tibblify::tib_chr("link", required = FALSE),
        tibblify::tib_variant("externalDocs", required = FALSE)
      )
    }
    
    .tib_datetime <- function(key, ..., required = TRUE) {
      tibblify::tib_scalar(
        key = key,
        ptype = vctrs::new_datetime(tzone = "UTC"),
        required = required,
        ptype_inner = character(),
        transform = .quick_datetime,
        ...
      )
    }
    
    .quick_datetime <- function(x, tzone = "UTC") {
      as.POSIXct(gsub("T", " ", x), tz = tzone)
    }
    
    req <- nectar::req_prepare(
      "https://api.apis.guru/v2",
      path = "/list.json",
      tidy_fn = nectar::resp_tidy_json,
      tidy_args = list(
        spec = tibblify::tspec_df(
          .names_to = "name",
          .schema_api_spec()
        )
      )
    )
    resp <- nectar::req_perform_opinionated(req, max_reqs = Inf)
    apisguru_apis <- nectar::resp_tidy(resp) |>
      dplyr::select("name", "preferred", "versions") |>
      tidyr::unnest("versions") |>
      dplyr::filter(
        .data$preferred == .data$version
      ) |>
      tidyr::unnest_wider("externalDocs", names_sep = "_") |>
      dplyr::select(-"preferred", -"externalDocs_x-sha1", -"swaggerYamlUrl") |>
      janitor::clean_names()
    
    dplyr::glimpse(apisguru_apis)
    
    api_info <- apisguru_apis |>
      dplyr::select("name", "info") |>
      tidyr::unnest_wider("info") |>
      tidyr::unnest_wider("contact", names_sep = "_") |>
      tidyr::unnest_wider("license", names_sep = "_") |>
      dplyr::select(
        "name",
        "contact_name",
        "contact_url",
        "description",
        "title",
        apisguru_category = "x-apisguru-categories",
        logo = "x-logo",
        origin = "x-origin",
        provider_name = "x-providerName",
        service_name = "x-serviceName",
        "license_name",
        "license_url",
        terms_of_service = "termsOfService"
      )
    apisguru_apis$info <- NULL
    dplyr::glimpse(api_info)
    
    api_categories <- api_info |>
      dplyr::select("name", "apisguru_category") |>
      tidyr::unnest_longer("apisguru_category")
    api_info$apisguru_category <- NULL
    
    api_logos <- api_info |>
      dplyr::select("name", "logo") |>
      tidyr::unnest_wider("logo") |>
      janitor::clean_names() |>
      dplyr::select(-"href")
    api_info$logo <- NULL
    
    api_origins <- api_info |>
      dplyr::select("name", "origin") |>
      tidyr::unnest_longer("origin") |>
      tidyr::unnest_wider("origin") |>
      dplyr::select("name":"version") |>
      # Some of the entries are duplicated.
      dplyr::distinct()
    api_info$origin <- NULL