API Specs
This week we’re exploring Web APIs! The lead volunteer for TidyTuesday (Jon Harmon) is writing a book about working with Web APIs with R as well as a series of R packages to make it easier to create API-wrapping R packages. On Thursday, 2025-06-19, Jon will present a talk on this package ecosystem at the Ghana R Conference 2025. While working on the packages and the talk, Jon explored a list of APIs from the website APIs.guru. That dataset is provided here.
[APIs.guru’s] goal is to create a machine-readable Wikipedia for Web APIs in the OpenAPI Specification format.
- What API specs are provided by APIs.guru? Are these the same as the origin specs?
- How many different APIs (“services”) do providers provide?
- What licenses do APIs use?
- Are any APIs listed more than once in the dataset?
Thank you to Jon Harmon, Data Science Learning Community for curating this week’s dataset.
The Data
# Using R
# Option 1: tidytuesdayR R package
## install.packages("tidytuesdayR")
tuesdata <- tidytuesdayR::tt_load('2025-06-17')
## OR
tuesdata <- tidytuesdayR::tt_load(2025, week = 24)
api_categories <- tuesdata$api_categories
api_info <- tuesdata$api_info
api_logos <- tuesdata$api_logos
api_origins <- tuesdata$api_origins
apisguru_apis <- tuesdata$apisguru_apis
# Option 2: Read directly from GitHub
api_categories <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_categories.csv')
api_info <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_info.csv')
api_logos <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_logos.csv')
api_origins <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_origins.csv')
apisguru_apis <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/apisguru_apis.csv')# Using Python
# Option 1: pydytuesday python library
## pip install pydytuesday
import pydytuesday
# Download files from the week, which you can then read in locally
pydytuesday.get_date('2025-06-17')
# Option 2: Read directly from GitHub and assign to an object
api_categories = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_categories.csv')
api_info = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_info.csv')
api_logos = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_logos.csv')
api_origins = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_origins.csv')
apisguru_apis = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/apisguru_apis.csv')# Using Julia
# Option 1: TidierTuesday.jl library
## Pkg.add(url="https://github.com/TidierOrg/TidierTuesday.jl")
using TidierTuesday
# Download files from the week, which you can then read in locally
download_dataset('2025-06-17')
# Option 2: Read directly from GitHub and assign to an object with TidierFiles
api_categories = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_categories.csv")
api_info = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_info.csv")
api_logos = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_logos.csv")
api_origins = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_origins.csv")
apisguru_apis = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/apisguru_apis.csv")
# Option 3: Read directly from Github and assign without Tidier dependencies
api_categories = CSV.read("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_categories.csv", DataFrame)
api_info = CSV.read("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_info.csv", DataFrame)
api_logos = CSV.read("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_logos.csv", DataFrame)
api_origins = CSV.read("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/api_origins.csv", DataFrame)
apisguru_apis = CSV.read("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-06-17/apisguru_apis.csv", DataFrame)How to Participate
- Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
- Create a visualization, a model, a Quarto report, a shiny app, or some other piece of data-science-related output, using R, Python, or another programming language.
- Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
- Submit your own dataset!
PydyTuesday: A Posit collaboration with TidyTuesday
- Exploring the TidyTuesday data in Python? Posit has some extra resources for you! Have you tried making a Quarto dashboard? Find videos and other resources in Posit’s PydyTuesday repo.
- Share your work with the world using the hashtags #TidyTuesday and #PydyTuesday so that Posit has the chance to highlight your work, too!
- Deploy or share your work however you want! If you’d like a super easy way to publish your work, give Connect Cloud a try.
Data Dictionary
api_categories.csv
| variable | class | description |
|---|---|---|
| name | character | apis.guru’s designation for this API. |
| apisguru_category | character | Sorting category of this API on apis.guru. If an API is listed in multiple categories, it has more than one row in this table. |
api_info.csv
| variable | class | description |
|---|---|---|
| name | character | apis.guru’s designation for this API. |
| contact_name | character | The name of the person or entity responsible for this API. |
| contact_url | character | The url to check for information about this API. |
| description | character | A brief description of this API. |
| title | character | The title of this API.. |
| provider_name | character | The provider of this API. This is more meaningful if a provider has multiple APIs in the apis.guru database. |
| service_name | character | The service that this API covers within the provider. Pressent when a provider has multiple APIs in the apis.guru database. |
| license_name | character | The name of the license associated with this API, if available. |
| license_url | character | The URL of the license, if available. |
| terms_of_service | character | The url of terms of service for this API, if available. |
api_logos.csv
| variable | class | description |
|---|---|---|
| name | character | apis.guru’s designation for this API. |
| background_color | character | Hex code (in the format #RRGGBB) intended to show behind the logo. Missing values either mean that no color is expected (the background can be transparent), or that there isn’t a valid logo available for this API. |
| url | character | Path to the logo on apis.guru. |
| alt_text | character | Text to provide for this logo visually impaired users. |
api_origins.csv
| variable | class | description |
|---|---|---|
| name | character | apis.guru’s designation for this API. |
| format | character | The format of the original API spec, if available. One of “apiBlueprint”, “google”, “openapi”, “postman”, “swagger”, or “wadl”. |
| url | character | The path to the original API spec. Note: Some of these paths are no longer valid. |
| version | character | The version of the format used by this API spec. Each format has its own list of possible values. |
apisguru_apis.csv
| variable | class | description |
|---|---|---|
| name | character | apis.guru’s designation for this API. |
| version | character | Version of the API. Data is filtered to only the “preferrred” versions. |
| added | datetime | When the API was added to apis.guru. |
| updated | datetime | When the API was updated on apis.guru. |
| swagger_url | character | The path to this API spec on apis.guru. |
| openapi_ver | character | The version of the OpenAPI (or swagger) spec of this API on apis.guru. |
| link | character | The path to this information (plus some fields we don’t include here) for this API on apis.guru. |
| external_docs_description | character | A short description of external documentation, if provided. |
| external_docs_url | character | The location of the external documentation, if provided. |
Cleaning Script
# This dataset was compiled using funcionality from my in-progress ecosystem of
# packages described at https://beekeeper.api2r.org/. On Thursday, 2025-06-19, I
# will present a talk on this ecosystem at the Ghana R Conference 2025
# (https://ghana-rusers.org/ghana-r-conference-2025/).
# I use information about APIs from https://apis.guru to test various aspects of
# {beekeeper} and related packages. Here we use {nectar}
# (https://nectar.api2r.org/) and {tibblify}
# (https://mgirlich.github.io/tibblify/) to download and process the list of
# APIs from apis.guru.
# {nectar} is not available on CRAN. The rest of the packages can be installed
# via install.packages.
# install.packages("pak")
# pak::pak("jonthegeek/nectar")
library(nectar)
library(tibblify)
library(dplyr)
library(tidyr)
library(janitor)
# This section replicates functionality from the in-progress package {apisguru}
# (https://jonthegeek.github.io/apisguru/). At the time that this dataset was
# compiled, {apisguru} was not yet updated for the current version of {nectar},
# so I'm not using it directly. Watch its progress as the {beekeeper} ecosystem
# solidifies!
.schema_api_spec <- function() {
tibblify::tspec_df(
.tib_datetime("added"),
tibblify::tib_chr("preferred"),
tibblify::tib_df(
"versions",
.names_to = "version",
.schema_api_version_spec()
)
)
}
.schema_api_version_spec <- function() {
tibblify::tspec_df(
.tib_datetime("added"),
tibblify::tib_variant("info"),
.tib_datetime("updated"),
tibblify::tib_chr("swaggerUrl"),
tibblify::tib_chr("swaggerYamlUrl"),
tibblify::tib_chr("openapiVer"),
tibblify::tib_chr("link", required = FALSE),
tibblify::tib_variant("externalDocs", required = FALSE)
)
}
.tib_datetime <- function(key, ..., required = TRUE) {
tibblify::tib_scalar(
key = key,
ptype = vctrs::new_datetime(tzone = "UTC"),
required = required,
ptype_inner = character(),
transform = .quick_datetime,
...
)
}
.quick_datetime <- function(x, tzone = "UTC") {
as.POSIXct(gsub("T", " ", x), tz = tzone)
}
req <- nectar::req_prepare(
"https://api.apis.guru/v2",
path = "/list.json",
tidy_fn = nectar::resp_tidy_json,
tidy_args = list(
spec = tibblify::tspec_df(
.names_to = "name",
.schema_api_spec()
)
)
)
resp <- nectar::req_perform_opinionated(req, max_reqs = Inf)
apisguru_apis <- nectar::resp_tidy(resp) |>
dplyr::select("name", "preferred", "versions") |>
tidyr::unnest("versions") |>
dplyr::filter(
.data$preferred == .data$version
) |>
tidyr::unnest_wider("externalDocs", names_sep = "_") |>
dplyr::select(-"preferred", -"externalDocs_x-sha1", -"swaggerYamlUrl") |>
janitor::clean_names()
dplyr::glimpse(apisguru_apis)
api_info <- apisguru_apis |>
dplyr::select("name", "info") |>
tidyr::unnest_wider("info") |>
tidyr::unnest_wider("contact", names_sep = "_") |>
tidyr::unnest_wider("license", names_sep = "_") |>
dplyr::select(
"name",
"contact_name",
"contact_url",
"description",
"title",
apisguru_category = "x-apisguru-categories",
logo = "x-logo",
origin = "x-origin",
provider_name = "x-providerName",
service_name = "x-serviceName",
"license_name",
"license_url",
terms_of_service = "termsOfService"
)
apisguru_apis$info <- NULL
dplyr::glimpse(api_info)
api_categories <- api_info |>
dplyr::select("name", "apisguru_category") |>
tidyr::unnest_longer("apisguru_category")
api_info$apisguru_category <- NULL
api_logos <- api_info |>
dplyr::select("name", "logo") |>
tidyr::unnest_wider("logo") |>
janitor::clean_names() |>
dplyr::select(-"href")
api_info$logo <- NULL
api_origins <- api_info |>
dplyr::select("name", "origin") |>
tidyr::unnest_longer("origin") |>
tidyr::unnest_wider("origin") |>
dplyr::select("name":"version") |>
# Some of the entries are duplicated.
dplyr::distinct()
api_info$origin <- NULL