Agencies from the FBI Crime Data API
This week we’re exploring data from the FBI Crime Data API! Specifically, we’re looking at agency-level data across all 50 states in the USA. This dataset provides details on law enforcement agencies that have submitted data to the FBI’s Uniform Crime Reporting (UCR) Program and are displayed on the Crime Data Explorer (CDE).
Currently, the FBI produces four annual publications from data provided by more than 18,000 federal, state, county, city, university and college, and tribal law enforcement agencies voluntarily participating in the UCR program.
Crime data is dynamic. Offenses occur, arrests are made, and property is recovered every day. The FBI’s Crime Data Explorer, the digital front door for UCR data, is an attempt to reflect that fluidity in crime. The data presented there is updated regularly in a way that UCR publications previously could not be. Launched in 2017, the CDE’s content and features are updated and expanded continuously. CDE enables law enforcement agencies, researchers, journalists, and the public to more easily use and understand the massive amounts of UCR data using charts and graphs.
How do agency types vary? How are agencies distributed geographically within each state?
What percentage of agencies in each state participate in NIBRS reporting? Are there any trends in NIBRS adoption?
Thank you to Ford Johnson for curating this week’s dataset.
The Data
# Using R
# Option 1: tidytuesdayR R package
## install.packages("tidytuesdayR")
tuesdata <- tidytuesdayR::tt_load('2025-02-18')
## OR
tuesdata <- tidytuesdayR::tt_load(2025, week = 7)
agencies <- tuesdata$agencies
# Option 2: Read directly from GitHub
agencies <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-02-18/agencies.csv')# Using Python
# Option 1: pydytuesday python library
## pip install pydytuesday
import pydytuesday
# Download files from the week, which you can then read in locally
pydytuesday.get_date("2025-02-18")
# Option 2: Read directly from GitHub and assign to an object
agencies = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-02-18/agencies.csv')How to Participate
- Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
- Create a visualization, a model, a Quarto report, a shiny app, or some other piece of data-science-related output, using R, Python, or another programming language.
- Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
- Submit your own dataset!
PydyTuesday: A Posit collaboration with TidyTuesday
- Exploring the TidyTuesday data in Python? Posit has some extra resources for you! Have you tried making a Quarto dashboard? Find videos and other resources in Posit’s PydyTuesday repo.
- Share your work with the world using the hashtags #TidyTuesday and #PydyTuesday so that Posit has the chance to highlight your work, too!
- Deploy or share your work however you want! If you’d like a super easy way to publish your work, give Connect Cloud a try.
Data Dictionary
agencies.csv
| variable | class | description |
|---|---|---|
| ori | character | Unique ID used to identify an agency. |
| county | character | The county associated with the agency’s jurisdiction within a state. |
| latitude | double | The approximate latitude of the agency. |
| longitude | double | The approximate latitude of the agency. |
| state_abbr | character | The abbreviated two letter state code for the agency’s location. |
| state | character | The full name of the state where the agency is located. |
| agency_name | character | The official name of the agency. |
| agency_type | character | The type or category of the agency, such as city or county. |
| is_nibrs | logical | Indicates whether the agency participates in the FBI’s National Incident-Based Reporting System (NIBRS). |
| nibrs_start_date | character | The date on which the agency began reporting data to NIBRS. |
Cleaning Script
# "The Crime Data Explorer (CDE) provides select datasets for download. Incident-based data by state,
# summary data estimates, and data about other specific topics may be downloaded in CSV files from
# the selections below. Data are also available via the Crime Data API, a read-only web service that
# returns JSON or CSV data, and provides experienced users access to large amounts of UCR data to use
# and share." - From https://cde.ucr.cjis.gov/LATEST/webapp/#/pages/downloads
# To re-create this dataset:
# 1. Get a FBI Crime Data API Key from the docs here: https://cde.ucr.cjis.gov/LATEST/webapp/#/pages/docApi
# 2. Set an environment variable via: `Sys.setenv(API_KEY = "{YOUR_API_KEY}")`
# 3. Run the script which will access your environment variable via: `Sys.getenv("API_KEY")`
library(httr2)
library(jsonlite)
library(dplyr)
library(purrr)
state_abbrs <- c(
"AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA",
"HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
"MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
"NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
"SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"
)
api_key <- Sys.getenv("API_KEY")
parse_nibrs_date <- function(date_val) {
if (is.null(date_val)) {
NA_character_
} else if (is.list(date_val)) {
if (length(date_val) > 0) as.character(date_val[[1]]) else NA_character_
} else {
as.character(date_val)
}
}
fetch_agency_data <- function(state_abbr, api_key) {
url <- sprintf(
"https://api.usa.gov/crime/fbi/cde/agency/byStateAbbr/%s?API_KEY=%s",
state_abbr, api_key
)
response_text <- request(url) %>%
req_perform() %>%
resp_body_string()
agency_data <- fromJSON(response_text, flatten = TRUE)
agency_df <- if (is.data.frame(agency_data)) {
agency_data
} else if (is.list(agency_data)) {
bind_rows(agency_data)
} else {
stop("Unexpected JSON structure: not a list or data frame.")
}
if ("nibrs_start_date" %in% names(agency_df)) {
agency_df <- agency_df %>%
mutate(nibrs_start_date = map_chr(nibrs_start_date, parse_nibrs_date))
} else {
agency_df$nibrs_start_date <- NA_character_
}
agency_df$state <- state_abbr
agency_df
}
agency_data_list <- list()
qa_data_list <- list()
for (state in state_abbrs) {
cat("Fetching data for state:", state, "\n")
state_agency_df <- fetch_agency_data(state, api_key)
agency_data_list[[state]] <- state_agency_df
qa_data_list[[state]] <- tibble(
state = state,
response_length = nrow(state_agency_df)
)
}
agencies <- bind_rows(agency_data_list) |>
mutate(agency_type = agency_type_name, county = counties, state = state_name) |>
select(ori, county, latitude, longitude, state_abbr, state, agency_name, agency_type, is_nibrs, nibrs_start_date)
# QA checks
response_qa <- bind_rows(qa_data_list)
agencies_qa <- agencies %>%
group_by(state) %>%
summarise(n_ori = n_distinct(ori), .groups = "drop")
qa_comparison <- inner_join(response_qa, agencies_qa, by = "state") %>%
mutate(match = response_length == n_ori) %>%
filter(match != TRUE)
print(qa_comparison)