TidyTuesday
    • About TidyTuesday
    • Datasets
      • 2025
      • 2024
      • 2023
      • 2022
      • 2021
      • 2020
      • 2019
      • 2018
    • Useful links

    On this page

    • Water Sources
      • Get the data here
      • Data Dictionary
    • water.csv
      • Cleaning Script

    Logo for the Water Point Data Exchange

    Water Sources

    The data this week comes from Water Point Data Exchange. Note that the data is limited to some core columns as the full dataset is ~300 Mb. Also data is largely filtered down to African sources.

    Katy Sill and Adam Kariv are speaking about this data source at csv,conf.

    Per WPDX

    The amount of water point data being collected is growing rapidly as governments and development partners increasingly monitor water points over time. Without harmonization among these different data sources, the opportunity for learning will be limited, with the true potential of this information remaining untapped.

    By establishing a platform for sharing water point data throughout the global water sector, WPDx adds value to the data already being collected. By bringing together diverse data sets, the water sector can establish an unprecedented understanding of water services.

    Sharing this data has the potential to improve water access for millions of people as a result of better information available to governments, service providers, researchers, NGOs, and others.

    Get the data here

    # Get the Data
    
    # Read in with tidytuesdayR package 
    # Install from CRAN via: install.packages("tidytuesdayR")
    # This loads the readme and all the datasets for the week of interest
    
    # Either ISO-8601 date or year/week works!
    
    tuesdata <- tidytuesdayR::tt_load('2021-05-04')
    tuesdata <- tidytuesdayR::tt_load(2021, week = 19)
    
    water <- tuesdata$water
    
    # Or read in the data manually
    
    water <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-05-04/water.csv')

    Data Dictionary

    water.csv

    variable class description
    row_id double Unique ID
    lat_deg double Latitude
    lon_deg double Longitude
    report_date character Date water source was reported on
    status_id character Identify if any water is available on the day of the visit, recognizing that it may be a limited flow
    water_source_clean character Describe the water source (e.g. shallow well, spring, borehole, river, pond, etc.)
    water_tech character Describe the system being used to transport the water from the source to the point of collection (e.g. Handpump (include manufacturer, i.e. Afridev, IndiaMark II, Malda,etc.),Kiosk, Tapstand,etc.)
    facility_type character Categorized facility type based on JMP definitions
    country_name character Country name
    install_year double Install year
    installer character Installer
    pay character Provide the payment amount and basis (e.g. monthly, per jerry can, when broken,etc.). If no amount is provided, the basis can be provided alone. An amount without a payment basis cannot be included
    status character Provide a status of the physical/mechanical condition of thewater point

    Cleaning Script

    library(tidyverse)
    
    raw_df <- read_csv("2021/2021-05-04/Water_Point_Data_Exchange__WPDx-Basic_.csv")
    
    clean_df <- raw_df %>% 
      janitor::clean_names() %>%
      rename_with(~str_remove(.x, "number_")) %>% 
      select(
        row_id:status_id, water_source_clean:subjective_quality,rehab_year,
        install_year,
        -contains("adm"), -contains("rehab"), -contains("fecal"),
        -count, -new_georeferenced_column, -source, -management,-subjective_quality
          ) %>% 
      mutate(
        status_id = case_when(
          status_id == "Yes"~ "y",
          status_id == "No" ~ "n",
          status_id == "Unknown" ~ "u"
        ),
        report_date = str_sub(report_date, 1, 10)
        ) %>% 
      filter(
        !country_name %in% c(
          "NA", "Slovenia", "El Salvador", "Iraq", "Malawi", "Bangladesh",
          "Benin", "India", "Bolivia", "Honduras", "Guatemala", "Nicaragua",
          "Afghanistan", "Nepal", "Haiti", "Mexico", "Indonesia", "Sri Lanka",
          "Dominican Republublic", "Belize", "Myanmar", "Vanuatu", "Jordan"
          )
        ) %>% 
      select(-water_tech) %>% 
      rename(water_source = water_source_clean, water_tech = water_tech_clean)
    
    clean_df %>% 
      write_csv("2021/2021-05-04/water.csv")