TidyTuesday
    • About TidyTuesday
    • Datasets
      • 2025
      • 2024
      • 2023
      • 2022
      • 2021
      • 2020
      • 2019
      • 2018
    • Useful links

    On this page

    • European energy
      • Get the data here
      • Data Dictionary
    • energy_types
    • country_totals
      • Cleaning Script

    Wind farm

    European energy

    The data this week comes from Eurostat.

    H/t to Karim Douïeb who created a very nice graphic in Observable (D3), based off a Washington post article for US Energy. Their graphic can be found below.

    Additional data can be found via the OECD.

    There’s also a nice report for 2017 form the EU Power sector. Lots of graphics inside.

    Thermal power according to Wikipedia:

    A thermal power station is a power station in which heat energy is converted to electric power. In most, a steam-driven turbine converts heat to mechanical power as an intermediate to electrical power. Water is heated, turns into steam and drives a steam turbine which drives an electrical generator.

    Clean vs renewable vs fossil fuels by Peninsula Energy.

    Clean energy is carbon-free energy that creates little to no greenhouse gas emissions. This is in contrast to fossil fuels, which produce a significant amount of greenhouse gas emissions, including carbon dioxide and methane. Renewable energy is energy that comes from resources that are naturally replenished such as sunlight, wind, water, and geothermal heat. Unlike fossil fuels, such as oil, natural gas and coal, which cannot be replaced, renewable energy regenerates naturally in a short period of time.

    Overall, this dataset was an exploration of pulling data from an Excel-based data product. This was a bit messy to start but applying techniques to extract the data out of repeated tables and into a tidy format is possible programatically with a bit of logic + readxl! Check out the cleaning script for the details.

    Get the data here

    # Get the Data
    
    # Read in with tidytuesdayR package 
    # Install from CRAN via: install.packages("tidytuesdayR")
    # This loads the readme and all the datasets for the week of interest
    
    # Either ISO-8601 date or year/week works!
    
    tuesdata <- tidytuesdayR::tt_load('2020-08-04')
    tuesdata <- tidytuesdayR::tt_load(2020, week = 32)
    
    energy_types <- tuesdata$energy_types
    
    # Or read in the data manually
    
    energy_types <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2020/2020-08-04/energy_types.csv')
    country_totals <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2020/2020-08-04/country_totals.csv')

    Data Dictionary

    Limited to Level 1 or 2 production by type - either conventional thermal (fossil fuels), nuclear, hydro, wind, solar, geothermal, or other.

    energy_types

    variable class description
    country character Country ID
    country_name character Country name
    type character Type of energy production
    level character Level - either total, level 1 or level 2. Where level 2 makes up level 1 that makes up the total.
    2016 double Energy in GWh (Gigawatt hours)
    2017 double Energy in GWh (Gigawatt hours)
    2018 double Energy in GWh (Gigawatt hours)

    country_totals

    Limited to total net production, along with imports, exports, energy lost, and energy supplied (net + import - export - energy absorbed by pumping).

    variable class description
    country character Country ID
    country_name character Country name
    type character Type of energy production
    level character Level - either total, level 1 or level 2. Where level 2 makes up level 1 that makes up the total.
    2016 double Energy in GWh (Gigawatt hours)
    2017 double Energy in GWh (Gigawatt hours)
    2018 double Energy in GWh (Gigawatt hours)

    Cleaning Script

    library(tidyverse)
    library(readxl)
    library(countrycode)
    
    raw_code <- countrycode::codelist %>% 
        select(country_name = country.name.en, country = eurostat)
    
    raw_excel <- read_excel("2020/2020-08-04/Electricity_generation_statistics_2019.xlsx", sheet = 3)
      
    raw_excel %>% 
        filter(!is.na(...4)) %>% 
        mutate(country = str_remove_all(...4, "[:digit:]"), .before = ...1) %>% 
        mutate(country = if_else(
          str_length(country) > 1, country, NA_character_), 
          country = str_extract(country, "[:alpha:]+")
          ) %>% 
        fill(country) %>% 
      select(-c(...1, ...2, ...14:...18))
    
    row_stat <- read_excel("2020/2020-08-04/Electricity_generation_statistics_2019.xlsx", 
                           sheet = 3,
                           range = "C48:C61", col_names = FALSE)[[1]][c(1,3:14)] %>% 
      str_remove("[:digit:]") %>% 
      str_remove("of which: ") %>% 
      str_remove("\\.") %>% str_trim()
    
    country_range <- tibble(row_start = seq(from = 46, to = 454, by = 34), 
           row_end = seq(from = 61, to = 469, by = 34)) %>% 
      mutate(col1 = 4, col2 = col1 + 5, col3 = col2 + 5) %>% 
      pivot_longer(cols = col1:col3, names_to = "col_var", values_to = "col_start") %>% 
      mutate(col_end = col_start + 2) %>% 
      select(-col_var) %>% 
      slice(-n(), -(n()-1)) %>% 
      mutate(row_stat = list(row_stat))
    
    
    get_country_stats <- function(row_start, row_end, col_start, col_end, row_stat){
      
      # # pull the row_stat names
      # row_stat <- row_stat
    
      # create the range programatically
      col_range <- glue::glue("{LETTERS[col_start]}{row_start}:{LETTERS[col_end]}{row_end}")
      
      # read in the data section quietly
      raw_data <- suppressMessages(
        read_excel("2020/2020-08-04/Electricity_generation_statistics_2019.xlsx", 
                             sheet = 3,
                             col_names = FALSE,
                             range = col_range))
      
      
      country_data <-  raw_data %>% 
        # set appropriate names
        set_names(nm = c(2016:2018)) %>% 
        # drop the year ranges
        filter(!is.na(`2016`), `2016` != "2016") %>% 
        # get the country into a column rather than a header
        mutate(country = if_else(
          is.na(`2017`), 
          `2016`, 
          NA_character_), 
          .before = `2016`) %>% 
        # fill country down
        fill(country) %>% 
        # drop old country header
        filter(!is.na(`2017`)) %>% 
        # add row stat in
        mutate(type = row_stat, 
               .after = country, 
               # add levels of the stats
               level = c("Total", "Level 1", "Level 1", "Level 1", "Level 2", 
                         "Level 1", "Level 1", "Level 1", "Level 1", "Total", 
                         "Total", "Total", "Total")) %>% 
        # format as double
        mutate(across(c(`2016`:`2018`), as.double))
      
      # return data
      country_data
    }
    
    all_countries <- country_range %>% 
      pmap_dfr(get_country_stats) %>% 
      left_join(raw_code, by = "country") %>% 
      select(country, country_name, everything())
    
    country_totals <- all_countries %>% 
      filter(level == "Total")
    
    country_production <- all_countries %>% 
      filter(level != "Total")
    
    # sanity check
    country_totals %>% 
      # filter(type == "Total net production") %>% 
      pivot_longer(cols = `2016`:`2018`, names_to = "year", values_to = "value") %>% 
      filter(type == "Total net production") %>%
      # count(type)%>% 
      ggplot(aes(y = value, x = year, color = country, group = country)) +
      geom_line()
    
    write_csv(country_totals, "2020/2020-08-04/country_totals.csv")
    
    write_csv(country_production, "2020/2020-08-04/energy_types.csv")