TidyTuesday
    • About TidyTuesday
    • Datasets
      • 2025
      • 2024
      • 2023
      • 2022
      • 2021
      • 2020
      • 2019
      • 2018
    • Useful links

    On this page

    • Hotels
      • Get the data here
      • Data Dictionary
    • hotels.csv
      • Cleaning Script

    Picture of hotel room bed overlooking window

    Hotels

    The data this week comes from an open hotel booking demand dataset from Antonio, Almeida and Nunes, 2019.

    Also shoutout to a series of packages for time-series analysis and plotting - tidyverts!

    • tsibble - a time-series tibble
    • feasts - Feature Extraction And Statistics for Time Series.
    • fable - Commonly used time-series forecasting

    Get the data here

    # Get the Data
    
    hotels <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2020/2020-02-11/hotels.csv')
    
    # Or read in with tidytuesdayR package (https://github.com/dslc-io/tidytuesdayR)
    # PLEASE NOTE TO USE 2020 DATA YOU NEED TO USE tidytuesdayR version ? from GitHub
    
    # Either ISO-8601 date or year/week works!
    
    # Install via pak::pak("dslc-io/tidytuesdayR")
    
    tuesdata <- tidytuesdayR::tt_load('2020-02-11')
    tuesdata <- tidytuesdayR::tt_load(2020, week = 7)
    
    
    hotels <- tuesdata$hotels

    Data Dictionary

    variable class description
    hotel character Hotel (H1 = Resort Hotel or H2 = City Hotel)
    is_canceled double Value indicating if the booking was canceled (1) or not (0)
    lead_time double Number of days that elapsed between the entering date of the booking into the PMS and the arrival date
    arrival_date_year double Year of arrival date
    arrival_date_month character Month of arrival date
    arrival_date_week_number double Week number of year for arrival date
    arrival_date_day_of_month double Day of arrival date
    stays_in_weekend_nights double Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel
    stays_in_week_nights double Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel
    adults double Number of adults
    children double Number of children
    babies double Number of babies
    meal character Type of meal booked. Categories are presented in standard hospitality meal packages:
    Undefined/SC – no meal package;
    BB – Bed & Breakfast;
    HB – Half board (breakfast and one other meal – usually dinner);
    FB – Full board (breakfast, lunch and dinner)
    country character Country of origin. Categories are represented in the ISO 3155–3:2013 format
    market_segment character Market segment designation. In categories, the term “TA” means “Travel Agents” and “TO” means “Tour Operators”
    distribution_channel character Booking distribution channel. The term “TA” means “Travel Agents” and “TO” means “Tour Operators”
    is_repeated_guest double Value indicating if the booking name was from a repeated guest (1) or not (0)
    previous_cancellations double Number of previous bookings that were cancelled by the customer prior to the current booking
    previous_bookings_not_canceled double Number of previous bookings not cancelled by the customer prior to the current booking
    reserved_room_type character Code of room type reserved. Code is presented instead of designation for anonymity reasons
    assigned_room_type character Code for the type of room assigned to the booking. Sometimes the assigned room type differs from the reserved room type due to hotel operation reasons (e.g. overbooking) or by customer request. Code is presented instead of designation for anonymity reasons
    booking_changes double Number of changes/amendments made to the booking from the moment the booking was entered on the PMS until the moment of check-in or cancellation
    deposit_type character Indication on if the customer made a deposit to guarantee the booking. This variable can assume three categories:
    No Deposit – no deposit was made;
    Non Refund – a deposit was made in the value of the total stay cost;
    Refundable – a deposit was made with a value under the total cost of stay.
    agent character ID of the travel agency that made the booking
    company character ID of the company/entity that made the booking or responsible for paying the booking. ID is presented instead of designation for anonymity reasons
    days_in_waiting_list double Number of days the booking was in the waiting list before it was confirmed to the customer
    customer_type character Type of booking, assuming one of four categories:
    Contract - when the booking has an allotment or other type of contract associated to it;
    Group – when the booking is associated to a group;
    Transient – when the booking is not part of a group or contract, and is not associated to other transient booking;
    Transient-party – when the booking is transient, but is associated to at least other transient booking
    adr double Average Daily Rate as defined by dividing the sum of all lodging transactions by the total number of staying nights
    required_car_parking_spaces double Number of car parking spaces required by the customer
    total_of_special_requests double Number of special requests made by the customer (e.g. twin bed or high floor)
    reservation_status character Reservation last status, assuming one of three categories:
    Canceled – booking was canceled by the customer;
    Check-Out – customer has checked in but already departed;
    No-Show – customer did not check-in and did inform the hotel of the reason why
    reservation_status_date double Date at which the last status was set. This variable can be used in conjunction with the ReservationStatus to understand when was the booking canceled or when did the customer checked-out of the hotel

    hotels.csv

    Cleaning Script

    library(tidyverse)
    library(feasts)
    
    # resort hotel
    h1 <- read_csv(here::here("2020", "2020-02-11", "H1.csv")) %>% 
      janitor::clean_names() %>% 
      mutate(hotel = "Resort Hotel") %>% 
      select(hotel, everything())
    
    # city hotel
    h2 <- read_csv(here::here("2020", "2020-02-11", "H2.csv")) %>% 
      janitor::clean_names() %>% 
      mutate(hotel = "City Hotel") %>% 
      select(hotel, everything())
    
    hotel_df <- bind_rows(h1, h2)
    
    hotel_plot <- hotel_df %>% 
      filter(hotel == "City Hotel") %>%
      mutate(date = glue::glue("{arrival_date_year}-{arrival_date_month}-{arrival_date_day_of_month}"),
             date = parse_date(date, format = "%Y-%B-%d")) %>% 
      select(date, everything()) %>% 
      arrange(date) %>% 
      count(date) %>% 
      rename(daily_bookings = n) %>% 
      tsibble::as_tsibble() %>% 
      model(STL(daily_bookings ~ season(window = Inf))) %>% 
      components() %>% autoplot()
    
    hotel_plot
    
    ggsave("hotel_bookings.png", hotel_plot, path = here::here("2020", "2020-02-11"), dpi = "retina")