Getting public holidays by country
I’ve recently been working on a time series model in which I wanted to include the public holidays of Spain and Portugal. After trying different approaches I decided to move forward with prophet
which, by the way, I strongly recommend.
I’m writing this post to go over some of the options we have for getting public holidays by country. I don’t want to go into the details of what counts as a public holiday (regional and local ones are excluded from this analysis, for example).
The first thing I did was search for an existing R package, but I couldn’t find one. As a colleague pointed out, such a package would probably need a lot of maintenance. However, since prophet
has a built-in function to include holidays, I looked into its code and found that the package provides a data.frame with holidays from 1995 to 2044 for many countries (there are around 100 different country names, but I think half of them are country codes).
For many purposes this data frame would suffice, but it feels odd to load prophet
just to take advantage of this data. So I kept exploring and found holidayapi.com, which provides an API to access the data, but the free account is limited, so I didn’t dig any deeper.
Fortunately, date.nager.at provides the same information through an open API, so the following simple function gives us access to the data:
library(httr)
library(dplyr)
library(magrittr)
library(purrr)
get_holidays <- function(country_code, year) {
  # Build the request URL
  url <- parse_url("http://date.nager.at")
  url$path <- paste0("api/v1/get/", country_code, "/", year)
  request_url <- build_url(url)
  # Fetch the response and fail early on an HTTP error
  response <- GET(request_url)
  stop_for_status(response)
  content_json <- content(response)
  # Keep only the relevant fields of each holiday
  # (extract() is magrittr's alias for `[`)
  map_df(content_json, extract, c("countryCode", "name", "date"))
}
And the output:
get_holidays(country_code = "AT", year = 2019)
## # A tibble: 13 x 3
##    countryCode name                  date
##    <chr>       <chr>                 <chr>
##  1 AT          New Year's Day        2019-01-01
##  2 AT          Epiphany              2019-01-06
##  3 AT          Easter Monday         2019-04-22
##  4 AT          National Holiday      2019-05-01
##  5 AT          Ascension Day         2019-05-30
##  6 AT          Whit Monday           2019-06-10
##  7 AT          Corpus Christi        2019-06-20
##  8 AT          Assumption Day        2019-08-15
##  9 AT          National Holiday      2019-10-26
## 10 AT          All Saints' Day       2019-11-01
## 11 AT          Immaculate Conception 2019-12-08
## 12 AT          Christmas Day         2019-12-25
## 13 AT          St. Stephen's Day     2019-12-26
And for several years:
years <- c("2016", "2017", "2018", "2019")
map_df(years, function(x) get_holidays("AT", x))
## # A tibble: 52 x 3
##    countryCode name             date
##    <chr>       <chr>            <chr>
##  1 AT          New Year's Day   2016-01-01
##  2 AT          Epiphany         2016-01-06
##  3 AT          Easter Monday    2016-03-28
##  4 AT          National Holiday 2016-05-01
##  5 AT          Ascension Day    2016-05-05
##  6 AT          Whit Monday      2016-05-16
##  7 AT          Corpus Christi   2016-05-26
##  8 AT          Assumption Day   2016-08-15
##  9 AT          National Holiday 2016-10-26
## 10 AT          All Saints' Day  2016-11-01
## # … with 42 more rows
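Note that the dates come back as character strings. Since the original goal was to feed these holidays into prophet, here is a minimal sketch of the conversion, assuming the three columns shown above and prophet's convention of a holidays data frame with `holiday` and `ds` columns. The helper name `to_prophet_holidays()` is hypothetical (it is not part of any package), and the sample rows are simply copied from the 2019 output so the example runs without calling the API:

```r
# Hypothetical helper: reshape the API result into the two columns
# prophet expects for its `holidays` argument.
to_prophet_holidays <- function(holidays) {
  data.frame(
    holiday = holidays$name,
    ds = as.Date(holidays$date),  # character "YYYY-MM-DD" -> Date
    stringsAsFactors = FALSE
  )
}

# Two rows copied from the 2019 output above
example <- data.frame(
  countryCode = c("AT", "AT"),
  name = c("New Year's Day", "Epiphany"),
  date = c("2019-01-01", "2019-01-06"),
  stringsAsFactors = FALSE
)

prophet_holidays <- to_prophet_holidays(example)
str(prophet_holidays)
```

The same helper should work on the multi-year result, since it only relies on the `name` and `date` columns.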
Even so, I still see two main drawbacks:
- I haven’t analyzed the data quality and I don’t know whether anyone is actively maintaining this website.
- It’d be interesting to include regional and local holidays and, in addition, labels for other relevant days (Black Friday, for example).
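Some of those relevant days can be computed rather than looked up. As a rough sketch, assuming Black Friday is defined as the day after the fourth Thursday of November (the US Thanksgiving rule), a hypothetical `black_friday()` helper in base R could be:

```r
# Hypothetical helper: Black Friday as the day after the fourth
# Thursday of November (assumption: US Thanksgiving rule).
black_friday <- function(year) {
  nov_first <- as.Date(sprintf("%d-11-01", as.integer(year)))
  # POSIXlt weekdays are locale independent: 0 = Sunday, ..., 4 = Thursday
  wday <- as.POSIXlt(nov_first)$wday
  # Days from November 1st to the first Thursday of the month
  days_to_thursday <- (4 - wday) %% 7
  first_thursday <- nov_first + days_to_thursday
  # Fourth Thursday is three weeks later; Black Friday is the next day
  first_thursday + 21 + 1
}

years <- c("2016", "2017", "2018", "2019")
data.frame(
  name = "Black Friday",
  date = black_friday(years),
  stringsAsFactors = FALSE
)
```

The resulting rows could then be appended to the holidays returned by the API, although other retail dates would each need their own rule.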
Does anyone have a better approach?