The System in Detail

The Holiday Oracle system automatically generates, from open source data, a "holiday rule": the series of dates on which a particular holiday is celebrated in a country or subdivision.

To identify countries and subdivisions (states, provinces, etc.), Holiday Oracle uses the language and approach of ISO 3166 Country Codes and ISO 3166-2:2013 Codes for the representation of names of countries and their subdivisions - Part 2: Country subdivision code. Open data projects used by Holiday Oracle must use this format or be easily convertible to it.
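
Input that is "easily convertible" usually differs only in letter case or separators. A minimal Python sketch of that conversion follows, assuming codes arrive as strings such as `us_ca` or `de by`. The helper names `normalize_region_code` and `split_region_code` are hypothetical and not part of Holiday Oracle. The sketch relies only on the ISO 3166-2 shape: a two-letter country code, optionally followed by a hyphen and a subdivision part of up to three letters or digits.

```python
import re
from typing import Optional, Tuple

# ISO 3166-1 alpha-2 country code, optionally followed by "-" and an
# ISO 3166-2 subdivision part of one to three letters or digits.
ISO_3166_CODE = re.compile(r"([A-Z]{2})(?:-([A-Z0-9]{1,3}))?")


def normalize_region_code(raw: str) -> str:
    """Hypothetical helper: turn 'us_ca' or ' de by ' into 'US-CA' / 'DE-BY'."""
    # Treat runs of whitespace, underscores, dots or hyphens as one ISO hyphen.
    code = re.sub(r"[\s_.-]+", "-", raw.strip()).upper()
    if not ISO_3166_CODE.fullmatch(code):
        raise ValueError(f"not an ISO 3166 country or subdivision code: {raw!r}")
    return code


def split_region_code(code: str) -> Tuple[str, Optional[str]]:
    """Return (country, subdivision); subdivision is None for a bare country code."""
    match = ISO_3166_CODE.fullmatch(normalize_region_code(code))
    if match is None:  # unreachable: normalize_region_code already validated it
        raise ValueError(code)
    return match.group(1), match.group(2)


print(normalize_region_code("us_ca"))  # US-CA
print(split_region_code("GB-ENG"))     # ('GB', 'ENG')
print(split_region_code("fr"))         # ('FR', None)
```

Rejecting malformed codes early, rather than guessing, keeps a mistyped subdivision from silently being merged with the wrong region's data.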

Data Sources

In some cases there is only one source of open data for a particular country or subdivision; in others there are five or more. In every case, Holiday Oracle uses all available open data to create its holiday rules and date predictions. To help you judge how much reliance to place on those predictions, the Holiday Oracle API returns, for each holiday rule, the number of open data sources or open source projects used in the consensus algorithm. As new data sources become available, or existing sources are updated, we will use them to regenerate holiday rules and date predictions.
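
One way to picture the pooling step is the hypothetical Python sketch below. It assumes each source has already been parsed into a mapping from holiday name to dates, that holiday names have already been reconciled across sources, and that a holiday's source count is the number of sources listing at least one date for it. `pool_observations` and the `source_count` and `date_votes` keys are illustrative only, not the Holiday Oracle API's actual fields.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, List

# Hypothetical input: source name -> {holiday name -> dates that source lists}.
Sources = Dict[str, Dict[str, List[date]]]


def pool_observations(sources: Sources) -> Dict[str, dict]:
    """Combine sources per holiday, keeping per-date votes and the source count."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    contributors: Dict[str, set] = defaultdict(set)
    for source_name, holidays in sources.items():
        for holiday, dates in holidays.items():
            if not dates:
                continue  # a source with no dates for this holiday is not counted
            contributors[holiday].add(source_name)
            for d in set(dates):  # each source votes at most once per date
                votes[holiday][d] += 1
    return {
        holiday: {"source_count": len(contributors[holiday]),
                  "date_votes": dict(votes[holiday])}
        for holiday in contributors
    }


pooled = pool_observations({
    "source_a": {"Christmas Day": [date(2019, 12, 25), date(2020, 12, 25)]},
    "source_b": {"Christmas Day": [date(2019, 12, 25)]},
})
print(pooled["Christmas Day"]["source_count"])  # 2
```

Keeping per-date vote counts, rather than a merged set of dates, preserves the disagreement between sources that a consensus step needs to measure.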

The holiday rules created by Holiday Oracle use machine learning techniques to resolve inconsistencies in holiday naming, such as whether "Easter Sunday" is the same holiday as "Ostersonntag" and "Ostern". The system also recognises that many holidays around the world follow regular patterns: for example, Thanksgiving Day in the United States of America is celebrated on the fourth Thursday in November, and Christmas Day falls on 25th December. Holiday Oracle must also handle exceptions, such as a year in which a holiday is moved to another date because of another important event, or a holiday that is created and celebrated in only one year, for instance on the appointment of a new leader. When such exceptions are present in the underlying data, Holiday Oracle takes them into account when generating its candidate rules.
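
The two patterns mentioned above, a fixed calendar date and the nth weekday of a month, can be expressed as simple rule objects. The following Python sketch is illustrative only: `FixedDateRule`, `NthWeekdayRule` and `predict` are hypothetical names, and the `overrides` mapping is an assumed way of representing a year in which a holiday was moved. Holidays tied to lunar or lunisolar calendars, such as Easter, would need other rule types not shown here.

```python
import calendar
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class FixedDateRule:
    """Same month and day every year, e.g. Christmas Day on 25 December."""
    month: int
    day: int

    def date_in(self, year: int) -> date:
        return date(year, self.month, self.day)


@dataclass(frozen=True)
class NthWeekdayRule:
    """The nth given weekday of a month; negative n counts back (-1 = last)."""
    month: int
    weekday: int  # Monday = 0 ... Sunday = 6, as returned by date.weekday()
    n: int

    def date_in(self, year: int) -> date:
        if self.n > 0:
            first = date(year, self.month, 1)
            offset = (self.weekday - first.weekday()) % 7
            result = first + timedelta(days=offset + 7 * (self.n - 1))
        elif self.n < 0:
            last = date(year, self.month, calendar.monthrange(year, self.month)[1])
            offset = (last.weekday() - self.weekday) % 7
            result = last - timedelta(days=offset + 7 * (-self.n - 1))
        else:
            raise ValueError("n must be non-zero")
        if result.month != self.month:  # e.g. a 5th Thursday that does not exist
            raise ValueError(f"no such weekday occurrence in {year}-{self.month:02d}")
        return result


Rule = Union[FixedDateRule, NthWeekdayRule]


def predict(rule: Rule, year: int, overrides: Optional[Dict[int, date]] = None) -> date:
    """Apply the rule, unless the data shows the holiday was moved in that year."""
    if overrides and year in overrides:
        return overrides[year]
    return rule.date_in(year)


thanksgiving = NthWeekdayRule(month=11, weekday=calendar.THURSDAY, n=4)
print(predict(thanksgiving, 2018))            # 2018-11-22
print(predict(FixedDateRule(12, 25), 2019))   # 2019-12-25
```

The distinction between "fourth" and "last" matters: in 2018 November had five Thursdays, and Thanksgiving fell on the 22nd, not the 29th.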

Holiday rules generated by machine learning

Holiday Oracle stress-tests its candidate holiday rules and their corresponding date predictions using cross-validation. This gives greater insight into the level of consensus in the underlying dataset and produces a final consensus score for each candidate holiday rule.
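
How the folds are built is not specified here. A minimal sketch of k-fold cross-validation over observed years is shown below, assuming observations are a mapping from year to the date a holiday was observed, and using a hypothetical learner, `fit_fixed_date`, that picks the most common month and day. Each fold of years is held out in turn, a rule is fitted on the remaining years, and the fraction of held-out years it predicts exactly is averaged across folds.

```python
from collections import Counter
from datetime import date
from typing import Callable, Dict

Observations = Dict[int, date]  # year -> date the holiday was observed (assumed shape)
Learner = Callable[[Observations], Callable[[int], date]]


def fit_fixed_date(train: Observations) -> Callable[[int], date]:
    """Hypothetical learner: predict the most common (month, day) seen in training."""
    (month, day), _ = Counter((d.month, d.day) for d in train.values()).most_common(1)[0]
    return lambda year: date(year, month, day)


def cross_validate(obs: Observations, fit: Learner, k: int = 3) -> float:
    """Mean fraction of held-out years predicted exactly, over k folds of years."""
    years = sorted(obs)
    fold_scores = []
    for i in range(k):
        held_out = set(years[i::k])  # every k-th year, starting at i, forms one fold
        train = {y: obs[y] for y in years if y not in held_out}
        if not held_out or not train:
            continue
        predict = fit(train)
        hits = sum(predict(y) == obs[y] for y in held_out)
        fold_scores.append(hits / len(held_out))
    return sum(fold_scores) / len(fold_scores) if fold_scores else 0.0


observed = {y: date(y, 12, 25) for y in range(2010, 2019)}
observed[2015] = date(2015, 12, 26)  # a year in which the holiday was moved
print(round(cross_validate(observed, fit_fixed_date), 3))  # 0.889
```

Because each held-out year is predicted by a rule that never saw it, a single moved year lowers the score without being able to pull the fitted rule towards itself.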

Scoring

The consensus score is a number between 0 and 1. The system calculates it by measuring the numbers of true positives, false positives, true negatives and false negatives (as well as the variation in the data points) when the rule is tested against the underlying data. The candidate holiday rule with the highest score is selected. If two candidate rules have the same score, the system runs a tie-breaker routine that considers other metrics about the rule and the data to select a winner.
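
The exact scoring formula is not given here. As an assumption, the sketch below uses the Matthews correlation coefficient (MCC), which combines all four confusion counts, and rescales it from [-1, 1] to [0, 1]. It ignores the variation term mentioned above. Counts are taken per day: a true positive is a day that both the rule and the data mark as the holiday, and a true negative is a day that neither does. The tie-breaker, preferring the rule backed by more sources, is likewise hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str
    tp: int  # days both the rule and the data mark as the holiday
    fp: int  # days the rule predicts but the data does not list
    fn: int  # days the data lists but the rule misses
    tn: int  # days neither marks as the holiday
    supporting_sources: int  # hypothetical tie-break metric


def consensus_score(c: Candidate) -> float:
    """Assumed metric: Matthews correlation coefficient mapped from [-1, 1] to [0, 1]."""
    denom = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    mcc = 0.0 if denom == 0 else (c.tp * c.tn - c.fp * c.fn) / denom
    return (mcc + 1) / 2


def select_winner(candidates: List[Candidate]) -> Candidate:
    """Highest score wins; an exact tie goes to the rule backed by more sources."""
    return max(candidates, key=lambda c: (consensus_score(c), c.supporting_sources))


rules = [
    Candidate("rule A", tp=9, fp=1, fn=1, tn=3641, supporting_sources=2),
    Candidate("rule B", tp=9, fp=1, fn=1, tn=3641, supporting_sources=4),
    Candidate("rule C", tp=2, fp=8, fn=8, tn=3634, supporting_sources=4),
]
print(select_winner(rules).name)  # rule B
```

MCC is used here because holiday days are rare: a rule that never predicts a holiday would still score highly on plain accuracy, whereas its MCC is 0 (0.5 after rescaling).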

In rare cases, the open data sources are so inconsistent that Holiday Oracle cannot reach a consensus and therefore cannot automatically generate a rule. In these cases, the Holiday Oracle API returns no holiday rule. We will continue to seek more sources of open data, which should enable Holiday Oracle’s consensus mechanism to resolve these cases.
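
How "no consensus" is detected is not described. One simple interpretation, sketched below in Python, is a minimum score that the best candidate must reach. `MIN_CONSENSUS` and its value of 0.75 are hypothetical, not documented Holiday Oracle settings; returning `None` mirrors the API returning no holiday rule.

```python
from typing import Dict, Optional

MIN_CONSENSUS = 0.75  # hypothetical threshold, not a documented Holiday Oracle value


def winning_rule_or_none(scores: Dict[str, float],
                         min_score: float = MIN_CONSENSUS) -> Optional[str]:
    """Return the best-scoring candidate, or None when no candidate reaches min_score."""
    if not scores:
        return None  # no candidate rules could be generated at all
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else None


print(winning_rule_or_none({"fixed 25 Dec": 0.92, "last Mon Dec": 0.41}))  # fixed 25 Dec
print(winning_rule_or_none({"fixed 25 Dec": 0.52}))                        # None
```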

For each country or subdivision, the API returns every winning holiday rule, that is, the candidate with the highest consensus score across all of the datasets, together with its date predictions for the years 2019 to 2030.
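
The prediction window can be sketched as follows, assuming each winning rule is available as a function from year to date. The function names and the output shape (holiday name mapped to a list of dates) are hypothetical. The one detail worth noting is that Python's `range` excludes its end point, so the last year must be passed as 2031 for 2030 to be included.

```python
from datetime import date
from typing import Callable, Dict, List

FIRST_YEAR, LAST_YEAR = 2019, 2030  # prediction window stated in the text


def predict_window(rule: Callable[[int], date]) -> List[date]:
    """Predicted dates for every year from FIRST_YEAR to LAST_YEAR inclusive."""
    # range() excludes its end point, so add 1 to include LAST_YEAR.
    return [rule(year) for year in range(FIRST_YEAR, LAST_YEAR + 1)]


def region_predictions(winners: Dict[str, Callable[[int], date]]) -> Dict[str, List[date]]:
    """Map each holiday name to its winning rule's predictions (hypothetical shape)."""
    return {holiday: predict_window(rule) for holiday, rule in winners.items()}


predictions = region_predictions({"Christmas Day": lambda year: date(year, 12, 25)})
print(len(predictions["Christmas Day"]))  # 12
```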