Abstract
In this tutorial, we build, step by step, a model that forecasts France's daily electricity consumption for the next day (D+1). The goal is not only to achieve high performance but also to understand the mechanisms that make it possible, moving from the simplest approaches to a robust, interpretable hybrid architecture.
The data come primarily from RTE (Eco2mix – Definitive Annual, 2012–2024). We aggregate the 15-minute observations into a daily consumption series (MWh) and apply a strict chronological validation: training on 2012–2021 and final testing on 2022–2024. This split simulates operational conditions and limits the risk of data leakage.
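The aggregation and split described above can be sketched in a few lines. This is a minimal illustration, not the tutorial's actual pipeline: it assumes each 15-minute record is a mean power in MW (so one slot contributes power × 0.25 h of energy), and the function names daily_energy_mwh and chronological_split are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_energy_mwh(records):
    """Sum 15-minute mean power (MW, assumed unit) into daily energy (MWh)."""
    totals = defaultdict(float)
    for timestamp, power_mw in records:
        # One 15-minute slot lasts 0.25 h, so energy = MW * 0.25 h.
        totals[timestamp.date()] += power_mw * 0.25
    return dict(sorted(totals.items()))


def chronological_split(daily, first_test_year=2022):
    """Train on every day before first_test_year, test on the rest."""
    train = {d: v for d, v in daily.items() if d.year < first_test_year}
    test = {d: v for d, v in daily.items() if d.year >= first_test_year}
    return train, test


# Toy data: two days at a constant 40,000 MW, straddling the 2021/2022 boundary.
start = datetime(2021, 12, 31)
records = [(start + timedelta(minutes=15 * i), 40_000.0) for i in range(192)]

daily = daily_energy_mwh(records)
train, test = chronological_split(daily)
```

Splitting on the calendar year rather than shuffling rows is what keeps every test day strictly later than every training day.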
The tutorial proceeds incrementally: we begin with a persistence model (MAE ≈ 56,400 MWh), then introduce a linear autoregressive model. Adding calendar variables (one-hot encoded day of the week, weekends, holidays) reduces the MAE to approximately 26,300 MWh. Next, we demonstrate the limitations of a global linear approach and propose a two-stage architecture: a base model (AR + calendar) supplemented by a residual corrector.
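To make the first steps concrete, here is a small sketch of the persistence baseline, the MAE metric, and the calendar encoding mentioned above. The helper names (mae, persistence_forecast, calendar_features) and the toy numbers are illustrative assumptions, and the holiday set must be supplied by the caller.

```python
from datetime import date


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def persistence_forecast(series):
    """Forecast for day t is the value observed on day t-1.

    The returned list lines up with series[1:].
    """
    return series[:-1]


def calendar_features(day, holidays=frozenset()):
    """One-hot day of week (Mon..Sun), weekend flag, holiday flag."""
    one_hot = [1.0 if day.weekday() == k else 0.0 for k in range(7)]
    is_weekend = 1.0 if day.weekday() >= 5 else 0.0
    is_holiday = 1.0 if day in holidays else 0.0
    return one_hot + [is_weekend, is_holiday]


# Toy daily series (MWh); the values are made up for illustration.
series = [1000.0, 1100.0, 900.0, 950.0]
baseline_mae = mae(series[1:], persistence_forecast(series))

# 14 July 2024 falls on a Sunday and is a French public holiday.
features = calendar_features(date(2024, 7, 14), holidays={date(2024, 7, 14)})
```

In a linear model with an intercept, the seven one-hot columns and the weekend flag are collinear, so one would typically drop a reference day or rely on regularization.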
For this corrector, we use ERA5 data (Copernicus reanalysis), which we transform into seven spatially aggregated thermal variables (mean, minimum, and maximum temperature, mean heating and cooling degree days (HDD and CDD), and their quadratic terms). These aggregates, enriched with a 14-day lag memory, are used specifically to correct the base model's errors. Over the test period, the final MAE drops to about 24,300 MWh, an improvement of roughly 7.6% over the best weather-free model and a level of performance competitive with the D-1 proxy forecast published by RTE.
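The two-stage idea can be illustrated with a deliberately reduced sketch: a single HDD feature and a one-variable least-squares fit stand in for the seven lagged thermal variables and for whatever corrector the tutorial actually uses. The base temperatures (15 °C and 22 °C), the function names, and all numbers are assumptions chosen for the example.

```python
def degree_days(t_mean_c, heat_base_c=15.0, cool_base_c=22.0):
    """Heating and cooling degree days for one day (base temperatures assumed)."""
    hdd = max(0.0, heat_base_c - t_mean_c)
    cdd = max(0.0, t_mean_c - cool_base_c)
    return hdd, cdd


def fit_line(x, y):
    """Closed-form ordinary least squares for y ~ a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_abs(values):
    return sum(abs(v) for v in values) / len(values)


# Toy setup: the base model misses a heating effect of 2,000 MWh per HDD.
temps_c = [2.0, 8.0, 14.0, 20.0]
hdd = [degree_days(t)[0] for t in temps_c]
base_pred = [1_500_000.0, 1_400_000.0, 1_300_000.0, 1_250_000.0]
actual = [b + 2000.0 * h for b, h in zip(base_pred, hdd)]

# Stage 2 is trained on the base model's residuals, not on consumption itself.
residuals = [a - b for a, b in zip(actual, base_pred)]
intercept, slope = fit_line(hdd, residuals)

# Final forecast = base prediction + predicted residual.
final_pred = [b + intercept + slope * h for b, h in zip(base_pred, hdd)]
base_mae = mean_abs(residuals)
final_mae = mean_abs([a - f for a, f in zip(actual, final_pred)])
```

Keeping the corrector separate means the weather features only have to explain what the AR + calendar model gets wrong, which is what keeps the architecture modular and easy to inspect.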
This tutorial emphasizes methodological rigor (strict time validation, physically motivated feature engineering, modular architecture) and the importance of properly structuring the problem rather than relying on algorithmic complexity. It provides all the necessary code and discusses the adjustments required for operational deployment.
It is tailored for students, data scientists, and energy sector professionals looking to master a progressive, interpretable approach applied to time-series forecasting on a comprehensive, real-world case study.