Appendices

Source Code and Data Repository

The source code is available in the following GitHub repository: [Link].

Question: Since the thermal.db database is 1.4 GB, should it also be hosted on GitHub, or should it be published elsewhere (e.g., Zenodo or another platform)?

SQLite Database DDL

--
-- File generated with SQLiteStudio v3.4.17 on Tue May 12 14:36:21 2026
--
-- Text encoding used: System
--
PRAGMA foreign_keys = off;
BEGIN TRANSACTION;

-- Table: feature_series
CREATE TABLE IF NOT EXISTS feature_series (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    site_id INTEGER NOT NULL,
    var_name TEXT NOT NULL,
    var_description TEXT,
    unit TEXT,
    short_var_name TEXT,
    series BLOB NOT NULL,
    start_date TEXT NOT NULL,
    end_date TEXT NOT NULL,
    total_values INTEGER NOT NULL,
    original_nan_count INTEGER NOT NULL,
    nan_percentage REAL NOT NULL,
    max_consecutive_nans INTEGER NOT NULL,
    nans_interpolated INTEGER NOT NULL,
    interpolation_method TEXT,
    data_quality TEXT NOT NULL,
    quality_score REAL NOT NULL,
    import_timestamp TEXT NOT NULL,
    origin TEXT,
    FOREIGN KEY (site_id) REFERENCES sites(site_id),
    UNIQUE(site_id, var_name)
);

-- Table: import_metadata
CREATE TABLE IF NOT EXISTS import_metadata (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    import_date TEXT NOT NULL,
    period_start TEXT NOT NULL,
    period_end TEXT NOT NULL,
    max_nan_threshold REAL NOT NULL,
    total_files_processed INTEGER NOT NULL,
    total_series_imported INTEGER NOT NULL,
    processing_time_seconds REAL NOT NULL
);

-- Table: sites
CREATE TABLE IF NOT EXISTS sites (
    site_id INTEGER PRIMARY KEY AUTOINCREMENT,
    latitude REAL NOT NULL,
    longitude REAL NOT NULL,
    UNIQUE(latitude, longitude)
);

-- Index: idx_sites_coords
CREATE INDEX IF NOT EXISTS idx_sites_coords
ON sites(latitude, longitude);

-- Index: var_name_idx
CREATE INDEX IF NOT EXISTS var_name_idx
ON feature_series(var_name);

COMMIT TRANSACTION;
PRAGMA foreign_keys = on;
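
As an illustration of how this schema can be queried, the sketch below lists the series stored for one variable, together with the coordinates of their sites. It uses only tables and columns from the DDL above. The function name `series_for_variable` and the variable name `t2m` in the usage note are hypothetical, and the `series` BLOB is deliberately not decoded, because its binary layout is not specified in this appendix.

```python
import sqlite3


def series_for_variable(conn, var_name, max_nan_percentage):
    """List the series stored for one variable, with site coordinates.

    Returns rows of (site_id, latitude, longitude, start_date, end_date,
    nan_percentage). The scale of nan_percentage (0-1 or 0-100) is not
    stated in the DDL, so the threshold must use the same scale as the
    stored values.
    """
    query = """
        SELECT s.site_id, s.latitude, s.longitude,
               f.start_date, f.end_date, f.nan_percentage
        FROM feature_series AS f
        JOIN sites AS s ON s.site_id = f.site_id
        WHERE f.var_name = ?
          AND f.nan_percentage <= ?
        ORDER BY s.site_id
    """
    # Parameter binding keeps the variable name out of the SQL string.
    return conn.execute(query, (var_name, max_nan_percentage)).fetchall()
```

A typical call would be `series_for_variable(sqlite3.connect("thermal.db"), "t2m", 5.0)`. SQLite leaves foreign key enforcement off by default on each new connection, so code that writes to the database should first run `PRAGMA foreign_keys = ON`.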

Transitioning from Tutorial to Production

We cannot emphasize this enough: the model presented in this tutorial remains an offline prototype, trained and evaluated on ERA5 reanalysis data, that is, on weather known after the fact rather than forecast. To use it in operational conditions (daily D+1, i.e. day-ahead, forecasting), several concrete adaptations are required:

Replace ERA5 temperatures with actual D+1 weather forecasts (from Météo-France, ECMWF, or another data provider). This will inevitably degrade performance slightly, by an amount that depends on the quality of that day's weather forecast. Part of this degradation may be offset by using more meteorological variables (wind speed, wind chill, solar radiation, cloud cover, \(T_{max}\), \(T_{min}\), etc.), possibly taken from several geographical sites, which often improves robustness.
Implement regular model recalibration (every 1 to 3 months) using the most recent data to account for gradual shifts in consumption behavior.
Establish continuous performance monitoring (rolling MAE, residual analysis by day type: heatwaves, cold snaps, holiday bridges, etc.) to quickly detect any degradation.
Automate the entire pipeline: RTE and weather data retrieval, feature engineering, inference, and forecast distribution.
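
The monitoring step in the list above can be sketched as follows. This is a minimal, dependency-free illustration, not the tutorial's monitoring code: the function names, the window length, and the day-type labels are all assumptions chosen for the example.

```python
from collections import defaultdict, deque


def rolling_mae(actual, forecast, window):
    """Mean absolute error over a trailing window of `window` points.

    Returns one value per input point; entries are None until the
    window is full.
    """
    errors = deque(maxlen=window)  # keeps only the most recent errors
    result = []
    for a, f in zip(actual, forecast):
        errors.append(abs(a - f))
        result.append(sum(errors) / window if len(errors) == window else None)
    return result


def mae_by_day_type(actual, forecast, day_types):
    """Mean absolute error grouped by a caller-supplied day-type label."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for a, f, label in zip(actual, forecast, day_types):
        totals[label] += abs(a - f)
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}
```

In practice the rolling value would be compared against an alert threshold, and the per-label breakdown (for example "cold_snap" versus "normal") shows whether a degradation is general or confined to particular conditions.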

This modular architecture, a stable base model supplemented by a residual corrector, has the advantage of being relatively easy to maintain and evolve over time.
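
To make the base-plus-corrector idea concrete, here is a deliberately simplified sketch. It does not reproduce the tutorial's models: the base model is treated as an arbitrary callable, and the corrector is assumed, for illustration only, to be the mean residual per group (for example per day type). The class name and method names are hypothetical.

```python
from statistics import mean


class ResidualCorrectedForecaster:
    """Base prediction plus an additive correction learned from residuals."""

    def __init__(self, base_predict):
        # Stable base model, used as a black box: features -> prediction.
        self.base_predict = base_predict
        # Residual corrector: mean residual observed for each group key.
        self.correction = {}

    def fit_corrector(self, features, keys, targets):
        """Refit only the corrector; the base model is left untouched."""
        residuals = {}
        for x, key, y in zip(features, keys, targets):
            residuals.setdefault(key, []).append(y - self.base_predict(x))
        self.correction = {key: mean(vals) for key, vals in residuals.items()}

    def predict(self, x, key):
        # Groups never seen while fitting get no correction.
        return self.base_predict(x) + self.correction.get(key, 0.0)
```

Under these assumptions, periodic recalibration can be limited to calling `fit_corrector` on the most recent data while the base model stays frozen, which is one way to read the maintainability argument above.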