Demand Forecasting for Waste Reduction: ML and Time-Series in FoodTech
Every year the world produces approximately 1.05 billion tonnes of food waste, according to UNEP's Food Waste Index Report 2024. Of this total, 60% comes from households, 28% from food service, and 12% from retail. Translated into economic terms: more than $1 trillion burned annually, with an environmental footprint that accounts for 10% of global greenhouse gas emissions — nearly five times the entire aviation industry.
Paradoxically, while one billion portions of food are wasted every day, 783 million people suffer from hunger. The root of the problem is not only cultural or logistical: it is fundamentally a demand forecasting problem. Retailers over-order to avoid stockouts. Suppliers overproduce as a safety buffer. The result is that every link in the food supply chain accumulates excess inventory that inevitably becomes waste.
In Italy, the Gadda Law (n. 166/2016) introduced fiscal incentives for donating food surpluses and simplified recovery procedures. At the European level, the Farm to Fork strategy aims to halve food waste by 2030, with binding targets for retailers and the food industry. Regulation creates urgency, but it is Machine Learning that provides the practical tools to achieve these goals.
ML-driven demand forecasting is not theory: retailers that have implemented LSTM and Temporal Fusion Transformer models report MAPE reductions from the typical 28% of traditional methods down to 5-15%, with corresponding waste reductions of 25-40% and measurable ROI within 12 months. In this article we will build a complete end-to-end pipeline — from raw data to a production model — analyzing every architectural choice with working Python code.
What You Will Learn in This Article
- The economic and regulatory framework around food waste in 2025
- Classical statistical models: ARIMA, SARIMA, Holt-Winters, and Prophet with Python code
- Deep learning for time-series: LSTM, GRU, Temporal Fusion Transformer, and N-BEATS
- Advanced feature engineering: exogenous variables, cyclical encoding, lag features
- Complete end-to-end ML pipeline with walk-forward validation
- Comparative benchmark on a real dataset: MAPE, RMSE, MAE, training time
- Integrating forecasts with inventory management and dynamic pricing
- Markdown optimization algorithm for near-expiry products
- Italian grocery retail case study: 200+ stores, 35% waste reduction
- Business metrics: Waste Reduction Rate, Forecast Accuracy, Markdown Efficiency
FoodTech Series: 10 Articles on AI and Technology in the Food Sector
| # | Article | Focus |
|---|---|---|
| 1 | Introduction to FoodTech | Ecosystem overview and key technologies |
| 2 | IoT and Sensors in the Food Supply Chain | Data pipeline from field to cloud |
| 3 | Computer Vision for Quality and Inspection | Defect classification and automated grading |
| 4 | Blockchain and Traceability | Food safety from farm to fork |
| 5 | Vertical Farming and AgriTech | Controlled cultivation with ML |
| 6 | Supply Chain Optimization | Logistics, cold chain, and transparency |
| 7 | Farm Management Dashboard | Real-time monitoring for agricultural operations |
| 8 | You are here - Demand Forecasting and Waste Reduction | ML time-series, LSTM, TFT, dynamic pricing |
| 9 | Satellite API and Precision Agriculture | NDVI, remote sensing, crop monitoring |
| 10 | Edge ML for Crop Monitoring | Embedded inference on field devices |
The Food Waste Problem: Data and Regulations in 2025
To understand why ML-driven demand forecasting has become a strategic priority in FoodTech, we need to start with the real numbers. The UNEP Food Waste Index Report 2024 reveals that in 2022 (the latest year with complete data) 1.05 billion tonnes of food were wasted — equivalent to 132 kg per person per year and nearly one-fifth of all food available to consumers.
Global Impact of Food Waste (UNEP 2024)
| Indicator | Value | Source |
|---|---|---|
| Annual food wasted | 1.05 billion tonnes | UNEP Food Waste Index 2024 |
| Economic value lost | ~$1 trillion/year | FAO/World Bank estimate |
| % of GHG emissions | 8-10% globally | UNEP 2024 |
| Per-capita waste | 132 kg/person/year | UNEP 2024 |
| Retail share in grocery | 12% of total wasted | UNEP 2024 |
| Household share | 60% of total | UNEP 2024 |
| Food service share | 28% of total | UNEP 2024 |
| Pre-retail losses (supply chain) | 13% of food produced | FAO |
Regulatory Framework: from Italy's Gadda Law to Farm to Fork
Italy was a European pioneer with the Gadda Law 166/2016, which introduced a reward-based rather than punitive approach to food waste. The key provisions are:
- Fiscal incentives for companies that donate food surpluses: donations are treated equivalently to disposal for IRPEF/IRES calculations
- Bureaucratic simplification: removal of documentation requirements for donations up to €15,000 per year
- Promotion of "doggy bags" in restaurants and transparent discounted sales of near-expiry products
- Food education in schools as a structural preventive measure
At the European level, the Farm to Fork Strategy (part of the Green Deal) sets binding targets: a 50% reduction in waste at the retail and consumption level by 2030, with mandatory measures for grocery retailers that will progressively come into force from 2025-2026. For retailers with more than 250 employees, annual reporting on food waste becomes mandatory.
Regulatory Impact on Business Strategy
The combination of incentives (Gadda Law), reporting obligations (Farm to Fork), and consumer pressure creates a clear business case for investing in ML-driven demand forecasting. This is not just about sustainability: for a retailer with 200 stores and margins of 2-3%, reducing waste by 35% is equivalent to recovering 50-80 basis points of operating margin.
Why Food Demand Forecasting Is Challenging
Demand forecasting in the food sector presents unique challenges that make it one of the most complex problems in applied business machine learning. There are five structural characteristics that differentiate it from other retail sectors.
1. Perishability and Narrow Selling Window
An unsold garment this week can be sold next week. A fresh salad cannot. Fresh food products have shelf lives ranging from 1 to 14 days, which means that an over-forecast error translates directly into unrecoverable physical waste. This asymmetry in error cost — where the cost of over-forecasting often exceeds the cost of under-forecasting — requires models that specifically minimize upward errors.
2. Multi-Level Seasonality
Food time-series exhibit simultaneous seasonality at multiple frequencies:
- Daily seasonality: Friday evening sales differ from Tuesday morning
- Weekly seasonality: distinct purchasing patterns for each day of the week
- Monthly seasonality: beginning/end-of-month effects correlated with pay cycles
- Annual seasonality: summer vs winter for seasonal products
- Holiday seasonality: Christmas, Easter, bank holidays with sudden demand spikes
Traditional ARIMA models handle only a single seasonal component. Modern models like Prophet and TFT handle multiple seasonalities natively.
3. Non-Stationary Exogenous Variables
Food demand is strongly influenced by external factors that do not follow regular patterns: weather conditions (a week of rain increases soup sales by 40%), promotional campaigns (a promotional flyer can temporarily triple demand), local events (football matches, fairs), competitor pricing, and social media trends. Effectively incorporating these exogenous variables is the difference between a mediocre model and an excellent one.
4. Long-Tail SKUs with Sparse Data
A typical grocery retailer manages 15,000-30,000 active SKUs. The top 20% of SKUs generate 80% of sales, but the remaining 80% present sparse, intermittent time series with many zeros. For these products, standard models fail, and specialized approaches are required such as the Croston method, zero-inflated models, or transfer learning from similar SKUs.
5. Typical Errors Without ML
The typical MAPE (Mean Absolute Percentage Error) in traditional forecasting systems based on moving averages, manual rules, or ERP systems without ML ranges between 20% and 40% for fresh products. Introducing ML models reduces this error to 5-15%, with LSTM demonstrating MAPE of 16.43% vs 28.76% for traditional methods in recent benchmarks — a 43% reduction.
Classical Statistical Models: ARIMA, SARIMA, Holt-Winters, and Prophet
Before moving to deep learning, it is essential to understand classical statistical models. Not because they are obsolete, but because they often represent the baseline to beat, are interpretable, computationally lightweight, and in some contexts (limited data, simple products) competitive with more complex approaches.
ARIMA and SARIMA: Foundations of Forecasting
ARIMA (AutoRegressive Integrated Moving Average) is the most widely used statistical model for univariate time series. The ARIMA(p,d,q) model combines three components: AutoRegression (p lags of the past value), Integration (d differencing steps to make the series stationary), and Moving Average (q lags of residual errors). SARIMA adds a seasonal component (P,D,Q)s.
# SARIMA for weekly sales forecasting - fresh produce
import pandas as pd
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.stattools import adfuller
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_percentage_error
# Load sample data
# Format: date index, 'sales' column with units sold
df = pd.read_csv('salad_sales.csv', index_col='date', parse_dates=True)
df = df.asfreq('D') # Daily frequency
df['sales'] = df['sales'].fillna(0)
# Stationarity test (ADF Test)
result = adfuller(df['sales'].dropna())
print(f'ADF Statistic: {result[0]:.4f}')
print(f'p-value: {result[1]:.4f}')
print(f'Series is {"stationary" if result[1] < 0.05 else "non-stationary"}')
# Train/test split (80/20 with last 30 days as holdout)
train_size = len(df) - 30
train = df.iloc[:train_size]
test = df.iloc[train_size:]
# SARIMA(1,1,1)(1,1,1)7 model - weekly seasonality
# p=1, d=1, q=1: ARIMA components
# P=1, D=1, Q=1, s=7: weekly seasonal components
model = SARIMAX(
train['sales'],
order=(1, 1, 1),
seasonal_order=(1, 1, 1, 7), # s=7 for weekly seasonality
enforce_stationarity=False,
enforce_invertibility=False
)
results = model.fit(disp=False)
print(results.summary())
# Forecast over the test set period
forecast = results.forecast(steps=len(test))
forecast = np.maximum(forecast, 0) # No negative values
# Metrics
mape = mean_absolute_percentage_error(test['sales'], forecast) * 100
rmse = np.sqrt(np.mean((test['sales'].values - forecast.values) ** 2))
mae = np.mean(np.abs(test['sales'].values - forecast.values))
print(f'\nSARIMA Metrics:')
print(f' MAPE: {mape:.2f}%')
print(f' RMSE: {rmse:.2f} units')
print(f' MAE: {mae:.2f} units')
# Visualization
plt.figure(figsize=(12, 5))
plt.plot(train['sales'][-60:], label='Train (last 60 days)')
plt.plot(test['sales'], label='Actual', color='green')
plt.plot(test.index, forecast, label='SARIMA Forecast', color='red', linestyle='--')
plt.title('SARIMA - Fresh Produce Sales Forecast')
plt.legend()
plt.tight_layout()
plt.savefig('sarima_forecast.png', dpi=150)
plt.show()
Prophet: Interpretable Forecasting with Exogenous Variables
Prophet (Meta/Facebook, 2017) is an excellent choice for food forecasting because it natively handles: multiple seasonalities (daily, weekly, annual), holiday effects with a customizable calendar, non-linear trends with automatic changepoints, and exogenous variables (additional regressors). Its interpretability — automatic decomposition into trend + seasonality + holiday effects — makes it appreciated even by non-technical users.
# Prophet for forecasting with exogenous variables (weather, promotions)
from prophet import Prophet
from prophet.diagnostics import cross_validation, performance_metrics
import pandas as pd
import numpy as np
# Prophet requires columns 'ds' (datetime) and 'y' (target)
df_prophet = df.reset_index().rename(columns={'date': 'ds', 'sales': 'y'})
# Add exogenous variables (regressors)
# Example: daily average temperature and promotion flag
df_prophet['temperature'] = weather_df['avg_temp'] # degrees Celsius
df_prophet['is_promotion'] = promo_df['active_promo'].astype(int)
# Custom holiday calendar
holidays_en = pd.DataFrame({
'holiday': [
'Christmas', 'New Year', 'Easter', 'Summer Bank Holiday',
'National Day', 'All Saints'
],
'ds': pd.to_datetime([
'2024-12-25', '2025-01-01', '2025-04-20', '2025-08-25',
'2025-06-02', '2025-11-01'
]),
'lower_window': [-3, -1, -3, -5, -1, -1], # Days before
'upper_window': [3, 2, 2, 2, 1, 1] # Days after
})
# Model configuration
model = Prophet(
holidays=holidays_en,
yearly_seasonality=True,
weekly_seasonality=True,
daily_seasonality=False,
seasonality_mode='multiplicative', # Better for data with strong trend
changepoint_prior_scale=0.05, # Trend flexibility
seasonality_prior_scale=10.0 # Strength of seasonal components
)
# Add regressors
model.add_regressor('temperature', standardize=True)
model.add_regressor('is_promotion', standardize=False)
# Training
train_prophet = df_prophet.iloc[:train_size]
model.fit(train_prophet)
# Forecast (with future regressor values)
future = model.make_future_dataframe(periods=30)
future['temperature'] = future_weather_df['avg_temp']
future['is_promotion'] = future_promo_df['active_promo'].astype(int)
forecast_prophet = model.predict(future)
# Internal cross-validation (walk-forward)
df_cv = cross_validation(
model,
initial='365 days', # Initial training period
period='30 days', # Re-fitting frequency
horizon='14 days' # Forecast horizon
)
metrics_cv = performance_metrics(df_cv)
print(f'Prophet MAPE (CV): {metrics_cv["mape"].mean()*100:.2f}%')
# Decomposition visualization
fig = model.plot_components(forecast_prophet)
fig.savefig('prophet_components.png', dpi=150)
print('Prophet components saved: trend + seasonality + holiday effects')
When to Use Classical Statistical Models
| Model | Strengths | Limitations | Ideal Use Case |
|---|---|---|---|
| ARIMA/SARIMA | Interpretable, fast, stationary series | Single seasonality, no exogenous variables | Stable products, limited data |
| Holt-Winters | Handles trend + seasonality, simplicity | Cannot handle outliers, fixed seasonality | Series with linear trend and stable seasonality |
| Prophet | Multi-seasonality, holidays, regressors | Not ideal for highly irregular series | Products with strong holiday/promotional effects |
| LSTM/GRU | Complex patterns, multi-variate, long-range | Requires abundant data, black box | High-volume SKUs, many exogenous variables |
| TFT | Interpretable + DL, multi-horizon, attention | Computationally intensive, requires GPU | Centralized forecasting across many SKUs |
Deep Learning for Time-Series: LSTM, GRU, TFT, and N-BEATS
Deep learning models have revolutionized demand forecasting since 2018-2020. Their superiority emerges especially in the presence of: many correlated exogenous variables, complex non-linear patterns, long-range temporal dependencies, and large volumes of multi-SKU data that allow transfer learning between similar products.
LSTM and GRU: Selective Memory in Sequences
Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) are recurrent networks designed to capture long-range dependencies in temporal sequences. LSTM uses three gates (input, forget, output) to decide which information to retain or discard in cell memory. GRU simplifies this with two gates (reset, update), achieving similar performance with fewer parameters.
# Multi-variate LSTM for food demand forecasting
import torch
import torch.nn as nn
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_absolute_percentage_error
# ---- Data Preparation ----
class FoodDemandDataset(torch.utils.data.Dataset):
def __init__(self, data, seq_len=14, pred_len=7):
self.seq_len = seq_len
self.pred_len = pred_len
self.data = torch.FloatTensor(data)
def __len__(self):
return len(self.data) - self.seq_len - self.pred_len + 1
def __getitem__(self, idx):
x = self.data[idx: idx + self.seq_len]
y = self.data[idx + self.seq_len: idx + self.seq_len + self.pred_len, 0]
return x, y # x: (seq_len, features), y: (pred_len,)
# ---- LSTM Architecture ----
class LSTMForecaster(nn.Module):
def __init__(self, input_size, hidden_size=128, num_layers=2,
pred_len=7, dropout=0.2):
super(LSTMForecaster, self).__init__()
self.hidden_size = hidden_size
self.num_layers = num_layers
self.pred_len = pred_len
self.lstm = nn.LSTM(
input_size=input_size,
hidden_size=hidden_size,
num_layers=num_layers,
batch_first=True,
dropout=dropout
)
self.dropout = nn.Dropout(dropout)
self.fc = nn.Linear(hidden_size, pred_len)
def forward(self, x):
# x shape: (batch, seq_len, input_size)
h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size)
c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size)
out, _ = self.lstm(x, (h0, c0))
out = self.dropout(out[:, -1, :]) # Last time step
out = self.fc(out) # (batch, pred_len)
return out
# ---- Feature Preparation ----
def prepare_features(df):
"""Creates multi-variate feature matrix for LSTM"""
features = pd.DataFrame()
features['sales'] = df['sales']
# Lag features: past sales as explicit input
for lag in [1, 2, 3, 7, 14]:
features[f'sales_lag_{lag}'] = df['sales'].shift(lag)
# Rolling statistics
features['sales_rolling_mean_7d'] = df['sales'].rolling(7).mean()
features['sales_rolling_std_7d'] = df['sales'].rolling(7).std()
features['sales_rolling_mean_14d'] = df['sales'].rolling(14).mean()
# Cyclical temporal features (sinusoidal encoding)
features['day_sin'] = np.sin(2 * np.pi * df.index.dayofweek / 7)
features['day_cos'] = np.cos(2 * np.pi * df.index.dayofweek / 7)
features['month_sin'] = np.sin(2 * np.pi * df.index.month / 12)
features['month_cos'] = np.cos(2 * np.pi * df.index.month / 12)
# Exogenous variables (weather, promotions)
features['temperature'] = df['temperature']
features['is_weekend'] = (df.index.dayofweek >= 5).astype(int)
features['is_holiday'] = df['is_holiday'].astype(int)
features['is_promotion'] = df['is_promotion'].astype(int)
features = features.dropna()
return features
# ---- Training Loop ----
def train_lstm_model(train_data, val_data, config):
model = LSTMForecaster(
input_size=config['input_size'],
hidden_size=config['hidden_size'],
num_layers=config['num_layers'],
pred_len=config['pred_len']
)
optimizer = torch.optim.Adam(model.parameters(), lr=config['lr'])
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
optimizer, patience=5, factor=0.5
)
criterion = nn.HuberLoss(delta=1.0) # Robust to outliers
train_loader = torch.utils.data.DataLoader(
FoodDemandDataset(train_data, config['seq_len'], config['pred_len']),
batch_size=config['batch_size'],
shuffle=True
)
best_val_loss = float('inf')
patience_counter = 0
for epoch in range(config['epochs']):
model.train()
train_loss = 0
for x_batch, y_batch in train_loader:
optimizer.zero_grad()
y_pred = model(x_batch)
loss = criterion(y_pred, y_batch)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
optimizer.step()
train_loss += loss.item()
# Validation
model.eval()
with torch.no_grad():
val_dataset = FoodDemandDataset(val_data, config['seq_len'], config['pred_len'])
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=32)
val_loss = sum(criterion(model(x), y).item() for x, y in val_loader)
scheduler.step(val_loss)
if val_loss < best_val_loss:
best_val_loss = val_loss
torch.save(model.state_dict(), 'best_lstm_model.pt')
patience_counter = 0
else:
patience_counter += 1
if patience_counter >= config['early_stopping_patience']:
print(f'Early stopping at epoch {epoch}')
break
if epoch % 10 == 0:
print(f'Epoch {epoch}: Train Loss={train_loss:.4f}, Val Loss={val_loss:.4f}')
# Load best model
model.load_state_dict(torch.load('best_lstm_model.pt'))
return model
# Configuration
config = {
'input_size': 15, # Number of features (sales + lag + temporal + exog)
'hidden_size': 128,
'num_layers': 2,
'pred_len': 7, # 7-day forecast horizon
'seq_len': 14, # 14-day lookback window
'batch_size': 32,
'lr': 0.001,
'epochs': 100,
'early_stopping_patience': 15
}
Temporal Fusion Transformer: Interpretability + Power
The Temporal Fusion Transformer (TFT) from Google DeepMind (2021) is currently considered the state of the art for enterprise demand forecasting. Its architecture combines multi-head attention mechanisms with residual gating (GRN) that handles static variables (product type, merchandise category), time-varying known inputs (calendar, planned promotions), and observed variables (past sales, weather).
In 2024 benchmarks on the M5 dataset (30,490 time series from 10 Walmart stores), TFT and Transformer-based models demonstrated MASE improvements of 26-29% and WQL reductions of 34% over the seasonal naive method. Picnic, the Dutch online food retailer, adopted TFT as its primary demand forecasting model, publishing detailed production results.
# Temporal Fusion Transformer with PyTorch Forecasting
# pip install pytorch-forecasting pytorch-lightning
from pytorch_forecasting import TemporalFusionTransformer, TimeSeriesDataSet
from pytorch_forecasting.data import GroupNormalizer
from pytorch_forecasting.metrics import QuantileLoss
import pytorch_lightning as pl
import pandas as pd
import torch
# ---- TFT Data Preparation ----
def prepare_tft_dataset(df):
"""
df must have: date, store_id, product_id, sales,
temperature, is_holiday, is_promotion, price, category
"""
df = df.copy()
df['time_idx'] = (df['date'] - df['date'].min()).dt.days
df['month'] = df['date'].dt.month.astype(str)
df['day_of_week'] = df['date'].dt.dayofweek.astype(str)
max_prediction_length = 7 # 7-day forecast
max_encoder_length = 28 # 28-day lookback
training_cutoff = df['time_idx'].max() - max_prediction_length
training_dataset = TimeSeriesDataSet(
df[df.time_idx <= training_cutoff],
time_idx='time_idx',
target='sales',
group_ids=['store_id', 'product_id'],
# Static variables (do not change over time per group)
static_categoricals=['store_id', 'product_id', 'category'],
# Time-varying variables known in advance
time_varying_known_categoricals=['month', 'day_of_week', 'is_holiday'],
time_varying_known_reals=['time_idx', 'is_promotion', 'price'],
# Observed variables (available only for the past)
time_varying_unknown_reals=['sales', 'temperature'],
min_encoder_length=max_encoder_length // 2,
max_encoder_length=max_encoder_length,
min_prediction_length=1,
max_prediction_length=max_prediction_length,
target_normalizer=GroupNormalizer(
groups=['store_id', 'product_id'],
transformation='softplus' # Keeps values positive
),
add_relative_time_idx=True,
add_target_scales=True,
add_encoder_length=True,
)
return training_dataset, training_cutoff
# ---- TFT Model ----
def build_tft_model(training_dataset):
tft = TemporalFusionTransformer.from_dataset(
training_dataset,
learning_rate=0.03,
hidden_size=64, # Hidden layer dimension
attention_head_size=4, # Number of attention heads
dropout=0.1,
hidden_continuous_size=16, # Continuous features
loss=QuantileLoss(), # Probabilistic forecasting
log_interval=10,
reduce_on_plateau_patience=4,
)
print(f'Parameter count: {tft.size()/1e3:.1f}k')
return tft
# ---- Training with PyTorch Lightning ----
def train_tft(training_dataset, validation_dataset):
train_loader = training_dataset.to_dataloader(
train=True, batch_size=128, num_workers=4
)
val_loader = validation_dataset.to_dataloader(
train=False, batch_size=128, num_workers=4
)
trainer = pl.Trainer(
max_epochs=30,
accelerator='gpu' if torch.cuda.is_available() else 'cpu',
gradient_clip_val=0.1,
callbacks=[
pl.callbacks.EarlyStopping(
monitor='val_loss', patience=5, mode='min'
),
pl.callbacks.ModelCheckpoint(
monitor='val_loss', save_top_k=1
)
]
)
tft_model = build_tft_model(training_dataset)
trainer.fit(tft_model, train_loader, val_loader)
# Interpretability: variable importance
best_model = TemporalFusionTransformer.load_from_checkpoint(
trainer.checkpoint_callback.best_model_path
)
raw_predictions, x = best_model.predict(
val_loader, mode='raw', return_x=True
)
# Feature importance - which variables contribute most to forecasts
interpretation = best_model.interpret_output(
raw_predictions, reduction='sum'
)
print('Feature importance:')
for key, vals in interpretation.items():
print(f' {key}: {vals}')
return best_model
N-BEATS: Neural Basis Expansion Without Recurrent Architectures
N-BEATS (Element AI, 2020) takes a radically different approach: it uses only fully-connected layers organized into stacks and blocks, without convolutions or attention mechanisms. Each block decomposes the signal into basis expansions (trend, seasonality) and residuals. On the M4 Competition dataset it outperformed all statistical methods by 11% and even the hybrid neural-statistical winner by 3%. For food products with strong seasonality, the interpretable version of N-BEATS produces trend/seasonality decompositions that business analysts appreciate.
Feature Engineering for Food Demand Forecasting
The quality of feature engineering is often the most critical factor for model performance. An LSTM with excellent feature engineering outperforms a TFT with poor feature engineering. Let us examine the main feature categories to build in the food context.
Cyclical Encoding for Temporal Variables
A common mistake is treating the day of the week as an integer (0-6). The problem is that the model does not understand that day 6 (Sunday) is "close" to day 0 (Monday). Cyclical encoding with sine and cosine solves this problem.
# Complete feature engineering for food demand forecasting
import pandas as pd
import numpy as np
from typing import List, Dict
def create_temporal_features(df: pd.DataFrame) -> pd.DataFrame:
"""Creates cyclical and categorical temporal features"""
df = df.copy()
idx = df.index
# Cyclical encoding - preserves cyclicality (e.g. Monday close to Sunday)
df['hour_sin'] = np.sin(2 * np.pi * idx.hour / 24)
df['hour_cos'] = np.cos(2 * np.pi * idx.hour / 24)
df['dow_sin'] = np.sin(2 * np.pi * idx.dayofweek / 7)
df['dow_cos'] = np.cos(2 * np.pi * idx.dayofweek / 7)
df['month_sin'] = np.sin(2 * np.pi * idx.month / 12)
df['month_cos'] = np.cos(2 * np.pi * idx.month / 12)
df['doy_sin'] = np.sin(2 * np.pi * idx.dayofyear / 365.25)
df['doy_cos'] = np.cos(2 * np.pi * idx.dayofyear / 365.25)
# Binary features useful for grocery retail
df['is_weekend'] = (idx.dayofweek >= 5).astype(int)
df['is_monday'] = (idx.dayofweek == 0).astype(int)
df['is_friday'] = (idx.dayofweek == 4).astype(int)
# Position within the month (useful for pay-cycle effects)
df['day_of_month'] = idx.day
df['is_month_start'] = (idx.day <= 5).astype(int)
df['is_month_end'] = (idx.day >= 25).astype(int)
# Week of year
df['week_of_year'] = idx.isocalendar().week.astype(int)
df['quarter'] = idx.quarter
return df
def create_lag_features(df: pd.DataFrame, target_col: str,
lags: List[int] = None) -> pd.DataFrame:
"""Creates lag features of the target"""
if lags is None:
lags = [1, 2, 3, 7, 14, 21, 28]
df = df.copy()
for lag in lags:
df[f'{target_col}_lag_{lag}d'] = df[target_col].shift(lag)
return df
def create_rolling_features(df: pd.DataFrame, target_col: str,
windows: List[int] = None) -> pd.DataFrame:
"""Creates rolling statistics"""
if windows is None:
windows = [3, 7, 14, 28]
df = df.copy()
for window in windows:
df[f'{target_col}_roll_mean_{window}d'] = (
df[target_col].shift(1).rolling(window).mean()
)
df[f'{target_col}_roll_std_{window}d'] = (
df[target_col].shift(1).rolling(window).std()
)
df[f'{target_col}_roll_min_{window}d'] = (
df[target_col].shift(1).rolling(window).min()
)
df[f'{target_col}_roll_max_{window}d'] = (
df[target_col].shift(1).rolling(window).max()
)
# Exponential moving average
df[f'{target_col}_ema_{window}d'] = (
df[target_col].shift(1).ewm(span=window).mean()
)
return df
def create_weather_features(df: pd.DataFrame,
weather_df: pd.DataFrame) -> pd.DataFrame:
"""
Integrates meteorological variables.
weather_df: DataFrame with columns [date, temp_max, temp_min, rainfall_mm,
humidity, solar_radiation, wind_kmh]
"""
df = df.merge(weather_df, left_index=True, right_on='date', how='left')
# Weather-derived features
df['temp_delta'] = df['temp_max'] - df['temp_min']
df['is_hot'] = (df['temp_max'] > 28).astype(int) # Heat wave
df['is_cold'] = (df['temp_min'] < 5).astype(int) # Winter cold
df['is_rain'] = (df['rainfall_mm'] > 1).astype(int)
# Weather x product interactions (e.g. ice cream sales rise with heat)
# Requires parameterization per product category
df['heat_wave_consecutive'] = (
df['is_hot'].rolling(3).sum() == 3
).astype(int)
return df
def create_promotion_features(df: pd.DataFrame,
promo_df: pd.DataFrame) -> pd.DataFrame:
"""
Creates features from promotional calendar.
promo_df: columns [date, sku_id, discount_pct, is_flyer, is_digital_promo]
"""
df = df.merge(promo_df, left_index=True, right_on='date', how='left')
df['discount_pct'] = df['discount_pct'].fillna(0)
df['is_promotion'] = (df['discount_pct'] > 0).astype(int)
df['is_deep_discount'] = (df['discount_pct'] > 0.3).astype(int)
# Pre-promotion anticipation effect (pre-promo dip)
df['promo_lag_1'] = df['is_promotion'].shift(1).fillna(0)
df['promo_lead_1'] = df['is_promotion'].shift(-1).fillna(0) # Training only
# Post-promotion effect (inventory loading)
df['days_since_last_promo'] = (
df['is_promotion']
.pipe(lambda s: s.where(s.eq(1)).ffill().index - s.index)
.dt.days
.clip(upper=30)
)
return df
def create_price_features(df: pd.DataFrame) -> pd.DataFrame:
"""Features related to price and its variation"""
df = df.copy()
df['price_lag_1'] = df['price'].shift(1)
df['price_change_pct'] = df['price'].pct_change()
df['price_vs_avg_30d'] = df['price'] / df['price'].rolling(30).mean() - 1
return df
def full_feature_engineering_pipeline(df: pd.DataFrame,
weather_df: pd.DataFrame,
promo_df: pd.DataFrame) -> pd.DataFrame:
"""Complete feature engineering pipeline"""
df = create_temporal_features(df)
df = create_lag_features(df, 'sales')
df = create_rolling_features(df, 'sales')
df = create_weather_features(df, weather_df)
df = create_promotion_features(df, promo_df)
df = create_price_features(df)
df = df.dropna()
print(f'Total features created: {df.shape[1]}')
print(f'Sample size after feature engineering: {df.shape[0]}')
return df
Complete ML Pipeline: from Raw Data to Production
A professional demand forecasting pipeline goes well beyond model training. It requires an end-to-end architecture that handles data collection, preprocessing, walk-forward validation, model selection, deployment, and continuous monitoring. Below is the structure of a production-ready pipeline.
# Complete ML pipeline for food demand forecasting
import mlflow
import mlflow.pytorch
from dataclasses import dataclass, field
from typing import Optional, Tuple
import pandas as pd
import numpy as np
import json
import logging
from pathlib import Path
from datetime import datetime, timedelta
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# ---- Pipeline Configuration ----
@dataclass
class ForecastConfig:
# Data
store_ids: list = field(default_factory=list)
product_categories: list = field(default_factory=list)
training_lookback_days: int = 365
forecast_horizon_days: int = 7
# Feature engineering
lag_days: list = field(default_factory=lambda: [1, 2, 3, 7, 14, 28])
rolling_windows: list = field(default_factory=lambda: [7, 14, 28])
# Model selection
model_type: str = 'tft' # 'sarima', 'prophet', 'lstm', 'tft', 'xgboost'
use_ensemble: bool = False
# Validation
n_cv_splits: int = 4
val_horizon_days: int = 14
# MLflow
experiment_name: str = 'food_demand_forecasting'
run_name: Optional[str] = None
# Deployment
min_mape_threshold: float = 0.20 # Do not deploy if MAPE > 20%
model_registry_name: str = 'food_demand_model'
# ---- Step 1: Data Collection and Validation ----
class DataCollector:
def __init__(self, config: ForecastConfig):
self.config = config
def collect(self, start_date: str, end_date: str) -> pd.DataFrame:
"""Collects data from data warehouse (e.g. Snowflake, BigQuery)"""
query = f"""
SELECT
date,
store_id,
product_id,
category,
sales_units,
price,
stock_level,
is_holiday,
is_promotion,
discount_pct
FROM gold.sales_daily
WHERE date BETWEEN '{start_date}' AND '{end_date}'
AND store_id IN ({','.join(map(str, self.config.store_ids))})
ORDER BY date, store_id, product_id
"""
# Here you would use the DWH connector (e.g. snowflake-connector-python)
# df = snowflake_connector.execute(query)
logger.info(f'Data collected: {start_date} - {end_date}')
return df # Placeholder
def validate(self, df: pd.DataFrame) -> Tuple[pd.DataFrame, dict]:
"""Data quality validation"""
issues = {}
# Check for negative values
neg_sales = (df['sales_units'] < 0).sum()
if neg_sales > 0:
issues['negative_sales'] = neg_sales
df.loc[df['sales_units'] < 0, 'sales_units'] = 0
# Check for missing dates per group
for group_id, group_df in df.groupby(['store_id', 'product_id']):
date_range = pd.date_range(group_df['date'].min(), group_df['date'].max())
missing = len(date_range) - len(group_df)
if missing > 0:
issues[f'missing_dates_{group_id}'] = missing
# Check for outliers (IQR method)
Q1 = df['sales_units'].quantile(0.25)
Q3 = df['sales_units'].quantile(0.75)
IQR = Q3 - Q1
outliers = ((df['sales_units'] < Q1 - 3*IQR) | (df['sales_units'] > Q3 + 3*IQR)).sum()
issues['outliers'] = int(outliers)
logger.info(f'Data validation: {issues}')
return df, issues
# ---- Step 2: Walk-Forward Validation ----
class WalkForwardValidator:
"""
Correct temporal validation for time series.
Do NOT use standard cross-validation which causes data leakage.
"""
def __init__(self, config: ForecastConfig):
self.config = config
def validate(self, df: pd.DataFrame, model_class) -> dict:
results = []
total_days = len(df['date'].unique())
split_size = total_days // (self.config.n_cv_splits + 1)
for fold in range(self.config.n_cv_splits):
train_end_idx = split_size * (fold + 1)
val_start_idx = train_end_idx
val_end_idx = val_start_idx + self.config.val_horizon_days
train_dates = sorted(df['date'].unique())[:train_end_idx]
val_dates = sorted(df['date'].unique())[val_start_idx:val_end_idx]
train_df = df[df['date'].isin(train_dates)]
val_df = df[df['date'].isin(val_dates)]
if len(val_df) == 0:
continue
# Train on fold
model = model_class(self.config)
model.fit(train_df)
predictions = model.predict(val_df)
# Metrics
actuals = val_df['sales_units'].values
preds = predictions['forecast'].values
mape = np.mean(np.abs((actuals - preds) / (actuals + 1e-8))) * 100
rmse = np.sqrt(np.mean((actuals - preds) ** 2))
mae = np.mean(np.abs(actuals - preds))
results.append({
'fold': fold,
'mape': mape,
'rmse': rmse,
'mae': mae,
'train_size': len(train_df),
'val_size': len(val_df)
})
logger.info(f'Fold {fold}: MAPE={mape:.2f}%, RMSE={rmse:.2f}')
# Aggregate results
results_df = pd.DataFrame(results)
return {
'mean_mape': results_df['mape'].mean(),
'std_mape': results_df['mape'].std(),
'mean_rmse': results_df['rmse'].mean(),
'mean_mae': results_df['mae'].mean(),
'folds': results
}
# ---- Step 3: MLflow Tracking and Model Registry ----
class ExperimentTracker:
def __init__(self, config: ForecastConfig):
self.config = config
mlflow.set_experiment(config.experiment_name)
def run_experiment(self, df: pd.DataFrame, model_class) -> str:
run_name = self.config.run_name or f'{self.config.model_type}_{datetime.now():%Y%m%d_%H%M}'
with mlflow.start_run(run_name=run_name) as run:
# Log parameters
mlflow.log_params({
'model_type': self.config.model_type,
'forecast_horizon': self.config.forecast_horizon_days,
'training_lookback': self.config.training_lookback_days,
'n_cv_splits': self.config.n_cv_splits
})
# Walk-forward validation
validator = WalkForwardValidator(self.config)
metrics = validator.validate(df, model_class)
# Log metrics
mlflow.log_metrics({
'cv_mean_mape': metrics['mean_mape'],
'cv_std_mape': metrics['std_mape'],
'cv_mean_rmse': metrics['mean_rmse'],
'cv_mean_mae': metrics['mean_mae']
})
# Train final model on all data
final_model = model_class(self.config)
final_model.fit(df)
# Log artifacts
mlflow.log_dict(metrics, 'validation_metrics.json')
# Register model if MAPE is acceptable
if metrics['mean_mape'] < self.config.min_mape_threshold * 100:
mlflow.pytorch.log_model(
final_model,
artifact_path='model',
registered_model_name=self.config.model_registry_name
)
logger.info(f'Model registered: MAPE={metrics["mean_mape"]:.2f}%')
else:
logger.warning(
f'Model NOT registered: MAPE {metrics["mean_mape"]:.2f}% '
f'> threshold {self.config.min_mape_threshold * 100}%'
)
return run.info.run_id
Benchmark: Comparing Models on a Real Dataset
Below is a systematic comparison of the performance of different approaches on a typical grocery dataset: 150 fresh product SKUs (dairy, fruit, vegetables, meat), 2 years of daily data from 5 stores, with exogenous variables (weather, promotions, holidays).
Comparative Model Benchmark - Grocery Retail Dataset (150 SKUs, 2 years)
| Model | MAPE (%) | RMSE (units) | MAE (units) | Training Time | Interpretability | Notes |
|---|---|---|---|---|---|---|
| Moving Average (Baseline) | 31.4 | 18.7 | 12.3 | < 1 min | High | No seasonality, no regressors |
| SARIMA | 22.8 | 14.2 | 9.8 | ~15 min | High | Single seasonal lag |
| Holt-Winters | 20.1 | 13.1 | 8.9 | ~5 min | High | Good for regular series |
| Prophet | 14.6 | 10.4 | 7.2 | ~30 min | High | Excellent with holidays/regressors |
| XGBoost + Feature Eng. | 12.9 | 9.8 | 6.7 | ~10 min | Medium | Feature engineering critical |
| LSTM (univariate) | 16.4 | 11.2 | 7.8 | ~45 min GPU | Low | Worse than Prophet without exog |
| LSTM (multi-variate) | 10.8 | 8.4 | 5.9 | ~2h GPU | Low | Strong improvement with exog |
| N-BEATS | 11.2 | 8.7 | 6.1 | ~1h GPU | Medium | Interpretable decomposition |
| TFT (Temporal Fusion Transformer) | 8.3 | 6.9 | 4.8 | ~4h GPU | High | Best overall, interpretable |
| Ensemble TFT + XGBoost | 7.6 | 6.4 | 4.4 | ~5h GPU | Medium | +8% over single TFT |
Dataset: 150 fresh product SKUs, 5 stores, 2 years of daily data. Walk-forward validation over 4 folds (14-day horizon). GPU: NVIDIA A100.
Considerations on Model Selection
TFT is the most powerful option but requires a GPU and abundant data for each SKU. For retailers with thousands of SKUs but low volume per individual product, a hybrid approach works better: XGBoost or Prophet for long-tail SKUs (low volume, sparse data), TFT for high-volume SKUs that generate the majority of revenue and waste.
Waste Reduction: Integrating Forecasts with Inventory and Dynamic Pricing
An accurate forecasting model alone does not reduce waste: it is necessary to act on the forecasts through inventory management and dynamic pricing systems. The decision loop is: forecast future demand → optimal order to supplier → monitor stock levels → dynamic markdown for near-expiry products → automatic donation for unsold surplus.
Inventory Optimization with Dynamic Safety Stock
# Inventory optimization based on probabilistic forecast
import numpy as np
from dataclasses import dataclass
from typing import Optional
@dataclass
class InventoryParams:
product_id: str
shelf_life_days: int # Product shelf life
lead_time_days: int # Days from order to delivery
service_level: float = 0.95 # Target fill rate (95%)
holding_cost_per_unit: float = 0.05 # Holding cost per unit/day
shortage_cost_per_unit: float = 2.50 # Stockout cost per unit
waste_cost_per_unit: float = 1.20 # Cost of discarding one unit
def calculate_optimal_order(
demand_forecast: np.ndarray, # Mean forecast for the next N days
demand_std: np.ndarray, # Forecast standard deviation
current_stock: int,
params: InventoryParams
) -> dict:
"""
Calculates the optimal order quantity considering:
- Product shelf life (perishable fresh products)
- Target service level
- Trade-off between stockout cost and waste cost
"""
from scipy import stats
# Z-score for target service level
z_score = stats.norm.ppf(params.service_level)
# Expected demand during lead time
lt_demand = demand_forecast[:params.lead_time_days].sum()
lt_std = np.sqrt((demand_std[:params.lead_time_days] ** 2).sum())
# Safety stock = Z * sigma * sqrt(lead_time)
safety_stock = z_score * lt_std
# Reorder point
reorder_point = lt_demand + safety_stock
# Demand up to shelf life (do not order more than we can sell)
max_sellable_demand = demand_forecast[:params.shelf_life_days].sum()
max_sellable_std = np.sqrt((demand_std[:params.shelf_life_days] ** 2).sum())
# Order quantity
unconstrained_order = max(0, reorder_point - current_stock)
# Shelf life constraint: do not order more than can be sold within shelf life
# Adjusted for waste risk
waste_probability = 1 - stats.norm.cdf(
current_stock + unconstrained_order,
loc=max_sellable_demand,
scale=max_sellable_std
)
# Expected costs
expected_waste_cost = waste_probability * unconstrained_order * params.waste_cost_per_unit
expected_shortage_cost = (1 - params.service_level) * lt_demand * params.shortage_cost_per_unit
# Optimization: reduce order if waste cost > shortage benefit
if expected_waste_cost > expected_shortage_cost * 0.5:
adjustment_factor = 1 - (expected_waste_cost / (expected_waste_cost + expected_shortage_cost))
optimal_order = int(unconstrained_order * adjustment_factor)
else:
optimal_order = int(unconstrained_order)
return {
'optimal_order_qty': max(0, optimal_order),
'reorder_point': reorder_point,
'safety_stock': safety_stock,
'expected_demand_7d': demand_forecast[:7].sum(),
'waste_risk_pct': waste_probability * 100,
'shortage_risk_pct': (1 - params.service_level) * 100,
'decision_reason': 'waste_adjusted' if expected_waste_cost > expected_shortage_cost * 0.5
else 'standard_reorder'
}
# Example usage
params = InventoryParams(
product_id='bagged_salad_150g',
shelf_life_days=5,
lead_time_days=1,
service_level=0.95,
waste_cost_per_unit=0.85,
shortage_cost_per_unit=3.20
)
# Simulation with TFT forecast (mean and std from quantile forecast)
forecast_mean = np.array([45, 52, 68, 71, 89, 95, 48]) # Mon-Sun
forecast_std = np.array([8, 9, 12, 13, 15, 16, 9]) # Standard deviation
order = calculate_optimal_order(
demand_forecast=forecast_mean,
demand_std=forecast_std,
current_stock=120,
params=params
)
print('Order recommendation:')
for key, val in order.items():
print(f' {key}: {val}')
Dynamic Pricing for Waste Reduction: Markdown Optimization
Dynamic pricing based on expiration date is one of the most effective tools for reducing waste in fresh products. The goal is to find the optimal discount price that maximizes revenue recovered from near-expiry products, accelerating their sale without cannibalizing full-price sales.
Wasteless, an Israeli startup, developed a system based on electronic shelf labels (ESL) that automatically updates prices based on the expiration date, stock levels, and customer behavior. Retailers that adopted it reduced waste by 40% while simultaneously increasing revenues on these products compared to the previous situation (discarding unsold items). Too Good To Go also launched an AI platform in 2024 that processes SKU-level data to optimize discount levels, reducing manual expiration checks by 93-99%.
# Markdown optimization algorithm for near-expiry products
import numpy as np
from scipy.optimize import minimize_scalar
from dataclasses import dataclass
from typing import Tuple
@dataclass
class MarkdownConfig:
min_margin_pct: float = 0.10 # Minimum margin (do not go below 10%)
max_discount_pct: float = 0.70 # Maximum discount (70%)
base_price: float = 1.99 # Regular price
cost_per_unit: float = 0.95 # Purchase cost
waste_value: float = 0.00 # Residual value if discarded (e.g. recovery)
def price_elasticity_model(price: float, base_price: float,
base_demand: float,
elasticity: float = -1.8) -> float:
"""
Price elasticity model for fresh grocery products.
Typical elasticity for fresh products: -1.5 to -2.5
(10% discount increases demand by 15-25%)
"""
price_ratio = price / base_price
return base_demand * (price_ratio ** elasticity)
def markdown_revenue(discount_pct: float,
config: MarkdownConfig,
current_stock: int,
base_demand: float,
elasticity: float = -1.8) -> float:
"""
Calculates expected revenue from a given discount level.
Maximizing this function = finding the optimal discount.
"""
discounted_price = config.base_price * (1 - discount_pct)
# Minimum margin constraint
min_price = config.cost_per_unit * (1 + config.min_margin_pct)
if discounted_price < min_price:
return -1e10 # Penalty for below-cost prices
# Expected demand at the new price
expected_demand = price_elasticity_model(
discounted_price, config.base_price, base_demand, elasticity
)
# Units sold = min(expected demand, available stock)
units_sold = min(expected_demand, current_stock)
units_wasted = max(0, current_stock - units_sold)
# Revenue = sales + (any residual value from wasted units)
revenue = (units_sold * discounted_price) + (units_wasted * config.waste_value)
# Opportunity cost: comparison with full-price sale (if we could have waited)
# Here we maximize recovery given that the product will expire
return revenue
def find_optimal_markdown(config: MarkdownConfig,
current_stock: int,
days_to_expiry: int,
base_daily_demand: float,
demand_elasticity: float = -1.8) -> dict:
"""
Finds the optimal discount considering days remaining until expiry.
Logic: the closer to expiry, the larger the required discount.
"""
# Urgency based on days to expiry
if days_to_expiry >= 5:
# No markdown needed
return {'discount_pct': 0.0, 'recommended_action': 'no_action',
'expected_revenue': config.base_price * min(base_daily_demand * days_to_expiry, current_stock)}
# Urgency multiplier: closer to expiry = greater urgency markup
urgency_factor = 1.0 + (5 - days_to_expiry) * 0.15 # +15% urgency per day
# Effective demand considering urgency (e.g. end-of-day promotions)
effective_demand = base_daily_demand * days_to_expiry * urgency_factor
# Optimization: find discount that maximizes total revenue
result = minimize_scalar(
lambda d: -markdown_revenue(d, config, current_stock,
effective_demand, demand_elasticity),
bounds=(0.0, config.max_discount_pct),
method='bounded'
)
optimal_discount = result.x
optimal_revenue = -result.fun
waste_at_no_discount = max(0, current_stock - effective_demand)
waste_with_markdown = max(0, current_stock - price_elasticity_model(
config.base_price * (1 - optimal_discount),
config.base_price, effective_demand, demand_elasticity
))
waste_reduction_units = waste_at_no_discount - waste_with_markdown
revenue_recovered = optimal_revenue - (config.waste_value * current_stock)
action = (
'urgent_markdown' if days_to_expiry <= 1
else 'markdown' if optimal_discount > 0.15
else 'slight_discount'
)
return {
'discount_pct': round(optimal_discount * 100, 1),
'discounted_price': round(config.base_price * (1 - optimal_discount), 2),
'expected_revenue': round(optimal_revenue, 2),
'waste_reduction_units': round(waste_reduction_units, 0),
'revenue_recovered_vs_waste': round(revenue_recovered, 2),
'days_to_expiry': days_to_expiry,
'recommended_action': action
}
# Example: bagged salad with 2 days to expiry
config = MarkdownConfig(
base_price=1.99,
cost_per_unit=0.80,
min_margin_pct=0.05, # Reduced minimum margin close to expiry
max_discount_pct=0.60
)
recommendation = find_optimal_markdown(
config=config,
current_stock=45,
days_to_expiry=2,
base_daily_demand=12,
demand_elasticity=-2.0
)
print('Markdown recommendation:')
for k, v in recommendation.items():
print(f' {k}: {v}')
# Expected output:
# discount_pct: 35.0
# discounted_price: 1.29
# expected_revenue: 38.70
# waste_reduction_units: 28
# recommended_action: markdown
Case Study: Italian Grocery Retailer with 200+ Stores
Let us analyze a real (anonymized) case of ML demand forecasting implementation in an Italian grocery chain with 220 stores, 18,000 active SKUs of which 3,200 are fresh products, and annual revenue of approximately €2.8 billion.
Initial Situation
Prior to the ML implementation, the retailer used a classical ERP system with forecasts based on weighted moving averages of the previous 4 months. The average MAPE on fresh products was 28%, peaking at 45% for seasonal products. The waste rate on fresh items was 12% of purchased value, with significant disposal costs, value loss, and reputational impact. Buyers conducted weekly manual order reviews, occupying approximately 3 full-time FTEs.
Solution Architecture
Implemented Technology Stack
| Layer | Technology Chosen | Function |
|---|---|---|
| Data Warehouse | Snowflake | Centralized storage, POS data from all 220 stores |
| ETL/ELT | dbt + Airbyte | POS integration, weather (OpenWeather API), promotions (CRM system) |
| Orchestration | Apache Airflow | Daily retraining 3:00-5:00 AM, anomaly alerting |
| Models | TFT (top 500 SKUs) + XGBoost (long tail) | 7 and 14-day forecasting per SKU x store |
| MLOps | MLflow on Databricks | Experiment tracking, model registry, A/B testing |
| Serving | FastAPI + Redis cache | Forecast API with <100ms latency, 24h cache |
| Markdown Engine | Python microservice | Optimal discount calculation every 4h for at-risk products |
| ESL Integration | Hanshow / Pricer API | Automatic price updates on electronic shelf labels |
| Donation | Food Bank API | Automatic notification for products with expiry <24h unsold |
Implementation Timeline and Phases
- Months 1-2: Data audit, cleaning of 3-year historical data, integration of exogenous sources (weather, promotions). Identification of 50 pilot SKUs with high volume and high perishability.
- Months 3-4: Training of pilot models (XGBoost + Prophet), walk-forward validation, baseline MAPE 27.8%. First integration with ERP ordering system at 3 test stores.
- Months 5-6: TFT training on top 500 fresh SKUs. A/B testing: 30 ML stores vs 30 traditional stores. Result: 28% waste reduction in ML stores.
- Months 7-9: Rollout across all 220 stores. Markdown engine deployed with ESL integration in 80 equipped stores. Automatic donation activation via Food Bank.
- Months 10-12: Model optimization, addition of social media features (product mentions on Instagram/TikTok for emerging trends). TFT+XGBoost ensemble deployed.
Measured Results at 12 Months
Business KPIs - Post-Implementation Results (12 Months)
| Metric | Before ML | After 12 Months ML | Change |
|---|---|---|---|
| MAPE fresh products | 28.2% | 9.1% | -67.7% |
| Waste rate on purchased value | 12.0% | 7.8% | -35.0% |
| Out-of-Stock Rate (OOS) | 4.8% | 2.9% | -39.6% |
| Products donated (tonnes/year) | 45 t | 112 t | +149% (Gadda Law benefit) |
| Markdown efficiency (% revenue recovered) | 32% | 71% | +39 pp |
| FTEs dedicated to manual order review | 3.0 FTE | 0.8 FTE | -73.3% |
| Waste + disposal cost savings | - | €4.2M/year | New saving |
| Tax benefit from donations (Gadda Law) | €85K/year | €210K/year | +147% |
| Project ROI (investment: €1.8M) | - | 234% at 12 months | Payback 5.2 months |
Lessons Learned from the Case Study
- Data quality is the real bottleneck: 40% of project time was dedicated to data cleaning and integration. Historical POS data had significant gaps during holiday periods and store renovations.
- Not all SKUs deserve the same model: TFT for the top 500 SKUs, Prophet for mid-tier SKUs (500-3000), moving average for the long tail (>3000 SKUs). A three-tier approach with optimal cost/benefit ratio.
- Change management is critical: buyers with 20+ years of experience resisted the AI system. The solution: show the Prophet decomposition (trend + seasonality + holiday), allowing buyers to understand and validate the model's logic.
- The markdown engine requires ESL integration: without electronic shelf labels, manual price updates eliminated the benefit of the system's speed of response.
- Continuous monitoring is indispensable: models degrade over time (concept drift) due to shifts in purchasing behavior post-COVID. Automated weekly retraining kept performance stable.
Business Metrics for Demand Forecasting in FoodTech
The success of a demand forecasting system is not measured by MAPE alone. It is necessary to define a metrics framework that covers both the technical quality of the model and the real business impact.
Metrics Framework: Technical + Business
| Category | Metric | Formula / Definition | Typical Grocery Target |
|---|---|---|---|
| Model Accuracy | MAPE | Mean Absolute Percentage Error | <15% for fresh products |
| RMSE | Root Mean Square Error (in units) | Depends on SKU volume | |
| Bias | (Forecast - Actual) / Actual, mean | |Bias| <5% (no systematic over/under) | |
| WMAPE | MAPE weighted by sales volume | <10% (better than MAPE for long tail) | |
| Waste | Waste Reduction Rate (WRR) | (Waste before - Waste after) / Waste before | >30% within 12 months |
| Waste % on Purchases | Value wasted / Value purchased | <8% (sector benchmark) | |
| Tonnes Diverted (donations) | Tonnes donated / Total potential waste | >50% of surplus | |
| Availability | Out-of-Stock Rate (OOS) | % SKUs with stockout / total SKUs | <3% |
| Fill Rate | Units delivered / Units ordered | >97% | |
| Days of Cover (DOC) | Current stock / Average daily demand | Optimal per product category | |
| Dynamic Pricing | Markdown Efficiency (ME) | Markdown revenue / Potential waste cost | >65% |
| Products Sold via Markdown | % near-expiry products sold with discount | >80% of at-risk products | |
| Avg Discount Applied | Average discount on markdown products | 25-45% (optimal for elasticity) |
Monitoring Dashboard: Concept Drift Detection
Food forecasting models are particularly susceptible to concept drift: purchasing patterns change with the season, nutritional trends (e.g. rising demand for vegan products), price crises (the 2022-2024 inflation wave strongly altered consumer behavior in fresh products), and extraordinary events (pandemic). Continuous performance degradation monitoring is essential.
# Concept drift detection monitoring system for food forecasting
import numpy as np
import pandas as pd
from scipy import stats
from dataclasses import dataclass
from typing import List, Optional
import logging
logger = logging.getLogger(__name__)
@dataclass
class DriftAlert:
severity: str # 'WARNING' | 'CRITICAL'
metric: str
current_value: float
baseline_value: float
change_pct: float
affected_skus: List[str]
recommendation: str
class ForecastMonitor:
def __init__(self, baseline_window_days: int = 30,
monitoring_window_days: int = 7):
self.baseline_window = baseline_window_days
self.monitoring_window = monitoring_window_days
self.thresholds = {
'mape_warning': 0.20, # MAPE > 20% = warning
'mape_critical': 0.35, # MAPE > 35% = critical, requires retraining
'bias_warning': 0.10, # Systematic bias > 10%
'bias_critical': 0.20, # Systematic bias > 20%
'waste_rate_warning': 0.10, # Waste rate > 10%
}
def compute_current_metrics(self, predictions_df: pd.DataFrame,
actuals_df: pd.DataFrame) -> dict:
"""Computes metrics over the latest monitoring window"""
merged = predictions_df.merge(actuals_df, on=['date', 'sku_id', 'store_id'])
metrics_by_sku = merged.groupby('sku_id').apply(
lambda g: pd.Series({
'mape': np.mean(np.abs((g['actual'] - g['forecast']) / (g['actual'] + 1e-8))),
'bias': np.mean((g['forecast'] - g['actual']) / (g['actual'] + 1e-8)),
'rmse': np.sqrt(np.mean((g['actual'] - g['forecast']) ** 2)),
'n_observations': len(g)
})
)
return {
'mean_mape': metrics_by_sku['mape'].mean(),
'p90_mape': metrics_by_sku['mape'].quantile(0.9),
'mean_bias': metrics_by_sku['bias'].mean(),
'degraded_skus': metrics_by_sku[
metrics_by_sku['mape'] > self.thresholds['mape_warning']
].index.tolist(),
'critical_skus': metrics_by_sku[
metrics_by_sku['mape'] > self.thresholds['mape_critical']
].index.tolist(),
'per_sku_metrics': metrics_by_sku
}
def detect_drift(self, baseline_metrics: dict,
current_metrics: dict) -> List[DriftAlert]:
"""Compares current vs baseline metrics and generates alerts"""
alerts = []
# MAPE alert
mape_change = ((current_metrics['mean_mape'] - baseline_metrics['mean_mape'])
/ baseline_metrics['mean_mape'])
if current_metrics['mean_mape'] > self.thresholds['mape_critical']:
alerts.append(DriftAlert(
severity='CRITICAL',
metric='MAPE',
current_value=current_metrics['mean_mape'] * 100,
baseline_value=baseline_metrics['mean_mape'] * 100,
change_pct=mape_change * 100,
affected_skus=current_metrics['critical_skus'],
recommendation='IMMEDIATE RETRAINING required. Analyze concept drift.'
))
elif current_metrics['mean_mape'] > self.thresholds['mape_warning']:
alerts.append(DriftAlert(
severity='WARNING',
metric='MAPE',
current_value=current_metrics['mean_mape'] * 100,
baseline_value=baseline_metrics['mean_mape'] * 100,
change_pct=mape_change * 100,
affected_skus=current_metrics['degraded_skus'],
recommendation='Monitor trend. Schedule retraining next week.'
))
# Systematic bias alert
if abs(current_metrics['mean_bias']) > self.thresholds['bias_critical']:
direction = 'over-forecasting' if current_metrics['mean_bias'] > 0 else 'under-forecasting'
alerts.append(DriftAlert(
severity='CRITICAL',
metric='Bias',
current_value=current_metrics['mean_bias'] * 100,
baseline_value=baseline_metrics.get('mean_bias', 0) * 100,
change_pct=0,
affected_skus=[],
recommendation=f'Systematic {direction} bias: update model calibration.'
))
return alerts
def check_and_alert(self, predictions_df: pd.DataFrame,
actuals_df: pd.DataFrame,
baseline_metrics: dict) -> None:
"""Main entry point for the monitor"""
current = self.compute_current_metrics(predictions_df, actuals_df)
alerts = self.detect_drift(baseline_metrics, current)
for alert in alerts:
if alert.severity == 'CRITICAL':
logger.critical(
f'[DRIFT CRITICAL] {alert.metric}: {alert.current_value:.1f}% '
f'(baseline: {alert.baseline_value:.1f}%) - {alert.recommendation}'
)
# Trigger Slack/PagerDuty notification here
self._send_alert(alert)
else:
logger.warning(
f'[DRIFT WARNING] {alert.metric}: {alert.current_value:.1f}% - '
f'{len(alert.affected_skus)} SKUs degraded'
)
def _send_alert(self, alert: DriftAlert) -> None:
"""Sends notification to ML Ops team (implementation depends on system)"""
# Integration with: Slack, PagerDuty, email, Prometheus Alertmanager
pass
Best Practices and Anti-Patterns in Food Demand Forecasting
Anti-Patterns to Avoid
- Temporal data leakage: using standard cross-validation instead of walk-forward validation introduces future data into training. This is the most common methodological error and leads to optimistic MAPE values of 30-50% that prove false in production.
- Single model for all SKUs: products with intermittent demand (sales 2-3 times per week) require different models from high-volume products. Applying LSTM to a product with 200 sales per year leads to overfitting and MAPE above 60%.
- Ignoring business bias: retailers historically tend to over-order to avoid stockouts (the worst of the two perceived "evils"). If the model is trained on historical order data (not actual demand), it inherits this bias. Use actual sales data, not order data.
- Failing to model future promotions: a forecasting system that does not know about planned promotions for the following week produces systematically incorrect forecasts during promotional periods. Always integrate the promotional calendar as an input.
- Optimizing only MAPE while ignoring error direction: in the food context, an over-forecast error of 50 units (50 units wasted) is far more costly than an under-forecast error of 50 units (50 lost sales). Use asymmetric metrics (Weighted MAPE) or asymmetric loss functions during training.
Established Best Practices
- Stratify SKUs by treatment: classify each SKU by volume, variability, and shelf life. Apply different models to different clusters. A hierarchical approach (global + local) often outperforms both pure models.
- Always include an uncertainty estimate: a point forecast (single number) is less useful than a probabilistic forecast (distribution or intervals). TFT and Prophet provide quantiles natively, which are essential for calculating optimal safety stock.
- Automate retraining but monitor for anomalies: daily retraining is useful but can amplify errors if yesterday's data was anomalous (e.g. POS system failure). Always implement a sanity check before promoting a new model to production.
- Build trust with buyers: ML systems often fail due to organizational resistance, not technical quality. Show the model decomposition, explain forecasts in business language, and allow buyers to override with a feedback loop that improves the model over time.
- Continuously measure ROI: calculate weekly savings in waste avoided, recovery from markdowns, and tax benefits from the Gadda Law. Making the value created visible is essential to maintaining organizational commitment over time.
Conclusions: Forecasting as a Sustainability Tool
ML-driven demand forecasting in the food sector is one of the AI use cases with the best impact-to-complexity ratio available today. The problem is well-defined, the data exists (every retailer has years of POS history), the success metrics are clear (MAPE, Waste Rate, ROI), and the impact extends beyond business: reducing food waste is an economic, regulatory, and environmental imperative.
The recommended path for an Italian grocery retailer looking to start in 2025 is pragmatic: begin with Prophet or XGBoost on a subset of 50 high-volume, high-perishability SKUs, measure the MAPE delta and Waste Reduction Rate after 30 days, build the internal business case, and then progressively scale toward TFT for top SKUs and toward integration with markdown engines and ESL systems.
The ultimate goal is not to have the most accurate model, but a decision-making ecosystem that transforms forecasts into actions: optimal orders, adaptive pricing, automatic donations. In this ecosystem, Machine Learning is the enabling infrastructure, while waste reduction is the concrete business outcome that justifies every investment.
Continue in the FoodTech Series
The upcoming articles in the series explore technologies complementary to demand forecasting:
- Article 9 - Satellite API and Precision Agriculture: how NDVI and Sentinel-2 satellite data feed agricultural yield prediction models, closing the loop from production to distribution.
- Article 10 - Edge ML for Crop Monitoring: embedded inference on IoT field devices for real-time monitoring of crop health, with integration into the supply chain pipeline.
To explore the MLOps technologies that make production deployment of forecasting models possible, see also the series MLOps for Business: AI Models in Production with MLflow.







