In the third year of my Data Science degree, four classmates and I ran a comparative study of autoencoder-based approaches to time-series forecasting. The setting was retail: clothing stock and logistics signals from Inditex (Zara group).

The question we cared about was practical: can representation learning (autoencoders) give us a forecasting edge, or do strong tabular/time-series baselines still dominate?

Data

We worked with 5+ million rows across 91 stores over 2 years, with two targets:

  1. Ordinary clothes to ship to each store
  2. New clothes to ship to each store

Because the dataset is large, we ran experiments on a subset of stores to keep training cycles reasonable.

One important constraint: many features came with limited business context, so we treated them as numerical signals and focused on robust modeling and evaluation rather than domain-specific feature interpretation.

Models evaluated

We tested two families of approaches.

Autoencoder variants

  • Hybrid autoencoders
  • Variational autoencoders (VAE)
  • Autoencoder embeddings + LightGBM

Baselines

  • LightGBM
  • SKForecast
  • LSTM
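
To make the "AE embeddings + LightGBM" pipeline concrete, here is a minimal dependency-free sketch, not our actual code: a tiny linear autoencoder trained by gradient descent produces the embeddings, and a closed-form ridge regressor stands in for LightGBM. All shapes, learning rates, and the toy data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 500 rows, 8 features, 1 target (stand-ins for store signals).
X = rng.normal(size=(500, 8))
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=500)

def recon_loss(W_enc, W_dec):
    """Mean squared reconstruction error of the linear autoencoder."""
    Z = X @ W_enc
    return float(np.mean((Z @ W_dec - X) ** 2))

# --- Step 1: train a linear autoencoder (8 -> 3 -> 8) with plain SGD ---
k = 3
W_enc = rng.normal(scale=0.1, size=(8, k))
W_dec = rng.normal(scale=0.1, size=(k, 8))
loss_before = recon_loss(W_enc, W_dec)
lr = 0.01
for _ in range(300):
    Z = X @ W_enc               # latent embeddings
    err = Z @ W_dec - X         # reconstruction error
    # Gradients of the mean squared reconstruction loss.
    g_dec = Z.T @ err / len(X)
    g_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc
loss_after = recon_loss(W_enc, W_dec)

# --- Step 2: fit a regressor on the embeddings ---
# (closed-form ridge here; the real pipeline used LightGBM)
Z = X @ W_enc
w = np.linalg.solve(Z.T @ Z + 1e-3 * np.eye(k), Z.T @ y)
pred = Z @ w
mse = float(np.mean((pred - y) ** 2))
```

The key design point is the hand-off: the encoder is trained only to reconstruct, then frozen, and the downstream model sees the latent `Z` instead of the raw features.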

Experimental setup

We implemented a sliding-window strategy to respect temporal ordering (train on the past, validate on the future). This was essential to avoid optimistic results from leakage.
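
The sliding-window idea can be sketched in a few lines; the function name and window sizes below are illustrative, not from our project code:

```python
import numpy as np

def sliding_window_splits(n_rows, train_size, val_size, step):
    """Yield (train_idx, val_idx) pairs that always validate on the future.

    Rows are assumed sorted by time. Each window trains on `train_size`
    consecutive rows, validates on the `val_size` rows immediately after
    them, then slides forward by `step` rows.
    """
    start = 0
    while start + train_size + val_size <= n_rows:
        train_idx = np.arange(start, start + train_size)
        val_idx = np.arange(start + train_size, start + train_size + val_size)
        yield train_idx, val_idx
        start += step

# Example: 10 rows, train on 4, validate on 2, slide by 2.
splits = list(sliding_window_splits(10, train_size=4, val_size=2, step=2))
# Every validation index is strictly after every training index,
# so no future information leaks into training.
assert all(t.max() < v.min() for t, v in splits)
```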

Findings

The headline outcome was clear: the strongest baseline was also the overall best performer.

Model type           Relative performance   Notes
LightGBM             Best                   Strong accuracy with efficient training
LSTM                 Good                   Second best overall
AE + LightGBM        Decent                 Better than pure AE variants
Hybrid autoencoder   Okay                   Often regressed toward the mean
VAE                  Mediocre               Struggled to capture patterns reliably

In many runs, the autoencoder-only approaches produced forecasts close to the target mean. That behavior is a common failure mode when the latent representation does not capture enough predictive structure (or when the training objective emphasizes reconstruction more than forecast utility).
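
There is a simple reason behind this failure mode: when the learned representation carries no information about the target, the forecast that minimizes squared error is a constant, and the best constant is the target's mean. A quick numeric check (toy values, not our data):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=50.0, scale=10.0, size=10_000)  # toy demand target

# MSE of predicting a single constant c for every row.
cands = np.linspace(30, 70, 401)
mses = [(np.mean((y - c) ** 2), c) for c in cands]
best_c = min(mses)[1]

# The best constant is (numerically) the sample mean: with an
# uninformative representation, "regress to the mean" is MSE-optimal.
assert abs(best_c - y.mean()) < 0.1
```

This is why an autoencoder trained purely on reconstruction can look "trained" (low reconstruction loss) while its downstream forecasts collapse to the mean.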

Takeaways

  • For this dataset, a well-tuned LightGBM baseline was very hard to beat.
  • Autoencoders were interesting as feature extractors, but end-to-end AE forecasting needs more careful objective design to avoid mean-prediction collapse.
  • Compute budget matters: iteration speed strongly influenced how quickly we could improve models and debug failure modes.