In the third year of my Data Science degree, four classmates and I ran a comparative study of autoencoder-based approaches to time-series forecasting. The setting was retail: clothing stock and logistics signals from Inditex (Zara group).

The question we cared about was practical: can representation learning (autoencoders) give us a forecasting edge, or do strong tabular/time-series baselines still dominate?

Data

We worked with 5+ million rows across 91 stores over 2 years, with two targets:

  1. Ordinary clothes to ship to each store
  2. New clothes to ship to each store

Because the dataset is large, we ran experiments on a subset of stores to keep training cycles reasonable.

One important constraint: many features came with limited business context, so we treated them as numerical signals and focused on robust modeling and evaluation rather than domain-specific feature interpretation.

Models evaluated

We tested two families of approaches.

Autoencoder variants

  • Hybrid autoencoders
  • Variational autoencoders (VAE)
  • Autoencoder embeddings + LightGBM

Baselines

  • LightGBM
  • SKForecast
  • LSTM
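
To make the "AE embeddings + LightGBM" pipeline concrete, here is a minimal dependency-free sketch, not our actual code: a tiny linear autoencoder trained by gradient descent produces the embeddings, and a closed-form ridge regressor stands in for LightGBM. All shapes, learning rates, and the toy data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 500 rows, 8 features, 1 target (stand-ins for store signals).
X = rng.normal(size=(500, 8))
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=500)

def recon_loss(W_enc, W_dec):
    """Mean squared reconstruction error of the linear autoencoder."""
    Z = X @ W_enc
    return float(np.mean((Z @ W_dec - X) ** 2))

# --- Step 1: train a linear autoencoder (8 -> 3 -> 8) with plain SGD ---
k = 3
W_enc = rng.normal(scale=0.1, size=(8, k))
W_dec = rng.normal(scale=0.1, size=(k, 8))
loss_before = recon_loss(W_enc, W_dec)
lr = 0.01
for _ in range(300):
    Z = X @ W_enc               # latent embeddings
    err = Z @ W_dec - X         # reconstruction error
    # Gradients of the mean squared reconstruction loss.
    g_dec = Z.T @ err / len(X)
    g_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc
loss_after = recon_loss(W_enc, W_dec)

# --- Step 2: fit a regressor on the embeddings ---
# (closed-form ridge here; the real pipeline used LightGBM)
Z = X @ W_enc
w = np.linalg.solve(Z.T @ Z + 1e-3 * np.eye(k), Z.T @ y)
pred = Z @ w
mse = float(np.mean((pred - y) ** 2))
```

The key design point is the hand-off: the encoder is trained only to reconstruct, then frozen, and the downstream model sees the latent `Z` instead of the raw features.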

Experimental setup

We implemented a sliding-window strategy to respect temporal ordering (train on the past, validate on the future). This was essential to avoid optimistic results from leakage.
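
The sliding-window idea can be sketched in a few lines; the function name and window sizes below are illustrative, not from our project code:

```python
import numpy as np

def sliding_window_splits(n_rows, train_size, val_size, step):
    """Yield (train_idx, val_idx) pairs that always validate on the future.

    Rows are assumed sorted by time. Each window trains on `train_size`
    consecutive rows, validates on the `val_size` rows immediately after
    them, then slides forward by `step` rows.
    """
    start = 0
    while start + train_size + val_size <= n_rows:
        train_idx = np.arange(start, start + train_size)
        val_idx = np.arange(start + train_size, start + train_size + val_size)
        yield train_idx, val_idx
        start += step

# Example: 10 rows, train on 4, validate on 2, slide by 2.
splits = list(sliding_window_splits(10, train_size=4, val_size=2, step=2))
# Every validation index is strictly after every training index,
# so no future information leaks into training.
assert all(t.max() < v.min() for t, v in splits)
```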

Findings

The headline outcome was clear: the strongest baseline was also the overall best performer.

Model type           Relative performance   Notes
LightGBM             Best                   Strong accuracy with efficient training
LSTM                 Good                   Second best overall
AE + LightGBM        Decent                 Better than pure AE variants
Hybrid autoencoder   Okay                   Often regressed toward the mean
VAE                  Mediocre               Struggled to capture patterns reliably

In many runs, the autoencoder-only approaches produced forecasts close to the target mean. That behavior is a common failure mode when the latent representation does not capture enough predictive structure (or when the training objective emphasizes reconstruction more than forecast utility).
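
There is a simple reason behind this failure mode: when the learned representation carries no information about the target, the forecast that minimizes squared error is a constant, and the best constant is the target's mean. A quick numeric check (toy values, not our data):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=50.0, scale=10.0, size=10_000)  # toy demand target

# MSE of predicting a single constant c for every row.
cands = np.linspace(30, 70, 401)
mses = [(np.mean((y - c) ** 2), c) for c in cands]
best_c = min(mses)[1]

# The best constant is (numerically) the sample mean: with an
# uninformative representation, "regress to the mean" is MSE-optimal.
assert abs(best_c - y.mean()) < 0.1
```

This is why an autoencoder trained purely on reconstruction can look "trained" (low reconstruction loss) while its downstream forecasts collapse to the mean.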

Takeaways

  • For this dataset, a well-tuned LightGBM baseline was very hard to beat.
  • Autoencoders were interesting as feature extractors, but end-to-end AE forecasting needs more careful objective design to avoid mean-prediction collapse.
  • Compute budget matters: iteration speed strongly influenced how quickly we could improve models and debug failure modes.