In the third year of my Data Science degree, four classmates and I ran a comparative study of autoencoder-based approaches for time-series forecasting. The setting was retail: clothing stock and logistics signals from Inditex (the Zara group).
The question we cared about was practical: can representation learning (autoencoders) give us a forecasting edge, or do strong tabular/time-series baselines still dominate?
Data
We worked with more than 5 million rows covering 91 stores over two years, with two forecasting targets:
- Ordinary clothes to ship to each store
- New clothes to ship to each store
Because the dataset is large, we ran experiments on a subset of stores to keep training cycles reasonable.
One important constraint: many features came with limited business context, so we treated them as numerical signals and focused on robust modeling and evaluation rather than domain-specific feature interpretation.
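In practice, treating opaque features as plain numeric signals pushes you toward preprocessing that makes no domain assumptions and tolerates outliers. A minimal sketch of that idea (the median/IQR choice here is illustrative, not the exact preprocessing from the project):

```python
import numpy as np

def robust_scale(X):
    """Scale each column by its median and interquartile range.

    A reasonable default when feature semantics are unknown and
    outliers are likely, since median/IQR are far less sensitive
    to extreme values than mean/std.
    """
    med = np.median(X, axis=0)
    q75, q25 = np.percentile(X, [75, 25], axis=0)
    iqr = q75 - q25
    # Guard against constant columns (IQR of 0 would divide by zero).
    iqr = np.where(iqr == 0, 1.0, iqr)
    return (X - med) / iqr
```

The scaled columns have median zero regardless of how skewed the raw signal is, which keeps a handful of extreme rows from dominating distance-based or gradient-based models.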
Models evaluated
We tested two families of approaches.
Autoencoder variants
- Hybrid autoencoders
- Variational autoencoders (VAE)
- Autoencoder embeddings + LightGBM
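The embeddings-plus-LightGBM idea is: train an autoencoder, freeze its encoder, and feed the learned representation to a gradient-boosted model. A toy sketch with a linear encoder standing in for the trained autoencoder (PCA via SVD here is purely illustrative; the real pipeline would use the AE's encoder output instead):

```python
import numpy as np

def encode(X, k):
    """Project features onto the top-k principal directions.

    A linear stand-in for a trained autoencoder's encoder: it maps
    the raw feature matrix to a compact k-dimensional embedding.
    """
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

# The embeddings would then augment (or replace) the raw features
# for the downstream booster, e.g. (assuming lightgbm is installed):
#   model = lightgbm.LGBMRegressor().fit(np.hstack([X, encode(X, k)]), y)
```

The hope is that the compressed representation surfaces structure the booster would otherwise have to discover itself; as the findings below note, that only paid off partially here.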
Baselines
- LightGBM
- skforecast (recursive forecasting wrappers around scikit-learn regressors)
- LSTM
Experimental setup
We implemented a sliding-window evaluation strategy that respects temporal ordering (train on the past, validate on the future). This was essential: a naive random split would leak future information into training and produce optimistically biased results.
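A minimal sketch of the windowing logic (the window sizes are hypothetical; the invariant that matters is that every validation index strictly follows every training index):

```python
def sliding_windows(n_samples, train_size, val_size, step):
    """Yield (train_idx, val_idx) pairs that respect temporal order.

    Each validation window immediately follows its training window,
    and the whole pair slides forward by `step` samples.
    """
    start = 0
    while start + train_size + val_size <= n_samples:
        train_idx = list(range(start, start + train_size))
        val_idx = list(range(start + train_size,
                             start + train_size + val_size))
        yield train_idx, val_idx
        start += step
```

Every fold trains only on the past, so a model can never score well by memorizing values it will later be asked to "predict".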
Findings
The headline outcome was clear: the strongest baseline was also the overall best performer.
| Model type | Relative performance | Notes |
|---|---|---|
| LightGBM | Best | Strong accuracy with efficient training |
| LSTM | Good | Second best overall |
| AE + LightGBM | Decent | Better than pure AE variants |
| Hybrid autoencoder | Okay | Often regressed toward the mean |
| VAE | Mediocre | Struggled to capture patterns reliably |
In many runs, the autoencoder-only approaches produced forecasts close to the target mean. That behavior is a common failure mode when the latent representation does not capture enough predictive structure (or when the training objective emphasizes reconstruction more than forecast utility).
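One way to push the objective toward forecast utility is to weight the two error terms explicitly rather than letting reconstruction dominate. A toy sketch of such a combined loss (`alpha` is a hypothetical hyperparameter, not one we tuned in the project):

```python
import numpy as np

def joint_loss(x, x_recon, y, y_pred, alpha=0.5):
    """Weighted sum of reconstruction MSE and forecast MSE.

    alpha -> 1 emphasizes reconstructing the inputs;
    alpha -> 0 emphasizes the forecasting target.
    """
    recon_mse = np.mean((x - x_recon) ** 2)
    forecast_mse = np.mean((y - y_pred) ** 2)
    return alpha * recon_mse + (1 - alpha) * forecast_mse
```

With `alpha` near 1 the model can score well while ignoring the target entirely, which is exactly the mean-prediction failure mode described above; lowering `alpha` forces the latent representation to carry predictive structure.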
Takeaways
- For this dataset, a well-tuned LightGBM baseline was very hard to beat.
- Autoencoders were interesting as feature extractors, but end-to-end AE forecasting needs more careful objective design to avoid mean-prediction collapse.
- Compute budget matters: iteration speed strongly influenced how quickly we could improve models and debug failure modes.