During the final stretch of the Zrive Applied Data Science program, I worked with two classmates to build a stock market asset ranking system for ETS Asset Management Factory.
The goal sounded simple on paper: produce a ranking of stocks by expected performance and demonstrate that the strategy can beat the S&P 500 in a realistic evaluation.
Problem framing
We framed the task as: given historical prices and financial variables, can we learn a signal that helps prioritize which assets are more likely to outperform the market?
In practice, the hard part is not training a model. It is setting up the experiment so that no future information leaks into training and the backtest resembles how the system would behave in production.
Approach
ETS provided a dataset covering historical prices and financial features. We built a pipeline with three focus areas:
1) Data and features
- Clean missing values and standardize inputs.
- Engineer technical indicators from raw price series.
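To make "technical indicators" concrete, here is a minimal sketch of two common ones computed from a closing-price series. The choice of indicators (a simple moving average and a momentum return), the function names, and the sample prices are illustrative assumptions, not the project's actual feature set.

```python
from statistics import mean


def simple_moving_average(prices, window):
    """Trailing mean of the last `window` prices; None until enough history exists."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)  # not enough past data yet
        else:
            out.append(mean(prices[i + 1 - window : i + 1]))
    return out


def momentum(prices, lookback):
    """Simple return over the past `lookback` periods; None until enough history exists."""
    return [
        None if i < lookback else prices[i] / prices[i - lookback] - 1.0
        for i in range(len(prices))
    ]


# Made-up closing prices, for illustration only.
closes = [100.0, 102.0, 101.0, 105.0, 110.0]
sma3 = simple_moving_average(closes, window=3)
mom2 = momentum(closes, lookback=2)
```

Both functions only look backwards from each date, so the resulting features are safe to use as model inputs at that date.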
2) Learning objective
We treated this as a classification problem: will this stock outperform the market over the next horizon? That objective aligns directly with producing a ranking, because stocks can be sorted by the model's predicted probability of outperforming.
We tried multiple algorithms, and LightGBM ended up being the best fit for this type of structured tabular data.
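The sketch below shows one way the classification target and the ranking step could be expressed. It assumes the label is "forward return of the stock exceeds forward return of the benchmark over a fixed horizon". The horizon length, the tickers, and the scores are made up, and the scores stand in for whatever probabilities a trained classifier would output.

```python
def forward_return(prices, t, horizon):
    """Return earned from date t to date t + horizon."""
    return prices[t + horizon] / prices[t] - 1.0


def outperformance_labels(stock_prices, benchmark_prices, horizon):
    """1 if the stock beats the benchmark over the next `horizon` periods, else 0.

    The last `horizon` dates get None: their outcome lies in the future,
    so they cannot be labeled yet.
    """
    n = len(stock_prices)
    labels = []
    for t in range(n):
        if t + horizon >= n:
            labels.append(None)
        else:
            stock_ret = forward_return(stock_prices, t, horizon)
            bench_ret = forward_return(benchmark_prices, t, horizon)
            labels.append(1 if stock_ret > bench_ret else 0)
    return labels


def rank_by_score(scores):
    """Order tickers by predicted probability of outperforming, highest first."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical price series and classifier scores.
stock = [10.0, 11.0, 10.5, 12.0]
bench = [100.0, 105.0, 106.0, 107.0]
labels = outperformance_labels(stock, bench, horizon=2)

ranking = rank_by_score({"AAA": 0.41, "BBB": 0.67, "CCC": 0.55})
```

The label looks `horizon` periods ahead by construction, which is why the evaluation setup has to be careful about which dates a model is allowed to train on.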
3) Time-aware training and evaluation
We used a rolling (sliding window) training strategy to respect the time-series nature of the data. Each evaluation window only used information available up to that point, so the model never “peeks” into the future.
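A rolling split can be sketched as a small generator over time-ordered period indices. The window sizes, the `step` default, and the optional `gap` parameter are assumptions for illustration; the gap is one common way to keep forward-looking labels at the end of the training window from overlapping the test window.

```python
def rolling_window_splits(n_periods, train_size, test_size, step=None, gap=0):
    """Yield (train_indices, test_indices) over time-ordered periods.

    Every training index comes strictly before every test index, and the
    whole window slides forward by `step` periods each iteration.
    """
    step = step or test_size
    start = 0
    while start + train_size + gap + test_size <= n_periods:
        train_end = start + train_size
        test_start = train_end + gap
        train = list(range(start, train_end))
        test = list(range(test_start, test_start + test_size))
        yield train, test
        start += step


# Ten periods, train on four, test on the next two, slide by two.
splits = list(rolling_window_splits(n_periods=10, train_size=4, test_size=2))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

In a real pipeline the indices would select rows by date, and the model would be refit on each training window before scoring the matching test window.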
Results
On the held-out two-year test period, our strategy outperformed the benchmark.
| Metric | Our model | S&P 500 | Delta |
|---|---|---|---|
| CAGR (2-year test) | 15% | 10% | +5pp |
| Risk-adjusted returns | Higher | Baseline | - |
The strategy delivered a 15% CAGR over the two-year test period, versus 10% for the S&P 500.
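For readers less familiar with CAGR, the short sketch below converts the annualized figures above into cumulative growth over two years and back again. The function names are my own, and the 100 starting value is arbitrary.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two portfolio values."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def cumulative_growth(annual_rate, years):
    """Total growth implied by compounding `annual_rate` for `years` years."""
    return (1.0 + annual_rate) ** years - 1.0


model_total = cumulative_growth(0.15, 2)  # 1.15 ** 2 - 1, about 32.25%
bench_total = cumulative_growth(0.10, 2)  # 1.10 ** 2 - 1, about 21%

# Going the other way: a portfolio growing from 100 to 132.25 in two years
# has a CAGR of about 15%.
implied_rate = cagr(100.0, 132.25, 2)
```

So a 5 percentage point gap in CAGR compounds to a gap of roughly 11 percentage points in cumulative return over the two-year window.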
What I took away
- The modeling choice mattered less than the evaluation design; time-aware validation was the difference between a demo and a trustworthy result.
- For this kind of signal, LightGBM delivered a strong accuracy/compute tradeoff and made iteration fast.
Note: this is a student project and not investment advice.