
I try modelling (part 2)

We discover a leak

Something seemed too good to be true, so I asked Opus to look for possible leakage across the 200 features, and it found that earnings_win_rate and upside_downside_ratio (the model’s #1 and #2 features) were contaminated.

This means that those features revealed or suggested what actually happened in that quarter, making it easy for the model to guess the outcome. It was as if the exam answers were stapled to the question paper. Or at least, clues to the answer.

The calculate_earnings_move_stats method included the current quarter’s outcome when computing these stats, meaning the model was indirectly seeing the target variable. We fixed that by computing the stats from previous quarters only, so the current call’s outcome is never included.
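For illustration, here is a minimal point-in-time version of these two stats in plain Python. It assumes each stock’s post-earnings moves are stored as a chronological list of returns. The function name prior_earnings_stats is hypothetical (this is not the actual calculate_earnings_move_stats code). Defining upside_downside_ratio as average up-move divided by average down-move is also an assumption. The key point is that the row for quarter i only ever looks at quarters before i.

```python
from typing import List, Optional, Tuple

Stats = Tuple[Optional[float], Optional[float]]


def prior_earnings_stats(moves: List[float]) -> List[Stats]:
    """Hypothetical leak-free stats: the row for quarter i uses only quarters 0..i-1.

    moves: post-earnings returns for one stock, oldest first.
    Returns (win_rate, upside_downside_ratio) per quarter, with None where
    there isn't enough history to compute a value.
    """
    stats: List[Stats] = []
    for i in range(len(moves)):
        history = moves[:i]  # strictly earlier quarters; moves[i] is never seen
        if not history:
            stats.append((None, None))  # first quarter: no history at all
            continue
        ups = [m for m in history if m > 0]
        downs = [-m for m in history if m < 0]  # down-move sizes as positive numbers
        # Win rate: share of earlier quarters with a strictly positive move.
        win_rate = len(ups) / len(history)
        # Assumed definition: mean up-move size divided by mean down-move size.
        if ups and downs:
            ratio = (sum(ups) / len(ups)) / (sum(downs) / len(downs))
        else:
            ratio = None
        stats.append((win_rate, ratio))
    return stats


# Example: four quarters of made-up post-earnings returns.
moves = [0.04, -0.02, 0.03, -0.01]
for quarter, (win_rate, ratio) in enumerate(prior_earnings_stats(moves)):
    print(quarter, win_rate, ratio)
```

Because each row only slices moves[:i], changing the current quarter’s outcome cannot change that quarter’s features. Flipping the target and checking that the features stay put is a cheap invariance test to run against any suspiciously strong feature.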

After fixing this, all models were retrained.

  1. The intraday models (2h, 3h, 4h) were the most inflated :( — they dropped 0.10-0.12. This makes sense: earnings_win_rate (a stock’s historical tendency to go up after earnings) was directly leaking the current quarter’s outcome, and shorter horizons correlated more tightly with it.

  2. EOD models barely changed (-0.016 to -0.018). The EOD target has more noise from full-day trading, so the leaked feature was less predictive of it.

  3. 7d was negligible (-0.005). Over a week, the leaked feature adds almost nothing.

  4. 28d actually improved (+0.010). Cleaner data helped the model generalize better.

  5. All models now converge to the 0.62-0.64 range. The “4h is dramatically better than EOD” finding was fake. In reality, the intraday and EOD models perform similarly.

  6. earnings_win_rate dropped from #1 feature to #9 in the 4h model (importance 0.037 → 0.008). It’s still useful, just not dominant.

The resulting correlations still look promising though.

An example analyst from my old approach…

Correlation Coefficient 0.095 (No significant correlation)

…versus one of the latest XGBoost EOD models…

Correlation Coefficient 0.229 (Weak positive correlation)
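To put these two numbers on a common footing, here is a small sketch that computes a Pearson correlation between predicted and realised moves and attaches a label. The function names are hypothetical. The 0.1 / 0.3 / 0.5 magnitude cut-offs are an assumption (a common rule of thumb), chosen because they reproduce the two labels above. The tool that produced those labels may use different bands or a real significance test, which would also depend on how many calls were scored.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def describe_r(r: float) -> str:
    """Label r using assumed rule-of-thumb bands (not a significance test)."""
    size = abs(r)
    if size < 0.1:
        return "No significant correlation"
    direction = "positive" if r > 0 else "negative"
    if size < 0.3:
        strength = "Weak"
    elif size < 0.5:
        strength = "Moderate"
    else:
        strength = "Strong"
    return f"{strength} {direction} correlation"


print(describe_r(0.095))
print(describe_r(0.229))
```

With these assumed bands, describe_r(0.095) and describe_r(0.229) return the same labels as the two results above. In practice x would be a model’s (or analyst’s) predicted moves and y the realised moves for the same calls.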

Time to trade

Once the leak was fixed, I set the most promising model off trading. In the meantime I’m off on holiday.

Can AI do that?

Busy working

Part 3 tomorrow: a trading disaster and marginal gains.
