Backtest Engine — ^GSPC
Generic Event Backtester · Grid Search · Walk-Forward · AI Relevance
What does this page show?
The Backtest Engine tests seasonal trading strategies on historical data. Choose an event (FOMC, OPEX, holiday, calendar day…), define entry/exit offsets, and examine the equity curve, win rate and significance. The page has four tabs: single backtest, grid optimisation (all parameter combinations), walk-forward analysis (out-of-sample) and event relevance ranking. The sidebar sets ticker, period, event type, offsets, stop-loss and indicator filters, all of which can be combined.
Backtest Parameters
Grid-Search Parameters
Walk-Forward Configuration
Parameters are optimised on in-sample data and tested on out-of-sample data → a more realistic performance estimate that is less affected by overfitting.
Relevance Calculation
Which events are statistically significant? t-test + effect size (Cohen's d) + win rate → relevance score.
📖 How to Read the Relevance Score
The relevance score combines three dimensions into an overall value between 0 and 1:
- Significance (50%): 1 - p-value
- Win Rate (30%): Share of trades > 0
- Effect Size (20%): Cohen's d, capped at 1.0
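- t-statistic: Signal strength relative to the null hypothesis (mean return = 0). |t| > 2 is a common rule of thumb for a robust effect.
- p-value: Probability of observing a result at least this extreme if the true mean return were zero. p < 0.05 is conventionally treated as significant.
- Cohen's d: Effect size. d > 0.5 = medium, d > 0.8 = large.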
Methodology
Backtest Logic
Each event date defines one trade: entry = event_date − days_before trading days, exit = event_date + days_after trading days. A minimum gap of 5 days between trades prevents overlapping positions.
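The offset logic above can be sketched as follows. This is a minimal illustration, not the engine's actual code: the function name `build_trades` and its parameters are hypothetical, events that fall on a non-trading day are assumed to roll forward to the next trading day, and the 5-day gap is assumed to be measured in trading days between the previous exit and the next entry.

```python
from bisect import bisect_left

def build_trades(trading_days, event_dates, days_before, days_after, min_gap=5):
    """Return (entry_index, exit_index) pairs into the sorted list trading_days.

    trading_days: sorted sequence of comparable dates (one per trading day).
    event_dates:  iterable of event dates of the same type.
    """
    trades = []
    last_exit = None
    for event in sorted(event_dates):
        # Assumption: an event on a non-trading day maps to the next trading day.
        i = bisect_left(trading_days, event)
        if i >= len(trading_days):
            continue  # event lies after the available history
        entry, exit_ = i - days_before, i + days_after
        if entry < 0 or exit_ >= len(trading_days):
            continue  # not enough price history around the event
        # Assumption: the gap is counted in trading days from the last exit.
        if last_exit is not None and entry - last_exit < min_gap:
            continue  # too close to the previous trade, skip to avoid overlap
        trades.append((entry, exit_))
        last_exit = exit_
    return trades
```

Working with indices rather than dates keeps the offsets in trading days by construction, so weekends and holidays never need special handling.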
Metrics
- Total Return: cumulative equity from 100 (compound).
- Sharpe Ratio: Avg Return / Std × √(trades per year). Risk-free = 0.
- Calmar Ratio: Annualized Return / Max Drawdown.
- Max Drawdown: Largest peak-to-trough decline of the equity curve, in %.
- Profit Factor: Σ Wins / |Σ Losses|.
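The metric definitions above can be sketched in a few lines. The function `compute_metrics` and the `years` argument are hypothetical names; the sketch assumes per-trade simple returns, a sample standard deviation, and a drawdown measured from the running equity peak (starting at 100).

```python
import math

def compute_metrics(returns, years):
    """returns: per-trade simple returns (0.02 = +2%), at least two trades.
    years: length of the backtest period in years."""
    equity, peak, max_dd = 100.0, 100.0, 0.0
    for r in returns:
        equity *= 1.0 + r                      # compound from 100
        peak = max(peak, equity)
        max_dd = max(max_dd, (peak - equity) / peak)

    n = len(returns)
    mean = sum(returns) / n
    # Assumption: sample standard deviation (n - 1 in the denominator).
    std = math.sqrt(sum((r - mean) ** 2 for r in returns) / (n - 1))

    annualized = (equity / 100.0) ** (1.0 / years) - 1.0
    wins = sum(r for r in returns if r > 0)
    losses = abs(sum(r for r in returns if r < 0))

    return {
        "total_return": equity / 100.0 - 1.0,
        # Risk-free rate = 0, annualised with sqrt(trades per year).
        "sharpe": mean / std * math.sqrt(n / years) if std > 0 else float("nan"),
        "calmar": annualized / max_dd if max_dd > 0 else float("inf"),
        "max_drawdown": max_dd,
        "profit_factor": wins / losses if losses > 0 else float("inf"),
    }
```

For example, three trades of +10%, −5% and +2% within one year compound to an equity of about 106.59, with a maximum drawdown of 5% and a profit factor of 2.4.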
Stop-Loss
The stop is checked against closing prices only (the database holds no intraday lows or highs). Fixed: stop at entry × (1 - stop%). Trailing: stop at high_watermark × (1 - stop%), where the watermark tracks the highest close since entry.
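A minimal sketch of the two stop variants, assuming the position is closed at the first close that is at or below the stop level and that the exit price is that close (not the stop level itself). The function name `apply_stop` is hypothetical.

```python
def apply_stop(closes, stop_pct, trailing=False):
    """closes[0] is the entry close, closes[-1] the scheduled exit close.
    stop_pct is a fraction (0.05 = 5%). Returns (exit_index, exit_price)."""
    entry = closes[0]
    watermark = entry  # highest close seen so far (used by the trailing stop)
    for i, close in enumerate(closes[1:], start=1):
        reference = watermark if trailing else entry
        stop_level = reference * (1.0 - stop_pct)
        if close <= stop_level:
            # Assumption: close-based check, so the fill is this close.
            return i, close
        watermark = max(watermark, close)
    # Stop never triggered: exit at the scheduled exit close.
    return len(closes) - 1, closes[-1]
```

With closes of 100, 104, 110, 103, 96, 99 and a 5% stop, the fixed stop (level 95) never triggers, while the trailing stop exits at 103 because the stop level has moved up to 104.5 after the close at 110.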
Walk-Forward
Expanding window: optimise on the past, test on the future. Each fold runs a grid search on the in-sample years, then applies the best parameters to the following out-of-sample years. The aggregated OOS performance is a more honest estimate of live performance than any in-sample result, though it is still no guarantee.
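The expanding-window procedure can be sketched as below. The names `expanding_folds`, `walk_forward`, `min_train` and `test_size`, as well as the `evaluate(params, years)` callback, are assumptions made for illustration; the engine's real fold sizes and scoring metric are not specified here.

```python
def expanding_folds(years, min_train=3, test_size=1):
    """Split a sorted list of years into (in_sample, out_of_sample) folds.
    The in-sample window always starts at the first year and grows each fold."""
    folds = []
    split = min_train
    while split + test_size <= len(years):
        folds.append((years[:split], years[split:split + test_size]))
        split += test_size
    return folds

def walk_forward(years, param_grid, evaluate, min_train=3, test_size=1):
    """evaluate(params, years) -> score (higher is better); hypothetical callback
    that runs one backtest restricted to the given years."""
    results = []
    for train, test in expanding_folds(years, min_train, test_size):
        # Grid search on in-sample years only.
        best = max(param_grid, key=lambda p: evaluate(p, train))
        # The chosen parameters are then scored on unseen years.
        results.append({
            "test_years": test,
            "params": best,
            "oos_score": evaluate(best, test),
        })
    return results
```

Because parameters are chosen without ever seeing the test years, concatenating the `oos_score` entries across folds gives the aggregated out-of-sample result.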
Event Relevance Score
One t-test per event (H0: mean = 0). Score = 50% significance (1−p) + 30% win rate + 20% effect size (Cohen's d, capped at 1.0).
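As a sketch of the weighting, the snippet below computes the t-statistic, Cohen's d and win rate from per-trade returns and combines them with a given p-value. The function names are hypothetical, and two details are assumptions: Cohen's d is taken as mean divided by sample standard deviation (the one-sample form), and its absolute value is used before capping. The p-value is passed in rather than computed; it could come from a one-sample t-test such as `scipy.stats.ttest_1samp(returns, 0.0).pvalue`.

```python
import math
from statistics import mean, stdev

def trade_stats(returns):
    """One-sample statistics against H0: mean return = 0 (needs >= 2 trades)."""
    n = len(returns)
    d = mean(returns) / stdev(returns)       # Cohen's d, one-sample form
    return {
        "t_stat": d * math.sqrt(n),           # t = mean / (std / sqrt(n))
        "cohens_d": d,
        "win_rate": sum(r > 0 for r in returns) / n,
    }

def relevance_score(p_value, win_rate, cohens_d):
    """50% significance + 30% win rate + 20% effect size, result in [0, 1]."""
    significance = 1.0 - p_value
    # Assumption: magnitude of d is used, capped at 1.0.
    effect = min(abs(cohens_d), 1.0)
    return 0.5 * significance + 0.3 * win_rate + 0.2 * effect
```

For instance, an event with p = 0.04, a 60% win rate and d = 0.5 scores 0.5 × 0.96 + 0.3 × 0.6 + 0.2 × 0.5 = 0.76.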