Forecast Evaluation

Forecast evaluation measures how well probabilistic forecasts match outcomes using calibration diagnostics and proper scoring rules like Brier score and log loss.
Background

Forecast evaluation is the process of measuring how good a set of probabilistic forecasts is once outcomes are known. In prediction markets, it’s commonly used to test whether market-implied probabilities were trustworthy and informative over time.

Rather than asking only “was the market right?”, forecast evaluation asks:

  • Were probabilities accurate on average?
  • Were they calibrated (did 70% events happen ~70% of the time)?
  • Did the market become more informative as resolution approached?

Forecast evaluation turns raw probability data into evidence about forecasting quality. Teams use it to:

  • compare forecasting performance across topics, venues, or time periods
  • detect overconfidence and systematic bias
  • validate whether probabilities are usable for decision-making, research, or risk management

Forecast evaluation typically combines:

  1. Proper scoring rules (single-number accuracy measures)
    • Brier score for binary outcomes: (p - y)^2, where y ∈ {0, 1}
    • Log loss (cross-entropy): -[y ln(p) + (1 - y) ln(1 - p)]
  2. Calibration diagnostics (do stated probabilities match frequencies?)
    • calibration / reliability plots
    • bucketed observed-vs-predicted comparisons (e.g., 0.1-wide probability bins)
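The two scoring rules above can be sketched in a few lines. This is a minimal illustration using made-up forecast/outcome pairs; the `eps` clipping in the log-loss helper is a common guard against taking log(0), not something the article prescribes:

```python
import math

def brier_score(p: float, y: int) -> float:
    """Squared error between forecast probability p and binary outcome y."""
    return (p - y) ** 2

def log_loss(p: float, y: int, eps: float = 1e-15) -> float:
    """Cross-entropy for one binary forecast; eps guards against log(0)."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Average scores over a set of resolved forecasts (lower is better for both).
forecasts = [(0.9, 1), (0.7, 1), (0.3, 0), (0.8, 0)]
mean_brier = sum(brier_score(p, y) for p, y in forecasts) / len(forecasts)
mean_ll = sum(log_loss(p, y) for p, y in forecasts) / len(forecasts)
```

Both rules are proper: a forecaster minimizes their expected score by reporting their true belief. Log loss punishes confident misses much more harshly than Brier score does.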

Scoring rules summarize “how good” the probabilities were; calibration tools show where they were strong or weak.
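A calibration table like the bucketed comparison described above can be built with plain Python. This is a sketch, not a reference implementation; the dict keys (`bin`, `n`, `mean_predicted`, `observed_frequency`) are illustrative names:

```python
def calibration_table(forecasts, n_bins=10):
    """Bucket (probability, outcome) pairs into n_bins fixed-width bins and
    compare mean predicted probability with observed frequency per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in forecasts:
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, y))
    rows = []
    for i, items in enumerate(bins):
        if not items:
            continue  # skip empty bins rather than divide by zero
        rows.append({
            "bin": (i / n_bins, (i + 1) / n_bins),
            "n": len(items),
            "mean_predicted": sum(p for p, _ in items) / len(items),
            "observed_frequency": sum(y for _, y in items) / len(items),
        })
    return rows
```

In a well-calibrated cohort, `mean_predicted` and `observed_frequency` track each other across bins; a reliability plot is just this table drawn as a curve.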

Because probabilities evolve, it’s common to evaluate forecasts at consistent timestamps such as:

  • a fixed horizon (e.g., T-30d, T-7d, T-24h)
  • a standardized “close” snapshot (e.g., last price before resolution)

This helps separate early signal from late consensus and highlights whether a market converged smoothly or only moved at the end.
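Picking a consistent snapshot from a probability history can be sketched as a last-observation-before-cutoff lookup. The data shape here (a time-sorted list of `(timestamp, probability)` pairs) is an assumption for illustration:

```python
from datetime import datetime, timedelta

def snapshot_at_horizon(history, resolution_time, horizon):
    """Return the last observed probability at or before
    resolution_time - horizon; history is time-sorted (timestamp, prob)."""
    cutoff = resolution_time - horizon
    prob = None
    for ts, p in history:
        if ts <= cutoff:
            prob = p
        else:
            break
    return prob  # None if the market had no data that early

history = [
    (datetime(2024, 5, 1), 0.40),
    (datetime(2024, 5, 20), 0.55),
    (datetime(2024, 5, 30), 0.80),
]
p_t7 = snapshot_at_horizon(history, datetime(2024, 5, 31), timedelta(days=7))
# last observation on or before May 24 -> 0.55
```

Returning `None` when no early data exists matters: silently falling back to the first available price would mix horizons across markets and bias the comparison.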

Common pitfalls to watch for:

  • Outcome leakage: accidentally using prices after the outcome became known.
  • Survivorship bias: evaluating only high-volume markets or only cleanly resolved events.
  • Class imbalance: simple hit-rate can look good on rare events; scoring rules are usually more informative.
  • Timestamp mismatch: using a probability snapshot that doesn’t reflect what was knowable at that time.
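The class-imbalance pitfall is easy to demonstrate with synthetic numbers. In this hypothetical cohort of 100 markets with 5 rare YES outcomes, two forecasters get an identical hit-rate at a 0.5 threshold, yet the Brier score cleanly separates the one with real signal:

```python
def hit_rate(forecasts, threshold=0.5):
    """Fraction of events where the thresholded forecast matched the outcome."""
    return sum((p > threshold) == bool(y) for p, y in forecasts) / len(forecasts)

def mean_brier(forecasts):
    return sum((p - y) ** 2 for p, y in forecasts) / len(forecasts)

# 100 markets, 5 of which resolved YES (a rare event).
a = [(0.01, 1)] * 5 + [(0.01, 0)] * 95   # forecaster A: ignores the signal
b = [(0.40, 1)] * 5 + [(0.01, 0)] * 95   # forecaster B: real signal, still < 0.5

# Both score a 95% hit-rate (every forecast thresholds to "no"),
# but B's mean Brier score is far lower than A's.
```

This is why the section recommends scoring rules over simple hit-rates when outcomes are imbalanced.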

A research team evaluates 500 resolved binary markets. They compute Brier score and log loss at T-7d and at the final pre-resolution probability. The results show strong late accuracy but weaker early calibration, suggesting the market is most useful close to resolution and needs better early information aggregation.

If you’re evaluating prediction-market forecasts programmatically, FinFeedAPI’s Prediction Market API can provide time-stamped probability histories and resolution outcomes—key inputs for computing scoring rules, building calibration curves, and comparing performance across market cohorts.

Get your free API key now and start building in seconds!