Now here's the long answer:
We are 98.5% accurate at predicting cancellations! (Actually, not really.) Approximately 1.5% of flights are cancelled, which means that if we always predicted that no flight would be cancelled, we’d be right 98.5% of the time without ever flagging a single cancellation. Obviously, that’s not a good measure of accuracy.
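To make the arithmetic concrete, here is a minimal sketch in Python. The 1,000-flight sample is made up to match the 1.5% cancellation rate quoted above, and the `accuracy` helper is a hypothetical name used only for illustration, not part of any product.

```python
def accuracy(predictions, outcomes):
    """Share of flights where the prediction matched what happened."""
    correct = sum(p == o for p, o in zip(predictions, outcomes))
    return correct / len(outcomes)

# Made-up sample: 1,000 flights, 15 of them cancelled (1.5%).
outcomes = [True] * 15 + [False] * 985

# The "never cancelled" strategy predicts False for every flight.
always_no = [False] * len(outcomes)

baseline_accuracy = accuracy(always_no, outcomes)

# Count how many real cancellations the strategy actually flagged.
cancellations_caught = sum(p and o for p, o in zip(always_no, outcomes))

print(f"accuracy: {baseline_accuracy:.1%}, cancellations caught: {cancellations_caught}")
```

The score looks impressive, yet the strategy catches none of the cancellations, which is why a headline accuracy number says little for rare events.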
The problem with probabilities... is that we’re never completely right or completely wrong on any prediction. If we predicted that there is a 50% chance of a delay, and the flight does end up being delayed, were we right or wrong? What about a 60/40 prediction?
It’s not just about the likelihood of delays but the magnitude as well. If we said that there is a very high likelihood of a 1-hour delay and the flight was delayed 55 minutes, is that a right or wrong prediction? What about 45 minutes? 35 minutes? It gets subjective very quickly.
Rather than ask “Are the predictions right?” we like to ask “Are the probabilities reliable?”
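One common way to read "reliable" is calibration: of all the flights given roughly a 30% chance of delay, about 30% should actually be delayed. The sketch below assumes that reading; the `reliability_table` function, the bin count, and the data are all hypothetical and chosen only to illustrate the idea, not to describe how our probabilities are actually evaluated.

```python
from collections import defaultdict

def reliability_table(probabilities, outcomes, n_bins=5):
    """Group predictions into equal-width probability bins and compare
    the average predicted probability with the observed delay rate."""
    bins = defaultdict(list)
    for p, delayed in zip(probabilities, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # keep p == 1.0 in the last bin
        bins[index].append((p, delayed))

    rows = []
    for index in sorted(bins):
        members = bins[index]
        mean_predicted = sum(p for p, _ in members) / len(members)
        observed_rate = sum(d for _, d in members) / len(members)
        rows.append((mean_predicted, observed_rate, len(members)))
    return rows

# Made-up data: 10 flights predicted at 30% (3 delayed),
# and 10 flights predicted at 70% (7 delayed).
probabilities = [0.3] * 10 + [0.7] * 10
outcomes = [1] * 3 + [0] * 7 + [1] * 7 + [0] * 3

table = reliability_table(probabilities, outcomes)
for mean_predicted, observed_rate, count in table:
    print(f"predicted {mean_predicted:.2f}  observed {observed_rate:.2f}  flights {count}")
```

When the predicted and observed columns track each other across bins, the probabilities can be taken at face value, even though no single prediction is ever "right" or "wrong".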
We also ask “Do our scores make sense? Are they correlated with delays?”
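As an illustration of what "correlated with delays" could mean in practice, the sketch below computes a Pearson correlation between scores and observed delay minutes. The score scale, the data, and the `pearson` helper are assumptions made for this example; the actual scoring scale and the checks we run may differ.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)

# Made-up data: a hypothetical score per flight and the delay that followed.
scores = [1, 2, 3, 4, 5, 6]
delay_minutes = [0, 5, 0, 20, 35, 60]

r = pearson(scores, delay_minutes)
print(f"correlation between score and delay: {r:.2f}")
```

A value near 1 means higher scores tend to go with longer delays; a value near 0 would mean the scores carry little information about delays.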
In conclusion: Our probabilities are reliable, and our scores are meaningful and useful.
Beyond that, we would be happy to work with you on your specific use case to identify the best metrics based on geography, what decisions are made based on the data, and at what time horizon those decisions get made.