Posted on March 10, 2016 @ 07:19:00 AM by Paul Meagher
This is my third blog related to the book Superforecasting: The Art and Science of Prediction (2015). In my last blog I discussed the importance of updating forecasts rather than just making a forecast and waiting to see whether the forecasted outcome occurs. This naturally leads to the question of how we should evaluate our updated forecasts in light of agreements or discrepancies between the predicted and the actual outcomes. That is what this blog will attempt to do.
The forecasting example I have chosen to focus on is predicting what my book expenses will be for 2016. I came up with an exact estimate of $1920 but pointed out that assigning a probability to a point estimate is tricky and not very useful. Instead it is more useful to specify a prediction interval [$1920 ± $60] and assign a probability to how likely it is that the forecasted outcome will fall within that interval (80% probability). Now we have a forecast that is sufficiently specified that we can begin to evaluate our forecasting ability.
We can evaluate our financial forecasting ability in terms of whether the probability we assign to an outcome accurately reflects the level of uncertainty we should have in that outcome. If you assign an outcome a high probability (100%) and it doesn't happen, then you should be penalized more than if you had assigned it a lower probability (60%). You were overconfident in your forecasting ability, and when we score your forecast the math should reflect this. If you assign a high probability to an outcome and the outcome happens, then you shouldn't be penalized very much. The way our scoring system will work is that a higher score is bad and a score close to 0 is good. A high score measures the amount of penalty you incur for a poorly calibrated forecast. To feel the pain of a bad forecast we can multiply the penalty score by $100, and the result determines how much money you have to pay out for a bad forecast.
Before I get into the math for assessing how "calibrated" your estimates are, I should point out that this math does not address another aspect of our forecast that we can also evaluate in this case, namely, how good the "resolution" of our forecast is. Currently I am predicting that my 2016 book expenses will be $1920 ± $60. However, as the end of 2016 approaches I might decide to increase the resolution of that forecast to $1920 ± $30 (I might also change the midpoint) if it looks like I am still on track and my forecast might be off by only the cost of 1 book (rather than 2). When we narrow the range of our financial forecasts and the outcome falls within the range, a scoring system should tell us that we have better resolving power in our forecasts.
The scoring system that I will propose will address calibration and resolution and has the virtue that it is very simple and can be applied using mental arithmetic. Some scoring systems can be so complicated that you need to sit down with a computer to use them. David V. Lindley has a nice discussion of Quadratic Scoring in his book Making Decisions (1991). The way Quadratic Scoring works is that you assign a probability to an outcome and if that outcome happens you score it using the equation (1 - p)^{2} where p is your forecast probability. If the predicted outcome does not happen, then you use the equation p^{2}. In both cases the result is a number no greater than 1, so Lindley advocates multiplying the value returned by 100.
So, if it turns out that my book expenses for 2016 fall within the interval [$1920 ± $60] and I estimated the probability to be 0.80 (80%), then to compute my penalty for not saying this outcome had a 100% probability, I use the equation (1 - p)^{2} = (1 - .8)^{2} = .2^{2} = 0.04. Now if I multiply that by 100 I get a penalty score of 4. One way to interpret this is that I only have to pay out $4 for my forecast because it was fairly good. Notice that if my probability was .9 (90%) my payout would be even less ($1), but if it was .6 (60%) it would be quite a bit bigger at $16, since (1 - .6)^{2} = .4^{2} = .16. So not being confident when I should be results in a bigger penalty.
Conversely, if my book expenses for 2016 didn't fall within the interval [$1920 ± $60] and I estimated the probability to be 0.80 (80%), then to compute my penalty I use the second equation, which is p^{2} = .8^{2} = .64. Now multiply this by 100 and I get a penalty score of $64 that I have to pay out. If my probability estimate was lower, say .60 (60%), then my penalty would be .6^{2} = .36 x 100 = $36. So being less confident when I'm wrong is better than being confident.
The quadratic scoring rule is summarized in this table:
Source: David Lindley, Making Decisions (1991), p. 24
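The rule is small enough to check with a few lines of code. Here is a minimal Python sketch of it, assuming the scaling factor of 100 described above; the function name `quadratic_penalty` and its parameters are my own labels for illustration, not something taken from Lindley's book.

```python
def quadratic_penalty(p, happened, scale=100):
    """Quadratic scoring rule: (1 - p)^2 if the outcome happened, p^2 if not.

    p        -- forecast probability, between 0 and 1
    happened -- True if the predicted outcome occurred
    scale    -- multiplier for readability (100, following the text above)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    raw = (1 - p) ** 2 if happened else p ** 2
    # Round to cents so floating-point noise does not show up in the score.
    return round(raw * scale, 2)


# Penalties for a few confidence levels, for a hit and for a miss.
for p in (0.9, 0.8, 0.6):
    print(p, quadratic_penalty(p, True), quadratic_penalty(p, False))
```

For the 80% forecast this gives a penalty of 4 when the outcome happens and 64 when it does not, which matches the hand calculations above.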
I hope you will agree that the Quadratic Scoring Rule usefully reflects how penalties should be calculated when we compare our forecasted outcomes to actual outcomes. It measures how "calibrated" our probability assignments are to whether the events they predict actually happen. In cases where we are not predicting numerical outcomes this scoring system would be all we need to evaluate the goodness of our forecasts. Our prediction problem, however, is a numerical prediction problem so we also need to concern ourselves with how good the resolution of our forecast is.
Intuitively, if our prediction interval is smaller and the actual outcome falls within this range, then we consider this a better forecast than one that involves a wider prediction interval. My proposal is simply to measure the size of your range and add it to your quadratic score. So if my prediction interval is [$1920 ± $60] with 80% confidence and I am correct, then my overall score is 4 (see previous calculation) plus the range, which is 120. Let's convert this all to dollars and our overall penalty is $4 + $120 = $124. If we narrow our prediction interval to $1920 ± $30 at the same 80% confidence and are still correct, then we get $4 + $60 = $64 as our penalty score.
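The combined score can be sketched in Python as follows. This is only an illustration under a few assumptions of mine: the function name `interval_penalty` is made up, the year-end total of $1900 is a hypothetical outcome, and I assume the width of the range is added whether or not the outcome lands inside the interval (the text above only works through the case where it does).

```python
def interval_penalty(low, high, p, actual, scale=100):
    """Quadratic (calibration) penalty plus the width of the interval.

    low, high -- bounds of the prediction interval in dollars
    p         -- probability assigned to the outcome landing in the interval
    actual    -- the outcome that was actually observed
    """
    width = high - low                      # resolution: narrower is better
    hit = low <= actual <= high             # did the outcome land inside?
    calibration = (1 - p) ** 2 if hit else p ** 2
    return round(calibration * scale + width, 2)


actual = 1900  # hypothetical year-end book expenses

print(interval_penalty(1860, 1980, 0.80, actual))  # $1920 ± $60
print(interval_penalty(1890, 1950, 0.80, actual))  # $1920 ± $30
```

With these inputs the wider interval scores 124 and the narrower one scores 64, the same totals as the hand calculation.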
In an ideal world we would make exact forecasts (± 0 as our range) with complete confidence (100%) and the forecasted outcomes would happen exactly as predicted. In this universe our penalty scores would be 0. In the real world, however, our predictions often have calibration or resolution issues so most predictions involve a penalty score to some extent. It might help to think of this as a cost you have to pay to someone because your predictions are not as perfect as they could be.
With this scoring system you can check in on your forecasts at some midway point to see how you are doing. If you update your forecast what you are looking for is a reduced penalty score when you check up on your forecast again. How much your penalty score improves tells you if your updates are on the right track. Generally your penalty scores should go down if you update your forecasts on a regular basis like Superforecasters do. Superforecasters are quite interested in evaluating how their forecasts are progressing and using some simple math like this helps them figure out how well they are doing.
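To make the idea of tracking updates concrete, here is a small Python sketch that scores a series of forecast updates once the outcome is known and checks whether the penalties went down. All of the numbers are invented for illustration (the update dates, the revised midpoint and probability, and the year-end total of $1935), and the helper name `penalty` is my own.

```python
def penalty(mid, half_width, p, actual, scale=100):
    """Quadratic penalty plus interval width for a forecast of mid ± half_width."""
    hit = abs(actual - mid) <= half_width
    calibration = (1 - p) ** 2 if hit else p ** 2
    return round(calibration * scale + 2 * half_width, 2)


# (label, midpoint, half-width, probability) -- illustrative values only
updates = [
    ("January", 1920, 60, 0.80),
    ("July", 1920, 30, 0.80),
    ("October", 1930, 30, 0.90),
]
actual = 1935  # hypothetical year-end total

scores = [(label, penalty(mid, hw, p, actual)) for label, mid, hw, p in updates]
for label, score in scores:
    print(f"{label}: ${score:.2f}")

# True when every update scored no worse than the one before it.
improving = all(later[1] <= earlier[1] for earlier, later in zip(scores, scores[1:]))
print("Updates improving:", improving)
```

In this made-up sequence the penalties are $124, $64 and $61, so each update was a step in the right direction.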
A book that is on my priority list to read is Simple Rules: How to Thrive In a Complex World (2015). Its authors argue that it is often a mistake to use complex rules to solve complex problems (which forecasting problems often are). They document how simple rules are often effective substitutes and can be used more flexibly. It is possible to be more sophisticated in how we evaluate forecasts, but this sophistication comes at a price: the inability to quickly and easily evaluate forecasts in the real world. We often don't need extra sophistication if our goal is to easily evaluate forecasts in order to get some useful feedback and produce better forecasts. I would challenge you to come up with a simpler method for evaluating financial forecasts that is as useful.
If you want to learn more about the motivations, applications and techniques for forecasting, I would recommend the open textbook Forecasting: Principles and Practice.
