Using SQL to analyze 5 years of my own NBA Predictions Game (part 2)
If you skipped Part 1, please click here to start from the beginning. Also, find me on LinkedIn here.
So, after dealing with all those predictions over five years, I had a feeling that my friends would send more conservative guesses in the early stages of the Playoffs (where higher-seeded teams with better campaigns play against lower-seeded teams with worse campaigns). In the following rounds, as only the best teams remain, predictions would gradually become less conservative, as matchups get more balanced.
And by a Conservative Prediction, I mean a prediction where the participant thought that the team with the better regular-season campaign would win that series. So, on average, predictions in the 1st round would be much more conservative than in the grand Final. But were they, really?
The Predictions table has a column indicating the conservative guess, with ‘1’ for true and ‘0’ for false. With this, I could easily use some CASEs to make conditional calculations and reach the percentage of conservative predictions for each round and year:
So my feeling was wrong, as there’s not really a pattern of the percentage decreasing round by round within a single year. BUT, when looking at the last column, my feeling turns out to be right, as the percentage does decrease. So I could not draw a firm conclusion from that.
To reach that output, together with the CASEs, I used AVG and ROUND:
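A minimal sketch of that kind of query, wrapped in Python's built-in sqlite3 module so it can run on its own. The table layout, the column names (`year`, `round`, `conservative`), the two years and the sample rows are all made up for illustration; they are not the real game data, and the real query may well have looked different.

```python
import sqlite3

# Hypothetical schema and toy rows, only to illustrate the technique.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predictions (year INTEGER, round INTEGER, conservative INTEGER)"
)
rows = [
    (2020, 1, 1), (2020, 1, 1), (2020, 1, 1), (2020, 1, 0),
    (2020, 2, 1), (2020, 2, 0),
    (2021, 1, 1), (2021, 1, 1),
    (2021, 2, 1), (2021, 2, 1), (2021, 2, 1), (2021, 2, 0),
]
conn.executemany("INSERT INTO predictions VALUES (?, ?, ?)", rows)

# One row per round, one column per year, plus an all-years column.
# CASE returns NULL for rows from other years, and AVG skips NULLs,
# so each column averages only the 1/0 flags of its own year.
query = """
SELECT
    round,
    ROUND(100.0 * AVG(CASE WHEN year = 2020 THEN conservative END), 1) AS pct_2020,
    ROUND(100.0 * AVG(CASE WHEN year = 2021 THEN conservative END), 1) AS pct_2021,
    ROUND(100.0 * AVG(conservative), 1) AS pct_all_years
FROM predictions
GROUP BY round
ORDER BY round
"""
conservative_by_round = conn.execute(query).fetchall()
for row in conservative_by_round:
    print(row)
```

Because the flag is stored as 1 or 0, its average is already the share of conservative predictions, so multiplying by 100 and rounding is all it takes to get a percentage.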
—
In total, 76.8% of all predictions made by all participants across five seasons of the game were conservative. That’s really something. But does being conservative pay off?
What percentage of the conservative predictions turned out to be right at the end of each series? Let’s find out:
Within these five years, it’s possible to note that, in the 1st Round, it’s always better to play safe; in the conference semi-finals, however, some underdogs came to stun Those Who Did Not Believe (remember Trae Young’s Hawks versus the Sixers in 2021).
To reach that output, I filtered for all the conservative predictions and then averaged their success by year, grouping by round:
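Here is a rough sketch of that idea, again using Python's sqlite3 so it is self-contained. It assumes a hypothetical `correct` column (1 if the predicted winner actually won the series, 0 otherwise) next to the `conservative` flag; the column names, years and sample rows are invented for the example.

```python
import sqlite3

# Hypothetical schema and toy rows, only to illustrate the technique.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predictions ("
    "year INTEGER, round INTEGER, conservative INTEGER, correct INTEGER)"
)
rows = [
    (2020, 1, 1, 1), (2020, 1, 1, 1), (2020, 1, 0, 0),
    (2020, 2, 1, 1), (2020, 2, 1, 0),
    (2021, 1, 1, 1),
    (2021, 2, 1, 0), (2021, 2, 1, 0), (2021, 2, 0, 1),
]
conn.executemany("INSERT INTO predictions VALUES (?, ?, ?, ?)", rows)

# Keep only the conservative predictions, then compute, per round,
# the share of them that turned out to be right in each year.
query = """
SELECT
    round,
    ROUND(100.0 * AVG(CASE WHEN year = 2020 THEN correct END), 1) AS hit_pct_2020,
    ROUND(100.0 * AVG(CASE WHEN year = 2021 THEN correct END), 1) AS hit_pct_2021
FROM predictions
WHERE conservative = 1
GROUP BY round
ORDER BY round
"""
success_by_round = conn.execute(query).fetchall()
for row in success_by_round:
    print(row)
```

The WHERE clause does the filtering before any averaging happens, so the non-conservative guesses never enter the percentages.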
—
I could go on with many more queries, but, for now, let’s close with one last one: out of all the NBA Playoffs series in those five years, I wanted to know in which of them the participants were most confident that one team would easily beat its opponent.
In a best-of-seven series, when Team A wins 4 games and Team B wins none, they say that Team A swept Team B. So I wanted to know if I could rank all the series by how close the average prediction was to a sweep.
Yes, I could, but that required a bit of extra thinking:
So the thing here was: first, to calculate the average predicted scores of the higher-seeded and the lower-seeded teams; second, to subtract one average from the other to get a difference, or an index, that I could use to build a ranking; third, to use a window function that could rank those results by absolute value; and fourth, to also return the real results of the series, in order to check whether the average predictions were close to reality. And, well:
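The four steps can be sketched roughly like this, once more through Python's sqlite3 (the window function needs SQLite 3.25 or newer). Everything here is an assumption made for the example: the two tables, their column names, the placeholder team names and the scores.

```python
import sqlite3

# Hypothetical schema and toy rows, only to illustrate the technique.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE series (
    series_id INTEGER, matchup TEXT,
    real_high_wins INTEGER, real_low_wins INTEGER
);
CREATE TABLE predictions (
    series_id INTEGER, pred_high_wins INTEGER, pred_low_wins INTEGER
);
INSERT INTO series VALUES
    (1, 'Team A vs Team B', 4, 0),
    (2, 'Team C vs Team D', 4, 3),
    (3, 'Team E vs Team F', 4, 2);
INSERT INTO predictions VALUES
    (1, 4, 0), (1, 4, 1),
    (2, 4, 3), (2, 3, 4),
    (3, 2, 4), (3, 1, 4);
""")

query = """
WITH avg_pred AS (
    -- Step 1: average predicted wins for the higher and the lower seed
    SELECT series_id,
           AVG(pred_high_wins) AS avg_high,
           AVG(pred_low_wins)  AS avg_low
    FROM predictions
    GROUP BY series_id
)
SELECT
    s.matchup,
    -- Step 2: the difference between the two averages is the index
    ROUND(a.avg_high - a.avg_low, 2) AS avg_diff,
    -- Step 3: rank by absolute value, so the most lopsided comes first
    RANK() OVER (ORDER BY ABS(a.avg_high - a.avg_low) DESC) AS sweep_rank,
    -- Step 4: bring in the real result for comparison
    s.real_high_wins,
    s.real_low_wins
FROM avg_pred AS a
JOIN series AS s ON s.series_id = a.series_id
ORDER BY sweep_rank
"""
sweep_ranking = conn.execute(query).fetchall()
for row in sweep_ranking:
    print(row)
```

Ranking by the absolute value matters because a series where everyone expects the lower seed to win easily has a negative difference, but it is just as lopsided. Reading the same ranking from the bottom up gives the series that were expected to be the most balanced.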
This was surely the biggest query I have built by myself, outside of the guidance of online courses. And just in case you have read this article down to this point and are wondering which series were supposed to be the most balanced according to my friends’ predictions, here they are:
—
Well, there it is. If you had asked me a few months ago, I would never have expected that I would amuse myself by typing code in search of interesting statistical facts. For me, those few queries were like small samples of the ridiculous amount of possibilities that SQL gives to someone who knows how to explore its powers.
Thank you for reading! If you feel like it, do drop me a message on LinkedIn and let’s talk more about SQL, basketball, analytics and… international container freight, maybe?