
Instead of predicting who wins a match, I trained a TensorFlow neural network to predict what happens on every single delivery — then ran the game thousands of times.
Most sports models try to predict the winner of a game from a "top-down" view: historical wins, home-field advantage, and overall team strength. But cricket is a game of very small margins. A ball moving three centimetres to the left can be the difference between a wicket and a six.
To capture that volatility, I built a bottom-up simulation model powered by a TensorFlow neural network. Instead of predicting who wins, I trained the AI to predict what happens on every single delivery.
The foundation of this project is a database of T20 cricket history from 2003 to 2021: approximately 1,000,000 individual balls from 5,000+ matches involving 3,000+ players. The data covers leagues including the IPL, BBL, CPL, T20 Blast, Mzansi Super League, PSL, and BPL, and was sourced from publicly available sites such as ESPNcricinfo.
Choosing a "bottom-up" approach over a traditional "top-down" model was a deliberate decision based on the nature of the sport.
At the heart of this project is a multi-class classification problem. I trained a feed-forward neural network using TensorFlow to output the probability of 9 specific outcomes for any given ball: 0, 1, 2, 3, 4, 5, 6, Wide, or Wicket.
The model doesn't just look at the players; it looks at the match state. Its inputs are the innings, over number, current runs and wickets, and the historical performance distributions of both the batter and the bowler. The architecture uses two dense layers (50 nodes each) with ReLU activation, Batch Normalisation, and Dropout to help the model generalise to new data. The output is a softmax layer, which guarantees that the probabilities of all 9 outcomes sum to 100%.
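For readers who want to picture the network, here is a minimal Keras sketch of the architecture described above. It is an illustration, not the exact code behind this project: the function name `build_ball_model`, the feature count, the dropout rate, the ordering of Batch Normalisation and ReLU, and the optimizer and loss are all assumptions.

```python
import numpy as np
import tensorflow as tf

# The 9 per-ball outcomes the network assigns probabilities to.
OUTCOMES = ["0", "1", "2", "3", "4", "5", "6", "wide", "wicket"]


def build_ball_model(n_features: int, dropout_rate: float = 0.2) -> tf.keras.Model:
    """Feed-forward classifier: two 50-node dense layers, then a 9-way softmax.

    n_features and dropout_rate are hypothetical; the article does not give them.
    """
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(50),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation("relu"),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(50),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation("relu"),
        tf.keras.layers.Dropout(dropout_rate),
        # Softmax turns the final 9 scores into probabilities that sum to 1.
        tf.keras.layers.Dense(len(OUTCOMES), activation="softmax"),
    ])
    # Assumed training setup: integer class labels 0..8 and cross-entropy loss.
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


if __name__ == "__main__":
    model = build_ball_model(n_features=20)
    # One made-up match state (all zeros), just to show the output shape.
    probs = model.predict(np.zeros((1, 20), dtype="float32"), verbose=0)
    print(dict(zip(OUTCOMES, probs[0].round(3))))
```

Calling `model.predict` on one match state returns a row of 9 numbers, one per outcome. With untrained weights the values are meaningless, but they still sum to 1 (up to floating-point rounding), which is the property the softmax layer provides.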
Once the model can predict a single ball, we need an engine to "play" the game. Using Python, I built a MatchSimulator that plays out a full match one delivery at a time.
By running this thousands of times, we move from a single "random" game to a statistical distribution of likely outcomes.
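To make the idea concrete, here is a rough sketch of what such a simulation loop can look like. It is not the actual MatchSimulator: the function names are hypothetical, `placeholder_probs` stands in for the trained network with fixed, made-up probabilities, and the rules are simplified (a wide adds one run and is re-bowled, an innings ends after 120 legal balls or 10 wickets, and the chase stops once the target is reached).

```python
import random
from collections import Counter

OUTCOMES = ["0", "1", "2", "3", "4", "5", "6", "wide", "wicket"]


def placeholder_probs(innings, balls, runs, wickets):
    """Stand-in for the neural network: fixed, made-up outcome probabilities.

    A real model would return different probabilities for each match state.
    """
    return [0.33, 0.37, 0.06, 0.005, 0.11, 0.001, 0.05, 0.03, 0.044]


def simulate_innings(rng, innings, predict_probs, target=None):
    """Play one innings ball by ball and return (runs, wickets)."""
    runs = wickets = balls = 0
    while balls < 120 and wickets < 10:
        if target is not None and runs >= target:
            break  # the chasing side has already won
        probs = predict_probs(innings, balls, runs, wickets)
        outcome = rng.choices(OUTCOMES, weights=probs, k=1)[0]
        if outcome == "wide":
            runs += 1  # extra run; the ball does not count as legal
        elif outcome == "wicket":
            wickets += 1
            balls += 1
        else:
            runs += int(outcome)
            balls += 1
    return runs, wickets


def simulate_match(rng, predict_probs):
    """Simulate both innings and report which side won."""
    first, _ = simulate_innings(rng, 1, predict_probs)
    second, _ = simulate_innings(rng, 2, predict_probs, target=first + 1)
    if second > first:
        return "chasing"
    if second < first:
        return "batting_first"
    return "tie"


def win_probabilities(n_sims, predict_probs, seed=0):
    """Run many simulated matches and return the share of each result."""
    rng = random.Random(seed)
    counts = Counter(simulate_match(rng, predict_probs) for _ in range(n_sims))
    return {k: counts[k] / n_sims for k in ("batting_first", "chasing", "tie")}


if __name__ == "__main__":
    print(win_probabilities(1000, placeholder_probs))
```

Swapping `placeholder_probs` for a function that queries the trained network is what turns this toy loop into a state-aware simulation. Because each run samples from the predicted probabilities, no two simulated matches need to be the same.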
When I simulated 1,126 historical matches 500 times each, the results passed the "eye test." The distributions of total scores and wickets in the simulations closely mirrored real-world T20 data. The goal wasn't world-class prediction accuracy — T20 is arguably too volatile for that — but rather to see if we could model the granularity of the game.
The model excels at generating realistic simulations, even if the "winner" remains a coin flip. In fact, in my analysis this model often outperformed professional bookmaker odds, and the odds themselves proved to be even less predictive than a model with no information at all.
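One way to put a number on "less predictive than a model with no information" is log loss, where lower is better. This is only an illustration: the metric is an assumption (the comparison above doesn't name one), and the forecasts and results in the example are invented. A no-information model that says 50/50 for every match always scores ln 2 ≈ 0.693, so any forecaster scoring above that is doing worse than knowing nothing.

```python
import math


def log_loss(probs, outcomes):
    """Average binary log loss.

    probs:    predicted probability that the team batting first wins
    outcomes: 1 if the team batting first won, otherwise 0
    """
    eps = 1e-15  # keep log() away from exactly 0 or 1
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


if __name__ == "__main__":
    # Invented results and forecasts, purely to show the mechanics.
    outcomes = [1, 0, 1, 1, 0, 0]
    coin_flip = [0.5] * len(outcomes)
    overconfident = [0.8, 0.7, 0.3, 0.9, 0.6, 0.2]

    print("no information:", log_loss(coin_flip, outcomes))
    print("overconfident: ", log_loss(overconfident, outcomes))
```

The design point is that log loss punishes confident wrong calls heavily, so a forecaster who is often confidently wrong can end up behind a plain 50/50 guess.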
The power of this AI approach isn't in telling you who will win, but in showing you all the different ways a match could unfold, one ball at a time.
Thanks for reading.
—Will