Will Booth

Senior Product Manager

AI & Tech·January 2023·6 min

Predicting Ball-by-Ball Cricket Odds with AI

Instead of predicting who wins a match, I trained a TensorFlow neural network to predict what happens on every single delivery — then ran the game thousands of times.

Most sports models try to predict the winner of a game from a "top-down" view: historical wins, home-field advantage, and overall team strength. But cricket is a game of very small margins. A ball moving three centimetres to the left can be the difference between a wicket and a six.

To capture that volatility, I built a bottom-up simulation model powered by a TensorFlow neural network. Instead of predicting who wins, I trained the AI to predict what happens on every single delivery.

1,000,000+ Data Points

The foundation of this project is a database spanning 17 years (2003–2021) of T20 cricket history with approximately 1,000,000 individual balls from 5,000+ matches involving 3,000+ players. The data covers leagues including the IPL, BBL, CPL, T20 Blast, Mzansi Super League, PSL, and BPL, and was sourced from publicly available sites such as ESPNcricinfo.

Why Simulate the Ball, Not the Match?

Choosing a "bottom-up" approach over a traditional "top-down" model was a strategic decision based on the inherent nature of the sport:

  • Individual Battles: Cricket is defined by matchups. This methodology captures how a specific bowler's historical distribution of outcomes matches up against a specific batter's tendencies.
  • Probabilistic Pivots: Matches turn on single moments. By simulating a game thousands of times, we can see how often those "centimetre-wide" margins flip the result.
  • Complex Querying: We can ask more than "Who wins?" We can ask "Who is likely to take the most wickets?" or "What is the probability of the Heat winning if Chris Lynn scores under 20?"
  • Granular Logic: It allows for conditional probabilities, such as a team's win percentage given they scored exactly 167 runs batting first.
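
Queries like the ones above come down to filtering simulated matches and counting. The sketch below is only an illustration: the record fields (`first_innings_runs`, `batting_first_won`) and the five sample records are made up, and the real simulator's output format may look quite different.

```python
# Hypothetical simulation output: one record per simulated match.
# The numbers are made up purely to show the filtering logic.
simulated_matches = [
    {"first_innings_runs": 167, "batting_first_won": True},
    {"first_innings_runs": 167, "batting_first_won": False},
    {"first_innings_runs": 152, "batting_first_won": False},
    {"first_innings_runs": 167, "batting_first_won": True},
    {"first_innings_runs": 181, "batting_first_won": True},
]


def conditional_win_rate(matches, condition):
    """Estimate P(batting-first side wins | condition) from simulated matches."""
    selected = [m for m in matches if condition(m)]
    if not selected:
        return None  # the condition never occurred in the simulations
    wins = sum(m["batting_first_won"] for m in selected)
    return wins / len(selected)


# Win rate of the side batting first, given it scored exactly 167.
rate = conditional_win_rate(
    simulated_matches, lambda m: m["first_innings_runs"] == 167
)
print(rate)
```

With thousands of simulations per match, the same filter can be swapped for any other condition, such as a particular batter scoring under 20.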

The Brain: The Ball Prediction Model

At the heart of this project is a multi-class classification problem. I trained a feed-forward neural network using TensorFlow to output the probability of 9 specific outcomes for any given ball: 0, 1, 2, 3, 4, 5, 6, Wide, or Wicket.

The model doesn't just look at the players; it looks at the Match State. Its inputs are the innings, over number, current runs and wickets, and the historical performance distributions of both the batter and the bowler. The architecture uses two dense layers (50 nodes each) with ReLU activation, plus Batch Normalisation and Dropout to help the model generalise to new data. The output is a Softmax layer, which guarantees that the probabilities of all 9 outcomes sum to exactly 100%.
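
To make the shape of that network concrete, here is a minimal pure-Python sketch of a forward pass: two hidden layers of 50 ReLU nodes followed by a 9-way softmax. It is not the TensorFlow model itself. The weights are random placeholders rather than trained values, the 12-feature input size is an assumption, and Batch Normalisation and Dropout are left out because Dropout is inactive at prediction time and Batch Normalisation reduces to a fixed rescaling.

```python
import math
import random

OUTCOMES = ["0", "1", "2", "3", "4", "5", "6", "Wide", "Wicket"]


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum per output node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(rng, n_in, n_out):
    """Untrained placeholder weights; a real model would learn these."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict_ball(features, layers):
    """Forward pass: hidden ReLU layers, then a softmax over the 9 outcomes."""
    hidden = features
    for weights, biases in layers[:-1]:
        hidden = relu(dense(hidden, weights, biases))
    out_weights, out_biases = layers[-1]
    return softmax(dense(hidden, out_weights, out_biases))


rng = random.Random(0)
N_FEATURES = 12  # assumed size of the match-state vector
layers = [
    random_layer(rng, N_FEATURES, 50),
    random_layer(rng, 50, 50),
    random_layer(rng, 50, len(OUTCOMES)),
]
features = [rng.random() for _ in range(N_FEATURES)]
probs = predict_ball(features, layers)
for name, p in zip(OUTCOMES, probs):
    print(f"{name:>6}: {p:.3f}")
```

Because the weights are untrained, the printed probabilities are meaningless as cricket predictions; the point is only that the softmax output always forms a valid distribution over the nine outcomes.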

The Engine: Running the Monte Carlo Simulation

Once the model can predict a single ball, we need an engine to "play" the game. Using Python, I built a MatchSimulator that follows these steps:

  • The Setup: Instantiate two teams and their historical stats.
  • The Toss: A simulated coin flip determines who bats first.
  • The Simulation Loop: The engine runs the Ball Prediction model and takes a random sample from the predicted probability distribution. If there is a 4% chance of a wicket and the random roll hits that 4%, the batter is out.
  • The Update: The engine updates the Match State (rotating strike, incrementing the over) and feeds it back into the model for the next delivery.
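
The loop above can be sketched for a single innings as follows. This is an illustration under stated assumptions, not the actual MatchSimulator: `stub_model` is a hypothetical stand-in that returns fixed, made-up probabilities instead of calling the trained network, and the rules are simplified (a wide adds one run and is re-bowled, with no no-balls or run-outs).

```python
import random

OUTCOMES = ["0", "1", "2", "3", "4", "5", "6", "Wide", "Wicket"]


def stub_model(state):
    """Stand-in for the trained network: fixed, made-up probabilities."""
    return [0.34, 0.36, 0.07, 0.01, 0.11, 0.00, 0.05, 0.02, 0.04]


def simulate_innings(model, rng, max_overs=20, max_wickets=10):
    """Play one innings ball by ball and return the final match state."""
    state = {"runs": 0, "wickets": 0, "over": 0, "ball": 0, "on_strike": 0}
    while state["over"] < max_overs and state["wickets"] < max_wickets:
        probs = model(state)
        # Sample one outcome from the predicted distribution.
        outcome = rng.choices(OUTCOMES, weights=probs, k=1)[0]
        if outcome == "Wide":
            state["runs"] += 1  # extra run, and the ball is bowled again
            continue
        if outcome == "Wicket":
            state["wickets"] += 1
        else:
            runs = int(outcome)
            state["runs"] += runs
            if runs % 2 == 1:
                state["on_strike"] ^= 1  # odd runs swap the batters
        state["ball"] += 1
        if state["ball"] == 6:  # six legal balls complete the over
            state["ball"] = 0
            state["over"] += 1
            state["on_strike"] ^= 1  # batters swap ends between overs
    return state


final = simulate_innings(stub_model, random.Random(42))
print(final["runs"], final["wickets"], final["over"])
```

In the real engine the model would receive the updated state on every delivery, so the probabilities would shift with the over, the score, and the batter and bowler involved.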

By running this thousands of times, we move from a single "random" game to a statistical distribution of likely outcomes.
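
As a rough sketch of that last step, the code below plays a full match many times and turns the win counts into probabilities. The two teams and their per-ball probabilities are made up, each team uses one fixed distribution instead of a state-dependent model, and the helper names (`innings_total`, `simulate_match`, `win_probabilities`) are hypothetical.

```python
import random

OUTCOMES = [0, 1, 2, 3, 4, 5, 6, "Wide", "Wicket"]
# Made-up per-ball probabilities for two hypothetical teams (each sums to 1).
TEAM_PROBS = {
    "Team A": [0.34, 0.36, 0.07, 0.01, 0.11, 0.00, 0.05, 0.02, 0.04],
    "Team B": [0.36, 0.36, 0.07, 0.01, 0.10, 0.00, 0.04, 0.02, 0.04],
}


def innings_total(probs, rng, target=None, balls=120, wickets=10):
    """Total runs for one innings; a chase stops once the target is passed."""
    runs = 0
    while balls > 0 and wickets > 0:
        if target is not None and runs > target:
            break
        outcome = rng.choices(OUTCOMES, weights=probs, k=1)[0]
        if outcome == "Wide":
            runs += 1  # wides add a run without using up a ball
            continue
        balls -= 1
        if outcome == "Wicket":
            wickets -= 1
        else:
            runs += outcome
    return runs


def simulate_match(rng):
    """Toss, two innings, and the winner's name (None for a tie)."""
    first, second = rng.sample(list(TEAM_PROBS), k=2)  # coin-flip toss
    first_total = innings_total(TEAM_PROBS[first], rng)
    second_total = innings_total(TEAM_PROBS[second], rng, target=first_total)
    if first_total == second_total:
        return None
    return first if first_total > second_total else second


def win_probabilities(n_sims, seed=0):
    """Run the match n_sims times and convert win counts to fractions."""
    rng = random.Random(seed)
    wins = {name: 0 for name in TEAM_PROBS}
    wins[None] = 0  # ties
    for _ in range(n_sims):
        wins[simulate_match(rng)] += 1
    return {name: count / n_sims for name, count in wins.items()}


estimates = win_probabilities(1000)
print(estimates)
```

More simulations narrow the sampling error of each estimate, at the cost of run time.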

Realism vs. Prediction

When I simulated 1,126 historical matches 500 times each, the results passed the "eye test." The distributions of total scores and wickets in the simulations closely mirrored real-world T20 data. The goal wasn't world-class prediction accuracy — T20 is arguably too volatile for that — but rather to see if we could model the granularity of the game.

The model excels at generating realistic simulations, even if the "winner" remains a coin flip. In fact, my analysis found that this model often outperformed professional bookmaker odds, which proved to be even less predictive than a model with no information at all.

The power of this AI approach isn't in telling you who will win, but in showing you all the different ways a match could unfold, one ball at a time.

Thanks for reading.
—Will