John Kenkel, a Bees fan and BIAS Coordinator in the USA, takes a closer look at a term we have all heard, and that several have sneered at, in the two weeks since Warburtongate hit the national newspapers: mathematical modelling, and its implications for the Bees’ future success.

As the smoke clears from the Bees manager situation and supporters come to grips with the loss of their beloved gaffer, many fans are left asking, “What next?” The divorce, played out so publicly in the press, seems not to have diminished trust in the club’s ownership, but many questions remain about how effective mathematical modelling can be. Fans are, to use a term I abhor, “cautiously optimistic.”

Sean Ingle wrote a fine piece for The Guardian outlining the success of the continental strategy and modelling techniques at FC Midtjylland. It gave Bees supporters hope, backed by solid, real-world evidence, that the model can work. However, is it fair to assume that because the model worked in Denmark it will work in England? Yes…and no.

What does it all mean?

Before analysing the issue in detail, it helps to understand what mathematical modelling actually is, given that the majority of us don’t know stochastic modelling from quantitative system dynamics. At its most basic, mathematical modelling in sport is the use of statistical analysis to identify and acquire assets (in this case players) that are undervalued, and to sell, at an inflated price, assets that other teams overvalue.

The success of this strategy relies on identifying the key statistics that separate perception from reality. The better the models hold up on the pitch, the better the results. I can hear the collective groan as this is read…“well, duh.” But let’s dig a little deeper into that concept. When we speak of football statistics we usually look at things like shots on goal, corners conceded and tackles won. These are all well and good as tools, but are they good predictors of future performance, or are they a better representation of past performance? The real problem with these stats is that they are one-dimensional and fairly poor at prediction. Torres to Chelsea, anyone?

When thinking about mathematical modelling, don’t think of it as counting shots or time of possession. Think about complexity (and I mean real complexity) on the order of hundreds of thousands of interrelated statistics that are constantly reviewed independently and in combination. You start to get to millions of permutations: for example, the shot percentage on the right side of the pitch from 20-25 yards, across multiple different player combinations and team formations, after 20 minutes of the match. The model may then spit out what the best formation is, who the best players are in that situation, who should take the shot, and roughly what percentage of the time the shot would go in. More importantly, it can identify potential signings who outperform their peers under several scenarios but don’t have the standout traditional statistics that everyone knows and loves. Again, I am oversimplifying a great deal. However, at the end of the day, if we can improve accuracy rates by just a few percentage points, that could have a massive effect on table position and goal difference.
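To make the idea of "shot percentage in a given situation" a little more concrete, here is a minimal Python sketch. It is emphatically not the club's model: the shot records, the way the situations are split up, and the function name `conversion_rates` are all invented for illustration. It simply groups shots by situation and works out how often each kind goes in.

```python
from collections import defaultdict

# Each record: (side of pitch, distance band in yards, match period, scored?)
# Every value below is made up purely for illustration.
shots = [
    ("right", "20-25", "after 20 min", True),
    ("right", "20-25", "after 20 min", False),
    ("right", "20-25", "after 20 min", False),
    ("right", "20-25", "after 20 min", False),
    ("left", "20-25", "after 20 min", False),
    ("left", "20-25", "after 20 min", False),
    ("left", "20-25", "after 20 min", False),
    ("right", "under 20", "first 20 min", True),
    ("right", "under 20", "first 20 min", False),
]


def conversion_rates(records):
    """Return the share of shots scored for each distinct situation."""
    tallies = defaultdict(lambda: [0, 0])  # situation -> [goals, shots]
    for side, distance, period, scored in records:
        key = (side, distance, period)
        tallies[key][1] += 1
        if scored:
            tallies[key][0] += 1
    return {key: goals / total for key, (goals, total) in tallies.items()}


rates = conversion_rates(shots)
for situation, rate in sorted(rates.items()):
    print(situation, f"{rate:.0%}")
```

A real model would slice the data along far more dimensions (players, formations, opposition) and would need far more shots per situation before any of the percentages could be trusted, which is exactly where the millions of permutations come from.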

Does this Moneyball thing work?

For empirical proof many point to the Oakland A’s in Major League Baseball, though some use them as an example of success and some as an example of failure. Breaking it down, Moneyball was introduced at Oakland in the early 2000s, specifically in the years 2001-2003. In the 2002 season the A’s finished in first place in their division, under a manager who won over 60% of his games (a very good record in modern baseball). Oakland was so successful, in part, because they were excellent at identifying young talent. However, Oakland is a small-market team, and at about that time the young talent Oakland had developed was set for major salary increases in the near future, which the team could not have afforded. Rather than make poor financial decisions or sell some high-salary players in order to afford others, the team decided to use modelling and analysis to identify undervalued players. At the same time it sold players who had good “traditional” statistics and would be in demand with other teams, but who the models showed had relatively less impact on the win-loss column. This of course led to butting heads with their successful manager (though not as badly as the movie portrayed), who was a classic baseball man. So the manager and head scout left, and a new manager and scouts who bought into the system were brought in. Sound familiar?

Was it successful? In the five years following the shuffle the A’s finished first in their division three times and second twice. Furthermore, in 2006, despite ranking 24th of 30 teams in player salaries, they had the fifth-best record that year and won their first playoff series in over 15 years, having developed two Rookies of the Year (the award for the best first-year player in baseball) in 2004 and 2005. Now, the strategy came off the rails from 2007-2009, but the team, with the same strategy, has finished first in two of the last three years and made it to the second round of the playoffs in both of those years. In this case, it has been a very successful strategy.

I use the A’s analogy for background; as with everything, only results matter. However, this new way of thinking created an entirely new brand of sports science called Sabermetrics, which has fundamentally changed the way in which baseball players are evaluated at every level of the game.

Will Mathematical Modelling Work?

I think many of us will agree that math is hard. Computational statistical analysis and quantitative system dynamics are not only hard, they are mind-bogglingly complex. My firm uses system dynamics to model military equipment and government decision making on the battlefield. My colleagues have even worked, with great success, with NFL (American football) teams on models to support the player draft. The data sets and tools are, in some cases, like looking at a foreign language written in a different alphabet. However, building the data set and models is not the most difficult part. The difficult part is the creativity of the people who use the data and look at thousands of permutations to find usable outputs that will have a material effect on the game. The true genius is in the people who can manipulate that data and build a useful tool. These people also need to know the intricacies of what happens on and off the pitch. They need to be students of the game as much as any manager or player, and maybe even more so. Matthew Benham didn’t become successful simply because he had a lot of data to sift through for the most useful outputs. He and his colleagues used the data to run manipulations and permutations that the untrained eye would never see or pick up on, and so gained a competitive advantage over those not using those tools.

There is a drawback, though: people catch on after a while. Going back to the A’s, one of their initial massive successes was in valuing something called On-base Percentage (OBP), which is essentially how often a batter gets on base through any set of circumstances. Long-standing baseball theory up until that time valued batting average (which only counts one way to get on base) and home runs much more highly than OBP. So the A’s purged themselves of high-priced sluggers and brought in cheaper players who had lower batting averages but got on base a lot more often. The success was immediate. The downside was that after 2-3 years of this, other teams caught on and the A’s could no longer find value with that model. These models require constant refreshing and new viewpoints for long-term success. In the case of the A’s, they identified new value in much younger and more raw players. So instead of paying more for young but more refined talent, they built a lot of their draft strategy around players coming out of high school (~18 years old) as opposed to college players (~22 years old), and again they met with great success. You might again draw an analogy between that strategy and the Bees identifying Andre Gray from the Conference League.
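For anyone who wants to see the difference between the two statistics in numbers, here is a small Python sketch using the standard formulas: batting average is hits divided by at-bats, while OBP also counts walks and being hit by a pitch. The two players and all of their numbers are hypothetical, invented only to show how a hitter can have the lower batting average and still get on base more often.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """One season of batting totals for a (hypothetical) player."""
    name: str
    at_bats: int
    hits: int
    walks: int
    hit_by_pitch: int
    sacrifice_flies: int


def batting_average(p: BattingLine) -> float:
    # Hits per at-bat: walks and hit-by-pitch are ignored entirely.
    return p.hits / p.at_bats


def on_base_percentage(p: BattingLine) -> float:
    # Times on base (hit, walk, hit by pitch) per qualifying plate appearance.
    reached = p.hits + p.walks + p.hit_by_pitch
    chances = p.at_bats + p.walks + p.hit_by_pitch + p.sacrifice_flies
    return reached / chances


# Invented figures, not real players.
slugger = BattingLine("Hypothetical slugger", 550, 160, 30, 2, 5)
patient = BattingLine("Hypothetical patient hitter", 500, 130, 95, 5, 4)

for p in (slugger, patient):
    print(f"{p.name}: AVG {batting_average(p):.3f}, OBP {on_base_percentage(p):.3f}")
```

On these made-up numbers the slugger looks better by batting average, but the patient hitter reaches base noticeably more often, which is the kind of gap between perception and reality the A’s were exploiting.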

How should supporters view the new strategy?

So the model can and does work! But it can fail too if you don’t dedicate yourself to innovating the tools and constantly reviewing where the value is in the market. What supporters need to keep in mind is that the people who work through this data and come up with strategies and tactics are not a bunch of math geeks sitting in a cube printing out statistical spreadsheets for nuggets here and there. I think there is a propensity for those of us who don’t understand the concepts to write it off as a bunch of folks making decisions divorced from the game and its history, or without understanding player dynamics and personalities. That couldn’t be further from the truth. While we may not see as many old-timers on the pitches of youth leagues and lower divisions using their gut and eye for talent as we used to, we will see people just as committed to the game and just as steeped in its history using much larger sets of data and less gut feel to identify and acquire talent. For better or for worse, the entire game is going in that direction at the professional level. We can conduct that train, follow the train, or continue to take the bus. My vote is to design the fastest train possible and ride it for as long as we can.

We as a club can embrace this new tool or rail against it, but at the end of the day our competition will get there with or without us. If we can take the lead and stay on the edge of analytical innovation, we can punch well above our weight over long periods of time. That is not to say that we couldn’t win with a more traditional model, but that success may not be as long term and is often inextricably linked to one or two people who could leave at any time.

At the end of the day the stats say that we have a higher probability of success using these newfangled models. But, as in any free market, past success is no indication of future performance. We all know Benham is a gambler, but the mark of a good gambler is to bet big only when you have an advantage over the house. In this case, our advantage is innovation.

John Kenkel

Here’s a wonderful YouTube clip that shows Mathematical Modelling at work