Advancing sports analytics through AI research

Creating testing environments to help progress AI research out of the lab and into the real world is immensely challenging. Given AI’s long association with games, it is perhaps no surprise that sports presents an exciting opportunity, offering researchers a testbed in which an AI-enabled system can assist humans in making complex, real-time decisions in a multiagent environment with dozens of dynamic, interacting individuals.

The rapid growth of sports data collection means we are in the midst of a remarkably important era for sports analytics. The availability of sports data is increasing in both quantity and granularity, transitioning from the days of aggregate high-level statistics and sabermetrics to more refined data such as event stream information (e.g., annotated passes or shots), high-fidelity player positional information, and on-body sensor readings. However, the field of sports analytics has only recently started to harness machine learning and AI for both understanding and advising human decision-makers in sports. In our recent paper published in collaboration with Liverpool Football Club (LFC) in JAIR, we envision the future landscape of sports analytics using a combination of statistical learning, video understanding, and game theory. We illustrate that football, in particular, is a useful microcosm for studying AI research, offering benefits in the longer term to decision-makers in sports in the form of an automated video-assistant coach (AVAC) system (Figure 1(A)).

Figure 1: (A) example illustration of an envisioned automated video-assistant coach interface, where attacking and defending players are detected, identified (in terms of player names), tracked, and subsequently passed into a predictive trajectory model that can be used to analyse potential intents or prescribed trajectories. (B) stylised example of event detection, with a specific target event (e.g., kick) together with the deep learning model output (‘Signal’) evolving throughout the game.

Football - an interesting opportunity for AI

In comparison to some other sports, football has been rather late to start systematically collecting large sets of data for scientific analytics aimed at improving teams’ gameplay. There are several reasons for this, the most prominent being that the game offers far fewer controllable settings than other sports (large outdoor pitch, dynamic game, etc.), together with the dominant credo of relying mainly on human specialists with track records and experience in professional football. Along these lines, Arrigo Sacchi, a successful Italian football coach and manager who never played professional football in his career, responded to criticism over his lack of experience with his famous quote when becoming a coach at Milan in 1987: “I never realised that to be a jockey you had to be a horse first.”

Football analytics poses challenges that are well suited for a wide variety of AI techniques, coming from the intersection of three fields: computer vision, statistical learning, and game theory (visualised in Figure 2). While these fields are individually useful for football analytics, their benefits become especially tangible when combined: players need to make sequential decisions in the presence of other players (cooperative and adversarial), and as such game theory, a theory of interactive decision making, becomes highly relevant. Moreover, tactical solutions to particular in-game situations can be learnt based on in-game and specific player representations, which makes statistical learning a highly relevant area. Finally, players can be tracked and game scenarios can be recognised automatically from widely available image and video inputs.

Figure 2: illustrative overview of the three key fields (Game Theory, Statistical Learning, and Computer Vision) that have played an important role in advancing the state of football analytics (with examples from literature listed in each associated domain, and associated overlapping frontiers indicated).

The AVAC system we envision is situated within the microcosm formed by the intersection of these three research fields (Figure 2). In our research in this domain, we not only lay out a roadmap for scientific and engineering problems that can be tackled for years to come, but also present original results at the crossroads of game-theoretic analysis, statistical learning, and computer vision to illustrate what this exciting area has to offer to football.

How AI could help football

Game theory plays an important role in the study of sports, enabling theoretical grounding of players’ behavioral strategies. Many football scenarios can be modeled as zero-sum games, which have been studied extensively since the inception of game theory. For example, here we model the penalty kick situation as a two-player asymmetric game, where the kicker’s strategies may be neatly categorised as left, center, or right shots. To study this problem, we augment game-theoretic analysis in the penalty kick scenario with Player Vectors, which summarise the playing styles of individual football players. With such representations of individual players, we are able to group kickers with similar playing styles, and then conduct game-theoretic analysis at the group level (Figure 3). Our results show that the identified shooting strategies of different groups are statistically distinct. For example, we find that one group prefers to shoot to the left corner of the goal mouth, while another tends to shoot to the left and right corners more evenly. Such insights may help goalkeepers diversify their defense strategies when playing against different types of players. Building on this game-theoretic view, one can consider the durative nature of football by analysing it in the form of temporally extended games, use this to advise tactics to individual players, or even go further to optimise the overall team strategy.
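To make the zero-sum view of a penalty kick concrete, here is a minimal sketch that solves a simplified two-action version of the game (left or right only, dropping the center option for brevity). The scoring probabilities are made-up illustrative numbers, not values from the paper, and the function name is hypothetical. It uses the standard closed-form mixed equilibrium for a 2x2 zero-sum game rather than the group-level analysis described above.

```python
def solve_2x2_zero_sum(payoff):
    """Mixed equilibrium of a 2x2 zero-sum game without a pure saddle point.

    payoff[i][j] is the kicker's scoring probability when the kicker picks
    action i and the goalkeeper picks action j (0 = left, 1 = right).
    Returns (p_left, q_left, value): the kicker's probability of shooting
    left, the goalkeeper's probability of diving left, and the resulting
    scoring probability.
    """
    (a, b), (c, d) = payoff
    maximin = max(min(a, b), min(c, d))   # best pure guarantee for the kicker
    minimax = min(max(a, c), max(b, d))   # best pure guarantee for the keeper
    if maximin == minimax:
        raise ValueError("game has a pure saddle point; no mixing is needed")
    denom = a - b - c + d
    p_left = (d - c) / denom              # makes the goalkeeper indifferent
    q_left = (d - b) / denom              # makes the kicker indifferent
    value = (a * d - b * c) / denom
    return p_left, q_left, value


# Hypothetical scoring probabilities (assumed purely for illustration).
toy_payoff = [
    [0.60, 0.95],  # kick left:  keeper dives left / right
    [0.90, 0.70],  # kick right: keeper dives left / right
]
p_left, q_left, value = solve_2x2_zero_sum(toy_payoff)
print(f"kicker shoots left with p={p_left:.3f}, "
      f"keeper dives left with q={q_left:.3f}, "
      f"scoring probability={value:.3f}")
```

At equilibrium, each side mixes so that the opponent gains nothing by favouring either corner. This is the sense in which observed shot distributions can be compared against game-theoretic predictions.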

Figure 3: (A) and (B) visualise clusters of Player Vectors, for players in an example database of over 12000 penalty kicks. Using such a characterisation of player behaviours, one can visualise associated heatmaps of goals by kickers in various clusters, as illustrated in (C).
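As a rough illustration of group-level analysis, the sketch below tallies shot directions per cluster into empirical shooting strategies. It assumes cluster labels have already been obtained (for example, by clustering Player Vectors). The toy kicks, the cluster ids, and the function name are hypothetical, and the statistical testing of group differences reported in the paper is not reproduced here.

```python
from collections import Counter, defaultdict

DIRECTIONS = ("left", "center", "right")


def empirical_strategies(kicks):
    """Estimate a shot-direction distribution for each cluster of kickers.

    kicks is an iterable of (cluster_id, direction) pairs, where direction
    is one of DIRECTIONS. Returns {cluster_id: {direction: frequency}}.
    """
    counts = defaultdict(Counter)
    for cluster_id, direction in kicks:
        counts[cluster_id][direction] += 1
    strategies = {}
    for cluster_id, counter in counts.items():
        total = sum(counter.values())
        strategies[cluster_id] = {d: counter[d] / total for d in DIRECTIONS}
    return strategies


# Toy data (assumed): cluster 0 favours the left, cluster 1 is more balanced.
toy_kicks = (
    [(0, "left")] * 6 + [(0, "center")] * 1 + [(0, "right")] * 3
    + [(1, "left")] * 4 + [(1, "center")] * 2 + [(1, "right")] * 4
)
strategies = empirical_strategies(toy_kicks)
for cluster_id in sorted(strategies):
    print(cluster_id, strategies[cluster_id])
```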

On the side of statistical learning, representation learning, which would enable informative summarisation of the behavior of individual players and football teams, has yet to be fully exploited in sports analytics. Moreover, we believe that the interaction between game theory and statistical learning would catalyse further advances in sports analytics. In the above penalty kick scenario, for instance, augmenting the analysis with player-specific statistics (Player Vectors) provided deeper insights into how various types of players behave or make decisions about their actions. As another example, one can study 'ghosting', a data-driven analysis of how players should have acted in hindsight (which bears connections to the notion of regret in online learning and game theory). The ghosting model suggests alternative player trajectories for a given play, e.g., based on the league average or a selected team. Predicted trajectories are usually visualised as a translucent layer over the original play, hence the term 'ghosting' (see Figure 4 for a visual example). Generative trajectory prediction models allow us to gain insights by analysing key situations of a game and how they might have played out differently. These models also bear potential in predicting the implications of a tactical change, a key player's injury, or a substitution on a team's own performance, along with the opposition's response to such a change.

Figure 4: Example of predictive modelling using football tracking data. Here, the ground truth data for the ball, attackers, and defenders is visualised in addition to defender predictions made by a sequential-predictive trajectory model.
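The comparison between an observed trajectory and a 'ghost' trajectory can be sketched with a deliberately simple stand-in. The example below uses a constant-velocity baseline instead of a learned sequential model, so it illustrates only the comparison step and not the model from the paper. The coordinates and function names are hypothetical.

```python
import math


def constant_velocity_rollout(history, steps):
    """Extrapolate a player's position from their last two observed points.

    history is a list of (x, y) positions at consecutive time steps. This
    is a naive baseline used in place of a learned trajectory model.
    """
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, steps + 1)]


def mean_displacement(actual, predicted):
    """Average Euclidean distance between two equally long trajectories."""
    if len(actual) != len(predicted):
        raise ValueError("trajectories must have the same length")
    return sum(math.dist(a, p) for a, p in zip(actual, predicted)) / len(actual)


# Toy defender track in metres (assumed values).
history = [(10.0, 5.0), (11.0, 5.5)]
actual_future = [(12.0, 6.0), (13.0, 7.0), (14.0, 8.0)]

ghost = constant_velocity_rollout(history, steps=3)
error = mean_displacement(actual_future, ghost)
print("ghost trajectory:", ghost)
print("mean displacement from actual play:", error)
```

A large displacement flags moments where the player deviated from what the reference model would have done. Those are the situations an analyst would want to review.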

Finally, we consider computer vision to be one of the most promising avenues for advancing the boundaries of state-of-the-art sports analytics research. Detecting events purely from video, a topic that has been well studied in the computer vision community (e.g., see the following survey and our paper for additional references), opens up an enormous range of potential applications. By associating events with particular frames, videos become searchable and ever more useful (e.g., automatic highlight generation becomes possible). Football video, in turn, offers an interesting application domain for computer vision. The large number of football videos available satisfies a prerequisite for modern AI techniques. While each football video is different, the settings do not vary greatly, which makes the task ideal for sharpening AI algorithms. Third-party providers also exist to furnish hand-labelled event data, which is time-consuming to generate and can be useful in training video models, so both supervised and unsupervised algorithms can be used for football event detection. Figure 1(B), for example, provides a stylised visualisation of a deep learning model trained with supervised methods to recognise target events (e.g., kicks) purely from video.
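One simple way to associate events with frames is to post-process a per-frame model score, such as the 'Signal' in Figure 1(B), into discrete intervals. The sketch below thresholds the score and keeps only runs of sufficient length. The threshold, the minimum run length, and the toy scores are assumptions for illustration, and the post-processing used in the paper may differ.

```python
def detect_events(signal, threshold=0.5, min_length=2):
    """Turn per-frame scores into (start_frame, end_frame) event intervals.

    A run of consecutive frames scoring at or above threshold counts as an
    event if it lasts at least min_length frames. Both ends are inclusive.
    """
    events = []
    start = None
    for frame, score in enumerate(signal):
        if score >= threshold and start is None:
            start = frame                      # a candidate event begins
        elif score < threshold and start is not None:
            if frame - start >= min_length:    # discard very short blips
                events.append((start, frame - 1))
            start = None
    # Close an event that is still open at the end of the clip.
    if start is not None and len(signal) - start >= min_length:
        events.append((start, len(signal) - 1))
    return events


# Toy per-frame scores for a hypothetical "kick" detector (assumed values).
scores = [0.1, 0.2, 0.7, 0.9, 0.8, 0.3, 0.6, 0.1, 0.1, 0.9, 0.95]
events = detect_events(scores)
print(events)
```

Once events are stored as frame intervals, a video can be searched or cut into highlights by looking up those intervals.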

The application of advanced AI techniques to football has the potential to revolutionise the game across many axes, for players, decision-makers, fans, and broadcasters. Such advances are also important because they bear potential to further democratise the sport itself (e.g., rather than relying on judgement calls from in-person scouts/experts, one may use techniques such as computer vision to quantify skillsets of players from under-represented regions, those from lower-level leagues, etc.). We believe that the increasingly advanced AI techniques developed within the football microcosm might be applicable to broader domains. To this end, we are co-organising (with several external organisers) an IJCAI 2021 workshop on AI for Sports Analytics later this year, which we welcome interested researchers to attend. For researchers interested in this topic, datasets have been made publicly available both by analytics companies such as StatsBomb (dataset link) and by the wider research community (dataset link). Furthermore, the paper provides a comprehensive overview of research in this domain.

Paper and related links:

Work done as a collaboration with contributors: Karl Tuyls, Shayegan Omidshafiei, Paul Muller, Zhe Wang, Jerome Connor, Daniel Hennes, Ian Graham, William Spearman, Tim Waskett, Dafydd Steele, Pauline Luc, Adria Recasens, Alexandre Galashov, Gregory Thornton, Romuald Elie, Pablo Sprechmann, Pol Moreno, Kris Cao, Marta Garnelo, Praneet Dutta, Michal Valko, Nicolas Heess, Alex Bridgland, Julien Perolat, Bart De Vylder, Ali Eslami, Mark Rowland, Andrew Jaegle, Yi Yang, Remi Munos, Trevor Back, Razia Ahamed, Simon Bouton, Nathalie Beauguerlange, Jackson Broshear, Thore Graepel, and Demis Hassabis.