Big data in soccer. What many viewers know from statistics on TV or in newspapers is also used by professional soccer clubs for match analysis. Steffen Lang, a doctoral student at the Chair of Performance Analysis and Sports Informatics, has developed a method that automatically determines the "in-game status" of a soccer match, that is, whether the ball is in play or not. The results of the study were published in the journal Scientific Reports under the title "Predicting the in-game status in soccer with machine learning using spatiotemporal player tracking data". The journal has an impact factor of 5.0.
"Ball status is an important piece of information in soccer for structuring and analyzing a game," Lang explains. Until now, service providers in professional soccer have collected this information by hand. Because of the high manual effort involved, it is usually not available to amateur and semi-professional clubs. Many of these clubs, however, already have automatic tracking systems, for example based on GPS, that continuously record the positions of the players. The goal of the project was to use this position data to determine the in-game status.
"This is important groundwork. Based on the information derived by our model, further analyses can be performed automatically. For example, it can be used to automatically extract from the position data when corners or free kicks took place," explains PD Dr. Daniel Link, who supervises Steffen Lang's doctoral project. The project was carried out in collaboration with the Chair of Decentralized Information Systems and Data Management at the TUM School of Computation, Information and Technology.
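As a rough illustration of how a per-frame in-game status could feed further analysis, the sketch below groups consecutive "ball not in play" frames into interruption spans. It is not the researchers' code: the function name `interruptions`, the frame rate of 25 frames per second, and the two-second minimum duration are assumptions chosen for the example. Restarts such as corners or free kicks would be looked for at the end of each span.

```python
from itertools import groupby

def interruptions(status, fps=25, min_seconds=2.0):
    """Return (start_s, end_s) spans in which the ball is out of play.

    status: one boolean per frame, True = ball in play.
    fps and min_seconds are illustrative assumptions, not values from the study.
    """
    spans, frame = [], 0
    for in_play, run in groupby(status):
        length = len(list(run))
        # Keep only stoppages long enough to be a real interruption.
        if not in_play and length / fps >= min_seconds:
            spans.append((frame / fps, (frame + length) / fps))
        frame += length
    return spans

# Toy status sequence: 4 s of play, a 3 s stoppage, 2 s of play,
# a 0.4 s blip that is filtered out, then 1 s of play.
status = [True] * 100 + [False] * 75 + [True] * 50 + [False] * 10 + [True] * 25
print(interruptions(status))
```

Filtering out very short runs is one simple way to keep isolated misclassified frames from being reported as stoppages.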
The researchers used data from 102 Bundesliga games and four machine learning methods. Fifty-five games served as a training set to train the models and to make sure they generalize to further games. The best models were then applied to the remaining 47 games and determined the in-game status with 93 percent accuracy. In total, over eight billion data points were used for training: big data!
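The evaluation idea described here, training on some games and measuring accuracy on games the model has never seen, can be sketched as follows. This is a minimal illustration under stated assumptions, not the method from the study: the data is synthetic, the single feature (mean player speed per frame) is hypothetical, and a simple threshold rule stands in for the four machine learning methods. Only the numbers 102 and 47 come from the text.

```python
import random

N_GAMES, N_TEST = 102, 47            # figures from the article
FRAMES_PER_GAME, N_PLAYERS = 50, 22  # arbitrary toy sizes (assumption)

def make_game(game_id, rng):
    """Synthetic stand-in for tracking data: one dict per frame."""
    frames = []
    for _ in range(FRAMES_PER_GAME):
        in_play = rng.random() < 0.6  # arbitrary share of in-play frames
        low, high = (1.0, 5.0) if in_play else (0.0, 2.0)
        speeds = [rng.uniform(low, high) for _ in range(N_PLAYERS)]
        frames.append({"game": game_id, "speeds": speeds, "in_play": in_play})
    return frames

def mean_speed(frame):
    """Hypothetical feature: average player speed in one frame."""
    return sum(frame["speeds"]) / len(frame["speeds"])

def fit_threshold(frames):
    """Midpoint between the average feature value of the two classes."""
    play = [mean_speed(f) for f in frames if f["in_play"]]
    stop = [mean_speed(f) for f in frames if not f["in_play"]]
    return (sum(play) / len(play) + sum(stop) / len(stop)) / 2

def accuracy(frames, threshold):
    """Share of frames whose predicted status matches the label."""
    hits = sum((mean_speed(f) >= threshold) == f["in_play"] for f in frames)
    return hits / len(frames)

rng = random.Random(0)
games = {g: make_game(g, rng) for g in range(N_GAMES)}

# Split by game, not by frame, so no match contributes to both sets.
ids = list(games)
rng.shuffle(ids)
test_ids, train_ids = ids[:N_TEST], ids[N_TEST:]

train = [f for g in train_ids for f in games[g]]
test = [f for g in test_ids for f in games[g]]

threshold = fit_threshold(train)
acc = accuracy(test, threshold)
print(f"held-out accuracy on {len(test_ids)} games: {acc:.3f}")
```

Splitting at the level of whole games matters because neighbouring frames within one match are highly similar; a frame-level split would let near-duplicates of test frames leak into training and overstate the accuracy.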
Lang expects even greater accuracy and various enhancements to the methodology in the future. Amateur clubs will then also be able to record ball positions, for example by placing a chip in the ball. "But even then, you still don't know whether play has been interrupted or not. But if I add this information to our training algorithm, we expect to become very, very accurate." The research associate also explains that further automatic classifications can be built on the predictions. In a next step, for example, game events such as passes, tackles and goal kicks could be derived. "The more data we can generate by machine, the more amateur sports can catch up. I'm convinced that in the future, no one will have to determine data by hand."
Lang's doctoral thesis focuses on precisely this kind of position data, primarily in soccer. But he also explains that the methods used are fundamentally transferable to other sports, "because there is also position data in field hockey or handball, not only in competition but also in training. We are at the beginning of data-driven training science," says Lang.
To the publication in the journal "Scientific Reports"
To the homepage of the Chair of Performance Analysis and Sports Informatics
Contact:
PD Dr. Daniel Link
Chair of Performance Analysis and Sports Informatics
Georg-Brauchle Ring 60/62
80992 München
phone: 089 289
e-mail: daniel.link(at)tum.de
Steffen Lang
Chair of Performance Analysis and Sports Informatics
Georg-Brauchle Ring 60/62
80992 München
phone: 089 289 24503
e-mail: steffen.lang(at)tum.de
Text: Bastian Daneyko
Photos: Pixabay/private