Generative AI for Sports and Entertainment

Using large language models to support some of the world’s most prestigious sports and entertainment events

Overview

Narration is an essential part of sports coverage. However, for large-scale events such as the Wimbledon tennis tournament, which has around 250 singles matches across 19 courts over 13 days and produces hundreds of hours of video footage, it is impractical for human commentators to narrate every match in a timely manner. To address this challenge, we worked closely with IBM Consulting to create a novel system that produces automatic commentary for tennis matches using generative AI. Our system consists of the following stages:

We first extract play-by-play metadata using a computer vision module that captures the details of the game: court and net detection, player and ball tracking, player pose, fine-grained shot classification (backhand, forehand, volley, …), and shot direction. This metadata is combined with information from other modalities, such as audio-based measurement of crowd cheering, match scoring data, radar-measured ball speed, and more.
The rich metadata described above is then fed as input to a large language model, which is fine-tuned to produce natural-language commentary as output. The large language model is a 3B-parameter encoder-decoder model pre-trained at IBM on trillions of tokens. We fine-tuned the model for commentary generation using a novel layered LoRA architecture (see the CVPR 2023 publication below).
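
To make the hand-off between the two stages concrete, the following is a minimal sketch of how per-point metadata might be merged into one record and flattened into text for an encoder-decoder model. It is illustrative only: the class names, field names, the `serialize_point` helper, the "generate commentary:" prefix, and the sample values are all assumptions made for this example, not the schema or prompt format used in the actual system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ShotEvent:
    """One shot in a rally. All field names here are hypothetical."""
    player: str
    shot_type: str                          # e.g. "forehand", "backhand", "volley"
    direction: str                          # e.g. "cross-court", "down the line"
    ball_speed_kmh: Optional[float] = None  # radar reading, if one is available


@dataclass
class PointMetadata:
    """Metadata for one point, merged across modalities (hypothetical schema)."""
    shots: List[ShotEvent]
    winner: str
    score_after: str          # taken from match scoring data
    crowd_excitement: float   # audio-based cheering level, assumed to lie in [0, 1]


def serialize_point(point: PointMetadata) -> str:
    """Flatten the metadata into a single text string for a text-to-text model."""
    shot_parts = []
    for shot in point.shots:
        part = f"{shot.player} {shot.shot_type} {shot.direction}"
        if shot.ball_speed_kmh is not None:
            part += f" at {shot.ball_speed_kmh:.0f} km/h"
        shot_parts.append(part)
    return (
        "generate commentary: "
        f"shots: {'; '.join(shot_parts)} | "
        f"winner: {point.winner} | "
        f"score: {point.score_after} | "
        f"crowd: {point.crowd_excitement:.2f}"
    )


# Invented sample data, for illustration only.
example_point = PointMetadata(
    shots=[
        ShotEvent("Player A", "serve", "wide", ball_speed_kmh=198.0),
        ShotEvent("Player B", "backhand", "cross-court"),
        ShotEvent("Player A", "forehand", "down the line"),
    ],
    winner="Player A",
    score_after="40-15",
    crowd_excitement=0.8,
)
prompt = serialize_point(example_point)
print(prompt)
```

In a setup like this, the resulting string would be the input to the fine-tuned language model, and the generated commentary would be its output. Keeping the serialization in one function makes it straightforward to add or drop a modality without touching the rest of the pipeline.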

Our system was showcased to clients and fans around the world as part of the 2023 Wimbledon and US Open tennis tournaments.