Last-Touch Determination in Soccer Using Multimodal Inference from Video and Location-Based Time-Series Data
Abstract
This study proposes a multimodal decision-support method that aims to reduce the burden on referees when ruling on which player last touched the ball before it went out of play in soccer. The method feeds two inputs into a large vision-language model (LVLM): enlarged images of the ball's surroundings taken from a single-view camera feed, and positional time-series data obtained from object tracking. Using changes in physical velocity and trajectory as evidence, it suppresses the misperceptions (hallucinations) that tend to occur when the model relies on visual information alone, and achieves both high-accuracy last-touch determination and explanation generation.
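To make the idea of velocity and trajectory changes as evidence concrete, the following is a minimal sketch and not the thesis implementation. It assumes the tracking data is available as per-frame (x, y) positions in meters for the ball and each player. The function names, the thresholds `min_delta_v` and `max_dist`, and the player IDs are all hypothetical and chosen only for illustration. The sketch flags frames where the ball's velocity vector changes abruptly, then reports the nearest player at the most recent such frame.

```python
import math

def velocities(track, fps):
    """Finite-difference velocity (m/s) between consecutive positions."""
    return [((x1 - x0) * fps, (y1 - y0) * fps)
            for (x0, y0), (x1, y1) in zip(track, track[1:])]

def touch_candidates(track, fps, min_delta_v=3.0):
    """Frame indices where the ball's velocity vector changes abruptly.

    min_delta_v is an assumed threshold (m/s) on the magnitude of the
    change between consecutive velocity vectors.
    """
    v = velocities(track, fps)
    frames = []
    for i in range(1, len(v)):
        dvx = v[i][0] - v[i - 1][0]
        dvy = v[i][1] - v[i - 1][1]
        if math.hypot(dvx, dvy) >= min_delta_v:
            frames.append(i)  # the change happens at position index i
    return frames

def last_touch_evidence(ball_track, player_tracks, fps,
                        min_delta_v=3.0, max_dist=1.5):
    """Most recent abrupt velocity change with a player within max_dist meters.

    Returns a small dict that could be serialized into a text prompt
    alongside the image input, or None if no candidate qualifies.
    """
    for i in reversed(touch_candidates(ball_track, fps, min_delta_v)):
        ball_xy = ball_track[i]
        pid = min(player_tracks,
                  key=lambda p: math.dist(ball_xy, player_tracks[p][i]))
        d = math.dist(ball_xy, player_tracks[pid][i])
        if d <= max_dist:
            return {"frame": i, "player": pid, "distance_m": round(d, 2)}
    return None

# Toy example: the ball travels along +x, then is deflected toward +y at frame 3.
ball = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]
players = {
    "A7": [(3.0, 0.5)] * 6,    # standing next to the deflection point
    "B4": [(10.0, 10.0)] * 6,  # far from the ball
}
evidence = last_touch_evidence(ball, players, fps=10)
print(evidence)
```

A structured result of this kind is one way physical cues could be given to the model as text, so that its explanation can be checked against the trajectory rather than against the image alone.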
Citation
Yuma Hirose
Last-Touch Determination in Soccer Using Multimodal Inference from Video and Location-Based Time-Series Data
Master's Thesis, February 2026