Efficient Animal Behavior Analysis and Video Summarization via Gaze Target Estimation Models

Suguru Takahashi

Shizuoka University

Makoto Okabe

Shizuoka University

Abstract

I propose a method that makes video analysis with large language models (LLMs) more efficient by using the gaze of an animal (here, a dog) as an indicator of which frames matter. First, I developed a highly accurate gaze estimation model with DINOv2 as its backbone, using a custom-built dog dataset of approximately 5,000 images. Next, I applied spatiotemporal clustering to the estimated gaze coordinates to extract the frames in which the animal focused its attention, which mark semantic turning points in the video. I demonstrated that this adaptive sampling enables advanced summarization that accurately preserves behavioral context while reducing analysis cost.
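
The frame-selection step can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, so the code below swaps in a simple dispersion-based grouping as a stand-in: consecutive frames whose estimated gaze point stays within a small radius of a running centroid form one cluster, and the middle frame of each sufficiently long cluster is kept as a keyframe. The function name `fixation_keyframes`, the parameters `radius` and `min_frames`, and the use of normalized image coordinates are all assumptions made for illustration, not details from the thesis.

```python
from math import hypot


def fixation_keyframes(gaze, radius=0.05, min_frames=5):
    """Pick one keyframe per sustained gaze cluster.

    gaze: list of (x, y) gaze estimates, one per frame, in normalized
          image coordinates (an assumption of this sketch).
    radius: maximum distance from the running centroid for a frame to
            join the current cluster (hypothetical parameter).
    min_frames: minimum cluster length to count as focused attention
                (hypothetical parameter).
    Returns the frame indices chosen as keyframes.
    """
    if not gaze:
        return []

    keyframes = []
    start = 0          # first frame index of the current cluster
    cx, cy = gaze[0]   # running centroid of the current cluster
    n = 1              # number of frames in the current cluster

    for i in range(1, len(gaze) + 1):
        if i < len(gaze):
            x, y = gaze[i]
            if hypot(x - cx, y - cy) <= radius:
                # Gaze is still near the centroid: extend the cluster
                # and update the centroid incrementally.
                n += 1
                cx += (x - cx) / n
                cy += (y - cy) / n
                continue

        # The cluster ended at frame i - 1 (gaze moved away or video ended).
        if n >= min_frames:
            keyframes.append(start + n // 2)

        if i < len(gaze):
            # Start a new cluster at the current frame.
            start = i
            cx, cy = gaze[i]
            n = 1

    return keyframes


# Synthetic example: a 6-frame fixation, a 2-frame glance, a 7-frame fixation.
demo = [(0.2, 0.2)] * 6 + [(0.8, 0.8)] * 2 + [(0.5, 0.5)] * 7
selected = fixation_keyframes(demo)
```

With the synthetic input, only the two sustained clusters contribute a keyframe each and the brief two-frame glance is skipped, so a 15-frame clip is reduced to 2 frames that would be passed on for LLM analysis.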

Paper

Master's Thesis (2026)

Video

Material

Master's Thesis Presentation

Citation

Suguru Takahashi
Efficient Animal Behavior Analysis and Video Summarization via Gaze Target Estimation Models
Master's Thesis, February 2026