Shizuoka University
Okabe Lab

Efficient Animal Behavior Analysis and Video Summarization via Gaze Target Estimation Models

Suguru Takahashi

Shizuoka University

Makoto Okabe

Shizuoka University

研究のTeaser画像

Abstract

I propose a method to improve the efficiency of video analysis using large-scale language models (LLM) by using animal (dog) gaze as an indicator. First, I developed a highly accurate gaze estimation model with DINOv2 as a backbone using a custom-built dog dataset of approximately 5,000 images. Next, I applied spatiotemporal clustering to the estimated gaze coordinates to identify and extract frames where the animal focused its attention, identifying semantic turning points. I demonstrated that this adaptive sampling enables advanced summarization that accurately preserves behavioral context while reducing costs.

Paper

Master's Thesis (2026)

Video

Material

Master's Thesis Presentation

Citation

  • Suguru Takahashi
    Efficient Animal Behavior Analysis and Video Summarization via Gaze Target Estimation Models
    Master Thesis, February 2026