ICAF: Predicting the Visual Focus of Attention in Multi-Person Discussion Videos

Chongyang Bai1 Srijan Kumar2 Jure Leskovec2 V.S. Subrahmanian1
1 Dartmouth College 2 Stanford University

ICAF is an algorithm that extracts who-looks-at-whom networks from multi-person videos. Given close-up videos of a group of people talking with each other, ICAF predicts who-looks-at-whom probabilities and speaking probabilities over time. It uses collective classification to predict all people's visual focus of attention simultaneously. We also propose a lightly supervised variant of ICAF, which trains the model by using one person's predicted speaking segments as visual-focus-of-attention labels for the other people. This makes it possible to extract networks from videos without manual annotation. The resulting networks can further be used to analyze social behaviors such as dominance, trust, liking, and deception.
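The lightly supervised idea above can be sketched as follows. This is an illustrative assumption of how weak labels could be generated, not ICAF's exact implementation: when exactly one person is predicted to be speaking in a segment, every other person is weakly labeled as looking at that speaker. The function name and threshold are hypothetical.

```python
def weak_attention_labels(speaking_probs, threshold=0.5):
    """Generate weak visual-focus-of-attention labels for one time segment.

    speaking_probs: per-person speaking probabilities for the segment.
    Returns a dict {listener_index: speaker_index}, or {} when there is
    no single clear speaker (ambiguous segments are skipped).
    """
    speakers = [i for i, p in enumerate(speaking_probs) if p >= threshold]
    if len(speakers) != 1:
        return {}  # zero or multiple predicted speakers: no reliable label
    speaker = speakers[0]
    # Everyone except the speaker is weakly labeled as looking at the speaker.
    return {i: speaker for i in range(len(speaking_probs)) if i != speaker}

# Person 0 is the clear speaker, so persons 1-3 get the weak label "looking at 0".
labels = weak_attention_labels([0.9, 0.1, 0.2, 0.05])
```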


The demo shows the overhead view (top left), the associated interaction graph (top right), and each person's close-up video (bottom). In each close-up video, the black box shows the probabilities of that person looking at each other person and at their laptop, and the bottom shows the predicted focus of attention as well as the person's speaking probability. The dynamic graph shows the participants' spatial locations and their look-at-whom interactions, where a self-loop represents looking at the laptop.


Here are the look-at-whom interaction networks and the speaking probabilities extracted by ICAF.
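One simple way to turn per-frame focus-of-attention predictions into such a network is to weight each directed edge by the fraction of frames on which it occurs. The sketch below is an assumed aggregation scheme for illustration (data layout and function name are hypothetical); self-loops encode looking at one's own laptop, as in the demo graph.

```python
from collections import Counter

def build_network(frame_predictions):
    """Aggregate per-frame predictions into a weighted who-looks-at-whom network.

    frame_predictions: list of dicts, one per frame, mapping each person's
    index to the index of their predicted attention target (the same index
    denotes a self-loop, i.e. looking at their own laptop).
    Returns {(person, target): weight}, where weight is the fraction of
    frames on which that directed edge was predicted.
    """
    counts = Counter()
    for frame in frame_predictions:
        for person, target in frame.items():
            counts[(person, target)] += 1
    n_frames = len(frame_predictions)
    return {edge: c / n_frames for edge, c in counts.items()}

# Two frames: person 0 always looks at 1; person 2 always looks at their laptop.
frames = [{0: 1, 1: 0, 2: 2}, {0: 1, 1: 2, 2: 2}]
network = build_network(frames)
```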


Predicting the Visual Focus of Attention in Multi-Person Discussion Videos. C. Bai, S. Kumar, J. Leskovec, M. Metzger, J.F. Nunamaker, V.S. Subrahmanian, International Joint Conference on Artificial Intelligence (IJCAI), 2019.

The following BibTeX citation can be used:

@inproceedings{bai2019predicting,
  title={Predicting the Visual Focus of Attention in Multi-Person Discussion Videos},
  author={Bai, Chongyang and Kumar, Srijan and Leskovec, Jure and Metzger, Miriam and Nunamaker, Jay and Subrahmanian, VS},
  booktitle={International Joint Conference on Artificial Intelligence (IJCAI)},
  year={2019},
  organization={International Joint Conferences on Artificial Intelligence}
}